Open Stack

Mon Dec 9 18:57:02 UTC 2013

Hi all,

Recently I've been working on two blueprints [1][2], both of which involve
recording scheduling information, and I would like to hear comments on
several design choices.

Problem Statement
--
* A NoValidHost exception might mask the real reason an instance failed to
spin up.

Consider the following event sequence: "run_instance" on host1 fails to
spin up an instance due to a port allocation failure in neutron. The
request is cast back to the scheduler to pick the next available host. It
fails again on host2 with the same port allocation error. After a maximum
of 3 retries, the instance is set to the "ERROR" state with a NoValidHost
exception, and there is no easy way to find out what really went wrong.

* Scheduling information is currently recorded in several different log
items, which makes it difficult to look up when debugging.
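
To make problem #1 concrete, here is a minimal sketch of how the retry
loop hides the real error. This is not nova code: run_instance,
schedule_with_retries, PortAllocationError and the attempts_log list are
hypothetical names, and I assume "maximum 3" means three attempts in total.

```python
class NoValidHost(Exception):
    """Stand-in for the exception the user finally sees."""


class PortAllocationError(Exception):
    """Hypothetical stand-in for the neutron port allocation failure."""


def run_instance(host):
    # Assumption for this sketch: every host fails for the same reason.
    raise PortAllocationError("port allocation failed on %s" % host)


def schedule_with_retries(hosts, attempts_log, max_attempts=3):
    """Try hosts in order; raise NoValidHost once the attempts run out."""
    for host in hosts[:max_attempts]:
        try:
            return run_instance(host)
        except PortAllocationError as exc:
            # Without a record like this, the per-host reason is lost.
            attempts_log.append(
                {"host": host, "result": "Failed", "reason": str(exc)})
    # The real cause is not part of the exception that surfaces.
    raise NoValidHost("No valid host was found")


attempts = []
try:
    schedule_with_retries(["host1", "host2", "host3"], attempts)
except NoValidHost as exc:
    surfaced = str(exc)
```

Here "surfaced" says nothing about port allocation, while "attempts" holds
the per-host reasons. That per-host list is the kind of information the
proposal below wants to keep.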

Design Proposal
--
1. Blueprint internal-scheduler[1] will try to address problem #1. After
the conductor retrieves the selected destination hosts from the scheduler,
it will create a "scheduler_records_allocations" item in the database for
each instance/host allocation.

Design choices:
a) Correlate scheduler_records_allocations with the 'create' instance
action, and generate a combined view with the instance-action events.
b) Add a separate new API to retrieve this information.

I prefer choice a), because instance action events fit this use case well,
and allocation records would supply the missing information when viewing
the 'create' action events of an instance.

Thoughts?

NOTE: The following chart is also available at link [3], in case of any
format/display issues.

                                  scheduler_records_allocations
                                  +-----------------------------+
                                  |allocation_id: 9001          |
                                  |instance_uuid: inst1_uuid    |
 scheduler_records                |scheduler_record_id: 1210    |
 +------------------------------+ |host: host1                  |
 |scheduler_record_id: 1210     | |weight: 197.0                | +---------------+
 |user_id: 'u_fakeid'           | |result: Failed               | |instance1      |
 |project_id: 'p_fakeid'        | |reason: 'No more IP addresses| +---------------+
 |request_id: 'req-xxx'         | +-----------------------------+
 |instance_uuids: [             | +-----------------------------+ +---------------+
 |    'inst1_uuid',             | |allocation_id: 9002          | |instance2      |
 |    'inst2_uuid']             | |instance_uuid: inst2_uuid    | +---------------+
 |request_spec: {...}           | |scheduler_record_id: 1210    |
 |filter_properties: {...}      | |host: host2                  |
 |scheduler_records_allocations:| |weight: 128.0                |
 |    [9001, 9002]              | |result: Success              |
 |start_time: ...               | |reason:                      |
 |finish_time: ...              | +-----------------------------+
 +------------------------------+ +-----------------------------+
                                  |allocation_id: 9003          |
                                  |instance_uuid: inst1_uuid    |
                                  |scheduler_record_id: 1210    |
                                  |host: host2                  |
                                  |weight: 64.0                 |
                                  |result: Failed               |
                                  |reason: 'No more IP addresses|
                                  +-----------------------------+
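
The chart above could be modelled roughly as follows. This is only a
sketch under assumptions: the class names Allocation and SchedulerRecord
and the helper failure_reasons are hypothetical, only a subset of the
fields is shown, and the 'reason' text is copied as far as it is visible
in the chart.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Allocation:
    """One row of scheduler_records_allocations (fields from the chart)."""
    allocation_id: int
    instance_uuid: str
    scheduler_record_id: int
    host: str
    weight: float
    result: str
    reason: str = ""


@dataclass
class SchedulerRecord:
    """One scheduler run; only a subset of the chart's fields."""
    scheduler_record_id: int
    request_id: str
    instance_uuids: List[str]
    allocations: List[Allocation] = field(default_factory=list)


def failure_reasons(record: SchedulerRecord,
                    instance_uuid: str) -> List[Tuple[str, str]]:
    """Return (host, reason) for every failed allocation of an instance."""
    return [(a.host, a.reason) for a in record.allocations
            if a.instance_uuid == instance_uuid and a.result == "Failed"]


# Sample data taken from the chart.
record = SchedulerRecord(
    scheduler_record_id=1210,
    request_id="req-xxx",
    instance_uuids=["inst1_uuid", "inst2_uuid"],
    allocations=[
        Allocation(9001, "inst1_uuid", 1210, "host1", 197.0, "Failed",
                   "No more IP addresses"),
        Allocation(9002, "inst2_uuid", 1210, "host2", 128.0, "Success"),
        Allocation(9003, "inst1_uuid", 1210, "host2", 64.0, "Failed",
                   "No more IP addresses"),
    ],
)
```

With this shape, failure_reasons(record, "inst1_uuid") lists both failed
attempts for instance1 together with the reason reported on each host,
which is the information NoValidHost currently hides.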

2. Blueprint record-scheduler-information[2] will try to solve problem #2
by generating structured information for each scheduler run.

Design choices:
a) Record 'scheduler_records' info in the database. This is easy to query,
but introduces a considerable burden in terms of performance, extra
database space usage, clean-up/archiving policy, security-related issues
[4], etc.
b) Record 'scheduler_records' in a separate log file in JSON format, one
line per scheduler run, and then add a new API extension to retrieve the
last n (a query parameter) scheduler records. This approach avoids the
database issues, plays well with external tooling, and provides a central
place to view the log. As a compromise, however, we won't be able to query
the log for a specific request_id.
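
A minimal sketch of choice b), assuming one JSON object per line. The
function names append_record and last_records and the file path are
hypothetical, and a real implementation would also need log rotation and
locking, which are left out here.

```python
import json
from collections import deque


def append_record(path, record):
    """Append one scheduler run to the log as a single JSON line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def last_records(path, n):
    """Return the last n records, oldest first.

    deque(maxlen=n) keeps only the final n lines while the file is
    streamed, so the whole log is never held in memory.
    """
    with open(path, encoding="utf-8") as fh:
        tail = deque(fh, maxlen=n)
    return [json.loads(line) for line in tail]
```

An API extension could call last_records(path, n) with n taken from the
query parameter. Looking up a specific request_id would still mean scanning
the whole file, which is the compromise mentioned above.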

So the question is: is a database storage solution still desirable? Or
should we implement backend drivers that the deployer can choose between?
In that case, however, the API would have to be the minimum set that both
backends can support.
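
If we went the driver route, the common interface might look like the
sketch below. All names here (SchedulerRecordDriver, save, get_last,
InMemoryDriver) are hypothetical, and the in-memory driver only stands in
for a real database or log-file backend.

```python
import abc
from collections import deque


class SchedulerRecordDriver(abc.ABC):
    """Minimum API that both a database and a log-file backend can offer."""

    @abc.abstractmethod
    def save(self, record):
        """Store one scheduler record (a dict)."""

    @abc.abstractmethod
    def get_last(self, n):
        """Return the last n records, oldest first."""


class InMemoryDriver(SchedulerRecordDriver):
    """Illustrative stand-in for a real backend."""

    def __init__(self, capacity=1000):
        # Oldest records are dropped once capacity is reached.
        self._records = deque(maxlen=capacity)

    def save(self, record):
        self._records.append(record)

    def get_last(self, n):
        if n <= 0:
            return []
        return list(self._records)[-n:]
```

Note that the interface has no lookup by request_id: the log-file backend
cannot support it cheaply, so it falls outside the minimum set.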

Any comments or thoughts are highly appreciated.

[1] https://blueprints.launchpad.net/nova/+spec/internal-scheduler
[2] https://blueprints.launchpad.net/nova/+spec/record-scheduler-information
[3] https://docs.google.com/document/d/1EsSNeq_tD-3NiX4IphCrQj4ii0_dO-8-Jn7NWHRJPNg/edit?usp=sharing
[4] https://bugs.launchpad.net/nova/+bug/1175193

Thanks,
--
Qiu Yu

[openstack-dev] [Nova] Design proposals for blueprints to record scheduler information