Open Stack

Tue Sep 6 09:04:17 UTC 2016

On Fri, 2 Sep 2016, Dan Smith wrote:

> We should try to get this figured out before newton ships if possible. I
> don't think I see it locally, but I have a large dev machine, so I'll
> have to try to poke it harder.

I'm not clear on where we left this issue with allocation violations
on Friday.

Other things that have happened since Friday morning:

* We realized that the logging wasn't logging all requests correctly
   and was sometimes suggesting a request was a 200 when it was not.
* We decided that it was important when writing inventory for it to
   be able to violate existing allocations, otherwise a reconfigured
   compute node would not be able to manage its inventory and would
   get in a stuck state and the placement service would not have
   correct information.
* We optimized writing allocations so that if an exact allocation is
   already stored, we don't write it, we just return okay. This is
   above and beyond the earlier decision to allow allocations to be
   updated.
* We only write inventories when we know they have changed since the
   last time we wrote them.
* We fixed the serialization of the response to a GET
   /resource_providers/{uuid}/inventories. Some refactoring caused
   that to violate the spec.
* That fix allowed the writing of inventories on the resource
   tracker side to be less complex.
* We made further adjustments to logging so it is a little clearer
   what's going on (further fixes are still required here).

All of the above is in the stack starting at

     https://review.openstack.org/#/c/365015/

That may seem like a lot but I think it reflects that we are
iterating well.
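
To make the two "skip the write" items in the list above a bit more
concrete, here is a rough sketch of the idea. It is illustrative only
and is not the code in the stack: FakePlacementStore,
InventoryReporter and their methods are made-up names, and the real
service talks HTTP to a database rather than using dicts.

```python
# Hypothetical sketch: none of these names come from nova.


class FakePlacementStore:
    """In-memory stand-in for the service side of allocation writes."""

    def __init__(self):
        self.allocations = {}  # consumer id -> {resource class: amount}
        self.write_count = 0

    def put_allocations(self, consumer_id, requested):
        """Store allocations unless an identical set is already stored.

        Returns True if a write happened, False if it was skipped.
        """
        if self.allocations.get(consumer_id) == requested:
            # Exact match already stored: report success without writing.
            return False
        self.allocations[consumer_id] = dict(requested)
        self.write_count += 1
        return True


class InventoryReporter:
    """Compute-side stand-in that remembers the last inventory it sent."""

    def __init__(self):
        self._last_sent = None
        self.sent = []

    def report(self, inventory):
        """Send inventory only if it differs from the last one sent."""
        if inventory == self._last_sent:
            return False
        self.sent.append(dict(inventory))
        self._last_sent = dict(inventory)
        return True
```

Calling put_allocations twice with the same data results in one
write, and report only sends when the inventory dict has changed.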

In some testing with devstack yesterday allocations were being
written and destroyed as expected during server create, delete, and
error. Testing Friday was showing that allocations associated with
resizes were working[1], but only because the updates were happening
on the periodic heal job, not in direct response to the resize, so
there was some latency.

Some of the changes above are pragmatic, made in response to the
testing, and may not be aligned with the vision™, so I suspect Dan
and Jay and I will need to get together to figure out what needs to
happen.
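
For reference, the inventory behaviour from the list above (a write
is accepted even when it leaves existing allocations over capacity)
looks roughly like the following. This is a simplified sketch under
assumptions, not the implementation: set_inventory and its arguments
are hypothetical, and it compares against a bare total, ignoring
anything else a real inventory record carries.

```python
import logging

LOG = logging.getLogger(__name__)


def set_inventory(inventories, allocations, resource_class, total):
    """Record a new total for a resource class, never rejecting it.

    inventories: {resource class: total}, updated in place.
    allocations: {consumer id: {resource class: amount}}.
    Returns True if existing allocations now exceed the new total.
    """
    used = sum(a.get(resource_class, 0) for a in allocations.values())
    # Always store what the compute node says it has, so that the
    # service reflects reality after a reconfiguration.
    inventories[resource_class] = total
    over = used > total
    if over:
        LOG.warning("%s over capacity: %d allocated, %d total",
                    resource_class, used, total)
    return over
```

The point of the design is that the write itself always succeeds;
the over-capacity condition is something to report, not a reason to
leave the compute node stuck.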

There have been several ancillary changes as well, based on bugs
found while doing the above work or validating it:

* Misleading comment in wsgi.py:
   https://review.openstack.org/#/c/365638/
* Remove the script that wsgi.py replaced:
   https://review.openstack.org/#/c/363705/
* Remove another misleading comment:
   https://review.openstack.org/#/c/363039/

And finally there is some pending work that it would be good to get
in now instead of later so that the service is as complete as
possible:

* Clean up json handling
   https://review.openstack.org/#/c/361422/
* Implementation of associating aggregates with resource providers
   https://review.openstack.org/#/c/362863/
* Optional placement database
   https://review.openstack.org/#/c/362766/

The last two were mooted as "punt to Ocata" but I think we should
consider them for now, if possible. The first one is a cleanup.

[1] I used this to do some of my testing:
https://github.com/cdent/placement-exercise

-- 
Chris Dent               ┬─┬ノ( º _ ºノ)        https://anticdent.org/
freenode: cdent                                         tw: @anticdent

[openstack-dev] [nova] Next steps for resource providers work
