[openstack-dev] [Glance] Concurrent update issue in Glance v2 API

Alexander Tivelkov ativelkov at mirantis.com
Thu Sep 25 16:21:44 UTC 2014


Hi folks!

There is a serious issue [0] in the v2 API of Glance which may lead to race
conditions during the concurrent updates of Images' metadata.
It can be fixed in a number of ways, but we need to have some solution
soon, as we are approaching rc1 release, and the race in image updates
looks like a serious problem which has to be fixed in J, imho.

A quick description of the problem:
When the image-update is called (PUT /v2/images/%image_id%/) we get the
image from the repository, which fetches a record from the DB and forms its
content into an Image Domain Object ([1]), which is then modified (has its
attributes updated) and passed through all the layers of our domain model.
This object is not managed by the SQLAlchemy's session, so the
modifications of its attributes are not tracked anywhere.
When all the processing is done and the updated object is passed back to
the DB repository, it serializes all the attributes of the image into a
dict ([2]) and then this dict is used to create an UPDATE query for the
database.
As this serialization includes all the attribute of the object (rather then
only the modified ones), the update query updates all the columns of the
appropriate database row, putting there the values which were originally
fetched when the processing began. This may obviously overwrite the values
which could be written there by some other concurent request.

There are two possible solutions to fix this problem.
First, known as the optimistic concurrency control, checks if the
appropriate database row was modified between the data fetching and data
updates. In case of such modification the update operation reports a
"conflict" and fails (and may be retried based on the updated data if
needed). Modification detection is usually based on the timstamps, i.e. the
query updates the row in database only if the timestamp there matches the
timestamp of initially fetched data.
I've introduced this approach in this patch [3], however it has a major
flaw: I used the 'updated_at' attribute as a timestamp, and this attribute
is mapped to a DateTime-typed column. In many RDBMS's (including
MySql<5.6.4) this column stores values with per-second precision and does
not store fractions of seconds. So, even if patch [3] is merged the race
conditions may still occur if there are many updates happening at the same
moment of time.
A better approach would be to add a new column with int (or longint) type
to store millisecond-based (or even microsecond-based) timestamps instead
of (or additionally to) date-time based updated_at. But data model
modification will require to add new migration etc, which is a major step
and I don't know if we want to make it so close to the release.

The second solution is to keep track of the changed attributes and
properties for the image and do not include the unchanged ones into the
UPDATE query, so nothing gets overwritten. This dramatically reduces the
threat of races, as the updates of different properties do not interfere
with each other. Also this is a usefull change regardless of the race
itself: being able to differentiate between changed and unchanged
attributes may have its own value for other purposes; the DB performance
will also be better when updating just the needed fields instead of all of
them.
I've submitted a patch with this approach as well [4], but it still breaks
some unittests and I am working to fix them right now.

So, we need to decide which of these approaches (or their combination) to
take: we may stick with optimistic locking on timestamp (and then decide if
we are ok with a per-second timestamps or we need to add a new column),
choose to track state of attributes or combine them together. So, could you
folks please review patches [3] and [4] and come up with some ideas on them?

Also, probably we should consider targeting [0] to juno-rc1 milestone to
make sure that this bug is fixed in J. Do you guys think it is possible at
this stage?

Thanks!


[0] https://bugs.launchpad.net/glance/+bug/1371728
[1]
https://github.com/openstack/glance/blob/master/glance/db/__init__.py#L74
[2]
https://github.com/openstack/glance/blob/master/glance/db/__init__.py#L169
[3] https://review.openstack.org/#/c/122814/
[4] https://review.openstack.org/#/c/123722/

--
Regards,
Alexander Tivelkov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20140925/e8fd7327/attachment-0001.html>


More information about the OpenStack-dev mailing list