Open Stack

Wed May 25 18:26:20 UTC 2011

Also, using marker/limit rather than offset/limit makes it much easier to shard for scale if desired. One of the reasons we chose that route with Swift. Regular databases with indexes work better with marker/limit than offset/limit too.

On May 25, 2011, at 12:38 PM, Justin Santa Barbara wrote:

> I definitely agree that the paging logic is questionable; we definitely should be specifying the sort order if we're relying on it, as I believe a database is otherwise free to return results however it sees fit.
> 
> I'm not convinced that the proposed logic around the queries and update timestamps is typo-free.  But even if it was correct, the problem with timestamps is that I don't think you can get a sensible set of results.  If you order by increasing timestamp, then I think you see items repeatedly if they are concurrently updated.  If you order by decreasing timestamp, I think you can miss an item altogether if it's updated in between page fetches.
> 
> I think the 'normal' way to make this work is to order by primary key, and to have the marker be the PK of the 'last seen row'.  Because the client can't know what the pk is, the marker is normally just an opaque value.  It's also (normally) much easier for the DB to sort by primary key than by any other value.
> 
> The behaviour of an iteration using the PK is fairly nice.  You never see a row twice, so you don't need to de-dup client side.  You are guaranteed to see all rows, unless they are concurrently inserted or deleted.  If a row is inserted you'll see it if you haven't yet fetched the page it would be on.  If a row is deleted you don't see it unless you've already fetched the page it was on.
> 
> Zones may complicate any scheme, though I think both proposed methods could be made to work.  I think in general the next N rows have to be fetched from each child zone and then combined (top N), throwing away the extra records.   One way around this would be to define an order on the child zones, and then show the records from zone 1, then those from zone 2 etc etc.  You just need to map from the marker to a child zone, and if you don't get the full complement of rows you need to merge in the first results from the next zone as well.
> 
> The SQL query using the PK would look like this (and there's no need for an OFFSET clause):
> 
> SELECT *
> FROM <table>
> ORDER BY <pk>
> WHERE <pk> > @marker
> AND (<additional filters>)
> LIMIT <pagesize>
> 
> Without additional filters, that's a _very_ efficient query for a sensible database.
> 
> Justin
> 
> 
> 
> On Wed, May 25, 2011 at 9:28 AM, Jay Pipes <jaypipes at gmail.com> wrote:
> When a user issues a GET request for a resource collection, it's clear
> that you don't want to return ALL results in the collection. Returning
> hundreds of thousands of records per request isn't particularly a good
> idea.
> 
> Therefore, we use the concept of "pagination" -- returning a subset of
> records matching the search criteria.
> 
> In Glance's API, we've added this ability (currently in code review),
> but our approach is a bit at odds with the way that the OpenStack
> Compute API is structured. I'd like to explain why I think our
> approach is better, both in terms of usability and in terms of
> efficiency.
> 
> First, here is how Nova does its pagination in the OpenStack API.
> 
> 1) Look for a "marker" query parameter in the request
> 2) Look for a "limit" query parameter in the request, otherwise use a
> default max limit.
> 3) If marker parameter is found, the value of the "marker" query
> parameter is assumed to be an *integer* and assumed to be the
> *identifier of the last object on the previous "page" of results*
> 4) Request *all* results
> 5) In the code, find the "page" of results by looking for the object
> in the set of results that matches the marker parameter
> 6) When found, grab the next object in the result set, plus X objects,
> where X is the limit parameter
> 
> The relevant code is in /nova/api/openstack/common.py, and looks like so:
> 
> def limited_by_marker(items, request, max_limit=FLAGS.osapi_max_limit):
>    """Return a slice of items according to the requested marker and limit."""
> 
> <snipped for brevity... >
> 
>    limit = min(max_limit, limit)
>    start_index = 0
>    if marker:
>        start_index = -1
>        for i, item in enumerate(items):
>            if item['id'] == marker:
>                start_index = i + 1
>                break
>        if start_index < 0:
>            raise webob.exc.HTTPBadRequest(_('marker [%s] not found' % marker))
>    range_end = start_index + limit
>    return items[start_index:range_end]
> 
> There are three MAJOR problems with the above method of pagination:
> 
> 1) Very inefficient
> 
> The Python code does the job of determining the page of results to
> find. This means that the database query returns EVERY row in the
> resultset, instead of only returning a small subset of the results
> using the LIMIT X OFFSET Y SQL construct. With hundreds of thousands
> of records in the various databases, this method is simply not
> scalable.
> 
> 2) There is no ordering of the results in the queries the Nova
> database API makes
> 
> The only queries in the Nova database API that order results is the
> instance_type_get_all() method.
> 
> Without an order to the search results, paging is almost meaningless.
> A page of results depends on the order of the list of results. The
> reason the existing Nova code works at all is because of the fact that
> all objects have an autoincrementing integer primary key, and by
> nature, this autoincrementing integer primary key increases over time.
> If the identifier was switched to, say, UUID, the results ordering
> would fall apart completely, and "pages" of results would get skewed
> and corrupted as records were inserted over time.
> 
> For instance, assume a character primary key (like UUID keys would be
> stored in hex string format). Assume the following simplified set of
> instances:
> 
> PK         Created_at
> ABC       2011-05-25 12:00:00
> DEF       2011-05-25 12:00:01
> GHJ       2011-05-25 12:00:02
> KLM       2011-05-25 12:00:03
> 
> Assume you want to get a list of instances with GET /servers?limit=2.
> With the existing code, you'd get this:
> 
> ABC       2011-05-25 12:00:00
> DEF       2011-05-25 12:00:01
> 
> With GET /servers?limit=2?marker=DEF, you'd get this:
> 
> GHJ       2011-05-25 12:00:00
> KLM       2011-05-25 12:00:03
> 
> However, let's say a new instance is added:
> 
> BCD       2011-05-25 12:00:04
> 
> Because there is no sort order specified in the database queries, the
> order of results is indeterminate. Page 1 may now contain the BCD
> record, or it may not. It totally depends on the implementation of the
> relational database, because the SQL standard does not mandate a
> default sort order when one is not specified in the ORDER BY clause.
> 
> 3) The marker parameter assumes an autoincrementing primary key
> 
> Because the marker parameter assumes an autoincrementing primary key,
> the paging of results currently DOES solve the problem of new records
> messing up paging.
> 
> As an example, let's assume I've fixed #2 above and put a default sort
> order on the primary key, descending. Also assume a simple resultset
> like so:
> 
> PK         Created_at
> 1      2011-05-25 12:00:00
> 2      2011-05-25 12:00:01
> 3      2011-05-25 12:00:02
> 4      2011-05-25 12:00:03
> 
> Assume I make a request for GET /servers?limit=2. I would get:
> 
> 4      2011-05-25 12:00:03
> 3      2011-05-25 12:00:02
> 
> The results returned would be the "last" 10 server instances created.
> If I did a request for GET /servers?limit=2?marker=3, I would get the
> results:
> 
> 2      2011-05-25 12:00:01
> 1      2011-05-25 12:00:00
> 
> If I add a new instance, say:
> 
> 5      2011-05-25 12:00:05
> 
> and did the request for GET /servers?limit=2?marker=3, I would still
> get the same results:
> 
> 2      2011-05-25 12:00:01
> 1      2011-05-25 12:00:00
> 
> However, the ONLY reason this works is because the primary key is
> auto-incrementing, and new records always come at the "end" of the
> primary key. If I change the primary key to a UUID, the call to GET
> /servers?limit=2?marker=3 would no longer be guaranteed to return the
> same results. The newly-added instance's UUID may come between the
> values of the primary keys in the original "page 2", and therefore
> messed up the consistency of the pagination of my original query.
> 
> The way to avoid this inconsistency is to ensure that you have a
> consistent view of the set of results of your original query. To do
> that, you must always use a WHERE condition on a temporal,
> always-incrementing (timestamp) column.
> 
> In SQL, this is what is necessary to have a consistent view of pages
> of a result set (assuming 10 elements to a page):
> 
> Page 1:
> 
> SELECT i.* FROM instances i
> ORDER BY i.id DESC;
> LIMIT 10;
> 
> SET @marker = <ID of *first* record>
> SET @updated_at_marker = SELECT updated_at FROM instances WHERE id = @marker;
> 
> Page 2:
> 
> SELECT i.* FROM instances i
> WHERE updated_at <= @updated_at_marker
> ORDER BY i.id DESC
> LIMIT 10 OFFSET 10;
> 
> 
> SUMMARY:
> 
> My proposal is to overhaul the way that Nova does pagination so that we:
> 
> 1) Pass the LIMIT and OFFSET parameters down into the database API queries
> 2) Add a default ORDER BY for all queries returning result sets
> 3) Add a "page" query parameter that would be used to calculate the
> OFFSET programmatically
> 4) Change the concept of the "marker" parameter to refer to *the ID of
> the FIRST record returned in the original query*
> 
> Thoughts?
> 
> -jay
> 
> _______________________________________________
> Mailing list: https://launchpad.net/~openstack
> Post to     : openstack at lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp
> 
> _______________________________________________
> Mailing list: https://launchpad.net/~openstack
> Post to     : openstack at lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack/attachments/20110525/d030a9f4/attachment.html>

Open Stack

[Openstack] Getting pagination right

OpenStack

Community

Documentation

Branding & Legal