[openstack-dev] [nova] Boston Forum session recap - searchlight integration

Belmiro Moreira moreira.belmiro.email.lists at gmail.com
Mon May 22 12:51:57 UTC 2017

Hi Matt,
if by "incomplete results" you mean retrieving the instance UUIDs (in the
cell_api) for the cells that failed to answer,
I would prefer incomplete results to a failed operation.
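
To make that preference concrete, here is a minimal sketch of the "degrade
instead of fail" behaviour. It is not nova code: list_instances, the
cell_queries callables, the instance_mappings dict and the "UNKNOWN" status
are all hypothetical names chosen for illustration. The list of failed cells
is returned next to the rows so the caller can decide how to report it.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

LOG = logging.getLogger(__name__)


def list_instances(cell_queries, instance_mappings, timeout=5.0):
    """Query every cell; degrade to UUID-only rows for cells that fail.

    cell_queries: {cell_name: zero-argument callable returning a list of
        instance dicts} (hypothetical stand-in for the per-cell DB query).
    instance_mappings: {cell_name: [uuid, ...]} (hypothetical view of what
        the API database knows about each cell).
    """
    rows, failed_cells = [], []
    pool = ThreadPoolExecutor(max_workers=max(1, len(cell_queries)))
    try:
        futures = {name: pool.submit(query)
                   for name, query in cell_queries.items()}
        for name, future in futures.items():
            try:
                rows.extend(future.result(timeout=timeout))
            except Exception:
                # Timeout, broken connection, etc.: log it and fall back.
                LOG.error("cell %s did not answer; returning UUIDs only", name)
                failed_cells.append(name)
                rows.extend({"uuid": uuid, "status": "UNKNOWN"}
                            for uuid in instance_mappings.get(name, []))
    finally:
        # Do not block the request on a cell that is still hanging.
        pool.shutdown(wait=False)
    return rows, failed_cells
```

Note that the timeout is applied per cell in turn, so a real implementation
would probably want a single overall deadline instead.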


On Mon, May 22, 2017 at 11:39 AM, Matthew Booth <mbooth at redhat.com> wrote:

> On 19 May 2017 at 20:07, Mike Bayer <mbayer at redhat.com> wrote:
>> On 05/19/2017 02:46 AM, joehuang wrote:
>>> Supporting sort and pagination together will be the biggest challenge: it
>>> depends on how many cells are involved in the query. 3 or 5 may be OK, as
>>> you can search each cell and cache the data. But how about 20, 50 or more,
>>> and how much data will be cached?
>> I talked to Matthew in Boston and I am also a little concerned about
>> this. The approach involves trying to fetch just the smallest number of
>> records possible from each backend, merging them as they come in, and then
>> discarding the rest (unfetched) once there's enough for a page. But there
>> is latency around invoking the query before any results are received, and
>> the database driver really wants to send out all the rows as well, not to
>> mention that the ORM (with configurability) wants to convert the whole set
>> of rows received to objects; all of this has overhead.
> There was always going to come a point where there are too many cells for
> this approach to be viable. After our chat, I now think that point is
> considerably lower than I thought before, as I didn't appreciate that the
> ORM is also doing its own batching.
>> To at least handle the problem of 50 connections that have all executed a
>> statement and are waiting on results, parallelizing that means there needs
>> to be a thread pool, greenlet pool, or explicit non-blocking approach put
>> in place. The "thread pool" would be the approach that's possible, which
>> with eventlet monkeypatching transparently becomes a greenlet pool. But
>> that's where this starts getting a little intense for something you want
>> to do in the context of "a web request". So I think the DB-based solution
>> here is feasible but I'm a little skeptical of it at higher scale. Usually,
>> the search engine would be something pluggable, like, "SQL" or
>> "searchlight".
> I'm not overly concerned about the threading aspect. I understood from our
> chat that the remote query overhead (being the only part we can actually
> parallelise anyway) is incurred entirely before returning the first row
> from SQLA. My plan is simply to fetch the first row of each query using
> concurrent.futures to allow all the remote queries to run in parallel, and
> all subsequent rows with blocking IO in the main thread. This will be
> relatively uncomplicated, and after the initial queries have run won't
> involve a whole lot of thread switching.
> There are also a couple of optimisations to make which I won't bother with
> up front. Dan suggested in his CellsV2 talk that we would only query cells
> where the user actually has instances. If we find users tend to clump in a
> small number of cells this would be a significant optimisation, although
> the overhead on the api node for a query returning no rows is probably very
> little. Also, I think you mentioned that there's an option to tell SQLA not
> to batch-process rows, but that it is less efficient for total throughput?
> I suspect there would be a point at which we'd want that. If there's a
> reasonable way to calculate a tipping point, that might give us some
> additional life.
> Bear in mind that the principal advantages to not using Searchlight are:
> * It is simpler to implement
> * It is simpler to manage
> * It will return accurate results
> Following the principle of 'as simple as possible, but no simpler', I
> think there's enormous benefit to this much simpler approach for anybody
> who doesn't need a more complex approach. However, while it reduces the
> urgency of something like the Searchlight solution, I expect there are
> going to be deployments which need that.
>>> Moreover, instance operations (create, delete) run in parallel with the
>>> pagination/sort query, and some cells may not respond in time, the
>>> network connection may be broken, etc.; many abnormal cases may happen.
>>> How to deal with abnormal query responses from some of the cells is also
>>> an important factor to consider.
> Aside: For a query operation, what's the better user experience when a
> single cell is failing:
> 1. The whole query fails.
> 2. The user gets incomplete results.
> Either of these is simple to implement. Incomplete results would
> additionally be logged as an ERROR, but I can't think of any way to also
> tell the user that there's a problem with the data we returned without
> throwing an error.
> Thoughts?
> Matt
>>> It's not a good idea to support pagination and sort at the same time (it
>>> may not provide exactly the result the end user wants) if searchlight is
>>> not integrated.
>>> In fact in Tricircle, when querying ports from a neutron where the
>>> tricircle central plugin is installed, the central plugin does a similar
>>> query across the local Neutrons' ports, and does not support
>>> pagination/sort together.
>>> Best Regards
>>> Chaoyi Huang (joehuang)
>>> ________________________________________
>>> From: Matt Riedemann [mriedemos at gmail.com]
>>> Sent: 19 May 2017 5:21
>>> To: openstack-dev at lists.openstack.org
>>> Subject: [openstack-dev] [nova] Boston Forum session recap - searchlight integration
>>> Hi everyone,
>>> After previous summits where we had vertical tracks for Nova sessions I
>>> would provide a recap for each session.
>>> The Forum in Boston was a bit different, so here I'm only attempting to
>>> recap the Forum sessions that I ran. Dan Smith led a session on Cells
>>> v2, John Garbutt led several sessions on the VM and Baremetal platform
>>> concept, and Sean Dague led sessions on hierarchical quotas and API
>>> microversions, and I'm going to leave recaps for those sessions to them.
>>> I'll do these one at a time in separate emails.
>>> Using Searchlight to list instances across cells in nova-api
>>> ------------------------------------------------------------
>>> The etherpad for this session is here [1]. The goal for this session was
>>> to explain the problem and proposed plan from the spec [2] to the
>>> operators in the room and get feedback.
>>> Polling the room we found that not many people are deploying Searchlight
>>> but most everyone was using ElasticSearch.
>>> An immediate concern that came up was the complexity involved with
>>> integrating Searchlight, especially around issues with latency for state
>>> changes and questioning how this does not redo the top-level cells v1
>>> sync issue. It admittedly does to an extent, but we don't have all of
>>> the weird side code paths with cells v1 and it should be self-healing.
>>> Kris Lindgren noted that the instance.usage.exists periodic notification
>>> from the computes hammers their notification bus; we suggested he report
>>> a bug so we can fix that.
>>> It was also noted that if data is corrupted in ElasticSearch or is out
>>> of sync, you could re-sync that from nova to searchlight, however,
>>> searchlight syncs up with nova via the compute REST API, which if the
>>> compute REST API is using searchlight in the backend, you end up getting
>>> into an infinite loop of broken. This could probably be fixed with
>>> bypass query options in the compute API, but it's not a fun problem.
>>> It was also suggested that we store a minimal set of data about
>>> instances in the top-level nova API database's instance_mappings table,
>>> where all we have today is the uuid. Anything that is set in the API
>>> would probably be OK for this, but operators in the room noted that they
>>> frequently need to filter instances by an IP, which is set in the
>>> compute. So this option turns into a slippery slope, and is potentially
>>> not inter-operable across clouds.
>>> Matt Booth is also skeptical of the claim that we can't have a multi-cell
>>> query perform well, and he's proposed a POC here [3]. If that works out,
>>> then it defeats the main purpose for using Searchlight for listing
>>> instances in the compute API.
>>> Since sorting instances across cells is the main issue, it was also
>>> suggested that we allow a config option to disable sorting in the API.
>>> It was stated this would be without a microversion, and filtering/paging
>>> would still be supported. I'm personally skeptical about how this could
>>> be considered inter-operable or discoverable for API users, and it would
>>> need more thought and input from users like Monty Taylor and Clark Boylan.
>>> Next steps are going to be fleshing out Matt Booth's POC for efficiently
>>> listing instances across cells. I think we can still continue working on
>>> the versioned notifications changes we're making for searchlight as
>>> those are useful on their own. And we should still work on enabling
>>> searchlight in the nova-next CI job so we can get an idea of how the
>>> versioned notifications work for a consumer. However, any major
>>> development for actually integrating searchlight into Nova is probably
>>> on hold at the moment until we know how Matt's POC works.
>>> [1] https://etherpad.openstack.org/p/BOS-forum-using-searchlight-to-list-instances
>>> [2] https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/list-instances-using-searchlight.html
>>> [3] https://review.openstack.org/#/c/463618/
>>> --
>>> Thanks,
>>> Matt
>>> __________________________________________________________________________
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> --
> Matthew Booth
> Red Hat Engineering, Virtualisation Team
> Phone: +442070094448 (UK)