<div dir="ltr">Hi Matt,<div>if by "incomplete results" you mean retrieving only the instance UUIDs (in the cell_api) for the cells that failed to answer,</div><div>I would prefer incomplete results to a failed operation.</div><div><br></div><div>Belmiro</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, May 22, 2017 at 11:39 AM, Matthew Booth <span dir="ltr">&lt;<a href="mailto:mbooth@redhat.com" target="_blank">mbooth@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span class="">On 19 May 2017 at 20:07, Mike Bayer <span dir="ltr">&lt;<a href="mailto:mbayer@redhat.com" target="_blank">mbayer@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

<br>

On 05/19/2017 02:46 AM, joehuang wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Supporting sort and pagination together will be the biggest challenge. It depends on how many cells are involved in the query: with 3 or 5 it may be OK, since you can query each cell and cache the data. But what about 20, 50 or more, and how much data will be cached?<br>

</blockquote>

<br>

<br>

I've talked to Matthew in Boston and I am also a little concerned about this. The approach involves trying to fetch just the smallest number of records possible from each backend, merging them as they come in, and then discarding the rest (unfetched) once there's enough for a page. But there is latency around invoking the query before any results are received, and the database driver really wants to send out all the rows as well, not to mention that the ORM (with configurability) wants to convert the whole set of rows received to objects, all of which has overhead.<br></blockquote><div><br></div></span><div>There was always going to come a point where there are too many cells for this approach to be viable. After our chat, I now think that point is considerably lower than I thought before, as I didn't appreciate that the ORM is also doing its own batching.</div><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

To at least handle the problem of 50 connections that have all executed a statement and are waiting on results, parallelizing that means there needs to be a thread pool, greenlet pool, or explicit non-blocking approach put in place. The "thread pool" would be the approach that's possible, which with eventlet monkeypatching transparently becomes a greenlet pool. But that's where this starts getting a little intense for something you want to do in the context of "a web request". So I think the DB-based solution here is feasible but I'm a little skeptical of it at higher scale. Usually, the search engine would be something pluggable, like "SQL" or "searchlight".<br></blockquote><div><br></div></span><div>I'm not overly concerned about the threading aspect. I understood from our chat that the remote query overhead (being the only part we can actually parallelise anyway) is incurred entirely before returning the first row from SQLA. My plan is simply to fetch the first row of each query using concurrent.futures to allow all the remote queries to run in parallel, and all subsequent rows with blocking IO in the main thread. This will be relatively uncomplicated, and after the initial queries have run won't involve a whole lot of thread switching.</div><div><br></div><div>There are also a couple of optimisations to make which I won't bother with up front. Dan suggested in his CellsV2 talk that we would only query cells where the user actually has instances. If we find users tend to clump in a small number of cells this would be a significant optimisation, although the overhead on the api node for a query returning no rows is probably very little. Also, I think you mentioned that there's an option to tell SQLA not to batch-process rows, but that it is less efficient for total throughput? I suspect there would be a point at which we'd want that. 
If there's a reasonable way to calculate a tipping point, that might give us some additional life.</div><div><br></div><div>Bear in mind that the principal advantages to not using Searchlight are:</div><div><br></div><div>* It is simpler to implement</div><div>* It is simpler to manage</div><div>* It will return accurate results</div><div><br></div><div>Following the principle of 'as simple as possible, but no simpler', I think there's enormous benefit to this much simpler approach for anybody who doesn't need a more complex approach. However, while it reduces the urgency of something like the Searchlight solution, I expect there are going to be deployments which need that.</div><span class=""><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Moreover, instance operations (create, delete) run in parallel with the pagination/sort query, and some cells may not respond in time, the network connection may break, and so on; many abnormal cases can happen. How to deal with abnormal query responses from some cells is also an important factor to consider.<br></blockquote></blockquote><div><br></div></span><div>Aside: For a query operation, what's the better user experience when a single cell is failing:</div><div><br></div><div>1. The whole query fails.</div><div>2. The user gets incomplete results.</div><div><br></div><div>Either of these is simple to implement. Incomplete results would additionally be logged as an ERROR, but I can't think of any way to also tell the user that there's a problem with the data we returned without throwing an error.</div><div><br></div><div>Thoughts?</div><div><br></div><div>Matt</div><div><div class="h5"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

It's not a good idea to support pagination and sort at the same time (it may not return exactly the results the end user wants) if Searchlight is not integrated.<br>

<br>

In fact, in Tricircle, when ports are queried from a Neutron where the Tricircle central plugin is installed, the plugin performs a similar port query across the local Neutron servers, and it does not support pagination and sort together.<br>

<br>

Best Regards<br>

Chaoyi Huang (joehuang)<br>

<br>

______________________________<wbr>__________<br>

From: Matt Riedemann [<a href="mailto:mriedemos@gmail.com" target="_blank">mriedemos@gmail.com</a>]<br>

Sent: 19 May 2017 5:21<br>

To: <a href="mailto:openstack-dev@lists.openstack.org" target="_blank">openstack-dev@lists.openstack.<wbr>org</a><br>

Subject: [openstack-dev] [nova] Boston Forum session recap - searchlight integration<br>

<br>

Hi everyone,<br>

<br>

After previous summits where we had vertical tracks for Nova sessions I<br>

would provide a recap for each session.<br>

<br>

The Forum in Boston was a bit different, so here I'm only attempting to<br>

recap the Forum sessions that I ran. Dan Smith led a session on Cells<br>

v2, John Garbutt led several sessions on the VM and Baremetal platform<br>

concept, and Sean Dague led sessions on hierarchical quotas and API<br>

microversions, and I'm going to leave recaps for those sessions to them.<br>

<br>

I'll do these one at a time in separate emails.<br>

<br>

<br>

Using Searchlight to list instances across cells in nova-api<br>

------------------------------<wbr>------------------------------<br>

<br>

The etherpad for this session is here [1]. The goal for this session was<br>

to explain the problem and proposed plan from the spec [2] to the<br>

operators in the room and get feedback.<br>

<br>

Polling the room we found that not many people are deploying Searchlight<br>

but most everyone was using ElasticSearch.<br>

<br>

An immediate concern that came up was the complexity involved with<br>

integrating Searchlight, especially around issues with latency for state<br>

changes and questioning how this does not redo the top-level cells v1<br>

sync issue. It admittedly does to an extent, but we don't have all of<br>

the weird side code paths with cells v1 and it should be self-healing.<br>

Kris Lindgren noted that the instance.usage.exists periodic notification<br>

from the computes hammers their notification bus; we suggested he report<br>

a bug so we can fix that.<br>

<br>

It was also noted that if data is corrupted in ElasticSearch or is out<br>

of sync, you could re-sync that from nova to searchlight, however,<br>

searchlight syncs up with nova via the compute REST API, which if the<br>

compute REST API is using searchlight in the backend, you end up getting<br>

into an infinite loop of broken. This could probably be fixed with<br>

bypass query options in the compute API, but it's not a fun problem.<br>

<br>

It was also suggested that we store a minimal set of data about<br>

instances in the top-level nova API database's instance_mappings table,<br>

where all we have today is the uuid. Anything that is set in the API<br>

would probably be OK for this, but operators in the room noted that they<br>

frequently need to filter instances by an IP, which is set in the<br>

compute. So this option turns into a slippery slope, and is potentially<br>

not inter-operable across clouds.<br>

<br>

Matt Booth is also skeptical that we can't have a multi-cell query<br>

perform well, and he's proposed a POC here [3]. If that works out, then<br>

it defeats the main purpose for using Searchlight for listing instances<br>

in the compute API.<br>

<br>

Since sorting instances across cells is the main issue, it was also<br>

suggested that we allow a config option to disable sorting in the API.<br>

It was stated this would be without a microversion, and filtering/paging<br>

would still be supported. I'm personally skeptical about how this could<br>

be considered inter-operable or discoverable for API users, and would need<br>

more thought and input from users like Monty Taylor and Clark Boylan.<br>

<br>

Next steps are going to be fleshing out Matt Booth's POC for efficiently<br>

listing instances across cells. I think we can still continue working on<br>

the versioned notifications changes we're making for searchlight as<br>

those are useful on their own. And we should still work on enabling<br>

searchlight in the nova-next CI job so we can get an idea for how the<br>

versioned notifications work for a consumer. However, any major<br>

development for actually integrating searchlight into Nova is probably<br>

on hold at the moment until we know how Matt's POC works.<br>

<br>

[1]<br>

<a href="https://etherpad.openstack.org/p/BOS-forum-using-searchlight-to-list-instances" rel="noreferrer" target="_blank">https://etherpad.openstack.org<wbr>/p/BOS-forum-using-searchlight<wbr>-to-list-instances</a><br>

[2]<br>

<a href="https://specs.openstack.org/openstack/nova-specs/specs/pike/approved/list-instances-using-searchlight.html" rel="noreferrer" target="_blank">https://specs.openstack.org/op<wbr>enstack/nova-specs/specs/pike/<wbr>approved/list-instances-using-<wbr>searchlight.html</a><br>

[3] <a href="https://review.openstack.org/#/c/463618/" rel="noreferrer" target="_blank">https://review.openstack.org/#<wbr>/c/463618/</a><br>

<br>

--<br>

<br>

Thanks,<br>

<br>

Matt<br>

<br>

______________________________<wbr>______________________________<wbr>______________<br>

OpenStack Development Mailing List (not for usage questions)<br>

Unsubscribe: <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" rel="noreferrer" target="_blank">OpenStack-dev-request@lists.op<wbr>enstack.org?subject:unsubscrib<wbr>e</a><br>

<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi<wbr>-bin/mailman/listinfo/openstac<wbr>k-dev</a><br>

<br>


</blockquote>

<br>


</blockquote></div></div></div><span class="HOEnZb"><font color="#888888"><br><br clear="all"><div><br></div>-- <br><div class="m_-4146757993357102951gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><span style="font-size:12.8px">Matthew Booth</span><br></div><div>Red Hat Engineering, Virtualisation Team</div><div><br></div><div>Phone: <a href="tel:+44%2020%207009%204448" value="+442070094448" target="_blank">+442070094448</a> (UK)</div><div><br></div></div></div></div></div>

</font></span></div></div>


<br></blockquote></div><br></div>