[OpenStack-Infra] On the subject of HTTP interfaces and Zuul

Robyn Bergeron rbergero at redhat.com
Sun Jun 11 17:47:37 UTC 2017


On Jun 9, 2017 1:14 PM, "Monty Taylor" <mordred at inaugust.com> wrote:

On 06/09/2017 02:35 PM, Clark Boylan wrote:

> On Fri, Jun 9, 2017, at 09:22 AM, Monty Taylor wrote:
>
>> Hey all!
>>
>> Tristan has recently pushed up some patches related to providing a Web
>> Dashboard for Zuul. We have a web app for nodepool. We already have the
>> Github webhook receiver which is inbound http. There have been folks who
>> have expressed interest in adding active-REST abilities for performing
>> actions. AND we have the new websocket-based log streaming.
>>
>> We're currently using Paste for HTTP serving (which is effectively
>> dead), autobahn for websockets and WebOB for request/response processing.
>>
>> This means that before we get too far down the road, it's probably time
>> to pick how we're going to do those things in general. There are 2
>> questions on the table:
>>
>> * HTTP serving
>> * REST framework
>>
>> They may or may not be related, and one of the options on the table
>> implies an answer for both. I'm going to start with the answer I think
>> we should pick:
>>
>> *** tl;dr ***
>>
>> We should use aiohttp with no extra REST framework.
>>
>> Meaning:
>>
>> - aiohttp serving REST and websocket streaming in a scale-out tier
>> - talking RPC to the scheduler over gear or zk
>> - possible in-process aiohttp endpoints for k8s style health endpoints
>>
>
Hiya: Two things, which may or may not be useful -- some of this is over my
head, but I like to at least make sure information that *might* be useful
is seen:

1: just read this blog from the nice humans at datadog this morning, which
mentions a bunch of things relating to http, logs, events, streaming,
protobuf, k8s, etc. -- oh, and the word python, which is nice. :) Obviously
the datadog folks have different end goals than we do, but it seems like
there might be useful thinking fodder or overlap to consider peeking at.
Plus I always like to remember that other tools may want to ingest data in
different ways for other purposes, and thinking about that from other
points of view can be nice (although not something we necessarily want to
block on to get a thing out the door.... :D)

https://engineering.datadoghq.com/protobuf-parsing-in-python/

2: The Github API is making some changes in moving from V3 to V4. Not clear
to me if (a) they will live in coexistence or not, either forever-ish or
otherwise, (b) if otherwise, when v3 will be EOL'd -- but it might be worth
considering whether v4 helps the particular cases discussed in this thread in
any way (I don't count "annoying jlk endlessly since he just finished all
that github work," *hugs* to jlk tho.)

If none of that is useful... sorry for the noise :)

-r



>> Since we're talking about a web scale-out tier, I think we should just
>> have a single web tier for zuul and nodepool. This continues the thinking
>> that nodepool is a component of Zuul.
>>
>
> I'm not sure that this is a great idea. We've already seen that people
> have wanted to use nodepool without a Zuul and even without performing
> CI. IIRC paul wanted to use it to keep a set of asterisks floating
> around for example. We've also seen that people want to use
> subcomponents of nodepool to build and manage a set of images for clouds
> without making instances.
>

Excellent point.


In the past we have been careful to keep logical tools separate which
> has made it easy for us to add new tools and remove old ones.
> Operationally this may be perceived as making things more difficult for a
> newcomer, but it makes life much much better 3-6 months down the road.
>
>
>> In order to write zuul jobs, end-users must know what node labels are
>> available. A zuul client that says "please get me a list of available
>> node labels" could make sense to a user. As we get more non-OpenStack
>> users, those people may not have any concept that there is a separate
>> thing called "nodepool".
>>
>> *** The MUCH more verbose version ***
>>
>> I'm now going to outline all of the thoughts and options I've had or
>> have heard other people say. It's an extra-complete list - there are
>> ideas in here you might find silly/bad. But since we're picking a
>> direction, I think it's important we consider the options in front of us.
>>
>> This will cover 3 http serving options:
>>
>> - WSGI
>> - aiohttp
>> - gRPC
>>
>> and 3 REST framework options:
>>
>> - pecan
>> - flask-restplus
>> - apistar
>>
>> ** HTTP Serving **
>>
>> WSGI
>>
>> The WSGI approach is one we're all familiar with and it works with
>> pretty much every existing Python REST framework. For us I believe if we
>> go this route we'd want to serve it with something like uwsgi and
>> Apache. That adds the need for an Apache layer and/or a management uwsgi
>> process. However, it means we can make use of normal tools we all likely
>> know at least to some degree.
>>
>
> FWIW I don't think Apache would be required. uWSGI is a fairly capable
> http server aiui. You can also pip install uwsgi so the simple case
> remains fairly simple I think.
>

Also good point.



>> A downside is that we'll need to continue to handle our Websockets work
>> independently (which is what we're doing now).
>>
>> Because it's in a separate process, the API tier will need to make
>> requests of the scheduler over a bus, which could be either gearman or
>> zk.
>>
>>
> Note that OpenStack has decided that this is a better solution than
> using web servers in the python process. That doesn't necessarily mean
> it is the best choice for Zuul, but it seems like there is a lot we can
> learn from the choice to switch to WSGI in OpenStack.
>

Yah. I definitely more strongly lean towards external.


aiohttp
>>
>> Zuul v3 is Python3, which means we can use aiohttp. aiohttp isn't
>> particularly compatible with the REST frameworks, but it has built-in
>> route support and helpers for receiving and returning JSON. We don't
>> need ORM mapping support, so the only thing we'd really be MISSING from
>> REST frameworks is auto-generated documentation.
>>
>> aiohttp also supports websockets directly, so we could port the autobahn
>> work to use aiohttp.
>>
>> aiohttp can be run in-process in a thread. However, websocket
>> log-streaming is already a separate process for scaling purposes, so if
>> we decide that one impl backend is of value, it probably makes sense to
>> just stick the web tier in the websocket scaleout process anyway.
>>
>> However, we could probably write a facade layer with a gear backend and
>> an in-memory backend so that simple users could just run the in-process
>> version but scale-out was possible for larger installs (like us)
>>
>> Since aiohttp can be in-process, it also allows us to easily add some
>> '/health' endpoints to all of our services directly, even if they aren't
>> intended to be publicly consumable. That's important for running richly
>> inside of things like kubernetes that like to check in on health status
>> of services to know about rescheduling them. This way we could add a
>> simple thread to the scheduler and the executors and the mergers and the
>> nodepool launchers and builders that adds a '/health' endpoint.
>>
>>
> See above. OpenStack has decided this is the wrong route to take
> (granted with eventlet and python2.7 not asyncio and python3.5). There
> are scaling and debugging challenges faced when you try to run an in
> process web server.
>

Well - two different things here.

Actual endpoint = external.

But for the k8s/prometheus request for /health endpoints, the endpoint
needs to actually be per-service for each service running in a k8s
container, because that's how k8s knows what to do. So for OpenStack, we'll
be starting with a /health endpoint provided by the top-level API service,
which will work the way things work today. But then eventually each of the
services - nova-compute, nova-scheduler, nova-conductor, etc. - would get
its own simple /health endpoint that the controlling service manager can
hit for status of the process in question.

That's what I'm mostly advocating we support with the ability to run "in
process". It's an "in process as well" that's important.
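To make the per-service idea concrete, here's a minimal sketch of the
pattern using only the stdlib (the real thing would presumably be an
aiohttp handler; the handler and function names here are made up for
illustration):

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Tiny /health endpoint a service manager (k8s, etc.) can poll."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        # Keep the demo quiet; a real service would log properly.
        pass


def start_health_thread(port=0):
    """Run the health endpoint in a daemon thread alongside the
    service's main work; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), HealthHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


server = start_health_thread()
```

The point being: each service (scheduler, executor, merger, launcher,
builder) runs one of these "as well", not instead of the external API tier.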


gRPC / gRPC-REST gateway
>>
>> This is a curve-ball. We could define our APIs using gRPC. That gets us
>> a story for an API that is easily consumable by all sorts of clients,
>> and that supports exciting things like bi-directional streaming
>> channels. gRPC isn't (yet) consumable directly in browsers, nor does
>> Github send gRPC webhooks. BUT - there is a REST Gateway for gRPC:
>>
>> https://github.com/grpc-ecosystem/grpc-gateway
>>
>> that generates HTTP/1.1+JSON interfaces from the gRPC descriptions and
>> translates between protobuf and json automatically. The "REST" interface
>> it produces does not support url-based parameters, so everything is done
>> in payload bodies, so it's:
>>
>>     GET /nodes
>>     {
>>       'id': 1234
>>     }
>>
>> rather than
>>
>>     GET /nodes/1234
>>
>> but that's still totally fine - and totally works for both status.json
>> and GH webhooks.
>>
>> The catch is - grpc-gateway is a grpc compiler plugin that generates
>> golang code. So we'd either have to write our own plugin that does the
>> same thing but for generating python code, or we'd have to write our
>> gRPC/REST layer in go. I betcha folks would appreciate if we implemented
>> the plugin for python, but that's a long tent-pole for this purpose so I
>> don't honestly think we should consider it. Therefore, we should
>> consider that using gRPC + gRPC-REST implies writing the web-tier in go.
>> That obviously implies an additional process that needs to talk over an
>> RPC bus.
>>
>> There are clear complexity costs involved with adding a second language
>> component, especially WRT deployment. (pip install zuul would not be
>> sufficient.) OTOH - it would open the door to using protobuf-based
>> objects for internal communication, and would open the door for rich
>> client apps without REST polling and also potentially nice Android apps
>> (gRPC works great for mobile apps). I think that makes it a hard sell.
>>
>> THAT SAID - there are only 2 things that explicitly need REST over HTTP
>> 1.1 - that's the github webhooks and status.json. We could write
>> everything in gRPC except those two. Browser support for gRPC is coming
>> soon (they've moved from "someone is working on it" to "contact us about
>> early access") so status.json could move to being pure gRPC as well ...
>> and the webhook endpoint is pretty simple, so just having it be an
>> in-process aiohttp handler isn't a terrible cost. So if we thought
>> "screw it, let's just gRPC and not have an HTTP/1.1 REST interface at
>> all" - we can stay all in python and gRPC isn't a huge cost at that
>> point.
>>
>> gRPC doesn't handle websockets - but we could still run the gRPC serving
>> and the websocket serving out of the same scale-out web tier.
>>
>>
> Another data point for gRPC is that the etcd3 work in OpenStack found
> that the existing python lib(s) for grpc don't play nice with eventlet
> or asyncio or anything that isn't Thread()
> (https://github.com/grpc/grpc/issues/6046 is the bug tracking that I
> think). This would potentially make the use of asyncio elsewhere
> (websockets) more complicated.
>

Ah - thanks! I didn't realize it touched asyncio too - I thought it was
just eventlet related.

Nevermind then. :)


** Summary
>>
>> Based on the three above, it seems like we need to think about separate
>> a separate web-tier regardless of choice. The one option that doesn't strictly
>> require a separate tier is the one that lets us align on websockets, so
>> it seems that co-location there would be simple.
>>
>> aiohttp seems like the cleanest forward path. It'll require reworking
>> the autobahn code (sorry Shrews) - but is nicely aligned with our
>> Python3 state. It's new - but it's not as totally new as gRPC is. And
>> since we'll already have some websockets stuff, we could also write
>> streaming websockets APIs for the things where we'd want that from gRPC.
>>
>> * REST Framework
>>
>> If we decide to go the WSGI route, then we need to talk REST frameworks
>> (and it's possible we decide to go WSGI because we want to use a REST
>> framework)
>>
>>
> I'm not sure I understand why the WSGI and REST frameworks are being
> conflated. You can do one or the other or both and whichever you choose
> shouldn't affect the other too much aiui. There is even a flask-aiohttp
> lib.
>

Well, because WSGI + gRPC is obviously out. :)

But from flask-aiohttp's readme:

"""
I made this project for testing compatability between WSGI & Async IO.

Since WSGI has no consideration of Async IO, Flask-aiohttp cannot be
perfect.

So, I don't recommend you to use this library for production. Libraries
that was made for Async IO would be better choice (Like gevent, Tornado or
AioHTTP).
"""

I mostly am considering that if we pick aiohttp, since it has routing and
request marshaling already baked in, there is no real value in adding an
additional lib on top of it, at least not at this point. Maybe in the
future someone will write a $something that layers on top of aiohttp and
makes it better - but aiohttp itself seems pretty complete.

So you're right - it's not strictly necessary to conflate, but the only
http-layer scenario where we'd be considering these seriously is the WSGI
choice.
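As a rough illustration of how little is needed on top of aiohttp (a sketch
only - the route paths and handlers here are made up, not actual Zuul
endpoints):

```python
from aiohttp import web


# aiohttp has JSON helpers and routing baked in - no extra framework.
async def status(request):
    return web.json_response({"pipelines": [], "zuul_version": "3.0"})


# ...and websockets come out of the same library.
async def console_stream(request):
    ws = web.WebSocketResponse()
    await ws.prepare(request)
    async for msg in ws:
        # Echo for the sketch; real code would stream job logs.
        await ws.send_str(msg.data)
    return ws


app = web.Application()
app.add_routes([
    web.get("/status.json", status),
    web.get("/console-stream", console_stream),
])
# web.run_app(app, port=9000)
```

So one library covers both the REST-ish endpoints and the websocket
streaming.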


The assumption in this case is that the websocket layer is a separate
>> entity.
>>
>> There are three 'reasonable' options available:
>>
>> - pecan
>> - flask-restplus
>> - apistar
>>
>> pecan
>>
>> pecan is used in a lot of OpenStack services and is also used by
>> Storyboard, so it's well known. Tristan's patches so far use Pecan, so
>> we've got example code.
>>
>> On the other hand, Pecan seems to be mostly only used in OpenStack land
>> and hasn't gotten much adoption elsewhere.
>>
>> flask-restplus
>>
>> https://flask-restplus.readthedocs.io/en/stable/
>>
>> flask is extremely popular for folks doing REST in Python.
>> flask-restplus is a flask extension that also produces Swagger Docs for
>> the REST api, and provides for serving an interactive swagger-ui based
>> browseable interface to the API. You can also define models using
>> JSONSchema. Those are not needed for simple cases like status.json, but
>> for a fuller REST API they might be nice.
>>
>> Of course, in all cases we could simply document our API using swagger
>> and get the same thing - but that does involve maintaining model/api
>> descriptions and documentation separately.
>>
>> apistar
>>
>> https://github.com/tomchristie/apistar
>>
>> apistar is BRAND NEW and was announced at this year's PyCon. It's from
>> the Django folks and is aimed at writing REST separate from Django.
>>
>> It's python3 from scratch - although it's SO python3 focused that it
>> requires python 3.6. This is because it makes use of type annotations:
>>
>
> Type hinting is in python 3.5 and apistar's trove identifier things
> mention 3.5 support (not sure if actually the case though). But if so,
> 3.5 is far easier to use since it is in more distros than just Arch and
> Tumbleweed (which is the situation with 3.6).
>

Ah - neat. Their code examples all show f'' strings, which are 3.6
only, so I was assuming 3.6 was required.



>>     def show_request(request: http.Request):
>>         return {
>>             'method': request.method,
>>             'url': request.url,
>>             'headers': dict(request.headers)
>>         }
>>
>>     def create_project() -> Response:
>>         data = {'name': 'new project', 'id': 123}
>>         headers = {'Location': 'http://example.com/project/123/'}
>>         return Response(data, status=201, headers=headers)
>>
>> and f'' strings:
>>
>>     def echo_username(username):
>>       return {'message': f'Welcome, {username}!'}
>>
>> Python folks seem to be excited about apistar so far - but I think
>> python 3.6 is a bridge too far - it honestly introduces more deployment
>> issues than doing a golang-gRPC layer.
>>
>> ** Summary
>>
>> I don't think the REST frameworks offer enough benefit to justify their
>> use, or adopting WSGI as our path forward.
>>
>
> Yesterday SpamapS mentioned wanting to be able to grow the Zuul
> community. Just based on looking at the choices OpenStack is making
> (moving TO wsgi) and the general popularity of Flask in the python
> community I think that you may want to consider both wsgi and flask
> simply because they are tools that are known to scale reasonably well
> and many people are familiar with them.
>

If we decided to do WSGI I would totally strongly advocate we use Flask.
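For the record, the minimal Flask version of a status endpoint would look
something like this (a sketch only; the path and payload are illustrative,
not Zuul's actual API):

```python
from flask import Flask, jsonify

app = Flask(__name__)


# A plain WSGI app like this can be served by uWSGI, mod_wsgi, gunicorn...
@app.route("/status.json")
def status():
    return jsonify({"pipelines": [], "zuul_version": "3.0"})

# app.run() for development; a real WSGI server for production.
```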



>> ** Thoughts on RPC Bus **
>>
>> gearman is a simple way to add RPC calls between an API tier and the
>> scheduler. However, we got rid of gear from nodepool already, and we
>> intend to get rid of gearman in v4 anyway.
>>
>> If we use zk, we'll have to do a little bit more thinking about how to
>> do the RPC calls which will make this take more work. BUT - it means we
>> can define one API that covers both Zuul and Nodepool and will be
>> forward compatible with a v4 no-gearman world.
>>
>> We *could* use gearman in zuul and run an API in-process in nodepool.
>> Then we could take a page out of early Nova and do a proxy-layer in zuul
>> that makes requests of nodepool's API.
>>
>> We could just assume that there's gonna be an Apache fronting this stuff
>> and suggest deployment with routing to zuul and nodepool apis with
>> mod_proxy rules.
>>
>> Finally, as clarkb pointed out in response to the ingestors spec, we
>> could introduce MQTT and use it. I'm wary of doing that for this because
>> it introduces a totally new required tech stack at a late stage.
>>
>
> Mostly I was just pointing out that I think the vast majority of the
> infrastructure work to have something like a zuul ingestor is done. You
> just have to read from an mqtt connection instead of a gerrit ssh
> connection. Granted this does require running more services (mqtt server
> and the event stream handler) and doesn't handle entities like Github.
>
> That said MQTT unlike Gearman seems to be seeing quite a bit of
> development activity due to the popularity of IoT. Gearman has worked
> reasonably well for us though so I don't think we need to just replace
> it to get in on the IoT bandwagon.
>

The concerns I have wrt ingestors aren't so much protocol-based as they
are about missing events.

It's a pretty poor user experience when Zuul misses an event. Because the
events from gerrit come from the ssh event stream, it means ultimately that
you need to have more than one listener attached to the stream and then for
the events to go through a de-dup phase so that jobs don't get triggered
more than once. This is mostly thinking about updating zuul - which is a
scenario in which we miss events while we're down.

If we use MQTT, I believe it gives us the ability to pick up where we left
off when we reconnect - but I'm not certain if ingesting the events into
the MQTT service itself is HA enough that we can restart the MQTT services.

But the same principles could be applied - more than one event stream
handler, potentially just using zk in a leader-election kind of manner if
needed, writing the events themselves to MQTT server. I'll definitely be
excited to learn more - but at some point in the future I'd like to be able
to frequently roll updates without missing events.
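The de-dup phase described above can be as simple as keying each event and
dropping repeats - a toy sketch (the event structure and key choice here
are made up for illustration):

```python
class EventDeduplicator:
    """Drop events already seen, so multiple redundant stream
    listeners don't trigger the same jobs twice."""

    def __init__(self):
        self._seen = set()

    def accept(self, event):
        # Key on whatever uniquely identifies the event; for gerrit
        # this might be (change, patchset, event type).
        key = (event["change"], event["patchset"], event["type"])
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


dedup = EventDeduplicator()
event = {"change": 1234, "patchset": 2, "type": "patchset-created"}
first = dedup.accept(event)   # first listener delivers it
second = dedup.accept(event)  # second listener's copy is dropped
```

A real implementation would need to bound or expire the seen-set, and
coordinate it across handlers (e.g. via zk), but the shape is the same.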



>> Since we're starting fresh, I like the idea of a single API service that
>> RPCs to zuul and nodepool, so I like the idea of using ZK for the RPC
>> layer. BUT - using gear and adding just gear worker threads back to
>> nodepool wouldn't be super-terrible maybe.
>>
>
> Nodepool hasn't had a Gearman-less release yet so you don't have to
> worry about backward compat at least.
>

:)



>> ** Final Summary **
>>
>> As I tl;dr'd earlier, I think aiohttp co-located with the scale-out
>> websocket tier talking to the scheduler over zk is the best bet for us.
>> I think it's both simple enough to adopt and gets us a rich set of
>> features. It also lets us implement in-process simple health endpoints
>> on each service with the same tech stack.
>>
>
> I'm wary of this simply because it looks a lot like repeating
> OpenStack's (now failed) decision to stick web servers in a bunch of
> python processes then do cooperative multithreading with them along with
> all your application logic. It just gets complicated. I also think this
> underestimates the value of using tools people are familiar with (wsgi
> and flask) particularly if making it easy to jump in and building
> community is a goal.
>

I must clearly have just said something incorrectly, as that's not what I'm
suggesting AT ALL.

I'm suggesting a zuul-api service that is stateless/scale-out-able and
independent of the application. Then that service, like the OpenStack
services, would get its info from the application via $something. In the
followups jeblair points out that we can currently do this via a mix of
methods - gearman or zk depending on what makes the most sense.

I think that approach is what we should do regardless of whether we do WSGI
or aiohttp.

I'm advocating aiohttp because it has enough REST helpers built in and it
does websockets out of the box, so we can use one library rather than two
for our varied HTTP needs.

Sorry if I rambled too much and made that not clear.


_______________________________________________
OpenStack-Infra mailing list
OpenStack-Infra at lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra