[OpenStack-Infra] On the subject of HTTP interfaces and Zuul

Monty Taylor mordred at inaugust.com
Fri Jun 9 20:15:51 UTC 2017


On 06/09/2017 03:09 PM, Monty Taylor wrote:
> On 06/09/2017 02:35 PM, Clark Boylan wrote:
>> On Fri, Jun 9, 2017, at 09:22 AM, Monty Taylor wrote:
>>> Hey all!
>>>
>>> Tristan has recently pushed up some patches related to providing a Web
>>> Dashboard for Zuul. We have a web app for nodepool. We already have the
>>> Github webhook receiver which is inbound http. There have been folks who
>>> have expressed interest in adding active-REST abilities for performing
>>> actions. AND we have the new websocket-based log streaming.
>>>
>>> We're currently using Paste for HTTP serving (which is effectively
>>> dead), autobahn for websockets and WebOB for request/response 
>>> processing.
>>>
>>> This means that before we get too far down the road, it's probably time
>>> to pick how we're going to do those things in general. There are 2
>>> questions on the table:
>>>
>>> * HTTP serving
>>> * REST framework
>>>
>>> They may or may not be related, and one of the options on the table
>>> implies an answer for both. I'm going to start with the answer I think
>>> we should pick:
>>>
>>> *** tl;dr ***
>>>
>>> We should use aiohttp with no extra REST framework.
>>>
>>> Meaning:
>>>
>>> - aiohttp serving REST and websocket streaming in a scale-out tier
>>> - talking RPC to the scheduler over gear or zk
>>> - possible in-process aiohttp endpoints for k8s style health endpoints
>>>
>>> Since we're talking about a web scale-out tier anyway, I think we should
>>> just have a single web tier for zuul and nodepool. This continues the
>>> thinking that nodepool is a component of Zuul.
>>
>> I'm not sure that this is a great idea. We've already seen that people
>> have wanted to use nodepool without a Zuul and even without performing
>> CI. IIRC paul wanted to use it to keep a set of asterisks floating
>> around for example. We've also seen that people want to use
>> subcomponents of nodepool to build and manage a set of images for clouds
>> without making instances.
> 
> Excellent point.
> 
>> In the past we have been careful to keep logical tools separate which
>> has made it easy for us to add new tools and remove old ones.
>> Operationally this may be perceived as making things more difficult for
>> a newcomer, but it makes life much, much better 3-6 months down the road.
>>
>>>
>>> In order to write zuul jobs, end-users must know what node labels are
>>> available. A zuul client that says "please get me a list of available
>>> node labels" could make sense to a user. As we get more non-OpenStack
>>> users, those people may not have any concept that there is a separate
>>> thing called "nodepool".
>>>
>>> *** The MUCH more verbose version ***
>>>
>>> I'm now going to outline all of the thoughts and options I've had or
>>> have heard other people say. It's a deliberately complete list - there are
>>> ideas in here you might find silly/bad. But since we're picking a
>>> direction, I think it's important we consider the options in front of 
>>> us.
>>>
>>> This will cover 3 http serving options:
>>>
>>> - WSGI
>>> - aiohttp
>>> - gRPC
>>>
>>> and 3 REST framework options:
>>>
>>> - pecan
>>> - flask-restplus
>>> - apistar
>>>
>>> ** HTTP Serving **
>>>
>>> WSGI
>>>
>>> The WSGI approach is one we're all familiar with and it works with
>>> pretty much every existing Python REST framework. For us I believe if we
>>> go this route we'd want to serve it with something like uwsgi and
>>> Apache. That adds the need for an Apache layer and/or a uwsgi management
>>> process. However, it means we can make use of normal tools we all likely
>>> know at least to some degree.
>>
>> FWIW I don't think Apache would be required. uWSGI is a fairly capable
>> http server aiui. You can also pip install uwsgi so the simple case
>> remains fairly simple I think.
> 
> Also good point.
> 
>>>
>>> A downside is that we'll need to continue to handle our Websockets work
>>> independently (which is what we're doing now).
>>>
>>> Because it's in a separate process, the API tier will need to make
>>> requests of the scheduler over a bus, which could be either gearman or
>>> zk.
>>>
>>
>> Note that OpenStack has decided that this is a better solution than
>> using web servers in the python process. That doesn't necessarily mean
>> it is the best choice for Zuul, but it seems like there is a lot we can
>> learn from the choice to switch to WSGI in OpenStack.
> 
> Yah. I definitely more strongly lean towards external.
> 
>>> aiohttp
>>>
>>> Zuul v3 is Python3, which means we can use aiohttp. aiohttp isn't
>>> particularly compatible with the REST frameworks, but it has built-in
>>> route support and helpers for receiving and returning JSON. We don't
>>> need ORM mapping support, so the only thing we'd really be MISSING from
>>> REST frameworks is auto-generated documentation.
>>>
>>> aiohttp also supports websockets directly, so we could port the autobahn
>>> work to use aiohttp.
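>>>
>>> To make that concrete, here's a hand-wavy sketch (route names made up)
>>> of a JSON endpoint and a websocket handler living in the same aiohttp
>>> app:
>>>
>>>     import aiohttp
>>>     from aiohttp import web
>>>
>>>     async def status(request):
>>>         # JSON in/out helpers are built in - no extra framework needed
>>>         return web.json_response({'pipelines': []})
>>>
>>>     async def console_stream(request):
>>>         # websockets are built in too, so the autobahn work could port over
>>>         ws = web.WebSocketResponse()
>>>         await ws.prepare(request)
>>>         async for msg in ws:
>>>             if msg.type == aiohttp.WSMsgType.TEXT:
>>>                 await ws.send_str('echo: %s' % msg.data)
>>>         return ws
>>>
>>>     app = web.Application()
>>>     app.router.add_get('/status.json', status)
>>>     app.router.add_get('/console-stream', console_stream)
>>>     web.run_app(app, port=9000)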
>>>
>>> aiohttp can be run in-process in a thread. However, websocket
>>> log-streaming is already a separate process for scaling purposes, so if
>>> we decide that a single implementation backend has value, it probably makes
>>> sense to just stick the web tier in the websocket scale-out process anyway.
>>>
>>> However, we could probably write a facade layer with a gear backend and
>>> an in-memory backend so that simple users could just run the in-process
>>> version while scale-out remains possible for larger installs (like us).
>>>
>>> Since aiohttp can be in-process, it also allows us to easily add some
>>> '/health' endpoints to all of our services directly, even if they aren't
>>> intended to be publicly consumable. That's important for running nicely
>>> inside of things like kubernetes, which likes to check the health status
>>> of services to decide whether to reschedule them. This way we could add a
>>> simple thread to the scheduler and the executors and the mergers and the
>>> nodepool launchers and builders that adds a '/health' endpoint.
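>>>
>>> For the in-process case, something like this rough sketch (none of it
>>> exists yet) is all I have in mind - a daemon thread serving only
>>> '/health':
>>>
>>>     import asyncio
>>>     import threading
>>>     from aiohttp import web
>>>
>>>     def start_health_thread(port):
>>>         # hypothetical helper each service could call at startup
>>>         def _serve():
>>>             asyncio.set_event_loop(asyncio.new_event_loop())
>>>
>>>             async def health(request):
>>>                 return web.json_response({'status': 'ok'})
>>>
>>>             app = web.Application()
>>>             app.router.add_get('/health', health)
>>>             # handle_signals=False since we're not in the main thread
>>>             web.run_app(app, port=port, handle_signals=False)
>>>
>>>         t = threading.Thread(target=_serve, daemon=True)
>>>         t.start()
>>>         return t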
>>>
>>
>> See above. OpenStack has decided this is the wrong route to take
>> (granted with eventlet and python2.7 not asyncio and python3.5). There
>> are scaling and debugging challenges faced when you try to run an in
>> process web server.
> 
> Well - two different things here.
> 
> Actual endpoint = external.
> 
> But for the k8s/prometheus request for /health endpoints, the endpoint 
> needs to actually be per service that is running in a k8s container, 
> because that's how k8s knows what to do. So for OpenStack, we'll be 
> starting with a /health endpoint provided by the top-level API service, 
> which will work the way things do today. But eventually each of the services, 
> nova-compute, nova-scheduler, nova-conductor, etc - would each get their 
> own simple /health endpoint that the controlling service manager can hit 
> for status of the process in question.
> 
> That's what I'm mostly advocating we support with the ability to run 
> "in process". It's the "in process as well" part that's important. 
>>> gRPC / gRPC-REST gateway
>>>
>>> This is a curve-ball. We could define our APIs using gRPC. That gets us
>>> a story for an API that is easily consumable by all sorts of clients,
>>> and that supports exciting things like bi-directional streaming
>>> channels. gRPC isn't (yet) consumable directly in browsers, nor does
>>> Github send gRPC webhooks. BUT - there is a REST Gateway for gRPC:
>>>
>>> https://github.com/grpc-ecosystem/grpc-gateway
>>>
>>> that generates HTTP/1.1+JSON interfaces from the gRPC descriptions and
>>> translates between protobuf and json automatically. The "REST" interface
>>> it produces does not support url-based parameters, so everything is done
>>> in payload bodies. That means it's:
>>>
>>>     GET /nodes
>>>     {
>>>       'id': 1234
>>>     }
>>>
>>> rather than
>>>
>>>     GET /nodes/1234
>>>
>>> but that's still totally fine - and totally works for both status.json
>>> and GH webhooks.
>>>
>>> The catch is - grpc-gateway is a grpc compiler plugin that generates
>>> golang code. So we'd either have to write our own plugin that does the
>>> same thing but for generating python code, or we'd have to write our
>>> gRPC/REST layer in go. I betcha folks would appreciate if we implemented
>>> the plugin for python, but that's a long tent-pole for this purpose so I
>>> don't honestly think we should consider it. Therefore, we should
>>> consider that using gRPC + gRPC-REST implies writing the web-tier in go.
>>> That obviously implies an additional process that needs to talk over an
>>> RPC bus.
>>>
>>> There are clear complexity costs involved with adding a second language
>>> component, especially WRT deployment. (pip install zuul would not be
>>> sufficient.) OTOH - it would open the door to using protobuf-based
>>> objects for internal communication, and would open the door for rich
>>> client apps without REST polling and also potentially nice Android apps
>>> (gRPC works great for mobile apps). Still, I think that makes it a hard sell.
>>>
>>> THAT SAID - there are only 2 things that explicitly need REST over
>>> HTTP/1.1 - that's the github webhooks and status.json. We could write
>>> everything in gRPC except those two. Browser support for gRPC is coming
>>> soon (they've moved from "someone is working on it" to "contact us about
>>> early access") so status.json could move to being pure gRPC as well ...
>>> and the webhook endpoint is pretty simple, so just having it be an
>>> in-process aiohttp handler isn't a terrible cost. So if we thought
>>> "screw it, let's just gRPC and not have an HTTP/1.1 REST interface at
>>> all" - we can stay all in python and gRPC isn't a huge cost at that
>>> point.
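>>>
>>> For reference, the pure-python server side is pretty small (sketch
>>> only - the stubs would come from a hypothetical zuul.proto run through
>>> protoc):
>>>
>>>     from concurrent import futures
>>>     import grpc
>>>     # import zuul_pb2_grpc  # would be generated from zuul.proto
>>>
>>>     server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
>>>     # zuul_pb2_grpc.add_ZuulServicer_to_server(ZuulServicer(), server)
>>>     server.add_insecure_port('[::]:50051')
>>>     server.start()
>>>     server.wait_for_termination()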
>>>
>>> gRPC doesn't handle websockets - but we could still run the gRPC serving
>>> and the websocket serving out of the same scale-out web tier.
>>>
>>
>> Another data point for gRPC is that the etcd3 work in OpenStack found
>> that the existing python lib(s) for grpc don't play nice with eventlet
>> or asyncio or anything that isn't Thread()
>> (https://github.com/grpc/grpc/issues/6046 is the bug tracking that I
>> think). This would potentially make the use of asyncio elsewhere
>> (websockets) more complicated.
> 
> Ah - thanks! I didn't realize it touched asyncio too - I thought it was 
> just eventlet related.
> 
> Nevermind then. :)
> 
>>> ** Summary **
>>>
>>> Based on the three above, it seems like we need to think about a separate
>>> web tier regardless of choice. The one option that doesn't strictly
>>> require a separate tier is the one that lets us align on websockets, so
>>> it seems that co-location there would be simple.
>>>
>>> aiohttp seems like the cleanest forward path. It'll require reworking
>>> the autobahn code (sorry Shrews) - but is nicely aligned with our
>>> Python3 state. It's new - but it's not as totally new as gRPC is. And
>>> since we'll already have some websockets stuff, we could also write
>>> streaming websockets APIs for the things where we'd want that from gRPC.
>>>
>>> * REST Framework
>>>
>>> If we decide to go the WSGI route, then we need to talk REST frameworks
>>> (and it's possible we decide to go WSGI because we want to use a REST
>>> framework)
>>>
>>
>> I'm not sure I understand why the WSGI and REST frameworks are being
>> conflated. You can do one or the other or both and whichever you choose
>> shouldn't affect the other too much aiui. There is even a flask-aiohttp
>> lib.
> 
> Well, because WSGI + gRPC is obviously out. :)
> 
> But from flask-aiohttp's readme:
> 
> """
> I made this project for testing compatability between WSGI & Async IO.
> 
> Since WSGI has no consideration of Async IO, Flask-aiohttp cannot be 
> perfect.
> 
> So, I don't recommend you to use this library for production. Libraries 
> that was made for Async IO would be better choice (Like gevent, Tornado 
> or AioHTTP).
> """
> 
> I mostly am considering that if we pick aiohttp, since it has routing 
> and request marshaling already baked in, there is no real value in adding 
> an additional lib on top of it, at least not at this point. Maybe in the 
> future someone will write a $something that layers on top of aiohttp and 
> makes it better - but aiohttp itself seems pretty complete.
> 
> So you're right - it's not strictly necessary to conflate, but the only 
> http-layer scenario where we'd be considering these seriously is the 
> WSGI choice.
> 
>>> The assumption in this case is that the websocket layer is a separate
>>> entity.
>>>
>>> There are three 'reasonable' options available:
>>>
>>> - pecan
>>> - flask-restplus
>>> - apistar
>>>
>>> pecan
>>>
>>> pecan is used in a lot of OpenStack services and is also used by
>>> Storyboard, so it's well known. Tristan's patches so far use Pecan, so
>>> we've got example code.
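>>>
>>> For flavor, a minimal pecan controller looks roughly like this (purely
>>> illustrative - Tristan's patches are the real reference):
>>>
>>>     import pecan
>>>     from pecan import expose
>>>
>>>     class RootController(object):
>>>         @expose('json')
>>>         def status(self):
>>>             return {'pipelines': []}
>>>
>>>     # WSGI app, served by uwsgi/Apache or whatever we pick
>>>     application = pecan.make_app(RootController())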
>>>
>>> On the other hand, Pecan seems to be mostly only used in OpenStack land
>>> and hasn't gotten much adoption elsewhere.
>>>
>>> flask-restplus
>>>
>>> https://flask-restplus.readthedocs.io/en/stable/
>>>
>>> flask is extremely popular for folks doing REST in Python.
>>> flask-restplus is a flask extension that also produces Swagger Docs for
>>> the REST api, and provides for serving an interactive swagger-ui based
>>> browseable interface to the API. You can also define models using
>>> JSONSchema. Those are not needed for simple cases like status.json, but
>>> for a fuller REST API they might be nice.
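>>>
>>> A rough sketch of what that looks like (resource and model names made
>>> up), including the free swagger-ui:
>>>
>>>     from flask import Flask
>>>     from flask_restplus import Api, Resource, fields
>>>
>>>     app = Flask(__name__)
>>>     api = Api(app, title='Zuul API', doc='/docs')  # swagger-ui at /docs
>>>
>>>     label = api.model('Label', {'name': fields.String})
>>>
>>>     @api.route('/labels')
>>>     class Labels(Resource):
>>>         @api.marshal_list_with(label)
>>>         def get(self):
>>>             return [{'name': 'ubuntu-xenial'}]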
>>>
>>> Of course, in all cases we could simply document our API using swagger
>>> and get the same thing - but that does involve maintaining model/api
>>> descriptions and documentation separately.

It's worth noting (I just found it myself) that we can do swagger with 
aiohttp too:

http://aiohttp-swagger.readthedocs.io/en/latest/

>>> apistar
>>>
>>> https://github.com/tomchristie/apistar
>>>
>>> apistar is BRAND NEW and was announced at this year's PyCon. It's from
>>> the Django REST Framework folks and is aimed at writing REST separate from Django.
>>>
>>> It's python3 from scratch - although it's SO python3 focused that it
>>> requires python 3.6. This is because it makes use of type annotations:
>>
>> Type hinting is in python 3.5 and apistar's trove classifier things
>> mention 3.5 support (not sure if that's actually the case though). But if
>> so, 3.5 is far easier to use since it is in more distros than just Arch
>> and Tumbleweed (which is about all you get with 3.6).
> 
> Ah - neat. Their code examples are all showing f'' strings which are 3.6 
> only, so I was assuming 3.6 was required.
> 
>>>
>>>     def show_request(request: http.Request):
>>>         return {
>>>             'method': request.method,
>>>             'url': request.url,
>>>             'headers': dict(request.headers)
>>>         }
>>>
>>>     def create_project() -> Response:
>>>         data = {'name': 'new project', 'id': 123}
>>>         headers = {'Location': 'http://example.com/project/123/'}
>>>         return Response(data, status=201, headers=headers)
>>>
>>> and f'' strings:
>>>
>>>     def echo_username(username):
>>>         return {'message': f'Welcome, {username}!'}
>>>
>>> Python folks seem to be excited about apistar so far - but I think
>>> python 3.6 is a bridge too far - it honestly introduces more deployment
>>> issues than doing a golang-gRPC layer.
>>>
>>> ** Summary **
>>>
>>> I don't think the REST frameworks offer enough benefit to justify their
>>> use, nor to justify adopting WSGI as our path forward.
>>
>> Yesterday SpamapS mentioned wanting to be able to grow the Zuul
>> community. Just based on looking at the choices OpenStack is making
>> (moving TO wsgi) and the general popularity of Flask in the python
>> community I think that you may want to consider both wsgi and flask
>> simply because they are tools that are known to scale reasonably well
>> and many people are familiar with them.
> 
> If we decided to do WSGI I would totally strongly advocate we use Flask.
> 
>>>
>>> ** Thoughts on RPC Bus **
>>>
>>> gearman is a simple way to add RPC calls between an API tier and the
>>> scheduler. However, we got rid of gear from nodepool already, and we
>>> intend to get rid of gearman in v4 anyway.
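>>>
>>> To show how simple - a sketch of what an API-tier call over gear could
>>> look like (the function name and payload are made up):
>>>
>>>     import json
>>>     import time
>>>     import gear
>>>
>>>     client = gear.Client()
>>>     client.addServer('scheduler.example.com')
>>>     client.waitForServer()
>>>     job = gear.Job(b'zuul:label_list', json.dumps({}).encode('utf8'))
>>>     client.submitJob(job, timeout=300)
>>>     while not job.complete:
>>>         time.sleep(0.1)
>>>     labels = json.loads(job.data[0])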
>>>
>>> If we use zk, we'll have to do a little bit more thinking about how to
>>> do the RPC calls which will make this take more work. BUT - it means we
>>> can define one API that covers both Zuul and Nodepool and will be
>>> forward compatible with a v4 no-gearman world.
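>>>
>>> One possible shape for the ZK version (paths and payloads entirely made
>>> up): the API tier writes a request znode, the scheduler watches for
>>> them and writes responses:
>>>
>>>     import json
>>>     from kazoo.client import KazooClient
>>>
>>>     zk = KazooClient(hosts='zk.example.com:2181')
>>>     zk.start()
>>>     payload = json.dumps({'call': 'label_list'}).encode('utf8')
>>>     path = zk.create('/zuul/rpc/request-', payload,
>>>                      sequence=True, makepath=True)
>>>     # the scheduler side would use zk.ChildrenWatch('/zuul/rpc') to pick
>>>     # up requests and write a corresponding response znode when done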
>>>
>>> We *could* use gearman in zuul and run an API in-process in nodepool.
>>> Then we could take a page out of early Nova and do a proxy-layer in zuul
>>> that makes requests of nodepool's API.
>>>
>>> We could just assume that there's gonna be an Apache fronting this stuff
>>> and suggest deployment with routing to zuul and nodepool apis with
>>> mod_proxy rules.
>>>
>>> Finally, as clarkb pointed out in response to the ingestors spec, we
>>> could introduce MQTT and use it. I'm wary of doing that for this because
>>> it introduces a new, hard-required tech stack at a late stage.
>>
>> Mostly I was just pointing out that I think the vast majority of the
>> infrastructure work to have something like a zuul ingestor is done. You
>> just have to read from an mqtt connection instead of a gerrit ssh
>> connection. Granted this does require running more services (mqtt server
>> and the event stream handler) and doesn't handle entities like Github.
>>
>> That said MQTT unlike Gearman seems to be seeing quite a bit of
>> development activity due to the popularity of IoT. Gearman has worked
>> reasonably well for us though so I don't think we need to just replace
>> it to get in on the IoT bandwagon.
> 
> The concerns I have wrt ingestors aren't protocol based so much but 
> about missing events.
> 
> It's a pretty poor user experience when Zuul misses an event. Because 
> the events from gerrit come from the ssh event stream, it means 
> ultimately that you need to have more than one listener attached to the 
> stream and then for the events to go through a de-dup phase so that jobs 
> don't get triggered more than once. This is mostly thinking about 
> updating zuul - which is a scenario in which we miss events while we're 
> down.
> 
> If we use MQTT, I believe it gives us the ability to pick up where we left 
> off when we reconnect - but I'm not certain if ingesting the events into the 
> MQTT service itself is HA enough that we can restart the MQTT services.
> 
> But the same principles could be applied - more than one event stream 
> handler, potentially just using zk in a leader-election kind of manner 
> if needed, writing the events themselves to MQTT server. I'll definitely 
> be excited to learn more - but at some point in the future I'd like to 
> be able to frequently roll updates without missing events.
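> 
> To illustrate the property I'm after - a paho-mqtt consumer with a 
> stable client id, clean_session=False and QoS 1 should receive events 
> that were published while it was down (the topic name is made up): 
> 
>     import json
>     import paho.mqtt.client as mqtt
> 
>     client = mqtt.Client(client_id='zuul-scheduler', clean_session=False)
> 
>     def on_message(client, userdata, msg):
>         event = json.loads(msg.payload.decode('utf8'))
>         # hand the event off to the scheduler / de-dup layer here
> 
>     client.on_message = on_message
>     client.connect('mqtt.example.com')
>     client.subscribe('gerrit/#', qos=1)
>     client.loop_forever()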
> 
>>>
>>> Since we're starting fresh, I like the idea of a single API service that
>>> RPCs to zuul and nodepool, so I like the idea of using ZK for the RPC
>>> layer. BUT - using gear and just adding gear worker threads back to
>>> nodepool wouldn't be super-terrible, maybe.
>>
>> Nodepool hasn't had a gearman-less release yet so you don't have to
>> worry about backward compat at least.
> 
> :)
> 
>>>
>>> ** Final Summary **
>>>
>>> As I tl;dr'd earlier, I think aiohttp co-located with the scale-out
>>> websocket tier talking to the scheduler over zk is the best bet for us.
>>> I think it's both simple enough to adopt and gets us a rich set of
>>> features. It also lets us implement in-process simple health endpoints
>>> on each service with the same tech stack.
>>
>> I'm wary of this simply because it looks a lot like repeating
>> OpenStack's (now failed) decision to stick web servers in a bunch of
>> python processes then do cooperative multithreading with them along with
>> all your application logic. It just gets complicated. I also think this
>> underestimates the value of using tools people are familiar with (wsgi
>> and flask) particularly if making it easy to jump in and building
>> community is a goal.
> 
> I must clearly have just said something incorrectly, as that's not what 
> I'm suggesting AT ALL.
> 
> I'm suggesting a zuul-api service that is stateless/scale-out-able and is 
> independent of the application. Then that service, like the OpenStack 
> services, would get its info from the application via $something. In the 
> followups jeblair points out that we can currently do this via a mix of 
> methods - gearman or zk depending on what makes the most sense.
> 
> I think that approach is what we should do regardless of whether we do 
> WSGI or aiohttp.
> 
> I'm advocating aiohttp because it has enough REST helpers built in and 
> it does websockets out of the box, so we can use one library rather than 
> two for our varied HTTP needs.
> 
> Sorry if I rambled too much and made that not clear.
> 
> _______________________________________________
> OpenStack-Infra mailing list
> OpenStack-Infra at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra
