[OpenStack-Infra] On the subject of HTTP interfaces and Zuul

Monty Taylor mordred at inaugust.com
Fri Jun 9 16:22:03 UTC 2017


Hey all!

Tristan has recently pushed up some patches related to providing a Web 
Dashboard for Zuul. We have a web app for nodepool. We already have the 
Github webhook receiver which is inbound http. There have been folks who 
have expressed interest in adding active-REST abilities for performing 
actions. AND we have the new websocket-based log streaming.

We're currently using Paste (which is effectively dead) for HTTP 
serving, autobahn for websockets and WebOB for request/response 
processing.

This means that before we get too far down the road, it's probably time 
to pick how we're going to do those things in general. There are 2 
questions on the table:

* HTTP serving
* REST framework

They may or may not be related, and one of the options on the table 
implies an answer for both. I'm going to start with the answer I think 
we should pick:

*** tl;dr ***

We should use aiohttp with no extra REST framework.

Meaning:

- aiohttp serving REST and websocket streaming in a scale-out tier
- talking RPC to the scheduler over gear or zk
- possible in-process aiohttp endpoints for k8s style health endpoints

Since we're talking about a web scale-out tier, I think we should just 
have a single web tier for zuul and nodepool. This continues the 
thinking that nodepool is a component of Zuul.

In order to write zuul jobs, end-users must know what node labels are 
available. A zuul client that says "please get me a list of available 
node labels" could make sense to a user. As we get more non-OpenStack 
users, those people may not have any concept that there is a separate 
thing called "nodepool".

*** The MUCH more verbose version ***

I'm now going to outline all of the thoughts and options I've had or 
have heard other people say. It's an intentionally complete list - there 
are ideas in here you might find silly/bad. But since we're picking a 
direction, I think it's important we consider the options in front of us.

This will cover 3 http serving options:

- WSGI
- aiohttp
- gRPC

and 3 REST framework options:

- pecan
- flask-restplus
- apistar

** HTTP Serving **

WSGI

The WSGI approach is one we're all familiar with and it works with 
pretty much every existing Python REST framework. If we go this route, I 
believe we'd want to serve it with something like uwsgi and Apache. That 
adds the need for an Apache layer and/or a uwsgi management process. 
However, it means we can make use of normal tools we all likely know at 
least to some degree.

A downside is that we'll need to continue to handle our websockets work 
independently (which is what we're doing now).

Because it's in a separate process, the API tier will need to make 
requests of the scheduler over a bus, which could be either gearman or zk.

aiohttp

Zuul v3 is Python3, which means we can use aiohttp. aiohttp isn't 
particularly compatible with the REST frameworks, but it has built-in 
route support and helpers for receiving and returning JSON. We don't 
need ORM mapping support, so the only thing we'd really be MISSING from 
REST frameworks is auto-generated documentation.

aiohttp also supports websockets directly, so we could port the autobahn 
work to use aiohttp.
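
To make that concrete, here's a rough sketch of what a combined JSON + 
websocket aiohttp app could look like. It's illustrative only - the 
route paths and the empty status payload are made up, not proposed Zuul 
endpoints:

   # Hypothetical sketch, not Zuul code.
   from aiohttp import web, WSMsgType


   async def status_json(request):
       # A real handler would fetch this from the scheduler over RPC.
       return web.json_response({'pipelines': []})


   async def console_stream(request):
       # A real handler would stream log lines from the log source;
       # this one just accepts the connection and waits for 'close'.
       ws = web.WebSocketResponse()
       await ws.prepare(request)
       async for msg in ws:
           if msg.type == WSMsgType.TEXT and msg.data == 'close':
               await ws.close()
       return ws


   def make_app():
       app = web.Application()
       app.router.add_get('/status.json', status_json)
       app.router.add_get('/console-stream', console_stream)
       return app


   if __name__ == '__main__':
       web.run_app(make_app(), port=9000)

The nice part is that the JSON handler and the websocket handler live 
in the same app and the same event loop.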

aiohttp can be run in-process in a thread. However, websocket 
log-streaming is already a separate process for scaling purposes, so if 
we decide that having a single implementation backend is valuable, it 
probably makes sense to just stick the web tier in the websocket 
scale-out process anyway.

However, we could probably write a facade layer with a gear backend and 
an in-memory backend, so that simple deployments could just run the 
in-process version while scale-out remains possible for larger installs 
(like ours).
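
Very roughly, such a facade might look like the sketch below - the 
class names, the scheduler method and the gearman function name are all 
made up for illustration:

   # Hypothetical facade sketch; names are placeholders.
   import abc
   import json
   import time

   import gear


   class ZuulWebRPC(abc.ABC):
       @abc.abstractmethod
       def get_status(self, tenant):
           """Return status data for a tenant as a dict."""


   class InMemoryRPC(ZuulWebRPC):
       """Simple installs: web tier runs inside the scheduler process."""

       def __init__(self, scheduler):
           self.scheduler = scheduler

       def get_status(self, tenant):
           # Assumes some scheduler method that can render status.
           return self.scheduler.formatStatus(tenant)


   class GearRPC(ZuulWebRPC):
       """Scale-out installs: forward the call to the scheduler over gear."""

       def __init__(self, host='127.0.0.1', port=4730):
           self.client = gear.Client()
           self.client.addServer(host, port)
           self.client.waitForServer()

       def get_status(self, tenant):
           # 'zuul:web_status_get' is a made-up gearman function name.
           job = gear.Job('zuul:web_status_get',
                          json.dumps({'tenant': tenant}))
           self.client.submitJob(job, timeout=30)
           while not job.complete:
               time.sleep(0.1)
           return json.loads(job.data[0])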

Since aiohttp can be in-process, it also allows us to easily add some 
'/health' endpoints to all of our services directly, even if they aren't 
intended to be publicly consumable. That's important for running well 
inside of things like kubernetes, which like to check on the health 
status of services when deciding whether to reschedule them. This way we 
could add a simple thread to the scheduler, the executors, the mergers 
and the nodepool launchers and builders that adds a '/health' endpoint.
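
As a sketch, a health thread on any of those services could look 
something like this (the port and the check itself are placeholders):

   # Hypothetical in-process health endpoint thread.
   import asyncio
   import threading

   from aiohttp import web


   def start_health_thread(port=8181):
       async def health(request):
           # A real check would inspect internal service state.
           return web.json_response({'status': 'ok'})

       def run():
           # Run a private event loop in this thread for the tiny app.
           loop = asyncio.new_event_loop()
           asyncio.set_event_loop(loop)
           app = web.Application()
           app.router.add_get('/health', health)
           handler = app.make_handler()
           loop.run_until_complete(
               loop.create_server(handler, '0.0.0.0', port))
           loop.run_forever()

       thread = threading.Thread(target=run, name='health', daemon=True)
       thread.start()
       return thread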

gRPC / gRPC-REST gateway

This is a curve-ball. We could define our APIs using gRPC. That gets us 
a story for an API that is easily consumable by all sorts of clients, 
and that supports exciting things like bi-directional streaming 
channels. gRPC isn't (yet) consumable directly in browsers, nor does 
Github send gRPC webhooks. BUT - there is a REST Gateway for gRPC:

https://github.com/grpc-ecosystem/grpc-gateway

that generates HTTP/1.1+JSON interfaces from the gRPC descriptions and 
translates between protobuf and JSON automatically. The "REST" 
interface it produces does not support url-based parameters, so 
everything is done in payload bodies. That means it's:

   GET /nodes
   {
     "id": 1234
   }

rather than

   GET /nodes/1234

but that's still totally fine - and totally works for both status.json 
and GH webhooks.

The catch is - grpc-gateway is a grpc compiler plugin that generates 
golang code. So we'd either have to write our own plugin that does the 
same thing but for generating python code, or we'd have to write our 
gRPC/REST layer in go. I betcha folks would appreciate it if we 
implemented the plugin for python, but that's a long tent-pole for this 
purpose, so I honestly don't think we should consider it. Therefore, we 
should
consider that using gRPC + gRPC-REST implies writing the web-tier in go. 
That obviously implies an additional process that needs to talk over an 
RPC bus.

There are clear complexity costs involved with adding a second language 
component, especially WRT deployment (pip install zuul would not be 
sufficient). OTOH - it would open the door to using protobuf-based 
objects for internal communication, to rich client apps without REST 
polling, and potentially to nice Android apps (gRPC works great for 
mobile apps). Still, I think the deployment complexity makes it a hard 
sell.

THAT SAID - there are only 2 things that explicitly need REST over 
HTTP/1.1 - that's the github webhooks and status.json. We could write
everything in gRPC except those two. Browser support for gRPC is coming 
soon (they've moved from "someone is working on it" to "contact us about 
early access") so status.json could move to being pure gRPC as well ... 
and the webhook endpoint is pretty simple, so just having it be an 
in-process aiohttp handler isn't a terrible cost. So if we thought 
"screw it, let's just gRPC and not have an HTTP/1.1 REST interface at 
all" - we can stay all in python and gRPC isn't a huge cost at that point.

gRPC doesn't handle websockets - but we could still run the gRPC serving 
and the websocket serving out of the same scale-out web tier.
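
For a sense of what the client side would look like, here's a purely 
hypothetical Python sketch - it assumes a zuul.proto compiled with 
grpcio-tools into zuul_pb2 / zuul_pb2_grpc modules, none of which exist 
today:

   # Hypothetical: assumes generated zuul_pb2 / zuul_pb2_grpc modules.
   import grpc

   import zuul_pb2
   import zuul_pb2_grpc


   def get_node(node_id, target='zuul.example.com:50051'):
       channel = grpc.insecure_channel(target)
       stub = zuul_pb2_grpc.NodesStub(channel)
       # A unary RPC; grpc-gateway would expose this as GET /nodes
       # with a JSON body of {"id": 1234}.
       return stub.GetNode(zuul_pb2.GetNodeRequest(id=node_id))


   if __name__ == '__main__':
       print(get_node(1234))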

** Summary **

Based on the three options above, it seems like we need to think about 
a separate web tier regardless of choice. The one option that doesn't 
strictly require a separate tier is the one that lets us align on 
websockets, so co-location there would be simple.

aiohttp seems like the cleanest forward path. It'll require reworking 
the autobahn code (sorry Shrews) - but is nicely aligned with our 
Python3 state. It's new - but it's not as totally new as gRPC is. And 
since we'll already have some websockets stuff, we could also write 
streaming websockets APIs for the things where we'd want that from gRPC.

** REST Framework **

If we decide to go the WSGI route, then we need to talk REST frameworks 
(and it's possible we decide to go WSGI because we want to use a REST 
framework)

The assumption in this case is that the websocket layer is a separate 
entity.

There are three 'reasonable' options available:

- pecan
- flask-restplus
- apistar

pecan

pecan is used in a lot of OpenStack services and is also used by 
Storyboard, so it's well known. Tristan's patches so far use Pecan, so 
we've got example code.
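
For flavor, a minimal pecan controller looks roughly like this 
(illustrative only - Tristan's patches are the real reference, and the 
endpoint and payload here are placeholders):

   # Minimal pecan sketch.
   import pecan
   from pecan import expose


   class StatusController(object):
       @expose('json')
       def index(self):
           # A real handler would fetch this from the scheduler over RPC.
           return {'pipelines': []}


   class RootController(object):
       status = StatusController()


   def make_app():
       # Returns a WSGI application that uwsgi/Apache can serve.
       return pecan.make_app(RootController())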

On the other hand, Pecan seems to be mostly only used in OpenStack land 
and hasn't gotten much adoption elsewhere.

flask-restplus

https://flask-restplus.readthedocs.io/en/stable/

flask is extremely popular for folks doing REST in Python. 
flask-restplus is a flask extension that also produces Swagger docs for 
the REST API, and provides an interactive swagger-ui based browsable 
interface to the API. You can also define models using JSONSchema. Those 
are not needed for simple cases like status.json, but they might be nice 
for a fuller REST API.
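
A tiny illustration of the flask-restplus style - the model and the 
endpoint are made up, not a proposed Zuul API, but this is roughly what 
gets us Swagger docs and a browsable UI for free:

   # Hypothetical flask-restplus sketch.
   from flask import Flask
   from flask_restplus import Api, Resource, fields

   app = Flask(__name__)
   api = Api(app, title='Zuul API', doc='/doc/')

   label = api.model('Label', {
       'name': fields.String(description='Node label name'),
       'min_ready': fields.Integer(description='Ready nodes to keep'),
   })


   @api.route('/labels')
   class LabelList(Resource):
       @api.marshal_list_with(label)
       def get(self):
           # Real data would come from nodepool over the RPC bus.
           return [{'name': 'ubuntu-xenial', 'min_ready': 1}]


   if __name__ == '__main__':
       app.run(port=5000)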

Of course, in all cases we could simply document our API using swagger 
and get the same thing - but that does involve maintaining model/api 
descriptions and documentation separately.

apistar

https://github.com/tomchristie/apistar

apistar is BRAND NEW and was announced at this year's PyCon. It's from 
the Django REST Framework author and is aimed at writing REST APIs 
separate from Django.

It's python3 from scratch - although it's SO python3 focused that it 
requires python 3.6. This is because it makes use of type annotations:

   def show_request(request: http.Request):
       return {
           'method': request.method,
           'url': request.url,
           'headers': dict(request.headers)
       }

   def create_project() -> Response:
       data = {'name': 'new project', 'id': 123}
       headers = {'Location': 'http://example.com/project/123/'}
       return Response(data, status=201, headers=headers)

and f'' strings:

   def echo_username(username):
       return {'message': f'Welcome, {username}!'}

Python folks seem to be excited about apistar so far - but I think 
python 3.6 is a bridge too far - it honestly introduces more deployment 
issues than doing a golang-gRPC layer would.

** Summary **

I don't think the REST frameworks offer enough benefit to justify 
adopting them and WSGI as our path forward.

** Thoughts on RPC Bus **

gearman is a simple way to add RPC calls between an API tier and the 
scheduler. However, we already got rid of gear from nodepool, and we 
intend to get rid of gearman in v4 anyway.

If we use zk, we'll have to do a little bit more thinking about how to 
do the RPC calls, which will make this take more work. BUT - it means we 
can define one API that covers both Zuul and Nodepool and will be 
forward compatible with a v4 no-gearman world.
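
One possible shape for that, purely as a sketch - the paths, payloads 
and the polling approach are just for illustration (a real version 
would use watches rather than polling):

   # Hypothetical ZK-based RPC sketch using kazoo.
   import json
   import time

   from kazoo.client import KazooClient


   def zk_rpc_call(zk, name, params, timeout=30):
       """Drop a request znode and wait for the matching response znode."""
       request = json.dumps({'name': name, 'params': params}).encode('utf8')
       path = zk.create('/zuul/rpc/requests/req-', request,
                        makepath=True, sequence=True)
       response_path = path.replace('/requests/', '/responses/')
       deadline = time.time() + timeout
       while time.time() < deadline:
           if zk.exists(response_path):
               data, _ = zk.get(response_path)
               zk.delete(response_path)
               return json.loads(data)
           time.sleep(0.1)
       raise TimeoutError('no response for %s' % name)


   if __name__ == '__main__':
       zk = KazooClient(hosts='127.0.0.1:2181')
       zk.start()
       print(zk_rpc_call(zk, 'status_get', {'tenant': 'example'}))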

We *could* use gearman in zuul and run an API in-process in nodepool. 
Then we could take a page out of early Nova and do a proxy-layer in zuul 
that makes requests of nodepool's API.

We could just assume that there's gonna be an Apache fronting this 
stuff and suggest deployment with routing to the zuul and nodepool APIs 
via mod_proxy rules.

Finally, as clarkb pointed out in response to the ingestors spec, we 
could introduce MQTT and use it. I'm wary of doing that for this because 
it introduces a completely new, required tech stack at a late stage.

Since we're starting fresh, I like the idea of a single API service 
that RPCs to zuul and nodepool, which points to using ZK for the RPC 
layer. BUT - using gear and just adding gear worker threads back to 
nodepool wouldn't be super-terrible, maybe.

** Final Summary **

As I tl;dr'd earlier, I think aiohttp co-located with the scale-out 
websocket tier talking to the scheduler over zk is the best bet for us. 
I think it's both simple enough to adopt and gets us a rich set of 
features. It also lets us implement in-process simple health endpoints 
on each service with the same tech stack.


