Open Stack

Fri Jun 9 19:54:31 UTC 2017

On Fri, Jun 09, 2017 at 12:35:59PM -0700, Clark Boylan wrote:
> On Fri, Jun 9, 2017, at 09:22 AM, Monty Taylor wrote:
> > Hey all!
> > 
> > Tristan has recently pushed up some patches related to providing a Web 
> > Dashboard for Zuul. We have a web app for nodepool. We already have the 
> > Github webhook receiver which is inbound http. There have been folks who 
> > have expressed interest in adding active-REST abilities for performing 
> > actions. AND we have the new websocket-based log streaming.
> > 
> > We're currently using Paste for HTTP serving (which is effectively 
> > dead), autobahn for websockets and WebOB for request/response processing.
> > 
> > This means that before we get too far down the road, it's probably time 
> > to pick how we're going to do those things in general. There are 2 
> > questions on the table:
> > 
> > * HTTP serving
> > * REST framework
> > 
> > They may or may not be related, and one of the options on the table 
> > implies an answer for both. I'm going to start with the answer I think 
> > we should pick:
> > 
> > *** tl;dr ***
> > 
> > We should use aiohttp with no extra REST framework.
> > 
> > Meaning:
> > 
> > - aiohttp serving REST and websocket streaming in a scale-out tier
> > - talking RPC to the scheduler over gear or zk
> > - possible in-process aiohttp endpoints for k8s style health endpoints
> > 
> > Since we're talking about a web scale-out tier, we should just have 
> > a single web tier for zuul and nodepool. This continues the thinking 
> > that nodepool is a component of Zuul.
> 
> I'm not sure that this is a great idea. We've already seen that people
> have wanted to use nodepool without a Zuul and even without performing
> CI. IIRC paul wanted to use it to keep a set of asterisks floating
> around for example. We've also seen that people want to use
> subcomponents of nodepool to build and manage a set of images for clouds
> without making instances.
> 
Ya, asterisk use case aside, I think image build as a service is a prime example
of something nodepool could be great at on its own.  Especially now that
nodepool-builder is scaling out very well with zookeeper.

> In the past we have been careful to keep logical tools separate which
> has made it easy for us to add new tools and remove old ones.
> Operationally this may be perceived as making things more difficult to a
> newcomer, but it makes life much much better 3-6 months down the road.
> 
> > 
> > In order to write zuul jobs, end-users must know what node labels are 
> > available. A zuul client that says "please get me a list of available 
> > node labels" could make sense to a user. As we get more non-OpenStack 
> > users, those people may not have any concept that there is a separate 
> > thing called "nodepool".
> > 
> > *** The MUCH more verbose version ***
> > 
> > I'm now going to outline all of the thoughts and options I've had or 
> > have heard other people say. It's an extra complete list - there are 
> > ideas in here you might find silly/bad. But since we're picking a 
> > direction, I think it's important we consider the options in front of us.
> > 
> > This will cover 3 http serving options:
> > 
> > - WSGI
> > - aiohttp
> > - gRPC
> > 
> > and 3 REST framework options:
> > 
> > - pecan
> > - flask-restplus
> > - apistar
> > 
> > ** HTTP Serving **
> > 
> > WSGI
> > 
> > The WSGI approach is one we're all familiar with and it works with 
> > pretty much every existing Python REST framework. For us I believe if we 
> > go this route we'd want to serve it with something like uwsgi and 
> > Apache. That adds the need for an Apache layer and/or a uwsgi management 
> > process. However, it means we can make use of normal tools we all likely 
> > know at least to some degree.
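For reference, a minimal sketch of what the WSGI route looks like at its
smallest, using only the standard library. The `status_app` name, the
`/status.json` path handling and the placeholder payload are assumptions for
illustration, not existing Zuul code:

```python
import json


def status_app(environ, start_response):
    """Minimal WSGI callable serving a hypothetical /status.json."""
    if environ.get('PATH_INFO') != '/status.json':
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'not found\n']
    # Placeholder payload: a real web tier would ask the scheduler for
    # this over gearman or zk instead of hard-coding it.
    body = json.dumps({'pipelines': []}).encode('utf-8')
    start_response('200 OK', [
        ('Content-Type', 'application/json'),
        ('Content-Length', str(len(body))),
    ])
    return [body]
```

Saved as a hypothetical app.py, this could be served with something like
`uwsgi --http :8001 --wsgi-file app.py --callable status_app`, or with
wsgiref.simple_server for local testing.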
> 
> FWIW I don't think Apache would be required. uWSGI is a fairly capable
> http server aiui. You can also pip install uwsgi so the simple case
> remains fairly simple I think.
> 
> > 
> > A downside is that we'll need to continue to handle our Websockets work 
> > independently (which is what we're doing now)
> > 
> > Because it's in a separate process, the API tier will need to make 
> > requests of the scheduler over a bus, which could be either gearman or
> > zk.
> > 
> 
> Note that OpenStack has decided that this is a better solution than
> using web servers in the python process. That doesn't necessarily mean
> it is the best choice for Zuul, but it seems like there is a lot we can
> learn from the choice to switch to WSGI in OpenStack.
> 
> > aiohttp
> > 
> > Zuul v3 is Python3, which means we can use aiohttp. aiohttp isn't 
> > particularly compatible with the REST frameworks, but it has built-in 
> > route support and helpers for receiving and returning JSON. We don't 
> > need ORM mapping support, so the only thing we'd really be MISSING from 
> > REST frameworks is auto-generated documentation.
> > 
> > aiohttp also supports websockets directly, so we could port the autobahn 
> > work to use aiohttp.
> > 
> > aiohttp can be run in-process in a thread. However, websocket 
> > log-streaming is already a separate process for scaling purposes, so if 
> > we decide that a single backend impl is valuable, it probably makes sense to 
> > just stick the web tier in the websocket scaleout process anyway.
> > 
> > However, we could probably write a facade layer with a gear backend and 
> > an in-memory backend so that simple users could just run the in-process 
> > version but scale-out was possible for larger installs (like us)
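A rough sketch of how such a facade could be shaped, assuming a tiny
interface with one method. Every name here (`SchedulerBackend`,
`InMemoryBackend`, `RpcBackend`, `format_status`, the `status_get` job name)
is hypothetical, and the RPC side is only a stand-in callable rather than
real gear code:

```python
import abc


class SchedulerBackend(abc.ABC):
    """Hypothetical facade the web handlers would call."""

    @abc.abstractmethod
    def get_status(self):
        """Return the scheduler status as a JSON-serializable dict."""


class InMemoryBackend(SchedulerBackend):
    """In-process backend: reads straight from a scheduler-like object."""

    def __init__(self, scheduler):
        self._scheduler = scheduler

    def get_status(self):
        return self._scheduler.format_status()


class RpcBackend(SchedulerBackend):
    """Scale-out backend: forwards the call over a bus.

    `submit` stands in for whatever sends a named job over gear (or zk)
    and returns the decoded reply; it is an assumption, not a gear API.
    """

    def __init__(self, submit):
        self._submit = submit

    def get_status(self):
        return self._submit('status_get', {})


class FakeScheduler:
    """Toy scheduler used only to exercise the sketch."""

    def format_status(self):
        return {'pipelines': ['check', 'gate']}
```

The handlers would only ever see `SchedulerBackend`, so a small install
could construct `InMemoryBackend(scheduler)` while a larger one swaps in
the RPC variant without touching the web code.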
> > 
> > Since aiohttp can be in-process, it also allows us to easily add some 
> > '/health' endpoints to all of our services directly, even if they aren't 
> > intended to be publicly consumable. That's important for running richly 
> > inside of things like kubernetes that like to check in on health status 
> > of services to know about rescheduling them. This way we could add a 
> > simple thread to the scheduler and the executors and the mergers and the 
> > nodepool launchers and builders that adds a '/health' endpoint.
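To make the "simple thread that adds a '/health' endpoint" idea concrete,
here is a minimal sketch. Note that it deliberately uses the standard
library's http.server instead of aiohttp so that it has no dependencies;
the handler name, the JSON payload and the helper functions are all
assumptions for illustration:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != '/health':
            self.send_error(404)
            return
        body = json.dumps({'status': 'ok'}).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep probe requests out of the service's stderr


def start_health_server(host='127.0.0.1', port=0):
    """Serve /health from a daemon thread; port 0 picks a free port."""
    server = HTTPServer((host, port), HealthHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


def check_health(server):
    """Probe the endpoint roughly the way an orchestrator would."""
    url = 'http://127.0.0.1:%d/health' % server.server_address[1]
    # An empty ProxyHandler avoids any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as resp:
        return resp.status, json.loads(resp.read().decode('utf-8'))
```

A service would call `start_health_server()` once at startup and
`server.shutdown()` on exit. An aiohttp version would have the same shape,
with a coroutine handler in place of `HealthHandler`.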
> > 
> 
> See above. OpenStack has decided this is the wrong route to take
> (granted with eventlet and python2.7 not asyncio and python3.5). There
> are scaling and debugging challenges faced when you try to run an in
> process web server.
> 
> > gRPC / gRPC-REST gateway
> > 
> > This is a curve-ball. We could define our APIs using gRPC. That gets us 
> > a story for an API that is easily consumable by all sorts of clients, 
> > and that supports exciting things like bi-directional streaming 
> > channels. gRPC isn't (yet) consumable directly in browsers, nor does 
> > Github send gRPC webhooks. BUT - there is a REST Gateway for gRPC:
> > 
> > https://github.com/grpc-ecosystem/grpc-gateway
> > 
> > that generates HTTP/1.1+JSON interfaces from the gRPC descriptions and 
> > translates between protobuf and json automatically. The "REST" interface 
> > it produces does not support url-based parameters, so everything is done 
> > in payload bodies, so it's:
> > 
> >    GET /nodes
> >    {
> >      'id': 1234
> >    }
> > 
> > rather than
> > 
> >    GET /nodes/1234
> > 
> > but that's still totally fine - and totally works for both status.json 
> > and GH webhooks.
> > 
> > The catch is - grpc-gateway is a grpc compiler plugin that generates 
> > golang code. So we'd either have to write our own plugin that does the 
> > same thing but for generating python code, or we'd have to write our 
> > gRPC/REST layer in go. I betcha folks would appreciate if we implemented 
> > the plugin for python, but that's a long tent-pole for this purpose so I 
> > don't honestly think we should consider it. Therefore, we should 
> > consider that using gRPC + gRPC-REST implies writing the web-tier in go. 
> > That obviously implies an additional process that needs to talk over an 
> > RPC bus.
> > 
> > There are clear complexity costs involved with adding a second language 
> > component, especially WRT deployment. (pip install zuul would not be 
> > sufficient) OTOH - it would open the door to using protobuf-based 
> > objects for internal communication, and would open the door for rich 
> > client apps without REST polling and also potentially nice Android apps 
> > (gRPC works great for mobile apps) I think that makes it a hard sell.
> > 
> > THAT SAID - there are only 2 things that explicitly need REST over HTTP 
> > 1.1 - that's the github webhooks and status.json. We could write 
> > everything in gRPC except those two. Browser support for gRPC is coming 
> > soon (they've moved from "someone is working on it" to "contact us about 
> > early access") so status.json could move to being pure gRPC as well ... 
> > and the webhook endpoint is pretty simple, so just having it be an 
> > in-process aiohttp handler isn't a terrible cost. So if we thought 
> > "screw it, let's just gRPC and not have an HTTP/1.1 REST interface at 
> > all" - we can stay all in python and gRPC isn't a huge cost at that
> > point.
> > 
> > gRPC doesn't handle websockets - but we could still run the gRPC serving 
> > and the websocket serving out of the same scale-out web tier.
> > 
> 
> Another data point for gRPC is that the etcd3 work in OpenStack found
> that the existing python lib(s) for grpc don't play nice with eventlet
> or asyncio or anything that isn't Thread()
> (I think https://github.com/grpc/grpc/issues/6046 is the bug tracking
> that). This would potentially make the use of asyncio elsewhere
> (websockets) more complicated.
> 
> > ** Summary
> > 
> > Based on the three above, it seems like we need to think about separate 
> > web-tier regardless of choice. The one option that doesn't strictly 
> > require a separate tier is the one that lets us align on websockets, so 
> > it seems that co-location there would be simple.
> > 
> > aiohttp seems like the cleanest forward path. It'll require reworking 
> > the autobahn code (sorry Shrews) - but is nicely aligned with our 
> > Python3 state. It's new - but it's not as totally new as gRPC is. And 
> > since we'll already have some websockets stuff, we could also write 
> > streaming websockets APIs for the things where we'd want that from gRPC.
> > 
> > * REST Framework
> > 
> > If we decide to go the WSGI route, then we need to talk REST frameworks 
> > (and it's possible we decide to go WSGI because we want to use a REST 
> > framework)
> > 
> 
> I'm not sure I understand why the WSGI and REST frameworks are being
> conflated. You can do one or the other or both and whichever you choose
> shouldn't affect the other too much aiui. There is even a flask-aiohttp
> lib.
> 
> > The assumption in this case is that the websocket layer is a separate 
> > entity.
> > 
> > There are three 'reasonable' options available:
> > 
> > - pecan
> > - flask-restplus
> > - apistar
> > 
> > pecan
> > 
> > pecan is used in a lot of OpenStack services and is also used by 
> > Storyboard, so it's well known. Tristan's patches so far use Pecan, so 
> > we've got example code.
> > 
> > On the other hand, Pecan seems to be mostly only used in OpenStack land 
> > and hasn't gotten much adoption elsewhere.
> > 
> > flask-restplus
> > 
> > https://flask-restplus.readthedocs.io/en/stable/
> > 
> > flask is extremely popular for folks doing REST in Python. 
> > flask-restplus is a flask extension that also produces Swagger Docs for 
> > the REST api, and provides for serving an interactive swagger-ui based 
> > browseable interface to the API. You can also define models using 
> > JSONSchema. Those are not needed for simple cases like status.json, but 
> > for fuller REST API might be nice.
> > 
> > Of course, in all cases we could simply document our API using swagger 
> > and get the same thing - but that does involve maintaining model/api 
> > descriptions and documentation separately.
> > 
> > apistar
> > 
> > https://github.com/tomchristie/apistar
> > 
> > apistar is BRAND NEW and was announced at this year's PyCon. It's from 
> > the Django folks and is aimed at writing REST separate from Django.
> > 
> > It's python3 from scratch - although it's SO python3 focused that it 
> > requires python 3.6. This is because it makes use of type annotations:
> 
> Type hinting is in python 3.5, and apistar's trove classifiers
> mention 3.5 support (not sure if that is actually the case, though). If
> so, 3.5 is far easier to use, since it is in more distros; 3.6 is only in
> Arch and Tumbleweed.
> 
> > 
> >    def show_request(request: http.Request):
> >        return {
> >            'method': request.method,
> >            'url': request.url,
> >            'headers': dict(request.headers)
> >        }
> > 
> >    def create_project() -> Response:
> >        data = {'name': 'new project', 'id': 123}
> >        headers = {'Location': 'http://example.com/project/123/'}
> >        return Response(data, status=201, headers=headers)
> > 
> > and f'' strings:
> > 
> >    def echo_username(username):
> >      return {'message': f'Welcome, {username}!'}
> > 
> > Python folks seem to be excited about apistar so far - but I think 
> > python 3.6 is a bridge too far - it honestly introduces more deployment 
> > issues than doing a golang-gRPC layer.
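On the 3.5 versus 3.6 question: of the constructs quoted above, function
annotations also work on Python 3.5, while f-strings were only added in 3.6.
A sketch of the last handler written so that it runs on 3.5 (this says
nothing about whether apistar itself installs or works on 3.5):

```python
def echo_username(username: str) -> dict:
    # str.format() instead of an f-string keeps this valid on Python 3.5.
    return {'message': 'Welcome, {}!'.format(username)}
```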
> > 
> > ** Summary
> > 
> > I don't think the REST frameworks offer enough benefit to justify their 
> > use and adopting WSGI as our path forward.
> 
> Yesterday SpamapS mentioned wanting to be able to grow the Zuul
> community. Just based on looking at the choices OpenStack is making
> (moving TO wsgi) and the general popularity of Flask in the python
> community I think that you may want to consider both wsgi and flask
> simply because they are tools that are known to scale reasonably well
> and many people are familiar with them.
> 
> > 
> > ** Thoughts on RPC Bus **
> > 
> > gearman is a simple way to add RPC calls between an API tier and the 
> > scheduler. However, we got rid of gear from nodepool already, and we 
> > intend on getting rid of gearman in v4 anyway.
> > 
> > If we use zk, we'll have to do a little bit more thinking about how to 
> > do the RPC calls which will make this take more work. BUT - it means we 
> > can define one API that covers both Zuul and Nodepool and will be 
> > forward compatible with a v4 no-gearman world.
> > 
> > We *could* use gearman in zuul and run an API in-process in nodepool. 
> > Then we could take a page out of early Nova and do a proxy-layer in zuul 
> > that makes requests of nodepool's API.
> > 
> > We could just assume that there's gonna be an Apache fronting this stuff 
> > and suggest deployment with routing to zuul and nodepool apis with 
> > mod_proxy rules.
> > 
> > Finally, as clarkb pointed out in response to the ingestors spec, we 
> > could introduce MQTT and use it. I'm wary of doing that for this because 
> > it introduces a totally required new tech stack at a late stage.
> 
> Mostly I was just pointing out that I think the vast majority of the
> infrastructure work to have something like a zuul ingestor is done. You
> just have to read from an mqtt connection instead of a gerrit ssh
> connection. Granted this does require running more services (mqtt server
> and the event stream handler) and doesn't handle entities like Github.
> 
> That said MQTT unlike Gearman seems to be seeing quite a bit of
> development activity due to the popularity of IoT. Gearman has worked
> reasonably well for us though so I don't think we need to just replace
> it to get in on the IoT bandwagon.
> 
> > 
> > Since we're starting fresh, I like the idea of a single API service that 
> > RPCs to zuul and nodepool, so I like the idea of using ZK for the RPC 
> > layer. BUT - using gear and adding just gear worker threads back to 
> > nodepool wouldn't be super-terrible maybe.
> 
> Nodepool hasn't had a Gearman-less release yet so you don't have to
> worry about backward compat at least.
> 
> > 
> > ** Final Summary **
> > 
> > As I tl;dr'd earlier, I think aiohttp co-located with the scale-out 
> > websocket tier talking to the scheduler over zk is the best bet for us. 
> > I think it's both simple enough to adopt and gets us a rich set of 
> > features. It also lets us implement in-process simple health endpoints 
> > on each service with the same tech stack.
> 
> I'm wary of this simply because it looks a lot like repeating
> OpenStack's (now failed) decision to stick web servers in a bunch of
> python processes then do cooperative multithreading with them along with
> all your application logic. It just gets complicated. I also think this
> underestimates the value of using tools people are familiar with (wsgi
> and flask) particularly if making it easy to jump in and building
> community is a goal.
> 
> Clark
> 

[OpenStack-Infra] On the subject of HTTP interfaces and Zuul