[OpenStack-Infra] On the subject of HTTP interfaces and Zuul

Monty Taylor mordred at inaugust.com
Fri Jun 9 19:12:39 UTC 2017


On 06/09/2017 12:35 PM, Clint Byrum wrote:
> Excerpts from Monty Taylor's message of 2017-06-09 11:22:03 -0500:
>> Hey all!
>>
>> Tristan has recently pushed up some patches related to providing a Web
>> Dashboard for Zuul. We have a web app for nodepool. We already have the
>> Github webhook receiver which is inbound http. There have been folks who
>> have expressed interest in adding active-REST abilities for performing
>> actions. AND we have the new websocket-based log streaming.
>>
>> We're currently using Paste for HTTP serving (which is effectively
>> dead), autobahn for websockets and WebOB for request/response processing.
>>
>> This means that before we get too far down the road, it's probably time
>> to pick how we're going to do those things in general. There are 2
>> questions on the table:
>>
>> * HTTP serving
>> * REST framework
>>
>> They may or may not be related, and one of the options on the table
>> implies an answer for both. I'm going to start with the answer I think
>> we should pick:
>>
>> *** tl;dr ***
>>
>> We should use aiohttp with no extra REST framework.
>>
> 
> +1 for this. More inline..
> 
>> Meaning:
>>
>> - aiohttp serving REST and websocket streaming in a scale-out tier
>> - talking RPC to the scheduler over gear or zk
>> - possible in-process aiohttp endpoints for k8s style health endpoints
>>
> 
> I think it's worth also discussing REST and gRPC for .. well.. RPC.

++

>> Since we're talking about a web scale-out tier, I think we should just
>> have a single web tier for zuul and nodepool. This continues the thinking
>> that nodepool is a component of Zuul.
>>
>> In order to write zuul jobs, end-users must know what node labels are
>> available. A zuul client that says "please get me a list of available
>> node labels" could make sense to a user. As we get more non-OpenStack
>> users, those people may not have any concept that there is a separate
>> thing called "nodepool".
>>
> 
> I have two conflicting desires: I want Zuul to be as useful on its own
> as possible, but I also don't want it to grow too rigid and hard to
> adapt to new environments. I think infra has
> done a great job of loosely coupling nodepool and zuul without going
> ultra-pedantic about the interfaces, and that helps me satisfy both of
> these conflicting desires to some extent. I think for now, this status
> quo should continue and the thin line should still be respected.
> 
> All of that said, if there was one web tier that had a /nodepool and a
> /zuul, I'd be fine with that.

I can live with that, at least for now. However, I really see nodepool 
as an implementation detail of zuul, not a first-class citizen. We talk 
about "running zuul", and I imagine BonnyCI would expose the "Zuul" API 
to its users so that they could use Zuul API clients to talk to it.

So the /nodepool endpoint just feels - 'icky' to me.

I may be totally wrong on that though.

Maybe if there were no /zuul or /nodepool but instead there were things like:

/nodes
/status
/jobs

etc. - and /nodes was the nodepool API served by nodepool (potentially 
via an API proxy) while the others were all served by Zuul - would that 
be less icky feeling?
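
Just to make that concrete, here's a rough (totally untested) sketch of
what a single aiohttp app doing that might look like. The handler bodies,
route names and the nodepool URL are all made up for illustration:

    import aiohttp
    from aiohttp import web

    # made-up internal nodepool webapp endpoint
    NODEPOOL_API = 'http://nodepool.example.com:8005'

    async def status(request):
        # in reality this would come from the scheduler over gear/zk
        return web.json_response({'pipelines': []})

    async def jobs(request):
        return web.json_response({'jobs': []})

    async def nodes(request):
        # proxy through to nodepool (or read the info out of zk directly)
        async with aiohttp.ClientSession() as session:
            async with session.get(NODEPOOL_API + '/node-list') as resp:
                return web.json_response(await resp.json())

    app = web.Application()
    app.router.add_get('/status', status)
    app.router.add_get('/jobs', jobs)
    app.router.add_get('/nodes', nodes)
    web.run_app(app, port=9000)

So one process, one public API, and whether /nodes is proxied or served 
natively stays an implementation detail.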

>> ** Thoughts on RPC Bus **
>>
>> gearman is a simple way to add RPC calls between an API tier and the
>> scheduler. However, we got rid of gear from nodepool already, and we
>> intend on getting rid of gearman in v4 anyway.
>>
> 
> The devil is in the details here IMO.
> 
> We kind of abused gearman to do RPC in nodepool anyway. In particular,
> we were asking gearman about status to make scheduling decisions, which
> is actually a fairly expensive thing and not something gearman was ever
> designed to do.
> 
> ZK is specifically designed to coordinate processes, which is exactly
> what we're using it for in this case.
> 
>> If we use zk, we'll have to do a little bit more thinking about how to
>> do the RPC calls which will make this take more work. BUT - it means we
>> can define one API that covers both Zuul and Nodepool and will be
>> forward compatible with a v4 no-gearman world.
>>
> 
> I don't know if ZK is a great choice for the RPCs we know we have:
> 
> * Github webhooks are async event streams, which will needlessly
>    overtax a synchronous HA ZK cluster. Gearman is built to do this one
>    well.

Yah - I think github webhooks are their own beast, but one that's well 
suited to being handled in a few different ways. I don't think we should 
necessarily write zk off here - but I may be wrong, and I'm ok with 
that. :)
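
For instance (totally hand-wavy, the zk path and route here are made up),
the receiver could just drop the event onto a zk queue and return 200
immediately, letting the scheduler drain it at its own pace:

    import json
    from aiohttp import web
    from kazoo.client import KazooClient

    zk = KazooClient(hosts='127.0.0.1:2181')
    zk.start()

    async def github_webhook(request):
        payload = await request.read()
        event = json.dumps({
            'headers': dict(request.headers),
            'body': payload.decode('utf-8'),
        }).encode('utf-8')
        # a sequence node per event gives us a simple FIFO; note kazoo is
        # synchronous, so a real version would push this off the event loop
        zk.create('/zuul/events/github/event-', event,
                  sequence=True, makepath=True)
        return web.Response(status=200)

    app = web.Application()
    app.router.add_post('/webhook/github', github_webhook)
    web.run_app(app, port=9001)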

> * status.json requests would be perfect if we had a ZK already under the
>    scheduler (zuulv4). But that's a long way off.

Yup.

> * log streaming is already basically working as a websocket HTTP
>    endpoint that can be hit directly and is stateless and thus allows
>    simple loadbalancing, right? I don't feel strongly compelled to align
>    other efforts with it since it is somewhat unique in its scope.

Yah - I don't think it has to be hard-aligned; it's more that if we 
start using aiohttp for other things, it doesn't seem great to keep the 
autobahn dependency around just for that one thing.
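
The websocket side of aiohttp is pretty small, for what it's worth.
Roughly (untested; console_stream and the little handshake protocol are
stand-ins for however we actually get at the build's log stream):

    import asyncio
    from aiohttp import web

    async def console_stream(build_uuid):
        # placeholder: really this would read from the log streaming
        # daemon on the executor for the given build
        for i in range(3):
            yield 'fake log line %d for %s' % (i, build_uuid)
            await asyncio.sleep(1)

    async def stream_log(request):
        ws = web.WebSocketResponse()
        await ws.prepare(request)
        # first client message says which build to stream
        msg = await ws.receive_json()
        async for line in console_stream(msg['uuid']):
            await ws.send_str(line)
        await ws.close()
        return ws

    app = web.Application()
    app.router.add_get('/console-stream', stream_log)
    web.run_app(app, port=9002)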

>> We *could* use gearman in zuul and run an API in-process in nodepool.
>> Then we could take a page out of early Nova and do a proxy-layer in zuul
>> that makes requests of nodepool's API.
>>
>> We could just assume that there's gonna be an Apache fronting this stuff
>> and suggest deployment with routing to zuul and nodepool apis with
>> mod_proxy rules.
>>
>> Finally, as clarkb pointed out in response to the ingestors spec, we
>> could introduce MQTT and use it. I'm wary of doing that for this because
>> it introduces a totally required new tech stack at a late stage.
>>
>> Since we're starting fresh, I like the idea of a single API service that
>> RPCs to zuul and nodepool, so I like the idea of using ZK for the RPC
>> layer. BUT - using gear and adding just gear worker threads back to
>> nodepool wouldn't be super-terrible maybe.
>>
> 
> I'm inclined to ask why we don't just have a routing HTTP proxy that
> sends stuff directly to the backends of nodepool, log streaming, or
> zuul-scheduler. That proxy and the backends can all use aiohttp to serve
> responses directly and we can even consider those to be private API's
> between the routing proxy and the backends.

The reason I'm not immediately a fan of this is that it's another thing 
to run, which makes the simple small-scale version harder. Like, we're 
running a really big zuul right now and don't actually need a routing 
HTTP proxy.

If we do what jeblair says in the followup and have a web tier that 
talks gearman or zk as needed to the various backends, then it's no 
harder or more complex to run than the current system (since websocket 
streaming is necessarily its own service anyway).

The only two real differences would be whether we keep the status.json 
endpoint directly on the scheduler and what we do about nodepool.

Nodepool would be simple to include, because the info is already in zk 
(that's all the nodepool webapp is doing anyway).
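
That part is roughly just this (zk layout from memory, so the path may
not be exact):

    import json
    from kazoo.client import KazooClient

    zk = KazooClient(hosts='127.0.0.1:2181')
    zk.start()

    def node_list():
        nodes = []
        # nodepool launchers record each node under /nodepool/nodes/<id>
        for node_id in zk.get_children('/nodepool/nodes'):
            data, _stat = zk.get('/nodepool/nodes/%s' % node_id)
            if data:
                nodes.append(json.loads(data.decode('utf-8')))
        return nodes

    print(json.dumps(node_list(), indent=2))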

For status.json, if we keep it separate, we can have the web tier 
request the status JSON over gearman, but with coalescing. That would 
reduce the number of CPU cycles the scheduler has to spend serving web 
requests. So the cost there is an additional gearman function, which 
shouldn't be much overhead.
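
On the web tier side that's something like the below - the function name
is made up, since we'd have to add it to the scheduler's gearman
listener:

    import json
    import time

    import gear

    client = gear.Client('zuul-web')
    client.addServer('127.0.0.1', 4730)
    client.waitForServer()

    def get_status(tenant):
        # hypothetical gearman function registered by the scheduler;
        # reusing the same unique id lets gearman coalesce concurrent
        # requests for the same tenant's status
        job = gear.Job(b'zuul:status_get',
                       json.dumps({'tenant': tenant}).encode('utf-8'),
                       unique=tenant.encode('utf-8'))
        client.submitJob(job)
        while not job.complete:
            time.sleep(0.1)
        return json.loads(job.data[0].decode('utf-8'))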

The informational Web dashboard TristanC wrote depends on the mysql 
reporter anyway, so there's no RPC there.

Active REST calls that tell the scheduler to do things, like 
"promote 12345", necessarily need to talk to the scheduler. But maybe 
your earlier point about discussing gRPC for RPC is how we should think 
about that use case?

I guess the question is - is there any downside to putting the things 
we know about into a single zuul-api web process, compared to keeping 
them where they are and adding a proxy tier?

> And for the one case of async event ingestion where we need to accept
> things quickly and not necessarily respond with the final result, which
> is github webhooks, we can address scale problems with gearman as it
> is already built into zuul by making a zuul-webhook-worker process
> to receive, if necessary persist, and then push the events into
> zuul-scheduler synchronously as fast as it will take them.
> 
> Funny story, there's a ready-made dynamic HTTP proxy application out
> there we could use if we were java-masochists:
> 
> https://github.com/Netflix/zuul
> 
> But in all seriousness, a straightforward zuul-api that just knows
> to map:
> 
> /zuul/status/ -> zuul-scheduler.example.com/status
> /zuul/log/review.o.o/989811 -> zuul-websocket.example.com/review.o.o/989811
> /nodepool/baz -> nodepool.example.com
> /zuul/webhook/f890ab934185 -> {gearman}webhook
> 
> Seems like a "now" solution that would work well for the scaled down/in
> version and scale up/out as much as zuul's current design allows.


