On 2/9/21 6:59 PM, Dan Smith wrote:
This seemed like a good time to finally revisit https://review.opendev.org/c/openstack/devstack/+/676016 (the OSC-as-a-service patch). It turned out not to be as much work to reimplement as I had expected, and hopefully this version addresses the concerns with the old one.
In my local env it takes about 3:45 off my devstack run. That's not a huge amount by itself, but multiplied across thousands of CI jobs it could be significant.
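For a sense of scale, here is a back-of-the-envelope sketch; the per-day job count is an assumption for illustration, not a number from this thread:

```python
# 3:45 saved per devstack run, from the figure above.
saved_per_run_s = 3 * 60 + 45           # 225 seconds
jobs_per_day = 2000                     # hypothetical CI volume, for illustration
node_hours_saved = saved_per_run_s * jobs_per_day / 3600
print(node_hours_saved)                 # 125.0 node-hours per day at that volume
```

At that (assumed) volume, a few minutes per run adds up to triple-digit node-hours every day.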
I messed with doing this myself; I wish I had seen yours first. I never got mine stable enough to consider usable because of how many places in devstack we rely on the return code of an osc command. I could get it to work trivially, but re-stacks and other behaviors weren't quite right. It looks like your version handles that properly?
It seems to. I had an issue at one point because I wasn't shutting down the systemd service during unstack, but I haven't seen any problems since fixing that. I've done quite a few devstack runs on the same node with no failures.
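The exit-code concern above is the crux of a client-as-a-service design. A minimal sketch of the idea (not the actual patch; the socket path and wire format are made up for illustration) is a long-lived server that handles each request and sends back both the output and the exit code, so the shell caller's $? still reflects the real result:

```python
# Illustrative sketch of an "OSC as a service" shim: a persistent server
# avoids per-command Python startup cost, while the thin client preserves
# stdout and the exit code for shell callers. Paths/format are hypothetical.
import json
import os
import socket
import sys

SOCK_PATH = "/tmp/osc-service.sock"  # hypothetical socket path

def serve(handler, sock_path=SOCK_PATH):
    """Accept one JSON request per connection; reply with output and rc."""
    if os.path.exists(sock_path):
        os.unlink(sock_path)
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(sock_path)
    srv.listen(8)
    while True:
        conn, _ = srv.accept()
        with conn, conn.makefile("rwb") as f:
            args = json.loads(f.readline())
            out, rc = handler(args)  # would dispatch into the client in-process
            f.write(json.dumps({"out": out, "rc": rc}).encode() + b"\n")
            f.flush()

def request(args, sock_path=SOCK_PATH):
    """Send argv to the service; return (stdout_text, exit_code)."""
    c = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    c.connect(sock_path)
    with c, c.makefile("rwb") as f:
        f.write(json.dumps(args).encode() + b"\n")
        f.flush()
        reply = json.loads(f.readline())
    return reply["out"], reply["rc"]

if __name__ == "__main__":
    out, rc = request(sys.argv[1:])
    sys.stdout.write(out)
    sys.exit(rc)  # the caller's $? matches the real command's exit code
```

The sys.exit(rc) at the end is the part devstack depends on: every `if openstack ...` check in the scripts keeps working unchanged.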
Anyway, I moved on to a full parallelization of devstack, which largely lets me run all the non-dependent osc commands in parallel, along with all kinds of other work (db syncs, various project setup). So far that effort is giving me about a 60% performance improvement over baseline, and I can do a minimal stack on my local machine in about five minutes.
Ah, that's nice. The speedup from the parallel execution series alone was pretty comparable to just the client service in my (old and slow) env.
I think we've largely got agreement to get that merged at this point, and as you say, it will definitely make some significant improvement purely because of how many times we run devstack in a day. If your OaaS can support parallel requests, I'd definitely be interested in pursuing that on top, although I think I've largely squeezed out the startup delay we see when we run something like eight osc instances in parallel during keystone setup :)
Surprisingly, it does seem to work. I suspect it serializes the handling of concurrent client calls, but it works and is still faster than the parallel patch alone (again, in my env). The client service took about a minute off the parallel runtime. Here's the timing I see locally (in seconds):

Vanilla devstack: 775
Client service alone: 529
Parallel execution: 527
Parallel + client service: 465

Most of the difference between the last two is shorter async_wait times, because the deployment steps themselves take less time. So not quite as much of a win as before, but still a decent speedup.
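Computed from the numbers quoted above, each variant's wall-clock reduction relative to the vanilla run:

```python
# Timings (seconds) from the message above; compute reduction vs vanilla.
timings = {"vanilla": 775, "client service": 529,
           "parallel": 527, "parallel + client service": 465}
base = timings["vanilla"]
for name, secs in timings.items():
    pct = 100 * (base - secs) / base
    print(f"{name}: {secs}s ({pct:.0f}% less wall-clock than vanilla)")
```

That works out to roughly 32%, 32%, and 40% reductions respectively, with the client service and the parallel series each contributing, and the combination doing best.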