[openstack-dev] [oslo] Add a new aiogreen executor for Oslo Messaging

Mike Bayer mbayer at redhat.com
Mon Nov 24 00:21:01 UTC 2014

> On Nov 23, 2014, at 6:13 PM, Robert Collins <robertc at robertcollins.net> wrote:
> So - the technical bits of the plan sound fine.

> On WSGI - if we're in an asyncio world,

*looks around*, we are?   when did that happen?   Assuming we’re talking about explicit async, that means rewriting all our code as verbose, “inside out” code, accepting vast library incompatibility, and… chasing some notion of “correctness” that is somehow supposed to be appropriate for a high level scripting language and can’t be achieved through simple, automated means such as gevent.

> I don't think WSGI has any
> relevance today -

if you want async + WSGI, use gevent.wsgi.   It is of course not explicit async, but if the whole world decides that we all have to explicitly turn all of our code inside out to appease the concept of “oh no, IO IS ABOUT TO HAPPEN! ARE WE READY!”, I am definitely quitting programming to become a cheese maker.   If you’re writing some high performance TCP server thing, fine (…but why are you writing a high performance server in Python and not something more appropriate like Go?).   If we’re dealing with message queues, as I know this thread is, fine.

But if you’re writing “receive a request, load some data, change some of it around, store it again, and return a result”, I don’t see why this has to be intentionally complicated.   Use implicit async that can interact appropriately with the explicit async messaging stuff.   That’s purportedly one of the goals of asyncio (which Nick Coghlan had to lobby pretty hard for; source: http://python-notes.curiousefficiency.org/en/latest/pep_ideas/async_programming.html#gevent-and-pep-3156 ).
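
To make the "inside out" point concrete, here is a minimal sketch of that same request cycle written twice: once as plain code with no IO markers, and once in explicit-async style. Everything in it is hypothetical (the in-memory STORE, the function names), and it uses the async/await syntax, which postdates this message; the asyncio.sleep(0) calls merely stand in for real non-blocking IO.

```python
import asyncio

# Hypothetical in-memory "database"; a stand-in for real IO.
STORE = {"42": {"name": "widget", "count": 1}}

# Plain style: nothing in the code says where IO happens.
def load(key):
    return dict(STORE[key])

def save(key, record):
    STORE[key] = record

def handle_request(key):
    record = load(key)
    record["count"] += 1
    save(key, record)
    return record["count"]

# Explicit-async style: every function on the call path that might
# touch IO has to be declared async and awaited by its caller.
async def aload(key):
    await asyncio.sleep(0)  # stands in for a non-blocking read
    return dict(STORE[key])

async def asave(key, record):
    await asyncio.sleep(0)  # stands in for a non-blocking write
    STORE[key] = record

async def ahandle_request(key):
    record = await aload(key)
    record["count"] += 1
    await asave(key, record)
    return record["count"]

sync_result = handle_request("42")
async_result = asyncio.run(ahandle_request("42"))
```

The business logic is identical in both versions; the only difference is the async/await bookkeeping, which has to be threaded through every caller.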

> it has no async programming model.

neither do a *lot* of things, including all traditional ORMs.    I’m fine with Ceilometer dropping SQLAlchemy support as they prefer MongoDB and their relational database code is fairly wanting.   Per http://aiogreen.readthedocs.org/openstack.html, I’m not sure how else they will drop eventlet support throughout the entire app.   

> While it has
> incremental apis and supports generators, that's not close enough to
> the same thing: so we're going to have to port our glue code to
> whatever container we end up with. As you know I'm pushing on a revamp
> of WSGI right now, and I'd be delighted to help put together a
> WSGI-for-asyncio PEP, but I think it's best thought of as a separate
> thing to WSGI per se.

given the push for explicit async, seems like lots of effort will need to be spent on this. 

> It might be a profile of WSGI2 though, since
> there is quite some interest in truly async models.
> However I've a bigger picture concern. OpenStack only relatively
> recently switched away from an explicit async model (Twisted) to
> eventlet.

hooray.   efficient database access for explicit async code would be impossible otherwise, as there are no explicit async APIs to MySQL, and only one for PostgreSQL, which is extremely difficult to support.

> I'm worried that this is switching back to something we switched away
> from (in that Twisted and asyncio have much more in common than either
> Twisted and eventlet w/magic, or asyncio and eventlet w/magic).

In the C programming world, when you want to do something as simple as create a list of records, it’s not so simple: you have to explicitly allocate memory using malloc(), and organize your program skillfully and carefully such that this memory is ultimately freed using free().   It’s tedious and error prone.   So in the scripting language world, these tedious, low level and entirely predictable steps are automated away for us; memory is allocated automatically, and freed automatically.   Even reference cycles are cleaned out for us without us even being aware.   This is why we use “scripting languages” - they are intentionally automated to speed the pace of development and produce code that is far less verbose than low-level C code and much less prone to low-level errors, albeit considerably less efficient.   It’s the tradeoff we make; predictable bookkeeping of the system’s resources is automated away.   There’s a price: the Python interpreter uses a ton of memory, uses it up pretty quickly, and tends not to free it once large chunks have been used by the application.   However, this tradeoff, Python’s clearly inefficient use of memory because it’s automating the management of it away for us, is one which nobody seems to mind at all.
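
As a small illustration of that automated bookkeeping, the sketch below uses only the standard library to show an object being freed as soon as its last reference goes away, and a reference cycle being reclaimed by the collector. The Record class is hypothetical, and the exact timing shown is CPython behavior (reference counting plus a cycle collector); other interpreters may free things later.

```python
import gc
import weakref

class Record:
    """Hypothetical record type; instances can point at each other."""
    def __init__(self, name):
        self.name = name
        self.peer = None

# Case 1: no cycle. CPython frees the object when the last reference
# is dropped; nobody calls free().
solo = Record("solo")
solo_ref = weakref.ref(solo)
del solo
freed_by_refcount = solo_ref() is None

# Case 2: a reference cycle. Reference counting alone cannot free it,
# so the cycle collector does.
def make_cycle():
    a = Record("a")
    b = Record("b")
    a.peer = b
    b.peer = a
    return weakref.ref(a)

gc.disable()                       # keep collection timing deterministic
cycle_ref = make_cycle()
alive_before_collect = cycle_ref() is not None
gc.collect()                       # normally runs on its own, unasked
alive_after_collect = cycle_ref() is not None
gc.enable()
```

In ordinary code none of the gc calls would be written at all; they appear here only so the two states can be observed.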

But when it comes to IO, the implicit allocation of IO and deferment of execution done by gevent has no side effect anywhere near as harmful as the Python interpreter’s huge memory consumption.   Yet we are so afraid of it, so frightened that our code…written in a *high level scripting language*, might not be “correct”.   We might not know that IO is about to happen!   How is this different from the much more tangible, day-to-day issue that we might not know this data structure is taking up a crapload of memory, and taking a ton of time to allocate and free?

Given that, I’ve yet to understand why a system that implicitly defers CPU use when a routine encounters IO, deferring to other routines, is relegated to the realm of “magic”.   Is Python reference counting and garbage collection “magic”?   How can I be sure that my program is allocating only as much memory as I expect, and freeing it only when I absolutely say so, which is the kind of certainty async advocates seem to want about IO?   Why would a high level scripting language enforce this level of low-level bookkeeping of IO calls as explicit, when it is 100% predictable and automatable?
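
The kind of implicit deferral described here can be sketched with nothing but the standard library. Note the assumption: OS threads stand in for gevent's greenlets so the example has no third-party dependency, and threads switch preemptively where gevent switches cooperatively at IO. What carries over is that neither routine marks where it may block; the runtime hands control to the other routine whenever one has to wait. The producer and consumer functions are hypothetical.

```python
import queue
import threading

def producer(q, items):
    # No explicit yield: q.put() may block on the full queue, and the
    # runtime runs the consumer in the meantime.
    for item in items:
        q.put(item)
    q.put(None)  # sentinel: no more items

def consumer(q, out):
    # q.get() blocks until data arrives; the producer runs meanwhile.
    while True:
        item = q.get()
        if item is None:
            break
        out.append(item * 2)

q = queue.Queue(maxsize=1)  # tiny buffer forces each side to wait on the other
results = []
workers = [
    threading.Thread(target=consumer, args=(q, results)),
    threading.Thread(target=producer, args=(q, [1, 2, 3])),
]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Both functions read as straight-line blocking code, yet the program finishes, which is only possible because waiting in one routine implicitly lets the other proceed.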

I often wonder if the appeal of explicit async IO is partially driven by misunderstandings of how computers work.   Here’s a reddit commenter just today, who thinks that because Flask doesn’t use explicit async, it therefore “cannot use multiple cores” (clearly incorrect) and therefore “database access will be 4-8 times slower” (http://www.reddit.com/r/Python/comments/2n4tes/does_sqlalchemy_scale_well_with_increased_web/cmalj3q).   Every time I look for arguments in favor of explicit async, this is what I find.   I’ve yet to find an actual argument other than “gevent is terrible magic!!”: nobody says what is so terrible about the “incorrectness” of using a tool like gevent to manage IO/task deferment automatically, or why other forms of “magic!” like automatic garbage collection are so taken for granted, when their real-world effects are actually much worse.

> If Twisted was unacceptable to the community, what makes asyncio
> acceptable?

it’s new and modern, and is pushed in a PEP that Guido is very interested in.   The rumor mill also grumbles that the sudden popularity of node.js was a factor in this change of direction.    As long as it integrates with code that is fundamentally reliant upon implicit IO (e.g. gevent / eventlet / other tie in), I am fine with it.

> [Note, I don't really understand why Twisted was moved
> away from, since our problem domain is such a great fit for reactor
> style programming - lots of networking, lots of calling of processes
> that may take some time to complete their work, and occasional DB
> calls [which are equally problematic in eventlet and in
> asyncio/Twisted]. So I'm not arguing against the move, I'm just
> concerned that doing it without addressing whatever the underlying
> thing was, will fail - and I'm also concerned that it will surprise
> folk - since there doesn't seem to be a cross project blueprint
> talking about this fairly fundamental shift in programming model.
> -Rob
> -- 
> Robert Collins <rbtcollins at hp.com>
> Distinguished Technologist
> HP Converged Cloud
