[openstack-dev] Kilo Cycle Goals Exercise
Mike Bayer
mbayer at redhat.com
Mon Sep 8 15:48:33 UTC 2014
On Sep 7, 2014, at 8:14 PM, Monty Taylor <mordred at inaugust.com> wrote:
>
>
> 2. Less features, more win
>
> In a perfect world, I'd argue that we should merge exactly zero new features in all of kilo, and instead focus on making the ones we have work well. Some of making the ones we have work well may wind up feeling just like writing features, as I imagine some of our features are probably only half features in the first place.
>
> 3. Deleting things
>
> We should delete a bunch of code. Deleting code is fun, and it makes your life better, because it means you have less code. So we should start doing it. In particular, we should look for places where we wrote something as part of OpenStack because the python community did not have a thing already, but now there is one. In those cases, we should delete ours and use theirs. Or we should contribute to theirs if it's not quite good enough yet. Or we should figure out how to make more of the oslo libraries things that can truly target non-OpenStack things.
>
I have to agree that “Deleting things” is the best, best thing. Any time you can refactor and delete more code, a weight is lifted: your code becomes easier to understand, maintain, and expand upon. Simpler code then gives way to refactorings that you couldn’t even see earlier, and sometimes you even get a big performance boost once a bunch of supporting code reveals itself to be superfluous. This is most critical for OpenStack, because OpenStack is written in Python, and for as long as we have to stay on the CPython interpreter, how slow something is tends to be directly proportional to how many function calls it makes. Function calls are enormously expensive in Python.
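To make that concrete, here is a toy benchmark of my own (not from any OpenStack code) that times the same arithmetic inline versus routed through a function call on each iteration; on CPython the difference comes almost entirely from call overhead:

    import timeit

    def add_inline(n):
        total = 0
        for i in range(n):
            total += i          # work done inline
        return total

    def _add(total, i):
        return total + i

    def add_via_calls(n):
        total = 0
        for i in range(n):
            total = _add(total, i)   # same work behind a function call
        return total

    setup = "from __main__ import add_inline, add_via_calls"
    print(timeit.timeit("add_inline(100000)", setup=setup, number=100))
    print(timeit.timeit("add_via_calls(100000)", setup=setup, number=100))

The second number comes out substantially larger, for doing exactly the same work.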
Something that helps greatly with the goal of “Deleting things” is to reduce dependencies between systems. In SQLAlchemy, the kind of change I’m usually striving for is one where we take a module that does one Main Thing, but has a bunch of code spread throughout it to do some Other Thing that is really much less important, yet complicates the Main Thing. What we do is reorganize the crap out of it: get the Other Thing out of the core Main Thing, move it to a totally optional “extension” module that bothers no one, and essentially forget about it because nobody ever uses it (examples include http://docs.sqlalchemy.org/en/rel_0_9/changelog/migration_08.html#instrumentationmanager-and-alternate-class-instrumentation-is-now-an-extension, http://docs.sqlalchemy.org/en/rel_0_9/changelog/migration_08.html#mutabletype). When we make these kinds of changes, major performance enhancements come right in - the Main Thing no longer has to worry about the switches and left turns introduced by the Other Thing, and tons of superfluous logic can be thrown away. SQLAlchemy’s architecture gains from these kinds of changes with every major release, and 1.0 is no exception.
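As a minimal sketch of that shape of refactoring (every name below is invented for illustration; this is not SQLAlchemy’s actual API), the Main Thing checks once for an opt-in extension object instead of carrying the Other Thing’s branching throughout its hot path:

    # Hypothetical sketch: the rarely-used Other Thing lives in an
    # opt-in extension object, off the common code path entirely.

    class Query(object):
        def __init__(self, extension=None):
            # None in the overwhelmingly common case
            self._extension = extension

        def execute(self, statement):
            if self._extension is not None:
                statement = self._extension.before_execute(statement)
            return self._run(statement)

        def _run(self, statement):
            return "result of %s" % statement

    class AuditExtension(object):
        """The Other Thing, now packaged separately."""
        def before_execute(self, statement):
            print("auditing: %s" % statement)
            return statement

    Query().execute("SELECT 1")                    # fast path, no hooks
    Query(AuditExtension()).execute("SELECT 1")    # opt-in behavior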
This is not quite the same as “Deleting things”, but it has more or less the same effect: you isolate code that everyone uses from code that only some people occasionally use. In SQLAlchemy specifically, we have the issue of individual database dialects that are still packaged along with the core; e.g. there is sqlalchemy.dialects.mysql, sqlalchemy.dialects.postgresql, etc. However, a few years back I went through a lot of effort to modernize the system by which users can provide their own database backends; not only can you provide your own custom backend using setuptools entry points, but I also made a major reorganization of SQLAlchemy’s test suite to produce the “dialect test suite”, so that when you write your custom dialect, you can run a large, pre-fabricated test suite out of SQLAlchemy’s core against your dialect, without the need for your dialect to actually be *in* SQLAlchemy. There were many wins from this system, including that it forced me to write lots of tests that were very well focused on testing specifically what a dialect needs to do, in isolation from anything SQLAlchemy itself needs to do. It enabled a whole batch of new third-party dialects, such as those for Amazon Redshift, FoundationDB, and MonetDB, and it was a huge boon to IBM’s DB2 driver, which I helped to get onto the new system. Since then I’ve been able to go into SQLAlchemy and dump out lots of old dialects, like MS Access, Informix, MaxDB, and Drizzle, that are much better off maintained separately, at a different level of velocity and hopefully by individual contributors who are interested in them. Having all these dialects in one big codebase only served as a weight on the project, and theoretically it wouldn’t be a bad idea for SQLA to have *all* dialects as separate projects, but we’re not there yet.
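For the entry point mechanics, a sketch of an out-of-tree dialect’s setup.py follows; SQLAlchemy looks dialects up in the “sqlalchemy.dialects” entry point group, though the project and module names here are invented placeholders:

    # setup.py for a hypothetical out-of-tree dialect
    from setuptools import setup

    setup(
        name="sqlalchemy-mydb",
        version="0.1",
        packages=["sqlalchemy_mydb"],
        entry_points={
            "sqlalchemy.dialects": [
                # create_engine("mydb://...") now resolves to this class
                "mydb = sqlalchemy_mydb.dialect:MyDBDialect",
            ]
        },
    )

The dialect test suite is then pulled into the third-party project’s own tests, roughly by importing everything from sqlalchemy.testing.suite, so SQLAlchemy’s canned dialect tests run against the external dialect in its own repository.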
The only reason I’m rambling on about SQLAlchemy’s Core/Dialect dichotomy is that I was very much *reminded* of it by the thread regarding Nova and the various “virt” drivers. I know nothing about that issue, as I am totally new to OpenStack itself; but I was surprised that it has an architecture where any number of drivers that only a small percentage of the user base cares about are all packaged into one giant codebase. If the rationale is that the API these drivers speak to is just too unstable, and changes too frequently, such that all the consuming drivers need to be right there so they can all be changed en masse, that sounds like a major architectural issue to address. But I say this as a total outsider to that process; I don’t know if this issue is addressable in Kilo, or if it’s just not time yet.
It’s hard for me to speak on goals for Kilo overall as I just got here, and so far my goal has been just to improve the experience that all OpenStack developers have with SQLAlchemy. From my POV, I’d like everyone working on code to get a little more comfortable with real profiling of perceived speed issues. On multiple fronts (at least three distinct conversations with different groups), I’ve had to deal with the performance of various components being evaluated by testing very isolated bits of database logic. This happens because it’s very easy to write a three-line snippet of query code that illustrates a particular query running differently in different scenarios. But you aren’t going to get a picture of where that small measurement fits in unless you really profile a full use case from end to end. The MySQL query that runs 2x as fast as an ORM query might not matter much if it accounts for only 5% of the actual overhead (see http://paste.openstack.org/show/104991/ for an example of this).
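A sketch of what I mean by end-to-end profiling follows; full_use_case here is just a stand-in, where the real thing would be a whole API request or task with the database access buried inside it:

    import cProfile
    import pstats

    def full_use_case():
        # Stand-in for a complete operation; the point is to profile
        # the entire path, not a three-line query snippet in isolation.
        return sum(i * i for i in range(200000))

    profiler = cProfile.Profile()
    profiler.enable()
    full_use_case()
    profiler.disable()

    # Show where the time actually goes, as a share of the whole
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)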
I use the Python profiling tools quite extensively, and while I understand that OpenStack makes this challenging due to the use of eventlet, we should be building comprehensive ways to really measure performance at a macro scale. I understand that there are tools under development which seek to show this, such as Rally https://github.com/stackforge/rally and OSProfiler https://github.com/stackforge/osprofiler. I haven’t worked with these tools, but if one of the goals of Kilo could be to help close the gap between the perceived performance of an operation and the ability to acquire actual measurements at the Python function-call level across OpenStack projects, that would be enormously helpful. In my world, trying to improve the speed of something does not begin until I can get a RunSnake profile of the entire operation (see http://techspot.zzzeek.org/2010/12/12/a-tale-of-three-profiles/ for an example). Dramatic pro-performance decisions should absolutely be made, and they can be made with great confidence if you’ve done a real profile of the performance situation as a whole.
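Getting such a profile in front of RunSnakeRun is nearly a one-liner; assuming RunSnakeRun is installed, the stats file written below can be opened with its “runsnake” command (whole_operation is a placeholder for the real code path):

    import cProfile

    def whole_operation():
        # Placeholder for the real end-to-end operation under study
        return sum(range(1000000))

    # Writes raw pstats data to a file; then, from a shell:
    #     runsnake whole_operation.prof
    cProfile.run("whole_operation()", "whole_operation.prof")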