[placement][ptg] code refactoring constraints and goals
From the etherpad [1]
* We got a lot of value out of the ovo-ectomy and de-list. We're now at a stage where there is plenty more that could be done, but the choices are more subject to personal preference.
* How (and should) we further split and shrink objects/resource_provider.py

We got a lot of value out of the big refactoring that removed OVO and split objects/resource_provider.py into smaller files. It's also a stated goal that we should continue refactoring and continuously revisit existing code. However, we started to stumble because we had some disagreements on how to proceed. We should resolve those disagreements so we can proceed (he said, obviously).

Questions that have come up (plus other ones I can think of off the top of my head):

* Is it worth reordering methods to some kind of arbitrary ordering?
* If a method or module is "long" (by some definition) does that make it bad?
* Where is the balance between backport hurting churn and keeping the mix fresh?
* Should there be a more cleanly defined boundary between the methods that marshal data (the top-level methods in objects/*.py) and those that talk to the persistence layer (also in objects/*.py)?
* Do we need to define (better) or document the layers/boundaries in the system?
* How can we encourage a similar refactor/revisit-often attitude to the docs?

There are plenty of other refactoring-related thoughts too. Have at.

--
Chris Dent
٩◔̯◔۶  https://anticdent.org/
freenode: cdent  tw: @anticdent
On 4/8/19 10:43 AM, Chris Dent wrote:
* How (and should) we further split and shrink objects/resource_provider.py
To be specific here: objects/resource_provider.py is currently ~2KLOC. That's not *horrible* (see below), and if it really didn't lend itself to being further subdivided, I would say we shouldn't force it just for the sake of making that number smaller.

However, IMO, this file logically deals with several things:

* Resource provider inventory/capacity/usage
* Resource provider membership in aggregates
* Traits on resource providers
* Creating, updating, and deleting resource providers
* Finding resource providers in various ways

If you look at the structure of the file (in its current order - see below) these groupings are pretty clear.

Note the presence of "resource provider" in each bullet above. We already have separate modules for dealing with e.g. CRUD of traits in their own right. Though the mapping isn't 1:1 with the above list.

So we have the following options, I think:

(1) Do nothing. Leave objects/resource_provider.py as is.

(2) Move X-related stuff into the base X.py module, so like move things related to "Traits on resource providers" into objects/trait.py. This level of shuffle would probably entail things like creating a new module for aggregate.py

(3) Split the above into separate modules while preserving their resource_provider-ness. From the perspective of code consuming the modules, this would look like:

    from placement.objects import resource_provider as rp_crud
    from placement.objects.resource_provider import trait as rp_trait
    from placement.objects.resource_provider import search as rp_search
    from placement.objects import trait
    ...
    # Do CRUD-y stuff with resource providers
    rp = rp_crud.ResourceProvider(...)
    rp.create()

    # Do trait stuff
    t = trait.Trait(...)
    t.create()

    # Do trait-y stuff with resource providers
    rp_trait.add_traits_to_provider(rp, [t])

    # Do search-y stuff with resource providers
    rp_search.get_all_by_filters(...)

To me, this is nice and intuitive. The issue is the module hierarchy. As written above, it would entail putting the CRUD-y stuff into objects/resource_provider/__init__.py, which some folks don't like. I suppose we could have objects/resource_provider/crud.py, but that doesn't match the pattern for other things like objects/trait.py

Discuss.
* Is it worth reordering methods to some kind of arbitrary ordering?
My preference on this: keep related methods/classes/etc. near each other in the file.

Pros:
* Intuitive.
* Increases the probability that, when I'm looking at a method that calls another method, both methods will fit on my screen at the same time. (Workaround: multiple editor windows to the same file.)

Cons:
* Sometimes there's no sensible grouping, which makes this hard to define and therefore hard to follow (and hard to enforce, if that's a thing we want to do).
* Losing the ability to know exactly where to look to find something. (Workaround: Use a real IDE, where you can do a project-wide search for a symbol, hotkey from a symbol's use to its definition, etc. Also you can choose to view a module's skeleton in its real order *or* alphabetically.)
* If a method or module is "long" (by some definition) does that make it bad?
Not to the point where we should do something like implement sonar. Having had experience with that, it just ain't worth the hassle. However, there is such a thing as too long. Certain nova libvirt modules come to mind. For those of us who do choose to use real IDEs, these can take a long time to process (in-band syntax checking, symbol correlation highlighting, git blame, etc.). I'm going to go out on a limb and say we should strive really hard never to have a module in the 5-digit LOCs. And it would be really nice to stay under 2KLOC, but only if it doesn't feel forced.
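As a rough illustration of the kind of by-hand check this implies (not a proposal for gate tooling; the 2000-line budget and the `placement` path are only placeholders taken from this discussion), a minimal sketch:

    #!/usr/bin/env python3
    """List Python modules whose line count exceeds a budget."""
    import pathlib
    import sys

    BUDGET = 2000  # illustrative ~2KLOC threshold from the discussion


    def long_modules(root):
        for path in pathlib.Path(root).rglob('*.py'):
            with path.open(encoding='utf-8', errors='ignore') as handle:
                lines = sum(1 for _ in handle)
            if lines > BUDGET:
                yield lines, path


    if __name__ == '__main__':
        target = sys.argv[1] if len(sys.argv) > 1 else 'placement'
        for count, module in sorted(long_modules(target), reverse=True):
            print('%6d  %s' % (count, module))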
* Where is the balance between backport hurting churn and keeping the mix fresh?
I'm not sure "fresh" is the heaviest hitter on the plus side. Making the code organized and crisp and usable (from a developer standpoint) and maintainable is the main thing to me. On the minus side, in addition to backport pain, there's the difficulty tracking down the change that did a thing. Run git blame today and it looks like Chris wrote virtually the whole thing in a small number of massive monolithic chunks. You have to use bisect or similar to track something down. To me, this is the biggest pain point. So I think we should strive to get to a point where the code structure is "good enough" and then stop doing huge refactors. That way, as we move forward, changes will accumulate on that base and things will look more "normal" history-wise.
* Should there be a more cleanly defined boundary between the methods that marshal data (the top-level methods in objects/*.py) and those that talk to the persistence layer (also in objects/*.py)?
* Do we need to define (better) or document the layers/boundaries in the system?
Sure, if we can find a fairly clean split. I don't have this enough in my head to volunteer to do it, but I'll review it if someone else does :)
* How can we encourage a similar refactor/revisit-often attitude to the docs?
Fostering the culture that way, as we're doing. Bringing it up periodically. Having a "doc sweep day" where members of the/a (sub)team volunteer to each read a chunk of doc/code thoroughly?

efried
On Mon, 8 Apr 2019, Eric Fried wrote:
* Where is the balance between backport hurting churn and keeping the mix fresh?
I'm not sure "fresh" is the heaviest hitter on the plus side. Making the code organized and crisp and usable (from a developer standpoint) and maintainable is the main thing to me.
Will come back to the rest of the message (summary: lgtm) another time, but I wanted to clarify "fresh":

In my experience, the community that surrounds code is better in a variety of ways when _all_ the code managed by that community is visited regularly. One of the easiest and most rewarding ways to do that visiting is to be intentional about doing regular refactoring. Again, in my experience, this process keeps everybody (old and new) up to date on what's going on and regularly finds bugs, typos and inefficiencies in the code, and misapprehensions in the community.

All of which probably equates to "organized and crisp and usable" but is more explicit about the mechanics that get it there and the people surrounding it.

--
Chris Dent
٩◔̯◔۶  https://anticdent.org/
freenode: cdent  tw: @anticdent
On 04/08/2019 01:05 PM, Eric Fried wrote:
On 4/8/19 10:43 AM, Chris Dent wrote:
* How (and should) we further split and shrink objects/resource_provider.py
To be specific here:
objects/resource_provider.py is currently ~2KLOC. That's not *horrible* (see below), and if it really didn't lend itself to being further subdivided, I would say we shouldn't force it just for the sake of making that number smaller.
However, IMO, this file logically deals with several things:
* Resource provider inventory/capacity/usage
* Resource provider membership in aggregates
* Traits on resource providers
* Creating, updating, and deleting resource providers
* Finding resource providers in various ways
If you look at the structure of the file (in its current order - see below) these groupings are pretty clear.
Note the presence of "resource provider" in each bullet above. We already have separate modules for dealing with e.g. CRUD of traits in their own right. Though the mapping isn't 1:1 with the above list.
So we have the following options, I think:
(1) Do nothing. Leave objects/resource_provider.py as is.

(2) Move X-related stuff into the base X.py module, so like move things related to "Traits on resource providers" into objects/trait.py. This level of shuffle would probably entail things like creating a new module for aggregate.py

(3) Split the above into separate modules while preserving their resource_provider-ness. From the perspective of code consuming the modules, this would look like:
    from placement.objects import resource_provider as rp_crud
    from placement.objects.resource_provider import trait as rp_trait
    from placement.objects.resource_provider import search as rp_search
    from placement.objects import trait
    ...
    # Do CRUD-y stuff with resource providers
    rp = rp_crud.ResourceProvider(...)
    rp.create()

    # Do trait stuff
    t = trait.Trait(...)
    t.create()

    # Do trait-y stuff with resource providers
    rp_trait.add_traits_to_provider(rp, [t])

    # Do search-y stuff with resource providers
    rp_search.get_all_by_filters(...)
To me, this is nice and intuitive.
Meh :)

I actually think option 1 (Do nothing) is best here. We went through a bunch of churn for the (excellent) refactoring for OVO-ectomy and splitting the resource_provider.py file out into the other related object/*.py files. I don't really see a huge value in further splitting the resource provider functions out into separate modules with a /placement/objects/resource_provider/ module.

I actually think this version of the above code is more readable:

    from placement.objects import resource_provider
    from placement.objects import trait
    ...
    # Do CRUD-y stuff with resource providers
    rp = resource_provider.ResourceProvider(...)
    rp.create()

    # Do trait stuff
    t = trait.Trait(...)
    t.create()

    # Do trait-y stuff with resource providers
    rp.add_traits([t])

    # Do search-y stuff with resource providers
    resource_provider.get_all_by_filters(...)

Note that the "add_traits()" is a method on the ResourceProvider object in the above. I think this is more readable for create/update/delete operations. For read operations, I prefer a consistently-named set of get_by_XXX and get_all_by_XXX methods in module-level scope as opposed to @classmethods on an object.
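To make the shape of that preference concrete, here is a minimal sketch (class and function names are illustrative, not the actual placement code): writes hang off the object, reads are consistently named module-level functions.

    class ResourceProvider(object):
        def __init__(self, context, uuid=None, name=None):
            self._context = context
            self.uuid = uuid
            self.name = name

        def create(self):
            # write-side: INSERT the row, then set id/generation on self
            ...

        def add_traits(self, traits):
            # write-side: associate the given traits with this provider
            ...


    # read-side: module-level functions with consistent names, no @classmethods
    def get_by_uuid(context, uuid):
        ...


    def get_all_by_filters(context, filters=None):
        ...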
* Is it worth reordering methods to some kind of arbitrary ordering?
My preference on this: keep related methods/classes/etc. near each other in the file.
Pros:
* Intuitive.
* Increases the probability that, when I'm looking at a method that calls another method, both methods will fit on my screen at the same time. (Workaround: multiple editor windows to the same file.)
Cons:
* Sometimes there's no sensible grouping, which makes this hard to define and therefore hard to follow (and hard to enforce, if that's a thing we want to do).
* Losing the ability to know exactly where to look to find something. (Workaround: Use a real IDE, where you can do a project-wide search for a symbol, hotkey from a symbol's use to its definition, etc. Also you can choose to view a module's skeleton in its real order *or* alphabetically.)
My preference is to have related methods grouped together as well, but I don't consider this a priority work item.
* If a method or module is "long" (by some definition) does that make it bad?
Not to the point where we should do something like implement sonar. Having had experience with that, it just ain't worth the hassle. However, there is such a thing as too long. Certain nova libvirt modules come to mind. For those of us who do choose to use real IDEs, these can take a long time to process (in-band syntax checking, symbol correlation highlighting, git blame, etc.). I'm going to go out on a limb and say we should strive really hard never to have a module in the 5-digit LOCs. And it would be really nice to stay under 2KLOC, but only if it doesn't feel forced.
Agreed.
* Where is the balance between backport hurting churn and keeping the mix fresh?
I'm not sure "fresh" is the heaviest hitter on the plus side. Making the code organized and crisp and usable (from a developer standpoint) and maintainable is the main thing to me.
On the minus side, in addition to backport pain, there's the difficulty tracking down the change that did a thing. Run git blame today and it looks like Chris wrote virtually the whole thing in a small number of massive monolithic chunks. You have to use bisect or similar to track something down. To me, this is the biggest pain point.
So I think we should strive to get to a point where the code structure is "good enough" and then stop doing huge refactors. That way, as we move forward, changes will accumulate on that base and things will look more "normal" history-wise.
Agreed.
* Should there be a more cleanly defined boundary between the methods that marshal data (the top-level methods in objects/*.py) and those that talk to the persistence layer (also in objects/*.py)?
* Do we need to define (better) or document the layers/boundaries in the system?
Sure, if we can find a fairly clean split. I don't have this enough in my head to volunteer to do it, but I'll review it if someone else does :)
See above. My preference is to have create/update/delete operations be normal object methods and get operations be module-level functions with consistent names. No @classmethods or @staticmethods would be my preference.

Best,
-jay
Meh :) I actually think option 1 (Do nothing) is best here.
I can live with that. But I also like what you suggested below, which is not "do nothing":
    from placement.objects import resource_provider
    from placement.objects import trait
    ...
    # Do CRUD-y stuff with resource providers
    rp = resource_provider.ResourceProvider(...)
    rp.create()

    # Do trait stuff
    t = trait.Trait(...)
    t.create()

    # Do trait-y stuff with resource providers
    rp.add_traits([t])

    # Do search-y stuff with resource providers
    resource_provider.get_all_by_filters(...)
Note that the "add_traits()" is a method on the ResourceProvider object in the above. I think this is more readable for create/update/delete operations.
This suggests a change whereby we move a bunch of module-level methods to instance methods. I actually really like that idea. So for example:

    @db_api.placement_context_manager.writer
    def _set_traits(context, rp, traits):
        ...

becomes:

    class ResourceProvider(object):
        ...
        @db_api.placement_context_manager.writer
        def _set_traits(self, traits):
            ...
            # the artist formerly known as 'rp' is now 'self'
            # and context is self._context

How do folks feel about us making this change?

efried
On 04/10/2019 07:07 PM, Eric Fried wrote:
Meh :) I actually think option 1 (Do nothing) is best here.
I can live with that. But I also like what you suggested below, which is not "do nothing":
    from placement.objects import resource_provider
    from placement.objects import trait
    ...
    # Do CRUD-y stuff with resource providers
    rp = resource_provider.ResourceProvider(...)
    rp.create()

    # Do trait stuff
    t = trait.Trait(...)
    t.create()

    # Do trait-y stuff with resource providers
    rp.add_traits([t])

    # Do search-y stuff with resource providers
    resource_provider.get_all_by_filters(...)
Note that the "add_traits()" is a method on the ResourceProvider object in the above. I think this is more readable for create/update/delete operations.
This suggests a change whereby we move a bunch of module-level methods to instance methods. I actually really like that idea. So for example:
    @db_api.placement_context_manager.writer
    def _set_traits(context, rp, traits):
        ...
becomes:
    class ResourceProvider(object):
        ...
        @db_api.placement_context_manager.writer
        def _set_traits(self, traits):
            ...
            # the artist formerly known as 'rp' is now 'self'
            # and context is self._context
How do folks feel about us making this change?
++ -jay
While exploring
For a variety of reasons, most of which would be obvious given my previous enthusiasm for exploring alternatives, I’ve proposed a story for refactoring out the DB-specific code from the rest of placement:
https://storyboard.openstack.org/#!/story/2005435
I don’t know what everyone’s preference would be for where the DB code would go: either in private methods of the original module, or in a separate DB module, so I added, as the first task, coming to a decision on how to make that split.
, I noticed something that will make it a bit icky. Some of the db methods (e.g. ResourceProvider._create_in_db) are true instance methods: they read and change instance attributes. If we move forward with the suggested refactor, we could solve this in a couple of non-great ways:

(1) Pass `self` to the db method. This grates on me pretty bad. It screams that the thing should be an instance method.

(2) Pass in the attributes that are needed by the method, and return the ones that the caller needs to use to update itself. This may make for some ugly call/return signatures.

IMO (2) is the lesser evil. Thoughts?

efried
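A rough sketch of the difference between the two options, using hypothetical module-level helper names (the real method is ResourceProvider._create_in_db; these stand-ins are only for illustration):

    # Option (1): pass the whole object; the helper reads rp.name/rp.uuid
    # and writes results such as rp.id back onto it.
    def _create_in_db(context, rp):
        ...


    # Option (2): pass only the values the helper needs, and return the
    # values (e.g. the new id and generation) the caller copies back onto
    # itself.
    def _create_from_values_in_db(context, name, uuid):
        ...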
On Apr 11, 2019, at 12:07 PM, Eric Fried <openstack@fried.cc> wrote:
(1) Pass `self` to the db method. This grates on me pretty bad. It screams that the thing should be an instance method.
That is actually a pretty old design pattern: passing a data object, rather than a slew of parameters. Of course, it would be cleaner to keep them as instance methods, and that would be my preference. I know that you have expressed an interest in reducing the overall size of the individual modules, so this option is only valid if smaller modules are decided on as a Good Thing™.
(2) Pass in the attributes that are needed by the method, and return the ones that the caller needs to use to update itself. This may make for some ugly call/return signatures.
I can’t think of a more brittle approach.

-- Ed Leafe
(1) Pass `self` to the db method. This grates on me pretty bad. It screams that the thing should be an instance method.
That is actually a pretty old design pattern: passing a data object, rather than a slew of parameters. Of course, it would be cleaner to keep them as instance methods, and that would be my preference. I know that you have expressed an interest in reducing the overall size of the individual modules, so this option is only valid if smaller modules are decided on as a Good Thing™.
Smaller modules would be nice, but I've been pretty soft on that since earlier in this thread (though I wouldn't want resource_provider.py getting any bigger than it is today). I thought the motivator here was separation/isolation of the db-ish methods into their own space.

Looking at resource_provider.py specifically, many of the existing module-level @db_api methods today accept a ResourceProvider as a parameter. So for the sake of consistency, we should either make those into instance methods, or move all the existing @db_api instance methods to the module level (be it this module or a different one). Which I think is more or less where we started.

I was ready to be happy with the idea of moving the @db_api methods into their own space until I saw that we would need to pass `self` around. If there's a consensus that that's okay, I'll get over it (and maybe even do the work).

Jay, you had expressed a preference for the other suggestion (group methods "intuitively" in the ResourceProvider class, even @db_api ones) before we really opened up this one (move @db_api methods into their own space, even though that means passing `self` to them). What's your preference at this point?

efried
On Wed, 10 Apr 2019, Eric Fried wrote:
This suggests a change whereby we move a bunch of module-level methods to instance methods. I actually really like that idea. So for example:
@db_api.placement_context_manager.writer def _set_traits(context, rp, traits): ...
becomes:
class ResourceProvider(object): ... @db_api.placement_context_manager.writer def _set_traits(self, traits): ... # the artist formerly known as 'rp' is now 'self' # and context is self._context
How do folks feel about us making this change?
In the absence of other ideas I'm fine with this because of what I said before: I care most that the code get visited often.

My main reservation with this suggestion is that it moves some things in the opposite direction from what I would prefer: I'd like to see (file-based) separation between those methods which are `db_api.placement_context_manager` and those that are not. Or, in other terms: sql generation in other files.

This would allow us to have what amounts to three tiers:

* web stuff
* mumble (might call it controller, but that's not quite right)
* sql stuff

which I feel provides good contextual cues and affords some opportunities for more robust testing of the sql generation.

However, it might be more pain than it is worth as we currently have quite a bit of set() management mixed with sql management.

--
Chris Dent
٩◔̯◔۶  https://anticdent.org/
freenode: cdent  tw: @anticdent
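A minimal sketch of the file-based split being described, under the assumption that only `db_api.placement_context_manager` comes from the actual code and every other name here is hypothetical: the set() management stays in the controller-ish tier, and only the lowest tier generates SQL.

    from placement import db_api


    # "mumble"/controller tier (e.g. objects/resource_provider.py):
    # marshals objects and does the set() management, builds no SQL.
    def set_traits(context, rp, traits):
        wanted = {t.name for t in traits}
        existing = _get_trait_names_from_db(context, rp.id)
        _add_trait_names_to_db(context, rp.id, wanted - existing)
        _delete_trait_names_from_db(context, rp.id, existing - wanted)


    # sql tier (a separate db-focused module): the only functions that
    # use the placement context manager and emit queries.
    @db_api.placement_context_manager.reader
    def _get_trait_names_from_db(context, rp_id):
        ...  # SELECT trait names joined through the provider/trait table
        return set()


    @db_api.placement_context_manager.writer
    def _add_trait_names_to_db(context, rp_id, names):
        ...  # INSERT the new associations


    @db_api.placement_context_manager.writer
    def _delete_trait_names_from_db(context, rp_id, names):
        ...  # DELETE the stale associations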
On Apr 11, 2019, at 7:55 AM, Chris Dent <cdent+os@anticdent.org> wrote:
My main reservation with this suggestion is that it moves some things in the opposite direction from what I would prefer: I'd like to see (file-based) separation between those methods which are `db_api.placement_context_manager` and those that are not. Or, in other terms: sql generation in other files.
This would allow us to have what amounts to three tiers:
* web stuff
* mumble (might call it controller, but that's not quite right)
* sql stuff
which I feel provides good contextual cues and affords some opportunities for more robust testing of the sql generation.
However, it might be more pain than it is worth as we currently have quite a bit of set() management mixed with sql management.
For a variety of reasons, most of which would be obvious given my previous enthusiasm for exploring alternatives, I’ve proposed a story for refactoring out the DB-specific code from the rest of placement:

https://storyboard.openstack.org/#!/story/2005435

I don’t know what everyone’s preference would be for where the DB code would go: either in private methods of the original module, or in a separate DB module, so I added, as the first task, coming to a decision on how to make that split.

-- Ed Leafe
I was actually starting to explore
This suggests a change whereby we move a bunch of module-level methods to instance methods. I actually really like that idea. So for example:
    @db_api.placement_context_manager.writer
    def _set_traits(context, rp, traits):
        ...
becomes:
    class ResourceProvider(object):
        ...
        @db_api.placement_context_manager.writer
        def _set_traits(self, traits):
            ...
            # the artist formerly known as 'rp' is now 'self'
            # and context is self._context
and
For a variety of reasons, most of which would be obvious given my previous enthusiasm for exploring alternatives, I’ve proposed a story for refactoring out the DB-specific code from the rest of placement:
https://storyboard.openstack.org/#!/story/2005435
I don’t know what everyone’s preference would be for where the DB code would go: either in private methods of the original module, or in a separate DB module, so I added, as the first task, coming to a decision on how to make that split.
, planning to propose them both as parallel changes/series so we could see how they look IRL.

In the process, I discovered a remaining @classmethod in resource_provider.py, which I figured should be taken care of first, so I moved it. And then looked for other @classmethods, because. And ended up with [1], which removes all @classmethods from the project, in keeping with what seems to be the majority's desire.

I'm going to fork another response to this thread to report on some stuff I discovered on the original two proposals...

efried

[1] https://review.openstack.org/#/q/status:open+project:openstack/placement+bra...
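For readers following the review, the flavor of that conversion is roughly this (a hypothetical example, not a diff from the actual change): a read-side @classmethod becomes a module-level function.

    # Before: a read operation expressed as a @classmethod on the object.
    class Trait(object):
        def __init__(self, context, name=None):
            self._context = context
            self.name = name

        @classmethod
        def get_by_name(cls, context, name):
            _assert_exists_in_db(context, name)  # hypothetical db helper
            return cls(context, name=name)


    # After: the same read as a module-level function; callers use
    # trait.get_by_name(ctx, name) rather than trait.Trait.get_by_name(...).
    def get_by_name(context, name):
        _assert_exists_in_db(context, name)  # hypothetical db helper
        return Trait(context, name=name)


    def _assert_exists_in_db(context, name):
        ...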
participants (4)
- Chris Dent
- Ed Leafe
- Eric Fried
- Jay Pipes