On Wed, Dec 5, 2018 at 9:18 AM Doug Hellmann <doug@doughellmann.com> wrote:
Mike Bayer <mike_mp@zzzcomputing.com> writes:
On Tue, Dec 4, 2018 at 11:42 AM Ben Nemec <openstack@nemebean.com> wrote:
Copying Mike Bayer since he's our resident DB expert. One more comment inline.
So the level of abstraction oslo.db itself provides is fairly light: it steps in for the initial configuration of the database engine, for the job of reworking exceptions into something more localized, and then for supplying a basic transactional begin/commit pattern that includes concepts that OpenStack uses a lot. It also has some helpers for things like special datatypes, test frameworks, and stuff like that.
That is, oslo.db is not a full-blown "abstraction" layer; it exposes the SQLAlchemy API, which is where you have the major level of abstraction.
Given that, making oslo.db do for etcd3 what it does for SQLAlchemy would be an appropriate place for such a thing. It would be all new code and not really have much overlap with anything that's there right now, but still would be feasible at least at the level of, "get a handle to etcd3, here's the basic persistence / query pattern we use with it, here's a test framework that will allow test suites to use it".
If there's no real overlap, it sounds like maybe a new (or at least different, see below) library would be more appropriate. That would let the authors/reviewers focus on whatever configuration abstraction we need for etcd3, and not worry about the relational database stuff in oslo.db now.
OK, my opinion on that is informed by how oslo.db is organized: it has no relational database concepts in the base; those are instead local to oslo_db.sqlalchemy. It was originally intended to be an abstraction for "databases" in general. There may be some value in sharing some concepts across relational and key/value databases, to the extent they are used as the primary data storage service for an application and not just a cache, although this may not be practical right now and we might consider oslo_db to just be slightly misnamed.
At the level of actually reading and writing data to etcd3 as well as querying, that's a bigger task, and certainly that is not a SQLAlchemy thing either. If etcd3's interface is a simple enough "get" / "put" / "query" and then some occasional special operations, those kinds of abstraction APIs are often not too terrible to write.
There are a zillion client libraries for etcd already. Let's see which one has the most momentum, and use that.
Right, but I'm not talking about client libraries, I'm talking about an abstraction layer, so that the OpenStack app that talks to etcd3 and tomorrow might want to talk to FoundationDB wouldn't have to rip all the code out entirely. Or, more immediately, when the library that has the "most momentum" no longer does, and we need to switch. OpenStack's switch from MySQL-python to PyMySQL is a great example of this, as well as the switch of memcached drivers from python-memcached to pymemcache. Consumers of oslo libraries should only have to change a configuration string for changes like this, not any imports or calling conventions. Googling around, I'm not seeing much that does this other than dogpile.cache and a few small projects that don't look very polished. This is probably because it's sort of trivial to make a basic one and then sort of hard to expose vendor-specific features once you've done so. But still IMO worthwhile.
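To make the "only change a configuration string" idea concrete, here is a minimal sketch of what such a layer could look like. Everything in it is hypothetical: the `KeyValueStore` interface, `register_backend`, `create_store`, and the `memory://` scheme are names invented for illustration, not an existing oslo or etcd3 API. A real etcd3 or FoundationDB backend would be registered under its own scheme and wrap whichever client library is in favor at the time.

```python
import abc
from typing import Dict, Iterator, Optional, Tuple, Type


class KeyValueStore(abc.ABC):
    """Hypothetical backend-neutral interface (not an existing oslo API)."""

    @abc.abstractmethod
    def get(self, key: str) -> Optional[bytes]:
        """Return the value stored under key, or None if absent."""

    @abc.abstractmethod
    def put(self, key: str, value: bytes) -> None:
        """Store value under key, replacing any previous value."""

    @abc.abstractmethod
    def query(self, prefix: str) -> Iterator[Tuple[str, bytes]]:
        """Yield (key, value) pairs whose key starts with prefix."""


# Maps a URL scheme (the "configuration string") to a backend class.
_BACKENDS: Dict[str, Type[KeyValueStore]] = {}


def register_backend(scheme: str):
    """Class decorator that makes a backend selectable by URL scheme."""
    def decorator(cls: Type[KeyValueStore]) -> Type[KeyValueStore]:
        _BACKENDS[scheme] = cls
        return cls
    return decorator


@register_backend("memory")
class InMemoryStore(KeyValueStore):
    """Stand-in backend; a real one would wrap an etcd3 client library."""

    def __init__(self, url: str) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def query(self, prefix: str) -> Iterator[Tuple[str, bytes]]:
        for key in sorted(self._data):
            if key.startswith(prefix):
                yield key, self._data[key]


def create_store(url: str) -> KeyValueStore:
    """Pick a backend from the URL scheme; callers never import a driver."""
    scheme = url.split("://", 1)[0]
    try:
        backend = _BACKENDS[scheme]
    except KeyError:
        raise ValueError("no key/value backend registered for %r" % scheme)
    return backend(url)
```

Consuming code would only ever call `create_store(conf_url)` and the three interface methods, so swapping drivers means changing the URL in configuration. The hard part mentioned above, exposing vendor-specific features, is deliberately left out of this sketch.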
Also note that we have a key/value database interface right now in oslo.cache, which uses dogpile.cache against both memcached and redis. If you really only needed put/get with etcd3, it could do that also, but I would assume we need a more fine-grained interface than that. I haven't studied etcd3 as of yet, but I'd be interested in supporting it in oslo somewhere.
Using oslo.cache might make sense, too.
I think the problems of caching are different from those of a primary data store. Caching assumes data is impermanent, that it expires after a given length of time, and that the thing being stored is opaque and can't be queried directly or in the aggregate, at least as far as the caching API is concerned (e.g. no "fetch by 'field'", for starters). A database abstraction API, by contrast, would include support for querying, and it would treat the data as permanent and critical rather than as a transitory, stale copy of something. So while I'm 0 on not using oslo.db, I'm -1 on using oslo.cache.
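A rough sketch of that difference in API shape, using two toy in-memory classes. Both `ExpiringCache` and `RecordStore` are hypothetical names made up for this illustration; neither is the oslo.cache or dogpile.cache API. The cache treats values as opaque bytes that silently expire, while the store keeps structured records indefinitely and can be asked to fetch by field.

```python
import time
from typing import Any, Callable, Dict, List, Optional, Tuple


class ExpiringCache:
    """Cache-style API: opaque values that vanish after a time-to-live."""

    def __init__(self, ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested
        self._items: Dict[str, Tuple[float, bytes]] = {}

    def set(self, key: str, value: bytes) -> None:
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Losing data is normal for a cache; the caller regenerates it.
            del self._items[key]
            return None
        return value


class RecordStore:
    """Primary-store-style API: permanent records that can be queried."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, Any]] = {}

    def put(self, key: str, record: Dict[str, Any]) -> None:
        self._records[key] = dict(record)

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        record = self._records.get(key)
        return dict(record) if record is not None else None

    def find_by(self, field: str, value: Any) -> List[Dict[str, Any]]:
        """Fetch by field, which a cache API has no way to express."""
        return [dict(record)
                for _, record in sorted(self._records.items())
                if record.get(field) == value]
```

The cache has no equivalent of `find_by`, and the store has no notion of a value quietly disappearing; bolting either behavior onto the other API is where the two use cases start to pull apart.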
Doug