[openstack-dev] [watcher] Api and Decision Engine integration - design question
tomasz.kaczynski at intel.com
Fri Apr 15 08:10:02 UTC 2016
I'm implementing the Watcher Scoring Module. As part of that, I need to expose the information about Scoring Engines through the API/Python CLI.
The scoring engine list might be quite dynamic. Although the scoring engines will be pluggable through the stevedore plug-in model, a single plug-in might contain one or more scoring engines. In some scenarios this list will be static: a plug-in developer will just expose a few algorithms and that's it. In other scenarios, the scoring engines might be implemented as external web services, for example, with ongoing development of the data models. That would result in multiple scoring engines in multiple versions, which might change quite frequently (e.g. a few times a day).
Of course, the responsibility for handling all of that is entirely on the scoring engine plug-in developer. But it would be good to keep the scoring engine abstraction layer clean and simple, hiding all of these details.
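To illustrate the abstraction layer described above, here is a minimal sketch of how a plug-in could expose one or more scoring engines through a factory. All class and method names (ScoringEngine, ScoringEngineFactory, list_engines, describe_engines) are hypothetical and chosen only for illustration; they are not taken from the Watcher code base.

```python
import abc
from typing import List, Tuple


class ScoringEngine(abc.ABC):
    """Hypothetical base class for a single scoring engine."""

    def __init__(self, name: str, description: str) -> None:
        self.name = name
        self.description = description

    @abc.abstractmethod
    def calculate_score(self, features: List[float]) -> float:
        """Return a score for the given input features."""


class ScoringEngineFactory(abc.ABC):
    """Hypothetical factory: one plug-in may return many engines."""

    @abc.abstractmethod
    def list_engines(self) -> List[ScoringEngine]:
        """Return the engines currently available from this plug-in."""


class MeanEngine(ScoringEngine):
    """Toy engine used only to make the sketch runnable."""

    def calculate_score(self, features: List[float]) -> float:
        return sum(features) / len(features)


class StaticFactory(ScoringEngineFactory):
    """Static scenario: the list of engines never changes."""

    def list_engines(self) -> List[ScoringEngine]:
        return [
            MeanEngine("mean-v1", "Arithmetic mean of the features"),
            MeanEngine("mean-v2", "Same model, second version"),
        ]


def describe_engines(
    factories: List[ScoringEngineFactory],
) -> List[Tuple[str, str]]:
    """Flatten all factories into (name, description) pairs."""
    return [
        (engine.name, engine.description)
        for factory in factories
        for engine in factory.list_engines()
    ]
```

A dynamic plug-in would implement list_engines() by asking its external web service instead of returning a fixed list, while the caller of describe_engines() would not need to change.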
And here comes the problem:
Somehow the dynamic list of scoring engines has to be passed from Decision Engine (where the Scoring Engine abstraction layer will be sitting) to the Api / CLI. There are currently 2 options on the table for how this could be done:
1. Allow Api to call Decision Engine directly through the existing RPC Api (currently using messaging transport).
2. Let Decision Engine keep Scoring Engine information synced in the DB so that Watcher Api can simply query for this information as required.
Pros and cons of each option:

Option 1 (Api calls Decision Engine through RPC):
- Good: Simpler implementation and no need for keeping DB in sync.
- Good: No risk of data inconsistency. Nothing is being cached, data is always accurate. Decision Engine is a single source of truth.
- Good: A Scoring Engine plug-in developer creates a simple stevedore plug-in, implements scoring engine classes, implements a factory class returning scoring engines, and that's all.
- Good: Also supports more complicated scenarios with a dynamic scoring engine list, encapsulated in the factory class.
- Bad: Dependency on Decision Engine - it needs to be up and running. Can be mitigated by caching the last response from Decision Engine - if the DE RPC Api is not responding, the last known data could be returned.
- Bad: Not sure how reliable/performant RPC over messaging transport is. Need to test.
- Bad: Might have scalability issues (I believe there is only one Decision Engine instance, please confirm!). But this might be at least partially mitigated by caching on the Watcher Api level (e.g. if the last data was retrieved less than X minutes ago, no need to query Decision Engine). Given that this information is only used by Strategy developers to implement strategies using some Scoring Engines, it might be perfectly fine to cache data for longer periods of time (1 hour or more).

Option 2 (Decision Engine keeps the DB in sync):
- Good: Watcher Api decoupled from Decision Engine. Can work even if DE is not working or busy.
- Good: In case of Watcher this option should scale better. Decision Engine typically has only one instance and is not subject to horizontal scalability (please confirm my understanding!).
- Bad: More complicated implementation. For dynamic scenarios (adding scoring engines on the fly) it requires some sort of notification mechanism, so that the DB will stay in sync. This can be done by exposing event handling in the scoring engine abstraction layer, but it's an unnecessary complication for simple cases with static data. It can be mitigated by using helper classes enforcing DB sync without actually exposing any events in the abstract classes (so if a plug-in needs to sync the DB, it calls some helper method, all others just do nothing).
- Bad: Potential issues with data consistency. If there is a problem or a bug in the sync code, it might be hard to recover from the problem without Watcher redeployment.
- Bad: Any change in the DB structure might require changes to all the parties involved, and even to the existing plug-ins.
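The DB-sync helper mentioned for the second option could look roughly like the sketch below. It is only an illustration under stated assumptions: the table name scoring_engines, its two columns and the function sync_scoring_engines are hypothetical, and an in-memory SQLite database stands in for the real Watcher DB.

```python
import sqlite3
from typing import Dict


def sync_scoring_engines(conn: sqlite3.Connection,
                         engines: Dict[str, str]) -> None:
    """Make the (hypothetical) scoring_engines table match `engines`.

    `engines` maps engine name -> description, as reported by a plug-in.
    """
    with conn:  # commit on success, roll back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS scoring_engines ("
            "name TEXT PRIMARY KEY, description TEXT NOT NULL)"
        )
        existing = {
            row[0] for row in conn.execute("SELECT name FROM scoring_engines")
        }
        # Remove engines that the plug-in no longer reports.
        stale = existing - set(engines)
        conn.executemany(
            "DELETE FROM scoring_engines WHERE name = ?",
            [(name,) for name in stale],
        )
        # Add new engines and refresh descriptions of known ones.
        conn.executemany(
            "INSERT OR REPLACE INTO scoring_engines (name, description) "
            "VALUES (?, ?)",
            list(engines.items()),
        )


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    sync_scoring_engines(db, {"model-a": "first version"})
    sync_scoring_engines(db, {"model-b": "replacement model"})
    print(db.execute("SELECT name FROM scoring_engines").fetchall())
```

A plug-in with a dynamic list would call the helper whenever its list changes; a plug-in with static data would call it once or not at all. The sketch also shows where the consistency risk comes from: the table is only as accurate as the last successful call.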
My preference is to go with option 1 because of the simpler implementation and the absence of data consistency problems. Nothing needs to be purged or synced, and the data is always accurate. If Decision Engine is not working, there is a bigger problem anyway (though caching the last DE response mitigates this).
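The caching mitigation for option 1 can be sketched as follows. This is a minimal illustration, not Watcher code: CachedEngineList is a hypothetical name, and the fetch callable stands in for whatever RPC call the Api would make to Decision Engine. Cached data is reused while it is younger than the TTL, and the last known data is returned if the call fails.

```python
import time
from typing import Callable, List, Optional


class CachedEngineList:
    """Hypothetical Api-side cache in front of a Decision Engine call."""

    def __init__(self,
                 fetch: Callable[[], List[str]],
                 ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # assumed wrapper around the RPC call
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[List[str]] = None
        self._fetched_at: Optional[float] = None

    def get(self) -> List[str]:
        now = self._clock()
        if self._fetched_at is not None and now - self._fetched_at < self._ttl:
            return self._value  # still fresh, no need to query DE
        try:
            value = self._fetch()
        except Exception:
            if self._fetched_at is None:
                raise  # nothing cached yet, so the error must surface
            return self._value  # DE not responding: serve last known data
        self._value = value
        self._fetched_at = now
        return value
```

With a TTL of one hour, the Api would query Decision Engine at most once per hour per Api process, at the cost of showing a list that may be up to one hour old.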
I hope I managed to explain the concept and the problem.
I would appreciate your opinions on this!