[openstack-dev] [Quantum][LBaaS] Architecture: Agents, Drivers, async calls
Youcef Laribi
Youcef.Laribi at eu.citrix.com
Thu Nov 15 20:06:22 UTC 2012
Ilya,
Thanks for your detailed reply. On the Agent process: yes, if it runs on the same host as the Quantum service, and we assume that all drivers can configure their devices remotely, then I think this would be best. Thanks for clarifying this point.
On the workflow of operations, your diagram is very good; it is I who misread it, apologies, I must have been half-asleep :) To continue your description of how to handle the various crash conditions: if an object is stuck in a PENDING_* status, do we need to allow the user to delete it? Currently all operations would be refused while the object is in a PENDING_* status. Or do we need to move an object to ERROR status if it stays in a PENDING_* status longer than a configurable time period? In the same manner, the user would need some way to clean up objects that are in ERROR status, so shall we allow users to delete objects that are in ERROR status (both from the DB and from the device)?
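As a rough illustration of one of the options raised above (a timeout that moves stuck objects to ERROR, with deletion permitted once an object is no longer PENDING_*), here is a minimal Python sketch. The names LbObject, expire_stuck, can_delete and the 300-second default are hypothetical and not part of any existing Quantum code; the question of which policy to adopt is still open.

```python
from dataclasses import dataclass

PENDING_TIMEOUT_SECONDS = 300  # hypothetical value for the configurable period


@dataclass
class LbObject:
    id: str
    status: str        # e.g. "PENDING_CREATE", "ACTIVE", "ERROR"
    updated_at: float  # time of the last status change, seconds since epoch


def expire_stuck(objects, now, timeout=PENDING_TIMEOUT_SECONDS):
    """Move objects that stayed in PENDING_* longer than `timeout` to ERROR."""
    expired = []
    for obj in objects:
        if obj.status.startswith("PENDING_") and now - obj.updated_at > timeout:
            obj.status = "ERROR"
            obj.updated_at = now
            expired.append(obj.id)
    return expired


def can_delete(obj):
    """Refuse deletion while an operation is in flight; allow it otherwise."""
    return not obj.status.startswith("PENDING_")
```

With this policy an object never stays undeletable forever: it either completes or times out into ERROR, at which point the user can remove it. The delete itself would still have to be sent to the device as well as applied to the DB.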
Youcef
From: Ilya Shakhat [mailto:ishakhat at mirantis.com]
Sent: Thursday, November 15, 2012 7:15 AM
To: OpenStack Development Mailing List
Subject: Re: [openstack-dev] [Quantum][LBaaS] Architecture: Agents, Drivers, async calls
Youcef,
Please see comments inline.
2012/11/15 Youcef Laribi <Youcef.Laribi at eu.citrix.com>
Ilya,
Both designs are valid (whether the LBaaS plugin implements a uniform asynchronous model, or whether to leave it to each vendor to decide whether their driver is synchronous or not), and I'm fine with choosing the one you described. I have a few questions on it.
In that setup, I'm assuming we will have 2 processes: the Quantum service process and the Agent process. The two communicate through message queues. No vendor-specific code will be in the Quantum service process. The Agent process is the one that has all the vendors' drivers loaded in.
Yes. That's right.
Where are you planning for the agent process to run? On the same host as the Quantum service process? On a separate host? On the device? Imagine in the same agent, you have a case where one driver assumes it is on the same machine as the device (HA-Proxy?), while another driver uses a remote protocol to configure the device (the agent cannot run on the device as the device is a sealed appliance), where does the agent run in this case?
The Agent process may run on the same host as the Quantum service. In small setups, one agent process per cloud should be enough. In large setups there may be several agents, each processing messages. The agent is not needed on the devices, since it controls them remotely via their management interfaces.
The other question I have is on the LBaaS plugin workflow that you described in this diagram: http://wiki.openstack.org/Quantum/LBaaS/Architecture/ConcurrentRequests. Here it seems that we are returning a response to the user even before we update the database. So, this could mean that if the Quantum service crashes before updating the database, we would have lost all details of the request, even though we have returned a resource ID, and a "PENDING_CREATE" status to the user. So, when the user queries the service (after it has restarted) to get the status of the creation, the resource wouldn't even exist since it has never been saved to the DB. What do you think of updating the DB before returning the response to the user?
Well... (maybe I'm not so good at diagram drawing). For the case of object creation, I see the following workflow:
1. The request is accepted by the Plugin and validated. If it is not valid, an error is returned.
2. The record is added to the DB with status "PENDING_CREATE".
3. The message is pushed into the queue.
4. The Plugin responds to the user with an HTTP 202 reply. Steps 1-4 are done synchronously.
5. The message is processed by the Agent, driver and device. The result is pushed into the Plugin's queue.
6. The Plugin retrieves the result and updates the DB with either "ACTIVE" or "ERROR" status.
If a crash occurs on 1) or inside 2), then the object will be lost, but the user will know about this since it happens during request processing. If the crash happens on 3-6, then the object remains in a "PENDING_" state and the user will have to decide how long to keep such objects. The worst situation is when the failure happens on 6), because the change is applied to the device but not reflected in the DB. In both of the last two cases the user will want to clean up, and here we need not only to remove objects in a PENDING_ state from the DB, but also to send a command to the device.
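Steps 1-6 above can be sketched roughly as follows. This is only an illustration under stated assumptions: an in-memory dict stands in for the DB, Python's queue.Queue stands in for the message queue, and the function names (create, agent_step, plugin_step) and the "name is required" validation rule are made up for the example.

```python
import queue
import uuid

db = {}                       # stand-in for the LBaaS database
agent_queue = queue.Queue()   # Plugin -> Agent
plugin_queue = queue.Queue()  # Agent -> Plugin


def create(body):
    """Steps 1-4: validate, persist, enqueue, reply. All synchronous."""
    if "name" not in body:                               # step 1 (toy validation)
        return 400, {"error": "name is required"}
    obj = dict(body, id=str(uuid.uuid4()), status="PENDING_CREATE")
    db[obj["id"]] = obj                                  # step 2
    agent_queue.put({"op": "create", "id": obj["id"]})   # step 3
    return 202, dict(obj)                                # step 4


def agent_step(apply_on_device):
    """Step 5: the Agent runs the driver and reports the outcome."""
    msg = agent_queue.get_nowait()
    try:
        apply_on_device(msg)
        status = "ACTIVE"
    except Exception:
        status = "ERROR"
    plugin_queue.put({"id": msg["id"], "status": status})


def plugin_step():
    """Step 6: the Plugin records the final status in the DB."""
    result = plugin_queue.get_nowait()
    db[result["id"]]["status"] = result["status"]
```

Because the record is written in step 2 before the reply in step 4, a crash anywhere in steps 3-6 leaves a row in PENDING_CREATE that the user can still query after a restart.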
Thanks,
Ilya
Thanks
Youcef
From: Ilya Shakhat [mailto:ishakhat at mirantis.com]
Sent: Tuesday, November 13, 2012 11:45 AM
To: OpenStack Development Mailing List
Subject: Re: [openstack-dev] [Quantum][LBaaS] Architecture: Agents, Drivers, async calls
Some reasons why LBaaS core should be responsible for processing requests asynchronously:
* Driver code will be as simple as possible; in most cases it will just translate the LBaaS model into a device-specific one;
* There will be no dependencies between drivers, and user requests will take approximately the same time for different drivers. This avoids the case where one driver takes too long to apply a config synchronously and blocks other requests.
* REST API is already asynchronous.
To summarize what Eugene proposed, LBaaS will consist of (see diagram http://wiki.openstack.org/Quantum/LBaaS/Architecture?action=AttachFile&do=view&target=lbaas_architecture_new.png):
* Extension - the front end of the service
* Plugin - responsible for request processing, persistence and core functionality (scheduling). All its operations may be thought of as atomic and quick. They are done synchronously.
* Agent - responsible for executing commands on specific devices with the help of drivers. It gets requests from the Plugin via the MQ and processes them asynchronously
* Driver - translates from the unified API to the vendor-specific one. Its work may be time-consuming.
Among all operations, update looks the most complicated; the case of 2 concurrent updates and its workflow is shown at http://wiki.openstack.org/Quantum/LBaaS/Architecture/ConcurrentRequests.
Thanks,
Ilya
2012/11/13 Youcef Laribi <Youcef.Laribi at eu.citrix.com>
Eugene,
Another way to look at the workflow is to make sure that the LBaaS Plugin updates the database synchronously (and generates the resource ID before returning to the user), and then leave it to the driver implementations to decide whether they want to handle the call synchronously or asynchronously.
I like drawing pictures to avoid misunderstandings, so here are 2 pictures illustrating 2 vendors, one deciding to implement their driver in a synchronous way, and the other one in an asynchronous way.
Synchronous driver implementation: http://wiki.openstack.org/Quantum/LBaaS?action=AttachFile&do=view&target=LBaaS+synchronous+driver+implementation.png
Asynchronous driver implementation: http://wiki.openstack.org/Quantum/LBaaS?action=AttachFile&do=view&target=LBaaS+asynchronous+driver+implementation.png
As far as the plugin is concerned, the calls to drivers are always synchronous in the sense that the driver doesn't have to deal with queues, etc. The plugin should expect the driver to return either a "COMPLETED" status (meaning the call has been executed on the device), or a "PENDING" status (meaning that the driver has started the operation but it is not complete). The plugin updates the database in both cases with the outcome of the call, and returns the result to the user.
This would allow a lot of freedom in driver implementations. A vendor can start with a synchronous implementation because it is quick to implement, and later move to an asynchronous implementation without impacting the LBaaS plugin. Or it can implement some calls synchronously and other calls (which might take a long time to complete) asynchronously. You can also have different vendors using different driver strategies with respect to synchronicity, or using different queuing mechanisms.
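The driver contract described above could look roughly like the Python sketch below. Everything here is illustrative: the class and function names (DriverResult, SyncDriver, AsyncDriver, plugin_create_vip) are hypothetical, and mapping "COMPLETED" to an ACTIVE row and "PENDING" to a PENDING_CREATE row is an assumption about how the plugin would record the outcome.

```python
from enum import Enum


class DriverResult(Enum):
    COMPLETED = "COMPLETED"  # the call has been executed on the device
    PENDING = "PENDING"      # the operation has started but is not complete


class SyncDriver:
    def create_vip(self, vip):
        # A synchronous driver would block here until the device is configured.
        return DriverResult.COMPLETED


class AsyncDriver:
    def __init__(self):
        self.backlog = []    # the driver's own (vendor-chosen) queuing mechanism

    def create_vip(self, vip):
        self.backlog.append(vip)
        return DriverResult.PENDING


def plugin_create_vip(db, driver, vip):
    """Persist first, call the driver, then record the outcome of the call."""
    db[vip["id"]] = dict(vip, status="PENDING_CREATE")
    result = driver.create_vip(vip)
    if result is DriverResult.COMPLETED:
        db[vip["id"]]["status"] = "ACTIVE"
    return db[vip["id"]]
```

The plugin code is identical for both drivers, which is the point of the proposal: a vendor can switch strategy without the plugin changing.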
Thanks
Youcef
From: Eugene Nikanorov [mailto:enikanorov at mirantis.com]
Sent: Monday, November 12, 2012 6:24 AM
To: openstack-dev at lists.openstack.org
Subject: [openstack-dev] [Quantum][LBaaS] Architecture: Agents, Drivers, async calls
Hi folks,
In the latest meeting we've mentioned several important architectural points including:
- agents vs direct driver call
- asynchronous execution
- dispatching a generic REST call to a proper driver.
I would like to present how the Mirantis team sees this, based on our previous experience with LBaaS and other OpenStack components.
We need asynchronous execution in the sense that the client gets an immediate response while the actual device configuration happens later.
The workflow of such an operation could look like the following:
1) client makes REST call; receives an object it has created/modified with PENDING status
2) call is dispatched to a plugin, plugin creates/modifies/etc an object in the database
3) plugin calls driver to apply new configuration to specific device
4) driver finishes applying configuration, plugin updates DB object
5) client polls objectID and gets final status of operation.
Now, depending on the approach we take, (3) could expand into a different sequence of operations.
One good option could be to use an agent between the plugin and the drivers. In this case (3) expands to:
3.1. plugin posts message to mq
3.2. message is consumed by one of the running service agents
3.3. agent calls the corresponding driver directly, in a synchronous way
3.4. agent posts message upon completion
3.5. plugin consumes the message and updates DB object with final status
Such an approach solves at least two potential problems:
1. The plugin may be simplified, since it is not required to implement call/work item queuing.
2. Applying device configuration is a time-consuming task which could take seconds.
Both plugin and agent have a thread limit for concurrent operations.
Handling heavy workloads in large deployments will be simple with several agents consuming messages from the mq.
This also allows synchronous drivers to be written, since asyncness will be handled by mq + agent.
Another option could be calling drivers directly without any asyncness at all while preserving the above workflow (1-5).
That could work as a temporary fast solution while still allowing it to be split into the "plugin + agent" approach relatively easily.
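To illustrate how several agents can share one queue while each driver call stays synchronous (steps 3.1-3.5), here is a small Python sketch. It uses threads and queue.Queue purely as stand-ins for separate agent processes and a real message queue; the names agent, run and sample_driver are hypothetical.

```python
import queue
import threading

requests = queue.Queue()  # plugin -> agents
results = queue.Queue()   # agents -> plugin


def sample_driver(msg):
    """Toy synchronous driver: fails for messages flagged as bad."""
    if msg.get("bad"):
        raise RuntimeError("device rejected config")


def agent(driver):
    """Consume requests until a None sentinel arrives (3.2-3.4)."""
    while True:
        msg = requests.get()
        if msg is None:
            break
        try:
            driver(msg)                   # 3.3: synchronous driver call
            status = "ACTIVE"
        except Exception:
            status = "ERROR"
        results.put((msg["id"], status))  # 3.4: report completion


def run(messages, driver, n_agents=2):
    workers = [threading.Thread(target=agent, args=(driver,))
               for _ in range(n_agents)]
    for w in workers:
        w.start()
    for m in messages:
        requests.put(m)                   # 3.1: plugin posts to the queue
    for _ in workers:
        requests.put(None)                # one stop sentinel per agent
    for w in workers:
        w.join()
    final = {}                            # 3.5: plugin records final statuses
    while not results.empty():
        obj_id, status = results.get()
        final[obj_id] = status
    return final
```

Scaling up is then a matter of raising the number of consumers; the driver itself needs no knowledge of queues.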
Regarding dispatching REST calls to the proper driver:
In fact, the VIP object should contain a reference to the particular device it is created on.
http://wiki.openstack.org/LBaaS/CoreResourceModel/proposal misses that device management part; I think it was just implied there.
Every balancer-related object references the VIP and hence references the specific device where it was created.
E.g., when a call for any object is made, the plugin needs to extract the device type from the DB by following those references, and later the plugin or agent will use it to call the particular driver.
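The reference-following lookup described above might be sketched like this in Python. The tables, field names (device_id, vip_id, type) and the device type "vendor_x" are invented for the example and are not taken from the CoreResourceModel proposal; the drivers are trivial stand-ins.

```python
# Hypothetical rows; field names are illustrative only.
devices = {"dev1": {"type": "haproxy"}, "dev2": {"type": "vendor_x"}}
vips = {"vip1": {"device_id": "dev1"}, "vip2": {"device_id": "dev2"}}
pools = {"pool1": {"vip_id": "vip2"}}

# Stand-in drivers keyed by device type.
drivers = {
    "haproxy": lambda op, obj_id: f"haproxy:{op}:{obj_id}",
    "vendor_x": lambda op, obj_id: f"vendor_x:{op}:{obj_id}",
}


def device_type_for(kind, obj_id):
    """Follow object -> VIP -> device references to find the device type."""
    vip_id = obj_id if kind == "vip" else pools[obj_id]["vip_id"]
    return devices[vips[vip_id]["device_id"]]["type"]


def dispatch(kind, obj_id, op):
    """Pick the driver by device type and hand it the operation."""
    return drivers[device_type_for(kind, obj_id)](op, obj_id)
```

Whether this lookup lives in the plugin or in the agent, the only thing the message needs to carry is the device type (or enough references to resolve it).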
Thanks,
Eugene.
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev at lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev