[openstack-dev] [Quantum][LBaaS] Architecture: Agents, Drivers, async calls

Ilya Shakhat ishakhat at mirantis.com
Thu Nov 15 15:15:13 UTC 2012


Youcef,

Please see comments inline.

2012/11/15 Youcef Laribi <Youcef.Laribi at eu.citrix.com>

> Ilya,
>
> Both designs are valid (whether the LBaaS plugin implements a uniform
> asynchronous model, or whether to leave it to each vendor to decide whether
> their driver is synchronous or not), and I’m fine with choosing the one you
> described. I have a few questions on it.
>
> In that setup, I’m assuming we will have 2 processes: the Quantum service
> process and the Agent process. The two communicate through message queues.
> No vendor-specific code will be in the Quantum service process. The Agent
> process is the one that has all the vendors' drivers loaded in.
>
Yes. That's right.

>
> Where are you planning for the agent process to run? On the same host as
> the Quantum service process? On a separate host? On the device? Imagine
> that, in the same agent, one driver assumes it is on the same machine as
> the device (HA-Proxy?), while another driver uses a remote protocol to
> configure the device (the agent cannot run on the device, as the device is
> a sealed appliance). Where does the agent run in this case?
>
The agent process may run on the same host as the Quantum service. In small
setups, one agent process per cloud should be enough. In large setups there
may be several agents, each processing messages. The agent does not need to
run on the devices, since it controls them remotely via their management
interfaces.
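
As a rough illustration of several agents draining one queue, here is a
minimal Python sketch. It is not LBaaS code: threads stand in for separate
agent processes, queue.Queue stands in for the message broker, and the names
run_agents and handle are hypothetical.

```python
import queue
import threading


def run_agents(messages, n_agents, handle):
    """Start n_agents workers that share one queue of messages."""
    q = queue.Queue()
    for m in messages:
        q.put(m)
    handled = []             # (agent index, message) pairs
    lock = threading.Lock()

    def agent(index):
        while True:
            try:
                msg = q.get_nowait()
            except queue.Empty:
                return       # nothing left to do
            handle(msg)      # e.g. a driver call to a remote device
            with lock:
                handled.append((index, msg))

    threads = [threading.Thread(target=agent, args=(i,))
               for i in range(n_agents)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled
```

Each message is taken by exactly one agent, so adding agents only changes how
the work is shared, not what gets processed.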

>
>
> The other question I have is on the LBaaS plugin workflow that you
> described in this diagram:
> http://wiki.openstack.org/Quantum/LBaaS/Architecture/ConcurrentRequests. Here
> it seems that we are returning a response to the user even before we update
> the database. So, this could mean that if the Quantum service crashes
> before updating the database, we would have lost all details of the
> request, even though we have returned a resource ID, and a “PENDING_CREATE”
> status to the user. So, when the user queries the service (after it has
> restarted) to get the status of the creation, the resource wouldn’t even
> exist since it has never been saved to the DB. What do you think of
> updating the DB before returning the response to the user?
>
Well, maybe I'm not so good at drawing diagrams. For object creation, I see
the following workflow:

   1. The request is accepted by the Plugin and validated. If it is not
   valid, an error is returned.
   2. A record is added to the DB with status "PENDING_CREATE".
   3. A message is pushed onto the queue.
   4. The Plugin responds to the user with an HTTP 202 reply. Steps 1-4 are
   done synchronously.
   5. The message is processed by the Agent, driver and device. The result
   is pushed onto the Plugin's queue.
   6. The Plugin retrieves the result and updates the DB with either
   "ACTIVE" or "ERROR" status.

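The six steps can be sketched in Python roughly as follows. This is only an
illustration under stated assumptions: a dict stands in for the DB, two
queue.Queue objects stand in for the message queues, and every function name
and the "name is required" validation rule are hypothetical.

```python
import queue
import uuid

# Hypothetical in-memory stand-ins for the DB and the two message queues.
db = {}
agent_queue = queue.Queue()    # Plugin -> Agent
plugin_queue = queue.Queue()   # Agent -> Plugin


def plugin_create(request):
    """Steps 1-4, done synchronously while handling the API request."""
    if "name" not in request:                          # 1. validate
        return 400, {"error": "name is required"}
    obj_id = str(uuid.uuid4())
    db[obj_id] = dict(request, id=obj_id,
                      status="PENDING_CREATE")         # 2. persist
    agent_queue.put({"op": "create", "id": obj_id})    # 3. enqueue
    return 202, dict(db[obj_id])                       # 4. reply


def agent_process_one(driver):
    """Step 5: the Agent takes one message, calls the driver, reports back."""
    msg = agent_queue.get()
    try:
        driver(msg)
        status = "ACTIVE"
    except Exception:
        status = "ERROR"
    plugin_queue.put({"id": msg["id"], "status": status})


def plugin_handle_result():
    """Step 6: the Plugin records the final status in the DB."""
    msg = plugin_queue.get()
    db[msg["id"]]["status"] = msg["status"]
```

Because the record is written in step 2, before the reply in step 4, the
object already exists in the DB by the time the user sees the 202 response.
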
If a crash occurs on 1) or inside 2), the object will be lost, but the user
will know about it since this happens during request processing. If the
crash happens on 3-6, the object remains in a "PENDING_" state and the user
will have to decide how long to keep such objects. The worst situation is a
failure on 6), because the change has been applied to the device but is not
reflected in the DB. In both of the last two cases the user will want to
clean up, and here we need not only to remove objects in a PENDING_ state
from the DB, but also to send a command to the device.
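
A cleanup pass over objects stuck in a PENDING_ state might look like the
sketch below. It assumes each record carries an "updated_at" timestamp, and
the names cleanup_stale and send_to_device are hypothetical; none of this is
taken from the LBaaS code.

```python
import time


def cleanup_stale(db, send_to_device, max_age_seconds, now=None):
    """Remove objects stuck in a PENDING_ state longer than max_age_seconds.

    db maps object id -> {"status": ..., "updated_at": epoch seconds}.
    send_to_device is a hypothetical callable that asks the device to drop
    any configuration that may have been applied for the object.
    """
    now = time.time() if now is None else now
    removed = []
    for obj_id, obj in list(db.items()):
        stale = now - obj["updated_at"] > max_age_seconds
        if obj["status"].startswith("PENDING_") and stale:
            send_to_device({"op": "delete", "id": obj_id})  # device first
            del db[obj_id]                                  # then the DB
            removed.append(obj_id)
    return removed
```

The device is contacted before the record is deleted, so a crash between the
two actions leaves a record behind that a later pass can retry.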

Thanks,
Ilya


> Thanks
>
> Youcef
>
> *From:* Ilya Shakhat [mailto:ishakhat at mirantis.com]
> *Sent:* Tuesday, November 13, 2012 11:45 AM
> *To:* OpenStack Development Mailing List
> *Subject:* Re: [openstack-dev] [Quantum][LBaaS] Architecture: Agents,
> Drivers, async calls
>
> Some reasons why the LBaaS core should be responsible for processing
> requests asynchronously:
>
>    - Driver code will be as simple as possible; in most cases it will
>    just translate the LBaaS model into a device-specific one.
>    - There will be no dependencies between drivers, and user requests will
>    take approximately the same time for different drivers. This avoids the
>    case where some driver takes too much time to apply config synchronously
>    and blocks other requests.
>    - The REST API is already asynchronous.
>
> To summarize what Eugene proposed, LBaaS will consist of (see diagram
> http://wiki.openstack.org/Quantum/LBaaS/Architecture?action=AttachFile&do=view&target=lbaas_architecture_new.png
> ):
>
>    - Extension - the front end of the service.
>    - Plugin - responsible for request processing, persistence and core
>    functionality (scheduling). All operations may be thought of as atomic
>    and quick. They are done synchronously.
>    - Agent - responsible for executing commands on specific devices with
>    the help of drivers. It gets requests from the Plugin via MQ and
>    processes them asynchronously.
>    - Driver - translates from the unified API to the vendor-specific one.
>    Its work may be time-consuming.
>
> Among all operations, update looks the most complicated; the case of 2
> concurrent updates and its workflow is shown at
> http://wiki.openstack.org/Quantum/LBaaS/Architecture/ConcurrentRequests.
>
> Thanks,
>
> Ilya
>
> 2012/11/13 Youcef Laribi <Youcef.Laribi at eu.citrix.com>
>
> Eugene,
>
> Another way to look at the workflow is to make sure that the LBaaS Plugin
> updates the database synchronously (and generates the resource ID before
> returning to the user), and then leave it to the driver implementations to
> decide whether they want to handle the call synchronously or asynchronously.
>
> I like drawing pictures to avoid misunderstandings, so here are 2 pictures
> illustrating 2 vendors, one deciding to implement their driver in a
> synchronous way, and the other one in an asynchronous way.
>
> Synchronous driver implementation:
> http://wiki.openstack.org/Quantum/LBaaS?action=AttachFile&do=view&target=LBaaS+synchronous+driver+implementation.png
>
> Asynchronous driver implementation:
> http://wiki.openstack.org/Quantum/LBaaS?action=AttachFile&do=view&target=LBaaS+asynchronous+driver+implementation.png
>
> As far as the plugin is concerned, the calls to drivers are always
> synchronous in the sense that the driver doesn’t have to deal with queues,
> etc. The plugin should expect the driver to return either a “COMPLETED”
> status (meaning the call has been executed on the device), or a “PENDING”
> status (meaning that the driver has started the operation but it is not
> complete). The plugin updates the database in both cases with the outcome
> of the call, and returns the result to the user.
>
> This would allow a lot of freedom in driver implementations. A vendor can
> start with a synchronous implementation because it is quick to implement,
> and then later on move on to an asynchronous implementation without
> impacting the LBaaS plugin. Or it can implement some calls synchronously
> while other calls (which might take a long time to complete)
> asynchronously. You can also have different vendors using different driver
> strategies with respect to synchronicity, or using different queuing
> mechanisms.
>
> Thanks
>
> Youcef
>
> *From:* Eugene Nikanorov [mailto:enikanorov at mirantis.com]
> *Sent:* Monday, November 12, 2012 6:24 AM
> *To:* openstack-dev at lists.openstack.org
> *Subject:* [openstack-dev] [Quantum][LBaaS] Architecture: Agents,
> Drivers, async calls
>
> Hi folks,
>
> In the latest meeting we mentioned several important architectural
> points, including:
>
> - agents vs. direct driver calls
>
> - asynchronous execution
>
> - dispatching a generic REST call to the proper driver.
>
> I would like to present how the Mirantis team sees this, based on our
> previous experience with LBaaS and other OpenStack components.
>
> We need asynchronous execution in the sense that the client gets an
> immediate response while the actual device configuration happens later.
>
> The workflow of such an operation could look like the following:
>
> 1) the client makes a REST call and receives the object it has
> created/modified with PENDING status
>
> 2) the call is dispatched to a plugin; the plugin creates/modifies/etc. the
> object in the database
>
> 3) the plugin calls the driver to apply the new configuration to a specific
> device
>
> 4) the driver finishes applying the configuration; the plugin updates the
> DB object
>
> 5) the client polls by object ID and gets the final status of the operation.
>
> Now, depending on the approach we take, (3) could expand into a different
> sequence of operations.
>
> One good option could be to use an agent between the plugin and the
> drivers. In this case (3) expands to:
>
> 3.1. the plugin posts a message to the MQ
>
> 3.2. the message is consumed by one of the running service agents
>
> 3.3. the agent calls the corresponding driver directly, in a synchronous way
>
> 3.4. the agent posts a message upon completion
>
> 3.5. the plugin consumes the message and updates the DB object with the
> final status
>
> Such an approach solves at least two potential problems:
>
> 1. The plugin may be simplified, since it is not required to implement
> call/work item queuing.
>
> 2. Applying device configuration is a time-consuming task which could take
> seconds.
>
> Both the plugin and the agent have a thread limit for concurrent operations.
>
> Handling heavy workloads in large deployments will be simple with several
> agents consuming messages from the MQ.
>
> This also allows synchronous drivers to be created, since asynchrony will
> be handled by the MQ + agent.
>
> Another option could be calling drivers directly, without any asynchrony at
> all, while preserving the above workflow (1-5).
>
> That could work as a quick temporary solution, while still allowing it to
> be split into the "plugin + agent" approach relatively easily.
>
> Regarding the dispatching of REST calls to the proper driver:
>
> In fact, the VIP object should contain a reference to the particular device
> it is created on.
>
> http://wiki.openstack.org/LBaaS/CoreResourceModel/proposal misses that
> device management part; I think it was just implied there.
>
> Every balancer-related object references the VIP and hence references the
> specific device where it was created.
>
> E.g., when a call for any object is made, the plugin needs to extract the
> device type from the DB by following those references, and later the plugin
> or agent will use it to call the particular driver.
>
> Thanks,
>
> Eugene.
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>