[openstack-dev] [Quantum][LBaaS] Architecture: Agents, Drivers, async calls

Ilya Shakhat ishakhat at mirantis.com
Fri Nov 16 15:11:06 UTC 2012


I agree with both ideas: move objects stuck in a PENDING_* status to ERROR
after a timeout, and allow users to delete such objects.
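
A minimal sketch of that timeout rule, for illustration only. The names
LbObject, expire_stuck_objects, can_delete and the 300-second default are
assumptions made up for this example, not existing Quantum code:

```python
import time
from dataclasses import dataclass, field

PENDING_TIMEOUT_SECONDS = 300  # assumed value; the thread only says "configurable"


@dataclass
class LbObject:
    # Hypothetical stand-in for any LBaaS DB record (VIP, pool, ...).
    id: str
    status: str  # e.g. "PENDING_CREATE", "ACTIVE", "ERROR"
    updated_at: float = field(default_factory=time.time)


def expire_stuck_objects(objects, now=None, timeout=PENDING_TIMEOUT_SECONDS):
    """Move objects that stayed in PENDING_* longer than `timeout` to ERROR."""
    now = time.time() if now is None else now
    expired = []
    for obj in objects:
        if obj.status.startswith("PENDING_") and now - obj.updated_at > timeout:
            obj.status = "ERROR"
            obj.updated_at = now
            expired.append(obj.id)
    return expired


def can_delete(obj):
    """Refuse deletion only while an operation is still pending."""
    return not obj.status.startswith("PENDING_")
```

With this rule a stuck object becomes deletable once the sweep has moved it
to ERROR; the delete itself would still have to clean up both the DB record
and the device.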

Ilya

2012/11/16 Youcef Laribi <Youcef.Laribi at eu.citrix.com>

> Ilya,
>
> Thanks for your detailed reply. On the Agent process: yes, if it runs on
> the same host as the Quantum service, and we assume that all drivers can
> configure their devices remotely, then I think this would be best. Thanks
> for clarifying this point.
>
> On the workflow of operations, your diagram is very good; it is me who
> somehow misread it, apologies, I must have been half-asleep :) To continue
> on your description of how to handle the various crash conditions: if an
> object is stuck in a PENDING_* status, do we need to allow the user to
> delete it? Currently all operations would be refused while the object is
> in a PENDING_* status. Or do we need to transform an object into an ERROR
> status if it stays stuck in a PENDING_* status for a configurable time
> period? In the same manner, the user would need some way to clean up
> objects that are in ERROR status, so shall we allow users to delete
> objects that are in ERROR status (both from the DB and from the device)?
>
> Youcef
>
> *From:* Ilya Shakhat [mailto:ishakhat at mirantis.com]
> *Sent:* Thursday, November 15, 2012 7:15 AM
>
> *To:* OpenStack Development Mailing List
> *Subject:* Re: [openstack-dev] [Quantum][LBaaS] Architecture: Agents,
> Drivers, async calls
>
> Youcef,
>
> Please see comments inline.
>
> 2012/11/15 Youcef Laribi <Youcef.Laribi at eu.citrix.com>
>
> Ilya,
>
> Both designs are valid (whether the LBaaS plugin implements a uniform
> asynchronous model, or whether to leave it to each vendor to decide whether
> their driver is synchronous or not), and I’m fine with choosing the one you
> described. I have a few questions on it.
>
> In that setup, I’m assuming we will have 2 processes: the Quantum service
> process and the Agent process. The two communicate through message queues.
> No vendor-specific code will be in the Quantum service process. The Agent
> process is the one that has all the vendors' drivers loaded in.
>
> Yes. That's right.
>
> Where are you planning for the agent process to run? On the same host as
> the Quantum service process? On a separate host? On the device? Imagine in
> the same agent, you have a case where one driver assumes it is on the same
> machine as the device (HA-Proxy?), while another driver uses a remote
> protocol to configure the device (the agent cannot run on the device as the
> device is a sealed appliance), where does the agent run in this case?
>
> The agent process may run on the same host as the Quantum service. In small
> setups, one agent process per cloud should be enough. In large setups there
> may be several agents, each processing messages. The agent is not needed on
> the devices, since it controls them remotely via their management
> interfaces.
>
> The other question I have is on the LBaaS plugin workflow that you
> described in this diagram:
> http://wiki.openstack.org/Quantum/LBaaS/Architecture/ConcurrentRequests. Here
> it seems that we are returning a response to the user even before we update
> the database. So, this could mean that if the Quantum service crashes
> before updating the database, we would have lost all details of the
> request, even though we have returned a resource ID, and a “PENDING_CREATE”
> status to the user. So, when the user queries the service (after it has
> restarted) to get the status of the creation, the resource wouldn’t even
> exist since it has never been saved to the DB. What do you think of
> updating the DB before returning the response to the user?
>
> Well... (maybe I'm not so good at diagram drawing). For the case of object
> creation, I see the following workflow:
>
>    1. The request is accepted by the Plugin and validated. If it is not
>    valid, an error is returned.
>    2. The record is added to the DB with status "PENDING_CREATE".
>    3. The message is pushed into the queue.
>    4. The Plugin responds to the user with an HTTP 202 reply. Steps 1-4 are
>    done synchronously.
>    5. The message is processed by the Agent, driver and device. The result
>    is pushed into the Plugin's queue.
>    6. The Plugin retrieves the result and updates the DB with either
>    "ACTIVE" or "ERROR" status.
>
> If a crash occurs on 1) or inside 2), the object will be lost, but the user
> will know about this since it happens during request processing. If the
> crash happens on 3-6, the object remains in a "PENDING_" state and the user
> will have to decide how long to keep such objects. The worst situation is
> when the failure happens on 6), because the change is applied to the
> device but not reflected in the DB. In both of the latter cases the user
> will want to clean up, and here we need not only to remove objects in a
> PENDING_ state from the DB, but also to send a command to the device.
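
The six steps above can be sketched roughly as follows. This is an
illustration only: in-memory queue.Queue objects stand in for the message
queue, a dict stands in for the DB, and every name here (plugin_create,
agent_process_one, plugin_process_result, the "name" field) is hypothetical
rather than actual Quantum code.

```python
import queue
import uuid

db = {}                    # stands in for the Quantum database
to_agent = queue.Queue()   # Plugin -> Agent messages
to_plugin = queue.Queue()  # Agent -> Plugin results


def plugin_create(request):
    """Steps 1-4, done synchronously inside the API call."""
    if "name" not in request:                        # 1. validate
        return 400, {"error": "name is required"}
    obj_id = str(uuid.uuid4())
    db[obj_id] = dict(request, id=obj_id, status="PENDING_CREATE")  # 2. persist
    to_agent.put({"op": "create", "id": obj_id})     # 3. enqueue
    return 202, dict(db[obj_id])                     # 4. reply with HTTP 202


def agent_process_one(driver):
    """Step 5: the Agent hands one message to a driver and reports back."""
    msg = to_agent.get()
    try:
        driver(msg)          # may be slow; runs outside the API call
        status = "ACTIVE"
    except Exception:
        status = "ERROR"
    to_plugin.put({"id": msg["id"], "status": status})


def plugin_process_result():
    """Step 6: the Plugin records the final status in the DB."""
    msg = to_plugin.get()
    db[msg["id"]]["status"] = msg["status"]
```

If the process dies between steps 3 and 6, the record simply stays in
PENDING_CREATE, which is the stuck state discussed above.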
>
> Thanks,
>
> Ilya
>
> Thanks
>
> Youcef
>
> *From:* Ilya Shakhat [mailto:ishakhat at mirantis.com]
> *Sent:* Tuesday, November 13, 2012 11:45 AM
> *To:* OpenStack Development Mailing List
> *Subject:* Re: [openstack-dev] [Quantum][LBaaS] Architecture: Agents,
> Drivers, async calls
>
> Some reasons why the LBaaS core should be responsible for processing
> requests asynchronously:
>
>    - Driver code will be as simple as possible; in most cases it will
>    just translate the LBaaS model into a device-specific one.
>    - There will be no dependencies between drivers, and user requests will
>    take approximately the same time for different drivers. This avoids the
>    case where some driver takes too long to apply a config synchronously
>    and blocks other requests.
>    - The REST API is already asynchronous.
>
> To summarize what Eugene proposed, LBaaS will consist of (see diagram
> http://wiki.openstack.org/Quantum/LBaaS/Architecture?action=AttachFile&do=view&target=lbaas_architecture_new.png
> ):
>
>    - Extension - the front end of the service.
>    - Plugin - responsible for request processing, persistence and core
>    functionality (scheduling). All its operations may be thought of as
>    atomic and quick. They are done synchronously.
>    - Agent - responsible for executing commands on specific devices with
>    the help of drivers. It gets requests from the Plugin via MQ and
>    processes them asynchronously.
>    - Driver - translates from the unified API to the vendor-specific one.
>    Its work may be time-consuming.
>
> Among all operations, update looks the most complicated; the workflow for
> the case of 2 concurrent updates is shown at
> http://wiki.openstack.org/Quantum/LBaaS/Architecture/ConcurrentRequests.
>
> Thanks,
>
> Ilya
>
> 2012/11/13 Youcef Laribi <Youcef.Laribi at eu.citrix.com>
>
> Eugene,
>
> Another way to look at the workflow is to make sure that the LBaaS Plugin
> updates the database synchronously (and generates the resource ID before
> returning to the user), and then leave it to the driver implementations to
> decide whether they want to handle the call synchronously or asynchronously.
>
> I like drawing pictures to avoid misunderstandings, so here are 2 pictures
> illustrating 2 vendors, one deciding to implement their driver in a
> synchronous way, and the other one in an asynchronous way.
>
> Synchronous driver implementation:
> http://wiki.openstack.org/Quantum/LBaaS?action=AttachFile&do=view&target=LBaaS+synchronous+driver+implementation.png
>
> Asynchronous driver implementation:
> http://wiki.openstack.org/Quantum/LBaaS?action=AttachFile&do=view&target=LBaaS+asynchronous+driver+implementation.png
>
> As far as the plugin is concerned, the calls to drivers are always
> synchronous in the sense that the driver doesn’t have to deal with queues,
> etc. The plugin should expect the driver to return either a “COMPLETED”
> status (meaning the call has been executed on the device), or a “PENDING”
> status (meaning that the driver has started the operation but it is not
> complete). The plugin updates the database in both cases with the outcome
> of the call, and returns the result to the user.
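
A rough sketch of the driver contract described above, for illustration
only. The DriverResult enum, the two example drivers and plugin_apply are
hypothetical names; the only part taken from the proposal is that a driver
call returns either COMPLETED or PENDING and the plugin stores that outcome.

```python
from enum import Enum


class DriverResult(Enum):
    COMPLETED = "COMPLETED"  # the call has been executed on the device
    PENDING = "PENDING"      # the driver started the work but has not finished


def sync_driver(config):
    # A synchronous driver would configure the device here, then return.
    return DriverResult.COMPLETED


def async_driver(config):
    # An asynchronous driver would hand the work to its own queue or
    # thread (not shown) and return immediately.
    return DriverResult.PENDING


def plugin_apply(db, obj_id, config, driver):
    """Call the driver, record the outcome, and return the stored record."""
    result = driver(config)
    db[obj_id] = {"config": config, "status": result.value}
    return db[obj_id]
```

The plugin code is the same for both drivers, which is the point of the
proposal: a vendor can switch strategies without the plugin changing.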
>
> This would allow a lot of freedom in driver implementations. A vendor can
> start with a synchronous implementation because it is quick to implement,
> and then later on move on to an asynchronous implementation without
> impacting the LBaaS plugin. Or it can implement some calls synchronously
> while other calls (which might take a long time to complete)
> asynchronously. You can also have different vendors using different driver
> strategies with respect to synchronicity, or using different queuing
> mechanisms.
>
> Thanks
>
> Youcef
>
> *From:* Eugene Nikanorov [mailto:enikanorov at mirantis.com]
> *Sent:* Monday, November 12, 2012 6:24 AM
> *To:* openstack-dev at lists.openstack.org
> *Subject:* [openstack-dev] [Quantum][LBaaS] Architecture: Agents,
> Drivers, async calls
>
> Hi folks,
>
> In the latest meeting we mentioned several important architectural
> points, including:
>
> - agents vs. direct driver calls
>
> - asynchronous execution
>
> - dispatching a generic REST call to the proper driver.
>
> I would like to present how the Mirantis team sees this, based on our
> previous experience with LBaaS and other OpenStack components.
>
> We need asynchronous execution in the sense that the client gets an
> immediate response while the actual device configuration happens later.
>
> The workflow of such an operation could look like the following:
>
> 1) client makes a REST call; receives the object it has created/modified
> with PENDING status
>
> 2) call is dispatched to a plugin; the plugin creates/modifies/etc. an
> object in the database
>
> 3) plugin calls the driver to apply the new configuration to a specific
> device
>
> 4) driver finishes applying the configuration; plugin updates the DB object
>
> 5) client polls by object ID and gets the final status of the operation.
>
>
> Now, depending on the approach we take, (3) could expand into a different
> sequence of operations.
>
> One good option could be using an agent between the plugin and the
> drivers. In this case (3) expands to:
>
> 3.1. plugin posts a message to the mq
>
> 3.2. message is consumed by one of the running service agents
>
> 3.3. agent calls the corresponding driver directly, in a synchronous way
>
> 3.4. agent posts a message upon completion
>
> 3.5. plugin consumes the message and updates the DB object with the final
> status
>
>
> Such an approach solves at least two potential problems:
>
> 1. The plugin may be simplified, since it is not required to implement
> call/work item queuing.
>
> 2. Applying device configuration is a time-consuming task which could take
> seconds.
>
> Both the plugin and the agent have a thread limit for concurrent operations.
>
> Handling heavy workloads in large deployments will be simple, with several
> agents consuming messages from the mq.
>
> This also allows synchronous drivers to be written, since asynchrony will
> be handled by mq + agent.
>
> Another option could be calling drivers directly, without any asynchrony at
> all, while preserving the above workflow (1-5).
>
> That could work as a fast temporary solution while still allowing it to be
> split into the "plugin + agent" approach relatively easily.
>
> Regarding dispatching REST calls to the proper driver:
>
> In fact, the VIP object should contain a reference to the particular device
> it is created on.
>
> http://wiki.openstack.org/LBaaS/CoreResourceModel/proposal misses that
> device management part; I think it was just implied there.
>
> Every balancer-related object references the VIP and hence references the
> specific device where it was created.
>
> E.g., when a call for any object is made, the plugin needs to extract the
> device type from the DB by following those references, and later the plugin
> or agent will use it to call the particular driver.
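
A small sketch of that lookup chain, for illustration only. The tables, the
"pool" object, the device types and the driver callables are all made-up
examples; the thread only states that every object references a VIP, the
VIP references a device, and the device type selects the driver.

```python
# Hypothetical in-memory tables standing in for DB rows.
DEVICES = {"dev-1": {"type": "haproxy"}, "dev-2": {"type": "vendor_x"}}
VIPS = {"vip-1": {"device_id": "dev-1"}, "vip-2": {"device_id": "dev-2"}}
POOLS = {"pool-1": {"vip_id": "vip-2"}}

# One driver per device type; real drivers would talk to the device.
DRIVERS = {
    "haproxy": lambda op, obj_id: "haproxy:%s:%s" % (op, obj_id),
    "vendor_x": lambda op, obj_id: "vendor_x:%s:%s" % (op, obj_id),
}


def device_type_for_vip(vip_id):
    """VIP -> device -> device type."""
    return DEVICES[VIPS[vip_id]["device_id"]]["type"]


def device_type_for_pool(pool_id):
    """Any other object reaches its device through the VIP it references."""
    return device_type_for_vip(POOLS[pool_id]["vip_id"])


def dispatch(op, pool_id):
    """Pick the driver by device type and pass the call to it."""
    driver = DRIVERS[device_type_for_pool(pool_id)]
    return driver(op, pool_id)
```

Here dispatch("update", "pool-1") follows pool-1 to vip-2 to dev-2 and so
ends up in the vendor_x driver.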
>
> Thanks,
>
> Eugene.
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev