[openstack-dev] [tricircle] multiple cascade services

Zhipeng Huang zhipengh512 at gmail.com
Mon Aug 31 02:37:31 UTC 2015


Hi Joe,

I think you misunderstood what Eran proposed.

Eran proposed the "single service/multi-fake-node" scheme not only to enforce
state-sync (which is what ZooKeeper is usually used for), but also to enforce
the execution *order*.

This means that even if we implement it the way the PoC did (multiple
services, one service per bottom node), we still need another upper layer
that provides an ordered view of those cascade services.

I think what Eran proposed is to make Tricircle an independent service, as we
envisioned. Tricircle would then represent one cascade service, presenting a
state-synced, order-preserved view of the bottom OpenStack instances to the
Top via one set of API or RPC call interfaces.

When you deploy Tricircle, like any other OpenStack service, you run the
necessary processes. Fake nodes would be spawned like any other processes,
and there are available techniques to keep these fake nodes synced/ordered
in an active/passive arrangement. These are, as Eran mentioned,
implementation details.
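
For instance, one such technique is a ZooKeeper leader election. A minimal
sketch with the kazoo Python client (the ZooKeeper path and the
run_fake_node function are made up for illustration, not actual Tricircle
code):

    from kazoo.client import KazooClient

    def run_fake_node(site):
        # Placeholder: a real cascade service would run the fake node's
        # request-forwarding loop for `site` here; leadership for the
        # site is held for as long as this function keeps running.
        pass

    zk = KazooClient(hosts='zk1:2181,zk2:2181,zk3:2181')
    zk.start()

    # Every cascade service process joins the same election; ZooKeeper
    # guarantees exactly one active leader per site, and if the leader
    # dies, a passive peer's run() callback fires and takes over.
    election = zk.Election('/tricircle/fake-node/site-a', identifier='svc-1')
    election.run(run_fake_node, 'site-a')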

In essence, the fake nodes are just like the multiple cascade services
running in parallel in the PoC design. However, to make Tricircle more like
a standard OpenStack service, and to cooperate better with Mistral on task
ordering, it would be a good idea to have Tricircle provide a single
abstract interface at the top and run the fake node processes inside.

My 2 cents, not sure if I got it all right :)




On Sat, Aug 29, 2015 at 9:42 AM, joehuang <joehuang at huawei.com> wrote:

> Hi,
>
>
>
> I think you may have some misunderstanding of the PoC design (the proxy
> node only listens for the RPC to the compute-node/cinder-volume/L2/L3 agent…).
>
>
>
> 1)      The cascading layer, including the proxy nodes, is assumed to run
> in VMs rather than on physical servers (though you can do that). Even in
> the CJK (China, Japan, Korea) intercloud, the cascading layer, including
> the API, message bus, DB, and proxy nodes, runs in VMs.
>
>
>
> 2)      With proxy nodes running in VMs, it is not unusual for multiple
> proxy nodes to run on one physical server. If the load on one proxy node
> increases, it's easy to move the VM from one physical server to another;
> this is quite mature technology, easy to monitor and to deal with. Most
> virtualization platforms also support hot scale-up of a single virtual
> machine.
>
>
>
> 3)      ZooKeeper is already used in some scenarios to manage the proxy
> node roles and membership, and a backup node will take over the
> responsibility of a failed node.
>
>
>
> So I do not see what extra benefit the "fake node" mode brings. On the
> other hand, the "fake node" adds complexity:
>
>
>
> 1 ) the complexity of the code in the cascade service, which must
> implement both the RPC to the scheduler and the RPC to the compute
> node/cinder volume.
>
>
>
> 2 ) how to judge the load of a "fake node". If all "fake nodes" run flat
> (no dedicated process or thread, just a symbol) inside the same process,
> how can you judge the load of a "fake node"? By message count? But
> message count does not imply load. Load is usually measured through CPU
> utilization and memory occupancy, so how do you calculate the load of
> each "fake node" and then decide which nodes to move to another physical
> server? And how do you manage these "fake nodes" in ZooKeeper-like
> cluster ware? You may want to run each fake node in its own process or
> thread space, but then you need to manage the fake-node-to-process/thread
> relationship.
>
>
>
> I admit that proposal 3 is much more complex to make work with flexible
> load balancing. We have to record a relative stamp for each message in
> the queue, pick the message off the message bus, put it into a per-site
> task queue in the DB, and then execute the tasks in order.
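>
> (A rough, purely illustrative sketch of that per-site ordering idea, with
> an in-memory heap standing in for the DB-backed task queue; all names
> here are made up:)
>
>     import heapq, itertools
>
>     site_queues = {}          # per site: a heap of (stamp, seq, task)
>     _seq = itertools.count()  # tie-breaker so tasks never get compared
>
>     def enqueue(site, stamp, task):
>         # `stamp` is the relative stamp recorded for the message when
>         # it was picked off the message bus.
>         heapq.heappush(site_queues.setdefault(site, []),
>                        (stamp, next(_seq), task))
>
>     def drain(site):
>         # Execute one site's tasks strictly in stamp order; different
>         # sites can be drained independently, in parallel.
>         while site_queues.get(site):
>             stamp, _seq_no, task = heapq.heappop(site_queues[site])
>             task()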
>
>
>
> As described above, proposal 2 does not bring extra benefit, and if we
> don't want to strive for the 3rd direction, we'd better fall back to
> proposal 1.
>
>
>
> Best Regards
>
> Chaoyi Huang ( Joe Huang )
>
>
>
> *From:* eran at gampel.co.il [mailto:eran at gampel.co.il] *On Behalf Of *Eran
> Gampel
> *Sent:* Thursday, August 27, 2015 7:07 PM
> *To:* joehuang; Irena Berezovsky; Eshed Gal-Or; Ayal Baron; OpenStack
> Development Mailing List (not for usage questions); caizhiyuan (A); Saggi
> Mizrahi; Orran Krieger; Gal Sagie; Zhipeng Huang
> *Subject:* Re: [openstack-dev][tricircle] multiple cascade services
>
>
>
> Hi,
>
> Please see my comments inline
>
> BR,
>
> Eran
>
>
>
> Hello,
>
>
>
> As we discussed in yesterday's meeting, the point of contention is how to
> scale out the cascade services.
>
>
>
> 1)      In the PoC, one proxy node only forwards to one bottom OpenStack.
> The proxy node is added to a corresponding AZ, and multiple proxy nodes
> for one bottom OpenStack are feasible by adding more proxy nodes into
> this AZ; the proxy nodes are then scheduled as usual.
>
>
>
> Is this perfect? No. Because a VM's host attribute is bound to a specific
> proxy node, these multiple proxy nodes can't work in cluster mode, and
> each proxy node has to be backed up by one slave node.
>
>
>
> [Eran] I agree with this point - in the PoC you had a limitation of a
> single active proxy per bottom site. In addition, each proxy could only
> support a single bottom site by design.
>
>
>
> 2)      The fake node introduced in the cascade service.
>
> Because a fanout RPC call for the Neutron API is assumed, multiple fake
> nodes for one bottom OpenStack are not allowed.
>
>
>
> [Eran] In fact, this is not a limitation in the current design. We could
> have multiple "fake nodes" handling the same bottom site, but only one
> that is active. If this active node becomes unavailable, one of the other
> "passive" nodes can take over with leader election or any other known
> design pattern (it's an implementation decision).
>
> And because the traffic to one bottom OpenStack is unpredictable, and
> moving these fake nodes dynamically among cascade services is very
> complicated, we can't deploy multiple fake nodes in one cascade service.
>
>
>
> [Eran] I'm not sure I follow you on this point... as we see it, there are
> 3 places where load is an issue (and a potential bottleneck):
>
> 1. API + message queue + database
>
> 2. Cascading Service itself (dependency builder, communication service,
> DAL)
>
> 3. Task execution
>
>
>
> I think you were concerned about #2, which in our design must be
> single-active per bottom site (to maintain the order of task execution).
>
> In our opinion, the heaviest part is actually #3 (task execution), which
> is delegated to a separate execution path (Mistral workflow or otherwise).
>
> If one Cascading Service handles multiple Bottom sites and at some point
> in time we wish to keep just one Bottom site and move the rest of them to
> a different Cascading Service instance, that is possible.
>
> The way we see it, we have multiple Fake Nodes running in multiple
> Cascading Services, in active-passive mode. That way, when one Cascading
> Service instance becomes overloaded, it can give up its "leadership" of
> active fake nodes, and some of the other Cascading Services will take
> over (through leader election, or otherwise). This is a very common
> design pattern; we don't see anything special or complicated here.
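>
> (With a ZooKeeper-style election recipe, "giving up leadership" can be as
> simple as the leader returning from its leader callback; a tiny
> hypothetical sketch, with the load check and work loop passed in as
> functions:)
>
>     def run_fake_node(site, is_overloaded, handle_next_request):
>         # Runs while this cascade service holds leadership for `site`
>         # (e.g. inside a ZooKeeper election's leader callback).
>         while not is_overloaded():
>             handle_next_request(site)
>         # Returning relinquishes leadership, so a passive peer in
>         # another Cascading Service wins the election and takes over.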
>
>
>
> In the end, we have to deploy one fake node per cascade service.
>
> And with one cascade service per bottom OpenStack, the burst traffic
> toward one bottom OpenStack is limited by that single cascade service.
>
> And you have to back up the cascade service.
>
>
>
> [Eran] This is correct. In the worst case of a traffic burst to a single
> bottom site, a single Cascading Service will handle a single Fake Node
> exclusively, and it is not possible to serve a single Bottom Site with
> more than a single Fake Node at any given time.
>
> Having said that, we don't see a scenario where the Fake Node / Cascading
> Service becomes a bottleneck. We think that #3 (task execution) and #1
> (message queue, API and database) will choke first, probably because the
> OpenStack components in the Top and Bottom sites will not be able to
> handle the burst (which is a completely different story).
>
>
>
> 3)      From the beginning, I have preferred running multiple cascade
> services in parallel, all of them working in load-balanced cluster mode.
>
>
>
> [Eran] I believe we already discussed this before - it is actually not
> possible.
>
> If you did that, you would have race conditions and mis-ordering of
> actions, and an inconsistent state in the Bottom sites.
>
> For example, if the Top user did:
>
> #1 create security group "111"
>
> #2 update security group "111" with "Allow *"
>
> #3 update security group "111" with "Drop *"
>
>
>
> If you have more than a single Cascading Service responsible for site
> "A", you don't know what the order of actions will be.
>
> In the example I gave, you may end up with site "A" having security group
> "111" with "Allow *" or with "Drop *".
>
> The APIs (Nova, Cinder, Neutron…) call the cascade service through RPC,
> and each RPC call will be forwarded to only one of the cascade services
> (the RPC is simply put on a message bus queue; once one of the cascade
> services picks up the message, it is removed from the queue and will not
> be consumed by another cascade service). When a cascade service receives
> a message, it starts a task to execute the request. If multiple bottom
> OpenStacks are involved, for example for networking, then the networking
> request is forwarded to each of the bottom OpenStacks where the relevant
> resources (VMs, floating IPs) reside.
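>
> (Illustratively, this is the standard "competing consumers" behaviour of
> an oslo.messaging RPC server: every cascade service listens on the same
> topic under a different server name, and a call addressed to the topic
> alone is delivered to exactly one of them. A minimal sketch; the topic
> and endpoint names are made up:)
>
>     from oslo_config import cfg
>     import oslo_messaging as messaging
>
>     class CascadeEndpoint(object):
>         def create_network(self, ctxt, **kwargs):
>             # Hypothetical handler: start a task for the request here.
>             pass
>
>     transport = messaging.get_transport(cfg.CONF)
>     # Same topic on every cascade service; a unique server name on each.
>     target = messaging.Target(topic='tricircle.cascade', server='svc-1')
>     server = messaging.get_rpc_server(transport, target,
>                                       [CascadeEndpoint()],
>                                       executor='blocking')
>     server.start()
>     server.wait()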
>
>
>
> To keep the correct order of operations, every task stores the necessary
> data in the DB so that an operation cannot be broken for a single site
> (if a VM is being created, reboot is not allowed; this kind of check has
> already been done on the API side of Nova, Cinder, Neutron, …).
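>
> (Illustratively, such a guard might look like the sketch below; it is a
> stand-in for the task-state checks those APIs already perform, with a
> hypothetical `db` accessor, not actual Nova/Cinder/Neutron code:)
>
>     class Conflict(Exception):
>         pass
>
>     # Allowed actions per current VM state (illustrative subset).
>     ALLOWED = {
>         'creating': set(),                # e.g. no reboot while creating
>         'active': {'reboot', 'delete'},
>     }
>
>     def start_task(db, vm_id, action):
>         state = db.get_vm_state(vm_id)    # hypothetical DAL call
>         if action not in ALLOWED.get(state, set()):
>             raise Conflict('%s not allowed while VM is %s'
>                            % (action, state))
>         db.record_task(vm_id, action)     # persist so the operation
>                                           # cannot be broken mid-flight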
>
>
>
> [Eran] This will not enforce order - it will only keep state between
> non-racing actions. It will not guarantee consistency in the common
> scenario of multiple updates to a specific resource within a short
> period, as in the security group example I just gave.
>
> Maybe it will work for a few predictable use cases, but there will always
> be something else that you did not plan for.
>
> It is ultimately an unsafe design.
>
> If you propose to make the database the coordinator of this process
> (and I don't see how that would work), you will end up with an even worse
> bottleneck - in the database.
>
>
>
>
>
>
>
> In this way, we can add cascade service nodes on demand and balance the
> traffic dynamically.
>
>
>
>
>
> Best Regards
>
> Chaoyi Huang ( Joe Huang )
>
>
>
>
>



-- 
Zhipeng (Howard) Huang

Standard Engineer
IT Standard & Patent/IT Product Line
Huawei Technologies Co., Ltd.
Email: huangzhipeng at huawei.com
Office: Huawei Industrial Base, Longgang, Shenzhen

(Previous)
Research Assistant
Mobile Ad-Hoc Network Lab, Calit2
University of California, Irvine
Email: zhipengh at uci.edu
Office: Calit2 Building Room 2402

OpenStack, OPNFV, OpenDaylight, OpenCompute Aficionado