[Openstack] Cells use cases

Mike Wilson geekinutah at gmail.com
Thu Oct 3 19:49:04 UTC 2013


Tim,

We currently run a bit more than 20k hypervisors in a single cell. We had
three major problems with getting this large: RPC, DB and scheduler. RPC is
solvable by getting away from the hub-and-spoke topology that brokered
messaging forces you into, i.e. use 0mq. DB was overcome by a combination
of tuning MySQL for OpenStack workloads and shoveling appropriate reads off
to MySQL replicas; see
https://blueprints.launchpad.net/nova/+spec/db-slave-handle and
https://blueprints.launchpad.net/nova/+spec/periodic-tasks-to-db-slave for
some examples of how this works. Scheduler is a problem that we didn't
really solve within OpenStack. Fortunately, the way we use OpenStack
internally makes our scheduling decisions very simple, so we basically
skipped that problem by implementing a simplified scheduler that runs in O(1).
The filter scheduler is the most limiting factor in my opinion. This is
what really keeps folks from going larger than around 1k nodes. Sans
filter_scheduler, I think the realistic upper limit is somewhere around 3k.
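To make the complexity argument concrete, here is a minimal Python sketch. It is neither Nova's filter scheduler nor the simplified scheduler described above: `Host`, `filter_schedule` and `RoundRobinScheduler` are hypothetical, a single RAM/vCPU filter and a single "most free RAM" weigher stand in for the real filter and weigher chains, and round-robin is an assumed stand-in for an O(1) policy that only works when hosts are interchangeable.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical view of one hypervisor's free capacity."""
    name: str
    free_ram_mb: int
    free_vcpus: int


def filter_schedule(hosts, ram_mb, vcpus):
    """Filter-then-weigh, in the spirit of the filter scheduler.

    Every request scans every host, so the cost is O(n) in the number of
    hypervisors.
    """
    # Filter step: keep only hosts with enough RAM and vCPUs.
    candidates = [h for h in hosts
                  if h.free_ram_mb >= ram_mb and h.free_vcpus >= vcpus]
    if not candidates:
        return None
    # Weigh step: a single hypothetical weigher preferring the most free RAM.
    return max(candidates, key=lambda h: h.free_ram_mb)


class RoundRobinScheduler:
    """Hypothetical O(1) scheduler for an interchangeable fleet.

    No per-host filtering: each request takes the host at the head of the
    ring and moves it to the tail.
    """

    def __init__(self, hosts):
        self._ring = deque(hosts)

    def schedule(self):
        host = self._ring[0]
        self._ring.rotate(-1)  # a one-step rotation of a deque is O(1)
        return host


if __name__ == "__main__":
    fleet = [Host("cn1", 2048, 2), Host("cn2", 8192, 8), Host("cn3", 4096, 4)]
    print(filter_schedule(fleet, 4096, 2).name)  # cn2
    rr = RoundRobinScheduler(fleet)
    print([rr.schedule().name for _ in range(4)])  # ['cn1', 'cn2', 'cn3', 'cn1']
```

The filter path touches every host on every request, so its cost grows with the fleet; the rotation path does constant work per request but ignores capacity entirely, which is only acceptable when the workload makes scheduling decisions very simple.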

Now that I've said all this, cells does handle these three problems very
nicely by partitioning them all off and coordinating through the API. However,
there are some missing features that I think are not trivial to implement.
I'm also not a fan of how cells decentralizes data and messaging, but I
digress. I feel like much more development needs to be done on it, and I'm
not sure I really like the structure and requirements. I guess my view of
cells is that it's a good way to partition clouds into a hierarchy and
divide failure domains. I just don't think it's the be-all and end-all in
matters of scale in its current state. I'm hopeful that we can flesh this
out a bunch more in Icehouse.
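A minimal sketch of the partitioning idea, assuming hypothetical `Cell` and `ApiCell` classes rather than Nova's actual cells code: the top level only picks a child cell using aggregate capacity, and each child schedules over its own hosts, so no single scheduler has to see the whole cloud.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """Hypothetical child cell. In a real deployment each cell would also own
    its own database, message queue and scheduler; here it only owns hosts."""
    name: str
    hosts: dict = field(default_factory=dict)  # host name -> free RAM (MB)

    def free_ram_mb(self):
        return sum(self.hosts.values())

    def schedule(self, ram_mb):
        # Only this cell's hosts are scanned, so per-request cost scales with
        # the cell's size, not with the size of the whole cloud.
        fits = [h for h, free in self.hosts.items() if free >= ram_mb]
        if not fits:
            return None
        best = max(fits, key=lambda h: self.hosts[h])
        self.hosts[best] -= ram_mb  # claim the capacity
        return best


class ApiCell:
    """Hypothetical top-level cell: it picks a child cell and delegates."""

    def __init__(self, children):
        self.children = children

    def boot(self, ram_mb):
        # Coarse decision at the top: try children with the most aggregate
        # free RAM first, letting each child make the fine-grained choice.
        ranked = sorted(self.children, key=lambda c: c.free_ram_mb(), reverse=True)
        for cell in ranked:
            host = cell.schedule(ram_mb)
            if host is not None:
                return cell.name, host
        return None
```

The top level only needs one aggregate figure per cell, which keeps its work small; the flip side is that instance data and messaging end up spread across the cells.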

-Mike


On Thu, Oct 3, 2013 at 12:41 PM, Tim Bell <Tim.Bell at cern.ch> wrote:

> We’ve got several OpenStack clouds at CERN (details in
> http://openstack-in-production.blogspot.fr/2013/09/a-tale-of-3-openstack-clouds-50000.html
> ).
>
> The CMS farm was the furthest ahead and encountered problems with the
> number of database connections at around 1300 hypervisors. Nova conductor
> helps with some of these.
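A back-of-the-envelope sketch of why nova-conductor helps with connection counts: when every nova-compute talks to MySQL directly, connections scale with the number of hypervisors, whereas in the idealized case where computes make no direct DB calls, only the conductor workers hold database connections. The function names and all per-service connection counts below are hypothetical inputs, not measurements from the CMS farm.

```python
def direct_db_connections(hypervisors, conns_per_compute):
    """Each nova-compute opens its own MySQL connections, so the total grows
    linearly with the number of hypervisors."""
    return hypervisors * conns_per_compute


def conductor_db_connections(conductor_workers, conns_per_worker):
    """Idealized case: computes send their DB requests over RPC to
    nova-conductor, so only the conductor workers hold MySQL connections."""
    return conductor_workers * conns_per_worker


def fits(connections, max_connections):
    """True if the connection count stays within the server's configured limit."""
    return connections <= max_connections


if __name__ == "__main__":
    # All inputs are hypothetical, chosen only to show the shape of the trade-off.
    print(direct_db_connections(1300, 2))    # 2600
    print(conductor_db_connections(16, 5))   # 80
```

The point is the scaling: the first figure grows with every hypervisor added, while the second grows only when more conductor workers are deployed.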
>
> Given that we’re heading towards 15K hypervisors for the central instance
> at CERN, I am not sure a single cell would handle it.
>
> I’d be happy to hear experiences from others in this area.
>
> Belmiro will be giving a summit talk on the deep dive including our
> experiences for those who are able to make it.
>
> Tim
>
> *From:* Joshua Harlow [mailto:harlowja at yahoo-inc.com]
> *Sent:* 03 October 2013 20:32
> *To:* Subbu Allamaraju; Tim Bell
> *Cc:* openstack at lists.openstack.org
> *Subject:* Re: [Openstack] Cells use cases
>
> Hi Tim,
>
> I'd also like to know what happens above 1000 hypervisors that you think
> needs cells?
>
> From experience at Y! we actually start to see nova-scheduler (mainly the
> filter scheduler) become the problem at ~400 hypervisors, and that seems
> addressable without cells (yes, it requires some smart/fast coding that the
> current scheduler is not designed for, but that seems manageable and
> achievable) via reviews like https://review.openstack.org/#/c/46588,
> https://review.openstack.org/#/c/45867 (and others that are popping up).
> The filter scheduler appears to scale linearly with the number of
> hypervisors, and this is problematic since the filter scheduler is also
> single-CPU bound (due to eventlet), so overall that makes for some nice
> suckage. We haven't seen the RPC layer be a problem at our current scale,
> but maybe you guys have hit this. The other issue that starts to happen
> around ~400 is the nova service group code, which is not exactly performant
> when using the DB backend (we haven't tried the ZK backend yet, WIP!) due
> to frequent and repeated DB calls.
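A rough sketch of the DB service group backend's steady-state load, assuming Nova's default `report_interval` of 10 seconds and `service_down_time` of 60 seconds: each service periodically updates its heartbeat row, and liveness is judged by comparing that timestamp with the current time. The function names are hypothetical, and the sketch covers only heartbeat writes, not the other DB calls that add to the load.

```python
import datetime as dt


def heartbeat_writes_per_second(num_services, report_interval_s=10.0):
    """Each service updates its own heartbeat row every report_interval_s
    seconds, so steady-state heartbeat writes are num_services / interval."""
    return num_services / report_interval_s


def service_is_up(last_heartbeat, now, service_down_time_s=60.0):
    """DB-style liveness check: a service counts as up if its last heartbeat
    is no older than service_down_time_s seconds."""
    elapsed = (now - last_heartbeat).total_seconds()
    return abs(elapsed) <= service_down_time_s


if __name__ == "__main__":
    print(heartbeat_writes_per_second(400))  # 40.0
    t0 = dt.datetime(2013, 10, 3, 12, 0, 0)
    print(service_is_up(t0, t0 + dt.timedelta(seconds=30)))  # True
    print(service_is_up(t0, t0 + dt.timedelta(seconds=90)))  # False
```

At 400 hypervisors this already means 40 heartbeat writes per second from compute services alone, before any scheduling traffic.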
>
> It'd be interesting to hear the kinds of limitations you guys hit that cells
> resolved, instead of just fixing the underlying code itself to scale better.
>
> -Josh
>
> *From: *Subbu Allamaraju <subbu at subbu.org>
> *Date: *Thursday, October 3, 2013 10:23 AM
> *To: *Tim Bell <Tim.Bell at cern.ch>
> *Cc: *"openstack at lists.openstack.org" <openstack at lists.openstack.org>
> *Subject: *Re: [Openstack] Cells use cases
>
> Hi Tim,
>
> Can you comment on scalability more? Are you referring to just the RPC
> layer in the control plane?
>
> Subbu
>
> On Oct 3, 2013, at 8:53 AM, Tim Bell <Tim.Bell at cern.ch> wrote:
>
> At CERN, we’re running cells for scalability. When you go over 1000
> hypervisors or so, the general recommendation is to be in a cells
> configuration.
>
> Cells are quite complex and the full functionality is not there yet so
> some parts will need to wait for Havana.
>
> Tim
>
> *From:* Dmitry Ukov [mailto:dukov at mirantis.com]
> *Sent:* 03 October 2013 16:38
> *To:* openstack at lists.openstack.org
> *Subject:* [Openstack] Cells use cases
>
> Hello all,
>
> I'm really interested in cells, but unfortunately I can't find any useful
> use cases for them.
>
> For instance, I have 4 DCs and I need a single entry point for them. In this
> case cells are a somewhat complicated solution. It's better to use multiple
> regions in Keystone instead.
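With multiple regions, a client picks a per-DC endpoint out of the single Keystone service catalog. A minimal sketch, assuming the Keystone v2 catalog layout (a list of services with `type` and `endpoints`, each endpoint carrying `region` and `publicURL`); `endpoint_for`, the region names and the URLs are hypothetical.

```python
def endpoint_for(catalog, service_type, region, interface="publicURL"):
    """Return the endpoint URL for service_type in the given region from a
    Keystone v2-style service catalog, or None if there is no match."""
    for service in catalog:
        if service.get("type") != service_type:
            continue
        for endpoint in service.get("endpoints", []):
            if endpoint.get("region") == region:
                return endpoint.get(interface)
    return None


# Hypothetical catalog: one Keystone, one compute endpoint per data center.
CATALOG = [
    {
        "type": "compute",
        "name": "nova",
        "endpoints": [
            {"region": "dc1", "publicURL": "http://nova.dc1.example.com:8774/v2/TENANT"},
            {"region": "dc2", "publicURL": "http://nova.dc2.example.com:8774/v2/TENANT"},
        ],
    },
]
```

Each region stays a fully independent Nova deployment; Keystone only provides the shared entry point and the catalog the client chooses from.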
>
> The only good reason for cells that I've found is to organize so-called
> failure domains, i.e. scheduling in another DC in case of failures.
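The failure-domain use case boils down to: try the preferred cell (or DC), and if it is unhealthy, place the instance in another one. A minimal sketch, where `pick_cell` and the `is_healthy` callback are hypothetical stand-ins for whatever health or capacity information a parent cell actually has about its children.

```python
def pick_cell(cells, preferred, is_healthy):
    """Return the preferred cell if it is healthy; otherwise the first other
    healthy cell in the given order; otherwise None."""
    if is_healthy(preferred):
        return preferred
    for cell in cells:
        # Skip the failed preferred cell and any other unhealthy ones.
        if cell != preferred and is_healthy(cell):
            return cell
    return None
```

Ordering `cells` by preference (for example, nearest DC first) is the only policy in this sketch; a real parent would also need to weigh remaining capacity.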
>
> Does anyone have different use cases or a different vision of cells usage?
>
> Thanks in advance.
>
> --
> Kind regards
> Dmitry Ukov
> IT Engineer
> Mirantis, Inc.
>
> _______________________________________________
> Mailing list:
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to     : openstack at lists.openstack.org
> Unsubscribe :
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack