[Openstack] Greatest deployment?

Matt Joyce matt.joyce at cloudscaling.com
Wed May 30 17:06:49 UTC 2012


LXC has a few benefits.  As you are likely aware, it is faster than a
traditional hypervisor.  But I'm willing to argue that the price paid for
that benefit makes it largely not worthwhile for the HPC use cases where
OpenStack would see use.

First and foremost, with LXC you immediately lose the ability for your
users to define their own compute environment, or to choose from more than
one pre-existing compute environment template.  That's huge.  That's one of
the primary benefits of cloud in this area over existing technologies.  If
you eliminate that, I have to ask: why virtualize at all?

Secondly, while LXC does provide a lot of native access, it still does
paging management internally just as KVM does.  So direct memory management
(some HPC users like this) becomes just as problematic as it is in KVM.
Lots of overhead.

Third, the generally espoused major benefit of LXC is disk I/O.  If you are
using Lustre or Gluster, I don't see any reason you'd care all that much.
Mind you, I've been unable to find benchmarks on Lustre use under KVM vs
LXC.  My guess is that LXC is faster, but that the difference is probably
negligible when compared to the general cost of either LXC or KVM over a
grid / batch solution.  Additionally, there's always hardware pinning.

And finally, of course, there is the LXC management issue.  LXC has been
known to cause a lot of grief for administrators if they hand out root to
their users.  Since the containers call the kernel directly rather than
working through a hypervisor, they can conflict with each other.  For
instance, if one unloads NFS or something, it can impact other users'
access to NFS.

I guess depending on how you are performing your allocations this last
concern may not matter.  If you are allocating entire systems at a time to
users, then obviously it doesn't.  But that goes back to my first point: if
you are doing that, why not just push a physical image to the box at
allocation time with PXE and call it a day?  And I remind you hardware
pinning can be done in KVM and Xen, providing at least near-native access
to some devices (i.e. network).
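
As a rough illustration of what the hypervisor costs on the network side,
here is a minimal sketch using the figures quoted further down this thread
(about 8-9 Gbit with virtio / jumbo frames via KVM, against hardware at
roughly 10).  The function name and the 10.0 baseline are assumptions made
for illustration; they are not measurements and not part of any OpenStack
tooling.

```python
def overhead_fraction(virtual_gbit: float, native_gbit: float) -> float:
    """Return the fraction of native throughput lost under the hypervisor."""
    if native_gbit <= 0:
        raise ValueError("native throughput must be positive")
    return 1.0 - virtual_gbit / native_gbit


# Assumption: the thread says hardware was "slightly above 10";
# 10.0 is used here as a round baseline.
NATIVE_GBIT = 10.0

if __name__ == "__main__":
    # The two ends of the 8-9 Gbit virtio range quoted in the thread.
    for virtio_gbit in (8.0, 9.0):
        loss = overhead_fraction(virtio_gbit, NATIVE_GBIT)
        print(f"{virtio_gbit:.0f} Gbit virtio: {loss:.0%} below native")
```

On those numbers the virtio path gives up somewhere between a tenth and a
fifth of the link, which is the gap that pinning the device to a VM is
meant to close.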

For the record, I figure more than a couple of folks are avoiding
hypervisors entirely for the time being.  For some loads that certainly
makes sense.  For others, I think it's just a general aversion to overhead
and not really very strategic thinking.

The opinions expressed here are entirely my own.

-Matt


On Tue, May 29, 2012 at 9:31 PM, Michael Chapman <michael.chapman at anu.edu.au> wrote:

> Matt,
>
>> LXC is not a good alternative for several obvious reasons.  So think on
>> all of that.
>
> Could you expand on why you believe LXC is not a good alternative? As an
> HPC provider we're currently weighing up options to get the most we can out
> of our OpenStack deployment performance-wise. In particular we have quite a
> bit of IB, a fairly large Lustre deployment and some GPUs, and are
> seriously considering going down the LXC route to try to avoid wasting all
> of that by putting a hypervisor on top.
>
>  - Michael Chapman
>
>
> On Fri, May 25, 2012 at 1:34 AM, Matt Joyce <matt.joyce at cloudscaling.com> wrote:
>
>> We did some considerable HPC testing when I worked over at NASA Ames with
>> the Nebula project.  So I think we may have been the first to try out
>> OpenStack in an HPC capacity.
>>
>> If you can find Piyush Mehrotra from the NAS division at Ames, ( I'll
>> leave it to you to look him up ) he has comprehensive OpenStack tests from
>> the Bexar days.  He'd probably be willing to share some of that data if
>> there was interest ( assuming he hasn't already ).
>>
>> Several points of interest I think worth mentioning are:
>>
>> I think fundamentally many of the folks who are used to doing HPC work
>> dislike working with hypervisors in general.  The memory management and
>> general I/O latency are something they find to be a bit intolerable.
>> OpenNebula and OpenStack rely on the same sets of open source
>> hypervisors.  In fact, I believe OpenStack supports more.  What they do
>> fundamentally is operate as an orchestration layer on top of the hypervisor
>> layer of the stack.  So in terms of performance you should not see much
>> difference between the two at all.  That said, that ignores the
>> possibility of scheduler customisation and the like.
>>
>> Ultimately, much like Amazon HPC, we ended up handing over VMs to
>> customers that consumed all the resources on a system, thus largely
>> negating the benefit of VMs.  One primary reason for this is that pinning
>> the 10 gig drivers, or InfiniBand if you have it, to a single VM allows
>> for direct pass-through and no hypervisor latency.  We were seeing a
>> maximum throughput on our 10 gigs of about 8-9 Gbit with virtio / jumbo
>> frames via KVM, while hardware was slightly above 10.  Several vendors in
>> the area I have spoken with are engaged in efforts to tie in physical
>> layer provisioning with OpenStack orchestration to bypass the hypervisor
>> entirely.  LXC is not a good alternative for several obvious reasons.  So
>> think on all of that.
>>
>> GPUs are highly specialised.  Depending on your workloads you may not
>> benefit from them.  Again you have the hardware pinning issue in VMs.
>>
>> As far as disk I/O is concerned, large datasets need large disk volumes.
>> Large, non-immutable disk volumes.  So Swift / LAFS go right out the
>> window.  nova-volume has some limitations (or it did at the time): euca
>> tools couldn't handle 1 TB volumes and the API maxed out around 2.  So we
>> had users RAIDing their volumes and asking how to target them to nodes to
>> increase I/O.  This was suboptimal.  Lustre or Gluster would be better
>> options here.  We chose Gluster because we've used Lustre before, and
>> anyone who has knows it's a pain.
>>
>> As for node targeting, users cared about specific families of CPUs.  Many
>> people optimised by CPU and wanted to target Westmeres or Nehalems.  We had
>> no means to do that at the time.
>>
>> Scheduling full instances is somewhat easier so long as all the nodes in
>> your zone are full instance use only.
>>
>> Matt Joyce
>> Now at Cloudscaling
>>
>>
>>
>>
>> On Thu, May 24, 2012 at 5:49 AM, John Paul Walters <jwalters at isi.edu> wrote:
>>
>>> Hi,
>>>
>>> On May 24, 2012, at 5:45 AM, Thierry Carrez wrote:
>>>
>>> >
>>> >
>>> >> OpenNebula also has this advantage, for me, that it's designed to
>>> >> provide a scientific cloud and it's used by a few research centres and
>>> >> even supercomputing centres. How about OpenStack? Has anyone tried to
>>> >> deploy it in a supercomputing environment? Maybe a huge cluster or GPU
>>> >> cluster or any other scientific group is using OpenStack? Is anyone
>>> >> using OpenStack in a scientific environment, or is OpenStack's purpose
>>> >> to create commercial-only clouds (business - large and small companies)?
>>> >
>>> > OpenStack is being used in a number of research clouds, including NeCTAR
>>> > (Australia's national research cloud). There is huge interest around
>>> > bridging the gap there, with companies like Nimbis or Bull being involved.
>>> >
>>> > Hopefully people with more information than I have will comment on this
>>> > thread.
>>> >
>>> >
>>> We're developing GPU, bare metal, and large SMP (think SGI UV) support
>>> for OpenStack and we're targeting HPC/scientific computing workloads.  It's
>>> a work in progress, but we have people using our code and we're talking to
>>> folks about getting our code onto nodes within FutureGrid.  We have GPU
>>> support for LXC right now, and we're working on adding support for other
>>> hypervisors as well.  We're also working on getting the code into shape for
>>> merging upstream, some of which (the bare metal work) has already been
>>> done.  We had an HPC session at the most recent Design Summit, and it was
>>> well-attended with lots of great input.  If there are specific features
>>> that you're looking for, we'd love to hear about it.
>>>
>>> By the way, all of our code is available at
>>> https://github.com/usc-isi/nova, so if you'd like to try it out before
>>> it gets merged upstream, go for it.
>>>
>>> best,
>>> JP
>>>
>>>
>>> _______________________________________________
>>> Mailing list: https://launchpad.net/~openstack
>>> Post to     : openstack at lists.launchpad.net
>>> Unsubscribe : https://launchpad.net/~openstack
>>> More help   : https://help.launchpad.net/ListHelp
>>>
>>
>>
>
>
> --
> Michael Chapman
> *Cloud Computing Services*
> ANU Supercomputer Facility
> Room 318, Leonard Huxley Building (#56), Mills Road
> The Australian National University
> Canberra ACT 0200 Australia
> Tel: *+61 2 6125 7106*
> Web: http://nci.org.au
>
>