[openstack-hpc] Looking for practical Openstack + Ceph guidance for shared HPC

Joshua Dotson josh at knoesis.org
Thu Jul 25 20:06:30 UTC 2013


Hello.

A contingent of my organization, the Kno.e.sis Center @ Wright State
University <http://www.knoesis.org/>, recently received a grant award which
we intend to use to support a handful of mid-size HPC-style workloads (MPI
<-- definitely, GPGPU <-- if possible/plausible) in addition to many
mid-size IaaS-style workloads (MongoDB, Storm, Hadoop, many others).  As a
third layer, I'm playing with the idea of evaluating an elastic OpenShift
Origin atop the same infrastructure.  Approximately $400k to $500k will
hopefully be available for this deployment, though exact numbers are not
yet available to me.

While I'm prepared to build a home-grown, small-to-mid-size "classical" HPC
cluster on modern hardware, plus a smaller home-grown OpenStack silo for the
minority stakeholders, I am hoping to find a way to make proponents of both
workloads simultaneously happy, or close to it.  That is, I would like to
give my computer-scientist users a friendly method of running their
HPC-style jobs on a single, combined, performance-tuned OpenStack silo.
Doing so would let the procured hardware and infrastructure be shared
between those users and the users who just want a Tomcat or a Virtuoso
instance.

I see a number of serious issues in realizing such a goal.  For example,
the state of Infiniband support in OpenStack seems not quite
ready/available/documented/accessible for such use in production, unless
I'm just blind to the right blogs.  The myriad added abstractions, the
latency virtualization might impose on an HPC task, and cloud
software-defined networking (Quantum, especially when it lacks hardware
acceleration) all seem likely to get in the way of practicality, economics
and efficiency.  That said, most of what we do here isn't HPC, so I believe
such trade-offs can be accepted, provided a reasonable job-scheduling and
workload-management mechanism can be found and agreed upon by all
stakeholders, grant-proposal majority (HPC) and minority (IaaS) alike.
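
If it helps frame the discussion, I imagine we would quantify that overhead
by running a simple MPI ping-pong latency test between two instances and
then between two bare-metal nodes.  A minimal sketch with mpi4py (host
names and iteration count below are placeholders, not a benchmark recipe):

    # pingpong.py -- rough one-way latency estimate between two MPI ranks.
    # Run with something like: mpirun -np 2 --host nodeA,nodeB python pingpong.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    buf = np.zeros(1, dtype='b')   # 1-byte message, to isolate latency from bandwidth
    iters = 10000

    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(iters):
        if rank == 0:
            comm.Send(buf, dest=1, tag=0)
            comm.Recv(buf, source=1, tag=0)
        else:
            comm.Recv(buf, source=0, tag=0)
            comm.Send(buf, dest=0, tag=0)
    t1 = MPI.Wtime()

    if rank == 0:
        # Half the average round-trip time approximates the one-way latency.
        print("one-way latency: %.2f us" % ((t1 - t0) / iters / 2 * 1e6))

Comparing that number inside guests versus on bare metal would tell us
fairly quickly how much the virtualization/Quantum path costs our codes.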

I get the impression from my readings that HPC-style deployment (separate
from job queuing) against the EC2 API should work.  I don't have a good
feeling that the experience would be particularly friendly, however,
without paying for closed-source applications.  I'm thinking a
high-performance Ceph install would help bring up the storage end of things
in a modern, open-source, COTS way.  I've not done specific research on
Lustre + OpenStack, but no reports of such a setup have presented
themselves to me, either.
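
To make that first point concrete, my working assumption is that we would
drive Nova's EC2-compatible endpoint with boto and script cluster bring-up
around it.  A rough sketch, with the endpoint, credentials, image ID,
flavor and keypair all placeholders for whatever our deployment would
actually expose:

    # launch_mpi_nodes.py -- boot a batch of instances through the EC2 API.
    from boto.ec2.regioninfo import RegionInfo
    from boto.ec2.connection import EC2Connection

    # Nova's EC2 API traditionally listens on port 8773 at /services/Cloud.
    region = RegionInfo(name="nova", endpoint="cloud.example.org")
    conn = EC2Connection(aws_access_key_id="EC2_ACCESS_KEY",
                         aws_secret_access_key="EC2_SECRET_KEY",
                         is_secure=False,
                         port=8773,
                         path="/services/Cloud",
                         region=region)

    # Ask for eight compute instances to act as MPI workers.
    reservation = conn.run_instances(image_id="ami-00000001",
                                     min_count=8,
                                     max_count=8,
                                     key_name="hpc-keypair",
                                     instance_type="m1.xlarge")
    for inst in reservation.instances:
        print("%s %s" % (inst.id, inst.state))

A job-queuing layer (or even just a script that writes an MPI hostfile from
the instance list) would sit on top of that; I expect that to be the
unfriendly part without commercial tooling.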

These blue-sky ideas matter nil, it seems, if a sufficiently large,
high-performance, production-quality OpenStack deployment is beyond the
funds to be allotted, which is something else I'm working on.  I've built
smallish but useful virt-manager, oVirt and OpenStack environments here
already, but none of them is enough for the very important HPC job proposed
for this grant.  The scientist running the proposed computation gave me the
following information to clarify what would give him parity (for his job
only) with his experience running the computation with an external HPC
service provider.  (My rough sizing math follows the list.)

   - MPI
   - 20 Gbps Infiniband compute interconnect
   - 600 cores (those currently used are probably G4 Opteron 2.4 GHz)
   - 4 GB RAM per core
   - at least 2 TB of shared storage, though I'm thinking we need much more
   for use by our general community
   - unsure of the storage networking topology
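
For my own planning, the back-of-envelope math looks roughly like the
following (the 16-core/64 GB node size is purely an assumption for
illustration, not a quote from any vendor):

    # sizing.py -- rough node count for 600 cores at 4 GB of RAM per core.
    cores_required = 600
    ram_per_core_gb = 4

    cores_per_node = 16                                   # assumed, e.g. 2 x 8-core sockets
    ram_per_node_gb = cores_per_node * ram_per_core_gb    # 64 GB per node

    nodes = -(-cores_required // cores_per_node)          # ceiling division -> 38 nodes
    total_ram_tb = cores_required * ram_per_core_gb / 1024.0

    print("nodes needed:  %d" % nodes)
    print("aggregate RAM: %.1f TB" % total_ram_tb)        # ~2.3 TB

So, on the order of 35-40 reasonably fat nodes just for the HPC side,
before any headroom for the IaaS workloads.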

We're in the shopping phase for this grant award and are still playing with
ideas.  At this time, it seems likely to snap back into an old-school HPC
build.  I've fielded communication about our needs to a number of OpenStack
and hardware providers, in the hope that they can bring something helpful
to the table.
Please let me know if you can point me in the right direction(s).  I'm up
for reading whatever text is thrown at me on this topic.  :-)

Thanks,
Joshua
-- 
Joshua M. Dotson
Systems Administrator
Kno.e.sis Center
Wright State University - Dayton, OH