We need a more advanced multi-tier scheduling capability to properly place multiple streaming tiers on heterogeneous hardware. The top level must be scheduled like Storm or S4, as a coarse-grained data flow, but individual operators must be scheduled on specialized hardware pools. We are looking at integrating this into the Nova scheduler, so that it is available as a standard capability of an OpenStack cloud. So, given the context of this StarCluster project, what would be the pros and cons of forward porting an old cluster scheduler instead of incorporating this into the Nova scheduler itself?

On Fri, 18 Apr 2014 16:38:35 -0400, Jonathan Proulx <jon@jonproulx.com> wrote:

Hi All,
For those who don't know me, I deployed and run the OpenStack cloud at MIT CSAIL (http://www.csail.mit.edu).
I'm currently working with Justin Riley (address in the "To" header of this email) who's the primary developer for StarCluster [http://star.mit.edu/cluster/index.html] on porting StarCluster to OpenStack. We have some operational and use case questions before we get too far down the implementation rathole^H^H^H^H^H^H^H path.
I've sent this to openstack-hpc and some select bcc's to get a sense of who might be interested and what opinions you have about how the port should be implemented.
What is it?
If you're not familiar, StarCluster is an open source cluster-computing toolkit for Amazon’s Elastic Compute Cloud (EC2) released under the LGPL license. It has been designed to automate and simplify the process of building, configuring, and managing clusters of virtual machines on Amazon’s EC2 cloud. StarCluster allows anyone to easily create a cluster computing environment in the cloud suited for distributed and parallel computing applications and systems.
Its main target audience is domain scientists who want to set up an SGE, Condor, Hadoop, or a few other sorts of cluster in "the cloud".
Its current implementation is basically a config-file-driven CLI that uses locally stored ssh keys for managing the running cluster. It has a fair-sized user community and people seem to like it, which is why one of my users introduced Justin and me, so I could provide a place for Justin to work and my user could get StarCluster on our private cloud.
Where is it at?
The CLI "basically works" on OpenStack now, though so far no end users have touched it, so it's not quite ready for public beta. If you're really interested in early code I'm sure Justin will be happy to share if you ask nicely.
He has also been working on a Horizon dashboard which would be an additional feature for the OpenStack version (there is no EC2 GUI).
The BIG QUESTIONS?
Do you think this is something that would be useful to your users?
To enable full functionality in the Horizon dashboard, the dashboard app needs a private key so it can access the running (virtual) cluster nodes as root.
Waving hands over implementation details, and assuming you're interested in this functionality if you're this far into the email: is storing key material a show stopper in your environment? In other words, would you rather just fall back to the CLI with its local ~/.ssh/id_rsa (or equivalent) for privileged operations?
My Opinion...
I don't like centrally storing crypto keys, but my Horizon runs on my controller node, so it is already a fairly privileged zone, and I don't necessarily trust my users to store their key material even as well as this could be done. So, while a bit cautious about the details, I think it is an acceptable risk in my (admittedly permissive) environment.
Thoughts? -Jon