Open Stack

Fri Sep 13 09:44:26 UTC 2013

Thanks for your comments. Let me explain a bit more about Hadoop topology.

Hadoop 1.2 introduced a four-level topology: the whole network, rack,
node group (in the simplest case, the Hadoop nodes on the same compute
host) and node. Hadoop usually runs with a replication factor of 3. In
that case the placement algorithm tries to put the first replica of an
HDFS block on the local node or in the local node group, the second
replica outside that node group but on the same rack, and the last
replica outside the initial rack. The topology is defined by the path to
each VM, e.g.

/datacenter1/rack1/host1/vm1
/datacenter1/rack1/host1/vm2
/datacenter1/rack1/host2/vm1
/datacenter1/rack1/host2/vm2
/datacenter1/rack2/host3/vm1
/datacenter1/rack2/host3/vm2
....
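
A minimal sketch of the placement rule described above for replication
factor 3, using the example paths. This is only an illustration of the
rule as stated in this message, not Hadoop's actual placement code; the
function names (levels, pick, place_replicas) and the fallback behaviour
are my assumptions.

```python
import random

# Example topology paths from this message: /datacenter/rack/host/vm
PATHS = [
    "/datacenter1/rack1/host1/vm1",
    "/datacenter1/rack1/host1/vm2",
    "/datacenter1/rack1/host2/vm1",
    "/datacenter1/rack1/host2/vm2",
    "/datacenter1/rack2/host3/vm1",
    "/datacenter1/rack2/host3/vm2",
]


def levels(path):
    """Return the (rack, node_group) prefixes of a topology path."""
    parts = path.strip("/").split("/")
    rack = "/" + "/".join(parts[:2])        # e.g. /datacenter1/rack1
    node_group = "/" + "/".join(parts[:3])  # e.g. /datacenter1/rack1/host1
    return rack, node_group


def pick(candidates, fallback, rng):
    """Choose from candidates, or from fallback if none qualify (assumption)."""
    return rng.choice(candidates or fallback)


def place_replicas(writer, nodes, rng=random):
    """Pick three replica locations for a block written from `writer`."""
    w_rack, w_group = levels(writer)
    others = [n for n in nodes if n != writer]
    # Replica 2: same rack, but outside the writer's node group.
    same_rack = [n for n in others
                 if levels(n)[0] == w_rack and levels(n)[1] != w_group]
    # Replica 3: outside the writer's rack.
    other_rack = [n for n in others if levels(n)[0] != w_rack]
    # Replica 1 stays on the local node.
    return [writer, pick(same_rack, others, rng), pick(other_rack, others, rng)]


if __name__ == "__main__":
    print(place_replicas(PATHS[0], PATHS))
```

With the six example VMs, a block written from /datacenter1/rack1/host1/vm1
gets its second replica on one of the host2 VMs and its third on one of the
host3 VMs.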

This information is also used for job routing, to place each mapper as
close as possible to its data.
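
To illustrate "as close as possible", here is a small sketch that measures
the distance between two topology paths as the number of hops up to their
closest common ancestor and back down. It is similar in spirit to how a
tree-shaped topology is usually compared, but the function names
(components, distance, closest) are hypothetical and this is not Hadoop's
API.

```python
def components(path):
    """Split '/datacenter1/rack1/host1/vm1' into its path components."""
    return [p for p in path.split("/") if p]


def distance(a, b):
    """Hops from a up to the closest common ancestor, then down to b."""
    pa, pb = components(a), components(b)
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def closest(data_node, candidates):
    """Return the candidate with the smallest distance to the data."""
    return min(candidates, key=lambda c: distance(data_node, c))


if __name__ == "__main__":
    data = "/datacenter1/rack1/host1/vm1"
    free_slots = [
        "/datacenter1/rack2/host3/vm1",
        "/datacenter1/rack1/host2/vm2",
    ]
    print(closest(data, free_slots))
```

Under this measure two VMs on the same host are at distance 2, two VMs on
different hosts in the same rack at distance 4, and VMs on different racks
at distance 6.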

The main idea is to provide this information to Hadoop. Usually it is a
direct mapping between the physical data center structure and Hadoop node
placement, but in the case of a public cloud some abstract names will be
fine, as long as the configuration reflects the proximity of the Hadoop
nodes to each other.
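
As a rough sketch of how such information could be handed to Hadoop:
Hadoop can call an external topology script with one or more addresses as
arguments and read one network location per address from its output. The
script below assumes that mechanism; the addresses, the abstract location
names and the default location are all made up for illustration, and the
exact location format expected with node groups should be checked against
the Hadoop documentation.

```python
import sys

# Hypothetical mapping from instance address to an abstract location.
# The names do not reveal physical layout; they only encode proximity.
LOCATIONS = {
    "10.0.0.11": "/rack-a/group-1",
    "10.0.0.12": "/rack-a/group-1",
    "10.0.0.13": "/rack-a/group-2",
    "10.0.0.21": "/rack-b/group-3",
}

# Location reported for addresses we know nothing about (assumption).
DEFAULT_LOCATION = "/default-rack/default-group"


def resolve(addresses):
    """Return one location per address, in the same order."""
    return [LOCATIONS.get(addr, DEFAULT_LOCATION) for addr in addresses]


if __name__ == "__main__":
    # One location per argument, separated by spaces.
    print(" ".join(resolve(sys.argv[1:])))
```

The point of the abstract names is that two instances sharing a prefix are
close to each other, which is all Hadoop needs to know.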

Mike, as I understand it, the holistic scheduler can provide the needed
information. Can you give more details about it?

On Fri, Sep 13, 2013 at 11:54 AM, John Garbutt <john at johngarbutt.com> wrote:

> Exposing the detailed info in private cloud, sure makes sense. For
> public clouds, not so sure. Would be nice to find something that works
> for both.
>
> We let the user express their intent through the instance groups api.
> The scheduler will then do a best effort to meet that criteria, using
> its private information. At a coarser grain, we have availability
> zones, that you could use to express "closeness", and probably often
> give you a good measure of closeness anyway.
>
> So a Hadoop user could request several small groups of VMs defined
> in instance groups to be close, and maybe spread across different
> availability zones.
>
> Would that do the trick? Or does Hadoop/HDFS need a bit more
> granularity than that? Could it look to auto-detect "closeness" in
> some auto-setup phase, given rough user hints?
>
> John
>
> On 13 September 2013 07:40, Alex Glikson <GLIKSON at il.ibm.com> wrote:
> > If I understand correctly, what really matters at least in case of Hadoop
> > is network proximity between instances.
> > Hence, maybe Neutron would be a better fit to provide such information. In
> > particular, depending on virtual network configuration, having 2 instances
> > on the same node does not guarantee that the network traffic between them
> > will be routed within the node.
> > Physical layout could be useful for availability-related purposes. But even
> > then, it should be abstracted in such a way that it will not reveal details
> > that a cloud provider will typically prefer not to expose. Maybe this can be
> > done by Ironic -- or a separate/new project (Tuskar sounds related).
> >
> > Regards,
> > Alex
> >
> > From: Mike Spreitzer <mspreitz at us.ibm.com>
> > To: OpenStack Development Mailing List <openstack-dev at lists.openstack.org>
> > Date: 13/09/2013 08:54 AM
> > Subject: Re: [openstack-dev] [nova] [savanna] Host information for non admin users
> > ________________________________
> >
> >> From: Nirmal Ranganathan <rnirmal at gmail.com>
> >> ...
> >> Well that's left up to the specific block placement policies in HDFS,
> >> all we are providing with the topology information is a hint on
> >> node/rack placement.
> >
> > Oh, you are looking at the placement of HDFS blocks within the fixed storage
> > volumes, not choosing where to put the storage volumes.  In that case I
> > understand and agree that simply providing identifiers from the
> > infrastructure to the middleware (HDFS) will suffice.  Coincidentally my
> > group is working on this very example right now in our own environment.  We
> > have a holistic scheduler that is given a whole template to place, and it
> > returns placement information.  We imagine, as does Hadoop, a general
> > hierarchy in the physical layout, and the holistic scheduler returns, for
> > each VM, the path from the root to the VM's host.
> >
> > Regards,
> >
> > Mike
> > _______________________________________________
> > OpenStack-dev mailing list
> > OpenStack-dev at lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

[openstack-dev] [nova] [savanna] Host information for non admin users
