[openstack-dev] [Nova] bp proposal: discovery of peer instances through metadata service

Day, Phil philip.day at hp.com
Fri Jan 24 18:37:06 UTC 2014

> Good points - thank you.  For arbitrary operations, I agree that it would be
> better to expose a token in the metadata service, rather than allowing the
> metadata service to expose unbounded amounts of API functionality.  We
> should therefore also have a per-instance token in the metadata, though I
> don't see Keystone getting the prerequisite IAM-level functionality for two+
> releases (?).
I can also see that in Neutron not all instances have access to the API servers,
so I'm not against having something in the metadata service provided it's well focused.
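The scoped-token idea from the quote could be sketched roughly as below. Everything here is illustrative - the token value, the scope name, and the peer list are made up, and no such narrowly scoped token exists in Keystone or Nova today; the point is only that the token would authorize one well-focused call and nothing else.

```python
# Hypothetical per-instance token exposed via the metadata service.
# The only action it authorizes is a narrow "list my peers" call.
# Scope names and values are illustrative, not a real Keystone/Nova API.
ALLOWED_SCOPES = {"peer-token-123": {"compute:list_peers"}}

def call_api(token, action):
    """Reject any action outside the token's narrow scope."""
    scopes = ALLOWED_SCOPES.get(token, set())
    if action not in scopes:
        raise PermissionError("token not scoped for %s" % action)
    return ["10.0.0.4", "10.0.0.5"]  # stand-in for the real peer list

print(call_api("peer-token-123", "compute:list_peers"))
```

The unbounded-API concern from the quote is what the scope check models: even if the token leaks to another instance, it can only be used to list peers, not to perform arbitrary operations.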

> In terms of information exposed: An alternative would be to try to connect
> to every IP in the subnet we are assigned; this blueprint can be seen as an
> optimization on that (to avoid DDOS-ing the public clouds).  

Well, if you're on a Neutron private network then you'd only be DDOS-ing yourself.
In fact I think Neutron allows broadcast and multicast on private networks, and
as nova-net is going to be deprecated at some point I wonder if this is reducing
to a corner case?

> So I've tried to
> expose only the information that enables directed scanning: availability zone,
> reservation id, security groups, network ids & labels & cidrs & IPs [example
> below].  A naive implementation will just try every peer; a smarter
> implementation might check the security groups to try to filter it, or the zone
> information to try to connect to nearby peers first.  Note that I don't expose
> e.g. the instance state: if you want to know whether a node is up, you have
> to try connecting to it.  I don't believe any of this information is at all
> sensitive, particularly not to instances in the same project.
Does it really need all of that? It seems that the IP address would be enough,
and the agents (or whatever) in the instance could take it from there.
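The "directed scanning" the quote describes could be sketched as below. The peer document is hypothetical - the field names are shaped loosely after the information the proposal lists (availability zone, security groups, IPs), not an actual Nova metadata schema:

```python
# Hypothetical peer list, shaped after the fields the proposal mentions.
# Field names are illustrative, not a real Nova metadata format.
peers = [
    {"ip": "10.0.0.4", "availability_zone": "az2", "security_groups": ["db"]},
    {"ip": "10.0.0.5", "availability_zone": "az1", "security_groups": ["web"]},
    {"ip": "10.0.0.6", "availability_zone": "az1", "security_groups": ["db"]},
]

def scan_order(peers, my_az, my_groups):
    """Directed scan: try peers sharing a security group first,
    and within that, peers in the same availability zone first."""
    def key(p):
        shares_group = bool(set(p["security_groups"]) & set(my_groups))
        # False sorts before True, so matching peers come first.
        return (not shares_group, p["availability_zone"] != my_az)
    return sorted(peers, key=key)

ordered = scan_order(peers, my_az="az1", my_groups=["db"])
print([p["ip"] for p in ordered])  # nearby "db" peer first
```

A naive implementation would just connect to each address in order; the point of exposing the extra fields is only that a smarter implementation can prioritize, as the quoted text argues.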

What worried me most, I think, is that if we make this part of the standard
metadata then everyone would get it, and that raises a couple of concerns:

- Users with lots of instances (say 1000s) who weren't trying to run any form
of discovery would start getting a lot more metadata returned, which might cause
performance issues.

- Some users might be running instances on behalf of customers (consider, say, a
PaaS-type service where the user gets access into an instance but not to the
Nova API). In that case I wouldn't want one instance to be able to discover these
kinds of details about other instances.

So it kind of feels to me that this should be some other specific set of metadata
that instances can ask for, and that instances have to explicitly opt into. 

We already have a mechanism where an instance can push metadata, used as a
way for Windows instances to share their passwords - so maybe this could build
on that somehow; for example, each instance pushes the data it's willing to share
with other instances owned by the same tenant?
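That opt-in push model could be sketched as below. The store class is a stand-in for whatever Nova-side mechanism would hold the shared values (the existing password-push path is the precedent, but nothing like this store exists today); the key point is that each instance publishes only what it chooses to, and visibility is scoped to the tenant:

```python
# Sketch of the opt-in "push" model: each instance publishes only the
# data it wants visible to other instances in the same tenant.
# Entirely hypothetical - a stand-in for a Nova-side mechanism.
class TenantSharedMetadata:
    def __init__(self):
        self._store = {}  # tenant_id -> {instance_id -> shared blob}

    def push(self, tenant_id, instance_id, blob):
        """An instance opts in by pushing the subset of data it shares."""
        self._store.setdefault(tenant_id, {})[instance_id] = blob

    def peers(self, tenant_id, instance_id):
        """What one instance can see: shared blobs from tenant peers,
        excluding its own entry and other tenants entirely."""
        return {i: b for i, b in self._store.get(tenant_id, {}).items()
                if i != instance_id}

store = TenantSharedMetadata()
store.push("tenant-a", "vm-1", {"ip": "10.0.0.4", "role": "db"})
store.push("tenant-a", "vm-2", {"ip": "10.0.0.5", "role": "web"})
store.push("tenant-b", "vm-9", {"ip": "10.1.0.2"})  # other tenant, invisible

print(store.peers("tenant-a", "vm-1"))  # only vm-2's shared blob
```

This also addresses the PaaS concern above: an instance that never pushes anything is simply invisible to its peers.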

> On external agents doing the configuration: yes, they could put this into user
> defined metadata, but then we're tied to a configuration system.  We have
> to get 20 configuration systems to agree on a common format (Heat, Puppet,
> Chef, Ansible, SaltStack, Vagrant, Fabric, all the home-grown systems!)  It
> also makes it hard to launch instances concurrently (because you want node
> #2 to have the metadata for node #1, so you have to wait for node #1 to get
> an IP).
Well, you've kind of got to agree on a common format anyway, haven't you,
if the information is going to come from metadata? But I get your other points.

> More generally though, I have in mind a different model, which I call
> 'configuration from within' (as in 'truth comes from within'). I don't want a big
> imperialistic configuration system that comes and enforces its view of the
> world onto primitive machines.  I want a smart machine that comes into
> existence, discovers other machines and cooperates with them.  This is the
> Netflix pre-baked AMI concept, rather than the configuration management
> approach.
> The blueprint does not exclude 'imperialistic' configuration systems, but it
> does enable e.g. just launching N instances in one API call, or just using an
> auto-scaling group.  I suspect the configuration management systems would
> prefer this to having to implement this themselves.

Yep, I get the concept, and metadata does seem like the best existing
mechanism to do this, as it's already available to all instances regardless of
where they are on the network, and it's a controlled interface. I'd just like to
see it separate from the existing metadata blob, and on an opt-in basis.

