[openstack-dev] [Nova] bp proposal: discovery of peer instances through metadata service
Murray, Paul (HP Cloud Services)
pmurray at hp.com
Fri Jan 24 19:04:09 UTC 2014
Hi Justin,
It's nice to see someone bringing this kind of thing up. Seeding discovery is a handy primitive to have.
Multicast is not generally used over the internet, so the argument that the cloud took multicast away is not really justified; any of the approaches that work across the internet could be used here too. Alternatively, your instances could use the nova or neutron APIs to obtain any information you want (if they are network connected), but certainly whatever is starting them has access, so something can at least provide the information.
I agree that the metadata service is a sensible alternative. Do you imagine your instances all having access to the same metadata service? Is there something more generic and not tied to the architecture of a single OpenStack deployment?
Although this is a simple example, it is also the first of quite a lot of useful primitives that are commonly provided by configuration services. As it is possible to do what you want by other means (including using an implementation that has multicast within subnets; I'm sure neutron does actually have this), it seems that this is less a special case and more a requirement for a more general notification service?
Having said that I do like this kind of stuff :)
Paul.
From: Justin Santa Barbara [mailto:justin at fathomdb.com]
Sent: 24 January 2014 15:43
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Nova] bp proposal: discovery of peer instances through metadata service
Good points, thank you. For arbitrary operations, I agree that it would be better to expose a token in the metadata service, rather than allowing the metadata service to expose unbounded amounts of API functionality. We should therefore also have a per-instance token in the metadata, though I don't see Keystone getting the prerequisite IAM-level functionality for at least two more releases (?).
However, I think I can justify peer discovery as the 'one exception'. Here's why: discovery of peers is widely used for self-configuring clustered services, including those built in pre-cloud days. Multicast/broadcast used to be the solution, but cloud broke that. The cloud is supposed to be about distributed systems, yet we broke the primary way distributed systems do peer discovery. Today's workarounds are pretty terrible, e.g. uploading to an S3 bucket, or sharing EC2 credentials with the instance (tolerable now with IAM, but painful to configure). We're not talking about allowing instances to program the architecture (e.g. attach volumes etc), but rather just to do the equivalent of a multicast for discovery. In other words, we're restoring some functionality we took away (discovery via multicast) rather than adding programmable-infrastructure cloud functionality.
We expect the instances to start a gossip protocol to determine who is actually up/down, who else is in the cluster, etc. As such, we don't need accurate information - we only have to help a node find one living peer. (Multicast/broadcast was not entirely reliable either!) Further, instance #2 will contact instance #1, so it doesn't matter if instance #1 doesn't have instance #2 in the list, as long as instance #2 sees instance #1. I'm relying on the idea that instance launching takes time > 0, so other instances will be in the starting state when the metadata request comes in, even if we launch instances simultaneously. (Another reason why I don't filter instances by state!)
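To make the "instance #2 will contact instance #1" argument concrete, here is a toy model (a sketch, not part of the blueprint). It assumes each instance's cached list contains exactly the instances whose launch came before it, and treats "A saw B and dialled it" as an undirected link, since gossip then tells B about A. The function names `snapshot_views` and `is_connected` are hypothetical.

```python
from typing import Dict, List


def snapshot_views(launch_order: List[str]) -> Dict[str, List[str]]:
    """Hypothetical model: each instance's cached list holds every instance launched before it."""
    return {name: launch_order[:i] for i, name in enumerate(launch_order)}


def is_connected(views: Dict[str, List[str]]) -> bool:
    """Return True if 'instance dials a peer it saw' links join everyone into one cluster.

    A link is undirected once made: if #2 saw #1 and dialled it, #1 learns about #2
    through gossip, so #1 does not need #2 in its own (cached) list.
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the union-find trees shallow
            x = parent[x]
        return x

    for name, seen in views.items():
        find(name)  # register instances that saw nobody (e.g. the very first one)
        for peer in seen:
            parent[find(name)] = find(peer)  # merge the two clusters
    return len({find(name) for name in views}) <= 1
```

With three instances, `snapshot_views(["i-1", "i-2", "i-3"])` gives `{'i-1': [], 'i-2': ['i-1'], 'i-3': ['i-1', 'i-2']}` and the cluster is connected even though `i-1` sees nobody. The only failing case is two instances that each see neither the other nor anyone else, e.g. `is_connected({"a": [], "b": []})` is `False`; that is the situation the "launching takes time > 0" assumption rules out.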
I haven't actually found where metadata caching is implemented, although the constructor of InstanceMetadata documents restrictions that really only make sense if it is. Anyone know where it is cached?
In terms of information exposed: an alternative would be to try to connect to every IP in the subnet we are assigned; this blueprint can be seen as an optimization on that (to avoid DDOS-ing the public clouds). So I've tried to expose only the information that enables directed scanning: availability zone, reservation id, security groups, and network IDs, labels, CIDRs and IPs [example below]. A naive implementation will just try every peer; a smarter implementation might check the security groups to try to filter the list, or the zone information to try to connect to nearby peers first. Note that I don't expose e.g. the instance state: if you want to know whether a node is up, you have to try connecting to it. I don't believe any of this information is at all sensitive, particularly not to instances in the same project.
On external agents doing the configuration: yes, they could put this into user defined metadata, but then we're tied to a configuration system. We would have to get 20 configuration systems to agree on a common format (Heat, Puppet, Chef, Ansible, SaltStack, Vagrant, Fabric, all the home-grown systems!) It also makes it hard to launch instances concurrently (because you want node #2 to have the metadata for node #1, so you have to wait for node #1 to get an IP).
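A "smarter implementation" of directed scanning might look roughly like the sketch below. It assumes the peer records have the `uuid`, `availability_zone` and `security_groups` fields shown in the example JSON; the helper names and the gossip port are hypothetical. Filtering on shared security groups is only a heuristic (a shared group does not guarantee the port is open), and liveness is decided purely by trying to connect, as the blueprint intends.

```python
import socket
from typing import Callable, Iterable, List, Optional


def rank_peers(me: dict, peers: Iterable[dict]) -> List[dict]:
    """Drop ourselves and peers sharing no security group with us; try same-zone peers first."""
    my_groups = {g["name"] for g in me.get("security_groups", [])}
    candidates = []
    for peer in peers:
        if peer.get("uuid") == me.get("uuid"):
            continue  # the project-wide list may include ourselves
        groups = {g["name"] for g in peer.get("security_groups", [])}
        if my_groups and not (groups & my_groups):
            continue  # heuristic: no shared group, so probably unreachable
        candidates.append(peer)
    # False sorts before True, so same-zone peers come first; sorted() is stable,
    # so the original order is kept within each group.
    return sorted(candidates,
                  key=lambda p: p.get("availability_zone") != me.get("availability_zone"))


def tcp_probe(address: str, port: int, timeout: float = 1.0) -> bool:
    """A node counts as 'up' only if it accepts a TCP connection."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_live_peer(addresses: Iterable[str], port: int,
                    probe: Callable[[str, int], bool] = tcp_probe) -> Optional[str]:
    """Gossip only needs one living peer, so stop at the first one that answers."""
    for address in addresses:
        if probe(address, port):
            return address
    return None
```

The `probe` parameter is there so the connect step can be swapped out (for example, for a protocol-level handshake, or a fake in tests); a naive implementation would skip `rank_peers` entirely and just probe every address.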
More generally though, I have in mind a different model, which I call 'configuration from within' (as in 'truth comes from within'). I don't want a big imperialistic configuration system that comes and enforces its view of the world onto primitive machines. I want a smart machine that comes into existence, discovers other machines and cooperates with them. This is the Netflix pre-baked AMI concept, rather than the configuration management approach.
The blueprint does not exclude 'imperialistic' configuration systems, but it does enable e.g. just launching N instances in one API call, or just using an auto-scaling group. I suspect the configuration management systems would prefer this to having to implement this themselves.
(Example JSON below)
Justin
---
Example JSON:
[
{
"availability_zone": "nova",
"network_info": [
{
"id": "e60bbbaf-1d2e-474e-bbd2-864db7205b60",
"network": {
"id": "f2940cd1-f382-4163-a18f-c8f937c99157",
"label": "private",
"subnets": [
{
"cidr": "10.11.12.0/24",
"ips": [
{
"address": "10.11.12.4",
"type": "fixed",
"version": 4
}
],
"version": 4
},
{
"cidr": null,
"ips": [],
"version": null
}
]
}
}
],
"reservation_id": "r-44li8lxt",
"security_groups": [
{
"name": "default"
}
],
"uuid": "2adcdda2-561b-494b-a8f6-378b07ac47a4"
},
... (the above is repeated for every instance)...
]
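As a rough illustration of consuming this document, the sketch below pulls out the fixed IPv4 addresses per instance. It assumes the final format matches the example above (a list of instance objects with nested `network_info` → `network` → `subnets` → `ips`); the function name is hypothetical. The empty placeholder subnet (`"cidr": null`, `"ips": []`) simply contributes nothing.

```python
import json
from typing import List, Tuple


def fixed_ipv4_addresses(doc: str) -> List[Tuple[str, str]]:
    """Return (instance uuid, address) pairs for every fixed IPv4 address in the peer list."""
    pairs = []
    for instance in json.loads(doc):
        for vif in instance.get("network_info") or []:
            network = vif.get("network") or {}
            for subnet in network.get("subnets") or []:
                # Placeholder subnets (null cidr/version, empty ips) yield nothing here.
                for ip in subnet.get("ips") or []:
                    if ip.get("type") == "fixed" and ip.get("version") == 4:
                        pairs.append((instance["uuid"], ip["address"]))
    return pairs
```

Fed the single-instance example, this returns `[("2adcdda2-561b-494b-a8f6-378b07ac47a4", "10.11.12.4")]`, which is exactly the list of addresses a naive implementation would try to connect to.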
On Fri, Jan 24, 2014 at 8:43 AM, Day, Phil <philip.day at hp.com> wrote:
> Hi Justin,
>
> I can see the value of this, but I'm a bit wary of the metadata service
> extending into a general API - for example I can see this extending into a
> debate about what information needs to be made available about the instances
> (would you always want all instances exposed, all details, etc) - if not
> we'd end up starting to implement policy restrictions in the metadata
> service and starting to replicate parts of the API itself.
>
> Just seeing instances launched before me doesn't really help if they've been
> deleted (but are still in the cached values), does it?
>
> Since there is some external agent creating these instances, why can't that
> just provide the details directly as user defined metadata ?
>
> Phil
>
> From: Justin Santa Barbara [mailto:justin at fathomdb.com]
> Sent: 23 January 2014 16:29
> To: OpenStack Development Mailing List
> Subject: [openstack-dev] [Nova] bp proposal: discovery of peer instances
> through metadata service
>
> Would appreciate feedback / opinions on this blueprint:
> https://blueprints.launchpad.net/nova/+spec/first-discover-your-peers
>
> The idea is: clustered services typically run some sort of gossip protocol,
> but need to find (just) one peer to connect to. In the physical
> environment, this was done using multicast. On the cloud, that isn't a
> great solution. Instead, I propose exposing a list of instances in the same
> project, through the metadata service.
>
> In particular, I'd like to know if anyone has other use cases for instance
> discovery. For peer-discovery, we can cache the instance list for the
> lifetime of the instance, because it suffices merely to see instances that
> were launched "before me". (peer1 might not join to peer2, but peer2 will
> join to peer1). Other use cases are likely much less forgiving!
>
> Justin
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>