[openstack-dev] [Nova] bp proposal: discovery of peer instances through metadata service

Justin Santa Barbara justin at fathomdb.com
Fri Jan 24 15:43:23 UTC 2014


Good points - thank you.  For arbitrary operations, I agree that it would
be better to expose a token in the metadata service, rather than allowing
the metadata service to expose unbounded amounts of API functionality.  We
should therefore also have a per-instance token in the metadata, though I
don't see Keystone getting the prerequisite IAM-level functionality for
two+ releases (?).

However, I think I can justify peer discovery as the 'one exception'.
Here's why: discovery of peers is widely used for self-configuring
clustered services, including those built in pre-cloud days.
Multicast/broadcast used to be the solution, but cloud broke that.  The
cloud is supposed to be about distributed systems, yet we broke the primary
way distributed systems do peer discovery. Today's workarounds are pretty
terrible, e.g. uploading to an S3 bucket, or sharing EC2 credentials with
the instance (tolerable now with IAM, but painful to configure).  We're not
talking about allowing instances to program the architecture (e.g. attach
volumes etc), but rather just to do the equivalent of a multicast for
discovery.  In other words, we're restoring some functionality we took away
(discovery via multicast) rather than adding programmable-infrastructure
cloud functionality.

We expect the instances to start a gossip protocol to determine who is
actually up/down, who else is in the cluster, etc.  As such, we don't need
accurate information - we only have to help a node find one living peer.
(Multicast/broadcast was not entirely reliable either!)  Further, instance
#2 will contact instance #1, so it doesn’t matter if instance #1 doesn’t
have instance #2 in the list, as long as instance #2 sees instance #1.  I'm
relying on the idea that instance launching takes time > 0, so other
instances will be in the starting state when the metadata request comes in,
even if we launch instances simultaneously.  (Another reason why I don't
filter instances by state!)
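
To make the "find one living peer" idea concrete, here is a minimal sketch in
Python.  It is only an illustration, not part of the blueprint: the function
name find_living_peer, the port argument and the timeout are assumptions, and
a real service would hand the connection over to its gossip protocol rather
than just closing it.

```python
import socket


def find_living_peer(addresses, port, timeout=1.0):
    """Return the first address that accepts a TCP connection, or None.

    `addresses` is any iterable of IP strings (hypothetical input, e.g. the
    peer IPs taken from the metadata list); `port` is whatever port the
    clustered service listens on (an assumption of this sketch).
    """
    for address in addresses:
        try:
            # A successful connect is the only liveness check we need:
            # the list may be stale, so dead peers are simply skipped.
            with socket.create_connection((address, port), timeout=timeout):
                return address
        except OSError:
            # Refused, unreachable or timed out: try the next candidate.
            continue
    return None
```

If this returns None, the instance can assume it is the first member and wait
for later instances to contact it.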

I haven't actually found where metadata caching is implemented, although
the constructor of InstanceMetadata documents restrictions that really only
make sense if it is.  Anyone know where it is cached?

In terms of information exposed: An alternative would be to try to connect
to every IP in the subnet we are assigned; this blueprint can be seen as an
optimization on that (to avoid DDoS-ing the public clouds).  So I’ve tried
to expose only the information that enables directed scanning: availability
zone, reservation id, security groups, network ids & labels & cidrs & IPs
[example below].  A naive implementation will just try every peer; a
smarter implementation might check the security groups to try to filter it,
or the zone information to try to connect to nearby peers first.  Note that
I don’t expose e.g. the instance state: if you want to know whether a node
is up, you have to try connecting to it.  I don't believe any of this
information is at all sensitive, particularly not to instances in the same
project.

On external agents doing the configuration: yes, they could put this into
user defined metadata, but then we're tied to a configuration system.  We
have to get 20 configuration systems to agree on a common format (Heat,
Puppet, Chef, Ansible, SaltStack, Vagrant, Fabric, all the home-grown
systems!).  It also makes it hard to launch instances concurrently (because
you want node #2 to have the metadata for node #1, so you have to wait for
node #1 to get an IP).

More generally though, I have in mind a different model, which I call
'configuration from within' (as in 'truth comes from within'). I don’t want
a big imperialistic configuration system that comes and enforces its view
of the world onto primitive machines.  I want a smart machine that comes
into existence, discovers other machines and cooperates with them.  This is
the Netflix pre-baked AMI concept, rather than the configuration management
approach.

The blueprint does not exclude 'imperialistic' configuration systems, but
it does enable e.g. just launching N instances in one API call, or just
using an auto-scaling group.  I suspect the configuration management
systems would prefer this to having to implement this themselves.

(Example JSON below)

Justin

---

Example JSON:

[
    {
        "availability_zone": "nova",
        "network_info": [
            {
                "id": "e60bbbaf-1d2e-474e-bbd2-864db7205b60",
                "network": {
                    "id": "f2940cd1-f382-4163-a18f-c8f937c99157",
                    "label": "private",
                    "subnets": [
                        {
                            "cidr": "10.11.12.0/24",
                            "ips": [
                                {
                                    "address": "10.11.12.4",
                                    "type": "fixed",
                                    "version": 4
                                }
                            ],
                            "version": 4
                        },
                        {
                            "cidr": null,
                            "ips": [],
                            "version": null
                        }
                    ]
                }
            }
        ],
        "reservation_id": "r-44li8lxt",
        "security_groups": [
            {
                "name": "default"
            }
        ],
        "uuid": "2adcdda2-561b-494b-a8f6-378b07ac47a4"
    },

*… (the above is repeated for every instance)…*
]
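
As a rough illustration of the "smarter implementation" mentioned earlier,
the sketch below takes a peer list shaped like the example JSON and returns
the addresses to try, nearby peers first.  The function name
candidate_addresses and its parameters are hypothetical, and it assumes the
list has already been fetched and parsed (how it is fetched is not shown).

```python
def candidate_addresses(peers, my_uuid, my_zone=None, my_groups=None):
    """Return peer IP addresses to try, same-zone peers first.

    `peers` is the parsed list from the example JSON.  Peers that share no
    security group with us are skipped when `my_groups` is given.
    """
    my_groups = set(my_groups or [])
    ranked = []
    for peer in peers:
        if peer.get("uuid") == my_uuid:
            continue  # never try to connect to ourselves
        groups = {g["name"] for g in peer.get("security_groups", [])}
        if my_groups and not (groups & my_groups):
            continue  # no shared security group, probably unreachable
        # Rank 0 for peers in our availability zone, 1 for everything else.
        rank = 0 if peer.get("availability_zone") == my_zone else 1
        for vif in peer.get("network_info", []):
            for subnet in vif.get("network", {}).get("subnets", []):
                for ip in subnet.get("ips", []):
                    ranked.append((rank, ip["address"]))
    # sorted() is stable, so the original order is kept within each rank.
    return [address for _, address in sorted(ranked, key=lambda r: r[0])]
```

A naive caller can ignore the zone and group arguments and simply try every
address that comes back.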




On Fri, Jan 24, 2014 at 8:43 AM, Day, Phil <philip.day at hp.com> wrote:
> Hi Justin,
>
>
>
> I can see the value of this, but I’m a bit wary of the metadata service
> extending into a general API – for example I can see this extending into a
> debate about what information needs to be made available about the
> instances (would you always want all instances exposed, all details, etc) –
> if not we’d end up starting to implement policy restrictions in the
> metadata service and starting to replicate parts of the API itself.
>
>
>
> Just seeing instances launched before me doesn’t really help if they’ve
> been deleted (but are still in the cached values) does it ?
>
>
>
> Since there is some external agent creating these instances, why can’t
> that just provide the details directly as user defined metadata ?
>
>
>
> Phil
>
>
>
> From: Justin Santa Barbara [mailto:justin at fathomdb.com]
> Sent: 23 January 2014 16:29
> To: OpenStack Development Mailing List
> Subject: [openstack-dev] [Nova] bp proposal: discovery of peer instances
> through metadata service
>
>
>
> Would appreciate feedback / opinions on this blueprint:
> https://blueprints.launchpad.net/nova/+spec/first-discover-your-peers
>
>
>
> The idea is: clustered services typically run some sort of gossip
> protocol, but need to find (just) one peer to connect to.  In the physical
> environment, this was done using multicast.  On the cloud, that isn't a
> great solution.  Instead, I propose exposing a list of instances in the
> same project, through the metadata service.
>
>
>
> In particular, I'd like to know if anyone has other use cases for instance
> discovery.  For peer-discovery, we can cache the instance list for the
> lifetime of the instance, because it suffices merely to see instances that
> were launched "before me".  (peer1 might not join to peer2, but peer2 will
> join to peer1).  Other use cases are likely much less forgiving!
>
>
> Justin
>
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>