[openstack-dev] [Nova] bp proposal: discovery of peer instances through metadata service

Justin Santa Barbara justin at fathomdb.com
Wed Jan 29 13:26:20 UTC 2014


Certainly my original inclination (and code!) was to agree with you Vish, but:

1) It looks like we're going to have writable metadata anyway, for
communication from the instance to the API.
2) I believe the restrictions make it impractical to abuse it as a
message bus: size limits, quotas and write-once semantics make it very
poorly suited for anything queue-like.
3) Anything that isn't opt-in will likely have security implications,
which means that it won't get deployed.  This must be deployed to be
useful.

In short: I agree that it's not the absolute ideal solution (for me,
that would be no opt-in), but it feels like the best solution given
that we must have opt-in, or else e.g. HP won't deploy it.  It uses a
(soon to be) existing mechanism, and is readily extensible without
breaking APIs.

On your idea of scoping by security group, I believe a certain someone
is looking at supporting hierarchical projects, so we will likely need
to support more advanced logic here later anyway.  For example:  the
ability to specify whether an entry should be shared with instances in
child projects.  This will likely take the form of a sort of selector
language, so I anticipate we could offer a filter on security groups
as well if this is useful.  We might well also allow selection by
instance tags.  The approach allows for this, though I would like to keep
it as simple as possible at first (share with other instances in the
project, or don't share).

Justin


On Tue, Jan 28, 2014 at 10:39 PM, Vishvananda Ishaya
<vishvananda at gmail.com> wrote:
>
> On Jan 28, 2014, at 12:17 PM, Justin Santa Barbara <justin at fathomdb.com> wrote:
>
>> Thanks John - combining with the existing effort seems like the right
>> thing to do (I've reached out to Claxton to coordinate).  Great to see
>> that the larger issues around quotas / write-once have already been
>> agreed.
>>
>> So I propose that sharing will work in the same way, but some values
>> are visible across all instances in the project.  I do not think it
>> would be appropriate for all entries to be shared this way.  A few
>> options:
>>
>> 1) A separate endpoint for shared values
>> 2) Keys are shared iff they follow a naming convention, e.g. they start with a prefix like 'peers_XXX'
>> 3) Keys are set the same way, but a 'shared' parameter can be passed,
>> either as a query parameter or in the JSON.
>>
>> I like option #3 the best, but feedback is welcome.
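>>
>> As a rough illustration of option #3 (not a final API; the 'shared' field
>> and the key below are just placeholders), the call might carry something
>> like:
>>
>>     # Hypothetical request body: a normal metadata update plus a 'shared' flag.
>>     body = {
>>         "metadata": {"my_ip": "10.0.0.12"},
>>         "shared": True,   # or equivalently ?shared=true as a query parameter
>>     }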
>>
>> I think I will have to store the value using a system_metadata entry
>> per shared key.  I think this avoids issues with concurrent writes,
>> and also makes it easier to have more advanced sharing policies (e.g.
>> when we have hierarchical projects).
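>>
>> Purely for illustration (this is not an actual schema), each shared key
>> might end up stored roughly like this, so writes from different instances
>> never contend on the same row:
>>
>>     # Sketch: one instance_system_metadata row per shared key (names are placeholders).
>>     shared_row = {
>>         "instance_uuid": "aaaa-1111",    # owning instance (placeholder id)
>>         "key": "shared_my_ip",           # hypothetical naming convention
>>         "value": "10.0.0.12",
>>     }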
>>
>> Thank you to everyone for helping me get to what IMHO is a much better
>> solution than the one I started with!
>>
>> Justin
>
> I am -1 on the post data.  I think we should avoid using the metadata service
> as a cheap queue for communicating across VMs, and this moves strongly in
> that direction.
>
> I am +1 on providing a list of ip addresses in the current security group(s)
> via metadata. I like limiting by security group instead of project because
> this could prevent the 1000-instance case where people have large shared
> tenants, and it also provides a single tenant a way to have multiple
> autodiscovered services.  Also, the security group info is something that
> neutron has access to, so the neutron proxy should be able to generate the
> necessary info if neutron is in use.
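>
> As a rough sketch only (nothing like this exists today, and the key name
> is made up), the metadata might expose the peer list grouped by security
> group along these lines:
>
>     # Hypothetical metadata entry: peer addresses grouped by security group.
>     metadata_extra = {
>         'security-group-peers': {
>             'default':     ['10.0.0.3', '10.0.0.7'],
>             'mpi-cluster': ['10.0.0.12', '10.0.0.14'],
>         }
>     }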
>
> Just as an interesting side note, we put this VM list in way back in the NASA
> days as an easy way to get MPI clusters running. In this case we grouped the
> instances by the key_name used to launch the instance instead of security group.
> I don't think it occurred to us to use security groups at the time.  Note we
> also provided the number of cores, but this was for convenience because the
> MPI implementation didn't support discovering the number of cores.  Code below.
>
> Vish
>
> $ git show 2cf40bb3
> commit 2cf40bb3b21d33f4025f80d175a4c2ec7a2f8414
> Author: Vishvananda Ishaya <vishvananda at yahoo.com>
> Date:   Thu Jun 24 04:11:54 2010 +0100
>
>     Adding mpi data
>
> diff --git a/nova/endpoint/cloud.py b/nova/endpoint/cloud.py
> index 8046d42..74da0ee 100644
> --- a/nova/endpoint/cloud.py
> +++ b/nova/endpoint/cloud.py
> @@ -95,8 +95,21 @@ class CloudController(object):
>      def get_instance_by_ip(self, ip):
>          return self.instdir.by_ip(ip)
>
> +    def _get_mpi_data(self, project_id):
> +        result = {}
> +        for node_name, node in self.instances.iteritems():
> +            for instance in node.values():
> +                if instance['project_id'] == project_id:
> +                    line = '%s slots=%d' % (instance['private_dns_name'], instance.get('vcpus', 0))
> +                    if instance['key_name'] in result:
> +                        result[instance['key_name']].append(line)
> +                    else:
> +                        result[instance['key_name']] = [line]
> +        return result
> +
>      def get_metadata(self, ip):
>          i = self.get_instance_by_ip(ip)
> +        mpi = self._get_mpi_data(i['project_id'])
>          if i is None:
>              return None
>          if i['key_name']:
> @@ -135,7 +148,8 @@ class CloudController(object):
>                  'public-keys' : keys,
>                  'ramdisk-id': i.get('ramdisk_id', ''),
>                  'reservation-id': i['reservation_id'],
> -                'security-groups': i.get('groups', '')
> +                'security-groups': i.get('groups', ''),
> +                'mpi': mpi
>              }
>          }
>          if False: # TODO: store ancestor ids
>
>>
>>
>>
>>
>> On Tue, Jan 28, 2014 at 4:38 AM, John Garbutt <john at johngarbutt.com> wrote:
>>> On 27 January 2014 14:52, Justin Santa Barbara <justin at fathomdb.com> wrote:
>>>> Day, Phil wrote:
>>>>
>>>>>
>>>>>>> We already have a mechanism now where an instance can push metadata as
>>>>>>> a way of Windows instances sharing their passwords - so maybe this
>>>>>>> could
>>>>>>> build on that somehow - for example each instance pushes the data it's
>>>>>>> willing to share with other instances owned by the same tenant?
>>>>>>
>>>>>> I do like that and think it would be very cool, but it is much more
>>>>>> complex to
>>>>>> implement I think.
>>>>>
>>>>> I don't think it's that complicated - just needs one extra attribute stored
>>>>> per instance (for example into instance_system_metadata) which allows the
>>>>> instance to be included in the list.
>>>>
>>>>
>>>> Ah - OK, I think I better understand what you're proposing, and I do like
>>>> it.  The hardest bit of having the metadata store be full read/write would
>>>> be defining what is and is not allowed (rate-limits, size-limits, etc).  I
>>>> worry that you end up with a new key-value store, and with per-instance
>>>> credentials.  That would be a separate discussion: this blueprint is trying
>>>> to provide a focused replacement for multicast discovery for the cloud.
>>>>
>>>> But: thank you for reminding me about the Windows password though...  It may
>>>> provide a reasonable model:
>>>>
>>>> We would have a new endpoint, say 'discovery'.  An instance can POST a
>>>> single string value to the endpoint.  A GET on the endpoint will return any
>>>> values posted by all instances in the same project.
>>>>
>>>> One key only; name not publicly exposed ('discovery_datum'?); 255 bytes of
>>>> value only.
>>>>
>>>> I expect most instances will just post their IPs, but I expect other uses
>>>> will be found.
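>>>>
>>>> As a sketch of how an instance might use it (the path below is purely
>>>> illustrative; I am only assuming it hangs off the usual metadata service
>>>> address):
>>>>
>>>>     import requests
>>>>
>>>>     URL = 'http://169.254.169.254/openstack/latest/discovery'  # hypothetical path
>>>>     requests.post(URL, data='10.0.0.12')           # publish my value (<= 255 bytes)
>>>>     peers = requests.get(URL).text.splitlines()    # values posted by instances in the project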
>>>>
>>>> If I provided a patch that worked in this way, would you/others be on-board?
>>>
>>> I like that idea. Seems like a good compromise. I have added my review
>>> comments to the blueprint.
>>>
>>> We have this related blueprint going on, setting metadata on a
>>> particular server, rather than a group:
>>> https://blueprints.launchpad.net/nova/+spec/metadata-service-callbacks
>>>
>>> It limits things by using the existing quota on metadata updates.
>>>
>>> It would be good to agree on a similar format between the two.
>>>
>>> John
>>>


