[openstack-dev] [savanna] cluster scaling on the 0.2 branch

Jon Maron jmaron at hortonworks.com
Fri Aug 30 19:47:04 UTC 2013


I've done some additional debugging/testing, and the issue is definitely in the Savanna provisioning code.

I have verified that the correct inputs are provided to the validate_scaling method invocation, and that those references remain unaltered.  The scaling request adds one node in a new node group named 'another' and one node to the existing 'slave' node group:

cluster.node_groups:

[<savanna.db.models.NodeGroup[object at 107b15f50] {created=datetime.datetime(2013, 8, 30, 19, 20, 49, 857213), updated=datetime.datetime(2013, 8, 30, 19, 20, 49, 857222), id=u'effcc91c-d0de-4508-84ba-9cedc7e321f6', name=u'master', flavor_id=u'3', image_id=None, node_processes=[u'NAMENODE', u'SECONDARY_NAMENODE', u'GANGLIA_SERVER', u'GANGLIA_MONITOR', u'AMBARI_SERVER', u'AMBARI_AGENT', u'JOBTRACKER', u'NAGIOS_SERVER'], node_configs={}, volumes_per_node=0, volumes_size=10, volume_mount_prefix=u'/volumes/disk', count=1, cluster_id=u'd3052854-8b56-47b6-b3c1-612750aab612', node_group_template_id=u'15344a5c-5e83-496a-9648-d7b58f40ad1f'}>, 
<savanna.db.models.NodeGroup[object at 107ca1750] {created=datetime.datetime(2013, 8, 30, 19, 20, 49, 860178), updated=datetime.datetime(2013, 8, 30, 19, 20, 49, 860184), id=u'b56a2e69-58d9-4e95-a54f-d9b994bc8515', name=u'slave', flavor_id=u'3', image_id=None, node_processes=[u'DATANODE', u'HDFS_CLIENT', u'GANGLIA_MONITOR', u'AMBARI_AGENT', u'TASKTRACKER', u'MAPREDUCE_CLIENT'], node_configs={}, volumes_per_node=0, volumes_size=10, volume_mount_prefix=u'/volumes/disk', count=1, cluster_id=u'd3052854-8b56-47b6-b3c1-612750aab612', node_group_template_id=u'5dd6aa5a-496c-4dda-b94c-3b3752eb0efb'}>]

additional:

{<savanna.db.models.NodeGroup[object at 107cc77d0] {created=None, updated=None, id=None, name=u'another', flavor_id=u'3', image_id=None, node_processes=[u'DATANODE', u'HDFS_CLIENT', u'GANGLIA_MONITOR', u'AMBARI_AGENT', u'TASKTRACKER', u'MAPREDUCE_CLIENT'], node_configs={}, volumes_per_node=0, volumes_size=10, volume_mount_prefix=u'/volumes/disk', count=1, cluster_id=None, node_group_template_id=u'f7f2ddc3-18ca-439f-9c08-570ff9307baf'}>: 1}

existing:

{u'slave': 2}
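
For reference, the scaling request body is roughly equivalent to the following (a sketch only; in my case the new group was created from a node group template, so the actual payload may reference the template id rather than listing the node processes inline):

    {
        "add_node_groups": [
            {
                "name": "another",
                "count": 1,
                "flavor_id": "3",
                "node_processes": ["DATANODE", "HDFS_CLIENT", "GANGLIA_MONITOR",
                                   "AMBARI_AGENT", "TASKTRACKER", "MAPREDUCE_CLIENT"]
            }
        ],
        "resize_node_groups": [
            {
                "name": "slave",
                "count": 2
            }
        ]
    }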

Once the scale_cluster() call is made, the cluster does have the additional node group, but the list of instances passed to the plugin isn't correct:

cluster.node_groups (note the addition of the 'another' node group):

- [<savanna.db.models.NodeGroup[object at 107c9cad0] {created=datetime.datetime(2013, 8, 30, 19, 20, 49, 857213), updated=datetime.datetime(2013, 8, 30, 19, 20, 49, 857222), id=u'effcc91c-d0de-4508-84ba-9cedc7e321f6', name=u'master', flavor_id=u'3', image_id=None, node_processes=[u'NAMENODE', u'SECONDARY_NAMENODE', u'GANGLIA_SERVER', u'GANGLIA_MONITOR', u'AMBARI_SERVER', u'AMBARI_AGENT', u'JOBTRACKER', u'NAGIOS_SERVER'], node_configs={}, volumes_per_node=0, volumes_size=10, volume_mount_prefix=u'/volumes/disk', count=1, cluster_id=u'd3052854-8b56-47b6-b3c1-612750aab612', node_group_template_id=u'15344a5c-5e83-496a-9648-d7b58f40ad1f'}>, 
- <savanna.db.models.NodeGroup[object at 107c9cc90] {created=datetime.datetime(2013, 8, 30, 19, 20, 49, 860178), updated=datetime.datetime(2013, 8, 30, 19, 34, 51, 39463), id=u'b56a2e69-58d9-4e95-a54f-d9b994bc8515', name=u'slave', flavor_id=u'3', image_id=None, node_processes=[u'DATANODE', u'HDFS_CLIENT', u'GANGLIA_MONITOR', u'AMBARI_AGENT', u'TASKTRACKER', u'MAPREDUCE_CLIENT'], node_configs={}, volumes_per_node=0, volumes_size=10, volume_mount_prefix=u'/volumes/disk', count=2, cluster_id=u'd3052854-8b56-47b6-b3c1-612750aab612', node_group_template_id=u'5dd6aa5a-496c-4dda-b94c-3b3752eb0efb'}>, 
- <savanna.db.models.NodeGroup[object at 107cc7290] {created=datetime.datetime(2013, 8, 30, 19, 34, 49, 309577), updated=datetime.datetime(2013, 8, 30, 19, 34, 49, 309584), id=u'b8ea4e37-68d1-471d-9ddf-b74c2c533892', name=u'another', flavor_id=u'3', image_id=None, node_processes=[u'DATANODE', u'HDFS_CLIENT', u'GANGLIA_MONITOR', u'AMBARI_AGENT', u'TASKTRACKER', u'MAPREDUCE_CLIENT'], node_configs={}, volumes_per_node=0, volumes_size=10, volume_mount_prefix=u'/volumes/disk', count=1, cluster_id=u'd3052854-8b56-47b6-b3c1-612750aab612', node_group_template_id=u'f7f2ddc3-18ca-439f-9c08-570ff9307baf'}>]

However, only the instance for the existing node group is passed in:

[<savanna.db.models.Instance[object at 107cc9f50] {created=datetime.datetime(2013, 8, 30, 19, 34, 50, 727467), updated=datetime.datetime(2013, 8, 30, 19, 35, 36, 853529), extra=None, node_group_id=u'b56a2e69-58d9-4e95-a54f-d9b994bc8515', instance_id=u'59e4f689-5124-4205-8629-ad90ffc913d5', instance_name=u'scale-slave-002', internal_ip=u'192.168.32.8', management_ip=u'172.18.3.9', volumes=[]}>]

So clearly, the list of instances passed in is lacking the instance reference for the 'another' node group.  
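
For anyone trying to reproduce this, a quick check along the following lines makes the gap obvious (a sketch only, relying on nothing more than the node_group_id attribute visible in the Instance dumps above):

    # compare the node groups on the cluster with the groups represented
    # in the 'instances' argument passed to the plugin
    passed_group_ids = set(i.node_group_id for i in instances)
    for ng in cluster.node_groups:
        print ng.name, ng.id, ng.id in passed_group_ids
    # 'another' reports False: none of the passed-in instances belong to it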

I have filed https://bugs.launchpad.net/savanna/+bug/1219059. 

-- Jon


On Aug 29, 2013, at 8:22 AM, Nadezhda Privalova <nprivalova at mirantis.com> wrote:

> Hi Jon,
> 
> Unfortunately, I'm not able to reproduce this issue with the vanilla plugin. The behavior you described is not the correct behavior. Here is the JSON I used to try to reproduce the issue:
> 
> {
>     "add_node_groups": [
>         {
>             "name": "worker-tasktracker",
>             "count": 1,
>             "node_processes": [
>                 "tasktracker"
>             ],
>             "flavor_id": "42"
>         }
>     ],
>     "resize_node_groups": [
>         {
>             "name": "worker-datanode",
>             "count": 2
>         }
>     ]
> }
> 
> I added a 'print instances' statement to the vanilla plugin's scale_cluster method (placed as sketched after the output below). Here is the result:
> 
> [<savanna.db.models.Instance[object at 10e59fe90] {created=datetime.datetime(2013, 8, 29, 12, 9, 24, 307150), updated=datetime.datetime(2013, 8, 29, 12, 11, 45, 614216), extra=None, node_group_id=u'2b892060-b53e-4224-98ae-9ffff11aa014', instance_id=u'1f50ceca-ee84-477d-a730-79839af59d08', instance_name=u'np-oozie-old-0.2-worker-tasktracker-001', internal_ip=u'10.155.0.108', management_ip=u'172.18.79.248', volumes=[]}>, <savanna.db.models.Instance[object at 10e588590] {created=datetime.datetime(2013, 8, 29, 12, 9, 25, 751723), updated=datetime.datetime(2013, 8, 29, 12, 11, 45, 614467), extra=None, node_group_id=u'9935bf76-d08b-4cdd-ad20-bbcb2ea5666f', instance_id=u'67092d96-9808-4830-be4a-9f7f54e04b58', instance_name=u'np-oozie-old-0.2-worker-datanode-002', internal_ip=u'10.155.0.110', management_ip=u'172.18.79.254', volumes=[]}>]
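> 
> The print was placed roughly like this (just a sketch; the rest of the method is unchanged):
> 
>     def scale_cluster(self, cluster, instances):
>         print instances  # temporary debug output
>         # ... existing scaling logic follows ...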
>  
> So the behavior is as expected.
> We can try to debug this together if you want. Please feel free to ping me.
> 
> Thanks,
> Nadya
> 
> On Thu, Aug 29, 2013 at 5:49 AM, Jon Maron <jmaron at hortonworks.com> wrote:
> Hi,
> 
>   I am trying to backport the HDP scaling implementation to the 0.2 branch and have run into a number of differences.  At this point I am trying to figure out whether what I am observing is intended or a symptom of a bug.
> 
>   For a case in which I am adding one instance to an existing node group as well as a new node group with one instance, I am seeing the following arguments passed to the plugin's scale_cluster method:
> 
> - A cluster object that contains the following set of node groups:
> 
> [<savanna.db.models.NodeGroup[object at 10d8bdd90] {created=datetime.datetime(2013, 8, 28, 21, 50, 5, 208003), updated=datetime.datetime(2013, 8, 28, 21, 50, 5, 208007), id=u'd6fadb7a-367b-41ed-989c-af40af2d3e3d', name=u'master', flavor_id=u'3', image_id=None, node_processes=[u'NAMENODE', u'SECONDARY_NAMENODE', u'GANGLIA_SERVER', u'GANGLIA_MONITOR', u'AMBARI_SERVER', u'AMBARI_AGENT', u'JOBTRACKER', u'NAGIOS_SERVER'], node_configs={}, volumes_per_node=0, volumes_size=10, volume_mount_prefix=u'/volumes/disk', count=1, cluster_id=u'e086d444-2a0f-4105-8ef2-51c56cdb70d2', node_group_template_id=u'15344a5c-5e83-496a-9648-d7b58f40ad1f'}>, 
> <savanna.db.models.NodeGroup[object at 10d8bd950] {created=datetime.datetime(2013, 8, 28, 21, 50, 5, 210962), updated=datetime.datetime(2013, 8, 28, 22, 5, 1, 728402), id=u'672e5597-2a8d-4470-8f5d-8cc43c7bb28e', name=u'slave', flavor_id=u'3', image_id=None, node_processes=[u'DATANODE', u'HDFS_CLIENT', u'GANGLIA_MONITOR', u'AMBARI_AGENT', u'TASKTRACKER', u'MAPREDUCE_CLIENT'], node_configs={}, volumes_per_node=0, volumes_size=10, volume_mount_prefix=u'/volumes/disk', count=2, cluster_id=u'e086d444-2a0f-4105-8ef2-51c56cdb70d2', node_group_template_id=u'5dd6aa5a-496c-4dda-b94c-3b3752eb0efb'}>, 
> <savanna.db.models.NodeGroup[object at 10d897f90] {created=datetime.datetime(2013, 8, 28, 22, 4, 59, 871379), updated=datetime.datetime(2013, 8, 28, 22, 4, 59, 871388), id=u'880e1b17-f4e4-456d-8421-31bf8ef1fb65', name=u'slave2', flavor_id=u'1', image_id=None, node_processes=[u'DATANODE', u'HDFS_CLIENT', u'GANGLIA_MONITOR', u'AMBARI_AGENT', u'TASKTRACKER', u'MAPREDUCE_CLIENT'], node_configs={}, volumes_per_node=0, volumes_size=10, volume_mount_prefix=u'/volumes/disk', count=1, cluster_id=u'e086d444-2a0f-4105-8ef2-51c56cdb70d2', node_group_template_id=u'd67da924-792b-4558-a5cb-cb97bba4107f'}>]
>  
>   So it appears that the cluster is already configured with the three node groups (two original, one new) and the associated counts.
> 
> - The list of instances.  However, whereas the master branch passed me two instances (one representing the addition to the existing group, one representing the new instance in the added node group), in the 0.2 branch only one instance is passed (the instance being added to the existing node group):
> 
> (Pdb) p instances
> [<savanna.db.models.Instance[object at 10d8bf050] {created=datetime.datetime(2013, 8, 28, 22, 5, 1, 725343), updated=datetime.datetime(2013, 8, 28, 22, 5, 47, 286665), extra=None, node_group_id=u'672e5597-2a8d-4470-8f5d-8cc43c7bb28e', instance_id=u'377694a2-a589-479b-860f-f1541d249624', instance_name=u'scale-slave-002', internal_ip=u'192.168.32.4', management_ip=u'172.18.3.5', volumes=[]}>]
> (Pdb) p len(instances)
> 1
> 
>   I am not certain why I am not getting, as I do on the master branch, a list of all the instances being added to the cluster.  Is this intended?  How do I obtain the instance reference for the instance being added to the new node group?
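> 
> To illustrate what I am after, assuming the node groups expose their instances via an 'instances' relationship, the reference I am looking for would be, roughly (a sketch, not a suggested fix):
> 
>     # the newly added group from the cluster object above
>     new_group = [ng for ng in cluster.node_groups if ng.name == u'slave2'][0]
>     print new_group.instances  # on master, this instance also shows up in the 'instances' argument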
> 
> -- Jon
> 

