[Openstack] [Heat/Ceilometer/Havana]: Auto scaling no longer occurring after some time

Steven Dake sdake at redhat.com
Mon Feb 24 22:27:55 UTC 2014


Juha,

Copying Angus so he sees this.  He wrote the large majority of the ceilometer + 
heat integration and might have a better idea of the details of the 
problem you are facing.

On 02/24/2014 01:27 AM, Juha Tynninen wrote:
> Hi,
>
> I'm having some problems with the auto scaling feature.
> Any ideas?
>
> At first, scaling up and down works just fine. But when tested again 
> later on, scaling down/up no longer works properly.
> Scaling down may occur even when it shouldn't, or scaling up doesn't occur 
> even when it should. When in this situation I remove all the
> received metric data from the DB, auto scaling starts to work again.
>
> Ceilometer is configured to use Mongo and the auto scaling is based on 
> the cpu_util metrics.
>
> Related configurations:
> -----------------------
> /etc/ceilometer/pipeline.yaml on compute nodes:
>
> name: cpu_pipeline
> interval: 15
>
> /etc/ceilometer/ceilometer.conf on controller:
> evaluation_interval=15
>
> Heat template used:
> -------------------
>    "Resources" : {
>
>         "Group_A" : {
>             "Type" : "AWS::AutoScaling::AutoScalingGroup",
>             "Properties" : {
>                 "AvailabilityZones" : { "Fn::GetAZs" : ""},
>                 "LaunchConfigurationName" : { "Ref" : "Group_A_Config" },
>                 "MinSize" : "1",
>                 "MaxSize" : "3",
>                 "Tags" : [
>                   { "Key" : "metering.server_group", "Value" : "Group_A" },
>                   { "Key" : "custom_metadata", "Value" : "test" }
>                 ],
>                 "VPCZoneIdentifier" : [ { "Ref" : "PrivateSubnetId" } ]
>             }
>         },
>
>         "Group_A_Config" : {
>             "Type" : "AWS::AutoScaling::LaunchConfiguration",
>             "Properties": {
>                 "ImageId" : { "Ref" : "ImageId" },
>                 "InstanceType" : { "Ref" : "InstanceType" },
>                 "KeyName" : { "Ref" : "KeyName" }
>             }
>         },
>
>         "ScaleUpPolicy" : {
>             "Type" : "AWS::AutoScaling::ScalingPolicy",
>             "Properties" : {
>                 "AdjustmentType" : "ChangeInCapacity",
>                 "AutoScalingGroupName" : { "Ref" : "Group_A" },
>                 "Cooldown" : "20",
>                 "ScalingAdjustment" : "1"
>             }
>         },
>
>         "ScaleDownPolicy" : {
>             "Type" : "AWS::AutoScaling::ScalingPolicy",
>             "Properties" : {
>                 "AdjustmentType" : "ChangeInCapacity",
>                 "AutoScalingGroupName" : { "Ref" : "Group_A" },
>                 "Cooldown" : "20",
>                 "ScalingAdjustment" : "-1"
>             }
>         },
>
>         "CPUAlarmHigh": {
>             "Type": "OS::Ceilometer::Alarm",
>             "Properties": {
>                 "description": "Scale-up if CPU is greater than 90% for 20 seconds",
>                 "meter_name": "cpu_util",
>                 "statistic": "avg",
>                 "period": "20",
>                 "evaluation_periods": "1",
>                 "threshold": "90",
>                 "alarm_actions":
>                     [ {"Fn::GetAtt": ["ScaleUpPolicy", "AlarmUrl"]} ],
>                 "matching_metadata":
>                     {"metadata.user_metadata.server_group": "Group_A" },
>                 "comparison_operator": "gt"
>             }
>         },
>
>         "CPUAlarmLow": {
>             "Type": "OS::Ceilometer::Alarm",
>             "Properties": {
>                 "description": "Scale-down if CPU is less than 50% for 20 seconds",
>                 "meter_name": "cpu_util",
>                 "statistic": "avg",
>                 "period": "20",
>                 "evaluation_periods": "1",
>                 "threshold": "50",
>                 "alarm_actions":
>                     [ {"Fn::GetAtt": ["ScaleDownPolicy", "AlarmUrl"]} ],
>                 "matching_metadata":
>                     {"metadata.user_metadata.server_group": "Group_A" },
>                 "comparison_operator": "lt"
>             }
>         }
>    }
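
One thing worth checking in the settings above: the alarms average cpu_util over a 20 second period, while the compute agents only publish a sample every 15 seconds. Below is a minimal sketch of the arithmetic, assuming samples arrive exactly on schedule and that periods are aligned at t=0. Neither assumption is guaranteed in a real deployment, and the function name samples_per_window is made up for illustration.

```python
from collections import Counter


def samples_per_window(sample_interval, period, duration):
    """Count how many samples fall into each alarm period.

    Assumption: samples arrive at t = 0, sample_interval, 2*sample_interval, ...
    and alarm periods are [0, period), [period, 2*period), ...
    """
    counts = Counter()
    for t in range(0, duration, sample_interval):
        counts[t // period] += 1
    n_windows = -(-duration // period)  # ceiling division
    return [counts[i] for i in range(n_windows)]


# interval: 15 (pipeline.yaml), period: 20 (alarm), two minutes of data
print(samples_per_window(15, 20, 120))  # -> [2, 1, 1, 2, 1, 1]
```

With these numbers every average is computed from only one or two samples, so a single reading can flip an alarm. Whether that contributes to the behaviour described is only a guess.
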
>
> In ceilometer logs I can see the following kind of warnings:
>
> <44>Feb 24 08:41:08 node-16 
> ceilometer-ceilometer.collector.dispatcher.database WARNING: message 
> signature invalid, discarding message: {u'counter_name': 
> u'instance.scheduled', u'user_id': None, u'message_signature': 
> u'd1b49ddf004edc5b7a8dc9405b42a71f2ae975d04c25838c3dc0ea0e6f6e4edd', 
> u'timestamp': u'2014-02-24 08:41:08.334580', u'resource_id': 
> u'48c815ab-01c9-4ac8-9096-ac171976598c', u'message_id': 
> u'67e611e4-9d2f-11e3-81f1-080027e519cb', u'source': u'openstack', 
> u'counter_unit': u'instance', u'counter_volume': 1, u'project_id': 
> u'efcca4ba425c4beda73eb31a54df931a', u'resource_metadata': 
> {u'instance_id': u'48c815ab-01c9-4ac8-9096-ac171976598c', 
> u'weighted_host': {u'host': u'node-18', u'weight': 3818.0}, u'host': 
> u'scheduler.node-16', u'request_spec': {u'num_instances': 1, 
> u'block_device_mapping': [{u'instance_uuid': 
> u'48c815ab-01c9-4ac8-9096-ac171976598c', u'guest_format': None, 
> u'boot_index': 0, u'delete_on_termination': True, u'no_device': None, 
> u'connection_info': None, u'volume_id': None, u'device_name': None, 
> u'disk_bus': None, u'image_id': 
> u'11848cbf-a428-4dfb-8818-2f0a981f540b', u'source_type': u'image', 
> u'device_type': u'disk', u'snapshot_id': None, u'destination_type': 
> u'local', u'volume_size': None}], u'image': {u'status': u'active', 
> u'name': u'cirrosImg', u'deleted': False, u'container_format': 
> u'bare', u'created_at': u'2014-02-12T08:46:04.000000', u'disk_format': 
> u'qcow2', u'updated_at': u'2014-02-12T08:46:04.000000', u'properties': 
> {}, u'min_disk': 0, u'min_ram': 0, u'checksum': 
> u'50bdc35edb03a38d91b1b071afb20a3c', u'owner': 
> u'efcca4ba425c4beda73eb31a54df931a', u'is_public': True, 
> u'deleted_at': None, u'id': u'11848cbf-a428-4dfb-8818-2f0a981f540b', 
> u'size': 9761280}, u'instance_type': {u'root_gb': 1, u'name': 
> u'm1.tiny', u'ephemeral_gb': 0, u'memory_mb': 512, u'vcpus': 1, 
> u'extra_specs': {}, u'swap': 0, u'rxtx_factor': 1.0, u'flavorid': 
> u'1', u'vcpu_weight': None, u'id': 2}, u'instance_properties': 
> {u'vm_state': u'building', u'availability_zone': None, 
> u'terminated_at': None, u'ephemeral_gb': 0, u'instance_type_id': 2, 
> u'user_data': 
>  u'Q29udGVudC1UeXBlOiBtdWx0aXBhcnQvbWl4ZWQ7IGJvdW5kYXJ5PSI9PT0
> ...
> , u'cleaned': False, u'vm_mode': None, u'deleted_at': None, 
> u'reservation_id': u'r-l91mh33v', u'id': 274, u'security_groups': 
> {u'objects': []}, u'disable_terminate': False, u'root_device_name': 
> None, u'display_name': 
> u'tyky-Group_A-55cklit7nvbq-Group_A-2-yis32na5m7ey', u'uuid': 
> u'48c815ab-01c9-4ac8-9096-ac171976598c', u'default_swap_device': None, 
> u'info_cache': {u'instance_uuid': 
> u'48c815ab-01c9-4ac8-9096-ac171976598c', u'network_info': []}, 
> u'hostname': u'tyky-group-a-55cklit7nvbq-group-a-2-yis32na5m7ey', 
> u'launched_on': None, u'display_description': 
> u'tyky-Group_A-55cklit7nvbq-Group_A-2-yis32na5m7ey', u'key_data': 
> u'ssh-rsa 
> AAAAB3NzaC1yc2EAAAADAQABAAABAQC39hmz8e40Xv/+QKkLyRA7j02RfIG61cr1j41RftnkOF3ZbwBzi7qibsOA3gC9Ln05YbB6z2/iUnQzxQsoOpmlnXuv2O296utY2ZCTKhdFSzn2Ot7l635zEXkivMc97wz4bITtaBTjX3nV6sXOfevdTIOJeC11SqxmfNRRzXcz9fRv6kLjz7IrA0tvRTp2xDVtFEj+vFLWaXc3TcUSygxiSLeAuNkH1rZ9jVuHXXvzb/e7navrGyJec2P86AQg2TUk77MhLjPcbyKiJJK0DhK6zOkZUWXtgIVQx7+gO/Xs2QgQHcw+VdzRzpJK+/EOzUOU8IDWNnyfaJEnQEoX2oMj 
> Generated by Nova\n', u'deleted': False, u'config_drive': u'', 
> u'power_state': 0, u'default_ephemeral_device': None, u'progress': 0, 
> u'project_id': u'efcca4ba425c4beda73eb31a54df931a', u'launched_at': 
> None, u'scheduled_at': None, u'node': None, u'ramdisk_id': u'', 
> u'access_ip_v6': None, u'access_ip_v4': None, u'kernel_id': u'', 
> u'key_name': u'heat_key', u'updated_at': None, u'host': None, 
> u'user_id': u'ef4e983291ef4ad1b88eb1f776bd52b6', u'system_metadata': 
> {u'instance_type_memory_mb': 512, u'instance_type_swap': 0, 
> u'instance_type_vcpu_weight': None, u'instance_type_root_gb': 1, 
> u'instance_type_name': u'm1.tiny', u'instance_type_id': 2, 
> u'instance_type_ephemeral_gb': 0, u'instance_type_rxtx_factor': 1.0, 
> u'image_disk_format': u'qcow2', u'instance_type_flavorid': u'1', 
> u'instance_type_vcpus': 1, u'image_container_format': u'bare', 
> u'image_min_ram': 0, u'image_min_disk': 1, u'image_base_image_ref': 
> u'11848cbf-a428-4dfb-8818-2f0a981f540b'}, u'task_state': 
> u'scheduling', u'shutdown_terminate': False, u'cell_name': None, 
> u'root_gb': 1, u'locked': False, u'name': u'instance-00000112', 
> u'created_at': u'2014-02-24T08:41:08.257534', u'locked_by': None, 
> u'launch_index': 0, u'memory_mb': 512, u'vcpus': 1, u'image_ref': 
> u'11848cbf-a428-4dfb-8818-2f0a981f540b', u'architecture': None, 
> u'auto_disk_config': False, u'os_type': None, u'metadata': 
> {u'metering.server_group': u'Group_A', u'AutoScalingGroupName': 
> u'tyky-Group_A-55cklit7nvbq', u'custom_metadata': u'test'}}, 
> u'security_group': [u'default'], u'instance_uuids': 
> [u'48c815ab-01c9-4ac8-9096-ac171976598c']}, u'event_type': 
> u'scheduler.run_instance.scheduled'}, u'counter_type': u'delta'}
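
The 64 hex character message_signature in that warning is consistent with an HMAC-SHA256 digest. A signature check of this general kind can be sketched as follows. This is an illustration only: the helper names sign and verify are hypothetical, and serializing values with repr() is a simplification, not Ceilometer's actual field encoding.

```python
import hashlib
import hmac


def sign(message, secret):
    """HMAC-SHA256 over sorted key/value pairs, skipping the signature itself.

    Assumption: values are serialized with repr(); a real implementation
    would define its own stable encoding.
    """
    digest = hmac.new(secret.encode("utf-8"), b"", hashlib.sha256)
    for key in sorted(message):
        if key == "message_signature":
            continue
        digest.update(key.encode("utf-8"))
        digest.update(repr(message[key]).encode("utf-8"))
    return digest.hexdigest()


def verify(message, secret):
    """Return True if the stored signature matches one computed locally."""
    expected = sign(message, secret)
    return hmac.compare_digest(expected, message.get("message_signature", ""))


# Hypothetical, heavily shortened sample and secrets
sample = {"counter_name": "instance.scheduled", "counter_volume": 1}
sample["message_signature"] = sign(sample, "secret-on-publisher")

print(verify(sample, "secret-on-publisher"))  # same secret on both sides
print(verify(sample, "secret-on-collector"))  # different secret on the collector
```

In a scheme like this, a mismatch arises when publisher and collector use different secrets, or when the two sides serialize a field differently.
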
>
> Also the following warnings/errors can be seen but they seem to occur 
> when auto scaling is properly working and have no negative effects as 
> such:
>
> <44>Feb 24 08:43:08 node-16 
> ceilometer-ceilometer.transformer.conversions WARNING: 
> dropping sample with no predecessor: <ceilometer.sample.Sample object 
> at 0x3774fd0>
> <44>Feb 24 08:43:08 node-16 ceilometer-ceilometer.publisher.rpc AUDIT: 
> Publishing 1 samples on metering
> <44>Feb 24 08:43:08 node-16 ceilometer-ceilometer.publisher.rpc AUDIT: 
> Publishing 1 samples on metering
> <44>Feb 24 08:43:08 node-16 ceilometer-ceilometer.publisher.rpc AUDIT: 
> Publishing 1 samples on metering
> <44>Feb 24 08:43:08 node-16 ceilometer-ceilometer.publisher.rpc AUDIT: 
> Publishing 1 samples on metering
> <44>Feb 24 08:43:08 node-16 ceilometer-ceilometer.publisher.rpc AUDIT: 
> Publishing 1 samples on metering
> <44>Feb 24 08:43:08 node-16 ceilometer-ceilometer.publisher.rpc AUDIT: 
> Publishing 1 samples on metering
> <44>Feb 24 08:43:09 node-16 ceilometer-ceilometer.publisher.rpc AUDIT: 
> Publishing 1 samples on metering
> <43>Feb 24 08:43:09 node-16 
> ceilometer-ceilometer.collector.dispatcher.database ERROR: Failed to 
> record metering data: not okForStorage
> Traceback (most recent call last):
>   File 
> "/usr/lib/python2.7/dist-packages/ceilometer/collector/dispatcher/database.py", 
> line 65, in record_metering_data
>     self.storage_conn.record_metering_data(meter)
>   File 
> "/usr/lib/python2.7/dist-packages/ceilometer/storage/impl_mongodb.py", 
> line 417, in record_metering_data
>     upsert=True,
>   File "/usr/lib/python2.7/dist-packages/pymongo/collection.py", line 
> 487, in update
>     check_keys, self.__uuid_subtype), safe)
>   File "/usr/lib/python2.7/dist-packages/pymongo/mongo_client.py", 
> line 969, in _send_message
>     rv = self.__check_response_to_last_error(response)
>   File "/usr/lib/python2.7/dist-packages/pymongo/mongo_client.py", 
> line 911, in __check_response_to_last_error
>     raise OperationFailure(details["err"], details["code"])
> OperationFailure: not okForStorage
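
MongoDB of this vintage refuses to store documents whose field names contain a '.' or start with '$', and reports that as "not okForStorage". The instance metadata in the log above carries the key metering.server_group, which contains a dot. Here is a small sketch for locating such keys in a nested sample, assuming the sample is available as a plain dict. The function find_unstorable_keys is hypothetical and not part of Ceilometer or pymongo.

```python
def find_unstorable_keys(doc, path=""):
    """Return paths of dict keys that contain '.' or start with '$'."""
    bad = []
    if isinstance(doc, dict):
        for key, value in doc.items():
            here = f"{path}/{key}" if path else str(key)
            if "." in str(key) or str(key).startswith("$"):
                bad.append(here)
            bad.extend(find_unstorable_keys(value, here))
    elif isinstance(doc, (list, tuple)):
        for i, item in enumerate(doc):
            bad.extend(find_unstorable_keys(item, f"{path}[{i}]"))
    return bad


# Metadata keys copied from the log above; the nesting is shortened
metadata = {
    "instance_properties": {
        "metadata": {
            "metering.server_group": "Group_A",
            "AutoScalingGroupName": "tyky-Group_A-55cklit7nvbq",
            "custom_metadata": "test",
        }
    }
}

print(find_unstorable_keys(metadata))
# -> ['instance_properties/metadata/metering.server_group']
```

Since these errors are reported to appear even while auto scaling works, they may be unrelated to the scaling problem itself. They do mean the affected samples are not being recorded.
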
>
> Br,
> -Juha
