[Openstack] [Heat/Ceilometer/Havana]: Auto scaling no longer occurring after some time
Juha Tynninen
juha.tynninen at tieto.com
Tue Feb 25 06:51:27 UTC 2014
Hi,
An update: yesterday I added a "repeat_actions" : true definition to the
OS::Ceilometer::Alarm resources in the Heat template:

"CPUAlarmHigh": {
  "Type": "OS::Ceilometer::Alarm",
  "Properties": {
    "description": "Scale-up if CPU is greater than 90% for 30 seconds",
    "meter_name": "cpu_util",
    "statistic": "avg",
    "period": "30",
    "evaluation_periods": "1",
    "threshold": "90",
    "alarm_actions":
      [ {"Fn::GetAtt": ["ScaleUpPolicy", "AlarmUrl"]} ],
    "matching_metadata":
      {"metadata.user_metadata.server_group": "Group_A" },
    "comparison_operator": "gt",
    "repeat_actions" : true
  }
},
"CPUAlarmLow": {
  "Type": "OS::Ceilometer::Alarm",
  "Properties": {
    "description": "Scale-down if CPU is less than 50% for 30 seconds",
    "meter_name": "cpu_util",
    "statistic": "avg",
    "period": "30",
    "evaluation_periods": "1",
    "threshold": "50",
    "alarm_actions":
      [ {"Fn::GetAtt": ["ScaleDownPolicy", "AlarmUrl"]} ],
    "matching_metadata":
      {"metadata.user_metadata.server_group": "Group_A" },
    "comparison_operator": "lt",
    "repeat_actions" : true
  }
}

...and everything seemed to work fine. But I have now created the stack again
and generated some load inside the first VM that was started. Scaling up
occurred, but since then the system has been continuously scaling the VMs up
and down even though the load situation doesn't change. It seems the
"repeat_actions" definitions didn't help after all...
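
One possible explanation, offered as an assumption rather than a confirmed
diagnosis: both alarms evaluate the average cpu_util across all instances
tagged Group_A. If only the first VM carries load, each newly started idle VM
pulls the group average down, possibly below the 50% scale-down threshold, and
removing it pushes the average back above 90%. The small Python simulation
below models only that arithmetic. The load figures (95% busy, 2% idle) and the
function name simulate are made up for illustration, and cooldown and alarm
state handling are ignored.

```python
def simulate(busy_load=95.0, idle_load=2.0, high=90.0, low=50.0,
             min_size=1, max_size=3, steps=6):
    """Return one (group_avg, action, new_size) tuple per alarm evaluation."""
    size = min_size
    history = []
    for _ in range(steps):
        # Assumption: only the first VM carries load; added VMs stay idle.
        loads = [busy_load] + [idle_load] * (size - 1)
        avg = sum(loads) / len(loads)
        if avg > high and size < max_size:
            action, size = "scale-up", size + 1
        elif avg < low and size > min_size:
            action, size = "scale-down", size - 1
        else:
            action = "none"
        history.append((round(avg, 1), action, size))
    return history


if __name__ == "__main__":
    for avg, action, size in simulate():
        print(f"avg={avg:5.1f}%  action={action:10s}  size={size}")
```

In this model the group alternates between one and two instances on every
evaluation. Calling simulate(low=30.0) settles at two instances, which hints
that the gap between the two thresholds matters more than "repeat_actions".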
Br,
-Juha
On 25 February 2014 00:27, Steven Dake <sdake at redhat.com> wrote:
> Juha,
>
> Copying Angus so he sees this. He wrote the large majority of the
> ceilometer + heat integration and might have a better idea of the details
> of the problem you face.
>
>
> On 02/24/2014 01:27 AM, Juha Tynninen wrote:
>
> Hi,
>
> I'm having some problems with the auto scaling feature.
> Any ideas?
>
> At first, scaling up and down works just fine. But when tested again
> later on, scaling down/up no longer works properly.
> Scaling down may occur even when it shouldn't, or scaling up doesn't occur
> even when it should. When I remove all the received metric data from the DB
> in this situation, auto scaling starts to work again.
>
> Ceilometer is configured to use Mongo and the auto scaling is based on
> the cpu_util metrics.
>
> Related configurations:
> -----------------------
> /etc/ceilometer/pipeline.yaml on compute nodes:
>
> name: cpu_pipeline
> interval: 15
>
> /etc/ceilometer/ceilometer.conf on controller:
> evaluation_interval=15
>
> Heat template used:
> -------------------
> "Resources" : {
>
>   "Group_A" : {
>     "Type" : "AWS::AutoScaling::AutoScalingGroup",
>     "Properties" : {
>       "AvailabilityZones" : { "Fn::GetAZs" : ""},
>       "LaunchConfigurationName" : { "Ref" : "Group_A_Config" },
>       "MinSize" : "1",
>       "MaxSize" : "3",
>       "Tags" : [
>         { "Key" : "metering.server_group", "Value" : "Group_A" },
>         { "Key" : "custom_metadata", "Value" : "test" }
>       ],
>       "VPCZoneIdentifier" : [ { "Ref" : "PrivateSubnetId" } ]
>     }
>   },
>
>   "Group_A_Config" : {
>     "Type" : "AWS::AutoScaling::LaunchConfiguration",
>     "Properties": {
>       "ImageId" : { "Ref" : "ImageId" },
>       "InstanceType" : { "Ref" : "InstanceType" },
>       "KeyName" : { "Ref" : "KeyName" }
>     }
>   },
>
>   "ScaleUpPolicy" : {
>     "Type" : "AWS::AutoScaling::ScalingPolicy",
>     "Properties" : {
>       "AdjustmentType" : "ChangeInCapacity",
>       "AutoScalingGroupName" : { "Ref" : "Group_A" },
>       "Cooldown" : "20",
>       "ScalingAdjustment" : "1"
>     }
>   },
>
>   "ScaleDownPolicy" : {
>     "Type" : "AWS::AutoScaling::ScalingPolicy",
>     "Properties" : {
>       "AdjustmentType" : "ChangeInCapacity",
>       "AutoScalingGroupName" : { "Ref" : "Group_A" },
>       "Cooldown" : "20",
>       "ScalingAdjustment" : "-1"
>     }
>   },
>
>   "CPUAlarmHigh": {
>     "Type": "OS::Ceilometer::Alarm",
>     "Properties": {
>       "description": "Scale-up if CPU is greater than 90% for 20 seconds",
>       "meter_name": "cpu_util",
>       "statistic": "avg",
>       "period": "20",
>       "evaluation_periods": "1",
>       "threshold": "90",
>       "alarm_actions":
>         [ {"Fn::GetAtt": ["ScaleUpPolicy", "AlarmUrl"]} ],
>       "matching_metadata":
>         {"metadata.user_metadata.server_group": "Group_A" },
>       "comparison_operator": "gt"
>     }
>   },
>
>   "CPUAlarmLow": {
>     "Type": "OS::Ceilometer::Alarm",
>     "Properties": {
>       "description": "Scale-down if CPU is less than 50% for 20 seconds",
>       "meter_name": "cpu_util",
>       "statistic": "avg",
>       "period": "20",
>       "evaluation_periods": "1",
>       "threshold": "50",
>       "alarm_actions":
>         [ {"Fn::GetAtt": ["ScaleDownPolicy", "AlarmUrl"]} ],
>       "matching_metadata":
>         {"metadata.user_metadata.server_group": "Group_A" },
>       "comparison_operator": "lt"
>     }
>
> In the ceilometer logs I can see warnings like the following:
>
> <44>Feb 24 08:41:08 node-16
> ceilometer-ceilometer.collector.dispatcher.database WARNING: message
> signature invalid, discarding message: {u'counter_name':
> u'instance.scheduled', u'user_id': None, u'message_signature':
> u'd1b49ddf004edc5b7a8dc9405b42a71f2ae975d04c25838c3dc0ea0e6f6e4edd',
> u'timestamp': u'2014-02-24 08:41:08.334580', u'resource_id':
> u'48c815ab-01c9-4ac8-9096-ac171976598c', u'message_id':
> u'67e611e4-9d2f-11e3-81f1-080027e519cb', u'source': u'openstack',
> u'counter_unit': u'instance', u'counter_volume': 1, u'project_id':
> u'efcca4ba425c4beda73eb31a54df931a', u'resource_metadata': {u'instance_id':
> u'48c815ab-01c9-4ac8-9096-ac171976598c', u'weighted_host': {u'host':
> u'node-18', u'weight': 3818.0}, u'host': u'scheduler.node-16',
> u'request_spec': {u'num_instances': 1, u'block_device_mapping':
> [{u'instance_uuid': u'48c815ab-01c9-4ac8-9096-ac171976598c',
> u'guest_format': None, u'boot_index': 0, u'delete_on_termination': True,
> u'no_device': None, u'connection_info': None, u'volume_id': None,
> u'device_name': None, u'disk_bus': None, u'image_id':
> u'11848cbf-a428-4dfb-8818-2f0a981f540b', u'source_type': u'image',
> u'device_type': u'disk', u'snapshot_id': None, u'destination_type':
> u'local', u'volume_size': None}], u'image': {u'status': u'active', u'name':
> u'cirrosImg', u'deleted': False, u'container_format': u'bare',
> u'created_at': u'2014-02-12T08:46:04.000000', u'disk_format': u'qcow2',
> u'updated_at': u'2014-02-12T08:46:04.000000', u'properties': {},
> u'min_disk': 0, u'min_ram': 0, u'checksum':
> u'50bdc35edb03a38d91b1b071afb20a3c', u'owner':
> u'efcca4ba425c4beda73eb31a54df931a', u'is_public': True, u'deleted_at':
> None, u'id': u'11848cbf-a428-4dfb-8818-2f0a981f540b', u'size': 9761280},
> u'instance_type': {u'root_gb': 1, u'name': u'm1.tiny', u'ephemeral_gb': 0,
> u'memory_mb': 512, u'vcpus': 1, u'extra_specs': {}, u'swap': 0,
> u'rxtx_factor': 1.0, u'flavorid': u'1', u'vcpu_weight': None, u'id': 2},
> u'instance_properties': {u'vm_state': u'building', u'availability_zone':
> None, u'terminated_at': None, u'ephemeral_gb': 0, u'instance_type_id': 2,
> u'user_data': u'Q29udGVudC1UeXBlOiBtdWx0aXBhcnQvbWl4ZWQ7IGJvdW5kYXJ5PSI9PT0
> ...
> , u'cleaned': False, u'vm_mode': None, u'deleted_at': None,
> u'reservation_id': u'r-l91mh33v', u'id': 274, u'security_groups':
> {u'objects': []}, u'disable_terminate': False, u'root_device_name': None,
> u'display_name': u'tyky-Group_A-55cklit7nvbq-Group_A-2-yis32na5m7ey',
> u'uuid': u'48c815ab-01c9-4ac8-9096-ac171976598c', u'default_swap_device':
> None, u'info_cache': {u'instance_uuid':
> u'48c815ab-01c9-4ac8-9096-ac171976598c', u'network_info': []}, u'hostname':
> u'tyky-group-a-55cklit7nvbq-group-a-2-yis32na5m7ey', u'launched_on': None,
> u'display_description':
> u'tyky-Group_A-55cklit7nvbq-Group_A-2-yis32na5m7ey', u'key_data': u'ssh-rsa
> AAAAB3NzaC1yc2EAAAADAQABAAABAQC39hmz8e40Xv/+QKkLyRA7j02RfIG61cr1j41RftnkOF3ZbwBzi7qibsOA3gC9Ln05YbB6z2/iUnQzxQsoOpmlnXuv2O296utY2ZCTKhdFSzn2Ot7l635zEXkivMc97wz4bITtaBTjX3nV6sXOfevdTIOJeC11SqxmfNRRzXcz9fRv6kLjz7IrA0tvRTp2xDVtFEj+vFLWaXc3TcUSygxiSLeAuNkH1rZ9jVuHXXvzb/e7navrGyJec2P86AQg2TUk77MhLjPcbyKiJJK0DhK6zOkZUWXtgIVQx7+gO/Xs2QgQHcw+VdzRzpJK+/EOzUOU8IDWNnyfaJEnQEoX2oMj
> Generated by Nova\n', u'deleted': False, u'config_drive': u'',
> u'power_state': 0, u'default_ephemeral_device': None, u'progress': 0,
> u'project_id': u'efcca4ba425c4beda73eb31a54df931a', u'launched_at': None,
> u'scheduled_at': None, u'node': None, u'ramdisk_id': u'', u'access_ip_v6':
> None, u'access_ip_v4': None, u'kernel_id': u'', u'key_name': u'heat_key',
> u'updated_at': None, u'host': None, u'user_id':
> u'ef4e983291ef4ad1b88eb1f776bd52b6', u'system_metadata':
> {u'instance_type_memory_mb': 512, u'instance_type_swap': 0,
> u'instance_type_vcpu_weight': None, u'instance_type_root_gb': 1,
> u'instance_type_name': u'm1.tiny', u'instance_type_id': 2,
> u'instance_type_ephemeral_gb': 0, u'instance_type_rxtx_factor': 1.0,
> u'image_disk_format': u'qcow2', u'instance_type_flavorid': u'1',
> u'instance_type_vcpus': 1, u'image_container_format': u'bare',
> u'image_min_ram': 0, u'image_min_disk': 1, u'image_base_image_ref':
> u'11848cbf-a428-4dfb-8818-2f0a981f540b'}, u'task_state': u'scheduling',
> u'shutdown_terminate': False, u'cell_name': None, u'root_gb': 1, u'locked':
> False, u'name': u'instance-00000112', u'created_at':
> u'2014-02-24T08:41:08.257534', u'locked_by': None, u'launch_index': 0,
> u'memory_mb': 512, u'vcpus': 1, u'image_ref':
> u'11848cbf-a428-4dfb-8818-2f0a981f540b', u'architecture': None,
> u'auto_disk_config': False, u'os_type': None, u'metadata':
> {u'metering.server_group': u'Group_A', u'AutoScalingGroupName':
> u'tyky-Group_A-55cklit7nvbq', u'custom_metadata': u'test'}},
> u'security_group': [u'default'], u'instance_uuids':
> [u'48c815ab-01c9-4ac8-9096-ac171976598c']}, u'event_type':
> u'scheduler.run_instance.scheduled'}, u'counter_type': u'delta'}
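
For context on the "message signature invalid" warning above: Ceilometer signs
each metering message with an HMAC-SHA256 digest keyed by the metering_secret
option, and the collector discards messages whose signature does not match the
one it computes itself. A mismatch typically points at different
metering_secret values on the publishing and collecting nodes, although that
is an assumption here, not something the log proves. The Python sketch below
is a simplified illustration of the idea, not Ceilometer's actual code; the
helper names flatten, sign and verify and the secrets are hypothetical.

```python
import hashlib
import hmac


def flatten(d, prefix=""):
    """Yield (name, value) pairs from a nested dict in sorted key order."""
    for key in sorted(d):
        value = d[key]
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, name + ":")
        else:
            yield name, value


def sign(message, secret):
    """Compute an HMAC-SHA256 hex digest over every field but the signature."""
    digest = hmac.new(secret.encode("utf-8"), digestmod=hashlib.sha256)
    for name, value in flatten(message):
        if name == "message_signature":
            continue
        digest.update(name.encode("utf-8"))
        digest.update(str(value).encode("utf-8"))
    return digest.hexdigest()


def verify(message, secret):
    """Return True if the stored signature matches the recomputed one."""
    expected = sign(message, secret)
    return hmac.compare_digest(expected, message.get("message_signature", ""))


if __name__ == "__main__":
    sample = {"counter_name": "instance.scheduled", "counter_volume": 1,
              "resource_metadata": {"host": "scheduler.node-16"}}
    sample["message_signature"] = sign(sample, "secret-on-publisher")
    print(verify(sample, "secret-on-publisher"))  # same secret on both sides
    print(verify(sample, "secret-on-collector"))  # differing secret
```

If this is what is happening, comparing the metering_secret setting in
ceilometer.conf across all nodes would be a cheap first check.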
>
> The following warnings/errors can also be seen, but they occur even when
> auto scaling is working properly and seem to have no negative effects as such:
>
> <44>Feb 24 08:43:08 node-16
> ceilometer-ceilometer.transformer.conversions WARNING: dropping
> sample with no predecessor: <ceilometer.sample.Sample object at 0x3774fd0>
> <44>Feb 24 08:43:08 node-16 ceilometer-ceilometer.publisher.rpc AUDIT:
> Publishing 1 samples on metering
> <44>Feb 24 08:43:08 node-16 ceilometer-ceilometer.publisher.rpc AUDIT:
> Publishing 1 samples on metering
> <44>Feb 24 08:43:08 node-16 ceilometer-ceilometer.publisher.rpc AUDIT:
> Publishing 1 samples on metering
> <44>Feb 24 08:43:08 node-16 ceilometer-ceilometer.publisher.rpc AUDIT:
> Publishing 1 samples on metering
> <44>Feb 24 08:43:08 node-16 ceilometer-ceilometer.publisher.rpc AUDIT:
> Publishing 1 samples on metering
> <44>Feb 24 08:43:08 node-16 ceilometer-ceilometer.publisher.rpc AUDIT:
> Publishing 1 samples on metering
> <44>Feb 24 08:43:09 node-16 ceilometer-ceilometer.publisher.rpc AUDIT:
> Publishing 1 samples on metering
> <43>Feb 24 08:43:09 node-16
> ceilometer-ceilometer.collector.dispatcher.database ERROR: Failed to record
> metering data: not okForStorage
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/dist-packages/ceilometer/collector/dispatcher/database.py", line 65, in record_metering_data
>     self.storage_conn.record_metering_data(meter)
>   File "/usr/lib/python2.7/dist-packages/ceilometer/storage/impl_mongodb.py", line 417, in record_metering_data
>     upsert=True,
>   File "/usr/lib/python2.7/dist-packages/pymongo/collection.py", line 487, in update
>     check_keys, self.__uuid_subtype), safe)
>   File "/usr/lib/python2.7/dist-packages/pymongo/mongo_client.py", line 969, in _send_message
>     rv = self.__check_response_to_last_error(response)
>   File "/usr/lib/python2.7/dist-packages/pymongo/mongo_client.py", line 911, in __check_response_to_last_error
>     raise OperationFailure(details["err"], details["code"])
> OperationFailure: not okForStorage
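
Regarding the "not okForStorage" error: MongoDB versions of this era reject
documents whose field names contain a dot or start with a dollar sign, and the
notification metadata shown earlier includes the key metering.server_group.
Whether that key is what triggers this particular traceback is an assumption.
The Python sketch below walks a nested document and lists such keys;
invalid_mongo_keys is a hypothetical helper, not part of Ceilometer or pymongo.

```python
def invalid_mongo_keys(doc, path=""):
    """Return the paths of all keys that contain '.' or start with '$'."""
    bad = []
    if isinstance(doc, dict):
        for key, value in doc.items():
            here = f"{path}/{key}"
            if "." in key or key.startswith("$"):
                bad.append(here)
            # Recurse so that offending keys in nested metadata are found too.
            bad.extend(invalid_mongo_keys(value, here))
    elif isinstance(doc, list):
        for index, item in enumerate(doc):
            bad.extend(invalid_mongo_keys(item, f"{path}[{index}]"))
    return bad


if __name__ == "__main__":
    # Trimmed-down shape of the resource_metadata from the warning above.
    metadata = {
        "request_spec": {
            "instance_properties": {
                "metadata": {
                    "metering.server_group": "Group_A",
                    "custom_metadata": "test",
                }
            }
        }
    }
    for key_path in invalid_mongo_keys(metadata):
        print(key_path)
```

Only the dotted key is reported for this sample; the plain custom_metadata key
passes the check.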
>
> Br,
> -Juha
>
>