/
AutoScaling in a Heat Stack

AutoScaling in a Heat Stack

Sometimes we may need to have a stack that can respond when a group of servers are using a lot or little resources, such as memory usage. For example if a group of servers exceed a given memory usage threshold, we want that group of resources to scale up. This documentation will go through autoscaling, and how autoscaling could be implemented in a Heat stack.

To create an autoscaling stack we need:

  • AutoScaling Group: A group of servers defined so that the number of servers in the group and be increased or decreased.

  • Alarms: Alarms created using OpenStack Aodh to monitor the resource usage of the VMs in the autoscaling group. For example, we could create an alarm to monitor memory usage and alarm if the autoscaling group exceeds the alarm’s threshold.

  • Scaling Policies: Policies which are executed when an Aodh Alarm is triggered. When an alarm is triggered, the scaling policy attached to that alarm will instruct the autoscaling group to change in size, either increasing or decreasing the number of VMs.

Heat Resources

This section will cover the resources available in Heat that are required for creating an autoscaling stack.

AutoScaling involves resources from:

  • Heat: For creating an autoscaling group and defining the scaling policies

  • Aodh: For alarm creation

  • Gnocchi: For metrics that are used in threshold alarms

OS::Heat::AutoScalingGroup

This is an autoscaling group which can scale resources. This group can create the desired number of similar resources and we can define the minimum and maximum count for the given resource.

the_resource: type: OS::Heat::AutoScalingGroup properties: #required max_size: Integer # maximum number of resources in the group min_size: Integer # minimum number of resources in the group resource: {...} # resource definition for the resources in the group, written in HOT (Heat Orchestrated Template) format #optional desired_capacity: Integer # desired initial number of resources cooldown: Integer # cool down period in seconds rolling_updates: {"min_in_service": Integer, "max_batch_size": Integer, "pause_time": Number} # policy for rolling updates in the group, defaults to: {"min_in_service": 0, "max_batch_size": 1, "pause_time": 0} # min_in_service: minimum number of resources in service while rolling updates are executed # max_batch_size: maximum number of resources to replace at once # pause_time: number of seconds to wait between batches of updates

For example:

autoscaling-group: type: OS::Heat::AutoScalingGroup properties: min_size: 1 max_size: 3 resource: type: server.yaml #Refers to a Heat Template for creating a VM properties: flavor: {get_param: flavor} image: {get_param: image} key_name: {get_param: key_name} network: {get_param: network} metadata: {"metering.server_group": {get_param: "OS::stack_id"}}

OS::Heat::ScalingPolicy

the_resource: type: OS::Heat::ScalingPolicy properties: # required adjustment_type: String # Type of adjustment. Allowed values: “change_in_capacity”, “exact_capacity”, “percent_change_in_capacity” auto_scaling_group_id: String # AutoScaling Group ID to apply policy to scaling_adjustment: Number # Size of adjustment # Optional cooldown: Number # cooldown period, in seconds min_adjustment_step: Integer # minimum number of resources that are added or removed when the AutoScalingGroup scales up or down. Only used if specifying percent_change_in_capacity for adjustment_type property

For example:

scaleup_policy: type: OS::Heat::ScalingPolicy properties: adjustment_type: change_in_capacity auto_scaling_group_id: {get_resource: autoscaling-group} cooldown: 60 scaling_adjustment: 1 scaledown_policy: type: OS::Heat::ScalingPolicy properties: adjustment_type: change_in_capacity auto_scaling_group_id: {get_resource: autoscaling-group} cooldown: 60 scaling_adjustment: -1

OS::Aodh::GnocchiAggregationByResourcesAlarm

This resource creates an alarm as an aggregation of resources alarm. This alarm is a threshold alarm monitoring the aggregated metrics of the members of the autoscaling group defined above. Gnocchi provides the metrics which Aodh uses to determine whether an alarm should be triggered.

the_resource: type: OS::Aodh::GnocchiAggregationByResourcesAlarm properties: # required metric: String # metric name watched by the alarm query: String # query to filter the metrics resource_type: String # resource type threshold: Number # threshold to evaluate against # optional aggregation_method: String # method to compare to the threshold alarm_actions: [Value, Value, ...] # list of webhooks to invoke when state transitions to alarm alarm_queues: [String, String, ...] # list of Zaqar queues to post to when state transitions to alarm comparison_operator: String # operator used to compare specified statistic with threshold. Allowed values: “le”, “ge”, “eq”, “lt”, “gt”, “ne” description: String # alarm description enabled: Boolean # Defaults to true. Determines if alarm evaluation is enabled evaluation_periods: Integer # number of periods to evaluate over granularity: Integer # time range in seconds insufficient_data_actions: [Value, Value, ...] # list of webhooks to invoke when state transitions to insufficient data insufficient_data_queues: [String, String, ...] # list of Zaqar queues to post to when state transitions to alarm ok_actions: [Value, Value, ...] # list of webhooks to invoke when state transitions to ok ok_queues: [String, String, ...] # list of Zaqar queues to post to when state transitions to ok repeat_actions: Boolean # Defaults to True. False to trigger actions when the threshold is reached AND the alarm has changed state severity: String # severity of alarm. Allowed values: “low”, “moderate”, “critical” time_constraints: [{"name": String, "start": String, "description": String, "duration": Integer, "timezone": String}, {"name": String, "start": String, "description": String, "duration": Integer, "timezone": String}, ...] # Describe time constraints for alarm, defaults to [] # description: description for time constraints # duration: duration for time constraint # name: name for time constraint # start: start time for time constraint. A CRON expression property # timezone: Timezone for the time constraint.

For example, for our autoscaling stack we could define the alarms in the following way:

memory_alarm_high: type: OS::Aodh::GnocchiAggregationByResourcesAlarm properties: description: Scale up if memory > 1000 MB metric: memory.usage aggregation_method: mean granularity: 300 evaluation_periods: 1 threshold: 1000 resource_type: instance comparison_operator: gt query: list_join: - '' - - {'=': {server_group: {get_param: "OS::stack_id"}}} alarm_actions: - get_attr: [scaleup_policy, signal_url] memory_alarm_low: type: OS::Aodh::GnocchiAggregationByResourcesAlarm properties: description: Scale down if memory < 200MB metric: memory.usage aggregation_method: mean granularity: 300 evaluation_periods: 1 threshold: 200 resource_type: instance comparison_operator: lt query: list_join: - '' - - {'=': {server_group : {get_param: "OS::stack_id"}}} alarm_actions: - get_attr: [scaledown_policy, signal_url]

 

References

Related content