Today, cloud computing dominates the IT world because of the many ways it has helped businesses address their major IT concerns: cost, durability, flexibility, performance, security, operability, expertise, manual effort, extensibility, scalability and more. Cloud computing and its associated technologies have revolutionized IT, and most SMEs and almost all enterprises now heavily utilize and harness the power of cloud technologies.
Why scalability?
Businesses and projects are not meant to be static; they grow, develop and expand. The IT infrastructure has become the main channel for this growth because of the tremendous capabilities it provides across the business's divisions: operations, research, administration, marketing, management and so on. As a result of both planned and organic growth, businesses need to expand, or scale, their IT infrastructure to adapt it to growing business needs. This is where the cloud again proves its strength, through the ease and flexibility with which it can be scaled up or down to match the computational and data needs of the business. Scalability is the capability of an IT infrastructure to dynamically readjust itself to the desired capacity level by adding or removing IT resources related to computation, storage, networking, etc.
Scaling a cloud-based IT infrastructure typically happens when one or more of the following scenarios (among others) occur:
1. The business wants to expand its IT infrastructure for implementing a growth plan or the growth has demanded an expansion of the infrastructure.
2. The public-facing and/or internal IT resources are experiencing high workloads due to new scenarios associated with policy changes, new marketing techniques, operational changes, etc.
3. Business needs more resources for one or more of the business divisions like administration, operation etc.
4. An optimization plan demands change in quantity, quality and type of resources.
5. A peak load time or seasonal hike in traffic or computational requirements needs more resources.
6. A logical or geographical expansion demands more capabilities.
Types of scaling
Scale Up or Vertical Scaling: this happens at the physical level, such as adding more RAM, additional physical disks, cache or processors to take the system from a lower to a higher capability level. It can also mean enhancing the server configuration, adding more physical units to the existing server, or even replacing the current server with an entirely new, more powerful one. Vertical scaling is typically needed when it is time to decommission the current server, when a new RAID configuration must replace the current storage partitioning, or when more memory capacity is needed to support high computational requirements. Since it happens at the hardware level, it involves the extra cost of upgrading or purchasing new hardware, components, servers, etc.
Scale Out or Horizontal Scaling: this happens side by side. More computational, storage or networking resources are added at a logical level through technologies such as virtualization and elastic provisioning of instances. The additional resources are added alongside existing ones to increase the overall IT capability horizontally. A typical example is adding more compute or storage instances on the same physical server using virtualization with hypervisors. A load balancer distributes the load across the connected resources, and thus scaling is achieved. A distributed or clustered architecture of interconnected resources is therefore best suited for horizontal scaling. This type of scaling is more cost-effective, easy, rapid and flexible, since it uses virtualization to obtain more resources from existing physical hosts, avoiding additional hardware purchases and lengthy upgrade routines.
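To make the idea of scaling out concrete, here is a minimal HOT sketch that grows a web tier horizontally simply by raising the count of identical instances; the image and flavor names are placeholders, not values from this article:

```yaml
heat_template_version: 2015-04-30

description: >
  Hypothetical sketch of horizontal scaling: a group of identical
  web-server instances whose size is controlled by one parameter.

parameters:
  server_count:
    type: number
    default: 2

resources:
  web_cluster:
    type: OS::Heat::ResourceGroup
    properties:
      count: { get_param: server_count }   # raise to scale out, lower to scale in
      resource_def:
        type: OS::Nova::Server
        properties:
          image: web-image    # placeholder image name
          flavor: m1.small    # placeholder flavor
```

Updating the stack with a larger `server_count` adds instances side by side rather than enlarging any single server, which is exactly the horizontal-scaling pattern described above.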
Scaling techniques
Auto-scaling is based on workload triggers that cause resource provisioning events to allocate more resources automatically and dynamically without manual intervention.
Scheduled scaling uses workload and resource-utilization planning to schedule scaling of resources at specific points in time. This approach does not wait for the need to scale to arise, but rather scales the system beforehand at scheduled times. Any unanticipated spikes in demand are then handled by auto-scaling.
Predictive scaling analyzes network and resource usage patterns, trends and historical data to foresee the capacity needed at various periods, such as peak periods, off-peak periods and seasonal hikes, and scales the cloud according to those capacity requirements.
What is a scalable architecture?
As evident from the above types of scaling, scaling out, or horizontal scaling, is the most flexible and cost-effective approach for scaling your cloud. The architecture of the cloud is an important factor in how easily and effectively it can be scaled. A loosely coupled, component- and service-based, pluggable architecture is the best option for effective scaling out. The application needs to be logically separated into components and services that can scale independently. For example, an internally used management component/service may need less compute, storage and network resources than an ecommerce component/service that receives heavy traffic and transaction loads from the outside world. Resource allocation for these two components need not be the same: only organizational staff, management and stakeholders consume the management component, while the general public, including customers, forms a heavy client base for the ecommerce component. The architecture and scaling framework must therefore support component-based resource allocation and scaling, optimizing resource usage to prevent both over- and under-allocation. This type of service-based scaling is also called product-level scaling, since the product's requirements are the decisive factors in scaling.
Another aspect of a scalable architecture is its capability to scale at the technology-component level. Various technology components, such as the API server, transaction server, transaction DB, reporting DB, network interfaces, orchestration service and containerization service, need to be scaled independently depending on their requirements. In the above example, the transaction DB may need more resources than the reporting DB.
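In Heat terms, component-level scaling can be expressed by giving each component its own scaling group with its own size range. The sketch below is illustrative only; the resource names, images and flavors are hypothetical, chosen to mirror the management-versus-ecommerce example above:

```yaml
heat_template_version: 2015-04-30

description: >
  Illustrative sketch: independent scaling ranges for two application
  components. All names and sizes are placeholders.

resources:
  management_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1            # internal service: small footprint
      max_size: 2
      resource:
        type: OS::Nova::Server
        properties:
          image: mgmt-image    # placeholder
          flavor: m1.small

  ecommerce_group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 2            # public-facing service: wider range
      max_size: 10
      resource:
        type: OS::Nova::Server
        properties:
          image: shop-image    # placeholder
          flavor: m1.large
```

Each group scales within its own bounds, so the lightly used management component never consumes the resources reserved for the heavily trafficked ecommerce component.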
Auto-scaling in OpenStack Cloud
OpenStack, the popular open source cloud platform, is horizontally scalable and has built-in techniques for auto-scaling. OpenStack supports horizontal scaling through virtualization and load balancing; its component services and instances can be quickly provisioned and recalled to implement scalability. OpenStack Heat, the orchestration engine for OpenStack, performs auto-scaling upon triggers received from the metering component, OpenStack Ceilometer. Ceilometer, the telemetry service, can be configured by the user to raise resource-consumption alarms. For example, when an alarm is set at 90% usage of the compute instances, Ceilometer, which monitors usage, detects when the load reaches 90% and triggers an alarm for Heat. Heat then auto-scales the cloud by adding more compute resources as specified in a Heat Orchestration Template (HOT). A lower limit, say 25% usage, when detected by Ceilometer, triggers another alarm for Heat to scale down. Heat then recalls the previously allocated compute resources to restore the cloud to its minimum capacity. Users define the upper and lower usage limits that trigger the scale-up and scale-down processes in the HOT template.
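The alarm-driven loop just described can be sketched in a HOT template as follows. This is a minimal, hedged example using the classic Heat auto-scaling resources; the image and flavor names are placeholders, and the 90%/25% thresholds simply mirror the example above:

```yaml
heat_template_version: 2015-04-30

description: >
  Sketch of Ceilometer-alarm-driven auto-scaling in Heat.
  Image and flavor names are placeholders.

resources:
  asg:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 1
      max_size: 5
      resource:
        type: OS::Nova::Server
        properties:
          image: web-image    # placeholder
          flavor: m1.small

  scale_up_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: { get_resource: asg }
      cooldown: 60
      scaling_adjustment: 1      # add one instance

  scale_down_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: { get_resource: asg }
      cooldown: 60
      scaling_adjustment: -1     # remove one instance

  cpu_alarm_high:
    type: OS::Ceilometer::Alarm
    properties:
      meter_name: cpu_util
      statistic: avg
      period: 60
      evaluation_periods: 1
      threshold: 90              # scale up above 90% CPU
      comparison_operator: gt
      alarm_actions:
        - { get_attr: [scale_up_policy, alarm_url] }

  cpu_alarm_low:
    type: OS::Ceilometer::Alarm
    properties:
      meter_name: cpu_util
      statistic: avg
      period: 60
      evaluation_periods: 1
      threshold: 25              # scale down below 25% CPU
      comparison_operator: lt
      alarm_actions:
        - { get_attr: [scale_down_policy, alarm_url] }
```

When a Ceilometer alarm fires, it calls the webhook URL of the corresponding scaling policy, which in turn grows or shrinks the auto-scaling group within its min/max bounds.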
The figure shows a scenario where additional nodes or instances of the Apache httpd server are created and managed by the orchestration service, with a load balancer distributing the load to achieve true scalability.
More about OpenStack Heat
OpenStack Heat can handle multiple instances through simple commands and HOT template files. To implement auto-scaling in an OpenStack cloud, Heat can:
1. Create new instances from images.
2. Use the metadata service for configuration.
3. Use Ceilometer to create alarms for CPU/Memory/Network usage in instances.
4. Based on the alarms, perform the events attached to these triggers, such as deploying or terminating instances according to the workload.
The code below shows an example Heat Orchestration Template:
heat_template_version: 2015-04-30

description: Simple template to deploy a single compute instance

parameters:
  key_name:
    type: string
    label: Key Name
    description: Name of key-pair to be used for compute instance
  image_id:
    type: string
    label: Image ID
    description: Image to be used for compute instance
  instance_type:
    type: string
    label: Instance Type
    description: Type of instance (flavor) to be used

resources:
  my_instance:
    type: OS::Nova::Server
    properties:
      key_name: { get_param: key_name }
      image: { get_param: image_id }
      flavor: { get_param: instance_type }
Heat is the orchestration engine built into OpenStack; it orchestrates the cloud components using a declarative template format through an OpenStack-native REST API. The Heat Orchestration Template, or HOT, defines the cloud application infrastructure in YAML text files. Relationships between the various OpenStack cloud components are specified in HOT (for example, that a volume is associated with a particular server). Using these template files, Heat makes OpenStack API calls to create the infrastructure in the desired order and configuration. Though Heat's primary responsibility is managing the infrastructure, HOT templates also integrate with automation and configuration-management tools like Puppet and Ansible.
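As a small illustration of such a relationship, the following hypothetical fragment declares a server, a Cinder volume, and the attachment that ties them together; the resource names and sizes are placeholders:

```yaml
heat_template_version: 2015-04-30

description: Sketch of a HOT relationship - attach a volume to a server.

resources:
  server:
    type: OS::Nova::Server
    properties:
      image: some-image    # placeholder
      flavor: m1.small

  data_volume:
    type: OS::Cinder::Volume
    properties:
      size: 10             # size in GB

  attachment:
    type: OS::Cinder::VolumeAttachment
    properties:
      instance_uuid: { get_resource: server }
      volume_id: { get_resource: data_volume }
```

Because the attachment references both resources, Heat infers the dependency order automatically and creates the server and volume before attaching them.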
Auto-scaling is the mechanism that provides a meaningful scaling service through on-demand scale-up and scale-down processes. The true potential of a scalable cloud infrastructure is realized when resource allocation is optimized, that is, with neither over- nor under-allocation. OpenStack Heat and the Heat Orchestration Template are a great way of achieving auto-scaling in an OpenStack cloud, optimizing resource usage while maintaining a stable and growing production deployment.