OpenStack Cloud Design and Architecture Considerations

A previous article introduced OpenStack, with an overview of its architecture, service components and advantages. As noted there, OpenStack is a cloud deployment and management system that lets administrators control the compute, storage and networking elements of the cloud and lets users provision resources through a web interface. This article goes deeper and explains the architecture and design guidelines for an OpenStack cloud deployment that uses the Red Hat OpenStack Platform to build a private or public IaaS cloud on Red Hat Enterprise Linux.

OpenStack Components or Core Services
The OpenStack cloud platform is implemented as a group of interrelated, interacting components that provide compute, storage and networking services. Nine core services, each with a code name, provide this functionality. Administrators can manage and provision the OpenStack cloud and its components in three ways: through the web-based dashboard, through command-line clients and through the OpenStack API. The nine core services are:

Horizon: web-based Dashboard service to manage the OpenStack cloud resources and services.
Keystone: the centralized Identity service for the authentication and authorization of the OpenStack users, roles, services and projects.
Neutron: the Networking service that interconnects the various OpenStack services by providing connectivity between their interfaces.
Cinder: the Block Storage service that provides disk volumes for the OpenStack virtual machines.
Nova: the Compute service responsible for managing and provisioning OpenStack virtual machines on hypervisor nodes.
Glance: the Image registry service that manages the virtual machine images and volume snapshots that are taken from existing servers and are used as templates for new servers.
Swift: the Object Storage service for storing data, VM images, etc. as objects in the underlying file system.
Ceilometer: the Telemetry service that provides monitoring and measurement of the cloud resources.
Heat: the OpenStack Orchestration service that provides templates to create and manage cloud resources such as storage, networking and instances.
There are other OpenStack services too, such as Ironic, which provisions bare-metal (physical) machines; Trove, which provides Database as a Service functionality; and Sahara, which lets users provision and manage Hadoop clusters on OpenStack.

High-level overview of the OpenStack core services. Image courtesy of Red Hat Enterprise Linux.

Architectural guidelines for designing an OpenStack Cloud Platform
When designing an OpenStack cloud platform, several factors directly affect its configuration and resource allocation. The first is the Duration of the project. The demand for resources such as vRAM, network traffic and storage changes noticeably over time: the longer the project runs, the more memory and storage it needs. Network traffic can be estimated from an initial sample period, with allowance for the spikes that occur during peak periods. The frequency of usage affects the number of vCPUs and the amount of vRAM to allocate, and helps determine the computational load.
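The traffic estimate described above can be sketched in a few lines of Python. This is only an illustration: the sample values, the function name and the 25% headroom factor are assumptions made for the example, not figures from any OpenStack guidance.

```python
from statistics import mean

def estimate_bandwidth_mbps(samples_mbps, headroom=1.25):
    """Return (baseline, peak, provisioned) bandwidth in Mbit/s.

    samples_mbps: throughput readings taken during the initial sample period.
    headroom: hypothetical safety factor applied on top of the observed peak.
    """
    if not samples_mbps:
        raise ValueError("at least one sample is required")
    baseline = mean(samples_mbps)      # typical load over the sample period
    peak = max(samples_mbps)           # largest spike seen so far
    provisioned = peak * headroom      # size the link for the spike plus margin
    return baseline, peak, provisioned

# Made-up readings with one peak-period spike.
samples = [120, 135, 110, 480, 150, 125]
baseline, peak, provisioned = estimate_bandwidth_mbps(samples)
print(f"baseline={baseline:.1f} peak={peak} provisioned={provisioned:.1f}")
```

Sizing for the observed peak rather than the average is the point of the sketch; a longer sample period would give a more trustworthy peak.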

The next design consideration is the Compute Resources needed for the OpenStack cloud. The initial allocation can be based on the expected services and workload calculations, and additional resources can be provisioned later on demand. Because instances come in different flavors, resource pools matching specific flavors should be designed so that they can be provisioned and used as needed. A consistent, common hardware design across the resources in a pool makes maximum use of the available hardware and simplifies deployment and support.

The next consideration is Flavors, the resource templates that determine instance size and capacity. Unlike the default flavors, user-defined flavors can specify storage, swap disk and metadata to restrict usage, and can help with capacity measurement and forecasting. The vCPU-to-physical-CPU-core ratio is another design factor. The default allocation ratio in Red Hat OpenStack Platform is 16 vCPUs per physical or hyper-threaded core. It can be reduced to 8 vCPUs per core if memory is low. The ratio depends on the total RAM available, including the 4GB reserved for system memory. An example allocation is 56 vCPUs for a host with 14 VMs and 64GB of RAM.
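As a rough illustration of how these numbers relate, the following Python sketch computes a vCPU ceiling from the allocation ratio and a VM count from the available memory. The per-VM flavor (4GB of vRAM, 4 vCPUs) and the 100MB per-VM overhead are assumptions chosen for the example; they happen to reproduce the 14 VM and 56 vCPU figures, but the article does not state which flavor the example is based on.

```python
RESERVED_SYSTEM_MB = 4 * 1024   # 4GB reserved system memory, from the text
DEFAULT_CPU_RATIO = 16          # vCPUs per physical or hyper-threaded core

def vcpu_ceiling(physical_cores, ratio=DEFAULT_CPU_RATIO):
    """Maximum vCPUs the host may expose at the given allocation ratio."""
    return physical_cores * ratio

def vms_by_memory(total_ram_gb, vm_ram_mb, overhead_mb):
    """Number of VMs that fit in host RAM after the system reservation."""
    usable_mb = total_ram_gb * 1024 - RESERVED_SYSTEM_MB
    return usable_mb // (vm_ram_mb + overhead_mb)

# Assumed flavor: 4096MB vRAM and 4 vCPUs per VM, 100MB overhead per VM.
vms = vms_by_memory(total_ram_gb=64, vm_ram_mb=4096, overhead_mb=100)
total_vcpus = vms * 4
print(vms, total_vcpus)

# An assumed 8-core host at the default and at the reduced ratio.
print(vcpu_ceiling(8), vcpu_ceiling(8, ratio=8))
```

On such a host memory, not the CPU ratio, is the limiting resource, which is why the ratio has to be read together with the available RAM.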

For Memory Overhead, both the VM memory overhead and the KVM hypervisor memory overhead must be considered. For example, a VM with 256MB of vRAM needs about 310MB of physical memory, and one with 512MB needs about 610MB. A good estimate of hypervisor overhead is 100MB per VM. Another factor is Over-subscription, in which more VMs are allocated than the host's available memory can support, leading to poor performance. For example, a quad-core CPU with 256GB of RAM running more than 200 1GB instances will have performance issues. An optimal ratio of instances to available host memory should therefore be determined and applied.
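The over-subscription example can be checked with a small calculation. The sketch below is a simplification under stated assumptions: a flat 100MB overhead per VM, one vCPU per instance, the 4GB system reservation and the 16:1 vCPU ratio mentioned earlier. The function and parameter names are hypothetical.

```python
def check_host(cores, ram_gb, instances, vram_mb,
               vcpus_per_instance=1, overhead_mb=100,
               reserved_mb=4096, max_cpu_ratio=16):
    """Compare requested instances against host memory and CPU budgets."""
    needed_mb = instances * (vram_mb + overhead_mb)   # vRAM plus per-VM overhead
    available_mb = ram_gb * 1024 - reserved_mb        # RAM left after reservation
    cpu_ratio = instances * vcpus_per_instance / cores
    return {
        "needed_mb": needed_mb,
        "available_mb": available_mb,
        "memory_ok": needed_mb <= available_mb,
        "cpu_ratio": cpu_ratio,
        "cpu_ok": cpu_ratio <= max_cpu_ratio,
    }

# The quad-core, 256GB host from the text with 200 instances of 1GB each.
report = check_host(cores=4, ram_gb=256, instances=200, vram_mb=1024)
print(report)
```

Under these particular assumptions the memory budget is nominally met, while the vCPU-to-core ratio reaches 50:1, far above 16:1. Different assumptions about flavor size or overhead would shift the picture, which is why both ratios are worth checking together.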

For the Density factor, consider the following points. If instance density is lower, more hosts are needed to support the compute needs. The higher host density of a dual-socket design can be reduced by using quad-socket designs. For data centers with older infrastructure and those with higher rack counts, it is important to reduce power and cooling density. Selecting the right Compute Hardware is important for the performance and scalability of the cloud platform. Blade servers that support dual-socket, multi-core CPUs decrease server density compared to 1U rack-mounted servers. 2U rack-mounted servers offer half the density of 1U servers. Large rack-mounted servers such as 4U servers provide higher compute capacity and support 4 to 8 CPU sockets; they have much lower server density and come at a high cost. Sled rack-mounted servers support multiple independent servers in a single 2U or 3U enclosure and deliver higher density than 1U or 2U rack-mounted servers.
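The density comparison between form factors comes down to simple arithmetic, sketched below. The 42U rack height and the four-node sled are assumptions for illustration, and the calculation ignores the rack space taken by switches and power distribution.

```python
RACK_UNITS = 42  # assumed full-height rack

def servers_per_rack(enclosure_units, servers_per_enclosure=1,
                     rack_units=RACK_UNITS):
    """Servers that fit in one rack for a given enclosure size."""
    return (rack_units // enclosure_units) * servers_per_enclosure

form_factors = [
    ("1U", 1, 1),
    ("2U", 2, 1),
    ("4U", 4, 1),
    ("2U sled, 4 nodes", 2, 4),   # hypothetical sled configuration
]
density = {name: servers_per_rack(units, nodes)
           for name, units, nodes in form_factors}
for name, count in density.items():
    print(f"{name}: {count} servers per rack")
```

The 2U figure is half the 1U figure, and the sled enclosure exceeds both, matching the ordering described above.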

When designing and selecting Storage Resources, the following general factors should be considered:
1. The applications must be compatible with the cloud-based storage sub-system.
2. I/O performance benchmarks and data should be analyzed to determine platform behavior under different loads, and storage should be selected accordingly.
3. The storage sub-system, including the hardware, should be interoperable with the OpenStack components, especially the KVM hypervisor.
4. A robust security design should be implemented, focused on SLAs, legal requirements, industry regulations, required certifications and compliance with relevant standards such as HIPAA and ISO9000, together with suitable access controls.

Swift Object Storage Service
The Object Storage resource pools should be designed so that they are sufficient for your object data needs. The rack-level and zone-level designs should accommodate the number of replicas the project requires. Each replica should be set up in its own availability zone, with independent power, cooling and network resources available to that zone. Keep in mind that although the Object Storage service distributes data across the storage cluster, each partition cannot span more than one disk, so the maximum number of partitions should not exceed the number of disks.
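The one-replica-per-zone rule can be expressed as a simple design check. The sketch below is not Swift's ring-building logic; it only validates a planned layout. The zone names, disk counts and the replica count of three are assumptions for the example.

```python
def assign_replica_zones(zone_disks, replicas=3):
    """Map each replica number to its own zone, or fail if zones run short.

    zone_disks: mapping of zone name to the number of disks planned there.
    """
    usable = [zone for zone, disks in zone_disks.items() if disks > 0]
    if len(usable) < replicas:
        raise ValueError(
            f"{replicas} replicas need {replicas} zones with disks, "
            f"found {len(usable)}"
        )
    # One distinct zone per replica, in the order the zones were listed.
    return {n: zone for n, zone in enumerate(usable[:replicas], start=1)}

# Hypothetical layout: zone-d has no disks yet, so it is skipped.
layout = {"zone-a": 12, "zone-b": 12, "zone-c": 12, "zone-d": 0}
placement = assign_replica_zones(layout)
print(placement)
```

A layout with fewer populated zones than replicas raises an error, which flags early that the rack-level or zone-level design cannot satisfy the replica requirement.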

Cinder Block Storage Service
A RAID configuration is suitable for achieving redundancy. The hardware design and configuration should be the same for all hardware nodes in a device pool. Beyond the block storage capacity itself, the service should provide high availability and redundancy for the APIs that provide access to the storage nodes. An additional load-balancing layer is preferred to provide access to the backend database services that service and store the state of block storage volumes. The storage hardware should be optimized for capacity, connectivity, cost-effectiveness, direct attachment, scalability, fault tolerance and performance.

Network Resources
Network availability is crucial for the efficient functioning of the hypervisors in a cloud deployment. A cloud uses more peer-to-peer communication than a core network topology usually does, because VMs need to communicate with each other as if they were on the same network. To deal with this additional overhead, OpenStack uses multiple network segments. Services are segregated into separate segments for security and to prevent unauthorized access. The OpenStack Networking service, Neutron, is the core software-defined networking (SDN) component of the OpenStack platform. General design considerations for network resources include security, capacity planning, complexity, configuration, avoiding single points of failure and tuning.

For the OpenStack cloud platform, performance requirements relate to network performance, compute resource performance and storage system performance. Hardware load balancers can boost network performance by providing fast and reliable front-end services to the cloud APIs. Hardware specifications and configurations, along with other tunable parameters of the OpenStack components, also influence performance. Using a dedicated data storage network with dedicated interfaces on the compute nodes can improve performance as well. The controller nodes, which provide services to users and assist in internal cloud operation, need to be carefully designed with optimal hardware and configurations. To avoid a single point of failure, OpenStack services should be deployed across multiple servers with adequate backup capabilities.

Security considerations are organized by security domain, where a domain comprises the users, applications, servers and networks that share the same user access and authentication rules. These security domains are categorized as public, guest, management and data. The trust requirements of each domain depend on whether the cloud is private, public or hybrid. For authentication, the Identity service, Keystone, can use technologies such as LDAP for user management. Because the authentication API services handle sensitive information such as user names, passwords and authentication tokens, they should be placed behind hardware that performs SSL termination.