Enhancing Persistent Storage in Amazon EKS: Leveraging EBS for Stateful Workloads

As an AWS DevOps Engineer, I frequently assist clients migrating their applications to Kubernetes environments like Amazon Elastic Kubernetes Service (EKS). While Kubernetes excels in managing stateless applications, many production-grade workloads require persistent storage to retain critical data across container and instance restarts. A major challenge is ensuring that data remains intact when ephemeral containers or EC2 instances are terminated and replaced within an autoscaling environment.

In such situations, I recommend Amazon Elastic Block Store (EBS) as a robust solution for ensuring data persistence in EKS clusters. EBS integrates seamlessly with Kubernetes through Persistent Volumes (PV) and Persistent Volume Claims (PVC), providing a reliable method to preserve stateful data, which is crucial for many applications on EKS. Below, I delve into the common use cases, practical implementations, and the strategic advantages of EBS in managing data persistence in EKS clusters.

The Need for Persistent Storage in EKS

In EKS, the EC2 instances that act as worker nodes are typically part of an Auto Scaling group, which makes them inherently ephemeral. When an instance is terminated, any data stored locally on it is lost. This is not a problem for stateless applications, such as microservices or web front ends. However, stateful applications like databases, logging systems, or data analytics platforms require reliable data persistence. The ability to store and access data even after instance terminations or container rescheduling is vital for maintaining application consistency and operational integrity.

Why Use Amazon EBS in EKS?

Amazon EBS is a durable, scalable, and high-performance block storage service optimized for use with EC2 instances. For EKS, EBS offers several advantages:

  • Data Durability: An EBS volume persists independently of any EC2 instance's lifecycle, so data survives node replacements and container restarts. Keep in mind that a volume lives in a single Availability Zone and can only be attached to instances in that zone.

  • Scalability: EBS allows for dynamic adjustment of storage size and performance by selecting appropriate volume types tailored to your workload requirements.

  • Performance: EBS volumes are engineered to deliver high throughput, with options such as General Purpose SSD (gp2/gp3) or Provisioned IOPS (io1/io2) volumes to meet varying performance needs.
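In Kubernetes terms, the choice of volume type and performance is usually expressed in a StorageClass. The following is a minimal sketch, assuming the Amazon EBS CSI driver is installed in the cluster; the class name and the IOPS and throughput figures are illustrative values of mine, not recommendations.

```yaml
# Hypothetical StorageClass for gp3 volumes (name and numbers are illustrative).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3                  # assumed name, not from any existing setup
provisioner: ebs.csi.aws.com     # Amazon EBS CSI driver, assumed to be installed
parameters:
  type: gp3                      # other options include gp2, io1, io2
  iops: "4000"                   # illustrative; the gp3 baseline is 3000 IOPS
  throughput: "250"              # MiB/s, illustrative; the gp3 baseline is 125
  encrypted: "true"              # encrypt volumes at rest
reclaimPolicy: Delete            # delete the EBS volume when the claim is deleted
allowVolumeExpansion: true       # allow claims to be resized later
volumeBindingMode: WaitForFirstConsumer  # create the volume in the AZ where the Pod lands
```

WaitForFirstConsumer matters because EBS volumes are zonal: delaying provisioning until a Pod is scheduled avoids creating a volume in a zone where the Pod cannot run.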

Key Applications of EBS in EKS

  1. Stateful Databases
    A prevalent application of EBS in EKS is for stateful databases. For example, deploying a MySQL or PostgreSQL database in an EKS cluster requires data persistence to manage transaction logs, user data, and application state. EBS volumes can be attached to database pods via Kubernetes Persistent Volumes (PV) to ensure that data remains intact across pod rescheduling and EC2 instance lifecycle events.
  • Deployment Case Study: In a project for an e-commerce platform, we ensured that their PostgreSQL database, running on EKS, maintained customer order and inventory data during node replacements and pod rescheduling. By leveraging EBS volumes attached to the database pods, we achieved continuous data availability and ensured high availability across the cluster’s lifecycle.
  2. Centralized Logging and Monitoring
    For applications generating substantial logs, such as web servers or monitoring agents, persistent storage is crucial for log retention and analysis. EBS volumes provide a reliable solution to store logs, ensuring they are preserved even if the underlying node is terminated or the container is rescheduled. This setup aids in centralized logging, compliance auditing, and troubleshooting.
  • Implementation Insight: I assisted an organization in establishing a centralized logging architecture on EKS using Elasticsearch and Fluentd. By mounting EBS volumes on each worker node, we ensured that logs were preserved across instance failures and scaling events. This approach provided the organization with reliable access to logs for in-depth analysis and auditing, maintaining data continuity despite instance replacements.
  3. Machine Learning and Data Analytics
    Large-scale data processing, including machine learning (ML) and big data analytics, often necessitates persistent access to extensive datasets. EBS volumes are well-suited for storing large datasets required by ML models and data analytics jobs running on EKS. This ensures data accessibility and durability throughout long-running processing tasks.
  • Solution Example: While working with a client in the healthcare sector, we implemented an ML training pipeline on EKS that required access to large patient datasets. The datasets were stored on EBS volumes and mounted to the processing pods, ensuring consistent data availability and reliability during model training and subsequent pod restarts. This approach significantly enhanced the efficiency and stability of their ML training workflows.
  4. Backup and Disaster Recovery
    EBS supports point-in-time snapshots, which are incremental and stored by AWS in Amazon S3 for long-term retention or disaster recovery. Snapshots can be restored to new volumes or copied across Regions, providing robust data protection and recovery options.
  • Disaster Recovery Strategy: A financial institution required a resilient backup and recovery solution for its transactional data hosted on EKS. By scheduling periodic EBS snapshots and storing them in S3, we ensured that transaction logs and critical data were safeguarded. In the event of a failure, the data could be swiftly restored from these snapshots, reducing downtime and mitigating potential data loss.
  5. High-Performance Applications
    Certain high-performance enterprise applications, such as large-scale databases or high-frequency trading platforms, demand high IOPS and consistent performance. Provisioned IOPS (io1/io2) EBS volumes are designed to meet these performance requirements, ensuring low latency and high throughput.
  • Performance Optimization Example: For a client in the financial sector, we deployed a high-frequency trading platform on EKS, which required extremely fast and reliable storage for transaction data. We configured Provisioned IOPS EBS volumes to meet the platform’s stringent performance needs, ensuring minimal latency and optimal operational efficiency.
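To make the database use case more concrete, here is a rough sketch of how a single-replica PostgreSQL StatefulSet might request an EBS-backed volume through volumeClaimTemplates. This is not the configuration from the projects described above. The Secret name, the StorageClass name (ebs-gp3), the headless Service, the image tag, and the sizes are all assumptions for illustration.

```yaml
# Hypothetical single-replica PostgreSQL StatefulSet with an EBS-backed claim.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres            # headless Service assumed to exist
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16       # illustrative image tag
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials   # hypothetical Secret
                  key: password
            - name: PGDATA
              # Use a subdirectory so the lost+found directory at the root
              # of a fresh ext4 volume does not interfere with initdb.
              value: /var/lib/postgresql/data/pgdata
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: ebs-gp3  # hypothetical StorageClass assumed to exist
        resources:
          requests:
            storage: 20Gi          # illustrative size
```

Because the claim is created from the template and outlives the Pod, a rescheduled postgres-0 Pod reattaches to the same volume, provided it lands on a node in the volume's Availability Zone.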

Best Practices for Using EBS in EKS

  • Define a Storage Class: Use Kubernetes Storage Classes to dynamically provision EBS volumes as needed. This approach simplifies the management of persistent volumes across your cluster.

  • Optimize Volume Sizing: Select appropriate EBS volume sizes and types to balance performance and cost. Ensure that your storage configuration aligns with your application’s requirements.

  • Automate Snapshots: Implement automated EBS snapshot scheduling using AWS Backup or Lambda functions to maintain up-to-date backups and facilitate quick recovery.

  • Monitor Performance: Utilize AWS monitoring tools like CloudWatch to track EBS volume performance and verify that your application meets its throughput and IOPS requirements.
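Besides AWS Backup or Lambda, snapshots can also be requested from inside the cluster through the Kubernetes snapshot API. The sketch below shows a single on-demand snapshot, which a scheduler of your choice would repeat. It assumes the EBS CSI driver and the external snapshot controller with its CRDs are installed, and that the claim is backed by a CSI-provisioned volume. The class, snapshot, and claim names are hypothetical.

```yaml
# Hypothetical snapshot class for the EBS CSI driver.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ebs-snapclass              # assumed name
driver: ebs.csi.aws.com
deletionPolicy: Retain             # keep the EBS snapshot if this object is deleted
---
# On-demand snapshot of an existing claim.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: my-ebs-claim-snapshot      # assumed name
spec:
  volumeSnapshotClassName: ebs-snapclass
  source:
    persistentVolumeClaimName: my-ebs-claim   # hypothetical PVC in the same namespace
```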

Mounting EBS Volumes on EC2 Instances in an EKS Cluster

To integrate EBS with EKS, EBS volumes must be attached to the EC2 instances in the cluster and mounted into the Pods that need them. This makes persistent storage available to workloads running in the cluster. Below are the practical steps and commands to achieve this:

Step 1: Create an EBS Volume

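Use the following command:

aws ec2 create-volume --availability-zone <your-availability-zone> --size <volume-size> --volume-type <volume-type>

Replace <your-availability-zone>, <volume-size>, and <volume-type> with appropriate values. The Availability Zone must match that of the worker node that will use the volume. Note the VolumeId in the command output; it is needed in Step 4.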

Step 2: Attach the EBS Volume to Worker Nodes

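You can attach the EBS volume to a worker node manually or with an automated script. Apart from Multi-Attach on io1/io2, a volume can be attached to only one instance at a time. If the volume is consumed through a PersistentVolume as in Step 4, Kubernetes attaches it to whichever node the Pod is scheduled on, so manual attachment is needed only when the volume is used directly on a node.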

Step 3: Update IAM Roles

Ensure the IAM role associated with your EKS worker nodes has the necessary permissions. You may need to add the following permissions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:AttachVolume",
        "ec2:DetachVolume"
      ],
      "Resource": "*"
    }
  ]
}
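On clusters that use the Amazon EBS CSI driver, these permissions are often granted to the driver's service account through an IAM role (IRSA) rather than to the node role. A minimal sketch of that binding follows, assuming the driver runs in kube-system under its default service account name. The account ID and role name are placeholders, and the role is assumed to carry a suitable policy such as the AWS managed AmazonEBSCSIDriverPolicy.

```yaml
# Hypothetical IRSA binding for the EBS CSI controller.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ebs-csi-controller-sa      # default name used by the driver
  namespace: kube-system
  annotations:
    # Placeholder ARN: substitute your own account ID and role name.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/ExampleEbsCsiDriverRole
```

If the driver is installed as an EKS add-on or through Helm, the service account is normally created for you, and the role is supplied through the installer's own settings instead of a hand-written manifest.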

Step 4: Mount the EBS Volume in a Pod

  1. Create a PersistentVolume (PV):

YAML manifest:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: ebs-pv
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # keep the EBS volume when the claim is deleted
  storageClassName: manual
  awsElasticBlockStore:
    volumeID: vol-0123456789abcdef0       # replace with the ID of the volume from Step 1
    fsType: ext4
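As far as I know, the awsElasticBlockStore field belongs to the legacy in-tree plugin, which recent Kubernetes releases redirect to the EBS CSI driver. If that driver is installed, the same pre-created volume can be described with the csi field instead. The following is a sketch under that assumption. The zone value is a placeholder and must match the volume's Availability Zone.

```yaml
# CSI-style variant of the same static PersistentVolume.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ebs-pv
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual
  csi:
    driver: ebs.csi.aws.com
    volumeHandle: vol-0123456789abcdef0   # same example volume ID as above
    fsType: ext4
  nodeAffinity:                           # restrict scheduling to the volume's zone
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - us-east-1a              # placeholder Availability Zone
```

The nodeAffinity block tells the scheduler to place consuming Pods only on nodes in the zone where the volume can actually be attached.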

  2. Create a PersistentVolumeClaim (PVC):

YAML manifest:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  resources:
    requests:
      storage: 10Gi

  3. Use the PVC in a Pod:

YAML manifest:

apiVersion: v1
kind: Pod
metadata:
  name: ebs-app
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - mountPath: "/mnt/data"
          name: ebs-volume
  volumes:
    - name: ebs-volume
      persistentVolumeClaim:
        claimName: ebs-pvc

Example YAML Script for Mounting EBS Volume

YAML manifest (all three resources combined in one file, ebs-pv-pvc-pod.yaml, separated by ---):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: ebs-pv
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual
  awsElasticBlockStore:
    volumeID: vol-0123456789abcdef0
    fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ebs-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: manual
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: ebs-app
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - mountPath: "/mnt/data"
          name: ebs-volume
  volumes:
    - name: ebs-volume
      persistentVolumeClaim:
        claimName: ebs-pvc

Apply the YAML File

Use the following command:

kubectl apply -f ebs-pv-pvc-pod.yaml

Verify the PVC and PV Status

Ensure the PersistentVolumeClaim (PVC) and PersistentVolume (PV) are in the “Bound” state:

Use the following commands:

kubectl get pvc
kubectl get pv

Verify the Pod Status

Check the status of the Pod to ensure it is running:

Use the following command:

kubectl get pods

Verify File Creation in the Pod

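Exec into the Pod and list the mount point to confirm that the volume is mounted. The example Pod above does not run an initialization command, so a freshly formatted ext4 volume will typically contain only a lost+found directory:

kubectl exec -it ebs-app -- /bin/sh
ls /mnt/data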
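If you want an actual file to look for, one option is a variant of the Pod with an init container that writes to the volume before nginx starts. This is a sketch of my own. The init container, the busybox image, and the file name initialized.txt are assumptions and not part of the manifests above.

```yaml
# Variant of the ebs-app Pod that writes a marker file to the EBS volume.
apiVersion: v1
kind: Pod
metadata:
  name: ebs-app
spec:
  initContainers:
    - name: init-write               # hypothetical init container
      image: busybox
      # Write the current date to a marker file on the mounted volume.
      command: ["sh", "-c", "date > /mnt/data/initialized.txt"]
      volumeMounts:
        - mountPath: "/mnt/data"
          name: ebs-volume
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - mountPath: "/mnt/data"
          name: ebs-volume
  volumes:
    - name: ebs-volume
      persistentVolumeClaim:
        claimName: ebs-pvc
```

With this variant, ls /mnt/data should list initialized.txt. Deleting and recreating the Pod rewrites the file with a new date, and anything else written to /mnt/data remains in place, which is a simple way to see the persistence in action.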

Conclusion

In my experience as an AWS DevOps Engineer, I’ve found that reliable persistent storage is essential for keeping application data consistent and accessible within EKS clusters. By integrating Amazon EBS, organizations can provide their stateful applications with dependable, scalable, and high-performance storage. This approach not only ensures data integrity and availability but also significantly boosts the overall resilience and efficiency of Kubernetes deployments. Utilizing EBS effectively can lead to smoother operations, reduced downtime, and a more robust infrastructure for your applications.