Using S3 as Local Storage on kubernetes with S3 CSI Driver

Using S3 as Local Storage on kubernetes with S3 CSI Driver

Table of Contents

Originally posted on the Israeli Tech Radar on medium.

TLDR; In the world of cloud-native applications, efficient and scalable storage solutions are crucial. Amazon S3 (Simple Storage Service) is a popular object storage service, but what if you could use it as local storage in your Kubernetes clusters? Enter the S3 CSI (Container Storage Interface) Driver, a game-changer for developers and operations teams alike.

What is the S3 CSI Driver?

The S3 CSI Driver is a plugin that allows Kubernetes clusters to use Amazon S3 buckets as persistent storage volumes. It bridges the gap between cloud-native object storage and the local file system needs of containerized applications.

Importantly, the S3 CSI driver supports mounting existing buckets, providing a seamless integration with your current S3 infrastructure.

Benefits of using S3 as local storage

  1. Write once run anywhere: If your app treats it as a local folder and it’s s3, it can run locally in development without s3 / s3 alternative e.g minio

  2. Scalability: S3 offers virtually unlimited storage capacity.

  3. Cost-effectiveness: Pay only for the storage you use.

  4. Durability: S3 provides 99.999999999% (11 9’s) of data durability.

  5. Flexibility: Access your data from anywhere, not just within the cluster.

  6. Separation of concerns: Infrastructure teams can manage buckets while developers focus on application logic.

Setting up the S3 CSI Driver

To use S3 as local storage, you need to set up the S3 CSI Driver in your Kubernetes cluster.

Here’s a high-level overview:

Start by installing the S3 CSI Driver in your cluster using helm The official helm chart is available here I installed it on eks version 1.31.

Make sure to create a role and an irsa / pod identity association to the controllers service account, configure the driver with appropriate IAM permissions and trust policy something like the following:

{
 "Version": "2012-10-17",
 "Statement": [
      {
       "Effect": "Allow",
       "Principal": {
        "Federated": "arn:aws:iam::XXXXXXXX:oidc-provider/oidc.eks.<aws-region>.amazonaws.com/id/<cluster-oidc-dentifier>"
       },
       "Action": "sts:AssumeRoleWithWebIdentity",
       "Condition": {
        "StringEquals": {
         "oidc.eks.eu-west-1.amazonaws.com/id/77165DB5EDA8217E458CA2FE1C7957F8:sub": "system:serviceaccount:kube-system:s3-csi-driver-sa",
         "oidc.eks.eu-west-1.amazonaws.com/id/77165DB5EDA8217E458CA2FE1C7957F8:aud": "sts.amazonaws.com"
        }
       }
      }
     ]
    }

This role will requires iam permissions to s3 and you want them with the least privilege approach something like the following:

{
  "Statement": [
    {
        "Action": "s3:ListBucket",
        "Effect": "Allow",
        "Resource": "arn:aws:s3:::developers-bucket",
        "Sid": "MountpointFullBucketAccess"
    },
    {
        "Action": [
            "s3:PutObject",
            "s3:GetObject",
            "s3:DeleteObject",
            "s3:AbortMultipartUpload"
        ],
        "Effect": "Allow",
        "Resource": "arn:aws:s3:::developers-bucket/*",
        "Sid": "MountpointFullObjectAccess"
    }
  ],
  "Version": "2012-10-17"
}

Finallu in your eks cluster create a StorageClass for S3 like so:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: s3
provisioner: s3.csi.aws.com
parameters:
  type: standard

In our example, we assume these steps have been completed by the infrastructure team using terraform code and the eks-addons module to configure the s3-csi-plugin and the requires permissions to read and write (not delete) from the bucket named developers-bucket.

Configuring and using S3 as local storage

To use an S3 bucket as local storage, you need to create a PersistentVolume (PV) and a PersistentVolumeClaim (PVC).

Here’s an example:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv-developers-bucket
spec:
  capacity:
    storage: 5Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: s3
  csi:
    driver: s3.csi.aws.com
    volumeHandle: developers-bucket
    volumeAttributes:
      bucketName: developers-bucket
      mountOptions: "--dir-mode=0777 --file-mode=0666"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: s3-pvc-developers-bucket
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: s3
  resources:
    requests:
      storage: 5Gi
  volumeName: s3-pv-developers-bucket

To use this storage in a pod, add the following to your pod spec:

apiVersion: apps/v1
kind: Deployment
  metadata:
    name: s3-app
  spec:
    replicas: 1
    selector:
      matchLabels:
        app: s3-app
    template:
      metadata:
        labels:
          app: s3-app
      spec:
        containers:
        - name: app-container
          image: ubuntu:latest
          command: ["/bin/sh"]
          args: ["-c", "while true; do sleep 30; done;"]
          volumeMounts:
          - name: s3-storage
            mountPath: /data
        volumes:
        - name: s3-storage
          persistentVolumeClaim:
            claimName: s3-pvc-developers-bucket

If we kubectl exec into this container we can read and write to /data as if it was local storage.

Best practices and considerations for s3 and object storage in general

  1. Security: Ensure proper IAM permissions and consider using encryption at rest. As described above the firstname-lastname + you cannot delete an object, just create new ones.

  2. Performance: S3 as local storage may not be suitable for high-performance, low-latency applications. For batch / etl processes this method is sufficient.

  3. Costs: Monitor your S3 usage to avoid unexpected costs, always true ;)

  4. Backup: Even though S3 is highly durable, consider implementing backup strategies for critical data, modifying storage class for objects older than n days etc.

  5. Versioning: Enable S3 versioning for additional data protection.

Conclusion

The S3 CSI Driver opens up new possibilities for using Amazon S3 as local storage in Kubernetes environments. It combines the scalability and durability of S3 with the flexibility of local storage, offering a powerful solution for cloud-native applications, by following the configuration steps and best practices outlined in this post, you can effectively leverage S3 as local storage in your Kubernetes deployments.

As mentioned this setup was done on eks, i’m curious to test how I could do this locally with minio ink3d I use for my local experiment needs …

Your Sincerely, HP

comments powered by Disqus

Related Posts

Terraform - the Defacto Tool for Infrastructure Provisioning

Terraform - the Defacto Tool for Infrastructure Provisioning

In this introduction session by Haggai Philip Zagury, DevOps Architect from Tikal, we will learn Terraform basics, starting from the basics to modules and some small tips and tricks you pick-up along the way.

Read More
Mise en place approach in software development

Mise en place approach in software development

This is a brief of a tech talk about the Dora Metrics project. it’s a very short introduction to the Dora Metrics project, it was given as part of the DevOpsDays Tel Aviv 2024 conference, and started as a 10 minuete lihtning talk which wat triggered me to learn more about the metrics and especially how you messure them.

Read More
Kubernetes Control Loops: The Secret Sauce Behind Your Microservice Bliss

Kubernetes Control Loops: The Secret Sauce Behind Your Microservice Bliss

Originally posted on the Israeli Tech Radar on medium.

TLDR; Ever dreamt of a system that manages your microservices like a maestro, constantly keeping them in perfect harmony? Kubernetes makes this a reality with control loops, the magic behind the scenes. Operators take it a step further, letting you customize Kubernetes for even more control over custom workload needs.

Read More