Strengthening Deployments and Runtime Protection
In Part 1 of this series, we explored the foundational aspects of securing a DevOps pipeline, including CI/CD security best practices, image scanning, enforcing signed images, restricting pipeline permissions, and managing secrets effectively. These strategies focused on protecting the software supply chain from build-time threats and credential leaks.
However, securing the CI/CD pipeline is only the beginning. Once applications are deployed, they need strong security controls within Kubernetes and cloud environments to prevent lateral movement, privilege escalation, and runtime threats.
In Part 2, we shift our focus to securing containerized deployments and cloud infrastructure. We will cover Kubernetes security hardening, real-time monitoring, infrastructure as code (IaC) security, and shifting security left to catch misconfigurations early. These strategies will help your team secure workloads in production and respond to potential threats before they lead to a security breach.
Container and Kubernetes Security: Hardening Your Deployments
In one of our company projects, an engineer mistakenly granted a container unnecessary root privileges, which allowed an attacker to escalate their access and compromise the system. This incident reinforced the importance of restricting permissions and controlling communication between containers in Kubernetes.
Container and Kubernetes environments therefore require special attention to security.
Strategies for Containers and Kubernetes Security
There are several basic strategies that you need to understand when you are hardening your container/Kubernetes environments against internal and external threats:
- Container Image Scanning: Use tools like Anchore or Snyk to scan images for vulnerabilities before pushing them to registries.
- Kubernetes RBAC: Apply the principle of least privilege, ensuring services and users have only the permissions they need.
- Network Policies: Implement Kubernetes network policies to restrict communication between pods.
- Runtime Security: Tools like Falco can help detect suspicious behavior in running containers.
Kubernetes Network Policy to Restrict Pod Communication
By default, Kubernetes allows all pods to communicate freely, which can expose sensitive services to unintended access. To enforce stricter security, we can use a NetworkPolicy to limit which pods can talk to each other.
The following Kubernetes NetworkPolicy ensures that frontend pods accept traffic only from backend pods, blocking all other inbound connections:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-internal-traffic
spec:
  podSelector:
    matchLabels:
      app: frontend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: backend
```
This Kubernetes NetworkPolicy restricts communication between pods. It targets “frontend” pods (app: frontend) and allows incoming traffic only from “backend” pods (app: backend).
The policy uses the Ingress type, meaning it controls incoming traffic and ensures that “frontend” pods can only receive traffic from “backend” pods, isolating them from everything else. By limiting pod communication, this policy reduces the attack surface and prevents unauthorized access between services. It’s an essential step in hardening Kubernetes deployments and mitigating lateral movement in the event of a breach.
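A common companion to a targeted allow rule like this is a namespace-wide default-deny policy, so that any traffic you haven’t explicitly permitted is blocked. Here is a minimal sketch (the production namespace name is illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production    # illustrative namespace
spec:
  podSelector: {}           # selects every pod in the namespace
  policyTypes:
    - Ingress               # no ingress rules are listed, so all inbound traffic is denied
```

With this in place, allow rules such as the frontend/backend policy above become the only paths for inbound pod traffic in the namespace.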
Use Pod Security Policies (PSP) or Pod Security Admission (PSA)
In one of our deployments, a developer accidentally launched a container with root privileges, which exposed the system to potential privilege escalation attacks. Running containers as root is a major security risk, as it allows attackers to gain control over the host machine.
To mitigate this, Kubernetes provides Pod Security Policies (PSP) and Pod Security Admission (PSA) to enforce strict security controls on pod execution.
Example: Enforcing a Restricted Pod Security Policy
The following PodSecurityPolicy prevents containers from running as root and enforces additional security restrictions:
```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false
  allowPrivilegeEscalation: false
  runAsUser:
    rule: MustRunAsNonRoot
  readOnlyRootFilesystem: true
  volumes:
    - 'configMap'
    - 'emptyDir'
    - 'secret'
```
What This Policy Does
Apply this policy to your cluster to enforce security standards. This Kubernetes PodSecurityPolicy (PSP) configuration enhances the security of your cluster by preventing containers from running with elevated privileges. Specifically, it enforces that containers cannot run as root (runAsUser rule: MustRunAsNonRoot) and blocks privilege escalation (allowPrivilegeEscalation: false).
Additionally, it ensures that the container’s root filesystem is read-only (readOnlyRootFilesystem: true), which reduces the risk of unauthorized modifications. The volumes section restricts the types of volumes that can be mounted, allowing only configMap, emptyDir, and secret, to limit access to potentially sensitive data. This policy helps mitigate security risks associated with running containers with excessive permissions.
Why This Matters
By applying this kind of policy, organizations can enforce container security best practices and reduce the risk of privilege escalation attacks. Note that PSP was deprecated in Kubernetes 1.21 and removed in 1.25; its successor, Pod Security Admission (PSA), provides similar enforcement mechanisms for restricting insecure configurations.
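Because PSA works through namespace labels rather than cluster-wide policy objects, the equivalent protection is applied by labelling a namespace. The minimal sketch below (the namespace name is illustrative) enforces the built-in restricted Pod Security Standard, which includes the non-root and no-privilege-escalation checks shown above:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production    # illustrative namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted   # reject pods that violate the restricted standard
    pod-security.kubernetes.io/warn: restricted      # also surface warnings at admission time
```

Pods in this namespace that try to run as root or request privilege escalation are rejected at admission.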
Carefully Employ Role-Based Access Control (RBAC)
In a past project, a misconfigured RBAC policy accidentally gave all developers full admin privileges on a Kubernetes cluster. This led to accidental deletions, unexpected configuration changes, and security risks. While RBAC provides a structured way to define access controls, it does not inherently prevent misconfigurations. Instead, combining RBAC with regular policy reviews, automated scanning tools, and least-privilege enforcement helps reduce the risk of mistakes like this.
To limit unintended privilege escalation, RBAC should be carefully implemented alongside policy validation tools that detect misconfigurations before they become security risks. Additionally, organizations should regularly audit roles, enforce strict access policies, and implement monitoring solutions to catch excessive permissions early.
How to Lock Down Permissions:
- Follow the principle of least privilege – Grant only the necessary permissions to users and services.
- Use Roles and RoleBindings – Instead of ClusterRoles, use namespace-scoped Roles when possible to limit permissions to specific namespaces.
- Regularly audit RBAC policies – Review and refine access controls to prevent privilege creep.
Example: Restricted Role Binding
Imagine you have a development team where certain users only need to view resources in the production namespace but should not be able to modify anything. To enforce this, you can define a RoleBinding that assigns a read-only role to a specific developer.
Here’s how you can do it:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: DeveloperReadAccess
  namespace: production
subjects:
  - kind: User
    name: developer-read-access
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: read-only
  apiGroup: rbac.authorization.k8s.io
```
This RoleBinding example defines a restricted binding for a specific user, developer-read-access, in the production namespace. It grants the user access through a read-only role, meaning the user can only view resources and cannot modify them.
The roleRef section specifies that the user is assigned the read-only Role from the rbac.authorization.k8s.io API group. By using this role binding, you ensure that users are only granted the minimum permissions required for their tasks, adhering to the principle of least privilege. This setup enhances security by limiting access to critical resources.
This RBAC policy prevents unauthorized changes while still allowing users to access the information they need.
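Keep in mind that the binding only takes effect if a Role named read-only actually exists in the production namespace. A minimal sketch of what that Role could look like (the resources and verbs listed here are illustrative, not a prescribed set):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-only
  namespace: production
rules:
  - apiGroups: ["", "apps"]                                        # core and apps API groups
    resources: ["pods", "services", "configmaps", "deployments"]
    verbs: ["get", "list", "watch"]                                # view-only operations
```

Because the Role contains no write verbs, any request by the bound user to create, update, or delete these resources is denied.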
Deploy Security Monitoring and Incident Response Tools
Real-time monitoring and alerting are essential for detecting suspicious activity in a Kubernetes cluster. Falco, an open-source runtime security tool, helps detect unexpected behavior by monitoring system calls and alerting on potential security violations.
Example: Deploying Falco for Runtime Security Monitoring
Suppose you want to monitor your cluster for unauthorized container activity, such as unexpected privilege escalations or file access attempts. You can quickly deploy Falco using Helm, which simplifies installation and management.
Here’s how you can install Falco in your cluster:
```bash
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm install falco falcosecurity/falco
```
Once installed, Falco starts monitoring system calls and detecting policy violations. Below is an example of Falco detecting a privilege escalation attempt:
```
11:34:21.123456123: Warning Privilege Escalation Detected (user=root, process=sudo)
11:34:22.654321654: Notice Unexpected File Modification (/etc/passwd edited by unknown process)
```
You can customize Falco’s detection rules to match your security requirements, filtering out false positives and tailoring alerts to your environment. Integrating Falco with logging solutions like Elasticsearch, Grafana, or SIEM tools further enhances visibility and incident response capabilities.
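As an illustration of a custom rule, the sketch below flags any write under /etc from inside a container. It assumes Falco’s default ruleset is loaded, since it reuses the open_write and container macros defined there:

```yaml
# custom-rules.yaml (illustrative) – loaded alongside Falco's default rules
- rule: Write Below Etc In Container
  desc: Detect a file under /etc being opened for writing inside a container
  condition: open_write and container and fd.name startswith /etc
  output: "File below /etc opened for writing (user=%user.name command=%proc.cmdline file=%fd.name)"
  priority: WARNING
  tags: [filesystem, container]
```

When deploying with Helm, custom rule files like this are typically supplied as additional rules so they extend the defaults without modifying them.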
Set Up Kubernetes Audit Logging
Tracking security events in your Kubernetes cluster is crucial for detecting unauthorized changes and ensuring compliance. Audit logging helps you monitor actions such as the creation, deletion, and modification of key resources. By defining an audit policy, you can control which events are logged and at what level of detail.
Example: Configuring an Audit Policy to Track Key Events
Suppose you want to log changes to critical Kubernetes resources—such as Pods, Deployments, and Services—to detect potential security threats or misconfigurations. The following audit policy helps you achieve this:
```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata
    verbs: ["create", "delete", "update", "patch"]
    resources:
      - group: ""       # core API group: Pods and Services
        resources: ["pods", "services"]
      - group: "apps"   # Deployments live in the apps API group
        resources: ["deployments"]
```
Breakdown of the Audit Policy Configuration
Specify the Audit API Version and Policy Kind
- The apiVersion: audit.k8s.io/v1 ensures compatibility with the Kubernetes audit logging system.
- The kind: Policy defines an audit logging policy.
Logging Level
- The level: Metadata setting ensures that Kubernetes logs metadata about API requests but not full request/response bodies.
- This helps capture essential security details like who performed the action, when it happened, and what resource was affected, without exposing sensitive data.
Monitor Key Kubernetes Events
- The verbs: ["create", "delete", "update", "patch"] list ensures that any changes to the selected resources are logged.
The resources list targets Pods, Deployments, and Services, which are critical for cluster operations and security. Below are sample log entries for each action (create, delete, update, and patch):
Log Entry for Creating a Pod
```json
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "verb": "create",
  "user": "admin",
  "objectRef": {
    "resource": "pods",
    "namespace": "production",
    "name": "nginx-pod"
  },
  "timestamp": "2025-03-06T12:45:30Z"
}
```
Log Entry for Deleting a Deployment
```json
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "verb": "delete",
  "user": "devops-engineer",
  "objectRef": {
    "resource": "deployments",
    "namespace": "staging",
    "name": "backend-service"
  },
  "timestamp": "2025-03-06T13:15:22Z"
}
```
Log Entry for Updating a Service
```json
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "verb": "update",
  "user": "automation-bot",
  "objectRef": {
    "resource": "services",
    "namespace": "production",
    "name": "payment-gateway"
  },
  "timestamp": "2025-03-06T14:05:10Z"
}
```
Log Entry for Patching a Pod
```json
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "verb": "patch",
  "user": "sre-team",
  "objectRef": {
    "resource": "pods",
    "namespace": "dev",
    "name": "web-app"
  },
  "timestamp": "2025-03-06T15:30:45Z"
}
```
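Keep in mind that the audit policy only takes effect once it is passed to the API server. On a kubeadm-style cluster this is typically done by editing the kube-apiserver static pod manifest; the excerpt below is a sketch with illustrative file paths:

```yaml
# Excerpt from /etc/kubernetes/manifests/kube-apiserver.yaml (paths are illustrative)
spec:
  containers:
    - name: kube-apiserver
      command:
        - kube-apiserver
        - --audit-policy-file=/etc/kubernetes/audit-policy.yaml          # the policy defined above
        - --audit-log-path=/var/log/kubernetes/kube-apiserver-audit.log  # where audit events are written
        - --audit-log-maxage=30                                          # retain logs for 30 days
```

The policy file and log directory also need to be mounted into the kube-apiserver pod for these flags to work.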
Monitoring and Incident Response: Always Be Ready
Even with strong security measures in place, incidents can still occur. I once dealt with a situation where an application was exploited, but due to inadequate monitoring, we didn’t detect the breach for hours. Real-time threat detection and automated response are crucial for minimizing damage and improving security resilience.
The following are a few strategies and tools that you can employ when monitoring your DevOps pipelines.
- Security Information and Event Management (SIEM): Tools like Splunk or Datadog can aggregate logs and detect anomalies in real-time.
- Automated Incident Response: Use SOAR (Security Orchestration, Automation, and Response) tools to automate threat detection and response.
- Regular Security Drills: Just as important as tools is practice. Conduct red team exercises and chaos engineering to test the resilience of your security measures.
Example: Setting Up AWS GuardDuty for Threat Detection
To quickly detect and respond to suspicious activity in your AWS environment, you can use AWS GuardDuty, a threat detection service that continuously monitors for malicious activity. The following command enables GuardDuty in your AWS account:
```bash
aws guardduty create-detector --enable
```
How This Works
- aws guardduty create-detector – This command initializes GuardDuty, enabling it to analyze logs and detect potential threats.
- --enable – Ensures that GuardDuty starts monitoring right away without requiring additional configuration.
Once enabled, GuardDuty continuously scans AWS CloudTrail logs, VPC Flow Logs, and DNS logs for anomalies, such as:
- Unusual API calls from unexpected regions.
- Unauthorized access attempts.
- Data exfiltration activities.
Example GuardDuty Findings
Here’s an example output when GuardDuty detects suspicious activity:
```json
{
  "schemaVersion": "2.0",
  "accountId": "123456789012",
  "region": "us-east-1",
  "id": "abcd1234-ef56-7890-gh12-ijklmnopqrst",
  "type": "UnauthorizedAccess:IAMUser/AnomalousBehavior",
  "severity": 6.0,
  "createdAt": "2025-03-06T14:10:00Z",
  "updatedAt": "2025-03-06T14:15:00Z",
  "resource": {
    "resourceType": "AWS::IAM::User",
    "accessKeyDetails": {
      "userName": "compromised-user",
      "accessKeyId": "AKIAEXAMPLE"
    }
  },
  "service": {
    "serviceName": "guardduty",
    "eventFirstSeen": "2025-03-06T14:05:00Z",
    "eventLastSeen": "2025-03-06T14:10:00Z",
    "action": {
      "actionType": "AWS_API_CALL",
      "apiCallDetails": {
        "api": "ListBuckets",
        "serviceName": "s3.amazonaws.com",
        "remoteIpDetails": {
          "ipAddressV4": "203.0.113.42",
          "country": "Unknown"
        }
      }
    }
  }
}
```
In this example, GuardDuty has detected unauthorized API calls made by a compromised IAM (AWS Identity and Access Management) user attempting to list S3 buckets. The finding includes details such as:
- User and access key ID associated with the suspicious activity.
- API call and AWS service being targeted.
- IP address and location of the request origin.
When you integrate GuardDuty with AWS Security Hub or SIEM tools, you can automate responses to detected threats—such as isolating compromised instances or triggering alerts to security teams.
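One common pattern for this kind of automation is routing GuardDuty findings through Amazon EventBridge. The CloudFormation excerpt below is a sketch of that idea; it assumes an SNS topic resource named SecurityAlertsTopic is defined elsewhere in the same template:

```yaml
# Illustrative CloudFormation excerpt: forward all GuardDuty findings to a security alerts topic
Resources:
  GuardDutyFindingRule:
    Type: AWS::Events::Rule
    Properties:
      Description: Route GuardDuty findings to the security alerts SNS topic
      EventPattern:
        source:
          - aws.guardduty
        detail-type:
          - GuardDuty Finding
      Targets:
        - Arn: !Ref SecurityAlertsTopic   # assumed SNS topic; Ref returns its ARN
          Id: security-alerts-topic
```

From there, the topic can notify an on-call channel or trigger a Lambda function that isolates the affected resource.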
This proactive monitoring approach ensures that security incidents are detected and addressed swiftly, minimizing potential damage.
Infrastructure as Code (IaC) Security
Infrastructure as Code (IaC) is powerful, yet it comes with inherent risks: just as it streamlines deployments, it can just as easily propagate security misconfigurations at scale. I once encountered a Terraform script that, due to a single oversight, inadvertently made an S3 bucket publicly accessible, exposing sensitive data to the internet. This incident reinforced the importance of proactive security scanning in IaC workflows.
The following are a few tools and techniques to help you secure your IaC environment.
- IaC Scanning: Tools like tfsec, Checkov, and KICS catch misconfigurations.
- Least Privilege Policies: Ensure IAM roles and permissions follow the principle of least privilege.
- Automated Compliance: Use Open Policy Agent (OPA) to enforce security policies.
Example: Using tfsec for Terraform Security Scanning
When working with Terraform, it’s critical to catch security misconfigurations before applying changes. tfsec is a static analysis tool that scans Terraform code for potential security risks.
```bash
tfsec ./terraform
```
This command checks for insecure configurations in Terraform files and provides actionable insights to fix them.
What This Does
- tfsec ./terraform – Runs a security scan on all Terraform configuration files in the ./terraform directory.
- Finds Misconfigurations Early – Identifies insecure IAM roles, public S3 buckets, weak security groups, and hardcoded secrets.
- Provides Fix Suggestions – Offers remediation guidance to help teams correct issues before deploying infrastructure.
For example, if a Terraform script accidentally sets an S3 bucket to public:
```hcl
resource "aws_s3_bucket_public_access_block" "example" {
  bucket            = aws_s3_bucket.example.id
  block_public_acls = false # Security risk: Public ACLs allowed
}
```
tfsec will flag this misconfiguration and suggest enforcing stricter access controls.
Why This Matters
Integrating tfsec into CI/CD pipelines ensures that security best practices are applied before deployment, reducing the risk of exposing sensitive infrastructure due to simple configuration errors.
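As one way to wire this in, the sketch below shows a GitHub Actions job that runs tfsec on every pull request. It assumes the aquasecurity/tfsec-action published by the tfsec maintainers (including its working_directory input) and a ./terraform directory at the repository root:

```yaml
# .github/workflows/tfsec.yml (illustrative)
name: tfsec
on: [pull_request]
jobs:
  tfsec:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tfsec
        uses: aquasecurity/tfsec-action@v1.0.3   # assumed action version
        with:
          working_directory: ./terraform
```

If the scan finds issues, the job fails, which can be combined with branch protection to block the merge before insecure Terraform reaches the apply stage.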
Shifting Security Left: Catching Issues Early
Early in my DevOps career, I made the mistake of assuming security checks at deployment were enough. The reality? Fixing security issues at deployment is costly and time-consuming. The key is to shift security left—integrating security into every phase of the DevOps pipeline, from code commit to production.
Some techniques and tools to employ:
- Secure Coding Practices: Implement linting and static code analysis tools like SonarQube or Checkmarx to catch vulnerabilities in the code before it even enters the CI/CD pipeline.
- Pre-Commit Hooks: Use tools like pre-commit to enforce security policies on developers’ local machines.
Example: Enforcing Security with Pre-commit Hooks
To prevent hardcoded secrets and misconfigurations from being committed, set up a pre-commit hook in your repository.
Step 1: Add a .pre-commit-config.yaml file
```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.4.0
    hooks:
      - id: detect-secrets   # scans staged changes for hardcoded secrets
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.4.0
    hooks:
      - id: check-yaml       # validates YAML file syntax
```
By adding a .pre-commit-config.yaml file, you can specify repositories and hooks to run before committing code. In this case, the configuration specifies two hooks: detect-secrets to identify hardcoded secrets in the code and check-yaml to validate YAML file syntax. These checks run automatically whenever you try to commit, helping prevent sensitive data from being committed and ensuring code quality.
Step 2: Install and Enable Pre-commit
```bash
pip install pre-commit        # Install the pre-commit tool
pre-commit install            # Enable the hooks in your repository
pre-commit run --all-files    # Run security checks on all files
```
This ensures no secrets or misconfigurations are committed:
- pip install pre-commit – Installs the pre-commit tool.
- pre-commit install – Enables hooks in your repository.
- pre-commit run --all-files – Runs security checks on all files.
I once inherited a project where the previous team didn’t use pre-commit hooks. Within the first week of auditing, I found multiple instances of AWS keys embedded in code. After implementing detect-secrets, we caught an exposed database password in a commit that could have led to a major breach.
Wrapping Up
Securing the CI/CD pipeline is a crucial first step, but it’s not enough on its own. Once applications are deployed, they’re exposed to new risks like misconfigurations, excessive permissions, and runtime threats. In Part 2, we covered practical ways to lock down Kubernetes, including network policies, Pod Security Admission (PSA), and Role-Based Access Control (RBAC). We also looked at security monitoring tools like Falco and Kubernetes audit logging, which help detect suspicious activity before it turns into a full-blown incident.
Beyond Kubernetes, cloud security and Infrastructure as Code (IaC) play a major role in preventing misconfigurations before they reach production. AWS GuardDuty helps detect anomalies and unauthorized access, while tfsec scans Terraform configurations for security flaws. And by shifting security left with pre-commit hooks, teams can catch vulnerabilities early—before they become a problem.
Security isn’t a one-time fix—it’s a continuous effort. By combining what we covered in Part 1 (CI/CD security) with the Kubernetes and cloud security strategies in Part 2, teams can build a stronger, more resilient DevOps workflow.