Strengthening Deployments and Runtime Protection
In Part 1 of this series, we explored the foundational aspects of securing a DevOps pipeline, including CI/CD security best practices, image scanning, enforcing signed images, restricting pipeline permissions, and managing secrets effectively. These strategies focused on protecting the software supply chain from build-time threats and credential leaks.
However, securing the CI/CD pipeline is only the beginning. Once applications are deployed, they need strong security controls within Kubernetes and cloud environments to prevent lateral movement, privilege escalation, and runtime threats.
In Part 2, we shift our focus to securing containerized deployments and cloud infrastructure. We will cover Kubernetes security hardening, real-time monitoring, infrastructure as code (IaC) security, and shifting security left to catch misconfigurations early. These strategies will help your team secure workloads in production and respond to potential threats before they lead to a security breach.
Container and Kubernetes Security: Hardening Your Deployments
In one of our company projects, an engineer mistakenly granted a container unnecessary root privileges, which allowed an attacker to escalate their access and compromise the system. This incident reinforced the importance of restricting permissions and controlling communication between containers in Kubernetes.
Container and Kubernetes environments therefore require special attention to security.
Strategies for Containers and Kubernetes Security
There are several basic strategies that you need to understand when you are hardening your container/Kubernetes environments against internal and external threats:
- Container Image Scanning: Use tools like Anchore or Snyk to scan images for vulnerabilities before pushing them to registries.
- Kubernetes RBAC: Apply the principle of least privilege, ensuring services and users have only the permissions they need.
- Network Policies: Implement Kubernetes network policies to restrict communication between pods.
- Runtime Security: Tools like Falco can help detect suspicious behavior in running containers.
Kubernetes Network Policy to Restrict Pod Communication
By default, Kubernetes allows all pods to communicate freely, which can expose sensitive services to unintended access. To enforce stricter security, we can use a NetworkPolicy to limit which pods can talk to each other.
The following Kubernetes NetworkPolicy ensures that frontend pods accept traffic only from backend pods, blocking all other inbound connections:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-internal-traffic
spec:
  podSelector:
    matchLabels:
      app: frontend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: backend
```
This Kubernetes NetworkPolicy restricts communication between pods. It targets “frontend” pods (app: frontend) and allows incoming traffic only from “backend” pods (app: backend).
The policy uses the Ingress type, meaning it controls incoming traffic and ensures that “frontend” pods can only receive traffic from “backend” pods, isolating them from everything else. By limiting pod communication, this policy reduces the attack surface and prevents unauthorized access between services. It’s an essential step in hardening Kubernetes deployments and mitigating lateral movement in the event of a breach.
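A common companion to a targeted allow rule like this is a namespace-wide default-deny policy, so that any traffic you haven’t explicitly permitted is blocked. Here is a minimal sketch (the production namespace name is illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production    # illustrative namespace
spec:
  podSelector: {}           # selects every pod in the namespace
  policyTypes:
    - Ingress               # no ingress rules are listed, so all inbound traffic is denied
```

With this in place, allow rules such as the frontend/backend policy above become the only paths for inbound pod traffic in the namespace.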
Use Pod Security Policies (PSP) or Pod Security Admission (PSA)
In one of our deployments, a developer accidentally launched a container with root privileges, which exposed the system to potential privilege escalation attacks. Running containers as root is a major security risk, as it allows attackers to gain control over the host machine.
To mitigate this, Kubernetes provides Pod Security Policies (PSP) and Pod Security Admission (PSA) to enforce strict security controls on pod execution.
Example: Enforcing a Restricted Pod Security Policy
The following PodSecurityPolicy prevents containers from running as root and enforces additional security restrictions:
```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false
  allowPrivilegeEscalation: false
  runAsUser:
    rule: MustRunAsNonRoot
  readOnlyRootFilesystem: true
  volumes:
    - 'configMap'
    - 'emptyDir'
    - 'secret'
```
What This Policy Does
Apply this policy to your cluster to enforce security standards. This Kubernetes PodSecurityPolicy (PSP) configuration enhances the security of your cluster by preventing containers from running with elevated privileges. Specifically, it enforces that containers cannot run as root (runAsUser rule: MustRunAsNonRoot) and blocks privilege escalation (allowPrivilegeEscalation: false).
Additionally, it ensures that the container’s root filesystem is read-only (readOnlyRootFilesystem: true), which reduces the risk of unauthorized modifications. The volumes section restricts the types of volumes that can be mounted, allowing only configMap, emptyDir, and secret, to limit access to potentially sensitive data. This policy helps mitigate security risks associated with running containers with excessive permissions.
Why This Matters
By applying this kind of policy, organizations can enforce container security best practices and reduce the risk of privilege escalation attacks. Note that PSP was deprecated in Kubernetes 1.21 and removed in 1.25; its successor, Pod Security Admission (PSA), provides similar enforcement mechanisms for restricting insecure configurations.
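Because PSA works through namespace labels rather than cluster-wide policy objects, the equivalent protection is applied by labelling a namespace. The minimal sketch below (the namespace name is illustrative) enforces the built-in restricted Pod Security Standard, which includes the non-root and no-privilege-escalation checks shown above:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production    # illustrative namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted   # reject pods that violate the restricted standard
    pod-security.kubernetes.io/warn: restricted      # also surface warnings at admission time
```

Pods in this namespace that try to run as root or request privilege escalation are rejected at admission.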
Carefully Employ Role-Based Access Control (RBAC)
In a past project, a misconfigured RBAC policy accidentally gave all developers full admin privileges on a Kubernetes cluster. This led to accidental deletions, unexpected configuration changes, and security risks. While RBAC provides a structured way to define access controls, it does not inherently prevent misconfigurations. Instead, combining RBAC with regular policy reviews, automated scanning tools, and least-privilege enforcement helps reduce the risk of mistakes like this.
To limit unintended privilege escalation, RBAC should be carefully implemented alongside policy validation tools that detect misconfigurations before they become security risks. Additionally, organizations should regularly audit roles, enforce strict access policies, and implement monitoring solutions to catch excessive permissions early.
How to Lock Down Permissions:
- Follow the principle of least privilege – Grant only the necessary permissions to users and services.
- Use Roles and RoleBindings – Instead of ClusterRoles, use namespace-scoped Roles when possible to limit permissions to specific namespaces.
- Regularly audit RBAC policies – Review and refine access controls to prevent privilege creep.
Example: Restricted Role Binding
Imagine you have a development team where certain users only need to view resources in the production namespace but should not be able to modify anything. To enforce this, you can define a RoleBinding that assigns a read-only role to a specific developer.
Here’s how you can do it:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: DeveloperReadAccess
  namespace: production
subjects:
  - kind: User
    name: developer-read-access
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: read-only
  apiGroup: rbac.authorization.k8s.io
```
This RoleBinding example defines a restricted binding for a specific user, developer-read-access, in the production namespace. It grants the user access through a read-only role, meaning the user can only view resources and cannot modify them.
The roleRef section specifies that the user is assigned the read-only Role from the rbac.authorization.k8s.io API group. By using this role binding, you ensure that users are only granted the minimum permissions required for their tasks, adhering to the principle of least privilege. This setup enhances security by limiting access to critical resources.
This RBAC policy prevents unauthorized changes while still allowing users to access the information they need.
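Keep in mind that the binding only takes effect if a Role named read-only actually exists in the production namespace. A minimal sketch of what that Role could look like (the resources and verbs listed here are illustrative, not a prescribed set):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-only
  namespace: production
rules:
  - apiGroups: ["", "apps"]                                        # core and apps API groups
    resources: ["pods", "services", "configmaps", "deployments"]
    verbs: ["get", "list", "watch"]                                # view-only operations
```

Because the Role contains no write verbs, any request by the bound user to create, update, or delete these resources is denied.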
Deploy Security Monitoring and Incident Response Tools
Real-time monitoring and alerting are essential for detecting suspicious activity in a Kubernetes cluster. Falco, an open-source runtime security tool, helps detect unexpected behavior by monitoring system calls and alerting on potential security violations.
Example: Deploying Falco for Runtime Security Monitoring
Suppose you want to monitor your cluster for unauthorized container activity, such as unexpected privilege escalations or file access attempts. You can quickly deploy Falco using Helm, which simplifies installation and management.
Here’s how you can install Falco in your cluster:
```bash
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm install falco falcosecurity/falco
```
Once installed, Falco starts monitoring system calls and detecting policy violations. Below is an example of Falco detecting a privilege escalation attempt:
```
11:34:21.123456123: Warning Privilege Escalation Detected (user=root, process=sudo)
11:34:22.654321654: Notice Unexpected File Modification (/etc/passwd edited by unknown process)
```
You can customize Falco’s detection rules to match your security requirements, filtering out false positives and tailoring alerts to your environment. Integrating Falco with logging solutions like Elasticsearch, Grafana, or SIEM tools further enhances visibility and incident response capabilities.
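As an illustration of a custom rule, the sketch below flags any write under /etc from inside a container. It assumes Falco’s default ruleset is loaded, since it reuses the open_write and container macros defined there:

```yaml
# custom-rules.yaml (illustrative) – loaded alongside Falco's default rules
- rule: Write Below Etc In Container
  desc: Detect a file under /etc being opened for writing inside a container
  condition: open_write and container and fd.name startswith /etc
  output: "File below /etc opened for writing (user=%user.name command=%proc.cmdline file=%fd.name)"
  priority: WARNING
  tags: [filesystem, container]
```

When deploying with Helm, custom rule files like this are typically supplied as additional rules so they extend the defaults without modifying them.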
Set Up Kubernetes Audit Logging
Tracking security events in your Kubernetes cluster is crucial for detecting unauthorized changes and ensuring compliance. Audit logging helps you monitor actions such as the creation, deletion, and modification of key resources. By defining an audit policy, you can control which events are logged and at what level of detail.
Example: Configuring an Audit Policy to Track Key Events
Suppose you want to log changes to critical Kubernetes resources—such as Pods, Deployments, and Services—to detect potential security threats or misconfigurations. The following audit policy helps you achieve this:
```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata
    verbs: ["create", "delete", "update", "patch"]
    resources:
      - group: ""       # core API group: Pods and Services
        resources: ["pods", "services"]
      - group: "apps"   # Deployments live in the apps API group
        resources: ["deployments"]
```
Breakdown of the Audit Policy Configuration
Specify the Audit API Version and Policy Kind
- The apiVersion: audit.k8s.io/v1 ensures compatibility with the Kubernetes audit logging system.
- The kind: Policy defines an audit logging policy.
Logging Level
- The level: Metadata setting ensures that Kubernetes logs metadata about API requests but not full request/response bodies.
- This helps capture essential security details like who performed the action, when it happened, and what resource was affected, without exposing sensitive data.
Monitor Key Kubernetes Events
- The verbs: ["create", "delete", "update", "patch"] list ensures that any changes to the selected resources are logged.
The resources list targets Pods, Deployments, and Services, which are critical for cluster operations and security. Below are sample log entries for each action (create, delete, update, and patch):
Log Entry for Creating a Pod
```json
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "verb": "create",
  "user": "admin",
  "objectRef": {
    "resource": "pods",
    "namespace": "production",
    "name": "nginx-pod"
  },
  "timestamp": "2025-03-06T12:45:30Z"
}
```
Log Entry for Deleting a Deployment
```json
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "verb": "delete",
  "user": "devops-engineer",
  "objectRef": {
    "resource": "deployments",
    "namespace": "staging",
    "name": "backend-service"
  },
  "timestamp": "2025-03-06T13:15:22Z"
}
```
Log Entry for Updating a Service
```json
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "verb": "update",
  "user": "automation-bot",
  "objectRef": {
    "resource": "services",
    "namespace": "production",
    "name": "payment-gateway"
  },
  "timestamp": "2025-03-06T14:05:10Z"
}
```
Log Entry for Patching a Pod
```json
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "verb": "patch",
  "user": "sre-team",
  "objectRef": {
    "resource": "pods",
    "namespace": "dev",
    "name": "web-app"
  },
  "timestamp": "2025-03-06T15:30:45Z"
}
```
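Keep in mind that the audit policy only takes effect once it is passed to the API server. On a kubeadm-style cluster this is typically done by editing the kube-apiserver static pod manifest; the excerpt below is a sketch with illustrative file paths:

```yaml
# Excerpt from /etc/kubernetes/manifests/kube-apiserver.yaml (paths are illustrative)
spec:
  containers:
    - name: kube-apiserver
      command:
        - kube-apiserver
        - --audit-policy-file=/etc/kubernetes/audit-policy.yaml          # the policy defined above
        - --audit-log-path=/var/log/kubernetes/kube-apiserver-audit.log  # where audit events are written
        - --audit-log-maxage=30                                          # retain logs for 30 days
```

The policy file and log directory also need to be mounted into the kube-apiserver pod for these flags to work.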
Monitoring and Incident Response: Always Be Ready
Even with strong security measures in place, incidents can still occur. I once dealt with a situation where an application was exploited, but due to inadequate monitoring, we didn’t detect the breach for hours. Real-time threat detection and automated response are crucial for minimizing damage and improving security resilience.
The following are a few strategies and tools that you can employ when monitoring your DevOps pipelines.
- Security Information and Event Management (SIEM): Tools like Splunk or Datadog can aggregate logs and detect anomalies in real-time.
- Automated Incident Response: Use SOAR (Security Orchestration, Automation, and Response) tools to automate threat detection and response.
- Regular Security Drills: Just as important as tools is practice. Conduct red team exercises and chaos engineering to test the resilience of your security measures.
Example: Setting Up AWS GuardDuty for Threat Detection
To quickly detect and respond to suspicious activity in your AWS environment, you can use AWS GuardDuty, a threat detection service that continuously monitors for malicious activity. The following command enables GuardDuty in your AWS account:
```bash
aws guardduty create-detector --enable
```
How This Works
- aws guardduty create-detector – This command initializes GuardDuty, enabling it to analyze logs and detect potential threats.
- --enable – Ensures that GuardDuty starts monitoring right away without requiring additional configuration.
Once enabled, GuardDuty continuously scans AWS CloudTrail logs, VPC Flow Logs, and DNS logs for anomalies, such as:
- Unusual API calls from unexpected regions.
- Unauthorized access attempts.
- Data exfiltration activities.
Example GuardDuty Findings
Here’s an example output when GuardDuty detects suspicious activity:
```json
{
  "schemaVersion": "2.0",
  "accountId": "123456789012",
  "region": "us-east-1",
  "id": "abcd1234-ef56-7890-gh12-ijklmnopqrst",
  "type": "UnauthorizedAccess:IAMUser/AnomalousBehavior",
  "severity": 6.0,
  "createdAt": "2025-03-06T14:10:00Z",
  "updatedAt": "2025-03-06T14:15:00Z",
  "resource": {
    "resourceType": "AWS::IAM::User",
    "accessKeyDetails": {
      "userName": "compromised-user",
      "accessKeyId": "AKIAEXAMPLE"
    }
  },
  "service": {
    "serviceName": "guardduty",
    "eventFirstSeen": "2025-03-06T14:05:00Z",
    "eventLastSeen": "2025-03-06T14:10:00Z",
    "action": {
      "actionType": "AWS_API_CALL",
      "apiCallDetails": {
        "api": "ListBuckets",
        "serviceName": "s3.amazonaws.com",
        "remoteIpDetails": {
          "ipAddressV4": "203.0.113.42",
          "country": "Unknown"
        }
      }
    }
  }
}
```
In this example, GuardDuty has detected unauthorized API calls made by a compromised IAM (AWS Identity and Access Management) user attempting to list S3 buckets. The finding includes details such as:
- User and access key ID associated with the suspicious activity.
- API call and AWS service being targeted.
- IP address and location of the request origin.
When you integrate GuardDuty with AWS Security Hub or SIEM tools, you can automate responses to detected threats—such as isolating compromised instances or triggering alerts to security teams.
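One common pattern for this kind of automation is routing GuardDuty findings through Amazon EventBridge. The CloudFormation excerpt below is a sketch of that idea; it assumes an SNS topic resource named SecurityAlertsTopic is defined elsewhere in the same template:

```yaml
# Illustrative CloudFormation excerpt: forward all GuardDuty findings to a security alerts topic
Resources:
  GuardDutyFindingRule:
    Type: AWS::Events::Rule
    Properties:
      Description: Route GuardDuty findings to the security alerts SNS topic
      EventPattern:
        source:
          - aws.guardduty
        detail-type:
          - GuardDuty Finding
      Targets:
        - Arn: !Ref SecurityAlertsTopic   # assumed SNS topic; Ref returns its ARN
          Id: security-alerts-topic
```

From there, the topic can notify an on-call channel or trigger a Lambda function that isolates the affected resource.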
This proactive monitoring approach ensures that security incidents are detected and addressed swiftly, minimizing potential damage.
Infrastructure as Code (IaC) Security
Infrastructure as Code (IaC) is powerful, yet it comes with inherent risks: just as it streamlines deployments, it can just as easily propagate security misconfigurations at scale. I once encountered a Terraform script that, due to a single oversight, inadvertently made an S3 bucket publicly accessible, exposing sensitive data to the internet. This incident reinforced the importance of proactive security scanning in IaC workflows.
The following are a few tools and techniques to help you secure your IaC environment.
- IaC Scanning: Tools like tfsec, Checkov, and KICS catch misconfigurations.
- Least Privilege Policies: Ensure IAM roles and permissions follow the principle of least privilege.
- Automated Compliance: Use Open Policy Agent (OPA) to enforce security policies.
Example: Using tfsec for Terraform Security Scanning
When working with Terraform, it’s critical to catch security misconfigurations before applying changes. tfsec is a static analysis tool that scans Terraform code for potential security risks.
```bash
tfsec ./terraform
```
This command checks for insecure configurations in Terraform files and provides actionable insights to fix them.
What This Does
- tfsec ./terraform – Runs a security scan on all Terraform configuration files in the ./terraform directory.
- Finds Misconfigurations Early – Identifies insecure IAM roles, public S3 buckets, weak security groups, and hardcoded secrets.
- Provides Fix Suggestions – Offers remediation guidance to help teams correct issues before deploying infrastructure.
For example, if a Terraform script accidentally sets an S3 bucket to public:
```hcl
resource "aws_s3_bucket_public_access_block" "example" {
  bucket            = aws_s3_bucket.example.id
  block_public_acls = false # Security risk: Public ACLs allowed
}
```
tfsec will flag this misconfiguration and suggest enforcing stricter access controls.
Why This Matters
Integrating tfsec into CI/CD pipelines ensures that security best practices are applied before deployment, reducing the risk of exposing sensitive infrastructure due to simple configuration errors.
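As one way to wire this in, the sketch below shows a GitHub Actions job that runs tfsec on every pull request. It assumes the aquasecurity/tfsec-action published by the tfsec maintainers (including its working_directory input) and a ./terraform directory at the repository root:

```yaml
# .github/workflows/tfsec.yml (illustrative)
name: tfsec
on: [pull_request]
jobs:
  tfsec:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tfsec
        uses: aquasecurity/tfsec-action@v1.0.3   # assumed action version
        with:
          working_directory: ./terraform
```

If the scan finds issues, the job fails, which can be combined with branch protection to block the merge before insecure Terraform reaches the apply stage.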
Shifting Security Left: Catching Issues Early
Early in my DevOps career, I made the mistake of assuming security checks at deployment were enough. The reality? Fixing security issues at deployment is costly and time-consuming. The key is to shift security left—integrating security into every phase of the DevOps pipeline, from code commit to production.
Some techniques and tools to employ:
- Secure Coding Practices: Implement linting and static code analysis tools like SonarQube or Checkmarx to catch vulnerabilities in the code before it even enters the CI/CD pipeline.
- Pre-Commit Hooks: Use tools like pre-commit to enforce security policies on developers’ local machines.
Example: Enforcing Security with Pre-commit Hooks
To prevent hardcoded secrets and misconfigurations from being committed, set up a pre-commit hook in your repository.
Step 1: Add a .pre-commit-config.yaml file
```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.4.0
    hooks:
      - id: detect-secrets   # scans staged changes for hardcoded secrets
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v3.4.0
    hooks:
      - id: check-yaml       # validates YAML file syntax
```
By adding a .pre-commit-config.yaml file, you can specify repositories and hooks to run before committing code. In this case, the configuration specifies two hooks: detect-secrets to identify hardcoded secrets in the code and check-yaml to validate YAML file syntax. These checks run automatically whenever you try to commit, helping prevent sensitive data from being committed and ensuring code quality.
Step 2: Install and Enable Pre-commit
```bash
pip install pre-commit        # Install the pre-commit tool
pre-commit install            # Enable the hooks in your repository
pre-commit run --all-files    # Run security checks on all files
```
This ensures no secrets or misconfigurations are committed:
- pip install pre-commit – Installs the pre-commit tool.
- pre-commit install – Enables hooks in your repository.
- pre-commit run --all-files – Runs security checks on all files.
I once inherited a project where the previous team didn’t use pre-commit hooks. Within the first week of auditing, I found multiple instances of AWS keys embedded in code. After implementing detect-secrets, we caught an exposed database password in a commit that could have led to a major breach.
Wrapping Up
Securing the CI/CD pipeline is a crucial first step, but it’s not enough on its own. Once applications are deployed, they’re exposed to new risks like misconfigurations, excessive permissions, and runtime threats. In Part 2, we covered practical ways to lock down Kubernetes, including network policies, Pod Security Admission (PSA), and Role-Based Access Control (RBAC). We also looked at security monitoring tools like Falco and Kubernetes audit logging, which help detect suspicious activity before it turns into a full-blown incident.
Beyond Kubernetes, cloud security and Infrastructure as Code (IaC) play a major role in preventing misconfigurations before they reach production. AWS GuardDuty helps detect anomalies and unauthorized access, while tfsec scans Terraform configurations for security flaws. And by shifting security left with pre-commit hooks, teams can catch vulnerabilities early—before they become a problem.
Security isn’t a one-time fix—it’s a continuous effort. By combining what we covered in Part 1 (CI/CD security) with the Kubernetes and cloud security strategies in Part 2, teams can build a stronger, more resilient DevOps workflow.