We've guided dozens of organizations through Kubernetes migrations. The ones that succeed share a common trait: they invest heavily in planning before writing a single Dockerfile. The ones that struggle almost always skip the assessment phase, jump straight to containerization, and discover critical blockers mid-migration.
This guide covers the seven phases we walk through with every client. The timeline varies — a small team with 5 microservices might complete this in 8 weeks, while an enterprise with 50+ services and compliance requirements might need 6 months. The phases, however, are the same.
Application & Infrastructure Assessment
Before you can plan a migration, you need a complete picture of what you're migrating. This sounds obvious, but most organizations don't have an accurate, up-to-date inventory of their running services.
Application Inventory
Document every service that will eventually run on Kubernetes. For each one, capture:
- Runtime: Language, framework, runtime version (e.g., Java 17 + Spring Boot 3.2, Node.js 20 + Express)
- State: Is it stateless or stateful? Does it write to local disk? Does it rely on sticky sessions?
- Configuration: How is it configured today? Environment variables, config files, feature flags?
- Dependencies: Databases, message queues, caches, external APIs, shared filesystems
- Traffic patterns: Average and peak RPS, latency requirements, WebSocket or long-polling connections
- Health checks: Does it expose health endpoints? How does the current platform determine if it's healthy?
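A lightweight way to keep this inventory current is to store one record per service in version control, next to the code it describes. A minimal sketch — the schema and every value below are illustrative, not a standard format:

```yaml
# inventory/payments-api.yaml -- one inventory record per service (hypothetical example)
service: payments-api
runtime:
  language: java
  framework: spring-boot-3.2
  jdk: "17"
state: stateless              # no local disk writes, no sticky sessions
configuration:
  - environment-variables
  - feature-flags
dependencies:
  - postgres: payments-db
  - kafka: payment-events
  - external-api: stripe
traffic:
  avg_rps: 120
  peak_rps: 900
  p99_latency_ms: 250
health:
  endpoint: /actuator/health
  checked_by: load-balancer-http-check
```

Reviewing these files in pull requests keeps the inventory honest — when a dependency changes, the record changes with it.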
Infrastructure Inventory
Map your current infrastructure — not just servers, but the supporting services:
- Load balancers: L4 vs L7, TLS termination, session affinity rules
- DNS: Internal and external DNS entries, TTLs, any DNS-based service discovery
- Storage: NFS mounts, block storage, object storage, shared volumes between services
- Networking: VPNs, firewall rules, private subnets, service-to-service communication patterns
- Secrets: Where are credentials stored today? Vault, AWS SSM, config files, environment variables?
Pro tip: Use tools like `netstat`, `ss`, or eBPF-based tools like Pixie to discover actual network connections between services. Documented architecture diagrams are almost always incomplete or outdated.
Dependency Mapping & Risk Assessment
With your inventory complete, map the dependencies between services. This determines your migration order — you can't migrate a service before its dependencies are accessible from Kubernetes.
Dependency Graph
Build a directed graph of service dependencies. Identify:
- Leaf services: Services with no downstream dependencies on other internal services. These are your first migration candidates.
- Shared databases: Multiple services reading/writing the same database is the most common migration blocker. These need to be accessible from both the old and new environment during the transition.
- Circular dependencies: Services that depend on each other must be migrated together or require a temporary bridge.
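The graph itself doesn't need special tooling — a simple adjacency list that the team reviews together is often enough. A hypothetical example showing all three patterns above:

```yaml
# dependency-graph.yaml (illustrative service names)
notifications: []                 # leaf service -- first migration candidate
payments: [payments-db]
orders: [orders-db, inventory, payments]
inventory: [orders, shared-db]    # orders <-> inventory: circular, migrate together
reporting: [shared-db]            # shared-db used by two services -- a blocker to plan for
```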
Risk Classification
Classify each service by migration risk:
- Low risk: Stateless HTTP APIs with no local storage, standard health checks, already containerized or easily containerizable
- Medium risk: Services with persistent storage needs, specific networking requirements, or complex configuration
- High risk: Stateful services (databases, message queues), services with host-level dependencies, legacy applications that can't be easily containerized
Start your migration with low-risk leaf services. Early wins build confidence and surface infrastructure issues (networking, DNS, storage) before you tackle the hard stuff.
Containerization Strategy
Not every application needs a rewrite to run on Kubernetes. Choose the right containerization approach for each service:
Lift and Shift
Package the existing application into a container with minimal changes. This works well for applications that already follow the twelve-factor methodology — they read configuration from environment variables, log to stdout, and don't rely on local state.
```dockerfile
# Example: Multi-stage build for a Go service
FROM golang:1.22-alpine AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /app ./cmd/server

FROM gcr.io/distroless/static-debian12
COPY --from=build /app /app
USER nonroot:nonroot
ENTRYPOINT ["/app"]
```
Refactor for Cloud-Native
Some applications need changes to run well on Kubernetes:
- Session state: Move from in-memory sessions to Redis or a database-backed session store
- File uploads: Switch from local filesystem to object storage (S3, MinIO, GCS)
- Configuration: Migrate from config files to environment variables or ConfigMaps
- Logging: Switch from file-based logging to structured JSON on stdout
- Health checks: Add `/healthz` and `/readyz` endpoints for liveness and readiness probes
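Once those endpoints exist, wiring them into the pod spec is straightforward. A sketch of the probe configuration — the port, paths, and timings are assumptions to tune per service:

```yaml
# Container spec fragment: liveness and readiness probes
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3      # restart the container after ~30s of consecutive failures
readinessProbe:
  httpGet:
    path: /readyz
    port: 8080
  periodSeconds: 5
  failureThreshold: 2      # remove the pod from Service endpoints, but don't restart it
```

Keep the two endpoints distinct: `/readyz` should fail when a dependency is temporarily unavailable, while `/healthz` should fail only when a restart would actually help.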
Dockerfile Best Practices
- Use multi-stage builds to minimize image size
- Pin base image versions with digests, not just tags
- Run as a non-root user (`USER 1000:1000`)
- Use `.dockerignore` to exclude build artifacts, `.git`, and secrets
- Set a `HEALTHCHECK` instruction for local `docker run` workflows (note that Kubernetes ignores it and relies on probes instead)
- Use distroless or Alpine-based images to reduce attack surface
CI/CD Pipeline Setup
Your CI/CD pipeline is the backbone of a Kubernetes deployment workflow. Set it up before migrating workloads — not after.
Pipeline Architecture
A production-grade Kubernetes CI/CD pipeline typically includes:
- Build: Compile code, run unit tests, build container image
- Scan: Run Trivy or Grype against the image for CVEs, run Checkov or Kubesec against manifests
- Push: Tag and push to a private registry (Harbor, ECR, GCR, ACR)
- Deploy to staging: Apply manifests via GitOps (ArgoCD or Flux) or direct `kubectl apply`
- Integration tests: Run smoke tests and integration tests against staging
- Deploy to production: Promote the same image (not a rebuild) to production
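As a sketch, those stages might look like this in GitLab CI. The registry variables are GitLab built-ins; the two `scripts/` helpers are hypothetical placeholders for whatever updates your manifests repo and runs your smoke tests:

```yaml
# .gitlab-ci.yml (illustrative sketch of the six stages)
stages: [build, scan, push, deploy-staging, integration-test, promote]

build:
  stage: build
  script:
    - go test ./...
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA" .

scan:
  stage: scan
  script:
    - trivy image --exit-code 1 --severity HIGH,CRITICAL "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"

push:
  stage: push
  script:
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"

deploy-staging:
  stage: deploy-staging
  script:
    # GitOps handoff: bump the image tag in the manifests repo; ArgoCD syncs the change
    - ./scripts/bump-image-tag.sh staging "$CI_COMMIT_SHA"   # hypothetical helper

integration-test:
  stage: integration-test
  script:
    - ./scripts/smoke-tests.sh staging                       # hypothetical helper

promote:
  stage: promote
  when: manual
  script:
    # Promote the same image tag -- no rebuild between staging and production
    - ./scripts/bump-image-tag.sh production "$CI_COMMIT_SHA"
```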
GitOps vs. Push-Based Deployment
We strongly recommend GitOps for Kubernetes deployments. With ArgoCD or Flux, your Git repository becomes the single source of truth for cluster state. Benefits include:
- Full audit trail of every deployment in Git history
- Easy rollback via `git revert`
- Drift detection — the cluster is continuously reconciled to match Git
- No need to grant CI/CD pipelines direct `kubectl` access to production
```yaml
# ArgoCD Application manifest
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://gitlab.example.com/team/my-service-manifests.git
    targetRevision: main
    path: overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```
Testing & Validation
Testing a Kubernetes migration goes beyond functional tests. You need to validate that the application behaves identically in the new environment under realistic conditions.
Testing Layers
- Functional tests: Does the application return correct responses? Run your existing test suite against the Kubernetes-hosted version.
- Performance tests: Use tools like k6, Locust, or Gatling to generate realistic load. Compare latency percentiles (p50, p95, p99) against the baseline from the old environment.
- Chaos tests: Use Chaos Mesh or Litmus to simulate pod failures, node failures, and network partitions. Verify that your readiness probes, PodDisruptionBudgets, and replica counts handle failures gracefully.
- Security tests: Run kube-bench for CIS compliance, kubeaudit for security misconfigurations, and network policy tests to verify isolation.
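As an example of the chaos layer, a Chaos Mesh experiment that kills a random pod of a service lets you verify that replicas and readiness probes absorb the failure. A sketch — namespace and labels are placeholders:

```yaml
# Chaos Mesh: kill one randomly chosen pod matching the selector
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: my-service-pod-kill
  namespace: staging
spec:
  action: pod-kill
  mode: one                  # target a single pod at random
  selector:
    namespaces: [staging]
    labelSelectors:
      app: my-service
```

Run experiments like this in staging first, and watch your dashboards while they run — the point is to confirm that users would never notice.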
Shadow Traffic
For critical services, consider running shadow traffic before cutting over. Tools like Istio's traffic mirroring can duplicate production requests to the Kubernetes-hosted version without affecting users. Compare response codes, latencies, and payloads between the old and new environments.
```yaml
# Istio VirtualService with traffic mirroring
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
    - my-service.production.svc.cluster.local
  http:
    - route:
        - destination:
            host: my-service-legacy
      mirror:
        host: my-service-k8s
      mirrorPercentage:
        value: 100.0
```
Rollback Planning
Every migration plan needs a rollback plan. If something goes wrong during cutover, you need to be able to revert to the previous environment quickly and safely.
Rollback Triggers
Define explicit criteria that trigger a rollback. Don't leave this to judgment calls during an incident:
- Error rate exceeds 1% for more than 5 minutes
- p99 latency exceeds 2x the baseline for more than 5 minutes
- Any data integrity issue is detected
- Health check failures on more than 20% of pods
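These triggers work best when encoded as alerts rather than eyeballed from dashboards during the incident. A sketch as a PrometheusRule — the metric names assume standard HTTP request instrumentation, and the latency threshold assumes a 300 ms p99 baseline; substitute your own:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: migration-rollback-triggers
  namespace: production
spec:
  groups:
    - name: rollback-triggers
      rules:
        - alert: MigrationErrorRateHigh
          # Trigger 1: error rate > 1% for 5 minutes
          expr: |
            sum(rate(http_requests_total{job="my-service",code=~"5.."}[5m]))
              / sum(rate(http_requests_total{job="my-service"}[5m])) > 0.01
          for: 5m
          labels:
            action: rollback
        - alert: MigrationLatencyHigh
          # Trigger 2: p99 > 2x baseline (0.6s assumes a 300ms pre-migration p99)
          expr: |
            histogram_quantile(0.99,
              sum(rate(http_request_duration_seconds_bucket{job="my-service"}[5m])) by (le)
            ) > 0.6
          for: 5m
          labels:
            action: rollback
```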
Rollback Mechanics
The rollback mechanism depends on your cutover strategy:
- DNS-based cutover: Revert the DNS record to point back to the old environment. Ensure TTLs are set low (30–60 seconds) before the migration window.
- Load balancer cutover: Shift traffic back to the old backend pool. This is faster than DNS and doesn't depend on client TTL compliance.
- Blue-green: Keep the old environment running and warm for the entire migration window. Switch back by updating the service routing.
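For the blue-green case inside the cluster, the switch can be as small as a label selector change on the Service. A sketch with illustrative labels:

```yaml
# Service routing 100% of traffic to the "green" (new) deployment.
# Rolling back is a one-line change: version: green -> version: blue.
apiVersion: v1
kind: Service
metadata:
  name: my-service
  namespace: production
spec:
  selector:
    app: my-service
    version: green     # flip back to "blue" to roll back
  ports:
    - port: 80
      targetPort: 8080
```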
Critical: Keep the old environment running for at least 48 hours after cutover. Don't decommission anything until you've confirmed the new environment is stable through at least one full business cycle (including peak hours).
Go-Live & Cutover
The actual cutover should be the least eventful part of the migration. If you've done phases 1–6 properly, go-live is just flipping a switch.
Pre-Cutover Checklist
- All pods healthy: `kubectl get pods -n production` shows all pods Running with correct replica counts
- Monitoring active: Dashboards and alerts are configured and tested for the new environment
- Rollback tested: The rollback procedure has been executed at least once in staging
- Team briefed: Everyone involved knows their role, the timeline, and the rollback triggers
- Communication plan: Stakeholders know the migration window and who to contact
- DNS TTLs lowered: Reduced to 30–60 seconds at least 24 hours before cutover
Cutover Strategies
Canary rollout: Route a small percentage of traffic (1–5%) to the Kubernetes environment. Monitor for 30–60 minutes. Gradually increase to 10%, 25%, 50%, 100%. This is the safest approach for high-traffic services.
Blue-green switch: Run both environments simultaneously, then switch all traffic at once. Simpler than canary but higher risk — if something is wrong, 100% of users are affected immediately.
Rolling migration by service: Migrate services one at a time over days or weeks. Each service gets its own canary rollout. This is the approach we recommend for most organizations — it limits blast radius and gives the team time to learn.
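If you are running Istio, the canary split uses the same VirtualService mechanism as traffic mirroring — weighted routes instead of a mirror. A sketch with illustrative hostnames and an initial 5% split:

```yaml
# Istio VirtualService: 5% canary to the Kubernetes-hosted version
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
    - my-service.example.com
  http:
    - route:
        - destination:
            host: my-service-legacy
          weight: 95
        - destination:
            host: my-service-k8s
          weight: 5    # raise gradually: 10 -> 25 -> 50 -> 100
```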
Post-Cutover
- Monitor error rates, latency, and resource utilization for 48+ hours
- Run a post-migration retrospective within one week
- Document lessons learned and update runbooks
- Decommission the old environment only after the stability window passes
- Update DNS TTLs back to normal values
Common Pitfalls
After guiding many migrations, these are the mistakes we see most often:
- Skipping the assessment: Teams jump to writing Dockerfiles without understanding their dependencies. They discover mid-migration that a service depends on a shared NFS mount or a specific kernel module.
- Migrating everything at once: Big-bang migrations have a much higher failure rate. Migrate incrementally, starting with low-risk services.
- Ignoring the database: The database is usually the hardest part of a migration. Plan for it explicitly — hybrid connectivity, latency implications, and data synchronization.
- No rollback plan: "We'll figure it out if something goes wrong" is not a rollback plan. Define triggers, procedures, and responsibilities before go-live.
- Underestimating team training: Kubernetes has a steep learning curve. Budget time for your team to learn `kubectl`, understand pod lifecycles, and practice incident response in the new environment.
Planning a Kubernetes Migration?
Our advisory team helps organizations plan and execute Kubernetes migrations with confidence. We provide the roadmap — your team builds the skills to own it long-term.
Get Migration Advisory