30 min read
Advanced
Deployment

MCP Server Kubernetes Deployment

Deploy and orchestrate MCP servers on Kubernetes with auto-scaling, health checks, and production-grade configurations

MCPgee Team

MCP Expert

Prerequisites

  • A containerized MCP server (see Docker deployment tutorial)
  • Kubernetes cluster (local or cloud)
  • kubectl configured and connected
  • Basic understanding of Kubernetes concepts (pods, services, deployments)

Introduction

Kubernetes provides the orchestration layer needed to run MCP servers at scale in production. With Kubernetes, you get auto-scaling, self-healing, service discovery, rolling updates, and secrets management out of the box. This tutorial covers deploying MCP servers to Kubernetes, from basic deployments to production-grade configurations.

Before starting, ensure your MCP server is containerized. If not, follow our Docker deployment tutorial first.

Basic Deployment

Step 1: Create the Deployment Manifest

yaml
# mcp-server-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server
  labels:
    app: mcp-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mcp-server
  template:
    metadata:
      labels:
        app: mcp-server
    spec:
      containers:
        - name: mcp-server
          image: ghcr.io/your-org/mcp-server:latest
          ports:
            - containerPort: 3000
          env:
            - name: MCP_TRANSPORT
              value: "streamable-http"
            - name: MCP_HOST
              value: "0.0.0.0"
            - name: MCP_PORT
              value: "3000"
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
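
The probes above expect the container to answer on `/health`. As a minimal sketch of such an endpoint using only Node's built-in `http` module: the `ready` flag, `markReady`, `healthStatus`, and `startHealthServer` are hypothetical names for illustration, not part of the MCP SDK, and your server may already expose its own health route.

```typescript
import { createServer, IncomingMessage, ServerResponse } from "node:http";

// Hypothetical readiness flag: set it once the MCP server has finished starting.
let ready = false;

export function markReady(): void {
  ready = true;
}

// Pure helper so the probe logic can be checked without opening a socket.
export function healthStatus(
  path: string,
  isReady: boolean,
): { code: number; body: string } {
  if (path !== "/health") {
    return { code: 404, body: JSON.stringify({ error: "not found" }) };
  }
  return isReady
    ? { code: 200, body: JSON.stringify({ status: "ok" }) }
    : { code: 503, body: JSON.stringify({ status: "starting" }) };
}

export function handleHealth(req: IncomingMessage, res: ServerResponse): void {
  const { code, body } = healthStatus(req.url ?? "", ready);
  res.writeHead(code, { "Content-Type": "application/json" });
  res.end(body);
}

// Nothing listens on import; call this from your entry point with the probe port.
export function startHealthServer(port: number) {
  return createServer(handleHealth).listen(port);
}
```

Because the manifest points both probes at the same path, a server that stays in the "starting" state for long will also fail its liveness probe. If startup is slow, consider a separate readiness path or a longer `initialDelaySeconds`.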

Step 2: Create the Service

yaml
# mcp-server-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: mcp-server
spec:
  selector:
    app: mcp-server
  ports:
    - port: 80
      targetPort: 3000
  type: ClusterIP

Step 3: Deploy

bash
kubectl apply -f mcp-server-deployment.yaml
kubectl apply -f mcp-server-service.yaml

# Verify deployment
kubectl get pods -l app=mcp-server
kubectl get svc mcp-server

Exposing MCP Servers

Ingress with TLS

For external access, use an Ingress controller with TLS:

yaml
# mcp-server-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mcp-server-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-buffering: "off"
spec:
  tls:
    - hosts:
        - mcp.example.com
      secretName: mcp-tls
  rules:
    - host: mcp.example.com
      http:
        paths:
          - path: /mcp
            pathType: Prefix
            backend:
              service:
                name: mcp-server
                port:
                  number: 80

Note the proxy timeout and buffering annotations. They matter because MCP's Streamable HTTP transport can hold connections open for long-lived streaming responses. Without them, NGINX closes a connection after its default 60-second read timeout and may buffer streamed output instead of forwarding it immediately.

LoadBalancer Service

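For cloud providers, use a LoadBalancer service. This example forwards port 443 straight to the container's port 3000, so traffic is encrypted only if the cloud load balancer or the server itself terminates TLS: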

yaml
apiVersion: v1
kind: Service
metadata:
  name: mcp-server-lb
spec:
  selector:
    app: mcp-server
  ports:
    - port: 443
      targetPort: 3000
  type: LoadBalancer

Secrets Management

Kubernetes Secrets

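Store sensitive configuration in a Secret rather than in the Deployment manifest. Secret values are only base64-encoded unless encryption at rest is enabled on the cluster, so keep this file out of version control: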

yaml
# mcp-secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: mcp-secrets
type: Opaque
stringData:
  database-url: "postgresql://user:password@db-host:5432/mcpdb"
  api-key: "your-api-key-here"
  jwt-secret: "your-jwt-secret"

Reference secrets in your deployment:

yaml
spec:
  containers:
    - name: mcp-server
      env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: mcp-secrets
              key: database-url
        - name: API_KEY
          valueFrom:
            secretKeyRef:
              name: mcp-secrets
              key: api-key
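
On the application side, the injected values arrive as ordinary environment variables. The sketch below shows one way to read them and fail fast when a key is missing, which makes a misconfigured Secret visible in `kubectl logs` instead of surfacing later as an obscure runtime error. `requireEnv` and `loadSecrets` are hypothetical helper names, not part of any MCP library.

```typescript
// A plain map type so the helpers can be tested without touching process.env.
type Env = Record<string, string | undefined>;

// Returns the variable's value or throws with a message naming the missing key.
export function requireEnv(name: string, env: Env = process.env): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Collects the two variables wired up via secretKeyRef in the snippet above.
export function loadSecrets(env: Env = process.env) {
  return {
    databaseUrl: requireEnv("DATABASE_URL", env),
    apiKey: requireEnv("API_KEY", env),
  };
}
```

Calling `loadSecrets()` once at startup keeps the failure close to its cause: the pod exits immediately and the log line names the missing variable.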

External Secrets Operator

For production, use External Secrets to sync from AWS Secrets Manager, HashiCorp Vault, or other providers:

yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: mcp-external-secrets
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: mcp-secrets
  data:
    - secretKey: database-url
      remoteRef:
        key: mcp/production/database-url

Auto-Scaling

Horizontal Pod Autoscaler

Scale MCP server pods based on CPU and memory utilization (custom metrics are also supported):

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mcp-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcp-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80

Vertical Pod Autoscaler

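Automatically adjust resource requests and limits. The Vertical Pod Autoscaler is a separate add-on that must be installed in the cluster, and it should not be combined with an HPA that scales on the same CPU or memory metrics: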

yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: mcp-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcp-server
  updatePolicy:
    updateMode: Auto

ConfigMaps for Runtime Configuration

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mcp-config
data:
  config.json: |
    {
      "maxConcurrentRequests": 50,
      "requestTimeoutMs": 30000,
      "logLevel": "info",
      "enableMetrics": true
    }

Mount as a file in your pod:

yaml
spec:
  containers:
    - name: mcp-server
      volumeMounts:
        - name: config
          mountPath: /app/config
          readOnly: true
  volumes:
    - name: config
      configMap:
        name: mcp-config
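
With that mount in place, the ConfigMap key appears inside the container as the file `/app/config/config.json`. A minimal sketch of loading it follows, assuming the four fields shown in the ConfigMap above; the `McpConfig` interface, the defaults, and the validation rules are illustrative assumptions rather than a schema the MCP SDK defines.

```typescript
import { readFileSync } from "node:fs";

export interface McpConfig {
  maxConcurrentRequests: number;
  requestTimeoutMs: number;
  logLevel: string;
  enableMetrics: boolean;
}

// Fallbacks used when a key is absent from the mounted file (assumed values).
const defaults: McpConfig = {
  maxConcurrentRequests: 50,
  requestTimeoutMs: 30000,
  logLevel: "info",
  enableMetrics: true,
};

// Parses the JSON text, merges it over the defaults, and checks the numeric fields.
export function parseConfig(raw: string): McpConfig {
  const parsed = JSON.parse(raw) as Partial<McpConfig>;
  const config: McpConfig = { ...defaults, ...parsed };
  if (!Number.isInteger(config.maxConcurrentRequests) || config.maxConcurrentRequests <= 0) {
    throw new Error("maxConcurrentRequests must be a positive integer");
  }
  if (!Number.isFinite(config.requestTimeoutMs) || config.requestTimeoutMs <= 0) {
    throw new Error("requestTimeoutMs must be a positive number");
  }
  return config;
}

// Default path = mountPath from the pod spec plus the ConfigMap key name.
export function loadConfig(path = "/app/config/config.json"): McpConfig {
  return parseConfig(readFileSync(path, "utf8"));
}
```

Keeping `parseConfig` separate from the file read makes the validation easy to unit test without a cluster.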

Rolling Updates and Rollbacks

Update Strategy

Configure rolling updates for zero-downtime deployments:

yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
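
The two settings bound how many pods exist during a rollout. The sketch below works through the arithmetic, assuming the rounding rules from the Kubernetes Deployment documentation (a percentage `maxSurge` rounds up, a percentage `maxUnavailable` rounds down). `rolloutBounds` is a hypothetical helper for illustration, and rejecting the case where both values resolve to zero is a simplification.

```typescript
// Either an absolute pod count or a percentage string such as "25%".
type IntOrPercent = number | `${number}%`;

// Converts a setting to a pod count using the given rounding direction.
function resolve(value: IntOrPercent, replicas: number, roundUp: boolean): number {
  if (typeof value === "number") return value;
  const pods = (parseFloat(value) / 100) * replicas;
  return roundUp ? Math.ceil(pods) : Math.floor(pods);
}

// Upper bound on total pods and lower bound on available pods during a rollout.
export function rolloutBounds(
  replicas: number,
  maxSurge: IntOrPercent,
  maxUnavailable: IntOrPercent,
): { maxTotalPods: number; minAvailablePods: number } {
  const surge = resolve(maxSurge, replicas, true);
  const unavailable = resolve(maxUnavailable, replicas, false);
  if (surge === 0 && unavailable === 0) {
    throw new Error("maxSurge and maxUnavailable cannot both be 0");
  }
  return {
    maxTotalPods: replicas + surge,
    minAvailablePods: replicas - unavailable,
  };
}
```

With the 2-replica Deployment from this tutorial, `rolloutBounds(2, 1, 0)` gives at most 3 pods and never fewer than 2 available, which is why the cluster needs spare capacity for one extra pod during an update.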

Perform an Update

bash
# Update the image
kubectl set image deployment/mcp-server mcp-server=ghcr.io/your-org/mcp-server:v2.0.0

# Monitor rollout
kubectl rollout status deployment/mcp-server

# Rollback if needed
kubectl rollout undo deployment/mcp-server

Monitoring and Observability

Prometheus Metrics

Add a metrics endpoint to your MCP server:

typescript
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { register, Counter, Histogram } from 'prom-client';
import express from 'express';

const toolCallCounter = new Counter({
  name: 'mcp_tool_calls_total',
  help: 'Total number of MCP tool calls',
  labelNames: ['tool_name', 'status'],
});

const toolDuration = new Histogram({
  name: 'mcp_tool_duration_seconds',
  help: 'Duration of MCP tool calls',
  labelNames: ['tool_name'],
});

// Metrics endpoint
const metricsApp = express();
metricsApp.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});
metricsApp.listen(9090);
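
The snippet above defines the counter and histogram but does not show them being updated. One possible way to record them is a small wrapper around each tool handler, sketched below. `instrumentTool` is a hypothetical helper, and `CounterLike` and `HistogramLike` are structural stand-ins declared here so the sketch is self-contained; prom-client's `Counter.inc(labels)` and `Histogram.observe(labels, value)` should satisfy them.

```typescript
// Minimal shapes matching the prom-client methods used below (assumed stand-ins).
interface CounterLike {
  inc(labels: Record<string, string>): void;
}
interface HistogramLike {
  observe(labels: Record<string, string>, value: number): void;
}

// Wraps a tool handler so every call updates the call counter and duration histogram.
export function instrumentTool<A, R>(
  toolName: string,
  handler: (args: A) => Promise<R>,
  counter: CounterLike,
  duration: HistogramLike,
): (args: A) => Promise<R> {
  return async (args: A) => {
    const start = performance.now();
    let status = "success";
    try {
      return await handler(args);
    } catch (err) {
      status = "error";
      throw err; // rethrow so callers still see the failure
    } finally {
      const seconds = (performance.now() - start) / 1000;
      counter.inc({ tool_name: toolName, status });
      duration.observe({ tool_name: toolName }, seconds);
    }
  };
}
```

In the server, you would pass `toolCallCounter` and `toolDuration` as the last two arguments and register the wrapped function as the tool's handler.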

ServiceMonitor for Prometheus Operator

yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mcp-server-monitor
spec:
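  # Matches labels on the Service (not the pods), so the Service
  # must carry the label app: mcp-server.
  selector:
    matchLabels:
      app: mcp-server
  endpoints:
    # Refers to a Service port named "metrics" (e.g. targetPort 9090).
    - port: metrics
      interval: 15s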

Network Policies

Restrict network access to your MCP servers:

yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mcp-server-policy
spec:
  podSelector:
    matchLabels:
      app: mcp-server
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              # Label set automatically on every namespace (Kubernetes 1.21+)
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - port: 3000
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
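      ports:
        - port: 5432
    # Allow DNS lookups; an Egress policy without this rule blocks name resolution
    - ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP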

Multi-Server Deployment

Deploy multiple MCP servers with shared infrastructure:

yaml
# Namespace for all MCP services
apiVersion: v1
kind: Namespace
metadata:
  name: mcp-servers
---
# File server deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: file-server
  namespace: mcp-servers
spec:
  replicas: 2
  selector:
    matchLabels:
      app: file-server
  template:
    metadata:
      labels:
        app: file-server
    spec:
      containers:
        - name: file-server
          image: ghcr.io/your-org/mcp-file-server:latest
          ports:
            - containerPort: 3000
---
# Database server deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: db-server
  namespace: mcp-servers
spec:
  replicas: 3
  selector:
    matchLabels:
      app: db-server
  template:
    metadata:
      labels:
        app: db-server
    spec:
      containers:
        - name: db-server
          image: ghcr.io/your-org/mcp-db-server:latest
          ports:
            - containerPort: 3000

Security Hardening

For comprehensive MCP security guidance, see our security fundamentals and authentication tutorials.

Key Kubernetes-specific security practices:

  1. Pod Security Standards: Use restricted security context
  2. Network Policies: Limit pod-to-pod communication
  3. RBAC: Minimal service account permissions
  4. Image scanning: Scan container images for vulnerabilities

yaml
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1001
    fsGroup: 1001
  containers:
    - name: mcp-server
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop:
            - ALL

Conclusion

Kubernetes provides everything you need to run MCP servers at production scale. From auto-scaling and self-healing to secrets management and network policies, K8s handles the infrastructure so you can focus on building great MCP tools. Start with a simple deployment and add features as your needs grow.

For more deployment options, explore serverless deployment with AWS Lambda or browse our Kubernetes server examples.

Code Examples

Basic Kubernetes Deployment

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mcp-server
  template:
    metadata:
      labels:
        app: mcp-server
    spec:
      containers:
        - name: mcp-server
          image: ghcr.io/your-org/mcp-server:latest
          ports:
            - containerPort: 3000
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "500m"

Service and Ingress

yaml
apiVersion: v1
kind: Service
metadata:
  name: mcp-server
spec:
  selector:
    app: mcp-server
  ports:
    - port: 80
      targetPort: 3000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mcp-ingress
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
spec:
  rules:
    - host: mcp.example.com
      http:
        paths:
          - path: /mcp
            pathType: Prefix
            backend:
              service:
                name: mcp-server
                port:
                  number: 80

HorizontalPodAutoscaler

yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mcp-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcp-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Key Takeaways

  • Kubernetes provides auto-scaling, self-healing, and service discovery for MCP servers
  • Configure proxy timeouts in Ingress for Streamable HTTP long-lived connections
  • Use Kubernetes Secrets and External Secrets Operator for credential management
  • HorizontalPodAutoscaler scales MCP servers based on CPU, memory, or custom metrics
  • Network Policies and Pod Security Standards harden your MCP deployment

Troubleshooting

Pods keep restarting with CrashLoopBackOff

Check pod logs with kubectl logs <pod-name>. Common causes: missing environment variables, incorrect image tag, health check endpoint not responding. Ensure your MCP server starts correctly in the container locally before deploying to Kubernetes.

Streamable HTTP connections are being terminated

Add proxy-read-timeout and proxy-buffering annotations to your Ingress. The default nginx timeout of 60 seconds is too short for MCP streaming connections. Set it to at least 3600 seconds.

Auto-scaler not scaling up under load

Verify the metrics-server is installed in your cluster (kubectl top pods). Check that resource requests are defined in your deployment, as the HPA needs these to calculate utilization percentages.

Next Steps

  • Set up monitoring with Prometheus and Grafana
  • Implement CI/CD pipelines for automated deployments
  • Explore serverless alternatives with AWS Lambda
  • Add service mesh for advanced traffic management
