Frequently Asked Questions

How many replicas should I start with for an MCP server?

Start with 2 replicas for high availability. Use a HorizontalPodAutoscaler to scale based on actual demand. For development environments, 1 replica is sufficient.

Can I use Helm charts for MCP server deployments?

Yes, packaging your MCP server deployment as a Helm chart is recommended for production. It makes configuration management, versioning, and multi-environment deployments much easier.

Should MCP servers be stateless?

Yes, design MCP servers to be stateless for best results with Kubernetes. Store any persistent state in external databases or caches. This enables horizontal scaling and seamless rolling updates.

What Kubernetes version is required for MCP deployments?

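Any Kubernetes version 1.25+ works. The core manifests in this tutorial use stable API versions available in modern Kubernetes distributions including EKS, GKE, AKS, and k3s. The ServiceMonitor, VerticalPodAutoscaler, and ExternalSecret resources are custom resources, so they also require their respective operators to be installed in the cluster.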

How do I handle MCP session affinity with multiple replicas?

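If your MCP server keeps per-session state in memory, route each client to the same pod: set sessionAffinity: ClientIP on the Service, or enable sticky sessions in your Ingress controller. ClientIP affinity only applies to traffic that is routed through the Service; ingress-nginx, for example, sends requests directly to pod endpoints by default, so configure affinity in the Ingress when traffic enters that way. For stateless servers, no affinity is needed.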
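
As a rough sketch of the Service-level option, the manifest below pins each client IP to one pod. The Service name and ports mirror the Service used in this tutorial; the timeout is an illustrative assumption (it equals the Kubernetes default of 10800 seconds).

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mcp-server
spec:
  selector:
    app: mcp-server
  # Route repeat connections from the same client IP to the same pod
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800  # assumed value; this is the Kubernetes default
  ports:
    - port: 80
      targetPort: 3000
  type: ClusterIP
```

Clients that share one source IP (for example, behind the same NAT gateway) all land on the same pod, so this approach can distribute load unevenly.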

MCP Server Kubernetes Deployment

Deploy and orchestrate MCP servers on Kubernetes with auto-scaling, health checks, and production-grade configurations

MCPgee Team

MCP Expert

Prerequisites

A containerized MCP server (see Docker deployment tutorial)
Kubernetes cluster (local or cloud)
kubectl configured and connected
Basic understanding of Kubernetes concepts (pods, services, deployments)

Introduction

Kubernetes provides the orchestration layer needed to run MCP servers at scale in production. With Kubernetes, you get auto-scaling, self-healing, service discovery, rolling updates, and secrets management out of the box. This tutorial covers deploying MCP servers to Kubernetes, from basic deployments to production-grade configurations.

Before starting, ensure your MCP server is containerized. If not, follow our Docker deployment tutorial first.

Basic Deployment

Step 1: Create the Deployment Manifest

yaml

# mcp-server-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server
  labels:
    app: mcp-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mcp-server
  template:
    metadata:
      labels:
        app: mcp-server
    spec:
      containers:
        - name: mcp-server
          # Pin a version tag or digest in production; :latest makes rollouts and rollbacks ambiguous
          image: ghcr.io/your-org/mcp-server:latest
          ports:
            - containerPort: 3000
          env:
            - name: MCP_TRANSPORT
              value: "streamable-http"
            - name: MCP_HOST
              value: "0.0.0.0"
            - name: MCP_PORT
              value: "3000"
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "500m"
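          # Probes must target a port the container listens on (3000 here).
          # This assumes the server answers GET /health on its main HTTP port.
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10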

Step 2: Create the Service

yaml

# mcp-server-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: mcp-server
spec:
  selector:
    app: mcp-server
  ports:
    - port: 80
      targetPort: 3000
  type: ClusterIP

Step 3: Deploy

bash

kubectl apply -f mcp-server-deployment.yaml
kubectl apply -f mcp-server-service.yaml

# Verify deployment
kubectl get pods -l app=mcp-server
kubectl get svc mcp-server

Exposing MCP Servers

Ingress with TLS

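For external access, use an Ingress with TLS. This manifest assumes that the ingress-nginx controller is installed, along with cert-manager and a ClusterIssuer named letsencrypt-prod: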

yaml

# mcp-server-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mcp-server-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-buffering: "off"
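spec:
  # Assumes the ingress-nginx IngressClass is named "nginx"; omit only if it is the cluster default
  ingressClassName: nginx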
  tls:
    - hosts:
        - mcp.example.com
      secretName: mcp-tls
  rules:
    - host: mcp.example.com
      http:
        paths:
          - path: /mcp
            pathType: Prefix
            backend:
              service:
                name: mcp-server
                port:
                  number: 80

Note the proxy timeout and buffering annotations. These are important for MCP's Streamable HTTP transport which uses long-lived connections. Without these settings, the proxy may terminate connections prematurely.

LoadBalancer Service

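On cloud providers, a LoadBalancer Service exposes the server directly. This manifest forwards port 443 to the container as plain TCP and does not terminate TLS on its own: either the MCP server must serve HTTPS on port 3000, or TLS must be terminated at the cloud load balancer through provider-specific annotations.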

yaml

apiVersion: v1
kind: Service
metadata:
  name: mcp-server-lb
spec:
  selector:
    app: mcp-server
  ports:
    - port: 443
      targetPort: 3000
  type: LoadBalancer

Secrets Management

Kubernetes Secrets

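Keep credentials out of your image and Deployment manifest by storing them in a Secret. Secret values are only base64-encoded, not encrypted, unless encryption at rest is enabled for the cluster, so do not commit this file with real values: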

yaml

# mcp-secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: mcp-secrets
type: Opaque
stringData:
  database-url: "postgresql://user:password@db-host:5432/mcpdb"
  api-key: "your-api-key-here"
  jwt-secret: "your-jwt-secret"

Reference the Secret from the container spec in your Deployment's pod template (spec.template.spec):

yaml

spec:
  containers:
    - name: mcp-server
      env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: mcp-secrets
              key: database-url
        - name: API_KEY
          valueFrom:
            secretKeyRef:
              name: mcp-secrets
              key: api-key
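
When a container needs every key in a Secret, envFrom can replace the per-variable references. The sketch below assumes the Secret's keys are renamed to valid environment variable names (DATABASE_URL rather than database-url), because each key is used as the variable name as-is. The Secret name mcp-env-secrets is hypothetical.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: mcp-env-secrets  # hypothetical name
type: Opaque
stringData:
  DATABASE_URL: "postgresql://user:password@db-host:5432/mcpdb"
  API_KEY: "your-api-key-here"
---
# Fragment of the Deployment's pod template (spec.template.spec), not a standalone manifest
spec:
  containers:
    - name: mcp-server
      envFrom:
        # Every key in the Secret becomes an environment variable of the same name
        - secretRef:
            name: mcp-env-secrets
```

Explicit secretKeyRef entries remain the better choice when the container should see only a subset of the keys.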

External Secrets Operator

For production, use External Secrets to sync from AWS Secrets Manager, HashiCorp Vault, or other providers:

yaml

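# Use the API version served by your installed External Secrets Operator CRDs;
# recent releases serve external-secrets.io/v1
apiVersion: external-secrets.io/v1beta1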
kind: ExternalSecret
metadata:
  name: mcp-external-secrets
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: mcp-secrets
  data:
    - secretKey: database-url
      remoteRef:
        key: mcp/production/database-url
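
The ExternalSecret above refers to a ClusterSecretStore named aws-secrets-manager that must already exist. A minimal sketch of such a store follows, assuming the operator authenticates to AWS through a Kubernetes service account mapped to an IAM role. The region, service account name, and namespace are placeholders, and the apiVersion should match whatever your installed operator serves.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1                  # assumed region
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets       # hypothetical service account with IAM access
            namespace: external-secrets  # hypothetical namespace
```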

Auto-Scaling

Horizontal Pod Autoscaler

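Scale MCP server pods based on CPU and memory utilization. Utilization is measured against each container's resource requests, so the Deployment must define them. When several metrics are listed, the HPA uses whichever one calls for the most replicas: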

yaml

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mcp-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcp-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
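
Scaling down terminates pods along with any connections they hold open, which matters for long-lived MCP streams. The fragment below is a sketch of the autoscaling/v2 behavior field that slows scale-down while leaving scale-up fast. The specific numbers are illustrative assumptions, not tuned values.

```yaml
# Fragment to merge into the HorizontalPodAutoscaler's spec
spec:
  behavior:
    scaleDown:
      # Act on the highest recommendation from the last 5 minutes before removing pods
      stabilizationWindowSeconds: 300
      policies:
        # Remove at most one pod per minute
        - type: Pods
          value: 1
          periodSeconds: 60
    scaleUp:
      # React to load increases immediately
      stabilizationWindowSeconds: 0
      policies:
        # At most double the replica count per minute
        - type: Percent
          value: 100
          periodSeconds: 60
```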

Vertical Pod Autoscaler

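The Vertical Pod Autoscaler adjusts resource requests (and scales limits proportionally) based on observed usage. It is not part of core Kubernetes and must be installed separately. Avoid running it in Auto mode alongside an HPA that scales on the same CPU or memory metrics, since the two controllers will work against each other: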

yaml

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: mcp-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcp-server
  updatePolicy:
    updateMode: Auto
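
One way to use the VPA next to a CPU- or memory-based HPA is recommendation-only mode. The sketch below is the same resource with updateMode set to "Off", so the VPA computes suggested requests without evicting or changing pods.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: mcp-server-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcp-server
  updatePolicy:
    # Quoted on purpose: an unquoted Off can be parsed as a YAML boolean
    updateMode: "Off"
```

The recommendations appear in the resource's status (for example via kubectl describe vpa mcp-server-vpa) and can be copied into the Deployment's resource requests by hand.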

ConfigMaps for Runtime Configuration

yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: mcp-config
data:
  config.json: |
    {
      "maxConcurrentRequests": 50,
      "requestTimeoutMs": 30000,
      "logLevel": "info",
      "enableMetrics": true
    }

Mount the ConfigMap as a volume in the pod template; the file then appears in the container at /app/config/config.json:

yaml

spec:
  containers:
    - name: mcp-server
      volumeMounts:
        - name: config
          mountPath: /app/config
          readOnly: true
  volumes:
    - name: config
      configMap:
        name: mcp-config

Rolling Updates and Rollbacks

Update Strategy

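Configure the rolling update strategy under the Deployment's spec. With maxSurge: 1 and maxUnavailable: 0, Kubernetes starts one new pod and waits for it to pass its readiness probe before removing an old one: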

yaml

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
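
Because MCP connections can be long-lived, it may help to give old pods time to finish in-flight work during a rollout. The fragment below is a sketch that assumes the server handles SIGTERM by draining open requests; the 10 and 60 second values are illustrative assumptions.

```yaml
# Fragment of the Deployment manifest
spec:
  # A new pod must stay Ready this long before it counts as available
  minReadySeconds: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    spec:
      # Time allowed between SIGTERM and a forced kill of the old pod
      terminationGracePeriodSeconds: 60
```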

Perform an Update

bash

# Update the image
kubectl set image deployment/mcp-server mcp-server=ghcr.io/your-org/mcp-server:v2.0.0

# Monitor rollout
kubectl rollout status deployment/mcp-server

# Rollback if needed
kubectl rollout undo deployment/mcp-server

Monitoring and Observability

Prometheus Metrics

Add a metrics endpoint to your MCP server:

typescript

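import { register, Counter, Histogram } from 'prom-client';
import express from 'express';

const toolCallCounter = new Counter({
  name: 'mcp_tool_calls_total',
  help: 'Total number of MCP tool calls',
  labelNames: ['tool_name', 'status'],
});

const toolDuration = new Histogram({
  name: 'mcp_tool_duration_seconds',
  help: 'Duration of MCP tool calls',
  labelNames: ['tool_name'],
});

// Wrap a tool handler so that every call is counted and timed.
// Pass the wrapped function wherever you register the tool's handler.
export function withMetrics<A extends unknown[], R>(
  toolName: string,
  handler: (...args: A) => Promise<R>,
): (...args: A) => Promise<R> {
  return async (...args: A) => {
    const stopTimer = toolDuration.startTimer({ tool_name: toolName });
    try {
      const result = await handler(...args);
      toolCallCounter.inc({ tool_name: toolName, status: 'success' });
      return result;
    } catch (err) {
      toolCallCounter.inc({ tool_name: toolName, status: 'error' });
      throw err;
    } finally {
      stopTimer(); // records the elapsed seconds in the histogram
    }
  };
}

// Metrics endpoint, served on a separate port from the MCP transport
const metricsApp = express();
metricsApp.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});
metricsApp.listen(9090);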

ServiceMonitor for Prometheus Operator

yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mcp-server-monitor
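spec:
  # Selects Services (not pods) by label; the Service must carry the label app: mcp-server
  selector:
    matchLabels:
      app: mcp-server
  endpoints:
    # Must match a named port on that Service that targets the metrics listener (9090)
    - port: metrics
      interval: 15s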
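
A ServiceMonitor finds its targets through a labeled Service with a named port, and the Service from the basic deployment has neither. The sketch below shows one way to provide them: a named metrics port on the container and a separate Service for scraping. The Service name mcp-server-metrics is hypothetical.

```yaml
# Fragment of the Deployment's container spec: expose the metrics listener
spec:
  containers:
    - name: mcp-server
      ports:
        - name: http
          containerPort: 3000
        - name: metrics
          containerPort: 9090
---
apiVersion: v1
kind: Service
metadata:
  name: mcp-server-metrics   # hypothetical name
  labels:
    app: mcp-server          # matched by the ServiceMonitor's selector
spec:
  selector:
    app: mcp-server
  ports:
    - name: metrics          # matched by the ServiceMonitor's endpoint port
      port: 9090
      targetPort: metrics
```

Depending on how Prometheus Operator was installed, the ServiceMonitor itself may also need extra labels to be picked up by the Prometheus instance's serviceMonitorSelector.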

Network Policies

Restrict network access to your MCP servers. Network policies are enforced only if the cluster's network plugin supports them:

yaml

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mcp-server-policy
spec:
  podSelector:
    matchLabels:
      app: mcp-server
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
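        - namespaceSelector:
            matchLabels:
              # Label set automatically on every namespace; adjust if your ingress controller lives elsewhere
              kubernetes.io/metadata.name: ingress-nginx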
      ports:
        - port: 3000
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - port: 5432
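
An egress policy that allows only PostgreSQL also blocks DNS, so the pod can no longer resolve service names such as the database host. Since network policies are additive, a second policy can open DNS without touching the first. This sketch assumes the cluster DNS pods run in kube-system with the label k8s-app: kube-dns, which is common but not universal.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mcp-server-allow-dns
spec:
  podSelector:
    matchLabels:
      app: mcp-server
  policyTypes:
    - Egress
  egress:
    - to:
        # Both selectors in one entry: pods with this label inside this namespace
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns  # assumed label of the cluster DNS pods
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

If Prometheus scrapes the metrics port from another namespace, a similar ingress rule for port 9090 is needed as well.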

Multi-Server Deployment

Deploy multiple MCP servers with shared infrastructure:

yaml

# Namespace for all MCP services
apiVersion: v1
kind: Namespace
metadata:
  name: mcp-servers
---
# File server deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: file-server
  namespace: mcp-servers
spec:
  replicas: 2
  selector:
    matchLabels:
      app: file-server
  template:
    metadata:
      labels:
        app: file-server  # must match spec.selector.matchLabels
    spec:
      containers:
        - name: file-server
          image: ghcr.io/your-org/mcp-file-server:latest
          ports:
            - containerPort: 3000
---
# Database server deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: db-server
  namespace: mcp-servers
spec:
  replicas: 3
  selector:
    matchLabels:
      app: db-server
  template:
    metadata:
      labels:
        app: db-server  # must match spec.selector.matchLabels
    spec:
      containers:
        - name: db-server
          image: ghcr.io/your-org/mcp-db-server:latest
          ports:
            - containerPort: 3000
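
Each Deployment still needs a Service before other workloads can reach it. A minimal sketch follows, assuming the pods carry the app label named in each Deployment's selector; the port numbers follow the convention used earlier in this tutorial.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: file-server
  namespace: mcp-servers
spec:
  selector:
    app: file-server
  ports:
    - port: 80
      targetPort: 3000
---
apiVersion: v1
kind: Service
metadata:
  name: db-server
  namespace: mcp-servers
spec:
  selector:
    app: db-server
  ports:
    - port: 80
      targetPort: 3000
```

Inside the cluster, these are reachable as file-server.mcp-servers.svc.cluster.local and db-server.mcp-servers.svc.cluster.local.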

Security Hardening

For comprehensive MCP security guidance, see our security fundamentals and authentication tutorials.

Key Kubernetes-specific security practices:

Pod Security Standards: Use restricted security context
Network Policies: Limit pod-to-pod communication
RBAC: Minimal service account permissions
Image scanning: Scan container images for vulnerabilities

yaml

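# Pod template spec (spec.template.spec of the Deployment)
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1001
    fsGroup: 1001
    # Required by the restricted Pod Security Standard
    seccompProfile:
      type: RuntimeDefault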
  containers:
    - name: mcp-server
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop:
            - ALL
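
Pod Security Standards are enforced per namespace through labels read by the built-in Pod Security admission controller. The sketch below applies the restricted profile to the mcp-servers namespace used in the multi-server example. Under this profile, pods must also set a seccompProfile of RuntimeDefault or Localhost, or they are rejected.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: mcp-servers
  labels:
    # Reject pods that violate the restricted profile
    pod-security.kubernetes.io/enforce: restricted
    # Also surface violations as client warnings and audit log entries
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```

Starting with only the warn and audit labels is a gentler way to find non-compliant workloads before turning on enforce.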

Conclusion

Kubernetes covers the main operational needs of running MCP servers at production scale: auto-scaling, self-healing, secrets management, and network policies. Start with a simple Deployment and Service, then add Ingress, autoscaling, and hardening as your needs grow.

For more deployment options, explore serverless deployment with AWS Lambda or browse our Kubernetes server examples.

Code Examples

Basic Kubernetes Deployment

yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mcp-server
  template:
    metadata:
      labels:
        app: mcp-server
    spec:
      containers:
        - name: mcp-server
          image: ghcr.io/your-org/mcp-server:latest
          ports:
            - containerPort: 3000
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "500m"

Service and Ingress

yaml

apiVersion: v1
kind: Service
metadata:
  name: mcp-server
spec:
  selector:
    app: mcp-server
  ports:
    - port: 80
      targetPort: 3000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mcp-ingress
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
spec:
  rules:
    - host: mcp.example.com
      http:
        paths:
          - path: /mcp
            pathType: Prefix
            backend:
              service:
                name: mcp-server
                port:
                  number: 80

HorizontalPodAutoscaler

yaml

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mcp-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcp-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Key Takeaways

Kubernetes provides auto-scaling, self-healing, and service discovery for MCP servers
Configure proxy timeouts in Ingress for Streamable HTTP long-lived connections
Use Kubernetes Secrets and External Secrets Operator for credential management
HorizontalPodAutoscaler scales MCP servers based on CPU, memory, or custom metrics
Network Policies and Pod Security Standards harden your MCP deployment

Troubleshooting

Pods keep restarting with CrashLoopBackOff

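Check the logs of the crashed container with kubectl logs <pod-name> --previous and the pod events with kubectl describe pod <pod-name>. Common causes: missing environment variables, an image tag that points at the wrong build, or a liveness probe that targets the wrong port or path. Confirm that your MCP server starts correctly in its container locally before deploying to Kubernetes.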

Streamable HTTP connections are being terminated

Add proxy-read-timeout and proxy-buffering annotations to your Ingress. The default nginx read timeout of 60 seconds closes quiet connections too early for MCP streaming. Raise it to cover your longest expected idle period; the examples in this tutorial use 3600 seconds.

Auto-scaler not scaling up under load

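Verify the metrics-server is installed in your cluster (kubectl top pods should return values). Check that resource requests are defined in your deployment, as the HPA needs these to calculate utilization percentages. kubectl describe hpa mcp-server-hpa shows the current metric readings and any errors the autoscaler reports.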

Next Steps

Set up monitoring with Prometheus and Grafana
Implement CI/CD pipelines for automated deployments
Explore serverless alternatives with AWS Lambda
Add service mesh for advanced traffic management
