Discover the best MCP servers for DevOps workflows. Manage Docker containers, orchestrate Kubernetes clusters, plan Terraform infrastructure, monitor with Grafana, and automate CI/CD with GitHub - all from your AI editor.
DevOps is inherently a multi-tool discipline. On any given day, a DevOps engineer might check container health in Docker, review pod status in Kubernetes, plan infrastructure changes in Terraform, monitor dashboards in Grafana, and investigate alerts in Datadog - all while managing CI/CD pipelines in GitHub. Context switching between these tools is a major productivity drain.
MCP servers reduce this context switching by connecting your AI assistant directly to your infrastructure tools. Instead of opening five different dashboards, you can ask your AI to check the status of your staging deployment, review the Terraform plan for a new VPC, and investigate why a Kubernetes pod is crash-looping - all in a single conversation.
This guide covers nine essential MCP servers for DevOps, organized by workflow stage: container management, infrastructure as code, cloud providers, CI/CD, and observability. It also walks through workflows for incident response, deployment pipelines, infrastructure audits, and monitoring setup, with examples of the commands your AI may generate.
The Docker MCP server gives your AI direct access to your Docker environment. It can list running containers, inspect logs, start and stop services, and even help you build optimized Dockerfiles.
{
  "mcpServers": {
    "docker": {
      "command": "npx",
      "args": ["-y", "@anthropic/docker-mcp"]
    }
  }
}
"Show me all running containers, check if the api-gateway container is healthy, and show the last 50 lines of logs from the payment-service container."
For teams running production workloads on Kubernetes, the Kubernetes MCP server brings cluster operations into your AI editor. Check pod status, describe deployments, read events, and troubleshoot issues without switching to a terminal or dashboard.
{
  "mcpServers": {
    "kubernetes": {
      "command": "npx",
      "args": ["-y", "@anthropic/kubernetes-mcp"],
      "env": {
        "KUBECONFIG": "/Users/devops/.kube/config"
      }
    }
  }
}
"Check the status of all pods in the production namespace. If any pods are in CrashLoopBackOff, show me their logs and recent events."
The Terraform MCP server connects your AI to your Terraform configurations and state. It can review plans, suggest resource configurations, check for drift, and help you write secure, efficient infrastructure code.
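One concrete check in a plan review is confirming that a proposed VPC range does not collide with ranges already in use. The following is a minimal sketch of that check using Python's standard `ipaddress` module; the CIDR values are made up for illustration and are not taken from any real plan.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs):
    """Return every pair of CIDR blocks whose address ranges overlap."""
    networks = [ipaddress.ip_network(c) for c in cidrs]
    return [
        (str(a), str(b))
        for a, b in combinations(networks, 2)
        if a.overlaps(b)
    ]


# Hypothetical ranges: two existing VPCs plus the proposed one.
existing = ["10.0.0.0/16", "10.1.0.0/16"]
proposed = "10.1.128.0/20"
conflicts = find_overlaps(existing + [proposed])
print(conflicts)
```

Here the proposed range sits inside the second existing VPC, so the pair is reported as a conflict.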
"Review this Terraform plan for our new VPC setup. Flag any security concerns, check that the CIDR ranges do not overlap with existing VPCs, and suggest cost optimizations."
Each major cloud provider has its own MCP server that connects your AI to cloud-specific services and APIs.
The AWS MCP server provides access to AWS services including EC2, S3, Lambda, RDS, and more. It can check instance status, list S3 buckets, review Lambda function configurations, and help debug CloudWatch logs.
The GCP MCP server connects to Google Cloud Platform services. It can manage Compute Engine instances, query BigQuery datasets, check Cloud Run service status, and review IAM policies.
The Azure MCP server provides access to Azure services including Virtual Machines, Azure Functions, Cosmos DB, and Azure DevOps. It helps manage resources across subscriptions and resource groups.
"Check the health of our production EC2 instances in us-east-1, list any S3 buckets with public access enabled, and show me the last 5 Lambda invocation errors for the payment-processor function."
The GitHub MCP server connects your AI to your repositories, pull requests, issues, and - critically - your CI/CD pipelines via GitHub Actions. It can check workflow runs, review PRs, and help debug failed builds.
"Check the status of the latest CI/CD runs on the main branch. If any have failed, show me the failing step logs and suggest a fix."
The Grafana MCP server brings your monitoring dashboards into your AI conversation. Query metrics, check alert status, and analyze trends without navigating complex dashboard UIs.
"Check if there are any active critical alerts in Grafana. Then show me the CPU and memory usage trends for the api-gateway service over the last 24 hours."
For teams using Datadog for observability, the Datadog MCP server provides access to metrics, traces, logs, and monitors directly from your AI editor.
"Show me all triggered monitors in Datadog. For any critical monitors, pull the related logs from the last hour and help me identify the root cause."
| Server | Category | Best For | Setup Difficulty |
|---|---|---|---|
| Docker | Containers | Local development | Easy |
| Kubernetes | Orchestration | Production clusters | Medium |
| Terraform | IaC | Infrastructure planning | Medium |
| AWS | Cloud | AWS resources | Medium |
| GCP | Cloud | GCP resources | Medium |
| Azure | Cloud | Azure resources | Medium |
| GitHub | CI/CD | Pipeline management | Easy |
| Grafana | Monitoring | Metrics and alerts | Medium |
| Datadog | Observability | Full-stack observability | Medium |
Incident response is where MCP servers deliver the most dramatic time savings. When a production alert fires at 2 AM, the last thing you want is to open six different dashboards while your brain is still booting up. With MCP servers, you can investigate the incident through a single AI conversation that queries all your infrastructure tools simultaneously.
Start by understanding what triggered the alert. Use Grafana MCP or Datadog MCP to check the current alert status and recent metric changes.
"Show me all critical alerts in Grafana that fired in the last 30 minutes. For each alert, show the metric that triggered it, the threshold, and the current value."
Your AI might respond with something like: "There are 2 critical alerts: (1) api-gateway p99 latency at 4,200ms (threshold: 2,000ms), triggered 12 minutes ago. (2) payment-service error rate at 8.3% (threshold: 1%), triggered 8 minutes ago."
Next, check the health of the affected services using Kubernetes MCP.
"Check the status of all pods in the production namespace related to api-gateway and payment-service. Are any pods restarting? What do the recent events show?"
Behind the scenes, your AI runs commands equivalent to:
kubectl get pods -n production -l app=api-gateway
kubectl get pods -n production -l app=payment-service
kubectl get events -n production --sort-by='.lastTimestamp' --field-selector reason=BackOff
kubectl describe pod payment-service-7d8f6b5c4-x9k2m -n production
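To make the crash-loop check concrete, here is a rough Python sketch that scans pod data in the shape returned by `kubectl get pods -o json` and reports containers waiting in CrashLoopBackOff. The sample data is invented and trimmed to the fields the function reads; how an MCP server actually performs this check is not specified in this guide.

```python
def crashlooping_pods(pod_list):
    """Return (pod, container, restart count) for containers in CrashLoopBackOff."""
    hits = []
    for pod in pod_list.get("items", []):
        for status in pod.get("status", {}).get("containerStatuses", []):
            waiting = status.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                hits.append(
                    (pod["metadata"]["name"], status["name"], status["restartCount"])
                )
    return hits


# Made-up sample, trimmed to the fields used above.
sample = {
    "items": [
        {
            "metadata": {"name": "payment-service-7d8f6b5c4-x9k2m"},
            "status": {"containerStatuses": [
                {"name": "payment-service", "restartCount": 14,
                 "state": {"waiting": {"reason": "CrashLoopBackOff"}}},
            ]},
        },
        {
            "metadata": {"name": "api-gateway-5c9d7f-abcde"},
            "status": {"containerStatuses": [
                {"name": "api-gateway", "restartCount": 0,
                 "state": {"running": {"startedAt": "2024-01-01T00:00:00Z"}}},
            ]},
        },
    ]
}
print(crashlooping_pods(sample))
```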
Check GitHub MCP for recent deployments that might have caused the issue.
"Show me the last 5 merged PRs on the main branch of the api-gateway and payment-service repositories. Were any of them deployed in the last 2 hours?"
Use Datadog MCP to search logs for the root cause.
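Grouping error logs by message, as the prompt below requests, amounts to a frequency count. A minimal sketch, assuming each log entry has already been fetched as a dictionary with `level` and `message` keys (these field names and the sample messages are hypothetical, not Datadog's schema):

```python
from collections import Counter


def top_errors(log_entries, limit=5):
    """Count ERROR entries by message, most frequent first."""
    counts = Counter(
        entry["message"] for entry in log_entries if entry.get("level") == "ERROR"
    )
    return counts.most_common(limit)


# Invented log entries for illustration.
logs = [
    {"level": "ERROR", "message": "connection pool exhausted"},
    {"level": "INFO", "message": "request completed"},
    {"level": "ERROR", "message": "connection pool exhausted"},
    {"level": "ERROR", "message": "timeout calling ledger-api"},
    {"level": "ERROR", "message": "connection pool exhausted"},
]
print(top_errors(logs))
```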
"Search Datadog logs for ERROR level entries from payment-service in the last 30 minutes. Group by error message and show the count for each. Include a sample stack trace for the most common error."
Based on the investigation, take action. If a recent deployment caused the issue, use Kubernetes MCP to initiate a rollback.
"Show me the rollout history for the payment-service deployment in the production namespace. What was the previous image version? Generate the kubectl command to rollback to the previous version."
Your AI generates:
kubectl rollout history deployment/payment-service -n production
kubectl rollout undo deployment/payment-service -n production
kubectl rollout status deployment/payment-service -n production
A well-structured deployment pipeline catches issues before they reach production. MCP servers let you build an AI-assisted deployment checklist that queries every layer of your stack before, during, and after deployment.
Before deploying, use multiple MCP servers to verify readiness across your entire stack.
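The individual readiness checks can be folded into a single go/no-go decision. The sketch below is one way to express that gate in Python; the `PreDeployStatus` type and its field names are hypothetical, and the default limits (0.1% error rate, 500 ms p99 latency) mirror the example prompt that follows.

```python
from dataclasses import dataclass


@dataclass
class PreDeployStatus:
    ci_green: bool          # all CI checks passed on the release branch
    capacity_ok: bool       # cluster has room for a rolling update
    error_rate_pct: float   # current error rate, in percent
    p99_latency_ms: float   # current p99 latency, in milliseconds
    drift_detected: bool    # Terraform reports drift in the workspace


def blocking_reasons(status, max_error_rate_pct=0.1, max_p99_ms=500.0):
    """Return the failed checks; an empty list means the deploy may proceed."""
    reasons = []
    if not status.ci_green:
        reasons.append("CI checks are not green")
    if not status.capacity_ok:
        reasons.append("not enough cluster capacity for a rolling update")
    if status.error_rate_pct >= max_error_rate_pct:
        reasons.append("error rate is at or above the limit")
    if status.p99_latency_ms >= max_p99_ms:
        reasons.append("p99 latency is at or above the limit")
    if status.drift_detected:
        reasons.append("Terraform drift detected")
    return reasons


# Invented readings: latency is too high and drift was found.
status = PreDeployStatus(
    ci_green=True, capacity_ok=True,
    error_rate_pct=0.04, p99_latency_ms=620.0, drift_detected=True,
)
print(blocking_reasons(status))
```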
"Run our pre-deployment checklist: (1) Check GitHub Actions - are all CI checks green on the release/v2.4.0 branch? (2) Check Kubernetes - do we have enough available resources in the production cluster for a rolling update? (3) Check Grafana - is the current error rate below 0.1% and p99 latency below 500ms? (4) Check Terraform - is there any infrastructure drift in the production workspace?"
During a canary deployment, MCP servers let you monitor the canary in real time and compare its metrics against the stable version.
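The comparison itself is simple arithmetic. As a sketch, the function below flags a canary whose error rate exceeds a multiple of the stable rate, using the 2x factor from the prompt below as the default. Treating any canary error as a breach when the stable baseline is error-free is an assumption made here, not something the source specifies.

```python
def canary_breaches(canary_errors, canary_requests,
                    stable_errors, stable_requests, max_ratio=2.0):
    """True when the canary error rate exceeds max_ratio times the stable rate."""
    if canary_requests == 0:
        return False  # no canary traffic yet, nothing to compare
    canary_rate = canary_errors / canary_requests
    stable_rate = stable_errors / stable_requests if stable_requests else 0.0
    if stable_rate == 0.0:
        # Assumption: with an error-free baseline, any canary error is a breach.
        return canary_rate > 0.0
    return canary_rate > max_ratio * stable_rate


# Invented counts: canary at 1.2% errors versus stable at 0.4%.
print(canary_breaches(12, 1000, 40, 10000))
```

With these numbers the canary runs at three times the stable error rate, so the function returns `True`.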
"Monitor the canary deployment of payment-service v2.4.0 in the production namespace. Compare the canary pod's error rate and p99 latency against the stable pods over the last 15 minutes. Alert me if the canary's error rate exceeds 2x the stable rate."
After deployment, run a comprehensive health check across all systems.
"Run post-deployment verification: (1) All pods in the production namespace are Running and Ready. (2) No new error-level logs in the last 5 minutes. (3) Grafana metrics show error rate and latency are within normal ranges. (4) No triggered alerts. Report any issues."
Regular infrastructure audits catch security vulnerabilities, cost inefficiencies, and configuration drift before they become problems. MCP servers make it possible to audit your entire infrastructure stack in a single AI conversation.
Use AWS MCP (or your cloud provider's server) to scan for common security misconfigurations.
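As an illustration of one such scan, the sketch below looks for ingress rules open to 0.0.0.0/0 on anything other than ports 80 and 443. It assumes the security groups have already been fetched as dictionaries in the shape of the EC2 `DescribeSecurityGroups` response; the group IDs are invented, and IPv6 ranges are deliberately left out to keep the example short.

```python
ALLOWED_PUBLIC_PORTS = {80, 443}


def risky_ingress_rules(security_groups):
    """Return (group id, from port, to port) for world-open rules outside 80/443."""
    findings = []
    for group in security_groups:
        for rule in group.get("IpPermissions", []):
            open_to_world = any(
                r.get("CidrIp") == "0.0.0.0/0" for r in rule.get("IpRanges", [])
            )
            if not open_to_world:
                continue
            # Rules with IpProtocol "-1" carry no port fields and cover every port.
            from_port = rule.get("FromPort", 0)
            to_port = rule.get("ToPort", 65535)
            if not (from_port == to_port and from_port in ALLOWED_PUBLIC_PORTS):
                findings.append((group["GroupId"], from_port, to_port))
    return findings


# Made-up groups for illustration.
groups = [
    {"GroupId": "sg-web", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ]},
    {"GroupId": "sg-db", "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
         "IpRanges": [{"CidrIp": "10.0.0.0/8"}]},
    ]},
    {"GroupId": "sg-any", "IpPermissions": [
        {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ]},
]
print(risky_ingress_rules(groups))
```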
"Audit our AWS security posture: (1) List all S3 buckets and flag any with public access. (2) Check EC2 security groups for rules that allow 0.0.0.0/0 on any port other than 80 and 443. (3) List IAM users that have not rotated their access keys in 90 days. (4) Check for RDS instances that are publicly accessible."
"Analyze our AWS infrastructure for cost optimization: (1) Find EC2 instances with average CPU utilization below 10% over the last 30 days - these are candidates for rightsizing. (2) List EBS volumes that are not attached to any instance. (3) Find S3 buckets with no lifecycle policy that have more than 100GB of data. (4) Check for idle Elastic Load Balancers with zero connections in the last 7 days."
Use Kubernetes MCP to audit your cluster configuration.
"Audit the production Kubernetes cluster: (1) Find pods running without resource limits set. (2) List deployments with only 1 replica (no high availability). (3) Check for pods using the 'latest' image tag. (4) Find services of type LoadBalancer that might be exposed to the internet. (5) List namespaces with no network policies defined."
Your AI generates the equivalent of:
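kubectl get pods -A -o json | jq -r '.items[] | select(any(.spec.containers[]; .resources.limits == null)) | "\(.metadata.namespace)/\(.metadata.name)"'
kubectl get deployments -A -o json | jq -r '.items[] | select(.spec.replicas == 1) | "\(.metadata.namespace)/\(.metadata.name)"'
kubectl get pods -A -o json | jq -r '.items[] | select(any(.spec.containers[]; .image | endswith(":latest"))) | "\(.metadata.namespace)/\(.metadata.name)"'
kubectl get services -A -o json | jq -r '.items[] | select(.spec.type == "LoadBalancer") | "\(.metadata.namespace)/\(.metadata.name)"'
kubectl get networkpolicies -A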
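One gap worth noting: a suffix check for `:latest` does not catch images that omit the tag entirely, which Kubernetes also resolves to `latest`. The Python sketch below handles both cases, again working on data shaped like `kubectl get pods -A -o json` output. The sample pods are invented, and the tag parsing is a simplified approximation of image reference rules rather than a full parser.

```python
def image_tag(image):
    """Return the tag of an image reference, defaulting to 'latest'."""
    if "@" in image:
        return None  # pinned by digest, so no mutable tag applies
    last_segment = image.rsplit("/", 1)[-1]  # skip any registry host:port part
    if ":" in last_segment:
        return last_segment.rsplit(":", 1)[1]
    return "latest"  # an untagged image resolves to :latest


def pods_using_latest(pod_list):
    """Return (namespace, pod, image) for containers that resolve to 'latest'."""
    hits = []
    for pod in pod_list.get("items", []):
        for container in pod["spec"]["containers"]:
            if image_tag(container["image"]) == "latest":
                meta = pod["metadata"]
                hits.append((meta["namespace"], meta["name"], container["image"]))
    return hits


# Made-up pods for illustration.
pods = {"items": [
    {"metadata": {"namespace": "production", "name": "web"},
     "spec": {"containers": [{"image": "nginx"}]}},
    {"metadata": {"namespace": "production", "name": "api"},
     "spec": {"containers": [{"image": "registry.local:5000/api:1.4.2"}]}},
    {"metadata": {"namespace": "production", "name": "cache"},
     "spec": {"containers": [{"image": "redis:latest"}]}},
]}
print(pods_using_latest(pods))
```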
Setting up monitoring for a new service involves defining what to measure, configuring dashboards, and setting alert thresholds. MCP servers help you design monitoring setups by analyzing your existing infrastructure and recommending metrics based on best practices.
Start by having your AI inspect the service you want to monitor using Kubernetes MCP and Docker MCP.
"Inspect the new user-auth service deployment in the staging namespace. What container ports are exposed? What health check endpoints are configured? Based on this service's architecture, recommend a monitoring setup following the RED method (Rate, Errors, Duration) and the USE method (Utilization, Saturation, Errors)."
Use Grafana MCP to analyze existing services and recommend alert thresholds for the new service.
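Deriving thresholds from a baseline is plain arithmetic. A minimal sketch, using the 3x warning and 5x critical multipliers from the prompt below; the latency samples are invented, and in practice the baseline would come from the metrics queried through Grafana.

```python
from statistics import mean


def alert_thresholds(samples, warning_factor=3.0, critical_factor=5.0):
    """Derive warning and critical thresholds from the average of the samples."""
    baseline = mean(samples)
    return {
        "baseline": baseline,
        "warning": baseline * warning_factor,
        "critical": baseline * critical_factor,
    }


# Made-up daily p99 latencies in milliseconds.
thresholds = alert_thresholds([120, 135, 110, 115])
print(thresholds)
```

With an average of 120 ms, this yields a warning threshold of 360 ms and a critical threshold of 600 ms.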
"Query the p99 latency and error rate for our existing authentication services over the last 30 days. Based on the baseline, recommend alert thresholds for the new user-auth service. Use 3x the average as the warning threshold and 5x as the critical threshold."
Here is a complete configuration that connects six DevOps MCP servers for a comprehensive infrastructure management setup. This is what a production-ready DevOps MCP configuration looks like:
{
  "mcpServers": {
    "docker": {
      "command": "npx",
      "args": ["-y", "@anthropic/docker-mcp"]
    },
    "kubernetes": {
      "command": "npx",
      "args": ["-y", "@anthropic/kubernetes-mcp"],
      "env": {
        "KUBECONFIG": "/home/devops/.kube/config"
      }
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@anthropic/github-mcp"],
      "env": {
        "GITHUB_TOKEN": "ghp_your-token-here"
      }
    },
    "grafana": {
      "command": "npx",
      "args": ["-y", "@anthropic/grafana-mcp"],
      "env": {
        "GRAFANA_URL": "https://grafana.yourcompany.com",
        "GRAFANA_API_KEY": "your-grafana-api-key"
      }
    },
    "datadog": {
      "command": "npx",
      "args": ["-y", "@anthropic/datadog-mcp"],
      "env": {
        "DD_API_KEY": "your-datadog-api-key",
        "DD_APP_KEY": "your-datadog-app-key"
      }
    },
    "aws": {
      "command": "npx",
      "args": ["-y", "@aws-labs/mcp"],
      "env": {
        "AWS_PROFILE": "production-readonly",
        "AWS_REGION": "us-east-1"
      }
    }
  }
}
Note the AWS profile named production-readonly. For production infrastructure, always back your MCP servers with read-only credentials; the profile name alone enforces nothing, so the IAM permissions behind it must actually be read-only. This prevents accidental modifications while still giving you visibility into your infrastructure.
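Because the configuration holds API keys and tokens inline, it can help to check the file before sharing or committing it. The sketch below is one possible check, not part of any MCP tooling: it flags `env` entries whose names suggest a credential. The name hints and the idea that a `${VAR}` value marks an externally resolved placeholder are both assumptions; whether your MCP client expands such placeholders depends on the client.

```python
SECRET_HINTS = ("TOKEN", "KEY", "SECRET", "PASSWORD")


def inline_secrets(config):
    """Return (server, variable) pairs that look like hard-coded credentials."""
    findings = []
    for server, spec in config.get("mcpServers", {}).items():
        for name, value in spec.get("env", {}).items():
            looks_secret = any(hint in name.upper() for hint in SECRET_HINTS)
            # Assumption: a "${VAR}" value is resolved outside the file.
            is_reference = value.startswith("${") and value.endswith("}")
            if looks_secret and not is_reference:
                findings.append((server, name))
    return findings


# Trimmed, illustrative configuration in the same shape as the one above.
config = {"mcpServers": {
    "kubernetes": {"env": {"KUBECONFIG": "/home/devops/.kube/config"}},
    "github": {"env": {"GITHUB_TOKEN": "ghp_your-token-here"}},
    "grafana": {"env": {"GRAFANA_URL": "https://grafana.yourcompany.com",
                        "GRAFANA_API_KEY": "${GRAFANA_API_KEY}"}},
}}
print(inline_secrets(config))
```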
One of the most valuable aspects of DevOps MCP servers is that your AI generates concrete commands you can review before running. Here are examples of what it might generate when you ask common DevOps questions:
When you ask "Why is the payment service returning 503 errors?", your AI generates and executes:
kubectl get pods -n production -l app=payment-service -o wide
kubectl logs payment-service-7d8f6b5c4-x9k2m -n production --tail=100
kubectl describe pod payment-service-7d8f6b5c4-x9k2m -n production
kubectl get events -n production --field-selector involvedObject.name=payment-service-7d8f6b5c4-x9k2m
kubectl get hpa -n production payment-service
When you ask "Plan a new Redis cluster for our staging environment", your AI generates Terraform code like:
resource "aws_elasticache_replication_group" "staging_redis" {
  replication_group_id       = "staging-redis"
  description                = "Redis cluster for staging environment"
  node_type                  = "cache.t3.medium"
  num_cache_clusters         = 2
  port                       = 6379
  subnet_group_name          = aws_elasticache_subnet_group.staging.name
  security_group_ids         = [aws_security_group.redis_staging.id]
  automatic_failover_enabled = true
  at_rest_encryption_enabled = true
  transit_encryption_enabled = true
}
When you ask "The dbt container keeps crashing, what is wrong?", your AI generates:
docker ps -a --filter name=dbt
docker logs dbt-runner --tail 200
docker inspect dbt-runner --format '{{.State.ExitCode}}'
docker inspect dbt-runner --format '{{.State.OOMKilled}}'
docker stats dbt-runner --no-stream
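The two `docker inspect` values above are usually enough for a first diagnosis. As a rough sketch, the Python below interprets them, assuming both fields were printed on one line with a format such as `'{{.State.ExitCode}} {{.State.OOMKilled}}'`. It relies on the convention that exit codes above 128 mean the process was terminated by signal number (code - 128); the wording of the messages is illustrative.

```python
def parse_state(line):
    """Parse a line like '137 true' into (exit code, OOM-killed flag)."""
    code, oom = line.split()
    return int(code), oom == "true"


def diagnose_exit(exit_code, oom_killed):
    """Give a first-pass explanation for why a container stopped."""
    if oom_killed:
        return "killed for exceeding its memory limit"
    if exit_code == 0:
        return "exited cleanly; check whether the command is meant to keep running"
    if exit_code > 128:
        return f"terminated by signal {exit_code - 128}"
    return f"application error (exit code {exit_code}); inspect the logs"


# Hypothetical inspect output for a container that ran out of memory.
print(diagnose_exit(*parse_state("137 true")))
```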
MCP servers for DevOps work best in code-oriented editors.
Start with Docker MCP and GitHub MCP - they cover the most common daily tasks with minimal configuration overhead. Add Kubernetes MCP when you manage clusters, and Grafana or Datadog for monitoring visibility. For data pipeline-specific DevOps workflows, see our data engineering guide.