Pitfalls
MCP server issues fall into predictable categories. Most are configuration problems, transport misunderstandings, or missing timeout handling. This page covers every failure mode documented in production use, with diagnosis and fix for each.
Server Startup Failures
Symptom: Server shows as "offline" in /mcp. Claude cannot access any of its tools.
| Cause | Fix |
|---|---|
| Missing cmd /c on Windows | Use cmd /c npx -y @some/package for all npx-based servers on native Windows |
| Wrong TypeScript module resolution | Set moduleResolution: NodeNext in tsconfig.json |
| Port conflict | Check with lsof -i :PORT (Unix) or netstat -ano \| findstr :PORT (Windows) |
| Missing dependencies | Run npm install before starting the server |
| protocolVersion handshake bug | Claude Code has had issues sending the proper protocolVersion field. A server that works with MCP Inspector may still fail with Claude Code. |
Diagnosis steps:
# Check server status in Claude Code
/mcp
# Test the server standalone (should start without errors)
npx -y @some/mcp-server
# Start Claude with debug logging
claude --debug-file /tmp/claude.log
# In another terminal:
tail -f /tmp/claude.log
If the server starts fine standalone but fails inside Claude Code, the issue is almost always in the handshake or transport layer, not your server code.
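When the handshake is the suspect, it can help to look at the initialize message itself. The sketch below builds one as a single line of JSON, which is how the stdio transport frames messages. The helper name build_initialize_request, the clientInfo values, and the default version string are illustrative assumptions, not what Claude Code actually sends.

```python
import json


def build_initialize_request(protocol_version: str = "2025-06-18",
                             request_id: int = 1) -> str:
    """Return one newline-terminated JSON-RPC initialize request."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": protocol_version,
            "capabilities": {},
            # Placeholder client identity for a manual probe (assumption).
            "clientInfo": {"name": "manual-probe", "version": "0.0.1"},
        },
    }
    # Compact, single-line JSON: stdio framing is one message per line.
    return json.dumps(message, separators=(",", ":")) + "\n"
```

Piping this line into a server's stdin and reading the first line of its stdout shows which protocolVersion the server answers with, independent of any client.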
Timeout Traps
The 16-Hour Hang
A documented production case: Claude Code hung silently for 16+ hours with no warnings, spawning 70+ zombie processes. There is no built-in watchdog for unresponsive MCP servers. If a tool never returns, Claude waits indefinitely.
MCP_TIMEOUT Ignored for HTTP
Claude Code has known issues where MCP_TIMEOUT settings are not respected for HTTP/SSE servers. The client may use default timeout values regardless of your configuration, causing premature disconnections on servers that need longer startup times.
Mitigation: Always Implement Your Own Timeouts
Do not rely on Claude Code's timeout handling. Implement timeouts in every tool handler:
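server.registerTool("analytics-query", {
  description: "Execute an analytics query (30s timeout)",
  inputSchema: z.object({ query: z.string() })
}, async ({ query }): Promise<CallToolResult> => {
  // Keep the timer handle so it can be cleared once the race settles
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("Query timed out after 30s")), 30000);
  });
  try {
    // Promise.race stops waiting after 30s; it does not cancel executeQuery itself
    const result = await Promise.race([executeQuery(query), timeout]);
    return { content: [{ type: "text", text: JSON.stringify(result) }] };
  } catch (error) {
    // error is typed unknown under strict TypeScript, so narrow it before reading .message
    const message = error instanceof Error ? error.message : String(error);
    return {
      content: [{ type: "text", text: message }],
      isError: true
    };
  } finally {
    clearTimeout(timer);
  }
});
For operations that legitimately take longer than 30 seconds, split them into start/poll pairs: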
server.registerTool("start-export", { /* ... */ },
  async ({ query }) => {
    const jobId = await startExport(query);
    return { content: [{ type: "text", text: `Export started. Job ID: ${jobId}. Use check-export to poll status.` }] };
  }
);
server.registerTool("check-export", { /* ... */ },
  async ({ jobId }) => {
    const status = await checkExportStatus(jobId);
    return { content: [{ type: "text", text: JSON.stringify(status) }] };
  }
);
Transport Layer Issues
stdout Pollution (stdio)
The most common MCP failure. Any non-JSON-RPC output on stdout corrupts the protocol stream.
Sources of pollution:
- console.log() calls in TypeScript (use console.error() instead)
- print() calls in Python (use logging with stream=sys.stderr)
- Library warnings that default to stdout
- Dependency initialization messages
// WRONG: corrupts protocol
console.log("Debug: processing request");
// CORRECT: goes to stderr
console.error("Debug: processing request");
# WRONG: corrupts protocol
print("Debug: processing request")
# CORRECT: logs to stderr
import sys, logging
# level=logging.INFO is needed here; the default WARNING level would drop info() messages
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
logging.info("Debug: processing request")
A full Python logging setup that keeps stdout clean for the MCP protocol:
import sys
import logging
# Configure ALL logging to stderr before importing anything else
logging.basicConfig(
    stream=sys.stderr,
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s"
)
# Suppress noisy third-party loggers
logging.getLogger("urllib3").setLevel(logging.WARNING)
logging.getLogger("httpx").setLevel(logging.WARNING)
# Redirect any stray print() calls to stderr
# This catches prints from dependencies you don't control
# (it does not catch code that writes to sys.stdout directly)
import builtins
_original_print = builtins.print
def _stderr_print(*args, **kwargs):
    kwargs.setdefault('file', sys.stderr)
    _original_print(*args, **kwargs)
builtins.print = _stderr_print
logger = logging.getLogger("mcp-server")
logger.info("Server starting: this goes to stderr, not stdout")
Debugging tip: If you see "Unexpected token" errors in Claude Code's debug log, stdout pollution is the first thing to check. Even a single print statement from a third-party dependency can break the entire server.
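One way to confirm pollution is to capture a server's stdout and flag every line that is not a JSON-RPC message. The following is a minimal sketch; find_stdout_pollution is a hypothetical helper, and it assumes one message per line with no batch arrays, so a batched payload would also be reported.

```python
import json


def find_stdout_pollution(lines):
    """Return (line_number, text) pairs for lines that are not JSON-RPC 2.0 objects."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\r\n")
        if not text:
            continue  # skip blank lines, e.g. a trailing newline in the capture
        try:
            message = json.loads(text)
        except json.JSONDecodeError:
            bad.append((number, text))  # plain text such as a stray print()
            continue
        # Valid JSON is still pollution if it is not a JSON-RPC 2.0 object.
        if not (isinstance(message, dict) and message.get("jsonrpc") == "2.0"):
            bad.append((number, text))
    return bad
```

Feeding it the lines of a captured stdout file points at the exact line where a dependency wrote something it should not have.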
stdout Buffering
OS-level buffering can delay message delivery on stdio transport. Some languages buffer stdout by default. In Python, use PYTHONUNBUFFERED=1 or sys.stdout.flush() after each write. In Node.js, stdout is unbuffered to TTYs but may buffer when piped -- which is exactly how MCP uses it.
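For a hand-rolled Python stdio server, flushing after every message can be done in one place. This is a sketch under the assumption that messages are newline-delimited JSON; write_message is an illustrative name, and SDK-based servers normally handle this for you.

```python
import json
import sys


def write_message(message: dict, stream=None) -> None:
    """Write one JSON-RPC message as a single line and flush immediately."""
    out = sys.stdout if stream is None else stream
    # json.dumps escapes newlines inside strings, so the message stays on one line.
    out.write(json.dumps(message) + "\n")
    out.flush()  # hand the message to the pipe now instead of waiting for a full buffer
```

Routing every outgoing message through a single function like this also makes it easy to audit that nothing else writes to stdout.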
SSE Connection Drops
SSE connections are fragile. Proxies, load balancers, and firewalls drop idle connections. The deprecated SSE transport has no built-in reconnection. Streamable HTTP is more resilient. If you must use SSE, implement reconnection logic in your client.
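Client-side reconnection usually amounts to retrying with exponential backoff. The sketch below shows only that retry loop; connect is a caller-supplied function (an assumption, since the document does not name a client library) that either returns a connection or raises ConnectionError.

```python
import time


def connect_with_backoff(connect, max_attempts=5, base_delay=1.0,
                         max_delay=30.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the delay after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(delay)
            delay = min(delay * 2, max_delay)  # cap the wait between attempts
```

The sleep parameter exists so the loop can be tested without real waiting; production code can leave it at the default.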
"Connection closed" on Windows
This error usually means the cmd /c wrapper is missing for an npx-based server on native Windows. The fix is mechanical:
# Fails on Windows
claude mcp add --transport stdio my-server -- npx -y @some/package
# Works on Windows
claude mcp add --transport stdio my-server -- cmd /c npx -y @some/package
Tool Schema Problems
Poor Descriptions
Claude selects tools based on names and descriptions. A vague description results in the tool never being selected or being selected for the wrong task.
// BAD: Claude cannot determine when to use this
server.registerTool("query", {
description: "Run a query"
});
// GOOD: Claude knows exactly when and how to use this
server.registerTool("query-analytics", {
description: "Execute a read-only SQL query against the analytics database. Returns up to 1000 rows. Use for aggregate metrics, user behavior analysis, and funnel data. Do NOT use for user PII lookups (use query-users instead)."
});
Write descriptions as if briefing a new engineer who needs to pick the right tool from a list of 50. Specify what the tool does, what data it accesses, what its limits are, and when to use an alternative.
Description Truncation
Claude Code truncates tool descriptions at 2KB and server instructions at 2KB. If your description exceeds this, the end gets cut off. Put the most important information -- use cases, constraints, and alternatives -- in the first paragraph.
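A quick size check during development can catch descriptions that would be cut off. This sketch assumes the 2KB limit means 2048 bytes of UTF-8, which the text above does not specify; description_budget is a hypothetical helper.

```python
def description_budget(description: str, limit_bytes: int = 2048) -> dict:
    """Report the UTF-8 size of a description against an assumed byte limit."""
    size = len(description.encode("utf-8"))
    return {
        "bytes": size,
        "limit": limit_bytes,
        "over_by": max(0, size - limit_bytes),  # 0 means it fits
    }
```

Running this over every tool description in a unit test turns silent truncation into a visible failure.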
Security Risks
Overly Permissive Servers
The most common security mistake is giving a database MCP server write access when only reads are needed. Always use read-only credentials for any server connected to production data.
# WRONG: full access to production database
--dsn "postgresql://admin:adminpass@prod.db.com:5432/main"
# CORRECT: read-only user with limited schema access
--dsn "postgresql://readonly:pass@prod.db.com:5432/analytics"
Environment Variable Leakage
MCP servers can read all environment variables in the process. Secrets intended for other tools are visible to every server. Use dedicated service accounts with minimal credentials per server.
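If you launch a server through a wrapper you control, one option is to pass it an explicit allowlist of variables instead of the full environment. The sketch below assumes such a wrapper exists; minimal_env is an illustrative name, and the variable names in the usage note are examples only.

```python
import os


def minimal_env(allowed, source=None):
    """Return only the allowlisted variables that are actually set."""
    env = os.environ if source is None else source
    return {name: env[name] for name in allowed if name in env}
```

The result can be passed as the env argument of subprocess.Popen, for example minimal_env(["PATH", "DATABASE_URL"]), so the child process never sees unrelated secrets.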
Credential Anti-Patterns
| Anti-Pattern | Risk | Alternative |
|---|---|---|
| Hardcoded API keys in server code | Keys in source control | Use ${VAR} expansion in .mcp.json |
| Committed .mcp.json with secrets | Secrets in git history | Use env var references, set vars externally |
| .env files for production servers | Client may spawn server from a different directory | Runtime injection via secrets manager |
| Shared credentials across servers | One compromise exposes all | Dedicated service accounts per server |
Credential rotation: Set 30-90 day rotation schedules. Consider runtime injection from secrets managers (Doppler, 1Password CLI, HashiCorp Vault) using headersHelper or environment variable setup scripts.
A .mcp.json that pulls credentials from 1Password at connection time instead of storing them in env vars:
{
"mcpServers": {
"prod-db": {
"command": "bash",
"args": [
"-c",
"export DATABASE_URL=$(op read 'op://Engineering/prod-db/connection-string'); npx -y @bytebase/dbhub --dsn \"$DATABASE_URL\" --read-only"
]
},
"internal-api": {
"type": "http",
"url": "https://mcp.internal.example.com",
"headersHelper": "bash -c 'echo \"{\\\"Authorization\\\": \\\"Bearer $(op read op://Engineering/mcp-token/credential)\\\"}\"'"
}
}
}
The op read call fetches a fresh credential on each server start. No secrets in .env files, no secrets in git history.
Network Binding
Some servers bind to 0.0.0.0 by default, making them accessible to every device on your local network. Verify binding addresses for any server that opens HTTP ports. Bind to 127.0.0.1 (localhost only) unless remote access is explicitly required.
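The difference between the two bindings is a single string in the bind call. The following sketch uses the standard socket module to bind to loopback and to check whether a configured host is loopback-only; both function names are illustrative.

```python
import ipaddress
import socket


def bind_loopback(port: int = 0) -> socket.socket:
    """Open a listening TCP socket reachable only from this machine."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", port))  # port 0 lets the OS pick a free port
    sock.listen()
    return sock


def is_loopback_address(host: str) -> bool:
    """True for addresses such as 127.0.0.1 or ::1, False for 0.0.0.0."""
    return ipaddress.ip_address(host).is_loopback
```

A startup check such as is_loopback_address(configured_host) can refuse to start, or at least log a warning, when a server is about to listen on all interfaces.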
Performance
Slow Tools Block Everything
Claude waits synchronously for tool results. A tool that takes 60 seconds freezes the entire conversation for 60 seconds. There is no parallel tool execution -- each call blocks the agent.
Mitigation strategies:
- Timeouts in every handler (30 seconds max recommended)
- Partial results with truncation flag rather than processing everything
- Pagination for large result sets
- anthropic/maxResultSizeChars annotation for tools that legitimately return large outputs
- Start/poll pairs for inherently slow operations
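For Python servers, the first item in the list above can be sketched with asyncio.wait_for. The result dictionaries mirror the content/isError shape used elsewhere on this page; call_with_timeout is a hypothetical wrapper, not part of any SDK.

```python
import asyncio


async def call_with_timeout(operation, timeout_s: float = 30.0) -> dict:
    """Await operation() and turn a timeout into an error result instead of hanging."""
    try:
        # wait_for cancels the underlying task when the timeout expires.
        result = await asyncio.wait_for(operation(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return {
            "content": [{"type": "text", "text": f"Timed out after {timeout_s}s"}],
            "isError": True,
        }
    return {"content": [{"type": "text", "text": str(result)}]}
```

Unlike a bare race between two promises, wait_for also cancels the slow coroutine, so abandoned work does not keep running in the background.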
Context Window Bloat from Tool Outputs
Large tool outputs consume context rapidly. A tool returning 20,000 tokens of database rows leaves less room for conversation history and reasoning. Design tools to return summaries, aggregates, or paginated results. Use the output size annotations to set appropriate limits per tool.
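Pagination with an explicit truncation flag keeps each response small while telling Claude that more data exists. This is a minimal sketch over an in-memory list; the field names (rows, page, total_rows, truncated) are assumptions chosen for illustration.

```python
def paginate(rows, page: int = 1, page_size: int = 50) -> dict:
    """Return one page of rows plus a flag saying whether more rows remain."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    chunk = rows[start:start + page_size]
    return {
        "rows": chunk,
        "page": page,
        "total_rows": len(rows),
        # True when rows exist beyond the end of this page.
        "truncated": start + len(chunk) < len(rows),
    }
```

A real database tool would push the offset and limit into the query rather than slicing in memory; the response shape is the part that carries over.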
Crash Recovery
Claude Code does not automatically restart crashed MCP servers. A crashed server becomes unavailable for the remainder of the session.
Recommendations:
- Catch all exceptions in tool handlers -- return isError: true instead of throwing
- Use process supervisors (pm2, systemd) for critical stdio servers in persistent environments
- Monitor server health via PostToolUse hooks that detect repeated failures
- Reconnect manually via /mcp in Claude Code when a server goes offline
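The first recommendation can be applied uniformly with a decorator. The sketch below wraps a synchronous handler and converts any exception into an isError result; safe_tool is an illustrative name, and whether a given SDK already does this for you is not covered here.

```python
import functools


def safe_tool(handler):
    """Wrap a tool handler so exceptions become error results instead of crashes."""
    @functools.wraps(handler)
    def wrapper(*args, **kwargs):
        try:
            return handler(*args, **kwargs)
        except Exception as exc:  # deliberately broad: no exception should escape
            return {
                "content": [{"type": "text", "text": f"{type(exc).__name__}: {exc}"}],
                "isError": True,
            }
    return wrapper
```

Applying @safe_tool to every handler gives Claude a readable error message to react to while the server process stays up.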
A pm2 ecosystem file for managing critical MCP servers:
// ecosystem.config.js
module.exports = {
apps: [
{
name: "mcp-db",
script: "npx",
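// process.env.DATABASE_URL is read when pm2 loads this file, so it must be set in the shell that starts pm2; the env block below does not supply it
args: ["-y", "@bytebase/dbhub", "--dsn", process.env.DATABASE_URL],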
autorestart: true,
max_restarts: 10,
restart_delay: 2000,
env: {
DATABASE_URL: "postgresql://readonly:pass@localhost:5432/analytics",
NODE_ENV: "production"
}
},
{
name: "mcp-internal",
script: "node",
args: ["./mcp-servers/internal-api/dist/index.js"],
autorestart: true,
max_restarts: 5,
restart_delay: 3000,
error_file: "/var/log/mcp/internal-error.log",
out_file: "/dev/null" // stdout must stay clean for stdio transport
}
]
};
A PostToolUse hook that detects repeated MCP failures and logs a warning:
{
"hooks": {
"PostToolUse": [
{
"matcher": "mcp__.*",
"hooks": [
{
"type": "command",
"command": "bash -c 'INPUT=$(cat); IS_ERROR=$(echo \"$INPUT\" | jq -r \".tool_response.isError // false\"); TOOL=$(echo \"$INPUT\" | jq -r \".tool_name\"); if [ \"$IS_ERROR\" = \"true\" ]; then FAIL_FILE=\"/tmp/mcp-failures-${TOOL}\"; COUNT=$(cat \"$FAIL_FILE\" 2>/dev/null || echo 0); COUNT=$((COUNT + 1)); echo $COUNT > \"$FAIL_FILE\"; if [ $COUNT -ge 3 ]; then echo \"WARNING: $TOOL has failed $COUNT consecutive times. Server may be unhealthy. Check /mcp status.\" >&2; fi; else rm -f \"/tmp/mcp-failures-${TOOL}\" 2>/dev/null; fi'"
}
]
}
]
}
}
Debugging MCP Communication
MCP Inspector
The official debugging tool provides a browser UI for testing servers in isolation:
npx @modelcontextprotocol/inspector node build/index.js
# Opens browser UI at http://localhost:6274
The Inspector shows tool listings, resource browsing, prompt testing, and a live message log. Test your server here before connecting it to Claude Code. If it works in the Inspector but fails in Claude Code, the issue is in the handshake or transport integration.
Claude Code Debug Log
# Start with debug logging
claude --debug-file /tmp/claude.log
# Or enable mid-session
/debug
The debug log shows every JSON-RPC message between Claude Code and MCP servers. Filter for your server name to see the exact protocol exchange.
In-Session Checks
- /mcp -- view all server statuses, authenticate, reconnect failed servers
- Ask Claude to list available MCP tools -- verifies tool discovery is working
- Check /mcp after an error to see if the server is still connected
Version Compatibility
The MCP specification has breaking changes on roughly 3-month cycles:
| Date | Change |
|---|---|
| 2025-03-26 | Added JSON-RPC batching |
| 2025-06-18 | Removed JSON-RPC batching, added structured tool outputs, added elicitation, enhanced OAuth, deprecated SSE transport |
SDK version risks:
- Python SDK V1 to V2 is a breaking migration (FastMCP renamed to McpServer)
- TypeScript SDK maintains backwards compatibility within major versions
- Minor SDK updates may add features but should not break existing servers
Recommendations: Pin SDK versions explicitly in package.json or requirements.txt. Test against MCP Inspector after every SDK update. Monitor the MCP specification changelog before upgrading.
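Because revisions are identified by date strings, a server can make its supported versions explicit and negotiate against them. The sketch below reflects one reading of the spec's initialization rules (echo the requested version if supported, otherwise answer with the newest supported one); verify it against the specification before relying on it. SUPPORTED_VERSIONS and negotiate_version are illustrative names.

```python
# Revisions taken from the table above; extend as the server gains support.
SUPPORTED_VERSIONS = ("2025-03-26", "2025-06-18")


def negotiate_version(requested: str, supported=SUPPORTED_VERSIONS) -> str:
    """Pick the protocolVersion to send back in the initialize response."""
    if requested in supported:
        return requested
    # ISO-style date strings sort chronologically, so max() is the newest revision.
    return max(supported)
```

Logging both the requested and the negotiated version at startup makes version mismatches visible in the debug log instead of surfacing later as unexplained failures.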