Observability
ToolHive provides comprehensive observability for your MCP server interactions through built-in OpenTelemetry instrumentation. You get complete visibility into how your MCP servers perform, including detailed traces, metrics, and error tracking.
How telemetry works
ToolHive automatically instruments your MCP server interactions without requiring changes to your servers. When you enable telemetry, ToolHive captures detailed information about every request, tool call, and server interaction.
ToolHive's telemetry captures rich, protocol-aware information because it understands MCP operations. You get detailed traces showing tool calls, resource access, and prompt operations rather than generic HTTP requests.
Distributed tracing
Distributed tracing shows you the complete journey of each request through your MCP servers. ToolHive creates comprehensive traces that provide end-to-end visibility across the proxy-container boundary.
Trace structure
Here's what a trace looks like when a client calls a tool in the GitHub MCP server (some fields omitted for brevity):
Span: mcp.tools/call (150ms)
├── service.name: toolhive-mcp-proxy
├── service.version: v0.1.9
├── http.duration_ms: 150.3
├── http.host: localhost:14972
├── http.method: POST
├── http.request_content_length: 256
├── http.response_content_length: 1024
├── http.status_code: 202
├── http.url: /messages?session_id=b1d22d07-b35f-4260-9c0c-b872f92f64b1
├── http.user_agent: claude-code/1.0.53
├── mcp.method: tools/call
├── mcp.request.id: 5
├── mcp.server.name: github
├── mcp.tool.arguments: owner=stacklok, repo=toolhive, pullNumber=1131
├── mcp.tool.name: get_pull_request
└── mcp.transport: stdio
MCP-specific traces
ToolHive automatically captures traces for all MCP operations, including:
- Tool calls (mcp.tools/call) - When AI assistants use tools
- Resource access (mcp.resources/read) - When servers read files or data
- Prompt operations (mcp.prompts/get) - When servers retrieve prompts
- Connection events (mcp.initialize) - When clients connect to servers
Trace attributes
Each trace includes detailed context across three layers:
Service information
service.name: toolhive-mcp-proxy
service.version: v0.1.9
HTTP layer information
http.duration_ms: 150.3
http.host: localhost:14972
http.method: POST
http.request_content_length: 256
http.response_content_length: 1024
http.status_code: 202
http.url: /messages?session_id=b1d22d07-b35f-4260-9c0c-b872f92f64b1
http.user_agent: claude-code/1.0.53
MCP protocol details
Details about the MCP operation being performed (some fields are specific to each operation):
mcp.client.name: claude-code
mcp.method: tools/call
mcp.request.id: 123
mcp.server.name: github
mcp.tool.arguments: owner=stacklok, repo=toolhive, path=pkg/telemetry/middleware.go, start_index=130, max_length=1000
mcp.tool.name: get_file_contents
mcp.transport: stdio
rpc.service: mcp
rpc.system: jsonrpc
Method-specific attributes
- mcp.tools/call traces include:
  - mcp.tool.name - The name of the tool being called
  - mcp.tool.arguments - Sanitized tool arguments (sensitive values redacted)
- mcp.resources/read traces include:
  - mcp.resource.uri - The URI of the resource being accessed
- mcp.prompts/get traces include:
  - mcp.prompt.name - The name of the prompt being retrieved
- mcp.initialize traces include:
  - mcp.client.name - The name of the connecting client
Metrics collection
ToolHive automatically collects metrics about your MCP server usage and performance. These metrics help you understand usage patterns, performance characteristics, and identify potential issues.
Metric labels
All metrics include consistent labels for filtering and aggregation:
- server - MCP server name (e.g., fetch, github)
- transport - Backend transport type (stdio, sse, or streamable-http)
- method - HTTP method (POST, GET)
- mcp_method - MCP protocol method (e.g., tools/call, resources/read)
- status - Request outcome (success or error)
- status_code - HTTP status code (200, 400, 500)
- tool - Tool name for tool-specific metrics
Key metrics
Example metrics from the Prometheus /metrics endpoint are shown below (some fields are omitted for brevity):
Request metrics
# HELP toolhive_mcp_requests_total Total number of MCP requests
# TYPE toolhive_mcp_requests_total counter
toolhive_mcp_requests_total{mcp_method="tools/list",method="POST",server="github",status="success",status_code="202",transport="stdio"} 2
# HELP toolhive_mcp_request_duration_seconds Duration of MCP requests in seconds
# TYPE toolhive_mcp_request_duration_seconds histogram
toolhive_mcp_request_duration_seconds_bucket{mcp_method="tools/list",method="POST",server="github",status="success",status_code="202",transport="stdio",le="10000"} 2
toolhive_mcp_request_duration_seconds_bucket{mcp_method="tools/list",method="POST",server="github",status="success",status_code="202",transport="stdio",le="+Inf"} 2
toolhive_mcp_request_duration_seconds_sum{mcp_method="tools/list",method="POST",server="github",status="success",status_code="202",transport="stdio"} 0.000219416
toolhive_mcp_request_duration_seconds_count{mcp_method="tools/list",method="POST",server="github",status="success",status_code="202",transport="stdio"} 2
Connection metrics
# HELP toolhive_mcp_active_connections Number of active MCP connections
# TYPE toolhive_mcp_active_connections gauge
toolhive_mcp_active_connections{connection_type="sse",server="github",transport="stdio"} 3
Tool-specific metrics
# HELP toolhive_mcp_tool_calls_total Total number of MCP tool calls
# TYPE toolhive_mcp_tool_calls_total counter
toolhive_mcp_tool_calls_total{server="github",status="success",tool="get_file_contents"} 15
toolhive_mcp_tool_calls_total{server="github",status="success",tool="list_pull_requests"} 4
toolhive_mcp_tool_calls_total{server="github",status="success",tool="search_issues"} 2
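These counters use the standard Prometheus exposition format, so any Prometheus-compatible tooling can consume them. As a rough illustration, here is a minimal pure-Python sketch that turns sample lines like the ones above into (name, labels, value) tuples; it is a simplified reader for this example output, not a full parser for the exposition format.

```python
import re

# Example lines mirroring the tool-specific metrics shown above.
SAMPLE = '''\
# HELP toolhive_mcp_tool_calls_total Total number of MCP tool calls
# TYPE toolhive_mcp_tool_calls_total counter
toolhive_mcp_tool_calls_total{server="github",status="success",tool="get_file_contents"} 15
toolhive_mcp_tool_calls_total{server="github",status="success",tool="list_pull_requests"} 4
'''

# metric_name{label="value",...} sample_value
LINE_RE = re.compile(r'^(\w+)\{([^}]*)\}\s+(\S+)$')

def parse_samples(text):
    """Yield (metric_name, labels_dict, value) for each sample line."""
    for line in text.splitlines():
        if line.startswith('#') or not line.strip():
            continue  # skip HELP/TYPE comments and blank lines
        match = LINE_RE.match(line)
        if not match:
            continue
        name, label_str, value = match.groups()
        labels = dict(
            (key, val.strip('"'))
            for key, val in (pair.split('=', 1) for pair in label_str.split(','))
        )
        yield name, labels, float(value)

for name, labels, value in parse_samples(SAMPLE):
    print(labels['tool'], value)
```

In practice you would point a real scraper or client library at the endpoint; this sketch only shows the shape of the data.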
Export options
ToolHive supports multiple export formats to integrate with your existing observability infrastructure.
OTLP export
ToolHive supports OpenTelemetry Protocol (OTLP) export for both traces and metrics to any compatible backend, either directly or via a collector application.
The OpenTelemetry ecosystem includes a wide range of observability backends including open source solutions like Jaeger, self-hosted solutions like Splunk, and SaaS solutions like Datadog, New Relic, and Honeycomb.
Prometheus export
ToolHive can expose Prometheus-style metrics at a /metrics endpoint, enabling:
- Direct scraping by Prometheus servers
- Service discovery in Kubernetes environments
- Integration with existing Prometheus-based monitoring stacks
Dual export
Both OTLP and Prometheus can be enabled simultaneously, allowing you to:
- Send traces to specialized tracing backends
- Expose metrics for Prometheus scraping
- Maintain compatibility with existing monitoring infrastructure
Data sanitization
ToolHive automatically protects sensitive information in traces:
- Sensitive arguments: Tool arguments containing passwords, tokens, or keys are redacted
- Sensitive key detection: Arguments with keys containing patterns like "password", "token", "secret", "key", "auth", or "credential" are redacted
- Argument truncation: Long arguments are truncated to prevent excessive trace size
For example, a tool call with sensitive arguments:
mcp.tool.arguments: password=secret123, api_key=abc456, title=Bug report
is sanitized in the trace as:
mcp.tool.arguments: password=[REDACTED], api_key=[REDACTED], title=Bug report
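The redaction behavior described above can be sketched as follows. The key patterns mirror this section's list, but the truncation limit and the function itself are illustrative assumptions, not ToolHive's actual implementation.

```python
# Key patterns from this section; matching keys have their values redacted.
SENSITIVE_PATTERNS = ('password', 'token', 'secret', 'key', 'auth', 'credential')
MAX_ARG_LENGTH = 100  # hypothetical truncation limit for illustration

def sanitize_arguments(args):
    """Redact sensitive values and truncate long ones, returning the
    flattened 'key=value, ...' form shown in trace attributes."""
    parts = []
    for key, value in args.items():
        if any(pattern in key.lower() for pattern in SENSITIVE_PATTERNS):
            value = '[REDACTED]'
        elif len(str(value)) > MAX_ARG_LENGTH:
            value = str(value)[:MAX_ARG_LENGTH] + '...'
        parts.append(f'{key}={value}')
    return ', '.join(parts)

print(sanitize_arguments({
    'password': 'secret123',
    'api_key': 'abc456',
    'title': 'Bug report',
}))
# password=[REDACTED], api_key=[REDACTED], title=Bug report
```

Note that matching on key substrings means a key like auth_token is caught by both the "auth" and "token" patterns.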
Monitoring examples
These examples show how ToolHive's observability works in practice.
Tool call monitoring
When a client calls the create_issue tool:
Request:
{
"jsonrpc": "2.0",
"id": "req_456",
"method": "tools/call",
"params": {
"name": "create_issue",
"arguments": {
"title": "Bug report",
"body": "Found an issue with the API"
}
}
}
Generated trace:
Span: mcp.tools/call
├── mcp.method: tools/call
├── mcp.request.id: req_456
├── mcp.tool.name: create_issue
├── mcp.tool.arguments: title=Bug report, body=Found an issue with the API
├── mcp.server.name: github
├── mcp.transport: sse
├── http.method: POST
├── http.status_code: 200
└── duration: 850ms
Generated metrics:
toolhive_mcp_requests_total{mcp_method="tools/call",server="github",status="success"} 1
toolhive_mcp_request_duration_seconds{mcp_method="tools/call",server="github"} 0.85
toolhive_mcp_tool_calls_total{server="github",tool="create_issue",status="success"} 1
Error tracking
Failed requests generate error traces and metrics:
Error trace:
Span: mcp.tools/call
├── mcp.method: tools/call
├── mcp.tool.name: invalid_tool
├── http.status_code: 400
├── span.status: ERROR
├── span.status_message: Tool not found
└── duration: 12ms
Error metrics:
toolhive_mcp_requests_total{mcp_method="tools/call",server="github",status="error",status_code="400"} 1
toolhive_mcp_tool_calls_total{server="github",tool="invalid_tool",status="error"} 1
Key performance indicators
Monitor these key metrics for optimal MCP server performance:
- Request rate: rate(toolhive_mcp_requests_total[5m])
- Error rate: rate(toolhive_mcp_requests_total{status="error"}[5m])
- Response time: histogram_quantile(0.95, toolhive_mcp_request_duration_seconds_bucket)
- Active connections: toolhive_mcp_active_connections
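The histogram_quantile function used for the response-time KPI estimates a percentile from cumulative bucket counts by linearly interpolating within the bucket that contains the target rank. A sketch of that logic, using made-up bucket data rather than ToolHive's actual bucket layout:

```python
def histogram_quantile(q, buckets):
    """Estimate the q-th quantile from cumulative histogram buckets.

    buckets: sorted list of (upper_bound, cumulative_count), ending
    with a (+Inf, total) bucket, as in Prometheus histograms.
    """
    total = buckets[-1][1]  # the +Inf bucket holds the total count
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if bound == float('inf'):
                return prev_bound  # cannot interpolate into +Inf
            # Linear interpolation inside the bucket containing the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound

# Hypothetical buckets: 80 requests under 0.1s, 95 under 0.5s,
# 99 under 1.0s, 100 total.
buckets = [(0.1, 80), (0.5, 95), (1.0, 99), (float('inf'), 100)]
print(histogram_quantile(0.95, buckets))
```

This is why the p95 estimate depends on bucket boundaries: the true value is only known to lie within one bucket, and the rest is interpolation.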
Setting up dashboards and alerts
This section shows practical examples of integrating ToolHive's observability data with common monitoring tools.
Prometheus integration
Configure Prometheus to scrape ToolHive metrics:
scrape_configs:
- job_name: 'toolhive-mcp-proxy'
static_configs:
- targets: [
'localhost:43832', # Example MCP server
'localhost:51712' # Example MCP server
]
scrape_interval: 15s
metrics_path: /metrics
Grafana dashboard queries
Example queries for monitoring dashboards:
# Request rate by server
sum(rate(toolhive_mcp_requests_total[5m])) by (server)
# Error rate percentage
sum(rate(toolhive_mcp_requests_total{status="error"}[5m])) by (server) /
sum(rate(toolhive_mcp_requests_total[5m])) by (server) * 100
# Response time percentiles
histogram_quantile(0.95, sum(rate(toolhive_mcp_request_duration_seconds_bucket[5m])) by (le, server))
# Tool usage distribution
sum(rate(toolhive_mcp_tool_calls_total[5m])) by (tool, server)
# Active connections
toolhive_mcp_active_connections
Alerting rules
Example Prometheus alerting rules:
groups:
- name: toolhive-mcp-proxy
rules:
- alert: HighErrorRate
expr: rate(toolhive_mcp_requests_total{status="error"}[5m]) > 0.1
for: 2m
labels:
severity: warning
annotations:
summary: 'High error rate in MCP proxy'
description: 'Error rate is {{ $value }} errors per second'
- alert: HighResponseTime
expr:
histogram_quantile(0.95, toolhive_mcp_request_duration_seconds_bucket)
> 2.0
for: 5m
labels:
severity: warning
annotations:
summary: 'High response time in MCP proxy'
description: '95th percentile response time is {{ $value }}s'
- alert: ProxyDown
expr: up{job="toolhive-mcp-proxy"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: 'MCP proxy is down'
description: 'ToolHive MCP proxy has been down for more than 1 minute'
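The for: clause in these rules means the expression must stay above its threshold for the entire pending window before the alert fires; a brief spike that recovers never triggers it. A sketch of that behavior, using the HighErrorRate threshold and a hypothetical 30-second evaluation interval:

```python
def alert_firing(samples, threshold, for_duration, interval):
    """Return True if the value exceeded `threshold` continuously for at
    least `for_duration` seconds, given one sample per `interval` seconds.
    Illustrative only; Prometheus evaluates rules against its TSDB."""
    pending = 0
    for value in samples:
        # Accumulate time above threshold; any dip resets the pending timer.
        pending = pending + interval if value > threshold else 0
        if pending >= for_duration:
            return True
    return False

# Error rate spikes briefly, recovers, then stays high long enough to fire.
rates = [0.05, 0.2, 0.05, 0.2, 0.2, 0.2, 0.2, 0.2]
print(alert_firing(rates, threshold=0.1, for_duration=120, interval=30))
```

The reset-on-dip behavior is what makes for: useful for filtering out transient blips while still catching sustained problems.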
Recommendations
Production deployment
- Use appropriate sampling rates (1-10% for high-traffic systems)
- Configure authentication for OTLP endpoints
- Use HTTPS transport in production
- Monitor telemetry overhead with metrics
- Set up alerting on key performance indicators
Development and testing
- Use 100% sampling for complete visibility
- Enable local backends (Jaeger, Prometheus)
- Test with realistic workloads to validate metrics
- Verify trace correlation across service boundaries
Cost optimization
- Tune sampling rates based on traffic patterns
- Use head-based sampling for consistent trace collection
- Monitor backend costs and adjust accordingly
- Filter out health check requests if not needed
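Head-based sampling makes the keep/drop decision once, at trace start, deterministically from the trace ID, so every component that sees the same trace agrees on whether to record it. A sketch of the idea (mirroring the concept behind OpenTelemetry's TraceIdRatioBased sampler, not ToolHive's actual configuration):

```python
import hashlib

def should_sample(trace_id: str, rate: float) -> bool:
    """Keep roughly `rate` of traces, consistently per trace ID."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    # Map the first 8 bytes of the hash onto [0, 1) and compare to the rate.
    bucket = int.from_bytes(digest[:8], 'big') / 2**64
    return bucket < rate

# Session ID borrowed from the trace example earlier in this page.
trace_id = 'b1d22d07-b35f-4260-9c0c-b872f92f64b1'
# The same trace ID always yields the same decision at a given rate.
print(should_sample(trace_id, 0.10), should_sample(trace_id, 1.0))
```

Because the decision is a pure function of the trace ID, a 1-10% production sampling rate still yields complete, uncorrupted traces for every trace that is kept.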
Next steps
Now that you understand how ToolHive's observability works, you can:
- Choose a monitoring backend that fits your needs and budget
- Enable telemetry when running your servers:
  - using the ToolHive CLI
  - using the Kubernetes operator (not yet supported - contributions welcome)
- Set up basic dashboards to track request rates, error rates, and response times
- Configure alerts for critical issues
The telemetry system works automatically once enabled, providing immediate insights into your MCP server performance and usage patterns.