MCP as Observability Interface: Connecting AI Agents to Kernel Tracepoints

TL;DR

MCP is becoming the interface between AI agents and infrastructure data. Datadog shipped an MCP Server connecting dashboards to AI agents. Qualys flagged MCP servers as the new shadow IT risk. We think both are right, and we think the architecture should go further: the MCP server should not wrap an existing observability platform. It should BE the observability layer. This post explores how MCP can serve as a direct observability interface to kernel tracepoints, bypassing traditional metric pipelines entirely.

Three signals in one week

Three things happened in the same week of March 2026 that signal where observability is headed.

Datadog shipped an MCP Server. Their implementation connects real-time observability data to AI agents for automated detection and remediation. An AI agent can now query Datadog dashboards, pull metrics, and trigger responses through the Model Context Protocol. This is a big company validating a small protocol.

Qualys published a security analysis of MCP servers. Their TotalAI team called MCP servers “the new shadow IT for AI” and found that over 53% of servers rely on static secrets for authentication. They recommended adding observability to MCP servers: logging capability discovery events, monitoring invocation patterns, alerting on anomalies.

Cloud Native Now covered eBPF for Kubernetes network observability. Microsoft Retina deploys as a DaemonSet, captures network telemetry via eBPF without application changes, and provides kernel-level drop reasons. The article draws a clear line between “monitoring” (predefined questions) and “observability” (asking questions nobody planned for).

The thread connecting all three: AI agents need direct access to infrastructure telemetry, and MCP is becoming the way they get it.

Two approaches to MCP observability

There are two ways to connect observability data to AI agents via MCP.

Approach 1: Wrap existing platforms. Datadog’s strategy. Take existing metrics, logs, and traces, already collected and aggregated, and expose them through MCP tools. The AI agent queries the dashboard API, gets pre-processed data, and acts on it. This makes sense for teams with a mature observability stack that want to add AI-powered automation on top.

Approach 2: Build MCP-native observability. This is what we did with the tracer. Instead of wrapping an existing platform, we built an eBPF agent that traces CUDA Runtime and Driver APIs via uprobes, stores the results in SQLite, and exposes everything through 7 MCP tools. The MCP interface is not an adapter layer; it is the primary interface.

Neither approach is wrong. They solve different problems.

The wrapper approach works well for aggregate analysis: “What was the p99 latency for service X over the last hour?” The data is already summarized, indexed, and queryable.

The native approach works better for root-cause investigation: “Why did this specific GPU request take 14.5x longer than expected?” That requires raw kernel events, CUDA call stacks, and causal chains – not summaries. The AI agent needs to drill down, not roll up.

What MCP-native observability looks like in practice

Here is a concrete example. We traced a vLLM TTFT regression where the first token took 14.5x longer than baseline. The trace database captured every CUDA API call, every kernel context switch, every memory allocation.

When Claude connects to the MCP server and loads this database, it can:

  1. get_trace_stats – See the full trace summary: 12,847 CUDA events, 4 causal chains, total GPU time
  2. get_causal_chains – Read the causal chains that explain why latency spiked, in plain English
  3. run_sql – Run custom queries against the raw event data (“show me all cudaMemcpyAsync calls over 100ms”)
  4. get_stacks – Inspect call stacks for any flagged event
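The run_sql tool is the one that makes drill-down possible. As an illustration of the kind of query an agent might issue — with a deliberately simplified, hypothetical `cuda_events` schema, since the real Ingero trace layout may differ — here is the "cudaMemcpyAsync calls over 100ms" question expressed against a toy in-memory SQLite database:

```python
import sqlite3

# Illustrative only: the real Ingero trace schema may differ. We model a
# minimal hypothetical `cuda_events` table to show the shape of a query
# an agent could send through the run_sql tool.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cuda_events (ts_ns INTEGER, api TEXT, duration_ms REAL)")
conn.executemany(
    "INSERT INTO cuda_events VALUES (?, ?, ?)",
    [
        (1000, "cudaMemcpyAsync", 142.7),  # slow copy: should be flagged
        (2000, "cudaMemcpyAsync", 3.1),    # fast copy: filtered out
        (3000, "cudaLaunchKernel", 0.4),   # different API: filtered out
    ],
)

# "show me all cudaMemcpyAsync calls over 100ms"
rows = conn.execute(
    "SELECT ts_ns, duration_ms FROM cuda_events "
    "WHERE api = 'cudaMemcpyAsync' AND duration_ms > 100 "
    "ORDER BY duration_ms DESC"
).fetchall()
print(rows)
```

The point is that the agent composes the filter itself from raw events; nothing here depends on a pre-built dashboard panel.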

Claude identified the root cause in under 30 seconds: logprobs computation was blocking the decode loop, creating a 256x slowdown on the critical path. That root cause was not visible in any aggregate metric. It only appeared in the raw causal chain between specific CUDA API calls.

A dashboard MCP adapter could not have found this. The data granularity does not survive aggregation.

The security angle matters too

Qualys raised valid concerns about MCP server security. Their finding that over 53% of servers rely on static secrets is alarming. Their recommendation to log discovery and invocation events is exactly right.

For MCP servers that touch GPU infrastructure, the attack surface is different. An MCP server with access to CUDA traces can expose timing information, memory layouts, and model architecture details. The security model needs to account for this.

In Ingero, every MCP tool invocation is traced. The same eBPF infrastructure that captures GPU events also captures the MCP interaction itself. This is not a separate logging layer; it is the same observability pipeline. Qualys’s recommendation to “add observability to MCP servers” becomes trivial when the MCP server already IS an observability tool.
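Ingero does this in-kernel via eBPF, but the idea itself is language-agnostic. As a rough user-space sketch of Qualys's "log every invocation" advice — not Ingero's actual implementation, and with a hypothetical tool body — a wrapper that records name, arguments, and latency for each tool call looks like this:

```python
import functools
import json
import time

# Sketch only: Ingero captures invocations in-kernel via eBPF. This is a
# plain user-space equivalent of "log every MCP tool invocation".
invocation_log = []

def traced_tool(fn):
    """Record tool name, arguments, and latency for every call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return fn(*args, **kwargs)
        finally:
            invocation_log.append({
                "tool": fn.__name__,
                "args": json.dumps({"args": args, "kwargs": kwargs}, default=str),
                "duration_s": time.monotonic() - start,
            })
    return wrapper

@traced_tool
def get_trace_stats():
    # Hypothetical tool body, standing in for the real MCP handler.
    return {"cuda_events": 12847, "causal_chains": 4}

stats = get_trace_stats()
print(invocation_log[0]["tool"])  # → get_trace_stats
```

An anomaly detector (or another AI agent) can then watch `invocation_log` for unusual patterns, which is exactly the monitoring Qualys recommends.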

Where this is going

We think the MCP-native pattern will expand beyond GPU observability. Consider:

  • Network observability: Instead of wrapping Prometheus in an MCP layer, build an eBPF-based network agent that exposes packet-level data directly to AI agents (Microsoft Retina is halfway there).
  • Security observability: Instead of wrapping a SIEM, build an MCP server that traces syscalls and exposes security events in real time.
  • Cost observability: Instead of querying a cloud billing API through MCP, instrument the actual resource allocation and expose it directly.

The pattern is the same: skip the dashboard, skip the aggregation, give the AI agent direct access to the raw telemetry. Let the agent decide what to aggregate and how.

Try It Yourself

The project is open source. The investigation database from this post is available for download. Claude (or any MCP client) can connect to it and run an investigation:

git clone https://github.com/ingero-io/ingero.git
cd ingero && make build
./bin/ingero mcp --db investigations/pytorch-dataloader-starvation.db

Investigate with AI (recommended)

You can point any MCP-compatible AI client at the trace database and ask questions directly. No code required.

First, create the MCP config file at /tmp/ingero-mcp-dataloader.json:

{
  "mcpServers": {
    "ingero": {
      "command": "./bin/ingero",
      "args": ["mcp", "--db", "investigations/pytorch-dataloader-starvation.db"]
    }
  }
}

With Ollama (local, free):

# Install ollmcp (MCP client for Ollama)
pip install ollmcp

# Investigate with a local model (no data leaves your machine)
ollmcp -m qwen3.5:27b -j /tmp/ingero-mcp-dataloader.json

With Claude Code:

claude --mcp-config /tmp/ingero-mcp-dataloader.json

Then type /investigate and let the model explore. Follow up with questions like “what was the root cause?” or “which processes were competing for CPU time?”

Or add the same config to Claude Desktop and ask: “What caused the GPU performance issues in this trace?”

The MCP server exposes 7 tools. Claude will figure out the rest.


Ingero is free & open source software licensed under Apache 2.0 (user-space) + GPL-2.0/BSD-3 (eBPF kernel-space). One binary, zero dependencies, <2% overhead. Give us a star on GitHub!
