Debugging a Failing Stripe Webhook with Claude and Rockxy
The setup
I was wiring a Stripe checkout flow into a side project — nothing fancy, test mode only, a local Node server calling /v1/charges with a card token from the client. Every charge came back as HTTP 402 with code: "card_declined". The card numbers looked fine at a glance. The Stripe Dashboard showed the attempts. The webhook handler fired. Everything was wired up. Nothing worked.
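For context, the server-side call looked roughly like this. This is my reconstruction rather than the project's actual code, and buildChargeRequest is a helper name I made up; the real code just inlined the fetch:

```javascript
// Reconstruction of the failing call (not the actual project code).
// Stripe's /v1/charges expects a form-encoded body and a Bearer key.
function buildChargeRequest(source, secretKey) {
  return {
    url: "https://api.stripe.com/v1/charges",
    method: "POST",
    headers: {
      Authorization: `Bearer ${secretKey}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: new URLSearchParams({
      amount: "2000", // amount in cents
      currency: "usd",
      source, // the card number/token passed up from the client
      description: "Test charge",
    }).toString(),
  };
}

const req = buildChargeRequest("4000000000000002", "sk_test_xxx");
console.log(req.body);
// → amount=2000&currency=usd&source=4000000000000002&description=Test+charge
```

Nothing in that shape is wrong, which is exactly why staring at it got me nowhere.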
The classic loop started. Copy the curl output. Paste into a terminal with --verbose. Squint at headers. Re-read the Stripe docs page on test cards I had already read three times. Copy the request body into a notes file so I could compare it to the docs. Ten minutes in, I still did not know which field was wrong.
The frustrating part about this kind of bug is that all the evidence is right in front of you. The request headers, the response body, the exact error code — nothing is hidden. The bug is obvious. You just can't see it yet, because "obvious" takes pattern-matching against a thousand other Stripe integrations you've looked at, and at 2 AM you don't have that pattern-match loaded.
The traditional debugging approach
Here is the usual workflow for this kind of bug.
- Run the failing request again with curl -v to see raw headers.
- tail -f the local server logs while triggering the request.
- Open the Stripe Dashboard in a browser to read the decline reason.
- Open the Stripe docs in another tab to cross-reference fields.
- Copy the request body into a scratch buffer. Edit one field. Paste it back. Send again.
- Repeat five times until something succeeds.
The problem is not that any single step is hard. The problem is the sum. You are the human integration layer between five tabs, and you are slow. Thirty minutes later, you have found the bug, forgotten what you changed, and introduced two new ones.
The MCP-native approach
I had Rockxy capturing every request from the Node server. I had Claude Desktop already connected to Rockxy's MCP server (see the 3-minute setup guide). So I typed one question into Claude:
Me: Why did my last request to api.stripe.com/v1/charges fail?
And then I watched Claude work.
The reasoning trace
Claude's first move was a list_flows call, filtered by host:
list_flows({
host: "api.stripe.com",
limit: 5
})
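Conceptually, that call is a host filter plus a limit over the capture buffer. A sketch of what it boils down to; the field names here are my guess at the summary shape, not Rockxy's documented schema:

```javascript
// Hypothetical flow summaries -- field names are my guess at the shape,
// not Rockxy's documented schema. The first id matches the trace below.
const flows = [
  { id: "flw_01HXY7MZ", host: "api.stripe.com", method: "POST", path: "/v1/charges", status: 402 },
  { id: "flw_01HXY6KL", host: "localhost:3000", method: "GET", path: "/checkout", status: 200 },
];

// Roughly what a host-filtered, limited list_flows boils down to.
const matches = flows
  .filter((f) => f.host === "api.stripe.com")
  .slice(0, 5);

console.log(matches.map((f) => f.id)); // → [ 'flw_01HXY7MZ' ]
```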
The response came back as an array of flow summaries — five recent POSTs to /v1/charges, all status 402, all around 1.2 KB of response body. Claude picked the most recent and called get_flow_detail:
get_flow_detail({
flow_id: "flw_01HXY7MZ"
})
The response included the full request (method, path, headers, form-encoded body) and full response (status, headers, JSON body). The interesting part was in the response body:
{
"error": {
"type": "card_error",
"code": "card_declined",
"decline_code": "generic_decline",
"message": "Your card was declined.",
"param": "number",
"charge": "ch_3Oa1XY..."
}
}
And the request body, URL-decoded for readability:
amount=2000
currency=usd
source=4000000000000002
description=Test charge
Claude put it together in one paragraph:
Claude: Your request is using the Stripe test card number 4000 0000 0000 0002, which is the documented generic-decline test card. Stripe is doing exactly what it says it will — returning a card_declined error with decline_code: generic_decline. If you want a test card that succeeds, use 4242 4242 4242 4242. If you want to test a different decline path, Stripe publishes specific numbers for insufficient_funds, lost_card, and so on.
That was the bug. I had copied the first test card I saw on the Stripe docs page — which happens to be the example of a card that is designed to fail — and not read the caption next to it. The reason the error surfaced as code: "card_declined" and not something more specific is that I was passing a raw card number, not a tok_chargeDeclined-style test token.
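For reference, the test numbers that come up most often, as I remember them from Stripe's testing docs (double-check the docs before relying on these):

```javascript
// Common Stripe test card numbers and what they trigger, as documented
// on Stripe's "Testing" page (verify against the docs before use).
const TEST_CARDS = {
  "4242424242424242": "succeeds",
  "4000000000000002": "declined with decline_code: generic_decline",
  "4000000000009995": "declined with decline_code: insufficient_funds",
};

console.log(TEST_CARDS["4000000000000002"]);
// → declined with decline_code: generic_decline
```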
The fix and the replay
I could have stopped here, edited my server code to use 4242 4242 4242 4242, and re-run. But Claude had a faster path: replay the exact same request with just the card number changed.
Me: Replay that request with source swapped to 4242424242424242.
Claude's tool call:
replay_request({
flow_id: "flw_01HXY7MZ",
overrides: {
body: {
source: "4242424242424242"
}
}
})
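Conceptually, the override is a shallow merge over the captured request: every byte stays identical except the fields you name. A sketch of the idea with my own helper, not Rockxy's actual implementation:

```javascript
// Sketch of replay-with-overrides as a shallow merge (my own helper,
// not Rockxy's actual implementation).
function applyOverrides(flow, overrides = {}) {
  return {
    ...flow,
    body: { ...flow.body, ...(overrides.body ?? {}) },
  };
}

const captured = {
  method: "POST",
  path: "/v1/charges",
  body: {
    amount: "2000",
    currency: "usd",
    source: "4000000000000002",
    description: "Test charge",
  },
};

const replayed = applyOverrides(captured, {
  body: { source: "4242424242424242" },
});

console.log(replayed.body.source); // → 4242424242424242
console.log(replayed.body.amount); // → 2000 (everything else untouched)
```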
Rockxy executed the replay against the real Stripe test endpoint. The response came back on a new flow:
HTTP/2 200
content-type: application/json
{
"id": "ch_3Oa1YZAb2cDe...",
"object": "charge",
"amount": 2000,
"currency": "usd",
"paid": true,
"status": "succeeded",
"source": {
"id": "card_1Oa1YZ...",
"last4": "4242",
"brand": "Visa"
}
}
Claude confirmed it:
Claude: Replay returned HTTP 200 with a successful charge object. The only change was the card number. Your server code is correct — the test data was wrong.
The diff
For the sake of being thorough, I asked Claude to diff the two flows — the original 402 and the replayed 200:
Me: Diff those two flows.
diff_flows({
flow_id_a: "flw_01HXY7MZ",
flow_id_b: "flw_01HXY8NQ"
})
The diff output was compact and structural — one changed field in the request, a flipped status code, a completely different response shape:
request.body.source:
- "4000000000000002"
+ "4242424242424242"
response.status:
- 402
+ 200
response.body:
- { error: { code: "card_declined", decline_code: "generic_decline", ... } }
+ { id: "ch_...", status: "succeeded", paid: true, ... }
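The shape of that output is simple enough to sketch as a key-by-key comparison over flattened flow fields. This is a hypothetical helper for illustration; diff_flows presumably does something richer with nested bodies:

```javascript
// Shallow diff over flattened flow fields (illustrative only;
// Rockxy's diff_flows output is the real source of truth).
function diffFlows(a, b) {
  const changed = {};
  for (const key of new Set([...Object.keys(a), ...Object.keys(b)])) {
    if (a[key] !== b[key]) changed[key] = { from: a[key], to: b[key] };
  }
  return changed;
}

const original = { "request.body.source": "4000000000000002", "response.status": 402 };
const replay = { "request.body.source": "4242424242424242", "response.status": 200 };

const delta = diffFlows(original, replay);
console.log(delta);
// Both keys differ; anything not listed is byte-identical.
```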
Side by side, the change I made was a single sixteen-digit string. Every other byte of the request was identical. That is the kind of certainty that is hard to get from tailing logs.
The broader lesson
What would have been a thirty-minute tab-juggling session turned into a ninety-second conversation. But the interesting part is not the speed — it is the shift in workflow.
Traditional API debugging is a pull model. You pull data out of the system (logs, curl output, dashboard screenshots) into a format you can read. Every pull is manual, every format conversion is manual, and the assistant you ask for help only sees the fraction you pulled.
MCP-native debugging is a push model from the tooling side. Rockxy already has the bytes. Claude already has the reasoning. MCP is the thin layer that lets one feed the other without you acting as the glue. You describe what you want to know. The assistant pulls exactly the right detail. If it is wrong, you say so, and it pulls something else.
This matters most on the kind of bug that is obvious in hindsight. A wrong test card. A missing header. A stale auth token. These bugs are embarrassing to spend an hour on. They are the ones that make you feel stupid. An assistant with direct access to the flows catches them in one tool call, because they are trivial to spot when you have the raw bytes and a pattern-matcher with a broad prior.
There's a second thing worth calling out. Traditional debugging creates a paper trail that only lives in your head. You remember what you tried, what failed, what you changed — until the next interruption flushes the context. MCP debugging leaves a trail in the chat. Every tool call, every input, every output. When you come back to the bug after lunch, the conversation is there. When you write a post-mortem, you copy the chat. When a teammate hits the same issue, you can share the transcript. The workflow is self-documenting, not as a feature but as a side effect of how the tools communicate.
Where this breaks down
For the sake of honesty: this workflow is not magic.
If your proxy never captured the failing request in the first place — because your app is bypassing the system proxy, because cert pinning is blocking TLS interception, because the request is over QUIC — Rockxy has nothing to hand Claude. The bug is invisible to both of you. This is the same problem it has always been; MCP does not change it.
If the failing behavior is not visible at the HTTP layer — say, a server that correctly 200s but returns the wrong data because of a bug three services deep — the assistant can only tell you what the response looks like. The diagnosis requires code context, not just wire context, and code context is a different MCP server (a filesystem or repo server) with its own tool surface.
And if the model is wrong about what it sees, you need to catch it. A model that confidently misreads a decline code and suggests the wrong fix will waste your time just like a rushed colleague would. The tool calls are visible, the outputs are visible, but the reasoning between them is still produced by a probabilistic system. Treat it like a pair programmer, not an oracle.
That said — on the class of bug where the wire bytes tell the whole story, which is most of API debugging, the speedup is real.
What to try
If you already use Stripe, Shopify, Auth0, Supabase, or any webhook-heavy API, this workflow will pay for itself the first time you use it. The setup takes three minutes:
- Install Rockxy and enable its MCP server in Settings.
- Add the Rockxy entry to claude_desktop_config.json.
- Restart Claude Desktop.
- Generate some traffic.
- Ask a question.
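The config entry in step 2 looks something like the following. The mcpServers key is the standard Claude Desktop config shape; the command and args here are placeholders, since the exact invocation depends on how Rockxy ships its MCP server (check the setup guide linked at the end of this post):

```json
{
  "mcpServers": {
    "rockxy": {
      "command": "rockxy",
      "args": ["mcp"]
    }
  }
}
```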
Rockxy is free and open source under AGPL-3.0. The MCP server runs locally over stdio, so your captured flows never leave your machine — Claude only sees what it asks for, you see every tool call in the chat, and there is no cloud layer between your traffic and your debugging.
For the step-by-step setup, see How to Connect Claude Desktop to Rockxy's MCP Server.