Flow MCP — AI Agent Integration Test Report

Date: March 7, 2026
Endpoint: https://api.rynko.dev/api/flow/mcp (Streamable HTTP)
LLM: GLM-4.5 Air (via OpenRouter)
Protocol: MCP (Model Context Protocol) over JSON-RPC 2.0
Result: 4/4 scenarios passed

What Was Tested

A real LLM agent was connected to Rynko Flow's MCP server and asked to complete four increasingly complex workflows — entirely autonomously. The agent discovered available tools at runtime, decided which to call and in what order, and interpreted Flow's responses to drive its next action.

No tool calls were hardcoded. The LLM made every decision.

Test Setup

A Flow Gate was created with:

Schema: vendor (string, 1-255 chars), amount (number, >= 0), currency (enum: USD/EUR/GBP/INR), po_number (optional string)
Business Rule: "Amount must be positive" — expression: amount > 0
Approval Mode: Auto (no human review needed)
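
To make the gate definition concrete, here is a minimal local sketch of the same checks in Python. It is not Flow's implementation: the function name `validate_invoice` and the exact error strings are assumptions chosen for illustration. Note that in this sketch a negative amount trips both the schema minimum and the business rule, whereas the Scenario 2 run below reports only the business-rule failure for amount; how Flow orders or merges the two checks is not stated in this report.

```python
from typing import Any

ALLOWED_CURRENCIES = ("USD", "EUR", "GBP", "INR")


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_invoice(payload: dict[str, Any]) -> list[str]:
    """Hypothetical mirror of the gate: returns error messages, empty if valid."""
    errors: list[str] = []

    # Schema: vendor (string, 1-255 chars)
    vendor = payload.get("vendor")
    if not isinstance(vendor, str):
        errors.append("vendor is required and must be a string")
    elif not 1 <= len(vendor) <= 255:
        errors.append("vendor must be between 1 and 255 characters")

    # Schema: amount (number, >= 0)
    amount = payload.get("amount")
    if not _is_number(amount):
        errors.append("amount is required and must be a number")
    elif amount < 0:
        errors.append("amount must be >= 0")

    # Schema: currency (enum)
    if payload.get("currency") not in ALLOWED_CURRENCIES:
        errors.append("currency must be one of: USD, EUR, GBP, INR")

    # Schema: po_number (optional string)
    po_number = payload.get("po_number")
    if po_number is not None and not isinstance(po_number, str):
        errors.append("po_number must be a string when present")

    # Business rule "Amount must be positive": amount > 0
    if _is_number(amount) and not amount > 0:
        errors.append('business rule "Amount must be positive" failed: amount > 0')

    return errors
```

The case worth noticing is `amount=0`: it satisfies the schema (`>= 0`) but fails the business rule (`> 0`), which is why the gate carries both.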

The LLM was given 9 MCP tools (discovered dynamically) and a simple system prompt. Everything else was up to the model.
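
Dynamic discovery in MCP happens through the JSON-RPC method `tools/list`, whose result carries a `tools` array of named tool descriptors. The sketch below builds such a request and picks out the per-gate `validate_*` tools from a result. The helper names and the sample result are illustrative assumptions; the sample lists only the five tool names that appear in this report, not all nine.

```python
from itertools import count
from typing import Any, Optional

_request_ids = count(1)


def jsonrpc_request(method: str, params: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request object with a fresh numeric id."""
    request: dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


def gate_validation_tools(tools_list_result: dict[str, Any]) -> list[str]:
    """Return the names of tools that follow the validate_* naming pattern."""
    return [
        tool["name"]
        for tool in tools_list_result.get("tools", [])
        if tool["name"].startswith("validate_")
    ]


# Illustrative result: only the tool names mentioned in this report.
SAMPLE_TOOLS_RESULT = {
    "tools": [
        {"name": "list_flow_gates"},
        {"name": "get_flow_gate"},
        {"name": "list_flow_runs"},
        {"name": "get_flow_run_status"},
        {"name": "validate_agent_test_invoice_gate"},
    ]
}

discover = jsonrpc_request("tools/list")
gate_tools = gate_validation_tools(SAMPLE_TOOLS_RESULT)
```

In the actual test the model, not a name filter, decided which tool to call; the filter only shows how the dynamically generated gate tools can be told apart from the fixed ones.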

Scenario 1: Discover Gates and Submit Valid Invoice

Task: Find the test gate, confirm it exists, then submit a valid invoice.

What the LLM did:

Step	Tool Called	Purpose
1	`list_flow_gates`	Discovered all active gates, found the test gate
2	`validate_agent_test_invoice_gate`	Submitted: `vendor="Globex Corp"`, `amount=12500`, `currency="USD"`, `po_number="PO-AGENT-001"`
3	`get_flow_run_status`	Checked the run status to confirm completion

Flow's response: Run created (frun_yysz2ctz), status: completed — schema validation passed, business rule passed, auto-approved.
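
On the wire, step 2 is an MCP `tools/call` request, which names the tool and passes its arguments as an object. The sketch below shows roughly what that request could look like. Passing the invoice fields as top-level arguments is an assumption; the tool's real input schema is not reproduced in this report, and the request id is arbitrary.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool_name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request for the MCP tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Values taken from step 2 of the table above; the argument layout is assumed.
request = build_tool_call(
    2,
    "validate_agent_test_invoice_gate",
    {
        "vendor": "Globex Corp",
        "amount": 12500,
        "currency": "USD",
        "po_number": "PO-AGENT-001",
    },
)

print(json.dumps(request, indent=2))
```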

LLM's final answer:

Gate confirmed: 'agent-test-invoice-gate' exists and is active.

Invoice submitted successfully!

Run ID: 00a9dab6-d004-46a2-aa27-affcc8d70a18

Status: completed

Verdict: The agent autonomously chose a 3-step workflow (discover, submit, verify) without being told the exact sequence.

Scenario 2: Submit Invalid Data and Explain Errors

Task: Submit intentionally bad data (vendor="", amount=-100, currency="JPY") and explain what went wrong.

What the LLM did:

Step	Tool Called	Payload
1	`validate_agent_test_invoice_gate`	`vendor=""`, `amount=-100`, `currency="JPY"`

Flow's response: validation_failed with 3 errors:

vendor must be at least 1 character
currency must be one of: USD, EUR, GBP, INR
Business rule amount > 0 failed: "Invoice amount must be greater than zero"

LLM's final answer:

The invoice submission failed validation. Here are the errors:

vendor — must be at least 1 character (was empty)

currency — "JPY" is not valid; must be USD, EUR, GBP, or INR

amount — must be greater than zero (was -100)

Verdict: The agent correctly submitted the exact bad values, received Flow's structured error response, and translated it into a clear human-readable explanation.
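
The translation step can be approximated without an LLM when errors arrive as structured records. The sketch below assumes each error is a dict with `field` and `message` keys and that the business-rule failure is attributed to `amount`; the report does not show Flow's actual response shape, so both are illustrative guesses.

```python
from typing import Any


def explain_errors(errors: list[dict[str, str]], payload: dict[str, Any]) -> str:
    """Render one line per error, echoing the value that was submitted."""
    lines = []
    for error in errors:
        field = error["field"]
        submitted = payload.get(field)
        lines.append(f"{field}: {error['message']} (was {submitted!r})")
    return "\n".join(lines)


# Payload from Scenario 2; the error records use an assumed shape.
payload = {"vendor": "", "amount": -100, "currency": "JPY"}
errors = [
    {"field": "vendor", "message": "must be at least 1 character"},
    {"field": "currency", "message": "must be one of: USD, EUR, GBP, INR"},
    {"field": "amount", "message": "Invoice amount must be greater than zero"},
]

explanation = explain_errors(errors, payload)
```

Keeping the field name and the submitted value next to each message is what makes the output usable both by a person and by an agent deciding what to change.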

Scenario 3: Multi-Step Reasoning — Inspect Gate and Analyze Runs

Task: Describe a gate's schema, business rules, and run history.

What the LLM did:

Step	Tool Called	Purpose
1	`get_flow_gate`	Retrieved full gate configuration
2	`list_flow_runs`	Fetched run history for the gate

LLM's final answer:

Schema Fields Required

vendor (string, 1-255 characters) — required

amount (number, minimum 0) — required

currency (string, one of: USD/EUR/GBP/INR) — required

po_number (string) — optional

Business Rules

"Amount must be positive": expression amount > 0

Runs Processed

2 runs: 1 completed, 1 validation_failed

Verdict: The agent synthesized information from two different tools into a structured summary, demonstrating multi-source reasoning.
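
The run-history part of that summary is a simple count by status. A minimal sketch, assuming each run record returned by `list_flow_runs` is a dict with a `status` key (the real record shape is not shown in this report, and the ids below are placeholders):

```python
from collections import Counter
from typing import Any


def summarize_runs(runs: list[dict[str, Any]]) -> str:
    """Count runs per status and render them as a one-line summary."""
    counts = Counter(run["status"] for run in runs)
    parts = ", ".join(f"{n} {status}" for status, n in sorted(counts.items()))
    return f"{len(runs)} runs: {parts}"


# Placeholder records matching the two runs created in Scenarios 1 and 2.
runs = [
    {"id": "run-a", "status": "completed"},
    {"id": "run-b", "status": "validation_failed"},
]

summary = summarize_runs(runs)
```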

Scenario 4: Self-Correction Loop

Task: Submit bad data, read Flow's error response, fix the problems, and resubmit until it passes.

This is the key scenario — it demonstrates the closed-loop agent workflow that Rynko Flow is designed for.

What the LLM did:

Step	Tool Called	Payload	Result
1	`validate_agent_test_invoice_gate`	`vendor=""`, `amount=-500`, `currency="JPY"`	`validation_failed` — 3 errors
2	`validate_agent_test_invoice_gate`	`vendor="Acme Inc"`, `amount=500`, `currency="USD"`	`completed`

Flow's error response on first attempt:

vendor must be at least 1 character
currency must be one of: USD, EUR, GBP, INR
Business rule failed: "Invoice amount must be greater than zero"

LLM's corrections:

Field	Before (invalid)	After (corrected)	Error that guided the fix
`vendor`	`""` (empty)	`"Acme Inc"`	"must be at least 1 character"
`amount`	`-500`	`500`	"must be greater than zero"
`currency`	`"JPY"`	`"USD"`	"must be one of: USD, EUR, GBP, INR"

LLM's final answer:

The invoice validation has succeeded! Here's what happened:

Initial submission failed:

vendor="" — "must be at least 1 characters"

amount=-500 — "must be greater than zero"

currency="JPY" — "not in allowed values"

Corrected submission passed:

vendor="Acme Inc", amount=500, currency="USD"

Run ID: completed successfully

Verdict: The agent read Flow's structured validation errors, reasoned about what each one meant, applied the correct fix for each field, and resubmitted — all without human intervention. This is the core value proposition: Flow gates act as guardrails that LLM agents can understand and respond to programmatically.
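
The control flow of this loop can be sketched in a few lines. Everything here is a stand-in: `stub_gate` imitates the `validate_*` tool with an assumed response shape (`status` plus a list of `field`/`message` errors), and `stub_fixer` replaces the LLM with one hard-coded fix per field. In the real test the model chose the corrections itself.

```python
from typing import Any, Callable

ALLOWED_CURRENCIES = ("USD", "EUR", "GBP", "INR")
Payload = dict[str, Any]


def stub_gate(payload: Payload) -> dict[str, Any]:
    """Stand-in for the gate: returns a status and per-field errors."""
    errors = []
    if not payload.get("vendor"):
        errors.append({"field": "vendor", "message": "must be at least 1 character"})
    if payload.get("currency") not in ALLOWED_CURRENCIES:
        errors.append({"field": "currency", "message": "must be one of: USD, EUR, GBP, INR"})
    amount = payload.get("amount")
    if not isinstance(amount, (int, float)) or amount <= 0:
        errors.append({"field": "amount", "message": "must be greater than zero"})
    return {"status": "validation_failed" if errors else "completed", "errors": errors}


def stub_fixer(payload: Payload, errors: list[dict[str, str]]) -> Payload:
    """Stand-in for the LLM: applies one fixed correction per failing field."""
    fixed = dict(payload)
    for error in errors:
        if error["field"] == "vendor":
            fixed["vendor"] = "Acme Inc"
        elif error["field"] == "currency":
            fixed["currency"] = "USD"
        elif error["field"] == "amount":
            amount = payload.get("amount")
            usable = isinstance(amount, (int, float)) and amount != 0
            fixed["amount"] = abs(amount) if usable else 1
    return fixed


def submit_until_valid(
    payload: Payload,
    gate: Callable[[Payload], dict[str, Any]],
    fixer: Callable[[Payload, list[dict[str, str]]], Payload],
    max_attempts: int = 3,
) -> list[tuple[Payload, str]]:
    """Submit, read errors, fix, and resubmit; returns (payload, status) per attempt."""
    history = []
    for _ in range(max_attempts):
        result = gate(payload)
        history.append((payload, result["status"]))
        if result["status"] == "completed":
            return history
        payload = fixer(payload, result["errors"])
    raise RuntimeError(f"payload still invalid after {max_attempts} attempts")


history = submit_until_valid(
    {"vendor": "", "amount": -500, "currency": "JPY"}, stub_gate, stub_fixer
)
```

The attempt cap matters in practice: a fixer that cannot satisfy the gate would otherwise resubmit forever.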

What This Demonstrates

MCP tool discovery works end-to-end. The LLM received 9 tools at runtime (including dynamically-generated validate_* tools per gate) and correctly chose which ones to use for each task.
Flow's validation errors are LLM-readable. Structured error responses with field names, constraint descriptions, and business rule messages gave the agent enough context to self-correct without any additional prompting.
The self-correction loop is real. An agent submitting invalid data to a Flow gate can read the errors, fix its payload, and retry — creating a closed feedback loop between the LLM and the validation pipeline.
No special integration code is needed. The LLM connected over standard MCP, discovered tools dynamically, and operated autonomously. Any MCP-compatible agent (Claude, Cursor, Windsurf, custom) should get the same experience.

Test Infrastructure

Transport: MCP Streamable HTTP with JSON-RPC batching
Session management: A fresh session per call, retried up to 6 times to cope with multi-instance load balancing
LLM provider: OpenRouter free tier (OpenAI-compatible API)
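
The session strategy can be sketched as a bounded retry around a callable that opens a fresh session and performs one call. The report does not say which failures triggered retries, so `SessionError` is a hypothetical exception standing in for "this instance does not recognise the session", and the flaky callable exists only to exercise the loop.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class SessionError(Exception):
    """Hypothetical error: the request reached an instance without the session."""


def call_with_retry(do_call: Callable[[], T], max_attempts: int = 6, delay_s: float = 0.0) -> T:
    """Run do_call (fresh session + one call), retrying on SessionError."""
    last_error: Optional[SessionError] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return do_call()
        except SessionError as exc:
            last_error = exc
            if attempt < max_attempts:
                time.sleep(delay_s)  # optional pause before the next attempt
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


def make_flaky_call(failures: int) -> Callable[[], str]:
    """Test double that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def do_call() -> str:
        state["calls"] += 1
        if state["calls"] <= failures:
            raise SessionError("unknown session")
        return f"ok after {state['calls']} attempts"

    return do_call


outcome = call_with_retry(make_flaky_call(2))
```

Opening a new session on every attempt trades a little latency for not having to pin the client to one server instance.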

Try It Yourself

Sign up free — 500 Flow runs/month included
Create a gate with a schema and business rules
Connect your AI agent via MCP
Submit a bad payload and watch the agent self-correct
