Extract Overview
Rynko Extract uses AI to pull structured data from unstructured documents — PDFs, images, spreadsheets, and more — and return it as clean JSON that matches your schema.
How It Works
- Define a schema — Describe the fields you want to extract (e.g., invoice number, line items, total amount)
- Upload documents — Send PDFs, images, Excel files, or other supported formats
- Get structured data — Receive JSON output matching your schema, with confidence scores per field
┌─────────────┐     ┌──────────────┐     ┌──────────────┐
│ Documents   │  →  │ Rynko        │  →  │ Structured   │
│ PDF, Excel  │     │ Extract      │     │ JSON Data    │
│ Images      │     │ (AI)         │     │              │
└─────────────┘     └──────────────┘     └──────────────┘
Key Features
Schema-Driven Extraction
Define exactly what fields you need. The AI uses your schema as a guide, ensuring output always matches your expected structure.
Multi-File Support
Upload multiple documents in a single job. Extract merges data across files with configurable conflict resolution:
- Flag conflicts — Mark fields with conflicting values
- Prefer first file — Use the first file's value
- Prefer highest confidence — Use the most confident extraction
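The three strategies can be pictured as a per-field merge over the values extracted from each file. The following is only a minimal sketch of that idea, not Rynko's implementation: the strategy names (`flag_conflicts`, `prefer_first`, `prefer_highest_confidence`), the `merge_field` function, and the `{"value": ..., "confidence": ...}` shape are all assumptions made for illustration.

```python
# Hypothetical per-file extraction for one field:
#   {"value": <any>, "confidence": "HIGH" | "MEDIUM" | "LOW"}
CONFIDENCE_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def merge_field(candidates, strategy):
    """Merge one field's extractions, given in upload order (illustrative only)."""
    if not candidates:
        raise ValueError("at least one candidate is required")

    values = [c["value"] for c in candidates]
    # No conflict: every file agrees on the value.
    if all(v == values[0] for v in values):
        return {**candidates[0], "conflict": False}

    if strategy == "flag_conflicts":
        # Keep every competing value and leave the decision to a reviewer.
        return {"value": None, "confidence": None,
                "conflict": True, "candidates": values}
    if strategy == "prefer_first":
        # The first uploaded file wins.
        return {**candidates[0], "conflict": True}
    if strategy == "prefer_highest_confidence":
        # max() returns the earliest item on ties, so ties fall back to file order.
        best = max(candidates, key=lambda c: CONFIDENCE_RANK[c["confidence"]])
        return {**best, "conflict": True}
    raise ValueError(f"unknown strategy: {strategy}")


totals = [
    {"value": 100.0, "confidence": "MEDIUM"},  # from the first file
    {"value": 120.0, "confidence": "HIGH"},    # from the second file
]
print(merge_field(totals, "prefer_first")["value"])
print(merge_field(totals, "prefer_highest_confidence")["value"])
```

In this sketch, flagging keeps every candidate value so nothing is silently discarded, while the two "prefer" strategies still record that a conflict occurred.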
Confidence Scoring
Every extracted field includes a confidence score (HIGH / MEDIUM / LOW), helping you decide whether to trust the result or flag it for human review.
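One way to act on these scores is to route any field below a chosen threshold to a reviewer. The sketch below assumes a hypothetical result shape in which each field maps to a `value` and a `confidence` label; the actual response format is described in the API reference and may differ.

```python
CONFIDENCE_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def fields_needing_review(extraction, minimum="HIGH"):
    """Return the names of fields whose confidence is below `minimum`.

    `extraction` is an assumed shape: {field_name: {"value": ..., "confidence": ...}}.
    """
    threshold = CONFIDENCE_RANK[minimum]
    return sorted(
        name
        for name, field in extraction.items()
        if CONFIDENCE_RANK[field["confidence"]] < threshold
    )


result = {
    "invoiceNumber": {"value": "INV-1042", "confidence": "HIGH"},
    "total": {"value": 1280.5, "confidence": "MEDIUM"},
    "dueDate": {"value": "2025-01-31", "confidence": "LOW"},
}
print(fields_needing_review(result))                    # strict: anything below HIGH
print(fields_needing_review(result, minimum="MEDIUM"))  # lenient: only LOW fields
```

A stricter threshold sends more fields to review; which threshold is appropriate depends on how costly a wrong value is in your workflow.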
Provider-Agnostic
Extract supports multiple AI providers. The default provider is optimized for speed and cost, but you can choose:
- Google Gemini (default) — Fast, cost-effective
- Anthropic Claude — Best for complex reasoning
- OpenAI GPT-4o — Strong general performance
- OpenRouter — Access to multiple models
Standalone vs Gate-Linked
Standalone Extract
Create an Extract config, upload files, get structured data. Use this when you just need to extract data without validation.
# Create a job with a custom schema
curl -X POST https://api.rynko.dev/api/extract/jobs \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "files=@invoice.pdf" \
-F 'schema={"type":"object","properties":{"invoiceNumber":{"type":"string"},"total":{"type":"number"}}}'
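Hand-writing the `schema` form value becomes error-prone as the field list grows. As a rough sketch, the same JSON string can be generated from a plain mapping of field names to types; the helper names `object_schema` and `to_form_value` are hypothetical and not part of any Rynko SDK.

```python
import json


def object_schema(fields):
    """Build a flat object schema from {field_name: json_type} (illustrative helper)."""
    return {
        "type": "object",
        "properties": {name: {"type": json_type} for name, json_type in fields.items()},
    }


def to_form_value(schema):
    """Serialize the schema compactly so it can be passed as the `schema` form field."""
    return json.dumps(schema, separators=(",", ":"))


schema = object_schema({"invoiceNumber": "string", "total": "number"})
print(to_form_value(schema))
```

The printed string is the same value passed to `-F 'schema=...'` in the curl command above, so it can be substituted there directly.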
Gate-Linked Extract (Stage 0)
Enable Extract on a Flow gate to accept file uploads. The gate's schema becomes the extraction schema — no duplication needed.
File Upload → Stage 0 (Extract) → Stage 1 (Validate) → Stage 2+ (Render, Approve, Deliver)
Structured inputs (JSON/YAML/XML) skip Stage 0 automatically, costing 0 extract credits.
Supported File Types
| Format | Extensions | Notes |
|---|---|---|
| PDF | .pdf | Text and scanned documents |
| Images | .png, .jpg, .jpeg, .webp | OCR-capable |
| Excel | .xlsx | Multi-sheet support |
| CSV | .csv | Tabular data |
| JSON | .json | Structured input (skip extraction) |
| XML | .xml | Structured input (skip extraction) |
| Text | .txt | Plain text |
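Before submitting files, a client can use the table to predict whether an upload will go through extraction or be treated as structured input. This is a client-side sketch only: the function name and the three category labels are assumptions, and the service itself decides how a file is handled. YAML is mentioned above as a structured input, but no extension is listed for it in the table, so it is left out here.

```python
from pathlib import Path

# Extensions taken from the table above.
STRUCTURED_EXTENSIONS = {".json", ".xml"}
EXTRACTABLE_EXTENSIONS = {
    ".pdf", ".png", ".jpg", ".jpeg", ".webp", ".xlsx", ".csv", ".txt",
}


def classify_upload(filename):
    """Return 'structured', 'extract', or 'unsupported' based on the file extension."""
    extension = Path(filename).suffix.lower()
    if extension in STRUCTURED_EXTENSIONS:
        return "structured"   # expected to skip extraction
    if extension in EXTRACTABLE_EXTENSIONS:
        return "extract"      # expected to go through extraction
    return "unsupported"


for name in ["Invoice.PDF", "payload.json", "notes.docx"]:
    print(name, classify_upload(name))
```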
Pricing
During the founders preview, every team gets 100 free extraction credits. Each job consumes 1 credit regardless of file count.
After the founders preview ends, extract credits will be available as monthly packs or one-time purchases.
Next Steps
- Extract Quickstart — Get started in 5 minutes
- Extract API Reference — Full endpoint documentation
- Gate Stage 0 Guide — Use Extract with Flow gates