
Extract Overview

Rynko Extract uses AI to pull structured data from unstructured documents — PDFs, images, spreadsheets, and more — and return it as clean JSON that matches your schema.

How It Works

  1. Define a schema — Describe the fields you want to extract (e.g., invoice number, line items, total amount)
  2. Upload documents — Send PDFs, images, Excel files, or other supported formats
  3. Get structured data — Receive JSON output matching your schema, with confidence scores per field

┌─────────────┐    ┌──────────────┐    ┌──────────────┐
│  Documents  │ →  │    Rynko     │ →  │  Structured  │
│ PDF, Excel  │    │   Extract    │    │  JSON Data   │
│   Images    │    │     (AI)     │    │              │
└─────────────┘    └──────────────┘    └──────────────┘

Key Features

Schema-Driven Extraction

Define exactly what fields you need. The AI uses your schema as a guide, ensuring output always matches your expected structure.
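
Even with schema-driven output, a defensive check on the client side can be useful. The sketch below assumes the extracted data arrives as a plain JSON object and reuses the invoiceNumber/total schema from the curl example on this page. The helper names and sample values are hypothetical, and only the "string" and "number" types are handled.

```python
# Minimal conformance check for a flat object schema (illustrative only).
SCHEMA = {
    "type": "object",
    "properties": {
        "invoiceNumber": {"type": "string"},
        "total": {"type": "number"},
    },
}


def type_matches(value, declared):
    """Return True if value fits the declared JSON Schema type."""
    if declared == "string":
        return isinstance(value, str)
    if declared == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    raise ValueError(f"type not handled by this sketch: {declared}")


def conformance_errors(record, schema):
    """List mismatches between record and schema (empty list if none)."""
    errors = []
    properties = schema["properties"]
    for name, spec in properties.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not type_matches(record[name], spec["type"]):
            errors.append(f"wrong type for {name}: expected {spec['type']}")
    for name in record:
        if name not in properties:
            errors.append(f"unexpected field: {name}")
    return errors


# Hypothetical extraction results, not real API output.
print(conformance_errors({"invoiceNumber": "INV-001", "total": 1250.0}, SCHEMA))
print(conformance_errors({"invoiceNumber": 17, "extra": True}, SCHEMA))
```

The first call prints an empty list. The second reports a wrong type, a missing field, and an unexpected field.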

Multi-File Support

Upload multiple documents in a single job. Extract merges data across files with configurable conflict resolution:

  • Flag conflicts — Mark fields with conflicting values
  • Prefer first file — Use the first file's value
  • Prefer highest confidence — Use the most confident extraction
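
The three strategies above can be sketched as a small resolver. This illustrates the idea and is not Rynko's implementation: the strategy names, the Candidate shape, and the tie-breaking rule (earliest file wins when confidence is equal) are all assumptions.

```python
from dataclasses import dataclass

# Rank the three confidence labels so they can be compared.
CONFIDENCE_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


@dataclass
class Candidate:
    value: object
    confidence: str  # "HIGH", "MEDIUM", or "LOW"
    file_index: int  # position of the source file in upload order


def resolve(candidates, strategy):
    """Pick one value for a field from per-file candidates.

    Returns (value, conflict_flag). Strategy names are hypothetical.
    """
    if not candidates:
        raise ValueError("no candidates for this field")
    ordered = sorted(candidates, key=lambda c: c.file_index)
    # repr() lets unhashable values (lists, dicts) be compared as well.
    if len({repr(c.value) for c in ordered}) == 1:
        return ordered[0].value, False  # all files agree
    if strategy == "flag_conflicts":
        # Keep every competing value and mark the field as conflicting.
        return [c.value for c in ordered], True
    if strategy == "prefer_first_file":
        return ordered[0].value, False
    if strategy == "prefer_highest_confidence":
        # max() returns the first maximum, so ties go to the earliest file.
        best = max(ordered, key=lambda c: CONFIDENCE_RANK[c.confidence])
        return best.value, False
    raise ValueError(f"unknown strategy: {strategy}")


# Two files disagree on the invoice total (made-up values).
totals = [Candidate(1250.0, "MEDIUM", 0), Candidate(1520.0, "HIGH", 1)]
for name in ("flag_conflicts", "prefer_first_file", "prefer_highest_confidence"):
    print(name, resolve(totals, name))
```

Flagging suits workflows with a human reviewer. The two "prefer" strategies trade that review step for a deterministic answer.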

Confidence Scoring

Every extracted field includes a confidence score (HIGH / MEDIUM / LOW), helping you decide whether to trust the result or flag it for human review.
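
One way to act on these scores is to route anything below a chosen level to a reviewer. The sketch below assumes each field is returned as an object with "value" and "confidence" keys. That shape, the dueDate field, and the helper name are hypothetical; only the HIGH / MEDIUM / LOW labels come from this page.

```python
def fields_needing_review(fields, accept=("HIGH",)):
    """Return the names of fields whose confidence label is not in accept."""
    return [name for name, field in fields.items()
            if field["confidence"] not in accept]


# Made-up extraction result for illustration.
extracted = {
    "invoiceNumber": {"value": "INV-001", "confidence": "HIGH"},
    "total": {"value": 1250.0, "confidence": "LOW"},
    "dueDate": {"value": "2025-01-31", "confidence": "MEDIUM"},
}

# Strict policy: only HIGH passes without review.
print(fields_needing_review(extracted))
# Lenient policy: MEDIUM is also accepted.
print(fields_needing_review(extracted, accept=("HIGH", "MEDIUM")))
```

The strict policy flags total and dueDate, while the lenient one flags only total.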

Provider-Agnostic

Extract supports multiple AI providers. The default provider is optimized for speed and cost, but you can choose:

  • Google Gemini (default) — Fast, cost-effective
  • Anthropic Claude — Best for complex reasoning
  • OpenAI GPT-4o — Strong general performance
  • OpenRouter — Access to multiple models

Standalone vs Gate-Linked

Standalone Extract

Create an Extract config, upload files, get structured data. Use this when you just need to extract data without validation.

# Create a job with a custom schema
curl -X POST https://api.rynko.dev/api/extract/jobs \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "files=@invoice.pdf" \
  -F 'schema={"type":"object","properties":{"invoiceNumber":{"type":"string"},"total":{"type":"number"}}}'
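
Hand-writing the schema JSON inside a shell string is easy to get wrong as the field list grows. The Python sketch below generates the value of the schema form field from a simple name-to-type mapping. The schema_field helper is hypothetical and covers only flat objects with scalar types.

```python
import json


def schema_field(properties):
    """Serialize a flat {name: json_type} mapping as a compact JSON Schema string."""
    schema = {
        "type": "object",
        "properties": {name: {"type": json_type}
                       for name, json_type in properties.items()},
    }
    # Compact separators keep the string free of spaces.
    return json.dumps(schema, separators=(",", ":"))


field = schema_field({"invoiceNumber": "string", "total": "number"})
print(field)
```

The printed string is identical to the schema value passed with -F in the curl command above, so it can be substituted directly.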

Gate-Linked Extract (Stage 0)

Enable Extract on a Flow gate to accept file uploads. The gate's schema becomes the extraction schema — no duplication needed.

File Upload → Stage 0 (Extract) → Stage 1 (Validate) → Stage 2+ (Render, Approve, Deliver)

Structured inputs (JSON/YAML/XML) skip Stage 0 automatically, costing 0 extract credits.

Supported File Types

Format    Extensions                  Notes
PDF       .pdf                        Text and scanned documents
Images    .png, .jpg, .jpeg, .webp    OCR-capable
Excel     .xlsx                       Multi-sheet support
CSV       .csv                        Tabular data
JSON      .json                       Structured input (skip extraction)
XML       .xml                        Structured input (skip extraction)
Text      .txt                        Plain text
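
A client can pre-check files against the table above before uploading. The sketch below classifies a filename by extension. The classify function and its three labels are hypothetical, and it assumes every format not marked "skip extraction" goes through extraction.

```python
from pathlib import PurePath

# Extensions taken from the table above.
STRUCTURED = {".json", ".xml"}  # marked "skip extraction"
NEEDS_EXTRACTION = {".pdf", ".png", ".jpg", ".jpeg", ".webp",
                    ".xlsx", ".csv", ".txt"}
# YAML is mentioned as a structured input elsewhere on this page but has
# no row in the table, so this sketch does not guess at its extensions.


def classify(filename):
    """Return "structured", "extract", or "unsupported" for a filename."""
    ext = PurePath(filename).suffix.lower()
    if ext in STRUCTURED:
        return "structured"
    if ext in NEEDS_EXTRACTION:
        return "extract"
    return "unsupported"


for name in ("invoice.pdf", "Scan.JPG", "order.json", "notes.docx"):
    print(name, classify(name))
```

Matching is case-insensitive, and only the final suffix is considered.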

Pricing

During the founders preview, every team gets 100 free extraction credits. Each job consumes 1 credit regardless of file count.

Once the preview period ends, extract credits will be available as monthly packs or one-time purchases.
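
The credit rules on this page reduce to simple arithmetic: one credit per job that runs extraction, regardless of file count, and no credits for structured inputs that skip Stage 0. The sketch below tracks usage against the 100 free credits. The job record shape and function names are hypothetical.

```python
FREE_CREDITS = 100  # free allowance stated on this page


def credits_used(jobs):
    """One credit per job that ran extraction, regardless of file count."""
    return sum(1 for job in jobs if job["ran_extraction"])


def credits_remaining(jobs, allowance=FREE_CREDITS):
    """Credits left from the allowance, never below zero."""
    return max(allowance - credits_used(jobs), 0)


# Made-up job history for illustration.
jobs = [
    {"file_count": 1, "ran_extraction": True},    # single PDF
    {"file_count": 12, "ran_extraction": True},   # multi-file job, still 1 credit
    {"file_count": 1, "ran_extraction": False},   # JSON input, Stage 0 skipped
]
print(credits_used(jobs), credits_remaining(jobs))
```

Because billing is per job rather than per file, batching related documents into one job stretches the allowance further.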
