Extract Schemas Guide

The schema you provide drives extraction quality. A well-defined schema with descriptions helps the AI understand exactly what to extract and where to find it.

Schema Format

Extract uses standard JSON Schema. The top-level schema must be an object (type: object) whose properties are the fields to extract:

{
  "type": "object",
  "properties": {
    "invoiceNumber": {
      "type": "string",
      "description": "Invoice number, usually in the header area (e.g., INV-2026-001)"
    },
    "vendorName": {
      "type": "string",
      "description": "Name of the company that issued the invoice"
    },
    "totalAmount": {
      "type": "number",
      "description": "Total amount due including tax"
    },
    "lineItems": {
      "type": "array",
      "description": "Individual line items on the invoice"
    },
    "issueDate": {
      "type": "string",
      "description": "Date the invoice was issued, in YYYY-MM-DD format"
    }
  },
  "required": ["invoiceNumber", "totalAmount"]
}

Field Descriptions Matter

Writing a description for every field is the single most impactful thing you can do to improve extraction quality. Descriptions tell the AI:

  • What the field represents
  • Where to find it in the document
  • What format to use

Without descriptions:

{ "amount": { "type": "number" } }

With only a name and a type to go on, the AI might extract the wrong number, such as the subtotal, the tax, or a line item amount.

With descriptions:

{
  "amount": {
    "type": "number",
    "description": "The grand total amount due, found at the bottom of the invoice. Includes tax and shipping."
  }
}
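
One way to catch undescribed fields before publishing a schema is a small lint pass. The sketch below is an illustration in Python, not part of Rynko: the helper name fields_missing_description is hypothetical, and it assumes nested fields live under properties (for objects) or items (for arrays), as in standard JSON Schema.

```python
from typing import Any, Dict, List


def fields_missing_description(schema: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return dotted paths of fields that have no non-empty description."""
    missing: List[str] = []
    for name, spec in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        if not str(spec.get("description", "")).strip():
            missing.append(path)
        # Recurse into nested objects and into the element schema of arrays.
        if spec.get("type") == "object":
            missing.extend(fields_missing_description(spec, prefix=f"{path}."))
        elif spec.get("type") == "array" and isinstance(spec.get("items"), dict):
            missing.extend(fields_missing_description(spec["items"], prefix=f"{path}[]."))
    return missing


schema = {
    "type": "object",
    "properties": {
        "amount": {"type": "number"},
        "vendorName": {
            "type": "string",
            "description": "Name of the company that issued the invoice",
        },
    },
}

print(fields_missing_description(schema))  # ['amount']
```

An empty list means every field carries a description; any path it returns is a field the AI would have to guess at.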

Supported Types

Type    | Use For             | Example Values
string  | Text, IDs, names    | "INV-001", "Acme Corp"
number  | Amounts, quantities | 1250.00, 42
boolean | Yes/no flags        | true, false
array   | Lists of items      | [{ "item": "Widget", "qty": 5 }]
object  | Nested structures   | { "street": "123 Main", "city": "NYC" }
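
The array and object rows imply nested fields, which standard JSON Schema expresses with items and nested properties. The Python sketch below builds such a schema and prints it as JSON. It assumes Extract honors these standard keywords, which this guide does not state explicitly. The field names item, qty, street, and city come from the example values above; billingAddress is a hypothetical name.

```python
import json

schema = {
    "type": "object",
    "properties": {
        "lineItems": {
            "type": "array",
            "description": "Individual line items on the invoice",
            # Assumption: "items" describes each array element, as in standard JSON Schema.
            "items": {
                "type": "object",
                "properties": {
                    "item": {
                        "type": "string",
                        "description": "Name of the product or service on this line",
                    },
                    "qty": {
                        "type": "number",
                        "description": "Quantity ordered on this line",
                    },
                },
            },
        },
        # Hypothetical field name, used only to illustrate a nested object.
        "billingAddress": {
            "type": "object",
            "description": "Billing address of the customer",
            "properties": {
                "street": {"type": "string", "description": "Street and house number"},
                "city": {"type": "string", "description": "City name"},
            },
        },
    },
}

# Serialize to the JSON form used elsewhere in this guide.
print(json.dumps(schema, indent=2))
```

Describing the nested fields individually gives the AI the same guidance for each line item that top-level descriptions give for the document as a whole.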

Required Fields

Mark a field as required when it must be present in every document. The AI prioritizes required fields, and confidence scoring flags any that are missing.

{
  "required": ["invoiceNumber", "totalAmount", "vendorName"]
}
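
If you also want to check required fields on your own side, a short helper is enough. The Python sketch below is an assumption-laden illustration: this guide does not document the extraction response format, so it treats the extracted data as a flat dictionary keyed by field name, and the helper name missing_required is hypothetical.

```python
from typing import Any, Dict, List


def missing_required(schema: Dict[str, Any], extracted: Dict[str, Any]) -> List[str]:
    """List required fields that are absent, null, or an empty string."""
    return [
        name
        for name in schema.get("required", [])
        if extracted.get(name) in (None, "")
    ]


schema = {"required": ["invoiceNumber", "totalAmount", "vendorName"]}

# Assumed result shape: one key per schema field.
extracted = {
    "invoiceNumber": "INV-2026-001",
    "totalAmount": 1250.00,
    "vendorName": None,
}

print(missing_required(schema, extracted))  # ['vendorName']
```

A legitimate value of 0 or false is not reported as missing, because only None and the empty string count as absent.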

Schema Discovery

If you don't know what fields your documents contain, use the discovery endpoint to let the AI analyze a sample and suggest a schema:

curl -X POST https://api.rynko.dev/api/extract/discover \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "files=@sample-document.pdf" \
  -F "instructions=This is a purchase order"

The result includes a discovered schema you can use as a starting point.

Schema in the Dashboard

In the Rynko dashboard, you can edit schemas using the visual schema editor:

  1. Go to Extract > Extracts and select your extract config
  2. Click Edit Schema in the Configuration tab
  3. Use the visual editor to add, remove, or edit fields
  4. Alternatively, switch to the JSON tab to edit the raw schema
  5. Click Done, then Publish to apply changes

Tips for Better Schemas

  1. Be specific in descriptions — "Invoice date in YYYY-MM-DD format" beats "date"
  2. Use the right types — number for amounts, not string
  3. Mark required fields — helps the AI prioritize
  4. Start with discovery — let the AI suggest fields, then refine
  5. Keep schemas focused — extract only what you need, not everything in the document
  6. Add location hints — "Found in the header section" or "Located in the summary table"
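
Tips 4 and 5 combine naturally: start from a discovered schema, then trim it to the fields you actually need. The Python sketch below shows one way to do the trimming. The helper name keep_fields and the sample field names (poNumber, footerText) are hypothetical, and the discovered schema is a made-up stand-in, not real output from the discovery endpoint.

```python
from typing import Any, Dict, Iterable


def keep_fields(schema: Dict[str, Any], names: Iterable[str]) -> Dict[str, Any]:
    """Return a copy of the schema restricted to the named top-level fields."""
    wanted = set(names)
    trimmed = dict(schema)  # shallow copy; field definitions are shared, not cloned
    trimmed["properties"] = {
        name: spec
        for name, spec in schema.get("properties", {}).items()
        if name in wanted
    }
    # Drop required entries that point at fields that were removed.
    trimmed["required"] = [
        name for name in schema.get("required", []) if name in trimmed["properties"]
    ]
    return trimmed


# Made-up example of a schema suggested by discovery.
discovered = {
    "type": "object",
    "properties": {
        "poNumber": {"type": "string", "description": "Purchase order number in the header"},
        "totalAmount": {"type": "number", "description": "Total amount due including tax"},
        "footerText": {"type": "string", "description": "Legal text at the bottom of the page"},
    },
    "required": ["poNumber", "footerText"],
}

focused = keep_fields(discovered, ["poNumber", "totalAmount"])
print(list(focused["properties"]))  # ['poNumber', 'totalAmount']
print(focused["required"])          # ['poNumber']
```

After trimming, refine the remaining descriptions by hand, adding format and location hints where the discovered text is vague.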