Extract Schemas Guide
The schema you provide drives extraction quality. A well-defined schema with descriptions helps the AI understand exactly what to extract and where to find it.
Schema Format
Extract uses standard JSON Schema (type: object) format:
```json
{
  "type": "object",
  "properties": {
    "invoiceNumber": {
      "type": "string",
      "description": "Invoice number, usually in the header area (e.g., INV-2026-001)"
    },
    "vendorName": {
      "type": "string",
      "description": "Name of the company that issued the invoice"
    },
    "totalAmount": {
      "type": "number",
      "description": "Total amount due including tax"
    },
    "lineItems": {
      "type": "array",
      "description": "Individual line items on the invoice"
    },
    "issueDate": {
      "type": "string",
      "description": "Date the invoice was issued, in YYYY-MM-DD format"
    }
  },
  "required": ["invoiceNumber", "totalAmount"]
}
```
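Because the schema is ordinary JSON Schema, a quick local check before you upload it can catch simple mistakes, such as a name in required that doesn't match any property. The sketch below is a hypothetical helper (it is not part of any Rynko SDK) and only checks the top-level rules visible in the example above, not full JSON Schema validity.

```python
def check_schema_structure(schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the basic shape looks right.

    Hypothetical helper: checks only the top-level rules shown in the example
    (object type, non-empty properties, required names that exist).
    """
    problems = []
    if schema.get("type") != "object":
        problems.append('top-level "type" must be "object"')
    properties = schema.get("properties", {})
    if not isinstance(properties, dict) or not properties:
        problems.append('"properties" must be a non-empty object')
        properties = {}
    # Every required name must refer to a defined property.
    for name in schema.get("required", []):
        if name not in properties:
            problems.append(f'required field "{name}" is not defined in "properties"')
    return problems


schema = {
    "type": "object",
    "properties": {
        "invoiceNumber": {"type": "string"},
        "totalAmount": {"type": "number"},
    },
    "required": ["invoiceNumber", "totalAmount", "totalAmmount"],  # deliberate typo
}
print(check_schema_structure(schema))
```

Running it on the deliberately misspelled schema reports the typo in required, which would otherwise leave a required field that can never be filled.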
Field Descriptions Matter
Writing a clear description for each field is the single most effective way to improve extraction quality. Descriptions tell the AI:
- What the field represents
- Where to find it in the document
- What format to use
Without descriptions:
```json
{ "amount": { "type": "number" } }
```
The AI might extract the wrong number, such as the subtotal, the tax, or a line-item amount.
With descriptions:
```json
{
  "amount": {
    "type": "number",
    "description": "The grand total amount due, found at the bottom of the invoice. Includes tax and shipping."
  }
}
```
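Since a missing description is the easiest gap to overlook in a large schema, it can help to list every field that lacks one. This is a minimal sketch of a hypothetical lint helper; it recurses into nested objects and, where present, into object schemas under the standard JSON Schema items keyword.

```python
def fields_missing_descriptions(schema: dict, prefix: str = "") -> list[str]:
    """List property paths that have no non-empty "description".

    Hypothetical helper: nested object fields are reported as "parent.child",
    fields inside array items as "parent[].child".
    """
    missing = []
    for name, spec in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        if not str(spec.get("description", "")).strip():
            missing.append(path)
        # Recurse into nested objects.
        if spec.get("type") == "object":
            missing.extend(fields_missing_descriptions(spec, prefix=f"{path}."))
        # Recurse into arrays whose items are objects.
        items = spec.get("items")
        if spec.get("type") == "array" and isinstance(items, dict) and items.get("type") == "object":
            missing.extend(fields_missing_descriptions(items, prefix=f"{path}[]."))
    return missing


schema = {
    "type": "object",
    "properties": {
        "amount": {"type": "number"},
        "vendor": {
            "type": "object",
            "description": "Company that issued the invoice",
            "properties": {
                "name": {"type": "string", "description": "Legal company name"},
                "city": {"type": "string"},
            },
        },
    },
}
print(fields_missing_descriptions(schema))  # ['amount', 'vendor.city']
```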
Supported Types
| Type | Use For | Example Values |
|---|---|---|
| string | Text, IDs, names | "INV-001", "Acme Corp" |
| number | Amounts, quantities | 1250.00, 42 |
| boolean | Yes/no flags | true, false |
| array | Lists of items | [{ "item": "Widget", "qty": 5 }] |
| object | Nested structures | { "street": "123 Main", "city": "NYC" } |
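If you post-process extraction results yourself, you may want to confirm each value matches the type declared in the schema. The sketch below assumes the result is available as a decoded JSON object keyed by field name (an assumption; the exact response shape isn't described here) and maps each type in the table to a Python check. The paid field is a hypothetical boolean added for illustration.

```python
import json

# One check per schema type from the table above.
# bool is excluded from "number" because Python treats True/False as ints.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "array": lambda v: isinstance(v, list),
    "object": lambda v: isinstance(v, dict),
}


def type_mismatches(schema: dict, result: dict) -> list[str]:
    """Return names of fields whose value doesn't match the declared type.

    Fields that are absent or null are skipped; checking for missing
    values is a separate concern.
    """
    bad = []
    for name, spec in schema.get("properties", {}).items():
        value = result.get(name)
        if value is None:
            continue
        check = TYPE_CHECKS.get(spec.get("type"))
        if check is not None and not check(value):
            bad.append(name)
    return bad


schema = {
    "type": "object",
    "properties": {
        "invoiceNumber": {"type": "string"},
        "totalAmount": {"type": "number"},
        "paid": {"type": "boolean"},  # hypothetical field for illustration
    },
}
result = json.loads('{"invoiceNumber": "INV-001", "totalAmount": "1,250.00", "paid": true}')
print(type_mismatches(schema, result))  # ['totalAmount']
```

Here the amount came back as formatted text rather than a number, which is exactly the case the "use the right types" advice is meant to prevent.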
Required Fields
Mark fields as required when they must be present. The AI prioritizes required fields, and confidence scoring flags any required field that is missing. List them in the schema's top-level required array, alongside properties:
```json
{
  "required": ["invoiceNumber", "totalAmount", "vendorName"]
}
```
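To see locally which required fields an extraction left empty, you can compare the result against the required list. This is a minimal sketch, not Rynko's confidence-scoring logic; it assumes the result is a decoded JSON object and, as a design choice, treats both null and an empty string as missing.

```python
def missing_required(schema: dict, result: dict) -> list[str]:
    """Return required fields that are absent, null, or an empty string.

    Values such as 0, false, or [] count as present.
    """
    return [
        name
        for name in schema.get("required", [])
        if result.get(name) in (None, "")
    ]


schema = {"required": ["invoiceNumber", "totalAmount", "vendorName"]}
result = {"invoiceNumber": "INV-001", "totalAmount": None, "vendorName": ""}
print(missing_required(schema, result))  # ['totalAmount', 'vendorName']
```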
Schema Discovery
If you don't know what fields your documents contain, use the discovery endpoint to let the AI analyze a sample and suggest a schema:
```bash
curl -X POST https://api.rynko.dev/api/extract/discover \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "files=@sample-document.pdf" \
  -F "instructions=This is a purchase order"
```
The result includes a discovered schema you can use as a starting point.
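If you'd rather call discovery from a script than from curl, the same request can be built with the Python standard library. This sketch mirrors only what the curl example shows (the URL, the Bearer header, and the files and instructions form fields); the function name is hypothetical, and it builds the request without sending it.

```python
import mimetypes
import urllib.request
import uuid

DISCOVER_URL = "https://api.rynko.dev/api/extract/discover"


def build_discover_request(
    api_key: str, filename: str, file_bytes: bytes, instructions: str
) -> urllib.request.Request:
    """Build (but don't send) a multipart/form-data POST like the curl example."""
    # A random boundary; collision with the file contents is vanishingly unlikely.
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"

    text_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="instructions"\r\n'
        "\r\n"
        f"{instructions}\r\n"
    ).encode("utf-8")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    closing = f"--{boundary}--\r\n".encode("utf-8")
    body = text_part + file_header + file_bytes + b"\r\n" + closing

    return urllib.request.Request(
        DISCOVER_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, read your sample file's bytes, build the request, and pass it to urllib.request.urlopen. This sketch makes no assumptions about the field names in the response, so inspect the returned JSON to locate the discovered schema.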
Schema in the Dashboard
In the Rynko dashboard, you can edit schemas using the visual schema editor:
- Go to Extract > Extracts and select your extract config
- Click Edit Schema in the Configuration tab
- Use the visual editor to add/remove/edit fields
- Or switch to the JSON tab for raw editing
- Click Done, then Publish to apply changes
Tips for Better Schemas
- Be specific in descriptions: "Invoice date in YYYY-MM-DD format" beats "date"
- Use the right types: number for amounts, not string
- Mark required fields: this helps the AI prioritize
- Start with discovery: let the AI suggest fields, then refine
- Keep schemas focused: extract only what you need, not everything in the document
- Add location hints: "Found in the header section" or "Located in the summary table"
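Starting with discovery and keeping schemas focused work well together: a discovered schema may include fields you don't need, and you can trim it before publishing. The helper below is a hypothetical sketch that keeps only the fields you name and filters required to match. The discovered schema shown is made-up illustration data, not real discovery output.

```python
import copy


def focus_schema(schema: dict, keep: list[str]) -> dict:
    """Return a copy of an object schema containing only the fields in `keep`.

    Names in `keep` that aren't in the schema raise KeyError, so a typo
    doesn't silently drop a field you meant to extract.
    """
    properties = schema.get("properties", {})
    unknown = [name for name in keep if name not in properties]
    if unknown:
        raise KeyError(f"not in schema: {unknown}")
    focused = copy.deepcopy(schema)  # leave the original untouched
    focused["properties"] = {name: focused["properties"][name] for name in keep}
    if "required" in focused:
        focused["required"] = [name for name in focused["required"] if name in keep]
    return focused


discovered = {  # hypothetical example data
    "type": "object",
    "properties": {
        "invoiceNumber": {"type": "string", "description": "Invoice number in the header"},
        "totalAmount": {"type": "number", "description": "Total amount due including tax"},
        "footerText": {"type": "string"},
        "pageCount": {"type": "number"},
    },
    "required": ["invoiceNumber", "footerText"],
}
focused = focus_schema(discovered, ["invoiceNumber", "totalAmount"])
print(list(focused["properties"]))  # ['invoiceNumber', 'totalAmount']
print(focused["required"])          # ['invoiceNumber']
```

After trimming, refine the remaining descriptions and types before you publish.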