Extract Schemas Guide

The schema you provide drives extraction quality. A well-defined schema with descriptions helps the AI understand exactly what to extract and where to find it.

Schema Format

Extract schemas use standard JSON Schema with a top-level object ("type": "object"):

{
  "type": "object",
  "properties": {
    "invoiceNumber": {
      "type": "string",
      "description": "Invoice number, usually in the header area (e.g., INV-2026-001)"
    },
    "vendorName": {
      "type": "string",
      "description": "Name of the company that issued the invoice"
    },
    "totalAmount": {
      "type": "number",
      "description": "Total amount due including tax"
    },
    "lineItems": {
      "type": "array",
      "description": "Individual line items on the invoice"
    },
    "issueDate": {
      "type": "string",
      "description": "Date the invoice was issued, in YYYY-MM-DD format"
    }
  },
  "required": ["invoiceNumber", "totalAmount"]
}
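
As a rough illustration, a schema like the one above can be built as a Python dict and sanity-checked before it is used. The helper name check_schema is hypothetical and not part of any Rynko SDK; it only checks the structural points described in this section (top-level object, defined properties, required names that actually exist).

```python
import json

# A trimmed version of the invoice schema above, as a plain Python dict.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoiceNumber": {
            "type": "string",
            "description": "Invoice number, usually in the header area (e.g., INV-2026-001)",
        },
        "totalAmount": {
            "type": "number",
            "description": "Total amount due including tax",
        },
    },
    "required": ["invoiceNumber", "totalAmount"],
}


def check_schema(schema: dict) -> list[str]:
    """Return a list of structural problems (hypothetical helper, not a Rynko API)."""
    problems = []
    if schema.get("type") != "object":
        problems.append('top-level "type" must be "object"')
    properties = schema.get("properties", {})
    if not properties:
        problems.append('"properties" is missing or empty')
    for name in schema.get("required", []):
        if name not in properties:
            problems.append(f'required field "{name}" is not defined in "properties"')
    return problems


print(check_schema(invoice_schema))  # an empty list means no problems were found
print(json.dumps(invoice_schema, indent=2))  # JSON text ready to paste or send
```

A check like this catches the most common slip, a name listed in "required" that was renamed or removed in "properties".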

Field Descriptions Matter

Writing a good description for each field is the single most effective way to improve extraction quality. A description tells the AI:

What the field represents
Where to find it in the document
What format to use

Without descriptions:

{ "amount": { "type": "number" } }

The AI might extract the wrong number, such as the subtotal, the tax, or a line item amount.

With descriptions:

{
  "amount": {
    "type": "number",
    "description": "The grand total amount due, found at the bottom of the invoice. Includes tax and shipping."
  }
}
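
One way to enforce this habit is a small lint step that lists fields without a description. This is a minimal sketch; fields_missing_descriptions is a hypothetical helper name, and it only looks at top-level properties.

```python
def fields_missing_descriptions(schema: dict) -> list[str]:
    """List property names that have no non-empty "description" (hypothetical helper)."""
    missing = []
    for name, spec in schema.get("properties", {}).items():
        description = spec.get("description", "")
        if not description.strip():
            missing.append(name)
    return missing


schema = {
    "type": "object",
    "properties": {
        # No description: the AI has to guess which number is meant.
        "amount": {"type": "number"},
        "vendorName": {
            "type": "string",
            "description": "Name of the company that issued the invoice",
        },
    },
}

print(fields_missing_descriptions(schema))  # ['amount']
```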

Supported Types

Type	Use For	Example Values
`string`	Text, IDs, names	`"INV-001"`, `"Acme Corp"`
`number`	Amounts, quantities	`1250.00`, `42`
`boolean`	Yes/no flags	`true`, `false`
`array`	Lists of items	`[{ "item": "Widget", "qty": 5 }]`
`object`	Nested structures	`{ "street": "123 Main", "city": "NYC" }`
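
In standard JSON Schema, the elements of an array are described with "items" and the fields of a nested object with "properties". This guide does not state how Extract treats nested definitions, so the sketch below assumes it follows standard JSON Schema. The field names come from the example values in the table; the descriptions and the field_types helper are illustrative only.

```python
nested_schema = {
    "type": "object",
    "properties": {
        "lineItems": {
            "type": "array",
            "description": "Individual line items on the invoice",
            # "items" describes each element of the array (standard JSON Schema).
            "items": {
                "type": "object",
                "properties": {
                    "item": {"type": "string", "description": "Name of the product or service"},
                    "qty": {"type": "number", "description": "Quantity ordered"},
                },
            },
        },
        "address": {
            "type": "object",
            "description": "Postal address printed on the document",
            "properties": {
                "street": {"type": "string", "description": "Street and house number"},
                "city": {"type": "string", "description": "City name"},
            },
        },
    },
}


def field_types(schema: dict, prefix: str = "") -> dict[str, str]:
    """Flatten a schema into {path: type}, descending into objects and array items."""
    result = {}
    for name, spec in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        result[path] = spec.get("type", "unknown")
        if spec.get("type") == "object":
            result.update(field_types(spec, prefix=f"{path}."))
        elif spec.get("type") == "array" and isinstance(spec.get("items"), dict):
            result.update(field_types(spec["items"], prefix=f"{path}[]."))
    return result


for path, type_name in field_types(nested_schema).items():
    print(f"{path}: {type_name}")
```

Printing the flattened paths is a quick way to review which type each nested field ended up with.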

Required Fields

Mark a field as required when it must be present in every document. The AI prioritizes required fields, and confidence scoring flags any required field that is missing.

{
  "required": ["invoiceNumber", "totalAmount", "vendorName"]
}
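
On the client side, the same "required" list can be used to double-check a result. The sketch below assumes the extracted data is available as a flat dict keyed by field name; the actual response shape is not described in this guide, and missing_required is a hypothetical helper.

```python
def missing_required(schema: dict, extracted: dict) -> list[str]:
    """Return required field names that are absent or null in the extracted data."""
    return [
        name
        for name in schema.get("required", [])
        if extracted.get(name) is None
    ]


schema = {"required": ["invoiceNumber", "totalAmount", "vendorName"]}

# Hypothetical extraction output; the real response shape may differ.
extracted = {"invoiceNumber": "INV-2026-001", "totalAmount": 1250.00}

print(missing_required(schema, extracted))  # ['vendorName']
```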

Schema Discovery

If you don't know what fields your documents contain, use the discovery endpoint to let the AI analyze a sample and suggest a schema:

curl -X POST https://api.rynko.dev/api/extract/discover \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "[email protected]" \
  -F "instructions=This is a purchase order"

The result includes a discovered schema you can use as a starting point.
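
Refining a discovered schema usually means tightening descriptions and choosing required fields. The sketch below assumes the discovered schema is a standard JSON Schema object; the sample field names (poNumber, orderTotal, shipDate) and the refine_schema helper are made up for illustration and do not reflect the actual discovery response.

```python
import copy


def refine_schema(discovered: dict, descriptions: dict[str, str], required: list[str]) -> dict:
    """Return a copy of a discovered schema with improved descriptions and required fields."""
    refined = copy.deepcopy(discovered)  # leave the discovered schema untouched
    properties = refined.setdefault("properties", {})
    for name, text in descriptions.items():
        if name in properties:
            properties[name]["description"] = text
    # Only mark fields as required if the schema actually defines them.
    refined["required"] = [name for name in required if name in properties]
    return refined


# Hypothetical discovery output for a purchase order.
discovered = {
    "type": "object",
    "properties": {
        "poNumber": {"type": "string"},
        "orderTotal": {"type": "number", "description": "total"},
    },
}

refined = refine_schema(
    discovered,
    descriptions={"orderTotal": "Grand total of the purchase order, including tax"},
    required=["poNumber", "shipDate"],  # shipDate is not in the schema, so it is dropped
)
print(refined["required"])  # ['poNumber']
```

Working on a copy keeps the original suggestion available for comparison while the schema is being refined.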

Schema in the Dashboard

In the Rynko dashboard, you can edit schemas using the visual schema editor:

Go to Extract > Extracts and select your extract config
Click Edit Schema in the Configuration tab
Use the visual editor to add/remove/edit fields
Or switch to the JSON tab for raw editing
Click Done, then Publish to apply changes

Tips for Better Schemas

Be specific in descriptions — "Invoice date in YYYY-MM-DD format" beats "date"
Use the right types — number for amounts, not string
Mark required fields — helps the AI prioritize
Start with discovery — let the AI suggest fields, then refine
Keep schemas focused — extract only what you need, not everything in the document
Add location hints — "Found in the header section" or "Located in the summary table"
