
Extract Quickstart

Extract structured data from a PDF in under 5 minutes.

Prerequisites

  • A Rynko account with an API key
  • A document to extract from (PDF, image, or spreadsheet)

Step 1: Create an Extraction Job

Upload a file together with a JSON schema describing the fields to extract:

curl -X POST https://api.rynko.dev/api/extract/jobs \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "files=@invoice.pdf" \
  -F 'schema={"type":"object","properties":{"invoiceNumber":{"type":"string","description":"The invoice number"},"vendorName":{"type":"string","description":"Name of the vendor"},"totalAmount":{"type":"number","description":"Total amount due"},"lineItems":{"type":"array","description":"List of line items"}},"required":["invoiceNumber","totalAmount"]}'
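The one-line schema in the curl command is easier to maintain when built as a data structure and serialized. Here is a minimal Python sketch that reproduces the same schema. The helper names `build_invoice_schema` and `schema_form_value` are hypothetical and not part of any Rynko SDK; only the field names and types come from the command above.

```python
import json


def build_invoice_schema() -> dict:
    """Return the invoice schema from Step 1 as a plain dict."""
    return {
        "type": "object",
        "properties": {
            "invoiceNumber": {"type": "string", "description": "The invoice number"},
            "vendorName": {"type": "string", "description": "Name of the vendor"},
            "totalAmount": {"type": "number", "description": "Total amount due"},
            "lineItems": {"type": "array", "description": "List of line items"},
        },
        "required": ["invoiceNumber", "totalAmount"],
    }


def schema_form_value(schema: dict) -> str:
    """Serialize the schema compactly, ready to send as the `schema` form field."""
    return json.dumps(schema, separators=(",", ":"))


if __name__ == "__main__":
    # Paste the printed value after `schema=` in the curl command.
    print(schema_form_value(build_invoice_schema()))
```

Keeping the schema as a dict makes it easy to add or remove fields without hand-editing a long quoted string.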

Step 2: Poll for Results

Extraction runs asynchronously. Poll the job status, replacing JOB_ID with the ID of the job you created, until the status is COMPLETED:

curl https://api.rynko.dev/api/extract/jobs/JOB_ID \
  -H "Authorization: Bearer YOUR_API_KEY"
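Polling by hand gets tedious, so a small loop can do it instead. The sketch below uses only the Python standard library and works under a few assumptions. The URL and `Authorization` header mirror the curl command above, and `COMPLETED` is the status shown in the documented response. `FAILED` is an assumed terminal status, and the 2-second interval and 5-minute timeout are arbitrary defaults. `fetch_job` and `poll_job` are hypothetical helper names.

```python
import json
import time
import urllib.request
from typing import Callable

API_BASE = "https://api.rynko.dev/api/extract/jobs"

# "COMPLETED" appears in the documented response. "FAILED" is an assumption,
# included so the loop does not keep polling a job that will never finish.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def fetch_job(job_id: str, api_key: str) -> dict:
    """GET the job resource, mirroring the curl command."""
    request = urllib.request.Request(
        f"{API_BASE}/{job_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def poll_job(
    fetch: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `fetch` until the job reaches a terminal status or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        # Give up if the next wait would run past the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"job still {job.get('status')!r} after {timeout}s")
        sleep(interval)
```

A call might look like `job = poll_job(lambda: fetch_job("JOB_ID", "YOUR_API_KEY"))`. Passing `fetch` as a callable keeps the loop independent of the HTTP client, so it can be exercised without network access.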

Step 3: Use the Extracted Data

The response includes the extracted data matching your schema, along with a confidence level and score for each field:

{
  "id": "abc123",
  "status": "COMPLETED",
  "result": {
    "data": {
      "invoiceNumber": "INV-2026-001",
      "vendorName": "Acme Corp",
      "totalAmount": 1250.00,
      "lineItems": [
        { "description": "Widget A", "quantity": 10, "price": 50.00 },
        { "description": "Widget B", "quantity": 5, "price": 150.00 }
      ]
    },
    "fields": [
      { "field": "invoiceNumber", "confidence": "HIGH", "score": 0.98 },
      { "field": "vendorName", "confidence": "HIGH", "score": 0.95 },
      { "field": "totalAmount", "confidence": "HIGH", "score": 0.99 },
      { "field": "lineItems", "confidence": "MEDIUM", "score": 0.82 }
    ]
  }
}
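One way to use this response is to route low-confidence fields to manual review and sanity-check the numbers before trusting them. The Python sketch below runs against the sample response above. The 0.9 threshold and the helper names `fields_needing_review` and `line_items_total` are illustrative assumptions, not part of the API.

```python
# Sample job copied from the response above.
SAMPLE_JOB = {
    "id": "abc123",
    "status": "COMPLETED",
    "result": {
        "data": {
            "invoiceNumber": "INV-2026-001",
            "vendorName": "Acme Corp",
            "totalAmount": 1250.00,
            "lineItems": [
                {"description": "Widget A", "quantity": 10, "price": 50.00},
                {"description": "Widget B", "quantity": 5, "price": 150.00},
            ],
        },
        "fields": [
            {"field": "invoiceNumber", "confidence": "HIGH", "score": 0.98},
            {"field": "vendorName", "confidence": "HIGH", "score": 0.95},
            {"field": "totalAmount", "confidence": "HIGH", "score": 0.99},
            {"field": "lineItems", "confidence": "MEDIUM", "score": 0.82},
        ],
    },
}


def fields_needing_review(job: dict, min_score: float = 0.9) -> list:
    """Return the names of fields whose score is below `min_score`."""
    return [f["field"] for f in job["result"]["fields"] if f["score"] < min_score]


def line_items_total(job: dict) -> float:
    """Sum quantity * price over the extracted line items."""
    items = job["result"]["data"].get("lineItems", [])
    return sum(item["quantity"] * item["price"] for item in items)


if __name__ == "__main__":
    print(fields_needing_review(SAMPLE_JOB))  # ['lineItems']
    print(line_items_total(SAMPLE_JOB))       # 1250.0
```

In the sample, the line items sum to the extracted `totalAmount`, which is a cheap cross-check that the two fields agree.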

Tips for Better Extraction

  • Add descriptions to schema fields: they help the AI understand what to look for
  • Use specific types: number for amounts, date for dates, array for lists
  • Mark required fields: this ensures the AI prioritizes them
  • Use the discovery endpoint first if you're unsure what fields exist in your documents
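The first three tips can be checked mechanically before a schema is submitted. The sketch below is a hypothetical local helper, `schema_warnings`, that is not part of the Rynko API. It flags fields with no description or type, and required names that are not defined as properties.

```python
def schema_warnings(schema: dict) -> list:
    """Check a schema against the tips above and return readable warnings."""
    warnings = []
    properties = schema.get("properties", {})
    for name, spec in properties.items():
        if not spec.get("description"):
            warnings.append(f"{name}: missing description")
        if "type" not in spec:
            warnings.append(f"{name}: missing type")
    required = schema.get("required", [])
    for name in required:
        if name not in properties:
            warnings.append(f"{name}: required but not defined in properties")
    if not required:
        warnings.append("schema: no required fields marked")
    return warnings


if __name__ == "__main__":
    draft = {
        "type": "object",
        "properties": {
            "invoiceNumber": {"type": "string", "description": "The invoice number"},
            "dueDate": {"type": "date"},
            "notes": {"description": "Free-form notes"},
        },
        "required": ["invoiceNumber", "total"],
    }
    for warning in schema_warnings(draft):
        print(warning)
```

An empty list means the schema follows the tips. It says nothing about whether the service will accept the schema.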

Next Steps