Extract Quickstart
Extract structured data from a PDF in under 5 minutes.
Prerequisites
- A Rynko account with an API key (create one here)
- A document to extract from (PDF, image, or spreadsheet)
Step 1: Create an Extraction Job
Upload a file with a schema describing what to extract:
cURL:

curl -X POST https://api.rynko.dev/api/extract/jobs \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "files=@invoice.pdf" \
  -F 'schema={"type":"object","properties":{"invoiceNumber":{"type":"string","description":"The invoice number"},"vendorName":{"type":"string","description":"Name of the vendor"},"totalAmount":{"type":"number","description":"Total amount due"},"lineItems":{"type":"array","description":"List of line items"}},"required":["invoiceNumber","totalAmount"]}'

Node.js:

import Rynko from '@rynko/sdk';

const rynko = new Rynko({ apiKey: 'YOUR_API_KEY' });

const job = await rynko.extract.create({
  files: ['./invoice.pdf'],
  schema: {
    type: 'object',
    properties: {
      invoiceNumber: { type: 'string', description: 'The invoice number' },
      vendorName: { type: 'string', description: 'Name of the vendor' },
      totalAmount: { type: 'number', description: 'Total amount due' },
      lineItems: { type: 'array', description: 'List of line items' },
    },
    required: ['invoiceNumber', 'totalAmount'],
  },
});

console.log('Job created:', job.id);

Python:

from rynko import RynkoClient

client = RynkoClient(api_key="YOUR_API_KEY")

job = client.extract.create(
    files=["./invoice.pdf"],
    schema={
        "type": "object",
        "properties": {
            "invoiceNumber": {"type": "string", "description": "The invoice number"},
            "vendorName": {"type": "string", "description": "Name of the vendor"},
            "totalAmount": {"type": "number", "description": "Total amount due"},
            "lineItems": {"type": "array", "description": "List of line items"},
        },
        "required": ["invoiceNumber", "totalAmount"],
    },
)

print(f"Job created: {job.id}")
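
If you build the schema in code, it can help to keep it as a dictionary and serialize it only where a string is needed, for example for the schema form field in the cURL request. The following is a minimal sketch using only the Python standard library; the helper name schema_form_value is hypothetical and not part of the Rynko SDK.

```python
import json

# Same schema as in the Step 1 examples.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoiceNumber": {"type": "string", "description": "The invoice number"},
        "vendorName": {"type": "string", "description": "Name of the vendor"},
        "totalAmount": {"type": "number", "description": "Total amount due"},
        "lineItems": {"type": "array", "description": "List of line items"},
    },
    "required": ["invoiceNumber", "totalAmount"],
}


def schema_form_value(schema: dict) -> str:
    """Serialize a schema dict to compact JSON (hypothetical helper)."""
    # Compact separators drop the spaces json.dumps normally adds
    # after "," and ":", which keeps the form field short.
    return json.dumps(schema, separators=(",", ":"))


form_value = schema_form_value(invoice_schema)
print(form_value)
```

Keeping one dictionary as the source of truth avoids the hand-written JSON string drifting from the schema used in SDK calls.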
Step 2: Poll for Results
Extraction runs asynchronously. Poll the job status until it completes:
cURL:

curl https://api.rynko.dev/api/extract/jobs/JOB_ID \
  -H "Authorization: Bearer YOUR_API_KEY"

Node.js:

// Poll until complete
let result = await rynko.extract.get(job.id);
while (result.status === 'QUEUED' || result.status === 'PROCESSING') {
  await new Promise((r) => setTimeout(r, 2000));
  result = await rynko.extract.get(job.id);
}

console.log('Result:', JSON.stringify(result.result, null, 2));

Python:

import time

# Poll every 2 seconds until the job leaves the QUEUED/PROCESSING states
result = client.extract.get(job.id)
while result.status in ("QUEUED", "PROCESSING"):
    time.sleep(2)
    result = client.extract.get(job.id)

print(f"Result: {result.result}")
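
The loops above poll indefinitely, so a job that never finishes would hang the script. One possible refinement is a helper with a timeout and a gradually growing polling interval. This is a sketch, not part of the Rynko SDK: wait_for_job, its parameters, and the default values are all assumptions; only the QUEUED and PROCESSING status names come from the examples above.

```python
import time
from typing import Any, Callable

# Pending statuses taken from the polling loops above;
# any other status is treated as terminal.
PENDING_STATUSES = {"QUEUED", "PROCESSING"}


def wait_for_job(
    get_job: Callable[[], Any],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    max_interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call get_job() until the job leaves a pending status.

    Raises TimeoutError if the job is still pending once timeout_s
    seconds have passed. (Hypothetical helper; defaults are arbitrary.)
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = get_job()
        if job.status not in PENDING_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still {job.status} after {timeout_s} seconds")
        sleep(interval_s)
        # Back off: wait 1.5x longer next time, up to max_interval_s.
        interval_s = min(interval_s * 1.5, max_interval_s)
```

With the client from Step 1, this could be called as result = wait_for_job(lambda: client.extract.get(job.id)). Passing the fetch as a callable keeps the helper independent of the SDK and easy to test with a stub.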
Step 3: Use the Extracted Data
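The completed job contains a result object with two parts: data, the extracted values matching your schema, and fields, a confidence level and score for each field: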
{
  "id": "abc123",
  "status": "COMPLETED",
  "result": {
    "data": {
      "invoiceNumber": "INV-2026-001",
      "vendorName": "Acme Corp",
      "totalAmount": 1250.00,
      "lineItems": [
        { "description": "Widget A", "quantity": 10, "price": 50.00 },
        { "description": "Widget B", "quantity": 5, "price": 150.00 }
      ]
    },
    "fields": [
      { "field": "invoiceNumber", "confidence": "HIGH", "score": 0.98 },
      { "field": "vendorName", "confidence": "HIGH", "score": 0.95 },
      { "field": "totalAmount", "confidence": "HIGH", "score": 0.99 },
      { "field": "lineItems", "confidence": "MEDIUM", "score": 0.82 }
    ]
  }
}
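
As an illustration of working with this response, the sketch below flags low-confidence fields for manual review and cross-checks the line items against the total. It operates on the sample response as a plain dictionary. The function names and the 0.9 threshold are assumptions chosen for the example, and the arithmetic check assumes price is a per-unit price, which the sample values are consistent with.

```python
import math

# Sample response copied from the example above, as a plain dict.
response = {
    "id": "abc123",
    "status": "COMPLETED",
    "result": {
        "data": {
            "invoiceNumber": "INV-2026-001",
            "vendorName": "Acme Corp",
            "totalAmount": 1250.00,
            "lineItems": [
                {"description": "Widget A", "quantity": 10, "price": 50.00},
                {"description": "Widget B", "quantity": 5, "price": 150.00},
            ],
        },
        "fields": [
            {"field": "invoiceNumber", "confidence": "HIGH", "score": 0.98},
            {"field": "vendorName", "confidence": "HIGH", "score": 0.95},
            {"field": "totalAmount", "confidence": "HIGH", "score": 0.99},
            {"field": "lineItems", "confidence": "MEDIUM", "score": 0.82},
        ],
    },
}


def fields_needing_review(result: dict, min_score: float = 0.9) -> list:
    """Return the names of fields scored below min_score (arbitrary threshold)."""
    return [f["field"] for f in result["fields"] if f["score"] < min_score]


def line_items_total(data: dict) -> float:
    """Sum quantity * price over the line items (assumes per-unit prices)."""
    return sum(item["quantity"] * item["price"] for item in data["lineItems"])


result = response["result"]
data = result["data"]

print("Review:", fields_needing_review(result))  # Review: ['lineItems']
print("Total matches:", math.isclose(line_items_total(data), data["totalAmount"]))  # Total matches: True
```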
Tips for Better Extraction
- Add descriptions to schema fields: they help the AI understand what to look for
- Use specific types: number for amounts, date for dates, array for lists
- Mark required fields: this ensures the AI prioritizes these fields
- Use the discovery endpoint first if you're unsure what fields exist in your documents
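
Some of these tips can be checked mechanically before a schema is submitted. The sketch below is a small client-side check, not a Rynko feature: lint_schema and its warning messages are hypothetical, and it only inspects the top-level properties and required keys used in the schemas on this page.

```python
def lint_schema(schema: dict) -> list:
    """Return a list of warnings for an extraction schema (hypothetical helper)."""
    warnings = []
    properties = schema.get("properties", {})

    for name, spec in properties.items():
        # Tip: add descriptions to schema fields.
        if not spec.get("description"):
            warnings.append(f"{name}: missing description")
        # Tip: use specific types.
        if "type" not in spec:
            warnings.append(f"{name}: missing type")

    # Tip: mark required fields, and make sure each one is actually defined.
    required = schema.get("required", [])
    if not required:
        warnings.append("no required fields marked")
    for name in required:
        if name not in properties:
            warnings.append(f"{name}: listed as required but not defined")

    return warnings


draft = {
    "type": "object",
    "properties": {
        "invoiceNumber": {"type": "string", "description": "The invoice number"},
        "totalAmount": {"type": "number"},
    },
    "required": ["invoiceNumber", "dueDate"],
}

for warning in lint_schema(draft):
    print(warning)
# totalAmount: missing description
# dueDate: listed as required but not defined
```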
Next Steps
- Extract Overview — Understand the full feature set
- Extract API Reference — Complete endpoint documentation
- Extract Schemas Guide — Schema best practices