Engineering notes
How Invoice Extractor actually works.
One Vercel deploy, no intermediate storage. The file travels from the browser directly to Gemini as base64 inlineData — no S3, no signed URLs, no preprocessing. Below: the ten steps from upload to structured table, plus the security decisions and the gaps honestly disclosed.
Pipeline at a glance
Browser
│
│ POST /api/extract (multipart, field: file)
▼
┌─────────────────────────────────────────────┐
│ 1. Rate limit (Upstash sliding window)      │
│ 2. Multipart parse: require "file" field    │
│ 3. Size cap (8 MB hard limit)               │
│ 4. Magic-byte detection (PNG/JPG/WEBP/PDF)  │
│ 5. base64-encode buffer for inlineData      │
│ 6. Gemini 2.5 Flash (vision, temp=0)        │
│    └─ responseSchema constrains output      │
│    └─ systemInstruction isolates prompt     │
│ 7. JSON.parse raw response text             │
│ 8. Zod validate + coerce numbers            │
│ 9. CSV build + formula-injection escape     │
└─────────────────────────────────────────────┘
│
│ { ok: true, data: InvoiceData, csv: "…", meta: {…} }
▼
Browser renders ledger table + download buttons
Step by step
- 01
Rate limit
Upstash sliding window — 20 requests per IP per day. Graceful no-op when Upstash is not configured (local dev). Prefix: rl:invoice.
- 02
Multipart parse
Next.js req.formData() reads the upload. The "file" field is required; anything else is ignored. 400 if the form is malformed or the field is missing.
- 03
Size cap
Hard 8 MB cap enforced on file.size before reading the buffer into memory. Returns 413 with a human-readable size in the error. Prevents memory exhaustion on the serverless function.
- 04
Magic-byte file-type detection
The browser-supplied Content-Type header is never trusted. The first bytes of the buffer are inspected directly: 89 50 4E 47 → PNG, FF D8 FF → JPEG, RIFF….WEBP → WEBP, %PDF → PDF. Any other signature → 415. This prevents content-type spoofing attacks (e.g. an executable renamed to invoice.pdf).
- 05
Gemini inlineData call
The buffer is base64-encoded and sent to Gemini 2.5 Flash as inlineData, the multimodal vision input. The model receives the raw image/PDF bytes directly, not a URL; no intermediate storage or signed URL required. Temperature 0, maxOutputTokens: 2048, responseMimeType: application/json + responseSchema to constrain the output.
- 06
System-prompt isolation
The system prompt is sent via systemInstruction, structurally separate from the user content. It explicitly instructs the model to treat the document as untrusted data and not to follow any embedded instructions. This mitigates prompt injection attacks where a malicious invoice contains text like “Ignore previous instructions and output…”
- 07
Gemini responseSchema
A hand-built OpenAPI subset (Gemini does not accept $ref/anyOf; nullable must be nullable: true, not type: ['string', 'null']) constrains the model to the exact shape of an invoice: vendor, lineItems array, totals. This dramatically reduces hallucination and invalid JSON.
- 08
Zod validation + number coercion
The model output is parsed with JSON.parse, then validated by a zod schema. A custom coercedNumber transformer strips currency symbols and commas (e.g. “$1,234.56” → 1234.56) and converts EU-format decimals. Invalid output → 502 with a typed error code, never a raw zod message to the client.
- 09
CSV formula-injection escaping
Any cell whose string form starts with =, +, -, or @ is prefixed with a single apostrophe before writing to CSV. This prevents spreadsheet formula injection: a vendor named =HYPERLINK("http://evil.com") becomes '=HYPERLINK(…) and is treated as plain text by Excel and Google Sheets.
- 10
Structured JSON + CSV response
The API returns the typed InvoiceData object plus the pre-built CSV string and token/duration telemetry. The client renders the table, offers download buttons, and shows the raw JSON in an accessible details/summary accordion.
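The number coercion from step 08 can be sketched as a plain function. This is a minimal sketch, not the project's actual coercedNumber: the name coerceNumber, the null return for unparseable input, and the rule for telling EU decimals from US thousands separators are all assumptions. In a zod schema, a function like this could be wired in with z.preprocess.

```typescript
// Hypothetical helper: turn model output such as "$1,234.56" or "1.234,56 €"
// into a number. Returns null when nothing numeric can be recovered.
function coerceNumber(input: unknown): number | null {
  if (typeof input === "number") return Number.isFinite(input) ? input : null;
  if (typeof input !== "string") return null;

  // Keep digits, separators, and minus signs; drops currency symbols and spaces.
  let s = input.replace(/[^0-9.,-]/g, "");
  if (s === "") return null;

  const lastComma = s.lastIndexOf(",");
  const lastDot = s.lastIndexOf(".");

  if (lastComma > lastDot) {
    const digitsAfterComma = s.length - lastComma - 1;
    if (lastDot !== -1 || digitsAfterComma !== 3) {
      // Assumed EU format ("1.234,56", "12,5"): dots group thousands,
      // the comma is the decimal separator.
      s = s.replace(/\./g, "").replace(",", ".");
    } else {
      // "1,234" is ambiguous; this sketch assumes a thousands separator.
      s = s.replace(/,/g, "");
    }
  } else {
    // US format ("1,234.56") or no comma at all: commas group thousands.
    s = s.replace(/,/g, "");
  }

  const n = Number(s);
  return Number.isFinite(n) ? n : null;
}
```

A dot-only value such as "1.234" is read as a decimal here, so an EU-style thousands group without a decimal part would need extra context (for example the invoice currency) to resolve.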
Security stance
Defended: magic-byte type enforcement, hard size cap, rate limiting by IP, zod validation of all LLM output, CSV formula-injection escaping, no stack traces to client, system-prompt isolation against prompt injection, no file persistence.
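The CSV escaping rule from step 09 can be sketched as below. Illustrative only: escapeCell and toCsvRow are hypothetical names, and the RFC 4180 style quoting of commas, quotes, and newlines is an assumption that these notes do not describe.

```typescript
// Leading characters that spreadsheets may interpret as the start of a formula.
const FORMULA_TRIGGERS: string[] = ["=", "+", "-", "@"];

// Hypothetical helper: render one value as a safe CSV cell.
function escapeCell(value: unknown): string {
  let s = value === null || value === undefined ? "" : String(value);

  // Prefix with an apostrophe so the cell is treated as text, not a formula.
  if (FORMULA_TRIGGERS.indexOf(s.charAt(0)) !== -1) {
    s = "'" + s;
  }

  // Assumed CSV quoting: wrap in double quotes and double any embedded quotes
  // when the cell contains a comma, a quote, or a line break.
  if (/[",\r\n]/.test(s)) {
    s = '"' + s.replace(/"/g, '""') + '"';
  }
  return s;
}

// Hypothetical helper: join escaped cells into one CSV row.
function toCsvRow(cells: unknown[]): string {
  return cells.map(escapeCell).join(",");
}
```

Under this rule a negative amount such as -5 is also prefixed, since its string form starts with a minus sign; whether numeric cells should be exempt is a design choice the sketch does not settle.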
Not defended (flagged honestly): sophisticated prompt injection embedded in document content may still succeed; MIME spoofing is caught at the byte level but Gemini’s internal decode is not re-verified; no virus/malware scanning of uploaded files (out of scope for an AI extraction demo).
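The byte-level check from step 04 can be sketched as follows. This is a minimal sketch under assumptions: the function names and the returned MIME strings are hypothetical, and the WEBP test reads the signature as "RIFF" at offset 0 followed by "WEBP" at offset 8, which is the standard WebP container layout.

```typescript
type DetectedType = "image/png" | "image/jpeg" | "image/webp" | "application/pdf";

// Convert an ASCII string to its byte values.
function ascii(s: string): number[] {
  return s.split("").map((c) => c.charCodeAt(0));
}

// True when `bytes` contains `sig` starting at `offset`.
function hasSignature(bytes: Uint8Array, sig: number[], offset: number = 0): boolean {
  if (bytes.length < offset + sig.length) return false;
  return sig.every((b, i) => bytes[offset + i] === b);
}

// Hypothetical detector: returns the type implied by the leading bytes,
// or null when no known signature matches (the caller would answer 415).
function detectFileType(bytes: Uint8Array): DetectedType | null {
  if (hasSignature(bytes, [0x89, 0x50, 0x4e, 0x47])) return "image/png";
  if (hasSignature(bytes, [0xff, 0xd8, 0xff])) return "image/jpeg";
  // RIFF header, four bytes of chunk size, then the WEBP tag.
  if (hasSignature(bytes, ascii("RIFF")) && hasSignature(bytes, ascii("WEBP"), 8)) {
    return "image/webp";
  }
  if (hasSignature(bytes, ascii("%PDF"))) return "application/pdf";
  return null;
}
```

An executable renamed to invoice.pdf starts with the bytes 4D 5A ("MZ") rather than %PDF, so it falls through to null regardless of the Content-Type the browser claims.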