Extracting structured data from emails has historically been a brittle, manual process. Developers spent weeks writing regular expressions to find an invoice total or a shipping date. Those parsers broke the second a user forwarded the email or switched clients.
In 2026, the combination of LLMs and robust webhook infrastructure has largely solved this problem. By using the Vercel AI SDK with Ironpost webhooks, you can turn messy email threads into clean, typed JSON in a few lines of code.
The problem with email is that "format" is just a suggestion. A forwarded invoice from Gmail looks radically different from one from Outlook. Users introduce typos, omit fields, and bury data in nested tables.
Traditional parsing fails because it expects predictability. LLMs cope well with the unpredictable. They read the email, ignore the junk, and pull out the fields your schema asks for.
To build a scalable parsing pipeline, you need Ironpost for ingestion and the Vercel AI SDK for comprehension.
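The flow: Ironpost receives the inbound email and delivers it to your webhook, then your handler passes the email text to generateObject with a strict Zod schema.
Install the dependencies: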
npm install ai @ai-sdk/openai zod
Building the extraction service:
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// Define the Schema
const invoiceSchema = z.object({
  vendor: z.string().describe("Company name"),
  invoiceId: z.string().nullable(),
  totalAmount: z.number().describe("Total cost including tax"),
  currency: z.string().default("USD"),
  dueDate: z.string().nullable().describe("ISO date string"),
  lineItems: z.array(z.object({
    description: z.string(),
    price: z.number()
  })).optional()
});

/**
 * Processes an inbound email from Ironpost and extracts data.
 */
export async function extractInvoiceFromEmail(ironpostPayload: any) {
  const emailText = ironpostPayload.text;

  try {
    const { object } = await generateObject({
      model: openai('gpt-4o-mini'),
      schema: invoiceSchema,
      prompt: `
        Analyze the inbound email and extract invoice details.
        Rules:
        - Return null if invoice ID is missing.
        - Normalize dates to ISO-8601.
        - Ignore previous history in the thread.
        Email Content:
        ${emailText}
      `,
    });
    return object;
  } catch (error) {
    console.error("Extraction failed:", error);
    throw error;
  }
}
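The service above accepts the webhook body as `any`, so a malformed request would reach the model as the string "undefined". A minimal sketch of a guard that could sit in front of it is shown here. The `InboundEmail` type and `parseInboundEmail` helper are hypothetical names, and the only payload field assumed is `text`, because that is the one the code above reads. A real Ironpost payload may carry more fields that are not modeled here.

```typescript
// Assumed minimal shape: only `text` is taken from the extraction code above.
// Any other fields on a real webhook payload are intentionally not modeled.
interface InboundEmail {
  text: string;
}

// Narrow an untrusted webhook body to InboundEmail, or throw a descriptive error.
function parseInboundEmail(payload: unknown): InboundEmail {
  if (typeof payload !== "object" || payload === null) {
    throw new Error("Webhook payload must be a JSON object");
  }
  const text = (payload as Record<string, unknown>).text;
  if (typeof text !== "string" || text.trim() === "") {
    throw new Error("Webhook payload has no usable `text` field");
  }
  // Return only the validated field so downstream code cannot rely on unchecked data.
  return { text };
}
```

In a webhook handler this would run right after the JSON body is parsed, with the result passed to `extractInvoiceFromEmail`.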
Context vs Noise
Ironpost strips HTML at the edge, so the model sees plain text instead of markup. On HTML-heavy emails this can save thousands of tokens. Sending raw HTML costs more and distracts the model with CSS and layout tags.
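To get a feel for the difference, here is a rough comparison of the same invoice as HTML and as plain text. This is an illustration only: `estimateTokens` uses the common "about four characters per token" rule of thumb and is not a real tokenizer, and `naiveStripHtml` is a hypothetical stand-in for the stripping that happens upstream, not how Ironpost implements it.

```typescript
// Rough heuristic: ~4 characters per token for English text.
// This is an approximation for comparison, not an actual tokenizer.
function estimateTokens(input: string): number {
  return Math.ceil(input.length / 4);
}

// Naive tag stripper for illustration only; real-world HTML needs a proper parser.
function naiveStripHtml(html: string): string {
  return html
    .replace(/<(style|script)[\s\S]*?<\/\1>/gi, " ") // drop CSS and script blocks entirely
    .replace(/<[^>]+>/g, " ")                         // drop the remaining tags
    .replace(/\s+/g, " ")                             // collapse runs of whitespace
    .trim();
}

const sampleHtml = `<html><head><style>td { font-family: Arial; padding: 4px; }</style></head>
<body><table><tr><td><b>Invoice #</b></td><td>INV-1042</td></tr>
<tr><td><b>Total</b></td><td>$1,250.00</td></tr></table></body></html>`;

const sampleText = naiveStripHtml(sampleHtml);
// sampleText is "Invoice # INV-1042 Total $1,250.00"

console.log(estimateTokens(sampleHtml), "vs", estimateTokens(sampleText));
```

Even on this tiny sample the markup outweighs the data several times over, and production emails with inline styles and tracking markup tend to be far heavier.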
Hallucination Prevention
The describe() calls in the Zod schema are crucial. They aren't just docs; the SDK passes these descriptions to the LLM along with the schema. This gives the model explicit per-field instructions, which helps reduce hallucinations.
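One way to picture this is to look at the schema roughly as the model receives it. The object below is an approximation of the JSON Schema for part of the invoice schema, written by hand; the exact shape the SDK emits depends on the SDK and Zod versions. `fieldsMissingDescription` is a hypothetical helper that lists the fields the model has to interpret from the name alone.

```typescript
// Hand-written approximation of the JSON Schema for part of invoiceSchema.
// The exact output of the SDK varies by version; treat this as an illustration.
type JsonSchemaProperty = { type: string | string[]; description?: string };

const invoiceJsonSchema: { type: "object"; properties: Record<string, JsonSchemaProperty> } = {
  type: "object",
  properties: {
    vendor: { type: "string", description: "Company name" },
    invoiceId: { type: ["string", "null"] },
    totalAmount: { type: "number", description: "Total cost including tax" },
    currency: { type: "string" },
    dueDate: { type: ["string", "null"], description: "ISO date string" },
  },
};

// Hypothetical lint helper: return the names of fields that carry no description.
function fieldsMissingDescription(
  schema: { properties: Record<string, JsonSchemaProperty> }
): string[] {
  return Object.entries(schema.properties)
    .filter(([, prop]) => !prop.description)
    .map(([name]) => name);
}

const undocumented = fieldsMissingDescription(invoiceJsonSchema);
// undocumented is ["invoiceId", "currency"]
```

Fields that show up in this list are candidates for a describe() call, for example stating which identifier counts as the invoice ID or which currency code format is expected.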
Error Handling
Never rely 100% on extraction for high-stakes data. Save the extracted JSON with a pending_review flag if confidence is low or the amount exceeds a threshold.
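A minimal sketch of such a gate follows. The schema above has no confidence score, so this version treats missing key fields and implausible amounts as a proxy for low confidence; that proxy, the `REVIEW_THRESHOLD` value, and the helper names are all assumptions for illustration rather than part of either SDK.

```typescript
// Mirrors the extracted fields used for the decision; line items are omitted for brevity.
interface ExtractedInvoice {
  vendor: string;
  invoiceId: string | null;
  totalAmount: number;
  currency: string;
  dueDate: string | null;
}

// Hypothetical threshold: amounts above this always go to a human.
const REVIEW_THRESHOLD = 1000;

// Decide whether an extracted invoice should be held for manual review.
function needsReview(invoice: ExtractedInvoice, threshold: number = REVIEW_THRESHOLD): boolean {
  // Proxy for "low confidence": the model could not find a key field.
  const missingKeyField = invoice.invoiceId === null || invoice.dueDate === null;
  // A non-finite, zero, or negative total is treated as a likely extraction error.
  const suspiciousAmount = !Number.isFinite(invoice.totalAmount) || invoice.totalAmount <= 0;
  return missingKeyField || suspiciousAmount || invoice.totalAmount > threshold;
}

// Attach the pending_review flag before the record is saved.
function toStoredRecord(invoice: ExtractedInvoice): ExtractedInvoice & { pending_review: boolean } {
  return { ...invoice, pending_review: needsReview(invoice) };
}
```

Records with `pending_review` set to true can then be routed to a review queue instead of being posted straight to accounting.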
The combination of Ironpost and the Vercel AI SDK turns email into a first-class data source. You don't have to build custom integrations for every vendor. You provide an email address, and the machines handle the data entry.
Use stateless webhooks and structured LLM outputs. The approach is robust, cheap, and flexible. Start building your data pipeline today with the Ironpost free tier.