Published on March 22, 2026

Parse Incoming Emails with the Vercel AI SDK and Node.js

Extracting structured data from emails has historically been a brittle, manual process. Developers spent weeks writing regular expressions to find an invoice total or a shipping date, and those parsers broke the moment a user forwarded the email or switched clients.

In 2026, the combination of LLMs and robust webhook infrastructure has made that brittle approach obsolete. By using the Vercel AI SDK with Ironpost webhooks, you can turn messy email threads into clean, typed JSON in a few lines of code.

The End of Regex

The problem with email is that "format" is just a suggestion. A forwarded invoice from Gmail looks radically different from one sent through Outlook. Users introduce typos, omit fields, and bury data in nested tables.

Traditional parsing fails because it expects predictability. LLMs cope far better with unpredictable input: they read the email, ignore the junk, and pull out the fields you asked for.

Modern Architecture

To build a scalable parsing pipeline, you need Ironpost for ingestion and the Vercel AI SDK for comprehension.

The flow:

  1. The Event: An email hits an Ironpost address.
  2. The Transform: Ironpost's edge worker intercepts the message, strips the HTML markup, and converts it to a flat JSON object.
  3. The Webhook: The clean payload is pushed to your server.
  4. The Extraction: Your server passes the text into generateObject with a strict Zod schema.
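
The last two steps of the flow can be sketched as a small webhook handler. This is a minimal sketch, not Ironpost's documented contract: the payload shape (a JSON body with a `text` field, which is the only field the extraction code in this post reads) and the names `parseInboundPayload` and `handleWebhook` are assumptions for illustration. The extractor is passed in as a parameter so the sketch stays self-contained.

```typescript
// Hypothetical payload shape: only the `text` field is assumed here.
interface InboundEmail {
  text: string;
}

// Parse and validate the raw webhook body; returns null on anything unexpected.
function parseInboundPayload(rawBody: string): InboundEmail | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return null;
  }
  if (typeof parsed !== 'object' || parsed === null) return null;
  const text = (parsed as Record<string, unknown>).text;
  if (typeof text !== 'string' || text.trim() === '') return null;
  return { text };
}

// Web-standard handler (Request in, Response out).
// `extract` stands in for the extraction function.
async function handleWebhook(
  req: Request,
  extract: (email: InboundEmail) => Promise<unknown>,
): Promise<Response> {
  if (req.method !== 'POST') {
    return new Response('Method Not Allowed', { status: 405 });
  }
  const email = parseInboundPayload(await req.text());
  if (email === null) {
    return new Response('Invalid payload', { status: 400 });
  }
  try {
    const invoice = await extract(email);
    return new Response(JSON.stringify({ invoice }), {
      status: 200,
      headers: { 'Content-Type': 'application/json' },
    });
  } catch {
    // A 5xx status tells the sender that delivery did not succeed.
    return new Response('Extraction failed', { status: 500 });
  }
}
```

Rejecting malformed bodies before calling the model keeps junk requests from costing tokens.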

Code Implementation

Install the dependencies:

npm install ai @ai-sdk/openai zod

Building the extraction service:

import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

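// Define the schema. Every key is required and "missing" is expressed as null,
// since OpenAI's strict structured-output mode expects all keys to be present.
const invoiceSchema = z.object({
  vendor: z.string().describe("Company name"),
  invoiceId: z.string().nullable().describe("Invoice number, or null if absent"),
  totalAmount: z.number().describe("Total cost including tax"),
  currency: z.string().describe("ISO 4217 currency code; use USD if not stated"),
  dueDate: z.string().nullable().describe("ISO date string"),
  lineItems: z.array(z.object({
    description: z.string(),
    price: z.number()
  })).nullable().describe("Individual line items, or null if none are listed")
});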

/**
 * Processes an inbound email from Ironpost and extracts invoice data.
 * Only the `text` field of the webhook payload is used here.
 */
export async function extractInvoiceFromEmail(ironpostPayload: { text: string }) {
  const emailText = ironpostPayload.text;

  try {
    const { object } = await generateObject({
      model: openai('gpt-4o-mini'),
      schema: invoiceSchema,
      prompt: `
        Analyze the inbound email and extract invoice details.
        
        Rules:
        - Return null if invoice ID is missing.
        - Normalize dates to ISO-8601.
        - Ignore previous history in the thread.

        Email Content:
        ${emailText}
      `,
    });

    return object;

  } catch (error) {
    console.error("Extraction failed:", error);
    throw error;
  }
}

Advanced Logic

Context vs. Noise: Ironpost strips HTML at the edge, which can save thousands of tokens on a heavily styled email. Sending raw HTML costs more and distracts the model with CSS and layout tags.
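
To get a feel for the savings, the sketch below compares a rough token estimate for a raw HTML body against its stripped text. It is an illustration only: `stripHtml` is a naive regex-based stand-in rather than Ironpost's actual transform, and the four-characters-per-token ratio is a common rule of thumb for English text, not a real tokenizer.

```typescript
// Naive HTML-to-text conversion, for illustration only.
function stripHtml(html: string): string {
  return html
    .replace(/<(style|script)[^>]*>[\s\S]*?<\/\1>/gi, ' ') // drop CSS and JS blocks
    .replace(/<[^>]+>/g, ' ') // drop the remaining tags
    .replace(/&nbsp;/gi, ' ')
    .replace(/\s+/g, ' ') // collapse whitespace
    .trim();
}

// Rough estimate: about 4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

const html = `
  <html><head><style>td { padding: 12px; font-family: Arial; }</style></head>
  <body><table><tr><td style="color:#333">Invoice INV-1001</td>
  <td style="color:#333">Total: $250.00</td></tr></table></body></html>`;

const text = stripHtml(html);
console.log(text); // Invoice INV-1001 Total: $250.00
console.log(estimateTokens(html), 'tokens as HTML vs', estimateTokens(text), 'as text');
```

Even in this tiny sample most of the characters are markup, and real marketing-style templates are far heavier.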

Hallucination Prevention: The .describe() calls in Zod are crucial. They aren't just documentation; the SDK passes these descriptions to the LLM as part of the schema. This gives the model explicit per-field instructions, which helps reduce hallucinations.
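
To make that concrete, here is roughly what the model-facing JSON Schema looks like for two of the fields. This is a hand-written approximation for illustration; the exact output depends on the Zod and SDK versions in use.

```typescript
// Hand-written approximation of the JSON Schema derived from invoiceSchema
// (two fields only). Each .describe() string becomes a "description" keyword.
const approximateJsonSchema = {
  type: 'object',
  properties: {
    vendor: { type: 'string', description: 'Company name' },
    totalAmount: { type: 'number', description: 'Total cost including tax' },
  },
  required: ['vendor', 'totalAmount'],
} as const;

console.log(JSON.stringify(approximateJsonSchema, null, 2));
```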

Error Handling: Never rely entirely on extraction for high-stakes data. Save the extracted JSON with a pending_review flag if confidence is low or the amount exceeds a threshold.
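
One way to implement that gate is a small pure function that runs after extraction. This is a sketch under assumptions: the invoice schema has no confidence score, so a missing invoice ID and a line-item sum that exceeds `totalAmount` are used as stand-in signals. The names `flagForReview` and `reviewThreshold` and the limit of 1000 are hypothetical.

```typescript
// Mirrors only the invoice fields this check needs.
interface ExtractedInvoice {
  invoiceId: string | null;
  totalAmount: number;
  lineItems?: { description: string; price: number }[] | null;
}

type ReviewStatus = 'approved' | 'pending_review';

function flagForReview(
  invoice: ExtractedInvoice,
  reviewThreshold = 1000, // hypothetical limit, in the invoice's currency
): { status: ReviewStatus; reasons: string[] } {
  const reasons: string[] = [];

  if (invoice.totalAmount > reviewThreshold) reasons.push('amount exceeds threshold');
  if (invoice.invoiceId === null) reasons.push('missing invoice ID');

  // Line items may exclude tax, so their sum should not exceed the total.
  if (invoice.lineItems && invoice.lineItems.length > 0) {
    const sum = invoice.lineItems.reduce((acc, item) => acc + item.price, 0);
    if (sum > invoice.totalAmount + 0.01) reasons.push('line items exceed total');
  }

  return { status: reasons.length > 0 ? 'pending_review' : 'approved', reasons };
}

const result = flagForReview({ invoiceId: null, totalAmount: 5000 });
console.log(result.status, result.reasons);
```

Storing the reasons alongside the flag gives a human reviewer a starting point instead of a bare "check this".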

Conclusion

The combination of Ironpost and the Vercel AI SDK turns email into a first-class data source. You don't have to build custom integrations for every vendor. You provide an email address, and the machines handle the data entry.

Use stateless webhooks and structured LLM outputs. The approach is robust, cheap, and flexible. Start building your data pipeline today with the Ironpost free tier.
