From PDF to Dashboard: Building an AI Financial Pipeline

The problem was straightforward: a finance team receiving 200+ vendor invoices and bank statements per month as PDFs, manually extracting line items into a spreadsheet, then manually producing a variance report. Ten hours a week, every week, for years.

The extraction layer

I tried traditional PDF parsing with pdfplumber and regex first — worked on 60% of invoices, failed on anything with a non-standard table layout or scanned image content. Not usable in production.

I then sent PDF text (or image for scanned docs) directly to Claude and asked for structured JSON output. Accuracy on the test set: 94% of line items correctly extracted, including from scanned PDFs via vision. I use Claude for most invoices and fall back to Gemini for documents where Claude's vision struggled.

Airflow orchestration

The DAG: fetch_new_pdfs → extract_line_items → validate_schema → load_to_bigquery_raw → dbt_run → generate_variance_report → send_to_finance_team. Each step is a separate task so retries only redo their own work.

validate_schema runs Great Expectations checks on the extracted JSON before loading. Documents that fail go to a review queue rather than being silently dropped.

The dbt layer in BigQuery

Three layers: staging (type casting, vendor name normalization via a seed table of known aliases), intermediate (joins line items to budget allocation table), and mart (fct_monthly_spend and fct_variance for dollar and percentage variance per line).

Automated variance commentary

The variance report used to be written manually by finance analysts every month. I replaced that with an LLM call at the end of the Airflow DAG — it receives the top 10 variances and the previous month's commentary as context, and generates a structured narrative. The finance team reviews and edits before sending. The first draft is 80% done, reducing commentary from 2 hours to 20 minutes.

Results after four months

- Manual extraction: from 10 hrs/week to under 30 minutes

- Revenue identified: +15% from invoices previously not logged

- Forecast accuracy: +20% from structured historical data

- Validation failures caught before loading: 23 documents that would have silently corrupted the spreadsheet

The LLM extraction layer sounds risky but performed best in practice. The dbt transformation layer is where most of the reliability work lives.