AI-Driven Financial Insight Pipeline

Category:

Data Analytics

Technology:

Python, SQL, Airflow, AI tools

Skillset:

NotebookLM



Project Overview

I developed a comprehensive AI-powered pipeline to streamline financial data integration and enhance revenue forecasting for a business that manufactures and sells number plates across India. The primary challenge was consolidating internal sales data with external transaction records from a payment gateway (PhonePe), which initially only provided PDF exports.

Key Components and Workflow

1. Data Ingestion and Processing

  • Internal data sources: invoices, transactions, and order details stored in SQL databases.

  • External data: PhonePe payment gateway, which only provided transaction records as PDF exports.

  • To process these PDFs, I leveraged NotebookLM, which extracted and structured the raw transaction data into clean, usable tables. This bridged the gap between unstructured files and structured financial datasets.
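The consolidation step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the extraction step yields CSV text with hypothetical columns txn_id, order_id, and amount, that internal invoices live in a hypothetical invoices table, and it uses an in-memory SQLite database as a stand-in for the production SQL store.

```python
import csv
import io
import sqlite3

# Hypothetical CSV produced by the PDF extraction step (column names are assumptions).
EXTRACTED_CSV = """txn_id,order_id,amount
T1001,ORD-1,1200.00
T1002,ORD-2,850.50
T1003,ORD-9,400.00
"""


def load_gateway_rows(conn, csv_text):
    """Insert extracted gateway transactions into a staging table; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gateway_txns "
        "(txn_id TEXT PRIMARY KEY, order_id TEXT, amount REAL)"
    )
    rows = [
        (r["txn_id"], r["order_id"], float(r["amount"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    # INSERT OR REPLACE keeps the load idempotent if the same export is processed twice.
    conn.executemany("INSERT OR REPLACE INTO gateway_txns VALUES (?, ?, ?)", rows)
    return len(rows)


def unmatched_transactions(conn):
    """Return gateway rows with no matching invoice, or with a different amount."""
    query = """
        SELECT g.txn_id, g.order_id, g.amount, i.amount
        FROM gateway_txns AS g
        LEFT JOIN invoices AS i ON i.order_id = g.order_id
        WHERE i.order_id IS NULL OR ABS(i.amount - g.amount) > 0.005
        ORDER BY g.txn_id
    """
    return conn.execute(query).fetchall()


def demo():
    """Build a tiny in-memory example and return the rows that need review."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE invoices (order_id TEXT PRIMARY KEY, amount REAL)")
    conn.executemany(
        "INSERT INTO invoices VALUES (?, ?)",
        [("ORD-1", 1200.00), ("ORD-2", 900.00)],
    )
    load_gateway_rows(conn, EXTRACTED_CSV)
    return unmatched_transactions(conn)
```

Calling demo() returns the two gateway rows that do not reconcile: one whose amount differs from the invoice and one with no invoice at all. Using a staging table with a LEFT JOIN keeps unmatched payments visible instead of silently dropping them.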


2. Pipeline Orchestration with Airflow

  • Implemented Apache Airflow to automate the end-to-end workflow:


    1. Retrieve raw files (PDFs/CSVs).

    2. Run data extraction via NotebookLM.

    3. Load structured data into SQL for storage.

    4. Apply predictive models for forecasting.


  • Airflow ensured the pipeline ran reliably on a schedule with logging and monitoring.
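The four steps above could be wired together roughly as shown below. This is a sketch under stated assumptions: it targets Airflow 2.4 or later in the 2.x series (for the schedule argument), the DAG name and daily cadence are hypothetical, and every task body is a placeholder. In particular, the source does not describe how NotebookLM was invoked, so that step is left as a stub. The import is guarded so the step functions can be read and tested without Airflow installed.

```python
from datetime import datetime


def retrieve_raw_files():
    """Step 1: collect new PDF/CSV exports (placeholder)."""
    return "retrieved"


def extract_transactions():
    """Step 2: turn raw exports into tables (placeholder; extraction call not shown)."""
    return "extracted"


def load_to_sql():
    """Step 3: write the structured rows to the SQL store (placeholder)."""
    return "loaded"


def run_forecast():
    """Step 4: apply the forecasting models (placeholder)."""
    return "forecasted"


# Ordered (task_id, callable) pairs mirroring the four steps listed above.
PIPELINE_STEPS = [
    ("retrieve_raw_files", retrieve_raw_files),
    ("extract_transactions", extract_transactions),
    ("load_to_sql", load_to_sql),
    ("run_forecast", run_forecast),
]

try:
    from airflow import DAG
    from airflow.operators.python import PythonOperator
except ImportError:
    DAG = None  # Airflow not installed; the step functions above remain usable.

if DAG is not None:
    with DAG(
        dag_id="financial_insight_pipeline",  # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",  # assumed cadence; the source only says "on a schedule"
        catchup=False,
    ) as dag:
        previous = None
        for task_id, fn in PIPELINE_STEPS:
            task = PythonOperator(task_id=task_id, python_callable=fn)
            if previous is not None:
                previous >> task  # run each step only after the one before it
            previous = task
```

Chaining the tasks linearly means a failed extraction stops the load and forecast steps, and Airflow's per-task logs show which stage failed.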


3. Predictive Modeling with AI

  • Leveraged Gemini and GPT-based models to forecast revenue using both historical sales and payment gateway data.

  • Incorporated additional demand signals suggested by AI (e.g., seasonal demand, regional sales variation) to improve forecast accuracy.

  • AI also auto-documented assumptions in natural language, enabling business users to understand forecasts without needing technical details.
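To make the role of seasonality concrete, here is a simple seasonal-naive baseline. It is explicitly not the Gemini or GPT-based approach used in the project, whose prompts and model calls are not described here. It is a plain statistical stand-in of the kind such forecasts are often compared against. The function name and the input shape (a list of periodic revenue figures, oldest first) are assumptions.

```python
from statistics import mean


def seasonal_naive_forecast(revenue, season_length=12):
    """Forecast the next season by scaling the last season with season-over-season growth.

    revenue: list of at least 2 * season_length values, oldest first.
    Returns one forecast value per period of the coming season.
    """
    if len(revenue) < 2 * season_length:
        raise ValueError("need at least two full seasons of history")
    last = revenue[-season_length:]
    prior = revenue[-2 * season_length:-season_length]
    # Growth of the latest season relative to the one before it.
    growth = mean(last) / mean(prior)
    # Keep the seasonal shape of the last season and apply the growth factor.
    return [round(value * growth, 2) for value in last]
```

For example, with quarterly data, seasonal_naive_forecast(history, season_length=4) repeats last year's quarterly pattern scaled by the year-over-year growth rate. Documenting an assumption such as "growth continues at last year's rate" in plain language is the kind of note the pipeline generated for business users.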


4. Automated Variance Analysis and Insights

  • Designed the pipeline to run variance analysis between forecasts and actuals automatically.

  • Used AI to generate narrative insights explaining variances.

  • Delivered results through interactive, queryable reports so executives could ask “what-if” questions.
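The variance calculation itself can be illustrated with a short sketch. The function name, the dictionary-based inputs, and the 10% flagging threshold are all assumptions made for illustration. The source does not state which thresholds or data structures were used.

```python
def variance_report(forecast, actual, threshold_pct=10.0):
    """Compare forecast and actual revenue for each period present in both inputs.

    forecast, actual: dicts mapping a period label (e.g. "2024-01") to revenue.
    Returns a list of dicts with the absolute variance, the percentage variance
    relative to the forecast, and a flag for variances at or above the threshold.
    """
    report = []
    for period in sorted(forecast.keys() & actual.keys()):
        variance = actual[period] - forecast[period]
        # Percentage variance is undefined when the forecast is zero.
        pct = variance / forecast[period] * 100 if forecast[period] else None
        report.append(
            {
                "period": period,
                "variance": round(variance, 2),
                "variance_pct": None if pct is None else round(pct, 1),
                "flagged": pct is not None and abs(pct) >= threshold_pct,
            }
        )
    return report
```

Rows marked as flagged are natural candidates to pass to the narrative-generation step, so explanations focus on the periods where actuals diverged most from the forecast.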


Outcomes and Impact



  • 20% improvement in revenue forecast accuracy.

  • Identified three top-performing product lines, enabling targeted marketing and boosting sales revenue by 15% in six months.

  • Reduced manual reporting effort by automating data extraction and variance analysis, saving the finance team 10+ hours per week.

  • Delivered board-ready executive summaries and dashboards that updated automatically as new data arrived.


Technologies Used

  • Python, SQL, Apache Airflow

  • Gemini, GPT, NotebookLM

  • PDF data extraction, predictive modeling, automated reporting