AI-Driven Financial Insight Pipeline
Project Overview
I developed an AI-powered pipeline to streamline financial data integration and improve revenue forecasting for a business that manufactures and sells number plates across India. The primary challenge was consolidating internal sales data with external transaction records from a payment gateway (PhonePe), which initially provided only PDF exports.
Key Components and Workflow
1. Data Ingestion and Processing
Internal data sources: invoices, transactions, and order details stored in SQL databases.
External data: PhonePe payment gateway, which only provided transaction records as PDF exports.
To process these PDFs, I leveraged NotebookLM, which extracted and structured the raw transaction data into clean, usable tables. This bridged the gap between unstructured files and structured financial datasets.
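Once the gateway transactions are in tabular form, they can be reconciled against internal invoices in SQL. The following is a minimal sketch of that step, not the project's actual schema: the table names, column names, and sample rows are hypothetical, and an in-memory SQLite database stands in for the production SQL store.

```python
import sqlite3

# Hypothetical rows standing in for the table extracted from a gateway PDF:
# (transaction id, invoice id, amount paid)
gateway_rows = [
    ("TXN001", "INV-1001", 1500.00),
    ("TXN002", "INV-1002", 2200.00),
    ("TXN003", "INV-9999", 500.00),   # no matching internal invoice
]

# Hypothetical internal invoices: (invoice id, amount billed)
invoice_rows = [
    ("INV-1001", 1500.00),
    ("INV-1002", 2250.00),  # amount differs from the gateway record
    ("INV-1003", 900.00),   # not paid through the gateway in this export
]


def reconcile(gateway, invoices):
    """Load both sources into SQL and label each gateway transaction."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gateway_txn "
        "(txn_id TEXT PRIMARY KEY, invoice_id TEXT, amount REAL)"
    )
    conn.execute(
        "CREATE TABLE invoice (invoice_id TEXT PRIMARY KEY, amount REAL)"
    )
    conn.executemany("INSERT INTO gateway_txn VALUES (?, ?, ?)", gateway)
    conn.executemany("INSERT INTO invoice VALUES (?, ?)", invoices)

    # LEFT JOIN keeps every gateway transaction, even without an invoice.
    rows = conn.execute(
        """
        SELECT g.txn_id,
               g.invoice_id,
               CASE
                   WHEN i.invoice_id IS NULL THEN 'missing_invoice'
                   WHEN ABS(g.amount - i.amount) > 0.005 THEN 'amount_mismatch'
                   ELSE 'matched'
               END AS status
        FROM gateway_txn AS g
        LEFT JOIN invoice AS i ON i.invoice_id = g.invoice_id
        ORDER BY g.txn_id
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in reconcile(gateway_rows, invoice_rows):
        print(row)
```

Keeping unmatched and mismatched transactions visible, rather than dropping them with an inner join, makes extraction errors from the PDF step easier to spot before they reach the forecasting stage.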
2. Pipeline Orchestration with Airflow
Implemented Apache Airflow to automate the end-to-end workflow:
Retrieve raw files (PDFs/CSVs).
Run data extraction via NotebookLM.
Load structured data into SQL for storage.
Apply predictive models for forecasting.
Airflow ensured the pipeline ran reliably on a schedule with logging and monitoring.
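The four steps above could be wired into an Airflow DAG roughly as follows. This is a sketch under stated assumptions: it targets Airflow 2.4 or later (for the `schedule` argument), the DAG id, the daily schedule, and the start date are illustrative, and the step functions are placeholders rather than the project's real extraction, loading, or forecasting code.

```python
from datetime import datetime


# Placeholder step functions (hypothetical). Real implementations would
# fetch files, run extraction, write to SQL, and call the forecasting model.
def retrieve_raw_files():
    return "raw files retrieved"


def extract_transactions():
    return "transactions extracted"


def load_to_sql():
    return "structured data loaded"


def run_forecast():
    return "forecast produced"


# The order of this list defines the order of tasks in the DAG.
PIPELINE_STEPS = [
    ("retrieve_raw_files", retrieve_raw_files),
    ("extract_transactions", extract_transactions),
    ("load_to_sql", load_to_sql),
    ("run_forecast", run_forecast),
]


def build_dag():
    """Create a linear DAG that runs PIPELINE_STEPS one after another."""
    # Imported here so the step functions stay usable without Airflow installed.
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="financial_insight_pipeline",  # illustrative name
        start_date=datetime(2024, 1, 1),      # illustrative start date
        schedule="@daily",                    # assumed cadence
        catchup=False,
    ) as dag:
        previous = None
        for task_id, func in PIPELINE_STEPS:
            task = PythonOperator(task_id=task_id, python_callable=func)
            if previous is not None:
                previous >> task  # run tasks strictly in sequence
            previous = task
    return dag
```

In a real DAG file, the scheduler discovers DAG objects at module level, so the file would also contain a line such as `dag = build_dag()`.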
3. Predictive Modeling with AI
Leveraged Gemini and GPT-based models to forecast revenue using both historical sales and payment gateway data.
Incorporated external signals suggested by AI (e.g., seasonal demand, regional sales variation) to improve forecast accuracy.
AI also auto-documented assumptions in natural language, enabling business users to understand forecasts without needing technical details.
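One common way to turn seasonal demand into a numeric signal is a monthly seasonal index: each month's average revenue divided by the overall average. The sketch below illustrates that general technique only; it is not the Gemini or GPT forecasting step itself, and the function name and sample figures are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def seasonal_indices(monthly_sales):
    """Return {month: index}, where index = month average / overall average.

    monthly_sales is a list of (month_number, revenue) pairs covering one or
    more years. An index above 1.0 marks a month that tends to run above
    the average level, below 1.0 a month that tends to run under it.
    """
    by_month = defaultdict(list)
    for month, revenue in monthly_sales:
        by_month[month].append(revenue)

    overall = mean(revenue for _, revenue in monthly_sales)
    return {month: mean(values) / overall
            for month, values in sorted(by_month.items())}


if __name__ == "__main__":
    # Hypothetical history: two years of January and February revenue.
    history = [(1, 100.0), (2, 200.0), (1, 140.0), (2, 160.0)]
    print(seasonal_indices(history))
```

The same grouping approach applies to regional variation by keying on region instead of month.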
4. Automated Variance Analysis and Insights
Designed the pipeline to run variance analysis between forecasts and actuals automatically.
Used AI to generate narrative insights explaining variances.
Delivered results through interactive, queryable reports so executives could ask “what-if” questions.
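The variance calculation that feeds those narrative insights can be sketched as follows. This is a minimal illustration, assuming forecasts and actuals are keyed by period and that forecasts are nonzero; the 10% flagging threshold, the function name, and the sample figures are hypothetical.

```python
def variance_report(forecast, actual, threshold_pct=10.0):
    """Compare forecast and actual revenue for each period.

    forecast and actual are dicts keyed by period label. Periods whose
    absolute percentage variance exceeds threshold_pct are flagged so they
    can be passed on for a narrative explanation.
    Assumes every forecast value is nonzero.
    """
    rows = []
    for period in sorted(forecast):
        f = forecast[period]
        a = actual[period]
        variance = a - f                 # positive means actuals beat forecast
        variance_pct = variance / f * 100
        rows.append({
            "period": period,
            "forecast": f,
            "actual": a,
            "variance": variance,
            "variance_pct": round(variance_pct, 2),
            "flagged": abs(variance_pct) > threshold_pct,
        })
    return rows


if __name__ == "__main__":
    # Hypothetical figures for two months.
    forecast = {"2024-01": 1000.0, "2024-02": 1200.0}
    actual = {"2024-01": 1050.0, "2024-02": 1020.0}
    for row in variance_report(forecast, actual):
        print(row)
```

Only the flagged rows would need to be sent to the language model, which keeps the generated commentary focused on the variances that matter.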
Outcomes and Impact
20% improvement in revenue forecast accuracy.
Identified three top-performing product lines, enabling targeted marketing and boosting sales revenue by 15% in six months.
Reduced manual reporting effort by automating data extraction and variance analysis, saving the finance team 10+ hours per week.
Delivered board-ready executive summaries and dashboards that updated automatically as new data arrived.
Technologies Used
Python, SQL, Apache Airflow
Gemini, GPT, NotebookLM
PDF data extraction, predictive modeling, automated reporting