
Real-Time Weather ETL Pipeline
An end-to-end, fully Dockerized ETL pipeline that ingests live weather data from the Weatherstack API, orchestrates ingestion and transformation with Airflow, stores data in PostgreSQL and transforms it there with dbt, and presents findings through Tableau dashboards. The Airflow, PostgreSQL, and dbt services all run in containers.
Problem
Weather data is publicly available but fragmented across APIs with inconsistent formats. Building a reliable, repeatable pipeline that ingests, cleans, transforms, and visualizes this data — without manual steps — requires proper orchestration and a clean separation between raw and modeled layers.
Solution
A six-component Dockerized pipeline: Airflow DAGs orchestrate task sequencing and handle retries, Python scripts fetch Weatherstack API data on a schedule, PostgreSQL stores raw and transformed records, dbt models define three analytics tables (staging, daily averages, weather report), Tableau connects directly to PostgreSQL for live dashboard updates, and Docker Compose runs the services together.
Architecture
Apache Airflow DAGs manage task sequencing: ingest → validate → transform → export. Retry logic absorbs transient API failures, and task dependencies keep downstream steps from running when an upstream step has failed.
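The sequencing and retry behavior described above can be illustrated without Airflow. The following is a minimal plain-Python sketch, not Airflow code and not the project's DAG: run_with_retries, run_pipeline, and the four placeholder step functions are hypothetical names, and the retry count of 2 is an assumed value. It runs the steps in the stated order, retries a step that raises, and stops before any downstream step if a step exhausts its retries.

```python
import time
from typing import Callable, List, Tuple


def run_with_retries(task: Callable[[], None], retries: int = 2,
                     delay_seconds: float = 0.0) -> int:
    """Run task, retrying on any exception; return the number of attempts used."""
    attempts = 0
    while True:
        attempts += 1
        try:
            task()
            return attempts
        except Exception:
            # Give up once the first try plus all allowed retries have failed.
            if attempts > retries:
                raise
            time.sleep(delay_seconds)


def run_pipeline(steps: List[Tuple[str, Callable[[], None]]],
                 retries: int = 2) -> List[str]:
    """Run steps strictly in order; a step that keeps failing halts the rest."""
    completed = []
    for name, task in steps:
        run_with_retries(task, retries=retries)
        completed.append(name)
    return completed


# Hypothetical placeholder steps; ingest simulates one transient API error.
calls = {"ingest": 0}


def ingest() -> None:
    calls["ingest"] += 1
    if calls["ingest"] == 1:
        raise ConnectionError("simulated transient API failure")


def validate() -> None:
    pass


def transform() -> None:
    pass


def export() -> None:
    pass


completed_steps = run_pipeline([
    ("ingest", ingest), ("validate", validate),
    ("transform", transform), ("export", export),
])
```

In an actual Airflow deployment the scheduler provides both behaviors, so none of this hand-written control flow would be needed; the sketch only makes the semantics concrete.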
Python scripts (psycopg2, pandas) fetch current conditions from the Weatherstack API on a scheduled interval and write raw records to PostgreSQL.
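As an illustration of the step between the API call and the database write, the sketch below flattens one nested response into a single row. It is an assumption-laden example rather than the project's script: flatten_current is a hypothetical helper, the output column names are invented, the nested field names reflect the commonly documented shape of a Weatherstack current-weather response and should be checked against the API documentation, and the sample payload is made up for the demonstration.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def flatten_current(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Turn one nested API response into a flat row for the raw table."""
    # Assumption: failures arrive as an "error" object inside the JSON body.
    if "error" in payload:
        raise ValueError(f"API error: {payload['error']}")
    location = payload["location"]
    current = payload["current"]
    return {
        "city": location["name"],
        "temperature": current["temperature"],
        "weather_description": current["weather_descriptions"][0],
        "wind_speed": current["wind_speed"],
        "local_time": location["localtime"],
        # Load timestamp in UTC, useful later for freshness checks.
        "inserted_at": datetime.now(timezone.utc).isoformat(),
    }


# Made-up payload, shaped like the assumed response format.
sample_payload = {
    "location": {"name": "New York", "localtime": "2024-01-01 10:00"},
    "current": {
        "temperature": 3,
        "weather_descriptions": ["Partly cloudy"],
        "wind_speed": 12,
    },
}
row = flatten_current(sample_payload)
```

In the pipeline itself, the payload would come from an HTTP request and the row would be written with psycopg2; both are left out here so the example stays self-contained.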
PostgreSQL stores raw ingestion data in a dev schema. Incremental loading ensures only new records are written — no full-table overwrites.
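One common way to get append-only, incremental behavior is a unique key plus an insert that skips rows already present. The sketch below shows that idea under stated assumptions: the table name raw_weather_data, its columns, and the (city, observed_at) key are hypothetical, and the standard-library sqlite3 module stands in for PostgreSQL so the example runs anywhere (it needs SQLite 3.24 or newer). The source does not say which mechanism the project uses.

```python
import sqlite3
from typing import Iterable, Tuple

Row = Tuple[str, str, float]  # (city, observed_at, temperature)


def load_incremental(conn: sqlite3.Connection, rows: Iterable[Row]) -> int:
    """Insert only rows whose (city, observed_at) key is not stored yet."""
    count_sql = "SELECT COUNT(*) FROM raw_weather_data"
    before = conn.execute(count_sql).fetchone()[0]
    conn.executemany(
        "INSERT INTO raw_weather_data (city, observed_at, temperature) "
        "VALUES (?, ?, ?) "
        # Existing keys are skipped, so earlier rows are never overwritten.
        "ON CONFLICT (city, observed_at) DO NOTHING",
        rows,
    )
    conn.commit()
    after = conn.execute(count_sql).fetchone()[0]
    return after - before  # number of genuinely new rows


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_weather_data ("
    " city TEXT NOT NULL,"
    " observed_at TEXT NOT NULL,"
    " temperature REAL,"
    " UNIQUE (city, observed_at))"
)

first_batch = [("New York", "2024-01-01 10:00", 3.0)]
second_batch = [
    ("New York", "2024-01-01 10:00", 3.0),  # already loaded, skipped
    ("New York", "2024-01-01 11:00", 4.0),  # new, inserted
]
first_inserted = load_incremental(conn, first_batch)
second_inserted = load_incremental(conn, second_batch)
```

PostgreSQL accepts the same ON CONFLICT ... DO NOTHING clause, so the statement should carry over to psycopg2 with %s placeholders in place of ?.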
dbt models build three layers: stg_weather_data (typed, cleaned staging), daily_average (aggregated daily metrics), and weather_report (final analytics table).
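dbt models are written in SQL, but the aggregation a model like daily_average performs can be sketched in Python to make the logic explicit. This is an illustrative equivalent, not the project's model: the function name, the input columns (city, local_time, temperature, wind_speed), and the output columns are assumptions, and the rows are made-up sample values.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def daily_average(rows: List[dict]) -> Dict[Tuple[str, str], dict]:
    """Group staged rows by (city, calendar day) and average the metrics."""
    grouped = defaultdict(list)
    for row in rows:
        day = row["local_time"][:10]  # 'YYYY-MM-DD' prefix of the timestamp
        grouped[(row["city"], day)].append(row)
    return {
        key: {
            "avg_temperature": round(mean(r["temperature"] for r in group), 2),
            "avg_wind_speed": round(mean(r["wind_speed"] for r in group), 2),
        }
        for key, group in grouped.items()
    }


# Made-up staged rows for illustration.
staged = [
    {"city": "New York", "local_time": "2024-01-01 10:00",
     "temperature": 2.0, "wind_speed": 10.0},
    {"city": "New York", "local_time": "2024-01-01 16:00",
     "temperature": 4.0, "wind_speed": 20.0},
    {"city": "New York", "local_time": "2024-01-02 10:00",
     "temperature": 6.0, "wind_speed": 8.0},
]
averages = daily_average(staged)
```

In SQL terms this corresponds to a GROUP BY on city and the date part of the timestamp, with AVG applied to each metric.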
Tableau connects directly to PostgreSQL and reads from the weather_report model. Dashboards update automatically as new data lands — no CSV exports needed.
Docker Compose runs all services (Airflow webserver, Airflow scheduler, PostgreSQL, and dbt runner), and a single docker compose up brings the full stack online.
Highlights
- Fully Dockerized — single docker compose up brings the entire pipeline online.
- Airflow DAG with retry logic, dependency management, and scheduled runs.
- dbt three-layer modeling: staging, daily averages, and final weather report.
- Incremental data loading — no full-table overwrites, append-only ingestion.
- Tableau connected directly to PostgreSQL for live, self-updating dashboard outputs.
- Pipeline health checks verify data freshness at each DAG step.
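
A freshness check of the kind mentioned in the last highlight can be as small as comparing the newest record's timestamp with the current time. The sketch below assumes a hypothetical is_fresh helper and a one-hour threshold; the source does not state the actual threshold or how the newest timestamp is obtained.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(latest_record_at: datetime, now: datetime,
             max_age: timedelta = timedelta(hours=1)) -> bool:
    """Return True when the newest record is no older than max_age."""
    return now - latest_record_at <= max_age


# Fixed timestamps keep the example deterministic.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
recent = datetime(2024, 1, 1, 11, 30, tzinfo=timezone.utc)
stale = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)

recent_ok = is_fresh(recent, now)
stale_ok = is_fresh(stale, now)
```

A DAG step could raise an exception when the check returns False, which would mark the task as failed and block downstream steps.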