Real-Time Weather ETL Pipeline
2024·Sole engineer·shipped

An end-to-end ETL pipeline that ingests live weather data from the Weatherstack API, orchestrates ingestion and transformation with Airflow, stores the data in PostgreSQL and models it with dbt, and presents findings through Tableau dashboards. Airflow, PostgreSQL, and dbt run as containerized services under Docker Compose.

Problem

Weather data is publicly available but fragmented across APIs with inconsistent formats. Building a reliable, repeatable pipeline that ingests, cleans, transforms, and visualizes this data — without manual steps — requires proper orchestration and a clean separation between raw and modeled layers.

Solution

A six-component Dockerized pipeline: Airflow DAGs orchestrate task sequencing and handle retries; Python scripts fetch Weatherstack API data on a schedule; PostgreSQL stores raw and transformed records; dbt builds three models (staging, daily averages, weather report); Tableau connects directly to PostgreSQL for live dashboard updates; and Docker Compose runs the services as containers.

Architecture

Orchestration

Apache Airflow DAGs manage task sequencing: ingest → validate → transform → export. Task-level retries absorb transient API failures, and dependency management keeps each downstream task from running until its upstream task has succeeded.
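A minimal sketch of how such a DAG might be declared, assuming Airflow 2.4 or later. The DAG id, the hourly schedule, the retry count and delay, and the run_step placeholder are all hypothetical; the source does not state the real values.

```python
from datetime import datetime, timedelta

# Hypothetical retry settings; the real values are not given in the source.
DEFAULT_ARGS = {
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
}

# Task order as described in the Orchestration section.
TASK_ORDER = ["ingest", "validate", "transform", "export"]


def run_step(step_name: str) -> None:
    """Placeholder callable; a real task would do the step's actual work."""
    print(f"running {step_name}")


def build_dag():
    """Build the DAG. Airflow is imported here so the module loads without it."""
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    dag = DAG(
        dag_id="weather_etl",  # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule="@hourly",  # assumed interval
        catchup=False,
        default_args=DEFAULT_ARGS,  # applied to every task in the DAG
    )
    tasks = [
        PythonOperator(
            task_id=name,
            python_callable=run_step,
            op_kwargs={"step_name": name},
            dag=dag,
        )
        for name in TASK_ORDER
    ]
    # Chain each task to the next: ingest >> validate >> transform >> export.
    for upstream, downstream in zip(tasks, tasks[1:]):
        upstream >> downstream
    return dag
```

In a DAG file, the result would be assigned at module level (for example dag = build_dag()) so the scheduler can discover it. Because the retry settings live in default_args, every task inherits them, including the API call in the ingest step.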

Ingestion

Python scripts (psycopg2, pandas) fetch current conditions from the Weatherstack API on a scheduled interval and write raw records to PostgreSQL.
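The ingestion step could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's script: it uses the standard library for the HTTP call, the table dev.raw_weather_data and its columns are hypothetical, and the response field names reflect my understanding of Weatherstack's current-conditions payload and should be checked against the API documentation.

```python
import json
import urllib.parse
import urllib.request

# Weatherstack current-conditions endpoint.
API_URL = "http://api.weatherstack.com/current"


def fetch_current(city: str, api_key: str) -> dict:
    """Fetch current conditions for one city and return the decoded JSON."""
    query = urllib.parse.urlencode({"access_key": api_key, "query": city})
    with urllib.request.urlopen(f"{API_URL}?{query}", timeout=10) as resp:
        payload = json.load(resp)
    # Assumption: API-level errors arrive in the body with "success": false.
    if payload.get("success") is False:
        raise RuntimeError(f"Weatherstack error: {payload.get('error')}")
    return payload


def to_row(payload: dict) -> dict:
    """Flatten the nested payload into one raw record (hypothetical columns)."""
    location, current = payload["location"], payload["current"]
    return {
        "city": location["name"],
        "observed_at": location["localtime"],
        "temperature": current["temperature"],
        "wind_speed": current["wind_speed"],
        "humidity": current["humidity"],
    }


# psycopg2-style named placeholders; table and columns are hypothetical.
INSERT_SQL = """
    INSERT INTO dev.raw_weather_data
        (city, observed_at, temperature, wind_speed, humidity)
    VALUES
        (%(city)s, %(observed_at)s, %(temperature)s, %(wind_speed)s, %(humidity)s)
"""


def write_row(conn, row: dict) -> None:
    """Insert one record using an open psycopg2 connection."""
    # The connection context manager commits on success, rolls back on error.
    with conn, conn.cursor() as cur:
        cur.execute(INSERT_SQL, row)
```

Raising on an error payload matters here: it makes the task fail visibly so the orchestrator's retry logic can take over instead of a bad record being written.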

Storage

PostgreSQL stores raw ingested records in a dev schema. Loading is incremental and append-only: each run writes only new records, with no full-table overwrites.
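One common way to get this behavior is a unique key plus INSERT ... ON CONFLICT DO NOTHING, which PostgreSQL supports. The sketch below is an assumption about the approach, not the project's confirmed method. It uses an in-memory SQLite database as a stand-in so it runs without a server; with psycopg2 the placeholders would be %s instead of ?. The table and key columns are hypothetical.

```python
import sqlite3

CREATE_SQL = """
    CREATE TABLE raw_weather_data (
        city TEXT NOT NULL,
        observed_at TEXT NOT NULL,
        temperature REAL,
        UNIQUE (city, observed_at)
    )
"""

# Rows whose (city, observed_at) key already exists are skipped, not rewritten.
INSERT_SQL = """
    INSERT INTO raw_weather_data (city, observed_at, temperature)
    VALUES (?, ?, ?)
    ON CONFLICT (city, observed_at) DO NOTHING
"""


def load_incremental(conn, rows) -> int:
    """Insert only unseen rows and return how many were actually added."""
    count_sql = "SELECT COUNT(*) FROM raw_weather_data"
    before = conn.execute(count_sql).fetchone()[0]
    conn.executemany(INSERT_SQL, rows)
    conn.commit()
    after = conn.execute(count_sql).fetchone()[0]
    return after - before


conn = sqlite3.connect(":memory:")
conn.execute(CREATE_SQL)

first = load_incremental(conn, [
    ("Berlin", "2024-01-01 10:00", 3.0),
    ("Berlin", "2024-01-01 11:00", 4.0),
])
# The second batch overlaps the first by one row; only the 12:00 row is new.
second = load_incremental(conn, [
    ("Berlin", "2024-01-01 11:00", 4.0),
    ("Berlin", "2024-01-01 12:00", 5.0),
])
```

A side effect of this design is that a retried or re-run task is safe: replaying the same batch adds nothing.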

Transformation

dbt models build three layers: stg_weather_data (typed, cleaned staging), daily_average (aggregated daily metrics), and weather_report (final analytics table).
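The real models are dbt SQL, which is not shown in the source. As a rough illustration of what an aggregation like daily_average computes, here is the same idea in plain Python. The column names (city, observed_at, temperature) and the choice of a per-city, per-day mean are assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean


def daily_average(rows):
    """Average temperature per (city, calendar day) from staged records."""
    buckets = defaultdict(list)
    for row in rows:
        # Truncate the observation timestamp to its calendar date.
        day = datetime.fromisoformat(row["observed_at"]).date().isoformat()
        buckets[(row["city"], day)].append(row["temperature"])
    return {key: round(fmean(values), 2) for key, values in sorted(buckets.items())}


staged = [
    {"city": "Berlin", "observed_at": "2024-01-01T10:00:00", "temperature": 10.0},
    {"city": "Berlin", "observed_at": "2024-01-01T16:00:00", "temperature": 20.0},
    {"city": "Berlin", "observed_at": "2024-01-02T10:00:00", "temperature": 30.0},
]
averages = daily_average(staged)
```

In SQL terms this corresponds to grouping the staging model by city and date and taking AVG(temperature).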

Visualization

Tableau connects directly to PostgreSQL and reads from the weather_report model. Dashboards update automatically as new data lands — no CSV exports needed.

Infrastructure

Docker Compose runs all services (Airflow webserver, Airflow scheduler, PostgreSQL, and a dbt runner), and a single docker compose up brings the full stack online.

Highlights

  • Fully Dockerized — single docker compose up brings the entire pipeline online.
  • Airflow DAG with retry logic, dependency management, and scheduled runs.
  • dbt three-layer modeling: staging, daily averages, and final weather report.
  • Incremental data loading — no full-table overwrites, append-only ingestion.
  • Tableau connected directly to PostgreSQL for live, self-updating dashboard outputs.
  • Pipeline health checks verify data freshness at each DAG step.
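The freshness check in the last highlight could be sketched as below. The function names and the two-hour threshold are hypothetical; the source does not say how the checks are implemented.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(latest_observed_at, max_age, now=None):
    """Return True if the newest record is no older than max_age."""
    now = now or datetime.now(timezone.utc)
    return now - latest_observed_at <= max_age


def check_freshness(latest_observed_at, max_age, now=None):
    """Raise if data is missing or stale, so the calling task fails."""
    if latest_observed_at is None:
        raise ValueError("no records found")
    if not is_fresh(latest_observed_at, max_age, now):
        raise ValueError(f"data is stale: newest record at {latest_observed_at}")


# Fixed timestamps keep the example deterministic.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
recent = datetime(2024, 1, 1, 11, 30, tzinfo=timezone.utc)
old = datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)
max_age = timedelta(hours=2)  # hypothetical threshold

fresh_result = is_fresh(recent, max_age, now)
stale_result = is_fresh(old, max_age, now)
```

Called from an Airflow task with the latest timestamp read from PostgreSQL, an exception from check_freshness marks the task as failed, which stops downstream steps from running on stale data.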