Thursday, June 12 — Building Blocks

🛠️ What I Worked On

  • Finalizing the Airflow setup — DAGs, environment config, and local testing. The Airflow 3 UI is quite different from the earlier versions shown in the course videos.
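One pattern that helps with the local-testing part: keep each task's logic in plain Python functions so they can be run and unit-tested without a scheduler, and keep the Airflow wiring thin. A rough sketch (the payload and function names are placeholders; the decorator import shown in the comment is Airflow 3's TaskFlow SDK):

```python
import json

# In the DAG file, these plain functions would be wrapped with
# TaskFlow decorators, e.g.:
#
#   from airflow.sdk import dag, task   # airflow.decorators in Airflow 2
#
#   @task
#   def extract() -> str: ...
#
# Keeping the logic decorator-free means it can be tested directly.

def extract() -> str:
    # Placeholder for an API call; returns a raw JSON payload.
    return json.dumps({"records": [{"id": 1, "value": 10}, {"id": 2, "value": 32}]})

def transform(raw: str) -> list[dict]:
    # Parse the payload and keep only the fields downstream steps need.
    payload = json.loads(raw)
    return [{"id": r["id"], "value": r["value"]} for r in payload["records"]]

def load(rows: list[dict]) -> int:
    # Stand-in for a warehouse insert; returns the row count.
    return len(rows)

if __name__ == "__main__":
    print(load(transform(extract())))  # → 2
```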

🚀 To do:

Building a full-fledged ETL pipeline from scratch. Here’s what I want to include:

  • Source: External APIs as data sources (JSON/CSV responses)
  • Ingestion: Use AWS Lambda / API Gateway or direct cron-based fetch
  • Streaming (optional): AWS Kinesis for real-time ingestion (stretch goal)
  • Processing & Orchestration: Airflow + possibly AWS Glue
  • Storage & Query: Snowflake as the data warehouse
  • Transformations: Use dbt or Snowflake’s native SQL/Snowpark
  • Dashboard: Lightweight Streamlit app or Snowsight for visualization

What I studied/researched: