Friday, June 13 — Is it Possible?

🛠️ A few weeks ago, even the thought of cracking an interview like this would have been unimaginable. But now my mindset is changing.


🚀 Job description:

  • Strong proficiency in SQL and at least one programming language (e.g., Python, Java).
  • Experience with data pipeline tools and frameworks.
  • Experience with cloud-based data warehousing solutions (Snowflake).
  • Experience with AWS Kinesis, SNS, and SQS.
  • Excellent problem-solving and analytical skills.

🚀 You will:

  • Design, develop, and maintain robust and scalable data pipelines that ingest, transform, and load data from various sources into a data warehouse.
  • Collaborate with business stakeholders to understand data requirements and translate them into technical solutions.
  • Implement data quality checks and monitoring to ensure data accuracy and integrity.
  • Optimize data pipelines for performance and efficiency.
  • Troubleshoot and resolve data pipeline issues.
  • Stay up-to-date with emerging technologies and trends in data engineering.

🚀 Key Skills:

  • Data pipeline architecture
  • Data warehousing
  • ETL (Extract, Transform, Load)
  • Data modeling
  • SQL
  • Python, Java, or Go
  • Cloud computing
  • Business intelligence

🛠️ Though I started off very confident, this YouTube video really scared me:

🔍🚀🛠️ Wrapped up a mini ETL project using Apache Airflow, NASA’s APOD API, and PostgreSQL.


🚀 Code Summary in 4 Lines

  1. Used SimpleHttpOperator to fetch the Astronomy Picture of the Day from NASA’s API.
  2. Transformed the response using a Python task to keep only the relevant fields.
  3. Created a Postgres table (if not exists) via PostgresHook.
  4. Inserted the cleaned data into the table using a parameterized SQL query (rough sketch of the full DAG below).
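
🛠️ For future reference, here is a minimal sketch of what those four steps could look like in one DAG file. It assumes Airflow 2.x with the HTTP and Postgres providers installed; the connection IDs (nasa_api, my_postgres_connection) are the ones from the project, while the DAG id, the apod_data table, and the exact fields kept are illustrative guesses rather than the exact code.

```python
# Sketch only: assumes Airflow 2.x plus apache-airflow-providers-http
# and apache-airflow-providers-postgres are installed.
from datetime import datetime

from airflow import DAG
from airflow.decorators import task
from airflow.providers.http.operators.http import SimpleHttpOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook

with DAG(
    dag_id="nasa_apod_etl",            # illustrative name
    start_date=datetime(2025, 6, 1),
    schedule="@daily",                 # one run per day
    catchup=False,
) as dag:

    @task
    def create_table():
        # Credentials come from the Airflow connection, not from the code.
        hook = PostgresHook(postgres_conn_id="my_postgres_connection")
        hook.run("""
            CREATE TABLE IF NOT EXISTS apod_data (
                date        DATE PRIMARY KEY,
                title       TEXT,
                explanation TEXT,
                url         TEXT,
                media_type  TEXT
            );
        """)

    # 1. Fetch the Astronomy Picture of the Day from NASA's API.
    #    Assumes the API key sits in the nasa_api connection's Extra
    #    field as {"api_key": "..."}.
    extract = SimpleHttpOperator(
        task_id="extract_apod",
        http_conn_id="nasa_api",
        endpoint="planetary/apod",
        method="GET",
        data={"api_key": "{{ conn.nasa_api.extra_dejson.api_key }}"},
        response_filter=lambda response: response.json(),
    )

    # 2. Keep only the relevant fields from the raw response.
    @task
    def transform(raw: dict) -> dict:
        wanted = ("date", "title", "explanation", "url", "media_type")
        return {key: raw.get(key) for key in wanted}

    # 4. Insert the cleaned record with a parameterized query.
    @task
    def load(record: dict) -> None:
        hook = PostgresHook(postgres_conn_id="my_postgres_connection")
        hook.run(
            """
            INSERT INTO apod_data (date, title, explanation, url, media_type)
            VALUES (%s, %s, %s, %s, %s)
            ON CONFLICT (date) DO NOTHING;
            """,
            parameters=(
                record["date"], record["title"], record["explanation"],
                record["url"], record["media_type"],
            ),
        )

    # 3. Create the table first, then run extract -> transform -> load.
    create_table() >> extract
    load(transform(extract.output))
```

The ON CONFLICT (date) DO NOTHING clause is my own addition here, so that re-running a day simply skips the row instead of inserting a duplicate.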

🔍 Airflow Concepts Applied

  • DAG: The pipeline is defined as a DAG scheduled to run daily (@daily), with each run designed to be idempotent so re-runs don’t duplicate data.
  • Hooks: PostgresHook allows executing SQL queries securely and cleanly from Python tasks.
  • Operators: SimpleHttpOperator handled the API request; @task decorators handled transformation and loading.
  • Connections: Airflow’s Connection UI is used to store both the API key (nasa_api) and DB credentials (my_postgres_connection), keeping the code clean and secure (see the quick check sketched after this list).
  • Task Dependencies: DAG flow enforces the order: create table → extract → transform → load.
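
🔍 To see the Hooks and Connections points in isolation, a quick sanity check like this (a hypothetical snippet, reusing the apod_data table name from the sketch above) reads back the loaded rows without any credentials appearing in code:

```python
# Run anywhere Airflow + the Postgres provider are installed and the
# "my_postgres_connection" connection is defined; the hook resolves
# host, user, and password from Airflow's connection store.
from airflow.providers.postgres.hooks.postgres import PostgresHook

hook = PostgresHook(postgres_conn_id="my_postgres_connection")
rows = hook.get_records(
    "SELECT date, title FROM apod_data ORDER BY date DESC LIMIT 5;"
)
for apod_date, title in rows:
    print(apod_date, title)
```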

Studied/Researched