Friday, June 13 — Is It Possible?
🛠️ A few weeks ago, even the thought of cracking an interview like this would have been unimaginable. But now my mindset is changing.
🚀 Job description:
- Strong proficiency in SQL and at least one programming language (e.g., Python, Java).
- Experience with data pipeline tools and frameworks.
- Experience with cloud-based data warehousing solutions (Snowflake).
- Experience with AWS Kinesis, SNS, and SQS.
- Excellent problem-solving and analytical skills.
🚀 You will:
- Design, develop, and maintain robust and scalable data pipelines that ingest, transform, and load data from various sources into a data warehouse.
- Collaborate with business stakeholders to understand data requirements and translate them into technical solutions.
- Implement data quality checks and monitoring to ensure data accuracy and integrity.
- Optimize data pipelines for performance and efficiency.
- Troubleshoot and resolve data pipeline issues.
- Stay up-to-date with emerging technologies and trends in data engineering.
🚀 Key Skills:
- Data pipeline architecture
- Data warehousing
- ETL (Extract, Transform, Load)
- Data modeling
- SQL
- Python or Java or Go
- Cloud computing
- Business intelligence
🛠️ Though I started off very confident, this YouTube video really scared me:
🔍🚀🛠️ Wrapped up a mini ETL project using Apache Airflow, NASA’s APOD API, and PostgreSQL.
🚀 Code Summary in 4 Lines
- Used `SimpleHttpOperator` to fetch the Astronomy Picture of the Day from NASA's API.
- Transformed the response using a Python task to keep only the relevant fields.
- Created a Postgres table (if not exists) via `PostgresHook`.
- Inserted the cleaned data into the table using a parameterized SQL query (see the sketch right after this list).
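A rough sketch of how those four steps could fit together using the TaskFlow API. This assumes Airflow 2.x with the HTTP and Postgres provider packages installed, and that the `nasa_api` and `my_postgres_connection` connections already exist; the `apod` table name and the `DEMO_KEY` value are placeholders, not the project's real ones.

```python
from datetime import datetime

from airflow.decorators import dag, task
from airflow.providers.http.operators.http import SimpleHttpOperator
from airflow.providers.postgres.hooks.postgres import PostgresHook


@dag(schedule="@daily", start_date=datetime(2025, 1, 1), catchup=False)
def nasa_apod_etl():

    @task
    def create_table():
        # Idempotent DDL, safe to re-run on every daily schedule.
        PostgresHook(postgres_conn_id="my_postgres_connection").run(
            """
            CREATE TABLE IF NOT EXISTS apod (
                date        DATE PRIMARY KEY,
                title       TEXT,
                explanation TEXT,
                url         TEXT
            );
            """
        )

    # Fetch the Astronomy Picture of the Day; the real API key would normally
    # live in the nasa_api connection, DEMO_KEY is just a stand-in here.
    extract = SimpleHttpOperator(
        task_id="extract_apod",
        http_conn_id="nasa_api",
        endpoint="planetary/apod",
        method="GET",
        data={"api_key": "DEMO_KEY"},
        response_filter=lambda response: response.json(),
    )

    @task
    def transform(payload: dict) -> dict:
        # Keep only the fields that end up in the table.
        return {k: payload.get(k) for k in ("date", "title", "explanation", "url")}

    @task
    def load(record: dict):
        # Parameterized insert; ON CONFLICT keeps daily re-runs idempotent.
        PostgresHook(postgres_conn_id="my_postgres_connection").run(
            """
            INSERT INTO apod (date, title, explanation, url)
            VALUES (%s, %s, %s, %s)
            ON CONFLICT (date) DO NOTHING;
            """,
            parameters=(record["date"], record["title"],
                        record["explanation"], record["url"]),
        )

    # create table → extract → transform → load
    create_table() >> extract
    load(transform(extract.output))


nasa_apod_etl()
```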
🔍 Airflow Concepts Applied
- DAG: The pipeline is defined as a DAG scheduled to run daily (`@daily`), ensuring idempotent ETL behavior.
- Hooks: `PostgresHook` allows executing SQL queries securely and cleanly from Python tasks.
- Operators: `SimpleHttpOperator` handled the API request; `@task` decorators handled transformation and loading.
- Connections: Airflow's Connection UI is used to store both the API key (`nasa_api`) and DB credentials (`my_postgres_connection`), keeping the code clean and secure (see the small lookup sketch after this list).
- Task Dependencies: DAG flow enforces the order: create table → extract → transform → load.
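A tiny sketch of what the Connections piece buys you at runtime: any task can resolve stored credentials by connection ID instead of hard-coding them. What `extra_dejson` contains depends on what was saved in the UI.

```python
# Minimal sketch: resolving connections stored in the Airflow UI by their IDs,
# so no credentials ever appear in the DAG file itself.
from airflow.hooks.base import BaseHook

api_conn = BaseHook.get_connection("nasa_api")
print(api_conn.host)          # base URL of the API
print(api_conn.extra_dejson)  # extras stored as JSON, e.g. an API key

db_conn = BaseHook.get_connection("my_postgres_connection")
print(db_conn.get_uri())      # URI built from the stored host/login/password/port
```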
Studied/Researched