Roadmap to become a Data Engineer

A STEP-BY-STEP APPROACH TO GET CLOSER TO YOUR DREAM JOB PROFILE!

Hey troubleshooters! I'm dropping a new series for budding data engineers who want to step into the world of data, a world often summed up by the phrase "data is the new oil". Without wasting time, let's dive into a fascinating journey that follows a modular approach to learning, with hands-on practice on real-time projects.

MODULE 1: TASTE OF DATA ENGINEERING FIELD THROUGH SOME YOUTUBE VIDEOS

MODULE 2: DATA FUNDAMENTALS YOU NEED TO KNOW

  1. DATA ARCHITECTURE

  2. DATA MODELLING

  3. DATA PREPROCESSING

  4. DATA EXTRACTION

  5. DATA TRANSFORMATION

  6. DATA LOADING

  7. DATA VISUALISATION

  8. DATA LAKE

  9. DATA WAREHOUSING

  10. DATA MINING

  11. DATA ORCHESTRATION

  12. DATA STORAGE

  13. DATA COMPUTATION

  14. DATA MANAGEMENT

  15. DATA PROCESSING

  16. DATA PIPELINE

  17. DATA DEPLOYMENT

  18. DATA FILE FORMATS

  19. DATA SOURCE

  20. DATA EXPLORATION LIBRARIES

  21. DATA FRAMEWORK

  22. METADATA

MODULE 3: PYTHON DEEP DIVE

3.1. PYTHON BASICS

  • PYTHON FUNDAMENTALS

  • OPERATORS, CONDITIONAL STATEMENTS AND LOOPS

  • STRINGS WITH SOME SAMPLE PROBLEMS

  • LISTS

  • TUPLES, SETS AND DICTIONARY

  • FUNCTIONS

3.2 PYTHON INTERMEDIATE

  • OBJECT-ORIENTED PROGRAMMING(OOP) - CLASS & OBJECTS

  • OBJECT-ORIENTED PROGRAMMING(OOP) -ENCAPSULATION & STATIC KEYWORD

  • OBJECT-ORIENTED PROGRAMMING(OOP) - INHERITANCE & POLYMORPHISM

  • OBJECT-ORIENTED PROGRAMMING(OOP) - DATA ABSTRACTION

  • FILE HANDLING, SERIALISATION & DESERIALISATION

  • RECURSION USING PYTHON

  • EXCEPTION HANDLING, MODULES & PACKAGES

  • DECORATORS & NAMESPACES

  • ITERATORS IN PYTHON

  • GENERATORS IN PYTHON

  • MUTABILITY, GARBAGE COLLECTION & VARIABLE REFERENCING

  • LAMBDA FUNCTIONS

  • THREADING & MULTIPROCESSING

  • PYTHON MULTITHREADING

  • BUILDING GUI USING PYTHON

  • WALLPAPER VIEWER APPLICATION USING PYTHON

  • CALCULATOR GUI APPLICATION USING PYTHON

  • NEWS APPLICATION IN PYTHON

  • SOME PYTHON PROBLEM SETS TO SKILL-UP
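
To give a taste of two topics from the list above, here is a minimal sketch of a decorator and a generator working together. The names `timed`, `batches` and `total_in_batches` are made up for this illustration and use only the standard library; the generator hands out rows in small batches, which is a common pattern when a file is too large to load at once.

```python
import time
from functools import wraps


def timed(func):
    """Decorator: store the duration of the last call on the wrapper."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        wrapper.last_elapsed = time.perf_counter() - start
        return result
    wrapper.last_elapsed = None
    return wrapper


def batches(rows, size):
    """Generator: yield lists of at most `size` rows, one batch at a time."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # leftover rows that did not fill a whole batch
        yield batch


@timed
def total_in_batches(rows, size):
    """Sum the rows batch by batch, as a loader might process a large file."""
    return sum(sum(batch) for batch in batches(rows, size))


print(list(batches(range(7), 3)))      # [[0, 1, 2], [3, 4, 5], [6]]
print(total_in_batches(range(10), 4))  # 45
```

Because `batches` is a generator, only one batch lives in memory at a time, and `functools.wraps` keeps the decorated function's original name and docstring.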

3.3 PYTHON ADVANCED

  • NUMPY FUNDAMENTALS

  • ADVANCED NUMPY

  • NUMPY TRICKS

  • PYTHON FLASK WEB DEV

  • PANDAS SERIES

  • PANDAS DATAFRAME

  • PANDAS SERIES METHODS

  • PYTHON REST APIs

  • GROUP BY OBJECT IN PANDAS

  • MERGING, JOINING & CONCATENATING IN PANDAS

  • PYTHON STREAMLIT (BUILD INTERACTIVE WEB APP)

  • CASE STUDIES: DATA ANALYSIS WITH PYTHON PANDAS

  • MULTI-INDEX SERIES & DATAFRAME IN PANDAS

  • VECTORISED STRING OPERATIONS & DATETIME IN PANDAS

  • PYTHON PANDAS TIME SERIES ANALYSIS

  • PLOTTING USING MATPLOTLIB

  • ADVANCED MATPLOTLIB

BONUS TIPS: PAID COURSE FOR PYTHON IN DATA ENGINEERING

ANOTHER CRASH COURSE (FREE) BY CODEWITHHARRY ALONG WITH PROJECTS

MODULE 4: DATA STRUCTURES AND ALGORITHMS USING PYTHON

BOOK RECOMMENDATION FOR DSA PREPARATION

MODULE 5: STRUCTURED QUERY LANGUAGE

  • DATABASE FUNDAMENTALS

  • SQL DDL COMMANDS

  • SQL DML COMMANDS

  • SQL GROUPING & SORTING

  • SQL JOINS

  • SUBQUERIES IN SQL

  • SQL CASE STUDIES

  • MAKING DASHBOARDS USING PYTHON AND SQL

  • WINDOW FUNCTIONS IN SQL

  • TABLEAU PROJECT FOR BEGINNERS

  • A COMPLETE SQL WALKTHROUGH WITH EXAMPLES AND PROPER EXPLANATIONS WILL FOLLOW IN THE UPCOMING BLOGS
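
Until the detailed SQL walkthrough arrives, here is a small sketch of grouping, sorting and a window function, run from Python with the built-in `sqlite3` module. The `orders` table and its rows are invented for this example, and SQLite is only an assumption made so the snippet runs without any setup (window functions need SQLite 3.25 or newer).

```python
import sqlite3

# In-memory database with a made-up `orders` table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    INSERT INTO orders (customer, amount) VALUES
        ('asha', 120.0), ('asha', 80.0),
        ('ben', 200.0), ('ben', 50.0), ('ben', 75.0);
""")

# Grouping and sorting: total spend per customer, highest first.
totals = conn.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
""").fetchall()

# Window function: rank each order inside its own customer partition.
ranked = conn.execute("""
    SELECT customer, amount,
           RANK() OVER (PARTITION BY customer ORDER BY amount DESC) AS rnk
    FROM orders
    ORDER BY customer, rnk
""").fetchall()

print(totals)  # [('ben', 325.0), ('asha', 200.0)]
print(ranked)
```

Notice that `GROUP BY` collapses each customer to one row, while the window function keeps every order and simply adds a rank next to it.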

MODULE 6: LINUX AND SHELL SCRIPTING

MODULE 7: DATA ENGINEERING PROJECT PORTFOLIO

PROJECT 1:

  1. BUILDING DATA MODEL AND DATABASE

  2. CREATING THE DATABASE & BUILDING TABLES WITH PYTHON

  3. DEPLOYING THE DATA MODEL INTO THE DATABASE
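
The three steps of this project can be sketched roughly as follows. This is a minimal illustration, assuming SQLite through Python's built-in `sqlite3` module; the `artists` and `songs` tables and the helper names `deploy_model` and `list_tables` are hypothetical and stand in for whatever data model and database the project actually uses.

```python
import sqlite3

# Step 1: the data model, written as DDL (hypothetical tables).
DDL_STATEMENTS = [
    """CREATE TABLE IF NOT EXISTS artists (
           artist_id INTEGER PRIMARY KEY,
           name      TEXT NOT NULL
       )""",
    """CREATE TABLE IF NOT EXISTS songs (
           song_id   INTEGER PRIMARY KEY,
           artist_id INTEGER NOT NULL REFERENCES artists(artist_id),
           title     TEXT NOT NULL
       )""",
]


def deploy_model(conn, statements):
    """Step 3: run every DDL statement inside one transaction."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    with conn:  # commits on success, rolls back on error
        for statement in statements:
            conn.execute(statement)


def list_tables(conn):
    """Return the names of all tables currently in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


# Step 2: create the database (in memory here) and build the tables.
conn = sqlite3.connect(":memory:")
deploy_model(conn, DDL_STATEMENTS)
print(list_tables(conn))  # ['artists', 'songs']
```

Keeping the DDL in a plain list makes the model easy to review, and `IF NOT EXISTS` lets `deploy_model` run more than once without failing.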

PROJECT 2:

  1. WHAT IS A DATA WAREHOUSE?

  2. TECH SIDE OF THE DATA WAREHOUSE

  3. SCHEMA DESIGN AND USING SQL FOR DATA ANALYSIS

  4. HOW TO BUILD A STAR SCHEMA AND UNDERSTAND QUERY-TIME ANALYSIS
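
To make the star schema idea concrete, here is a tiny sketch: one fact table in the middle and two dimension tables around it. The table names (`fact_sales`, `dim_product`, `dim_date`) and the data are invented for this example, and SQLite via `sqlite3` is just an assumption so it runs anywhere; a real warehouse would use a dedicated engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- Fact table: measurements plus a key into each dimension.
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        revenue    REAL
    );

    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
    INSERT INTO fact_sales (product_id, date_id, revenue) VALUES
        (1, 20240101, 10.0), (1, 20240201, 15.0),
        (2, 20240101, 40.0), (2, 20240101, 5.0);
""")

# A typical analytical query: join the fact table to its dimensions,
# then aggregate by dimension attributes.
revenue_by_category_month = conn.execute("""
    SELECT p.category, d.month, SUM(f.revenue) AS revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date    AS d ON d.date_id    = f.date_id
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()

print(revenue_by_category_month)
# [('books', 1, 10.0), ('books', 2, 15.0), ('games', 1, 45.0)]
```

Every analytical question becomes "join the fact table to the dimensions you care about, then group", which is what keeps star schema queries simple to write.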

MODULE 8: DATA ENGINEERING BRIEFING

  • BIG DATA FUNDAMENTALS

  • DATABASES VS DATA WAREHOUSES VS DATA LAKES

  • DATA WAREHOUSES

  • DATA PIPELINES

  • DIFFERENT DATA FILE FORMATS IN BIG DATA ENGINEERING

  • ETL, OLTP VS OLAP

  • DATA PROCESSING - REAL-TIME VS BATCH

  • STOCK MARKET REAL-TIME DATA ANALYSIS USING KAFKA (MINI PROJECT)

MODULE 9: BIG DATA FRAMEWORKS

  1. APACHE SPARK

  2. APACHE KAFKA

MODULE 10: DATA ORCHESTRATION

AIRFLOW - ALL FUNDAMENTALS AND DEEP DIVE INTO CORE CONCEPTS

MODULE 11: TWITTER DATA PIPELINE (MINI PROJECT)

  • END-TO-END DATA ENGINEERING PROJECT USING AIRFLOW AND PYTHON

  • EXTRACTING DATA USING TWITTER API

  • USING PYTHON TO TRANSFORM DATA

  • DEPLOY THE CODE ON AIRFLOW/EC2

  • SAVE THE FINAL RESULT ON AMAZON S3
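
The overall shape of such a pipeline (extract, transform, load) can be sketched in plain Python. This is only an outline under loud assumptions: `extract` returns hard-coded sample records instead of calling the Twitter API, `load` writes CSV to an in-memory buffer instead of Amazon S3, and Airflow is left out entirely. All function names and data here are made up.

```python
import csv
import io


def extract():
    """Stand-in for the API call: returns made-up sample records."""
    return [
        {"user": "alice", "text": "  Learning Airflow!  ", "likes": "12"},
        {"user": "bob", "text": "Data is the new oil", "likes": "7"},
        {"user": "carol", "text": "", "likes": "3"},
    ]


def transform(records):
    """Trim the text, cast likes to int, and drop records with empty text."""
    cleaned = []
    for record in records:
        text = record["text"].strip()
        if not text:
            continue
        cleaned.append(
            {"user": record["user"], "text": text, "likes": int(record["likes"])}
        )
    return cleaned


def load(records, target):
    """Write records as CSV to any file-like target (local stand-in for S3)."""
    writer = csv.DictWriter(target, fieldnames=["user", "text", "likes"])
    writer.writeheader()
    writer.writerows(records)
    return len(records)


def run_pipeline(target):
    """Chain the three stages and return the number of rows loaded."""
    return load(transform(extract()), target)


buffer = io.StringIO()
print(run_pipeline(buffer))  # 2
print(buffer.getvalue())
```

Keeping each stage as its own function means the same three steps can later be wired up as separate tasks in an orchestrator, with the stand-ins swapped for the real API and storage.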

MODULE 12: CLOUD COMPUTING AND HANDS-ON PROJECT

  1. AWS OFFICIAL DOC FOR LEARNING PURPOSE

  2. WHAT IS CLOUD COMPUTING?

  3. AWS ACCOUNT SETUP

  4. ON-PREMISE VS CLOUD SERVERS AND SELF-MANAGED VS CLOUD-MANAGED

  5. UNDERSTAND AWS REDSHIFT BASICS & ARCHITECTURE IN DETAIL

  6. WHAT IS A DATA PIPELINE, AND HOW IS DATA LOADED INTO THE DATA WAREHOUSE?

  7. HOW TO CREATE A REDSHIFT CLUSTER AND LOAD DATA?

  8. DEVOPS? DATAOPS? INFRASTRUCTURE AS CODE?

  9. BUILD DATA PIPELINE USING PYTHON INFRASTRUCTURE AS CODE

  10. BUILDING DATA PIPELINE & LOADING DATA INTO REDSHIFT USING 'COPY' COMMAND

  11. OPTIMISING REDSHIFT DATA WAREHOUSE USING 'DIST' & 'SORT' KEY

    PROJECT 1: COVID-19 DATA ANALYSIS END-TO-END PROJECT

    PROJECT 2: YOUTUBE DATA ANALYSIS END-TO-END PROJECT

MODULE 13: DATA WAREHOUSING TOOL

SNOWFLAKE

MODULE 14: LEARN MODERN DATA STACK

  1. LEARN BASICS :

    analyticsindiamag.com/modern-data-stack-and..

  2. DBT:

  3. AIRBYTE:

  4. FIVETRAN:

    REFER TO OFFICIAL DOC FOR FURTHER LEARNING

MODULE 15: DATAOPS

  1. KUBERNETES

  2. DOCKER

MODULE 16: REAL-WORLD CASE STUDIES

Netflix -

AWS-

GCP -

MODULE 17: PROJECTS! PROJECTS! PROJECTS!

  • SPOTIFY END-TO-END DATA ENGINEERING PROJECT

  • iPHONE END-TO-END DATA ENGINEERING PROJECT

Hey, you read till here, thanks a bunch! I hope you liked it. The upcoming blogs will cover each module mentioned above in detail; until then, explore and share. And yes, don't forget to give feedback, it would be highly appreciated :)

Support Hritika Pal by becoming a sponsor. Any amount is appreciated!