Roadmap to become a Data Engineer
STEP BY STEP APPROACH TO GET CLOSER TO YOUR DREAM JOB PROFILE!
Hey troubleshooters! I'm dropping a new series for budding data engineers who want to step into the world of data, where the saying goes that DATA IS THE NEW OIL. Without wasting time, let's dive deep into a fascinating journey with a modular approach to learning and hands-on practice with real-time projects.
MODULE 1: A TASTE OF THE DATA ENGINEERING FIELD THROUGH SOME YOUTUBE VIDEOS
MODULE 2: DATA FUNDAMENTALS YOU NEED TO KNOW
DATA ARCHITECTURE
DATA MODELLING
DATA PREPROCESSING
DATA EXTRACTION
DATA TRANSFORMATION
DATA LOADING
DATA VISUALISATION
DATA LAKE
DATA WAREHOUSING
DATA MINING
DATA ORCHESTRATION
DATA STORAGE
DATA COMPUTATION
DATA MANAGEMENT
DATA PROCESSING
DATA PIPELINE
DATA DEPLOYMENT
DATA FILE FORMATS
DATA SOURCE
DATA EXPLORATION LIBRARIES
DATA FRAMEWORK
METADATA
MODULE 3: PYTHON DEEP DIVE
3.1. PYTHON BASICS
PYTHON FUNDAMENTALS
OPERATORS, CONDITIONAL STATEMENTS AND LOOPS
STRINGS WITH SOME SAMPLE PROBLEMS
LISTS
TUPLES, SETS AND DICTIONARIES
FUNCTIONS
3.2 PYTHON INTERMEDIATE
OBJECT-ORIENTED PROGRAMMING(OOP) - CLASS & OBJECTS
OBJECT-ORIENTED PROGRAMMING(OOP) - ENCAPSULATION & STATIC KEYWORD
OBJECT-ORIENTED PROGRAMMING(OOP) - INHERITANCE & POLYMORPHISM
OBJECT-ORIENTED PROGRAMMING(OOP) - DATA ABSTRACTION
FILE HANDLING, SERIALISATION & DESERIALISATION
RECURSION USING PYTHON
EXCEPTION HANDLING, MODULES & PACKAGES
DECORATORS & NAMESPACES
ITERATORS IN PYTHON
GENERATORS IN PYTHON
MUTABILITY, GARBAGE COLLECTION & VARIABLE REFERENCING
LAMBDA FUNCTIONS
THREADING & MULTIPROCESSING
PYTHON MULTITHREADING
BUILDING GUI USING PYTHON
WALLPAPER VIEWER APPLICATION USING PYTHON
CALCULATOR GUI APPLICATION USING PYTHON
NEWS APPLICATION IN PYTHON
SOME PYTHON PROBLEM SETS TO SKILL-UP
3.3 PYTHON ADVANCED
NUMPY FUNDAMENTALS
ADVANCED NUMPY
NUMPY TRICKS
PYTHON FLASK WEB DEV
PANDAS SERIES
PANDAS DATA FRAME
PANDAS SERIES METHODS
PYTHON REST APIs
GROUP BY OBJECT IN PANDAS
MERGING, JOINING & CONCATENATING IN PANDAS
PYTHON STREAMLIT (BUILD INTERACTIVE WEB APP)
CASE STUDIES: DATA ANALYSIS WITH PYTHON PANDAS
MULTI-INDEX SERIES & DATA FRAME IN PANDAS
VECTORISED STRING OPERATIONS, DATETIME IN PANDAS
PYTHON PANDAS TIME SERIES ANALYSIS
PLOTTING USING MATPLOTLIB
ADVANCED MATPLOTLIB
BONUS TIPS: PAID COURSE FOR PYTHON IN DATA ENGINEERING
ANOTHER CRASH COURSE (FREE) BY CODEWITHHARRY ALONG WITH PROJECTS
MODULE 4: DATA STRUCTURES AND ALGORITHMS USING PYTHON
BOOK RECOMMENDATION FOR DSA PREPARATION
MODULE 5: STRUCTURED QUERY LANGUAGE
DATABASE FUNDAMENTALS
SQL DDL COMMANDS
SQL DML COMMANDS
SQL GROUPING & SORTING
SQL JOINS
SUBQUERIES IN SQL
SQL CASE STUDIES
MAKING DASHBOARDS USING PYTHON AND SQL
WINDOW FUNCTIONS IN SQL
TABLEAU PROJECT FOR BEGINNERS
A COMPLETE SQL WALKTHROUGH WITH EXAMPLES AND PROPER EXPLANATIONS WILL FOLLOW IN THE UPCOMING BLOGS
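To give a quick feel for the window functions topic from this module, here is a minimal sketch using Python's built-in sqlite3 module. The employees table, its columns and the sample rows are all made up for illustration, and it assumes your Python ships with SQLite 3.25 or newer (the first version with window functions).

```python
import sqlite3

# Hypothetical sample data: (employee, department, salary).
ROWS = [
    ("asha", "data", 120),
    ("ben", "data", 100),
    ("chen", "data", 100),
    ("dev", "web", 90),
    ("eli", "web", 80),
]


def rank_salaries(rows):
    """Rank employees by salary within each department."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
    # RANK() restarts for every department because of PARTITION BY.
    query = """
        SELECT name, dept, salary,
               RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS dept_rank
        FROM employees
        ORDER BY dept, dept_rank, name
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


ranked = rank_salaries(ROWS)
for row in ranked:
    print(row)
```

Unlike GROUP BY, the window function keeps every row and just adds the rank next to it, so employees with equal salaries share the same rank.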
MODULE 6: LINUX AND SHELL SCRIPTING
MODULE 7: DATA ENGINEERING PROJECT PORTFOLIO
PROJECT 1:
BUILDING DATA MODEL AND DATABASE
CREATING THE DATABASE & BUILDING TABLES WITH PYTHON
DEPLOYING THE DATA MODEL INTO THE DATABASE
PROJECT 2:
WHAT IS A DATA WAREHOUSE?
TECH SIDE OF THE DATA WAREHOUSE
SCHEMA DESIGN AND USING SQL FOR DATA ANALYSIS
HOW TO BUILD A STAR SCHEMA AND UNDERSTAND QUERY-TIME ANALYSIS
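If the star schema idea from Project 2 feels abstract, the rough sketch below may help. It builds one fact table surrounded by two dimension tables in an in-memory SQLite database and runs a typical analysis query. The table names (fact_sales, dim_product, dim_date) and all the rows are hypothetical, and a real warehouse would use a proper engine rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables describe the "who/what/when" of each fact.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);

    -- The fact table stores measurements plus keys pointing at the dimensions.
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount REAL
    );

    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO dim_date VALUES (10, 2022), (11, 2023);
    INSERT INTO fact_sales VALUES
        (1, 1, 10, 20.0), (2, 1, 11, 30.0), (3, 2, 11, 50.0), (4, 2, 11, 25.0);
""")


def sales_by_category_and_year(connection):
    """Join the fact table to its dimensions and total the sales."""
    query = """
        SELECT p.category, d.year, SUM(f.amount) AS total
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY p.category, d.year
        ORDER BY p.category, d.year
    """
    return connection.execute(query).fetchall()


totals = sales_by_category_and_year(conn)
for row in totals:
    print(row)
```

The pattern to notice is that every analysis query starts from the fact table and joins outwards to whichever dimensions it needs.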
MODULE 8: DATA ENGINEERING BRIEFING
BIG DATA FUNDAMENTALS
DATABASES VS DATA WAREHOUSES VS DATA LAKES
DATA WAREHOUSES
DATA PIPELINES
DIFFERENT DATA FILE FORMATS IN BIG DATA ENGINEERING
ETL, OLTP VS OLAP
DATA PROCESSING - REAL-TIME VS BATCH
STOCK MARKET REAL-TIME DATA ANALYSIS USING KAFKA (MINI PROJECT)
MODULE 9: BIG DATA FRAMEWORKS
APACHE SPARK
APACHE KAFKA
MODULE 10: DATA ORCHESTRATION
AIRFLOW - ALL FUNDAMENTALS AND DEEP DIVE INTO CORE CONCEPTS
MODULE 11: TWITTER DATA PIPELINE (MINI PROJECT)
END-TO-END DATA ENGINEERING PROJECT USING AIRFLOW AND PYTHON
EXTRACTING DATA USING TWITTER API
USING PYTHON TO TRANSFORM DATA
DEPLOY THE CODE ON AIRFLOW/EC2
SAVE THE FINAL RESULT ON AMAZON S3
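The four steps of this mini project follow the classic extract, transform, load shape. Here is a minimal sketch of that shape in plain Python, under some loud assumptions: extract() returns hard-coded sample records instead of calling the Twitter API, and load() writes CSV into an in-memory buffer instead of Amazon S3. The function names and record fields are hypothetical, and in the real project each function would typically become a task scheduled by Airflow.

```python
import csv
import io


def extract():
    """Stand-in for an API call; returns hypothetical raw records."""
    return [
        {"user": "alice", "text": "  Data is the new oil  ", "likes": "12"},
        {"user": "bob", "text": "Learning Airflow", "likes": "3"},
        {"user": "alice", "text": "", "likes": "0"},
    ]


def transform(records):
    """Drop empty posts, trim whitespace and cast like counts to int."""
    cleaned = []
    for record in records:
        text = record["text"].strip()
        if not text:
            continue  # skip posts with no content
        cleaned.append(
            {"user": record["user"], "text": text, "likes": int(record["likes"])}
        )
    return cleaned


def load(rows, target):
    """Write rows as CSV to a file-like target (stand-in for an S3 upload)."""
    writer = csv.DictWriter(target, fieldnames=["user", "text", "likes"])
    writer.writeheader()
    writer.writerows(rows)


def run_pipeline(target):
    """Run extract -> transform -> load and return the number of rows loaded."""
    rows = transform(extract())
    load(rows, target)
    return len(rows)


buffer = io.StringIO()
row_count = run_pipeline(buffer)
print(buffer.getvalue())
```

Keeping the three stages as separate functions makes each one easy to test on its own and easy to swap out later, for example replacing the buffer with a real upload.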
MODULE 12: CLOUD COMPUTING AND HANDS-ON PROJECT
OFFICIAL AWS DOCUMENTATION FOR LEARNING
WHAT IS CLOUD COMPUTING?
AWS ACCOUNT SETUP
ON-PREMISE VS CLOUD SERVERS AND SELF-MANAGED VS CLOUD MANAGED
UNDERSTAND AWS REDSHIFT BASICS & ARCHITECTURE IN DETAIL
WHAT IS A DATA PIPELINE, AND HOW IS DATA LOADED INTO THE DATA WAREHOUSE?
HOW TO CREATE A REDSHIFT CLUSTER AND LOAD DATA?
DEVOPS? DATAOPS? INFRASTRUCTURE AS CODE?
BUILD A DATA PIPELINE USING PYTHON AND INFRASTRUCTURE AS CODE
BUILDING DATA PIPELINE & LOADING DATA INTO REDSHIFT USING 'COPY' COMMAND
OPTIMISING REDSHIFT DATA WAREHOUSE USING 'DIST' & 'SORT' KEYS
PROJECT 1: COVID-19 DATA ANALYSIS END-TO-END PROJECT
PROJECT 2: YOUTUBE DATA ANALYSIS END-TO-END PROJECT
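Going back to the 'COPY' command and the 'DIST' and 'SORT' keys from this module, here is a small sketch that only builds the SQL text you would send to Redshift; it does not connect to anything. The table name, columns, S3 path and IAM role ARN are placeholders made up for illustration, and the statements follow the Redshift syntax as I understand it, so do check them against the official docs before running them on a real cluster.

```python
def build_create_table(table, columns, dist_key, sort_key):
    """Build a CREATE TABLE statement with a distribution key and a sort key."""
    column_sql = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} ({column_sql}) "
        f"DISTKEY({dist_key}) SORTKEY({sort_key});"
    )


def build_copy(table, s3_path, iam_role):
    """Build a COPY statement that bulk-loads a CSV file (with header) from S3."""
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV IGNOREHEADER 1;"
    )


# Hypothetical table, bucket and role, used only to show the output shape.
create_sql = build_create_table(
    "events",
    [("user_id", "INTEGER"), ("event_time", "TIMESTAMP"), ("action", "VARCHAR(50)")],
    dist_key="user_id",
    sort_key="event_time",
)
copy_sql = build_copy(
    "events",
    "s3://example-bucket/events.csv",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(create_sql)
print(copy_sql)
```

The string formatting here is fine for a learning exercise with values you control, but it should not be used with untrusted input.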
MODULE 13: DATA WAREHOUSING TOOL
SNOWFLAKE
MODULE 14: LEARN MODERN DATA STACK
LEARN BASICS:
DBT
AIRBYTE
FIVETRAN
REFER TO THE OFFICIAL DOCS FOR FURTHER LEARNING
MODULE 15: DATAOPS
KUBERNETES
DOCKER
MODULE 16: REAL-WORLD CASE STUDIES
NETFLIX
AWS
GCP
MODULE 17: PROJECTS! PROJECTS! PROJECTS!
SPOTIFY END-TO-END DATA ENGINEERING PROJECT
iPHONE END-TO-END DATA ENGINEERING PROJECT
Hey, you read till here, thanks a bunch! Hope you liked it. Upcoming blogs will cover each and every module mentioned above in detail; until then, explore and share. And yes, don't forget to give feedback, which would be highly appreciated :)