SQL: 1. Create pivot tables. 2. Find the gaps in a sequence and aggregate the result. Python: 1. Tricky Python questions about functions and their output. 2. Basic-level coding. PySpark: 1. Review these PySpark topics: window functions, date_diff, CASE WHEN, watermarks, and aggregation.
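One possible way to approach the sequence-gap question above is sketched below. It runs the SQL through Python's built-in sqlite3 module so it can be tried locally. The table name `seq`, the column `id`, and the function name `find_sequence_gaps` are made up for illustration, and the sketch assumes unique integer ids and an SQLite build with window-function support (3.25 or newer).

```python
import sqlite3


def find_sequence_gaps(ids):
    """Return (gap_start, gap_end, missing_count) for each gap in ids.

    Assumes ids are unique integers; the table and column names are
    hypothetical and exist only for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE seq (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO seq (id) VALUES (?)", [(i,) for i in ids])
    query = """
        WITH neighbours AS (
            -- pair every id with the next id in sorted order
            SELECT id, LEAD(id) OVER (ORDER BY id) AS next_id
            FROM seq
        )
        SELECT id + 1           AS gap_start,
               next_id - 1      AS gap_end,
               next_id - id - 1 AS missing_count
        FROM neighbours
        WHERE next_id - id > 1
        ORDER BY gap_start
    """
    gaps = conn.execute(query).fetchall()
    conn.close()
    return gaps


gaps = find_sequence_gaps([1, 2, 3, 6, 7, 10])
# Aggregate step: total number of missing values across all gaps.
total_missing = sum(missing for _, _, missing in gaps)
```

The same LEAD-based query should carry over to most warehouses with little change; only the aggregation done in Python here would move into an outer SELECT.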
Senior Data Engineer Interview Questions
2,553 senior data engineer interview questions shared by candidates
How is Kafka different from MQ?
Round 1
Coding Questions: Write a program to count the number of binary 1s in a number. Example: For 7 → Output should be 3. Write a program to check whether the given input contains valid parentheses.
SQL Question: List all employees whose salary is greater than the average salary of their respective department.
Scala Theory Question: What is implicit in Scala?
Spark Questions: How do you copy data from HDFS to the local file system? What is the command for spark-submit? What is the difference between reduceByKey and groupByKey? What happens during a broadcast join in Spark, and how does it reduce shuffling? What happens when we use the collect() action in Spark? How do you define the minimum and maximum number of executors in Spark?
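The two coding questions and the SQL question above could be approached roughly as follows. This is a sketch, not a model answer: all function, table, and column names are invented for illustration, the parentheses check assumes three bracket types and ignores every other character, and the SQL runs against an in-memory SQLite table as a stand-in for whatever database the interviewer had in mind.

```python
import sqlite3


def count_set_bits(n: int) -> int:
    """Count the binary 1s in a non-negative integer."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    count = 0
    while n:
        n &= n - 1  # clears the lowest set bit
        count += 1
    return count


def has_valid_parentheses(text: str) -> bool:
    """Check that (), [] and {} are balanced and properly nested.

    Assumption: non-bracket characters are ignored.
    """
    closer_to_opener = {")": "(", "]": "[", "}": "{"}
    openers = set(closer_to_opener.values())
    stack = []
    for ch in text:
        if ch in openers:
            stack.append(ch)
        elif ch in closer_to_opener:
            if not stack or stack.pop() != closer_to_opener[ch]:
                return False
    return not stack  # leftover openers mean the input is unbalanced


def employees_above_department_average(rows):
    """rows: iterable of (name, department, salary) tuples (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
    query = """
        SELECT e.name, e.department, e.salary
        FROM employees AS e
        WHERE e.salary > (
            -- correlated subquery: average salary of this employee's department
            SELECT AVG(d.salary)
            FROM employees AS d
            WHERE d.department = e.department
        )
        ORDER BY e.department, e.name
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

The bit count uses the `n & (n - 1)` trick, so the loop runs once per set bit; `bin(n).count("1")` would be a shorter alternative if the interviewer allows built-ins.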
Times X{open, close}, Y{open, close}

rec_type, status, time
x1, open, 930
x1, close, 1030
x2, open, 1035
y1, open, 1040
y2, open, 1041
x2, close, 1100
x3, open, 1110
x3, close, 1115
y1, close, 1120
y2, close, 1121

|---x1, open, 930
|---x1, close, 1030
|-----x2, open, 1035
|        y1, open, 1040----|
|        y2, open, 1041 ---+---|
|-----x2, close, 1100      |   |
|---x3, open, 1110         |   |
|---x3, close, 1115        |   |
         y1, close, 1120---|   |
         y2, close, 1121-------|

Find the pairs of x-type and y-type where they have any time overlap between them.
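A minimal Python sketch of the overlap question above, using the sample rows as given. It assumes every record has exactly one open and one close event, that times compare as plain integers, and that touching endpoints do not count as overlap; the names `EVENTS`, `build_intervals`, and `overlapping_pairs` are made up for illustration.

```python
from itertools import product

# Sample rows from the question: (rec_type, status, time)
EVENTS = [
    ("x1", "open", 930), ("x1", "close", 1030),
    ("x2", "open", 1035), ("y1", "open", 1040),
    ("y2", "open", 1041), ("x2", "close", 1100),
    ("x3", "open", 1110), ("x3", "close", 1115),
    ("y1", "close", 1120), ("y2", "close", 1121),
]


def build_intervals(events):
    """Collapse open/close events into {rec_type: (open_time, close_time)}."""
    intervals = {}
    for rec, status, time in events:
        start, end = intervals.get(rec, (None, None))
        if status == "open":
            start = time
        else:
            end = time
        intervals[rec] = (start, end)
    return intervals


def overlapping_pairs(events):
    """Return (x, y) pairs whose intervals share any stretch of time."""
    intervals = build_intervals(events)
    xs = sorted(r for r in intervals if r.startswith("x"))
    ys = sorted(r for r in intervals if r.startswith("y"))
    pairs = []
    for x, y in product(xs, ys):
        x_open, x_close = intervals[x]
        y_open, y_close = intervals[y]
        # Two intervals overlap when each one starts before the other ends.
        if x_open < y_close and y_open < x_close:
            pairs.append((x, y))
    return pairs


print(overlapping_pairs(EVENTS))
```

In SQL the same idea would be a self-join of the pivoted x intervals against the pivoted y intervals on the condition `x.open < y.close AND y.open < x.close`.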
Java OOP concepts. Difference between an interface and an abstract class. Java memory management optimization. Checked vs. unchecked exceptions. Definition of microservices. What is new in Python 3? Different types of Python structures. Definition of monkey patching. Mostly definitions around software engineering practices.
Various Python, Snowflake, and dbt questions. More heavily weighted to Python, as noted above. Some ETL vs ELT and OLAP vs OLTP questions.
Design an ETL pipeline that loads data every hour from system X to system Y with consistency and reliability, and describe the edge cases.
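One common angle on the design question above is to make each hourly run idempotent, so that retries and backfills do not duplicate rows. The sketch below illustrates that idea only. SQLite stands in for system Y, a plain list of tuples stands in for system X, and the `events` table, its columns, and all function names are hypothetical. It assumes naive timestamps with whole-second precision.

```python
import sqlite3
from datetime import datetime, timedelta


def create_target():
    """In-memory stand-in for system Y (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER, event_time TEXT, value REAL)")
    return conn


def hour_window(run_time):
    """Return [start, end) of the last full hour before run_time."""
    end = run_time.replace(minute=0, second=0, microsecond=0)
    return end - timedelta(hours=1), end


def load_hour(source_rows, target, run_time):
    """Reload one hourly window; safe to re-run for the same run_time.

    source_rows: iterable of (id, event_time: datetime, value) from system X.
    """
    start, end = hour_window(run_time)
    batch = [
        (rid, ts.isoformat(), value)
        for rid, ts, value in source_rows
        if start <= ts < end  # half-open window avoids double-counting boundaries
    ]
    with target:  # one transaction: commit on success, roll back on error
        target.execute(
            "DELETE FROM events WHERE event_time >= ? AND event_time < ?",
            (start.isoformat(), end.isoformat()),
        )
        target.executemany(
            "INSERT INTO events (id, event_time, value) VALUES (?, ?, ?)", batch
        )
    return len(batch)
```

Deleting and reinserting the window inside one transaction covers the retry and partial-failure cases; late-arriving data, schema changes, and source outages are edge cases this sketch does not address.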
Interview 1: 30 min screening call with the engineering lead.
Interview 2: 60 min CoderPad coding round. I was asked to optimize text search.
Interview 3: 60 min debugging interview in Python. They give you some code with bugs and ask you to fix it so that the test cases pass.
Interview 4: 45 min data quality. I was given a football match dataset and asked to draw insights about who will buy tickets.
Interview 5: 45 min ML infra. They ask how you would push something developed by data science to production.
Interview 6: 2 hr behavioral interview. They start with your undergrad and ask about everything: your schools, universities, and so on.
Primarily on the data warehousing and cloud part
Previous data engineering experience, data pipelines and tech stack related questions.