Slowly changing dimension type 2, Unix commands, self-join queries, joins
ETL Engineer Interview Questions
1,579 ETL engineer interview questions shared by candidates
Data warehouse concepts, ETL, facts, about the project, SCD, advanced SQL
Difference between a Pandas DataFrame and a NumPy array?
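One way to illustrate the contrast asked about above is a minimal sketch, assuming `numpy` and `pandas` are installed; the column names and values are made up for illustration. It shows that a NumPy array holds a single dtype and is indexed by position, while a DataFrame keeps a dtype per column and can be indexed by label.

```python
import numpy as np
import pandas as pd

# A NumPy array is an n-dimensional block with ONE dtype, indexed by position.
arr = np.array([[1, 2], [3, 4]])
print(arr.dtype)        # a single integer dtype for the whole array
print(arr[1, 0])        # positional access: row 1, column 0

# A DataFrame is a 2-D labelled table; each column carries its own dtype.
# The column names and index labels here are hypothetical.
df = pd.DataFrame(
    {"name": ["a", "b"], "score": [1.5, 2.5]},
    index=["x", "y"],
)
print(df.dtypes)              # one dtype per column
print(df.loc["y", "score"])   # label-based access by row and column name

# Converting a mixed-type DataFrame to one array forces a common dtype.
mixed = df.to_numpy()
print(mixed.dtype)            # object, because strings and floats are mixed

# A single numeric column converts cleanly to a float array.
scores = df["score"].to_numpy()
print(scores.dtype)
```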
SQL Challenges:
a. Given two tables, "Sales" and "Customers," write a SQL query to calculate the total sales amount for each customer in the last quarter of the year, considering only customers who have made at least three purchases during that period.
b. Write a SQL query to identify the top 10% of products with the highest sales revenue in the "Products" table, excluding products that have been discontinued.
Python Coding Challenge: You have been provided with two CSV files: "employees.csv" and "departments.csv." The "employees.csv" file contains information about employees, including their names, salaries, department IDs, and hire dates. The "departments.csv" file contains department IDs and their respective names. Write a Python script to perform the following tasks:
a. Calculate the average salary for each department and store the results in a new CSV file named "average_salary_per_department.csv."
b. Identify the department with the highest average salary and print its name along with the value.
c. Determine the number of employees hired in each year and output the results in descending order.
ETL Workflow Design: Describe the end-to-end workflow you would design for an ETL process that involves extracting data from a real-time streaming source, transforming the data to fit a specific data model, and loading it into a data warehouse. Consider handling late-arriving data, data consistency, and error logging in your design.
Data Quality and Validation: Explain how you would ensure data quality during the ETL process. What techniques and checks would you implement to identify and handle anomalies, missing values, and inconsistencies in the data?
Performance Optimization: Discuss strategies for optimizing the performance of an ETL pipeline when dealing with large volumes of data. How would you approach parallel processing, partitioning, and indexing to enhance the overall performance?
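The Python coding challenge above could be approached along the following lines. This is a sketch under stated assumptions, not a reference answer: the question does not give the CSV headers, so the column names `salary`, `department_id`, `hire_date`, and `department_name` are hypothetical, hire dates are assumed to start with a four-digit year (for example `2021-03-01`), and "descending order" is read as descending by hire count. It uses only the standard library.

```python
import csv
from collections import Counter, defaultdict


def load_rows(path):
    """Read a CSV file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def average_salary_per_department(employees, departments):
    """Return {department_name: average_salary} (column names are assumed)."""
    names = {d["department_id"]: d["department_name"] for d in departments}
    salaries = defaultdict(list)
    for e in employees:
        # Fall back to the raw ID if a department is missing from departments.csv.
        dept = names.get(e["department_id"], e["department_id"])
        salaries[dept].append(float(e["salary"]))
    return {dept: sum(v) / len(v) for dept, v in salaries.items()}


def hires_per_year(employees):
    """Return [(year, count), ...] sorted by count, then year, both descending."""
    # Assumes hire_date begins with a four-digit year.
    counts = Counter(e["hire_date"][:4] for e in employees)
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)


def run(employees_path="employees.csv",
        departments_path="departments.csv",
        out_path="average_salary_per_department.csv"):
    employees = load_rows(employees_path)
    departments = load_rows(departments_path)

    # Task a: write the per-department averages to a new CSV file.
    averages = average_salary_per_department(employees, departments)
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["department_name", "average_salary"])
        for dept, avg in sorted(averages.items()):
            writer.writerow([dept, f"{avg:.2f}"])

    # Task b: report the department with the highest average salary.
    top = max(averages, key=averages.get)
    print(f"Highest average salary: {top} ({averages[top]:.2f})")

    # Task c: hires per year, most hires first.
    for year, count in hires_per_year(employees):
        print(year, count)
    return averages
```

Calling `run()` from a directory that contains both input files performs all three tasks. The sketch does not handle empty input or malformed salary values; a production script would validate those rows first.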
ETL tool-related questions, if you have hands-on experience
Stored procedures (SP), SQL queries, Talend ETL scenarios, and a brief description of my projects.
Basic DWH concepts and SQL queries
SQL and SSIS questions
Scenario-based SQL queries and Unix
Flow of the project and flow of ETL testing.