1) Scan through the steps below and import the Python libraries you need (don't worry, you can import them later if you forget one)
2) Load the data from the csv file
3) Perform basic commands to understand the data
4) Bin the following features:
   a) 'currentterm' into [0 to 11], [11 and more]
   b) 'mrr_entry' into [0 to 14.99], [14.99 to 500], [500 to 5K], [5K and more]
   c) 'account_age' into [0 to 90], [90 to 180], [180 to 360], [360 and more]
   d) 'days_left_in_term' into [0 to 30], [30 to 360], [360 and more]
5) Set 'churn_next_90' as your target column
6) Set 'zoom_account_no' as an ID column; this should not be a feature
7) Set 'ahs_date' as a date column; this should not be a feature
8) Treat the binned features from step (4) and the following features as categorical features:
   a) 'sales_group'
   b) 'employee_count'
   c) 'coreproduct'
9) Perform feature selection using your preferred method and ML algorithm. Choose 10 features and continue to step (10).
10) Divide the new data frame (with 10 features) into train and test subsets
11) Use a different algorithm from step (9) and perform cross-validation for parameter tuning. Print out the results.
12) Based on the results from step (11), fit your model on the train subset
13) Test your fitted model using the test subset
14) Print feature importance, accuracy score (roc_auc_score), and confusion matrix (crosstab) from step (13)
15) Save your trained model using pickle
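The binning in step (4) could be sketched with `pandas.cut` as below. This is only one possible reading of the task: the bin edges come from the step text, but the label strings, the `_bin` column suffix, the `bin_features` helper, and the demo rows are illustrative assumptions. The stated ranges share their boundary values, so the sketch assumes right-closed intervals (a value of exactly 11 falls in the first 'currentterm' bin).

```python
import numpy as np
import pandas as pd

# Bin edges are taken from step (4); the label strings are illustrative.
BIN_SPECS = {
    "currentterm": ([0, 11, np.inf], ["0-11", "11+"]),
    "mrr_entry": ([0, 14.99, 500, 5000, np.inf],
                  ["0-14.99", "14.99-500", "500-5K", "5K+"]),
    "account_age": ([0, 90, 180, 360, np.inf],
                    ["0-90", "90-180", "180-360", "360+"]),
    "days_left_in_term": ([0, 30, 360, np.inf], ["0-30", "30-360", "360+"]),
}


def bin_features(df):
    """Return a copy of df with one categorical '<col>_bin' column per spec."""
    out = df.copy()
    for col, (edges, labels) in BIN_SPECS.items():
        # Intervals are right-closed (assumption); include_lowest keeps 0 itself.
        out[col + "_bin"] = pd.cut(
            out[col], bins=edges, labels=labels, include_lowest=True
        )
    return out


# Hypothetical rows, only to show the call; the real data comes from the csv.
demo = pd.DataFrame({
    "currentterm": [1, 12, 24],
    "mrr_entry": [14.99, 200.0, 7500.0],
    "account_age": [30, 200, 1000],
    "days_left_in_term": [0, 45, 400],
})
binned = bin_features(demo)
print(binned.filter(like="_bin"))
```

The resulting columns have pandas `category` dtype, which fits step (8), where the binned features are treated as categorical.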
Data Scientist Senior Interview Questions
3,383 data scientist senior interview questions shared by candidates
A/B testing
When do you graduate? What roles are you looking for?
The technical rounds and the case study focused on traditional ML. How would you deal with columns containing hundreds of categories? How would you deal with class imbalance? How does XGBoost deal with NaN values? What is the difference between oversampling and class weights? What hyperparameters did you use in your models? How did you decide between one-hot encoding and target encoding? Which loss function and which scoring function did you use, and why were they chosen? There were also follow-up questions about one of your past projects. Behavioural round: How would you deal with a low performer in your team? What challenges have you faced? What do you consider a failure? Hypothetical scenarios on linking your models to business KPIs. How would you manage a project?
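For the question on oversampling versus class weights, a minimal NumPy sketch may help frame an answer. The labels are made up, and the weight formula n_samples / (n_classes * class_count) is the common "balanced" heuristic, used here as an assumption rather than as what any particular interviewer expects.

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.array([0] * 90 + [1] * 10)  # hypothetical imbalanced labels

# Class weights: the data stays as it is; each sample's contribution to the
# loss is scaled so that both classes carry the same total weight.
classes, counts = np.unique(y, return_counts=True)
class_weight = {
    int(c): len(y) / (len(classes) * n) for c, n in zip(classes, counts)
}
sample_weight = np.array([class_weight[int(label)] for label in y])

# Random oversampling: the data itself changes; minority rows are duplicated
# (sampled with replacement) until both classes have the same row count.
minority_idx = np.flatnonzero(y == 1)
extra = rng.choice(minority_idx, size=counts.max() - counts.min(), replace=True)
oversampled_idx = np.concatenate([np.arange(len(y)), extra])
y_oversampled = y[oversampled_idx]

print(class_weight)
print(np.bincount(y_oversampled))
```

Both approaches rebalance the classes. Class weights leave the dataset size unchanged, while oversampling grows it and, if applied before a train/test split, can leak duplicated rows into the test set.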
Survival model analysis, please walk me through an example
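One way to walk through a survival example is the Kaplan-Meier estimator, sketched below in plain NumPy. The `kaplan_meier` function and the toy durations are illustrative assumptions; in practice a dedicated survival-analysis library would normally be used.

```python
import numpy as np


def kaplan_meier(durations, observed):
    """Kaplan-Meier survival estimate at each distinct observed event time.

    durations: time until the event or until censoring.
    observed:  1/True if the event happened, 0/False if the row is censored.
    """
    durations = np.asarray(durations, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    event_times = np.unique(durations[observed])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(durations >= t)              # still under observation
        events = np.sum((durations == t) & observed)  # events exactly at t
        s *= 1.0 - events / at_risk
        survival.append(s)
    return event_times, np.array(survival)


# Hypothetical customers: months until churn, with 0 meaning "still active".
durations = [2, 3, 3, 5, 8, 8, 10]
observed = [1, 1, 0, 1, 1, 0, 0]
times, surv = kaplan_meier(durations, observed)
for t, s in zip(times, surv):
    print(f"S({t:g}) = {s:.3f}")
```

Censored rows never count as events, but they stay in the at-risk set until their censoring time; this is the main difference from simply computing the fraction of customers that churned.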
LPs, data processing, tech talk
LeetCode-style questions: reverse a string
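A possible answer to the string-reversal question, written with two pointers because interviewers often ask for something more explicit than the slicing shortcut `s[::-1]`. The function name is illustrative.

```python
def reverse_string(s: str) -> str:
    """Reverse a string by swapping characters from both ends inward."""
    chars = list(s)  # Python strings are immutable, so work on a list
    left, right = 0, len(chars) - 1
    while left < right:
        chars[left], chars[right] = chars[right], chars[left]
        left += 1
        right -= 1
    return "".join(chars)


print(reverse_string("interview"))
```

This runs in O(n) time and uses O(n) extra space for the list copy.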
The conversation with the HR representative was very pleasant and professional. She asked about data science in general, why I am looking for another job, and so on. However, the conversation with the IT person did not go as well. Most of the questions were not relevant to the role (they seemed to come from his own field). Before the interview, I checked who I would be speaking with and found that he is an IT engineer at the company (lower in the hierarchy than the position on offer). This was not a problem for me, but during the interview the interviewer was surprisingly offensive.
What is k-means clustering? Does it find a global optimum?
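On the global-optimum part of the question: the standard (Lloyd's) k-means procedure is only guaranteed to converge to a local optimum, and which one depends on the initial centroids. The NumPy sketch below illustrates this on four hand-picked points; `lloyd_kmeans` and both initialisations are illustrative assumptions.

```python
import numpy as np


def lloyd_kmeans(points, init_centroids, max_iter=100):
    """Plain Lloyd's algorithm from a given set of initial centroids."""
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(max_iter):
        # Squared distance of every point to every centroid: shape (n, k).
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centroids = centroids.copy()
        for k in range(len(centroids)):
            members = points[labels == k]
            if len(members):  # keep the old centroid if a cluster is empty
                new_centroids[k] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(points)), labels].sum()
    return centroids, labels, inertia


# Four corners of a wide rectangle: the natural split is left vs right.
points = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])

_, _, inertia_good = lloyd_kmeans(points, [[0.0, 0.0], [4.0, 0.0]])
_, _, inertia_bad = lloyd_kmeans(points, [[2.0, 0.0], [2.0, 1.0]])
print(inertia_good, inertia_bad)
```

The second initialisation is already a fixed point of the algorithm (a bottom vs top split), so it stops there with a much higher within-cluster sum of squares. This is why implementations usually run several restarts or use a smarter seeding such as k-means++.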
What is Time Series Forecasting?
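As a concrete illustration of forecasting, here is a minimal sketch of simple exponential smoothing, one of the most basic methods: the forecast for the next step is a weighted average of past observations, with weights that decay geometrically with age. The function name, the smoothing factor, and the toy series are illustrative assumptions.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast after smoothing the whole series.

    level_0 = y_0; level_t = alpha * y_t + (1 - alpha) * level_{t-1}
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1.0 - alpha) * level
    return level


monthly_signups = [10, 12, 14]  # hypothetical observations, oldest first
forecast = simple_exponential_smoothing(monthly_signups, alpha=0.5)
print(forecast)
```

This method has no notion of trend or seasonality, so on a steadily rising series like the one above it lags behind; it is meant here only to show what producing a forecast from past values looks like.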