2. What is MLOps?
Answer

MLOps is the practice of applying DevOps principles to Machine Learning systems.

It covers:

Data Management
Model Development
Model Versioning
Deployment
Monitoring
Retraining

Lifecycle
Data Collection
↓
Data Validation
↓
Feature Engineering
↓
Model Training
↓
Model Validation
↓
Deployment
↓
Monitoring
↓
Retraining
3. Difference between DevOps and MLOps?
DevOps | MLOps
Focuses on application code | Focuses on data + model + code
CI/CD | CI/CD/CT
Version code | Version code + data + models
Functional testing | Model testing
Performance monitoring | Model drift monitoring
4. What is CI/CD/CT in MLOps?
CI (Continuous Integration)

Code Commit
↓
Unit Tests
↓
Build

CD (Continuous Delivery)

Build
↓
Deploy

CT (Continuous Training)

New Data
↓
Retrain Model
↓
Validate
↓
Deploy
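
The Validate step in the CT flow is usually a gate: the retrained model is deployed only if it does not regress against the model already in production. A minimal sketch of that gate, assuming hypothetical `train` and `evaluate` callables and a single score where higher is better:

```python
def should_promote(candidate_score, production_score, min_gain=0.0):
    """Validation gate: promote only if the candidate is at least as good."""
    return candidate_score >= production_score + min_gain


def continuous_training_step(new_data, train, evaluate, production_score):
    """New Data -> Retrain Model -> Validate -> Deploy (or reject)."""
    model = train(new_data)      # retrain on the new data
    score = evaluate(model)      # validate on a held-out set
    if should_promote(score, production_score):
        return model, score, "deployed"
    return None, score, "rejected"


# Stub train/evaluate functions stand in for a real training job.
model, score, status = continuous_training_step(
    new_data=[1, 2, 3],
    train=lambda data: "retrained-model",
    evaluate=lambda m: 0.90,
    production_score=0.85,
)
print(status)
```

The names and the single-score comparison are illustrative; real gates often check several metrics and data slices.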
5. How do you version ML models?
Tools
MLflow
DVC
S3
Git

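Example:

import mlflow
import mlflow.sklearn

# model is a fitted scikit-learn estimator.
# registered_model_name adds the logged model to the Model Registry;
# each registration under the same name creates the next version.
mlflow.sklearn.log_model(model, "customer_churn", registered_model_name="customer_churn")

Versions:

v1
v2
v3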
6. Explain MLflow
Components
Tracking
Projects
Models
Registry

Example:

import mlflow

with mlflow.start_run():
    mlflow.log_param("lr", 0.01)
    mlflow.log_metric("accuracy", 0.95)

Interview Follow-up:

Why MLflow?

Answer:

Track experiments, compare runs, register models, and manage deployments.

7. What is Data Drift?
Answer

The distribution of the input data changes over time, so production inputs no longer look like the training data.

Example:

Training:

Age: 20-40

Production:

Age: 50-80

Model performance can drop because the model is scoring ages it never saw during training.

8. What is Concept Drift?
Answer

The relationship between the features and the target changes, even when the inputs themselves look the same.

Example:

Before Covid:
Online spending low

After Covid:
Online spending high

Same inputs but different outcomes.

9. How do you detect drift?
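Methods
PSI (Population Stability Index)
KL Divergence
Wasserstein Distance
KS Test (Kolmogorov-Smirnov)

Example:

from scipy.stats import ks_2samp

# train_data and prod_data are 1-D samples of the same numeric feature.
# A small p-value suggests the two samples come from different distributions.
statistic, p_value = ks_2samp(train_data, prod_data)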
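
PSI, the first method listed, can be computed without extra libraries. The sketch below is one simple variant, assuming equal-width bins taken from the training sample and a small epsilon to avoid dividing by zero; the bin count and epsilon are illustrative choices, not fixed parts of the metric.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a training and a production sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clip out-of-range values to edge bins
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


train_ages = list(range(20, 41))  # training: ages 20-40
prod_ages = list(range(50, 81))   # production: ages 50-80
print(psi(train_ages, train_ages))  # identical samples
print(psi(train_ages, prod_ages))   # shifted sample
```

A commonly quoted rule of thumb treats PSI above roughly 0.25 as a significant shift, but the threshold should be tuned per feature.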
10. How do you monitor models?
Metrics

Business Metrics

Revenue
Conversion
CTR

Model Metrics

Accuracy
Precision
Recall
F1

System Metrics

CPU
Memory
Latency
Throughput

Tools:

Prometheus
Grafana
ELK
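
The model metrics above can be recomputed on labelled production samples once ground truth arrives. A minimal sketch for a binary classifier, assuming labels are encoded as 0 and 1:

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


metrics = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(metrics)
```

In practice these values would be exported to the monitoring stack (for example as Prometheus gauges) and alerted on when they fall below a threshold.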
11. Explain Model Retraining Pipeline
New Data
↓
Validation
↓
Feature Engineering
↓
Training
↓
Evaluation
↓
Deployment

Trigger:

Weekly
Monthly
Drift detection
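
The triggers can be combined in one check that a scheduler runs periodically. This is a sketch only: the weekly interval, the drift threshold, and the function name are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional


def retrain_reason(
    last_trained: datetime,
    now: datetime,
    drift_score: float,
    max_age: timedelta = timedelta(days=7),  # assumed weekly schedule
    drift_threshold: float = 0.25,           # assumed drift cut-off (e.g. PSI)
) -> Optional[str]:
    """Return why retraining should start, or None if no trigger fired."""
    if drift_score > drift_threshold:
        return "drift"
    if now - last_trained >= max_age:
        return "schedule"
    return None


t0 = datetime(2024, 1, 1)
print(retrain_reason(t0, t0 + timedelta(days=1), drift_score=0.05))
print(retrain_reason(t0, t0 + timedelta(days=1), drift_score=0.40))
print(retrain_reason(t0, t0 + timedelta(days=8), drift_score=0.05))
```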
12. What is Feature Store?
Answer

Central repository for ML features.

Benefits:

Reuse features
Consistency
Online serving
Offline training

Tools:

Feast
Tecton
13. Explain Docker in MLOps
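Dockerfile:

FROM python:3.11

WORKDIR /app

# Install dependencies first so this layer stays cached until requirements.txt changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code (entry point: app.py)
COPY . .

CMD ["python", "app.py"]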

Benefits:

Portability
Reproducibility
14. Difference between Docker and Kubernetes?
Docker | Kubernetes
Containerization | Orchestration
Single container | Multiple containers
Packaging | Scaling
15. How do you deploy ML models on Kubernetes?
Steps
Build Docker Image
↓
Push to Registry
↓
Create Deployment
↓
Create Service
↓
Expose API

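Deployment (partial manifest):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: model
spec:
  replicas: 3
  # spec.selector and spec.template (the pod spec with the model image) are also required; omitted here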
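
The manifest above is partial: an apps/v1 Deployment also needs a selector and a pod template. The sketch below builds a complete manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The image name, label key, and port are hypothetical values chosen for illustration.

```python
import json


def deployment_manifest(name, image, replicas=3, port=8000):
    """Build a minimal apps/v1 Deployment for a model-serving container."""
    labels = {"app": name}  # the selector must match the pod template labels
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


# registry.example.com/model:1.0 is a placeholder image reference.
manifest = deployment_manifest("model", "registry.example.com/model:1.0")
print(json.dumps(manifest, indent=2))
```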
16. What is Canary Deployment?
Answer

Deploy new model to small percentage of users.

90% → Old Model

10% → New Model

If successful:

100% New Model
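
One way to implement the 90/10 split is to hash a stable user identifier into a bucket, so each user consistently sees the same model during the rollout. This is a sketch under that assumption; in practice the split is often done by the ingress or service mesh rather than in application code.

```python
import hashlib


def route(user_id: str, canary_percent: int = 10) -> str:
    """Send roughly canary_percent of users to the new model, deterministically."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return "new_model" if bucket < canary_percent else "old_model"


# Measure the share of simulated users that land on the canary.
users = [f"user-{i}" for i in range(10000)]
share = sum(route(u) == "new_model" for u in users) / len(users)
print(f"canary share: {share:.2%}")
```

Raising `canary_percent` step by step to 100 completes the rollout; setting it to 0 rolls back.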
17. Blue-Green Deployment?
Answer
Blue = Production

Green = New Version

Switch traffic instantly.

Benefits:

Zero downtime
Easy rollback
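
The instant switch and easy rollback come from keeping both versions running and changing only which one receives traffic. A toy sketch of that idea, with a hypothetical `BlueGreenRouter` class and plain functions standing in for the two deployments:

```python
class BlueGreenRouter:
    """Holds two running versions and a pointer to the live one."""

    def __init__(self, blue, green):
        self.slots = {"blue": blue, "green": green}
        self.live = "blue"  # blue serves production traffic initially

    def predict(self, x):
        return self.slots[self.live](x)

    def switch(self):
        """Flip all traffic to the other environment; call again to roll back."""
        self.live = "green" if self.live == "blue" else "blue"


router = BlueGreenRouter(blue=lambda x: "v1", green=lambda x: "v2")
print(router.predict(None))  # served by blue
router.switch()
print(router.predict(None))  # served by green
```

On Kubernetes the same pointer flip is commonly done by changing a Service's label selector from the blue Deployment to the green one.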
18. How would you deploy a model with zero downtime?

Answer:

Kubernetes Rolling Update
Blue-Green Deployment
Canary Deployment
19. How do you handle large datasets?
Techniques
Spark
Partitioning
Parallel Processing

Example:

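# repartition returns a new DataFrame, so assign the result; 100 is an illustrative partition count
df = df.repartition(100)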
20. What if training data is 1 TB?
Answer

Never load the full dataset into memory at once.

Use:

Spark
Batch Processing
Distributed Training
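
On a single machine, the same principle means streaming the data and keeping only running aggregates. A small sketch that computes a column mean row by row; the column name and the in-memory CSV are stand-ins for a large file on disk.

```python
import csv
import io


def read_rows(file_obj):
    """Yield one CSV row at a time instead of materialising the whole file."""
    yield from csv.DictReader(file_obj)


def streaming_mean(rows, column):
    """Running mean that needs constant memory regardless of row count."""
    count = 0
    mean = 0.0
    for row in rows:
        count += 1
        mean += (float(row[column]) - mean) / count
    return mean


# io.StringIO stands in for open("big_file.csv"); the file name is hypothetical.
sample = io.StringIO("amount\n10\n20\n30\n")
result = streaming_mean(read_rows(sample), "amount")
print(result)
```

Spark applies the same idea across many machines, with each partition processed independently.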
21. What if model training takes 12 hours?
Answer

Options:

Distributed Training
GPU
Hyperparameter Optimization
Incremental Learning
22. Explain Kubernetes HPA

Horizontal Pod Autoscaler

Target CPU utilization: 70%

Scale:

3 Pods → 10 Pods

Example:

kubectl autoscale deployment model --cpu-percent=70 --min=3 --max=10
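
The scaling decision follows the formula in the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric), bounded by the configured minimum and maximum. The sketch below is a simplified model of that rule using the 70% target and the 3 to 10 pod range from above; it ignores pod readiness, missing metrics, and stabilization windows.

```python
import math


def desired_replicas(
    current_replicas,
    current_cpu_percent,
    target_cpu_percent=70,
    min_replicas=3,
    max_replicas=10,
    tolerance=0.1,  # skip scaling when usage is within 10% of the target
):
    """Simplified HPA rule: scale in proportion to metric / target."""
    ratio = current_cpu_percent / target_cpu_percent
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(3, 140))  # CPU at twice the target
print(desired_replicas(3, 350))  # capped at max_replicas
print(desired_replicas(3, 72))   # within tolerance, no change
```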
23. What happens if a pod crashes?
Answer

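Kubernetes recovers automatically. The kubelet restarts a crashed container according to the pod's restartPolicy, and if the pod itself is lost, the ReplicaSet controller creates a replacement to maintain the desired state (the configured replica count).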

24. How do you secure ML APIs?
Methods

Authentication

JWT
OAuth

Encryption

HTTPS
TLS

Secrets

Kubernetes Secrets
AWS Secrets Manager
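
To show what JWT authentication checks, the sketch below signs and verifies an HS256 token with only the standard library. It is for illustration: the function names are made up here, the secret would come from a secrets manager rather than source code, and a production API would normally rely on a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(payload: dict, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        _b64url(json.dumps(header, separators=(",", ":")).encode("utf-8")),
        _b64url(json.dumps(payload, separators=(",", ":")).encode("utf-8")),
    ]
    signing_input = ".".join(parts)
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_token(token: str, secret: bytes, now=None):
    """Return the payload if the signature is valid and not expired, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
        # compare_digest avoids leaking information through timing
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        payload = json.loads(_b64url_decode(payload_b64))
    except ValueError:  # malformed token, bad base64, or bad JSON
        return None
    current = time.time() if now is None else now
    if payload.get("exp", float("inf")) < current:
        return None
    return payload


secret = b"demo-secret"  # placeholder; load from a secrets manager in practice
token = sign_token({"sub": "scoring-client", "exp": 2000}, secret)
print(verify_token(token, secret, now=1000))
```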
25. Explain FastAPI deployment
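from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def predict():
    # Placeholder response; a real endpoint would call the loaded model
    return {"prediction": 1}

Run (assuming the code is saved as app.py):

uvicorn app:app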
26. What is Model Explainability?
Techniques
SHAP
LIME
Feature Importance

Example:

import shap

SHAP shows how much each feature contributed to a given prediction.

27. Scenario: Accuracy dropped from 95% to 70%
Approach

Check:

Data Drift
Concept Drift
Data Quality
Pipeline Failures
Feature Changes

Then:

Retrain
Validate
Redeploy
28. Scenario: Prediction API latency increased
Investigate
CPU
Memory
Network
Database
Model Size

Optimization:

Caching
Autoscaling
Quantization
GPU inference
29. Scenario: Production model gives different results than training
Root Causes
Feature mismatch
Data preprocessing mismatch
Version mismatch
Missing transformations

Solution:

Use the same fitted preprocessing pipeline object for both training and serving.
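
A sketch of what sharing the pipeline means in practice: the preprocessing parameters are fitted once at training time, saved, and loaded unchanged by the serving code. The `Scaler` class and JSON storage are illustrative; with scikit-learn the fitted `Pipeline` would typically be persisted and loaded as one artifact.

```python
import json


class Scaler:
    """Standardises a numeric feature with parameters learned at training time."""

    def __init__(self, mean=0.0, std=1.0):
        self.mean = mean
        self.std = std

    @classmethod
    def fit(cls, values):
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        return cls(mean=mean, std=variance ** 0.5 or 1.0)  # avoid std of 0

    def transform(self, values):
        return [(v - self.mean) / self.std for v in values]

    def dumps(self):
        return json.dumps({"mean": self.mean, "std": self.std})

    @classmethod
    def loads(cls, text):
        return cls(**json.loads(text))


# Training side: fit once and persist the fitted parameters.
training_scaler = Scaler.fit([20, 30, 40])
artifact = training_scaler.dumps()

# Serving side: load the same parameters instead of recomputing them.
serving_scaler = Scaler.loads(artifact)
print(serving_scaler.transform([30, 40]))
```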

30. Design an End-to-End MLOps Architecture
Data Sources
↓
Kafka
↓
Spark
↓
Feature Store
↓
Training Pipeline
↓
MLflow
↓
Model Registry
↓
Docker
↓
Kubernetes
↓
FastAPI
↓
Prometheus/Grafana
↓
Retraining Pipeline
Advanced EPAM Follow-up Questions
Why use Kubernetes instead of ECS?
Multi-cloud support
Better ecosystem
Advanced autoscaling
Service mesh support
Why MLflow over DVC?
Experiment tracking
Model registry
Deployment integration
How

