      EPAM Systems

      MLOP Engineer Interview

      Jun 3, 2026
      Anonymous Interview Candidate
      Hyderabad
      No offer
      Positive experience
      Difficult interview

      Application

      I interviewed at EPAM Systems (Hyderabad)

      Interview

      2. What is MLOps?

      Answer: MLOps is the practice of applying DevOps principles to machine learning systems. It covers:
        • Data management
        • Model development
        • Model versioning
        • Deployment
        • Monitoring
        • Retraining

      Lifecycle:
        Data Collection → Data Validation → Feature Engineering → Model Training → Model Validation → Deployment → Monitoring → Retraining

      3. Difference between DevOps and MLOps?

        DevOps                         MLOps
        Focuses on application code    Focuses on data + model + code
        CI/CD                          CI/CD/CT
        Version code                   Version code + data + models
        Functional testing             Model testing
        Performance monitoring         Model drift monitoring

      4. What is CI/CD/CT in MLOps?

        CI (Continuous Integration): Code Commit → Unit Tests → Build
        CD (Continuous Delivery): Build → Deploy
        CT (Continuous Training): New Data → Retrain Model → Validate → Deploy

      5. How do you version ML models?

      Tools: MLflow, DVC, S3, Git

      Example:

          import mlflow
          mlflow.sklearn.log_model(model, "customer_churn")

      Versions: v1, v2, v3

      6. Explain MLflow

      Components: Tracking, Projects, Models, Registry

      Example:

          import mlflow

          with mlflow.start_run():
              mlflow.log_param("lr", 0.01)
              mlflow.log_metric("accuracy", 0.95)

      Interview follow-up: Why MLflow?
      Answer: Track experiments, compare runs, register models, and manage deployments.

      7. What is Data Drift?

      Answer: The input data distribution changes over time.

      Example:
        Training: Age 20-40
        Production: Age 50-80
      Model performance drops.

      8. What is Concept Drift?

      Answer: The relationship between features and target changes.

      Example:
        Before COVID: online spending low
        After COVID: online spending high
      Same inputs but different outcomes.

      9. How do you detect drift?

      Methods:
        • PSI (Population Stability Index)
        • KL Divergence
        • Wasserstein Distance
        • KS Test

      Example:

          from scipy.stats import ks_2samp
          # Returns the KS statistic and a p-value; a small p-value suggests the two samples differ.
          ks_2samp(train_data, prod_data)

      10. How do you monitor models?

      Metrics:
        • Business metrics: Revenue, Conversion, CTR
        • Model metrics: Accuracy, Precision, Recall, F1
        • System metrics: CPU, Memory, Latency, Throughput

      Tools: Prometheus, Grafana, ELK

      11. Explain Model Retraining Pipeline

        New Data → Validation → Feature Engineering → Training → Evaluation → Deployment

      Triggers: weekly, monthly, or on drift detection

      12. What is Feature Store?

      Answer: A central repository for ML features.

      Benefits:
        • Reuse features
        • Consistency
        • Online serving
        • Offline training

      Tools: Feast, Tecton

      13. Explain Docker in MLOps

      Dockerfile:

          FROM python:3.11
          COPY . /app
          WORKDIR /app
          RUN pip install -r requirements.txt
          CMD ["python","app.py"]

      Benefits: Portability, Reproducibility

      14. Difference between Docker and Kubernetes?

        Docker              Kubernetes
        Containerization    Orchestration
        Single container    Multiple containers
        Packaging           Scaling

      15. How do you deploy ML models on Kubernetes?

      Steps:
        Build Docker Image → Push to Registry → Create Deployment → Create Service → Expose API

      Deployment:

          apiVersion: apps/v1
          kind: Deployment
          metadata:
            name: model
          spec:
            replicas: 3
            # spec.selector and spec.template are also required; omitted here

      16. What is Canary Deployment?

      Answer: Deploy the new model to a small percentage of users.
        90% → Old Model
        10% → New Model
      If successful: 100% New Model

      17. Blue-Green Deployment?

      Answer:
        Blue = Production
        Green = New Version
      Switch traffic instantly.

      Benefits: Zero downtime, Easy rollback

      18. How would you deploy a model with zero downtime?

      Answer:
        • Kubernetes Rolling Update
        • Blue-Green Deployment
        • Canary Deployment

      19. How do you handle large datasets?

      Techniques: Spark, Partitioning, Parallel Processing

      Example:

          df.repartition(100)

      20. What if training data is 1 TB?

      Answer: Never load it into memory. Use:
        • Spark
        • Batch Processing
        • Distributed Training

      21. What if model training takes 12 hours?

      Answer: Options:
        • Distributed Training
        • GPU
        • Hyperparameter Optimization
        • Incremental Learning

      22. Explain Kubernetes HPA

      Horizontal Pod Autoscaler
        CPU > 70%
        Scale: 3 Pods → 10 Pods

      Example:

          kubectl autoscale deployment model --cpu-percent=70 --min=3 --max=10

      23. What happens if a pod crashes?

      Answer: Kubernetes automatically recreates it.
      Controller: the ReplicaSet maintains the desired state.

      24. How do you secure ML APIs?

      Methods:
        • Authentication: JWT, OAuth
        • Encryption: HTTPS, TLS
        • Secrets: Kubernetes Secrets, AWS Secrets Manager

      25. Explain FastAPI deployment

          from fastapi import FastAPI

          app = FastAPI()

          @app.get("/")
          def predict():
              return {"prediction": 1}

      Run:

          uvicorn app:app

      26. What is Model Explainability?

      Techniques: SHAP, LIME, Feature Importance

      Example:

          import shap

      Shows why a prediction happened.

      27. Scenario: Accuracy dropped from 95% to 70%

      Approach. Check:
        • Data Drift
        • Concept Drift
        • Data Quality
        • Pipeline Failures
        • Feature Changes
      Then: Retrain → Validate → Redeploy

      28. Scenario: Prediction API latency increased

      Investigate: CPU, Memory, Network, Database, Model Size

      Optimization: Caching, Autoscaling, Quantization, GPU inference

      29. Scenario: Production model gives different results than training

      Root causes:
        • Feature mismatch
        • Data preprocessing mismatch
        • Version mismatch
        • Missing transformations

      Solution: Use the same pipeline object for training and serving.

      30. Design an End-to-End MLOps Architecture

        Data Sources → Kafka → Spark → Feature Store → Training Pipeline → MLflow → Model Registry → Docker → Kubernetes → FastAPI → Prometheus/Grafana → Retraining Pipeline

      Advanced EPAM Follow-up Questions

      Why use Kubernetes instead of ECS?
        • Multi-cloud support
        • Better ecosystem
        • Advanced autoscaling
        • Service mesh support

      Why MLflow over DVC?
        • Experiment tracking
        • Model registry
        • Deployment integration

      How
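The drift question (9) lists PSI without showing how it is computed. The following is a minimal sketch in plain Python, reusing the age ranges from question 7. The function name `psi`, the choice of 10 equal-width bins taken from the training range, and the `eps` floor for empty bins are assumptions made for illustration and are not part of the original answer. The 0.25 cut-off mentioned in the comment is a commonly quoted rule of thumb, not a fixed standard.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Bins are equal-width and derived from the `expected` (training) sample.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the training range land in the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        total = len(values)
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / total, eps) for c in counts]

    e = bucket_shares(expected)
    a = bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic data mirroring question 7: training ages 20-40, production ages 50-80.
train_ages = [20 + i % 21 for i in range(1000)]
prod_ages = [50 + i % 31 for i in range(1000)]

print("PSI, train vs train:", psi(train_ages, train_ages))
# Values above roughly 0.25 are often treated as significant drift.
print("PSI, train vs prod: ", psi(train_ages, prod_ages))
```

In a retraining pipeline such as the one in question 11, a check like this could act as the drift trigger, run per feature on a schedule.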

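The 90% / 10% split described for canary deployment (question 16) can be sketched as a deterministic, hash-based router. This is only an illustration: in practice the split is usually configured in an ingress controller or service mesh rather than in application code. The function name `route`, the `user-<n>` identifiers, and the use of SHA-256 for bucketing are hypothetical choices, not something stated in the original answer.

```python
import hashlib


def route(user_id: str, canary_percent: int = 10) -> str:
    """Return which model version ("new" or "old") should serve this user."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Map the user to a stable bucket in [0, 100).
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "new" if bucket < canary_percent else "old"


# Hypothetical user ids, used only to check the observed split.
users = [f"user-{i}" for i in range(10_000)]
share_new = sum(route(u) == "new" for u in users) / len(users)
print(f"share routed to new model: {share_new:.1%}")
```

Hashing the user id keeps each user on the same model version across requests, which makes the two groups easier to compare. Raising `canary_percent` to 100 completes the rollout, and setting it to 0 rolls back.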