NVIDIA Interview Question

How would you design a real-time predictor of GPU server failures?