NVIDIA Interview Question

Tell me how you can conserve GPU memory when running inference on LLMs.
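One way to frame an answer is to estimate where the memory goes first. The sketch below is a back-of-the-envelope calculation, not a profiler. It assumes a hypothetical 7B-parameter decoder-only model (32 layers, 32 attention heads, head dimension 128) and counts only the weights and the KV cache, ignoring activations and framework overhead. All function names and model dimensions here are illustrative assumptions.

```python
GIB = 1024 ** 3


def weight_bytes(n_params: int, bytes_per_param: float) -> float:
    """Memory for model weights: one value of the given width per parameter."""
    return n_params * bytes_per_param


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: float) -> float:
    """Memory for the KV cache: one key and one value vector (hence the 2)
    per layer, per KV head, per token, per sequence in the batch."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


# Hypothetical model dimensions (assumptions for illustration only).
N_PARAMS = 7_000_000_000
N_LAYERS = 32
N_HEADS = 32
HEAD_DIM = 128

if __name__ == "__main__":
    # Lever 1: lower-precision weights (FP16 vs. INT8 vs. 4-bit quantization).
    for name, width in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
        print(f"weights {name}: {weight_bytes(N_PARAMS, width) / GIB:.2f} GiB")

    # Lever 2: a smaller KV cache, via fewer KV heads (grouped-query attention),
    # shorter context, smaller batch, or a lower-precision cache.
    for kv_heads in (N_HEADS, 8):
        size = kv_cache_bytes(N_LAYERS, kv_heads, HEAD_DIM,
                              seq_len=4096, batch_size=1, bytes_per_elem=2.0)
        print(f"kv cache, {kv_heads} KV heads, 4096 tokens: {size / GIB:.2f} GiB")
```

The two functions correspond to the two main levers: quantization reduces the bytes per weight, while shorter contexts, smaller batches, fewer KV heads, or a lower-precision cache reduce the KV cache. Because the cache grows linearly with both sequence length and batch size, it can become the dominant cost for long contexts or large batches.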