Dear HPC Community,
I’m currently working on a research project involving large-scale processing of legal texts, specifically court verdicts.
We are considering deploying a large language model (e.g., Qwen, DeepSeek, or a quantized version of LLaMA) locally on the Bamboo cluster to perform batch inference on a large corpus (over 2 million documents).
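For context, the kind of batch-inference loop we have in mind looks roughly like the sketch below. This is only one option we are considering (Python with Hugging Face transformers and bitsandbytes 4-bit quantization); the model name, paths, prompt, and batch size are placeholders rather than final choices:

```python
# Rough sketch of the planned batch-inference loop (placeholder model and paths).
import json
from pathlib import Path

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_NAME = "Qwen/Qwen2.5-7B-Instruct"    # placeholder; final model still TBD
INPUT_DIR = Path("/scratch/legal_corpus")  # placeholder path on cluster storage
OUTPUT_FILE = Path("results.jsonl")
BATCH_SIZE = 8

# Load the model in 4-bit so a 7B model fits comfortably on a single A100.
quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"  # left-padding for batched decoder-only generation
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, quantization_config=quant_config, device_map="auto"
)

def batches(items, size):
    for i in range(0, len(items), size):
        yield items[i : i + size]

doc_paths = sorted(INPUT_DIR.glob("*.txt"))
with OUTPUT_FILE.open("w") as out:
    for batch in batches(doc_paths, BATCH_SIZE):
        texts = [p.read_text()[:4000] for p in batch]  # truncate very long verdicts
        prompts = [f"Summarize the following court verdict:\n\n{t}" for t in texts]
        inputs = tokenizer(
            prompts, return_tensors="pt", padding=True, truncation=True, max_length=2048
        ).to(model.device)
        with torch.no_grad():
            outputs = model.generate(**inputs, max_new_tokens=256)
        for path, output in zip(batch, outputs):
            text = tokenizer.decode(output, skip_special_tokens=True)
            out.write(json.dumps({"file": path.name, "output": text}) + "\n")
```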
Before proceeding, I would like to ask:
1. Is it feasible to deploy a large model locally (e.g., 7B parameters, quantized) on the A100 GPU nodes in Bamboo?
2. Are there any storage, containerization (e.g., Singularity), or dependency limitations we should be aware of when preparing the environment?
3. Has anyone in the community successfully run similar workloads (LLMs, inference pipelines) on Bamboo?
4. Would the HPC team recommend any specific best practices for running such large models efficiently on this cluster? (A sketch of how we tentatively plan to shard the corpus across job-array tasks follows below.)
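On that last point, our current tentative plan is to split the corpus across SLURM array tasks so that each GPU node processes an independent shard. A minimal sketch, assuming SLURM job arrays are available on Bamboo and using placeholder paths:

```python
# Tentative sketch: shard the document list across SLURM array tasks so each
# task (one GPU node) processes an independent slice of the corpus.
import os
from pathlib import Path

INPUT_DIR = Path("/scratch/legal_corpus")    # placeholder path on cluster storage
OUTPUT_DIR = Path("/scratch/legal_outputs")  # placeholder path
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

# SLURM sets these for each task in a job array (e.g. sbatch --array=0-99).
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", 0))
num_tasks = int(os.environ.get("SLURM_ARRAY_TASK_COUNT", 1))

# Deterministic round-robin split: task k takes every num_tasks-th document.
doc_paths = sorted(INPUT_DIR.glob("*.txt"))
shard = doc_paths[task_id::num_tasks]

print(f"Task {task_id}/{num_tasks}: {len(shard)} documents to process")
# ...then run the batch-inference loop from the earlier sketch over `shard`,
# writing results to OUTPUT_DIR / f"results_{task_id}.jsonl"
```

We would of course adapt this to whatever partitioning or scheduling conventions the HPC team recommends.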
Any advice, shared experience, or technical pointers would be highly appreciated!
Hanzhang