Feasibility of Deploying Large Language Models Locally on the Bamboo Cluster

Dear HPC Community,

I’m currently working on a research project that involves large-scale processing of legal texts (court verdicts).

We are considering deploying a large language model (e.g., Qwen, DeepSeek, or a quantized version of LLaMA) locally on the Bamboo cluster to perform batch inference on a large corpus (over 2 million documents).
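
To make the question concrete, this is roughly the single-node pattern we have in mind. It is only a sketch, untested on Bamboo: the model ID is a placeholder, and it assumes transformers, accelerate, and bitsandbytes can be installed in our environment.

```python
# Minimal sketch of quantized single-node inference; not yet tested on Bamboo.
# Assumes transformers, accelerate, and bitsandbytes are installable and the
# weights are staged on cluster storage. The model ID is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "Qwen/Qwen2.5-7B-Instruct"  # placeholder for any ~7B causal LM

# 4-bit quantization so a 7B model fits comfortably in A100 memory
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.padding_side = "left"  # decoder-only models are left-padded for batching
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=quant_config,
    device_map="auto",  # place layers on the available GPU(s)
)

# Batched generation over a handful of documents
docs = ["Summarize this verdict: ...", "Summarize this verdict: ..."]
inputs = tokenizer(docs, return_tensors="pt", padding=True).to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```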

Before proceeding, I would like to ask:

1. Is it feasible to deploy a large model (e.g., 7B parameters, quantized) locally on Bamboo's A100 GPU nodes?
2. Are there any storage, containerization (e.g., Singularity), or dependency limitations we should be aware of when preparing the environment?
3. Has anyone in the community successfully run similar workloads (LLMs, inference pipelines) on Bamboo?
4. Would the HPC team recommend any specific best practices for running such large models efficiently on this cluster? (A sketch of the pipeline we have in mind follows below.)
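
Regarding question 4, the throughput-oriented pipeline we would aim for on the full corpus looks roughly like the following. Again only a sketch: it assumes vLLM can be installed (or containerized) on the A100 nodes, and the quantized model name is a placeholder.

```python
# Rough sketch of a throughput-oriented pipeline for the full corpus; not a
# tested setup. Assumes vLLM is available on the A100 nodes; the quantized
# model name is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct-AWQ", quantization="awq")
params = SamplingParams(temperature=0.0, max_tokens=256)

def process_chunk(prompts):
    # vLLM batches and schedules the requests internally, which is what
    # makes it attractive for millions of short documents
    outputs = llm.generate(prompts, params)
    return [o.outputs[0].text for o in outputs]

# In a real run we would iterate over the corpus in chunks and write results
# incrementally, so a Slurm time limit does not cost us finished work
print(process_chunk(["Summarize this verdict: ..."])[0])
```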

Any advice, shared experience, or technical pointers would be highly appreciated!

Hanzhang

Hello,

What tools are you planning to use to deploy your LLM?
Here is a post/tutorial on how to deploy Llama using Ollama on Baobab: Seeking Help with Running Ollama on HPC Clusters

I assume the methodology should be similar on Bamboo.
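
For reference, once an Ollama server is running on a compute node, it can be queried over its HTTP API from Python. A minimal sketch, assuming the server listens on the default port 11434 and the model has already been pulled:

```python
# Minimal sketch of querying a locally running Ollama server.
# Assumptions: default port 11434, model already pulled with `ollama pull`.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def ask(prompt, model="llama3"):
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # return one JSON object instead of a stream
    }).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(ask("Say hello in one sentence."))
```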

Best regards,
Jamil