Primary information
Username: zaghir
Cluster: baobab
Description
Hello everyone,
I’m currently working on HPC clusters and I’m facing some challenges running Ollama (https://ollama.com/), which involves instantiating an AI model (e.g. Llama3) and performing inference on GPUs.
The primary issue is that installing Ollama requires both sudo privileges and service-related permissions via systemctl. Obtaining such permissions is typically problematic (Adrien from the HPC team tried to find a way to install it through EasyBuild, but without success).
We attempted to use Apptainer as a workaround, but we ran into issues related to the TMPDIR configuration: our TMPDIR is set to /scratch instead of /tmp, and Ollama does not seem to function properly with this setup.
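For reference, we launch the container roughly as follows (the image name here is only illustrative; --nv enables NVIDIA GPU passthrough):

apptainer shell --nv ollama.sif

Inside the container shell, starting the server then fails: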
Apptainer> ollama serve &
[1] 2497999
Apptainer> Couldn't find '/home/users/z/zaghir/.ollama/id_ed25519'. Generating new private key.
Your new public key is:
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIGQnzA7pbOXXkts3USdarkYZgJ862sfht0v/NxRBiIfH
2024/07/25 10:27:09 routes.go:1100: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/users/z/zaghir/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-07-25T10:27:09.612+02:00 level=INFO source=images.go:784 msg="total blobs: 0"
time=2024-07-25T10:27:09.613+02:00 level=INFO source=images.go:791 msg="total unused blobs removed: 0"
time=2024-07-25T10:27:09.617+02:00 level=INFO source=routes.go:1147 msg="Listening on 0.0.0.0:11434 (version 0.2.8)"
Error: unable to initialize llm library failed to generate tmp dir: stat /scratch: no such file or directory
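The error suggests that /scratch simply does not exist inside the container image. Two things we plan to try, though neither is verified yet: bind-mounting the scratch filesystem into the container (apptainer shell --nv --bind /scratch ollama.sif), or overriding the temporary directory via the OLLAMA_TMPDIR variable that appears in the server config dump above (e.g. export OLLAMA_TMPDIR=$HOME/tmp before running ollama serve).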
Has anyone successfully run Ollama on an HPC cluster, or does anyone have suggestions for alternative implementations that might be more compatible with HPC environments?
For those unfamiliar with Ollama, it works as follows:
Step 1 - start the server (if Ollama is installed):
ollama serve
or, to run it in the background: ollama serve &
Step 2 - run Llama3 and get an interactive prompt in the CLI:
ollama run llama3
(I usually do Step 2 through the Python integration, the ollama-python library (https://github.com/ollama/ollama-python), which is already installed; see the sketch below. But Step 1 still requires the Ollama binary itself.)
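For context, here is a minimal sketch of the Step 2 Python path, assuming the server from Step 1 is already listening on its default port 11434 (the prompt text is just an example):

import ollama  # pip install ollama (the ollama-python client)

# Send a chat request to the locally running Ollama server
# (default endpoint: http://localhost:11434).
response = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Summarize what an HPC cluster is."}],
)

# The reply text lives under response["message"]["content"].
print(response["message"]["content"])

This only talks to an already-running server, which is why the installation problem in Step 1 is the blocking point.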
Any guidance or advice would be greatly appreciated.
Thank you!
EDIT: Added the error output from the Apptainer attempt.