Dear HPC,
My username is sainij. I have around 1 TB of gut microbial sequencing data from participants in an Alzheimer's disease study. The data were generated by shotgun DNA sequencing, and I would like to process (co-assemble) all of the samples together. However, this step is computationally intensive and requires a very large amount of RAM because of the size of the dataset.
My initial strategy was to use tools with low RAM requirements (for example, MEGAHIT). I first requested 500 GB of RAM with 8 CPUs per task on the shared-bigmem partition, but the job still ran out of memory. I then tried the public-bigmem partition to request more RAM, but it appears to be down.
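For reference, is the following the right way for me to check the partition's state from the login node? I am assuming the standard Slurm tooling here.

# Show availability (up/down) and node states for the public-bigmem partition
sinfo -p public-bigmem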
Could you please advise me on how best to proceed? Thank you for your assistance.
Kind regards,
Jaspreet
Below is the script I am working with:
#!/bin/bash
#SBATCH --error=M-%A-error
#SBATCH --output=M-%A-out
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --partition=public-bigmem
#SBATCH --mem=995GB
#SBATCH --mail-type=ALL
#SBATCH --time=48:00:00
#SBATCH --mail-user=jaspreet.saini@unige.ch

megahit -1 Gmad_cat_R1.BBnorm.fastq.gz -2 Gmad_cat_R2.BBnorm.fastq.gz \
        -o megahit_output_gmad -t 16 --continue
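
In case it helps the discussion, below is a lower-memory variant I am considering for shared-bigmem while public-bigmem is unavailable. Please treat it as a sketch: the --kmin-1pass and -m options come from MEGAHIT's help text (--kmin-1pass builds the first de Bruijn graph in a single pass to reduce peak RAM, and an -m value above 1 is read as a byte limit), and the partition, memory request, and new output directory name are only placeholders pending your advice.

#!/bin/bash
#SBATCH --error=M-lowmem-%A-error
#SBATCH --output=M-lowmem-%A-out
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --partition=shared-bigmem
#SBATCH --mem=500GB
#SBATCH --mail-type=ALL
#SBATCH --time=48:00:00
#SBATCH --mail-user=jaspreet.saini@unige.ch

# Cap MEGAHIT's graph-construction memory below the 500 GB Slurm allocation
# (-m greater than 1 is interpreted as bytes) and use one-pass mode to lower peak RAM.
# megahit_output_gmad_lowmem is a placeholder for a fresh output directory.
megahit -1 Gmad_cat_R1.BBnorm.fastq.gz -2 Gmad_cat_R2.BBnorm.fastq.gz \
        -o megahit_output_gmad_lowmem -t 16 \
        -m 450000000000 --kmin-1pass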