Running Cd Hit is slow #144

mdMoinuddinSheam · 2024-03-24T17:34:44Z

Hi,

I am running CD-HIT (`cd-hit-est`) on an HPC cluster with around 3.5 million sequences. The script I am using is:

```bash
#!/bin/bash
#SBATCH --job-name="cd_copy2"
#SBATCH --partition=Orion
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=32
#SBATCH --time=30-00:00:00

module load cd-hit/4.8.1

# Cluster at 95% identity (-c), word length 8 (-n), 160000 MB memory limit (-M)
cd-hit-est -i /projects/luo_lab/ncldv/cd_hit/aloha1_megahit_copy_2/aloha1_megahit.fa.gz \
  -o /projects/luo_lab/ncldv/cd_hit/aloha1_megahit_copy_2/aloha1/aloha1_megahit.fa.gz \
  -c 0.95 -n 8 -M 160000 -s 0.9 -aS 0
```

However, it is taking around 30 days to cluster this. Is there any way to make it faster? Thanks
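
One thing that may be worth checking, offered as a sketch rather than a confirmed fix: the command above does not set `-T`, and `cd-hit-est` defaults to a single thread (`-T 1`; `-T 0` uses all CPUs). It is also multithreaded rather than MPI-based, so the cores on the second and third nodes would presumably sit idle. The variant below requests one node and passes the allocated core count to `-T`. The `--cpus-per-task=32` value and the fallback of 32 threads are assumptions carried over from the original `--ntasks-per-node=32`; all other options are unchanged.

```shell
#!/bin/bash
#SBATCH --job-name="cd_copy2"
#SBATCH --partition=Orion
#SBATCH --nodes=1              # cd-hit-est runs threads on one node, not MPI ranks
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32     # assumption: 32 cores per node, as in the original script
#SBATCH --time=30-00:00:00

# Load the module only where an environment-modules command exists.
if command -v module >/dev/null 2>&1; then
  module load cd-hit/4.8.1
fi

# Use the core count Slurm allocated; fall back to 32 outside Slurm (assumption).
threads="${SLURM_CPUS_PER_TASK:-32}"

# Same options as the original command, plus -T for the thread count.
cmd=(cd-hit-est
  -i /projects/luo_lab/ncldv/cd_hit/aloha1_megahit_copy_2/aloha1_megahit.fa.gz
  -o /projects/luo_lab/ncldv/cd_hit/aloha1_megahit_copy_2/aloha1/aloha1_megahit.fa.gz
  -c 0.95 -n 8 -M 160000 -s 0.9 -aS 0
  -T "$threads")

# Log the exact command, then run it if the binary is available.
echo "${cmd[@]}"
if command -v cd-hit-est >/dev/null 2>&1; then
  "${cmd[@]}"
else
  echo "cd-hit-est not found on PATH; command not run" >&2
fi
```

How much this helps depends on the data; the speedup from extra threads is not guaranteed to be linear.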
