Author: Alex Vompe
Date: 12/5/25
Step 1.1: log in to Roar Collab. Request an account if you don’t have one.
ssh [user ID]@submit.hpc.psu.edu
salloc --partition=sla-prio --ntasks=1 --cpus-per-task=12 --mem=128G
Step 1.2: navigate to your work directory
cd $HOME/work
Step 1.3: install the required programs in a conda environment for assembly (a separate conda environment for MAG analysis is created later)
module load anaconda
conda create -n assembly -c bioconda -c conda-forge fastp megahit bowtie2 samtools
conda activate assembly
Step 1.4: navigate to your scratch directory, make a directory for this workshop, and copy over the data
cd $HOME/scratch
mkdir DAWG_MGS_2
cd DAWG_MGS_2
cp -r /scratch/azv5523/DAWG_MGS_2/reads ./
mkdir megahit
cp /scratch/azv5523/DAWG_MGS_2/megahit/contigs.fa ./megahit/
mkdir alignments
cp /scratch/azv5523/DAWG_MGS_2/alignments/sample1.sam ./alignments/
cp -r /scratch/azv5523/DAWG_MGS_2/CheckM2_database/ ./
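Optional check: before moving on, it can help to confirm that everything copied over. The sketch below is not part of the original workflow. check_inputs is a hypothetical helper name, and the file list is simply the set of paths used in this handout.

```shell
# Report any expected workshop input that is missing or empty under a base directory.
# Usage: check_inputs BASE_DIR   (returns non-zero if anything is missing)
check_inputs() {
    local base="${1:-.}" missing=0 path
    for path in \
        reads/SRR7595115_1.fastq.gz \
        reads/SRR7595115_2.fastq.gz \
        megahit/contigs.fa \
        alignments/sample1.sam \
        CheckM2_database/uniref100.KO.1.dmnd
    do
        # -s is true only for files that exist and are not empty
        if [ ! -s "$base/$path" ]; then
            echo "MISSING: $path"
            missing=1
        fi
    done
    if [ "$missing" -eq 0 ]; then
        echo "All inputs present."
    fi
    return "$missing"
}
```

To use it from anywhere, run: check_inputs $HOME/scratch/DAWG_MGS_2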
Step 1.5: QC the reads with fastp
mkdir fastp_qc
fastp --in1 reads/SRR7595115_1.fastq.gz --in2 reads/SRR7595115_2.fastq.gz --out1 fastp_qc/SRR7595115_1.fastq.gz --out2 fastp_qc/SRR7595115_2.fastq.gz --trim_poly_g --html fastp_qc/SRR7595115_report.html --json fastp_qc/SRR7595115_report.json --thread 12
megahit -1 fastp_qc/SRR7595115_1.fastq.gz -2 fastp_qc/SRR7595115_2.fastq.gz -o ./megahit
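Note: we will NOT run the megahit command during the workshop, as assembly takes hours to days. Use the “contigs.fa” file in the megahit directory that I assembled for you. If you run it yourself later, give -o a directory that does not exist yet, because MEGAHIT refuses to write into an existing output directory.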
I recommend running SPAdes for better quality assemblies, but this takes even longer:
spades.py -1 left.fastq.gz -2 right.fastq.gz -o output_folder --meta
Add metabat2 to our environment:
conda install -c bioconda/label/cf201901 metabat2
bowtie2-build megahit/contigs.fa contigs_index
Note: this will take ~15 mins. Run it only if there is plenty of time and you want to see the bowtie2 index format. Otherwise skip it, as we already provide the SAM file.
bowtie2 -x contigs_index -1 reads/SRR7595115_1.fastq.gz -2 reads/SRR7595115_2.fastq.gz -S alignments/sample1.sam -p 12
Note: DO NOT RUN THIS (it takes hours). Use the SAM file we provided for the commands below.
samtools view -bS alignments/sample1.sam | samtools sort -o alignments/sample1.bam --threads 12
samtools index alignments/sample1.bam
jgi_summarize_bam_contig_depths --outputDepth output_depth.txt alignments/sample1.bam
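Optional: a quick look at the depth table before binning. This is a minimal sketch, assuming the usual jgi_summarize_bam_contig_depths layout (tab-separated, one header row, contig name in column 1, contig length in column 2, total average depth in column 3). depth_summary is a hypothetical helper name, and the 2500 bp default is chosen to mirror MetaBAT2's default minimum contig length.

```shell
# Summarise a depth table: number of contigs at or above a length cut-off
# and their length-weighted mean depth.
# Usage: depth_summary [DEPTH_FILE] [MIN_LEN]
depth_summary() {
    local depth_file="${1:-output_depth.txt}" min_len="${2:-2500}"
    awk -F'\t' -v min="$min_len" '
        NR == 1   { next }                                  # skip the header row
        $2 >= min { n++; len += $2; wsum += $2 * $3 }       # weight depth by contig length
        END {
            if (n == 0) { print "no contigs >= " min " bp"; exit }
            printf "contigs>=%d\t%d\nweighted_mean_depth\t%.2f\n", min, n, wsum / len
        }
    ' "$depth_file"
}
```

To use it in the workshop directory, run: depth_summary output_depth.txt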
mkdir -p metabat2_bins
metabat2 -i megahit/contigs.fa -o metabat2_bins/bin -a output_depth.txt --numThreads 12 --seed 42
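Optional: see what the binner produced. The sketch below counts contigs and total bases per bin, assuming the bins are written as FASTA files ending in .fa inside metabat2_bins (the same extension passed to CheckM2 in this handout). summarize_bins is a hypothetical helper name.

```shell
# Print "bin_file<TAB>contigs<TAB>total_bp" for every .fa file in a bin directory.
# Usage: summarize_bins [BIN_DIR]
summarize_bins() {
    local dir="${1:-metabat2_bins}" fa
    for fa in "$dir"/*.fa; do
        [ -e "$fa" ] || continue              # glob did not match: no bins found
        awk -v name="$(basename "$fa")" '
            /^>/ { contigs++; next }          # each header line is one contig
                 { bp += length($0) }         # every other line is sequence
            END  { printf "%s\t%d\t%d\n", name, contigs, bp }
        ' "$fa"
    done
}
```

To use it in the workshop directory, run: summarize_bins metabat2_bins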
conda deactivate
mamba create -n checkm2 -c bioconda -c conda-forge checkm2
conda activate checkm2
mkdir checkm2
checkm2 predict --threads 12 --input ./metabat2_bins/ --output-directory ./checkm2/ -x fa --database_path ./CheckM2_database/uniref100.KO.1.dmnd
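Optional: rank the bins from the CheckM2 report. This is a minimal sketch, assuming CheckM2 wrote a tab-separated quality_report.tsv into the output directory with columns named Name, Completeness and Contamination. filter_bins is a hypothetical helper name, and the default cut-offs (at least 90% complete, at most 5% contaminated) are commonly used thresholds for high-quality bins, not values from this handout.

```shell
# Print bins that pass completeness/contamination cut-offs, best first.
# Usage: filter_bins [REPORT] [MIN_COMPLETENESS] [MAX_CONTAMINATION]
filter_bins() {
    local report="${1:-checkm2/quality_report.tsv}" min_comp="${2:-90}" max_cont="${3:-5}"
    local tab
    tab=$(printf '\t')
    awk -F'\t' -v minc="$min_comp" -v maxc="$max_cont" '
        NR == 1 {
            # locate the columns by header name instead of fixed position
            for (i = 1; i <= NF; i++) col[$i] = i
            n = col["Name"]; c = col["Completeness"]; x = col["Contamination"]
            next
        }
        $c + 0 >= minc && $x + 0 <= maxc { print $n "\t" $c "\t" $x }
    ' "$report" | sort -t "$tab" -k2,2nr -k3,3n   # highest completeness, then lowest contamination
}
```

To use it in the workshop directory, run: filter_bins checkm2/quality_report.tsv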
Bin 11 seems to be the highest quality. Let’s download it and upload to Proksee.
Run the Bakta annotation, and add it as a layer to the map.
7.1. Start Anvi’o from the provided container image using Singularity (Apptainer), binding your data folder to /data:
singularity shell -B </path/to/your/datafolder>:/data --pwd /data /scratch/pmt5304/dawg/fa25/w3/anvio_7.sif
7.2. Follow either the Anvi’o phylogenomics or pangenomics workflow, depending on your needs:
Phylogenomics: https://merenlab.org/2017/06/07/phylogenomics/
Pangenomics: https://merenlab.org/2016/11/08/pangenomics-v2/
STOP just before running anvi-interactive, and exit the apptainer (type “exit” then hit <enter>).
7.3. Tunnel a port # of your choice for running the interactive analysis in a web browser (Chrome works best):
ssh -L 12345:localhost:12345 <userid>@submit01.hpc.psu.edu
cd $HOME/scratch/DAWG_MGS_2
nano dummy_job.sh
Enter the following script and save + exit using ctrl+o <enter>, ctrl+x <enter>:
#!/bin/bash
#SBATCH --job-name=anvio_server # Job name
#SBATCH --nodes=1 # Number of nodes
#SBATCH --ntasks=1 # Number of tasks (processes)
#SBATCH --mem=10G # Memory per node
#SBATCH --time=4:00:00 # Wall-clock time limit (HH:MM:SS)
# Commands to be executed
sleep 4h
Run the script:
sbatch dummy_job.sh
Run squeue -u $USER to find the node the job is running on (e.g. p-sc-2369).
Log in to the node:
ssh -L 12345:localhost:12345 p-sc-2369
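Optional: instead of reading the node name off the squeue table by hand, the lookup can be scripted. The sketch below is one possible way, assuming standard Slurm options (sbatch --parsable to capture the job ID, squeue --jobs/--noheader/--format=%N to print only the node). job_node and tunnel_cmd are hypothetical helper names.

```shell
# Print the node assigned to a job ID (empty until the job starts running).
job_node() {
    squeue --jobs="$1" --noheader --format=%N
}

# Build the second-hop tunnel command from a node name and a port.
# Usage: tunnel_cmd NODE [PORT]
tunnel_cmd() {
    local node="${1:-}" port="${2:-12345}"
    if [ -z "$node" ]; then
        echo "no node given (is the job running yet?)" >&2
        return 1
    fi
    echo "ssh -L ${port}:localhost:${port} ${node}"
}

# Usage on the cluster (shown as comments so nothing is submitted by accident):
#   jobid=$(sbatch --parsable dummy_job.sh)
#   tunnel_cmd "$(job_node "$jobid")" 12345
```

tunnel_cmd only prints the ssh command, so you can check it before running it yourself.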
Now, you are ready to run anvi-interactive. Set the port # to the one you chose (12345 in this case).