Bulk RNA-seq pipeline

Snakemake-based

Bioinformatics, Pipeline, Automation, Python
A Snakemake-based pipeline for preprocessing, quality control, alignment, quantification, and transformation of bulk RNA-seq data. Designed for flexibility and reproducibility, it supports both single-end and paired-end reads and produces raw and normalized count matrices (TPM, log2TPM, VST) ready for downstream analysis.
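To illustrate what single-end versus paired-end support can mean for the inputs, here is a minimal sketch of how a paired-end flag might decide which FASTQ files a sample is expected to provide. The function name `expected_fastqs`, the `_1`/`_2` mate suffixes, the `.fastq.gz` extension, and the placeholder ID `SRR0000001` are all assumptions made for illustration; the actual Snakefile may use different naming.

```python
from pathlib import Path


def expected_fastqs(sample_id, paired_end, raw_dir="raw_data"):
    """Return the FASTQ paths one sample is expected to provide.

    Hypothetical helper: the file naming below is an assumed convention,
    not taken from the pipeline itself.
    """
    raw = Path(raw_dir)
    if paired_end:
        # Assumption: mates are suffixed _1 and _2.
        return [
            raw / f"{sample_id}_1.fastq.gz",
            raw / f"{sample_id}_2.fastq.gz",
        ]
    # Single-end: one file per sample.
    return [raw / f"{sample_id}.fastq.gz"]


# Placeholder sample ID, for illustration only.
paired = expected_fastqs("SRR0000001", paired_end=True)
single = expected_fastqs("SRR0000001", paired_end=False)
```

Keeping this decision in one place means downstream steps only need to handle a list of one or two files, rather than branching on the flag themselves.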

Features


Requirements


Directory Structure

.
├── config.yaml               # Set sample IDs, paths, paired-end flag, etc.
├── Snakefile                 # Main Snakemake workflow
├── aggregate_counts.py       # Aggregates counts and generates TPM, log2TPM, VST matrices
├── input/
│   └── sample_ids.txt        # List of SRR IDs or sample names
├── raw_data/                 # Input FASTQ files
├── trimmed_data/             # Output from fastp
├── QC/
│   ├── raw/                  # FastQC + MultiQC for raw data
│   └── trimmed/              # FastQC + MultiQC for trimmed data
├── alignment/                # Aligned BAM files (sorted)
├── counts/                   # Count matrices (raw, TPM, log2TPM, VST)
└── ref_genomes/
    └── hg38/                 # Contains HISAT2 index and GTF file
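
The TPM and log2TPM matrices listed under counts/ follow a standard calculation, which can be sketched for a single sample as below. This is not the code in aggregate_counts.py: the function names, the toy gene IDs and lengths, and the pseudocount of 1 are assumptions for illustration, and the real script may well use pandas or NumPy. VST is not shown because it relies on a fitted dispersion model (as in DESeq2) rather than a closed-form formula.

```python
import math


def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts for one sample to TPM.

    counts:     dict mapping gene ID -> raw read count
    lengths_bp: dict mapping gene ID -> gene length in base pairs
    """
    # Reads per kilobase: normalize each count by gene length.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    total = sum(rpk.values())
    if total == 0:
        # No reads at all: avoid dividing by zero.
        return {gene: 0.0 for gene in counts}
    # Rescale so the values for the sample sum to one million.
    return {gene: value / total * 1e6 for gene, value in rpk.items()}


def log2_tpm(tpm, pseudocount=1.0):
    """Apply log2(TPM + pseudocount); the pseudocount of 1 is an assumption."""
    return {gene: math.log2(value + pseudocount) for gene, value in tpm.items()}


# Toy data, for illustration only.
counts = {"geneA": 100, "geneB": 300, "geneC": 0}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 500}

tpm = counts_to_tpm(counts, lengths_bp)
log_tpm = log2_tpm(tpm)
```

In the toy data geneB has three times the reads of geneA but is also three times as long, so the two end up with the same TPM. That length correction is the reason to prefer TPM over raw counts when comparing genes within a sample.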