NTHRYS

RNA-Seq & Microarray Data Analysis Training | STAR/Salmon, DESeq2/limma, Batch & Enrichment

Master RNA-Seq & microarray pipelines: QC, alignment/quantification, DE analysis, batch correction, and pathway enrichment with reproducible reports.


RNA-Seq & Microarray Analysis — Hands-on

Gain expertise in RNA-Seq and microarray analysis—QC, alignment/quantification, differential expression, batch correction, and pathway enrichment.

RNA-Seq & Microarray Data Analysis
Session 1
RNA-Seq Workflow and Data Analysis (Model + Non-Model Organisms)

Master the complete RNA-Seq transcriptomics workflow from raw reads to biological interpretation, with two execution pathways for model organisms (reference alignment and alignment-free quantification) and a dedicated pipeline for non-model organisms using de novo transcriptome assembly.

  • Workflow Map (Decision Tree)
  • Start: FASTQ → FastQC → Trimmomatic → Branch: Model vs Non-Model
    Model Organism (reference available)
    HISAT2 → featureCounts → DESeq2 / edgeR → clusterProfiler
    Alternative (alignment-free): Salmon / Kallisto → DESeq2 / edgeR
    Non-Model Organism (no reference)
    Trinity → CD-HIT → Trinity-stats → BUSCO → Kallisto
    • 1) Workflow Foundations & Project Setup — convert the biological question into an analysis design you can defend
      • Study design: condition labels, biological replicates, contrasts, pairing, covariates
      • Metadata: sample sheet schema (sample_id, condition, batch, lane, library_type, strandedness)
      • Reference selection: genome build + GTF/GFF, transcriptome FASTA (consistency matters)
      • Folder structure: raw/trimmed/aligned/counts/qc/results/reports + naming conventions
      • Reproducibility: tool versions, parameters, run logs, checksums (MD5/SHA)
      Design Matrix · Metadata Sheet · Runbook Template
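A minimal Python sketch of the metadata check described above, assuming the sample sheet is a CSV using exactly the schema columns listed (sample_id, condition, batch, lane, library_type, strandedness). The function name `validate_sample_sheet` and the two-replicate minimum are illustrative choices, not a fixed standard.

```python
import csv
import io
from collections import Counter

# Columns from the sample sheet schema above.
REQUIRED_COLUMNS = ["sample_id", "condition", "batch", "lane",
                    "library_type", "strandedness"]


def validate_sample_sheet(text, min_replicates=2):
    """Return a list of problems in a CSV sample sheet; an empty list means it passed."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    rows = list(reader)
    problems = []

    # Every sample must be uniquely identifiable across FASTQ, BAM and count files.
    id_counts = Counter(row["sample_id"] for row in rows)
    problems += [f"duplicate sample_id: {sid}" for sid, n in id_counts.items() if n > 1]

    # Each condition needs biological replicates (min_replicates is an assumed cut-off).
    per_condition = Counter(row["condition"] for row in rows)
    problems += [f"condition '{cond}' has only {n} replicate(s)"
                 for cond, n in per_condition.items() if n < min_replicates]

    # Mixed strandedness usually signals mixed library preps that must be explained.
    if len({row["strandedness"] for row in rows}) > 1:
        problems.append("strandedness differs between samples")
    return problems


sheet = """sample_id,condition,batch,lane,library_type,strandedness
S1,control,b1,L1,paired,reverse
S2,control,b2,L1,paired,reverse
S3,treated,b1,L2,paired,reverse
"""
print(validate_sample_sheet(sheet))  # → ["condition 'treated' has only 1 replicate(s)"]
```

Returning a list of problems rather than raising on the first one lets the whole sheet be reviewed in one pass before any compute is spent.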
    • 2) Raw Read Quality Control (FASTQ) — identify problems before you “waste compute” downstream
      • Phred scores, per-base quality, per-sequence quality, N content
      • Adapter contamination, overrepresented sequences, k-mer bias
      • GC distribution shifts; duplication levels; read length distribution
      • MultiQC aggregation: multi-sample dashboards for fast comparisons
      FastQC MultiQC seqtk
      QC Gate: document issues + decide trimming/filters before moving forward.
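Phred scores in FASTQ files are stored as ASCII characters. A small sketch, assuming the Phred+33 encoding used by current Illumina output, that decodes a quality string, converts a score to an error probability (Q = -10·log10 P) and computes the share of bases at or above Q30. The helper names are hypothetical.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string; offset 33 is the Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q):
    """Invert Q = -10 * log10(P) to get the per-base error probability P."""
    return 10 ** (-q / 10)


def fraction_at_least(quality, threshold=30, offset=33):
    """Share of bases at or above a Phred threshold (Q30 is a common summary)."""
    scores = phred_scores(quality, offset)
    return sum(q >= threshold for q in scores) / len(scores)


qual = "IIII5#"                 # 'I' = Q40, '5' = Q20, '#' = Q2
print(phred_scores(qual))       # [40, 40, 40, 40, 20, 2]
print(error_probability(20))    # 0.01, i.e. about 1 error per 100 bases
print(fraction_at_least(qual))  # 4 of 6 bases are >= Q30
```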
    • 3) Trimming & Read Cleanup — remove technical artifacts while preserving biological signal
      • Adapter trimming strategies; quality trimming (sliding window vs fixed cutoffs)
      • Minimum length thresholds; handling orphan reads in paired-end
      • Post-trim QC to verify improvement (before vs after)
      • Common mistakes: over-trimming, inconsistent paired-end outputs, wrong adapter set
      Trimmomatic Cutadapt fastp
      QC Gate: adapter content reduced + acceptable read retention.
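To make the sliding-window idea concrete, here is a simplified Python sketch. It is not Trimmomatic's exact algorithm: it cuts the read at the first window whose mean quality drops below a threshold, applies a minimum length, and then classifies a paired-end read pair as kept, orphaned or dropped. The 4-base window, Q20 threshold and 36-base minimum mirror commonly used settings but are assumptions here; the function names are hypothetical.

```python
def sliding_window_trim(seq, quals, window=4, min_mean=20):
    """Cut the read at the start of the first window whose mean quality < min_mean.

    Reads shorter than the window are returned unchanged; the length filter
    in trim_pair() decides what happens to them.
    """
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_mean:
            return seq[:start], quals[:start]
    return seq, quals


def trim_pair(read1, read2, window=4, min_mean=20, min_len=36):
    """Trim both mates, then classify the pair by which mates pass min_len."""
    t1 = sliding_window_trim(*read1, window=window, min_mean=min_mean)
    t2 = sliding_window_trim(*read2, window=window, min_mean=min_mean)
    keep1, keep2 = len(t1[0]) >= min_len, len(t2[0]) >= min_len
    if keep1 and keep2:
        return "paired", t1, t2
    if keep1:
        return "orphan_r1", t1, None  # mate failed: belongs in an unpaired output
    if keep2:
        return "orphan_r2", None, t2
    return "dropped", None, None


read1 = ("ACGTACGTAC", [30] * 6 + [10] * 4)
print(sliding_window_trim(*read1))  # ('ACGTA', [30, 30, 30, 30, 30])
```

Keeping orphans separate rather than silently discarding one mate is what keeps the paired-end outputs consistent, one of the common mistakes listed above.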
    • 4A) Model Organism Lane — Reference-Based Alignment
      • Genome indexing and splice-aware alignment concepts
      • HISAT2 alignment: parameters that matter (paired-end, sensitivity, splicing)
      • BAM processing: sorting/indexing + alignment summary interpretation
      • Mapping QC: alignment rate, multi-mapping, insert size trends, strandedness checks
      HISAT2 samtools RSeQC (QC)
      QC Gate: mapping rate + strandedness consistent with library prep.
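One way to turn this gate into an explicit check. The sketch assumes you already have the two fractions that RSeQC's infer_experiment.py reports for paired-end data and maps them to featureCounts' -s setting (0 unstranded, 1 stranded, 2 reversely stranded). The 0.8 and 0.1 cut-offs, the 70% minimum mapping rate and the function names are hypothetical; adapt them to your library prep.

```python
def infer_strandedness(frac_fr, frac_rf, call_threshold=0.8, unstranded_margin=0.1):
    """Classify a paired-end library from strand-specific read fractions.

    frac_fr: fraction explained by "1++,1--,2+-,2-+" (read 1 on the gene's strand)
    frac_rf: fraction explained by "1+-,1-+,2++,2--" (read 1 opposite the gene)
    Returns (label, featureCounts -s value), with None when the call is unclear.
    """
    if frac_fr >= call_threshold:
        return "forward stranded (FR)", 1
    if frac_rf >= call_threshold:
        return "reverse stranded (RF)", 2
    if abs(frac_fr - frac_rf) <= unstranded_margin:
        return "unstranded", 0
    return "ambiguous", None


def passes_alignment_gate(overall_rate, frac_fr, frac_rf, expected_s, min_rate=0.70):
    """Gate: enough reads map, and the inferred strandedness matches the library prep."""
    _, s_value = infer_strandedness(frac_fr, frac_rf)
    return overall_rate >= min_rate and s_value == expected_s


# Hypothetical sample: 92% overall alignment, reads mostly antisense to genes.
print(passes_alignment_gate(0.92, 0.02, 0.95, expected_s=2))  # True
```

Recording the expected -s value in the sample sheet and comparing it here catches a wrong strandedness setting before it silently lowers counts downstream.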
    • 4B) Model Organism Lane — Alignment-Free Quantification (Alternative Route)
      • When to choose pseudoalignment (speed, transcript-level quantification)
      • Transcriptome indexing; abundance estimation; bootstraps (Kallisto)
      • Outputs: TPM, estimated counts; merging across samples
      • Common pitfalls: wrong transcriptome version, annotation mismatch, inflated isoforms
      Salmon Kallisto tximport
      QC Gate: assignment rate acceptable + no obvious reference mismatch.
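Salmon (quant.sf) and Kallisto (abundance.tsv) report both estimated counts and TPM alongside effective lengths. A short sketch of the standard TPM definition, using hypothetical values: divide each count by its effective length, then rescale so the sample sums to one million. TPM suits within-sample comparison and plotting; DE tools should still receive counts, for example imported with tximport.

```python
def tpm(counts, effective_lengths):
    """Transcripts per million from estimated counts and effective lengths."""
    # Reads per base of effective length: removes the bias toward long transcripts.
    rates = [c / length for c, length in zip(counts, effective_lengths)]
    total = sum(rates)
    # Rescale so every sample sums to 1e6.
    return [r / total * 1e6 for r in rates]


# Two transcripts with equal counts but different lengths (hypothetical values).
print(tpm([100, 100], [500.0, 2000.0]))  # roughly [800000, 200000]
```

The example shows why counts and TPM can disagree: equal read counts on a short and a long transcript imply a four-fold difference in molar abundance.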
    • 5) Quantification for DE — Gene Counts (featureCounts)
      • Gene model concepts: exons, genes, transcripts; how GTF drives counting
      • Strandedness and feature type selection; paired-end counting choices
      • Count matrix construction and hygiene (sample naming, gene ID consistency)
      • Exploratory analysis: sample clustering, PCA/MDS, outlier detection
      featureCounts · GTF/GFF · Count Matrix
      QC Gate: replicates cluster together; obvious outliers explained.
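Count matrix hygiene starts at parsing. A minimal sketch that reads the main featureCounts table (a '#' comment line, then a header of Geneid, Chr, Start, End, Strand, Length followed by one column per BAM) into sample names, gene IDs and integer counts. It cleans BAM paths into sample names so they can be matched against the sample sheet; the function name is hypothetical.

```python
import csv
import os


def parse_featurecounts(text):
    """Return (samples, gene_ids, counts) from featureCounts' main output table."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    # Columns 7 onward are samples; strip directories and ".bam" (str.removesuffix: Python 3.9+).
    samples = [os.path.basename(col).removesuffix(".bam") for col in header[6:]]
    if len(set(samples)) != len(samples):
        raise ValueError("sample names collide after cleanup")

    gene_ids, counts = [], []
    for row in reader:
        gene_ids.append(row[0])
        counts.append([int(value) for value in row[6:]])
    if len(set(gene_ids)) != len(gene_ids):
        raise ValueError("duplicate gene IDs in count table")
    return samples, gene_ids, counts
```

Failing loudly on name collisions is a deliberate choice: a silently mislabeled column would propagate into every PCA and DE result downstream.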
    • 6) Differential Expression (DE) — statistically correct results you can interpret
      • Normalization concepts; dispersion; model design; contrasts
      • Multiple testing correction (FDR) and effect size interpretation (log2FC)
      • Plots: MA, volcano, heatmaps, PCA/MDS; result filtering best practices
      • Result sanity checks: known markers, directionality, replicate consistency
      DESeq2 edgeR limma-voom
      QC Gate: DE results stable under reasonable thresholds; plots consistent.
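DESeq2 normalizes library depth with median-of-ratios size factors (in practice via estimateSizeFactors). A pure-Python sketch of that idea for teaching purposes: for each gene without zeros, take the ratio of each count to the gene's geometric mean across samples; each sample's size factor is the median of its ratios. Input values and function names are hypothetical.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors; counts is a list of genes, each a list per sample."""
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) == 0:
            continue  # log geometric mean would be -inf, so such genes are skipped
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # statistics.median raises StatisticsError if every gene contained a zero.
    return [math.exp(statistics.median(r)) for r in log_ratios]


def normalize(counts, factors):
    """Divide each sample's counts by its size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


counts = [[10, 20], [100, 200], [5, 10]]  # sample 2 sequenced twice as deeply
sf = size_factors(counts)
print(sf)                        # approximately [0.707, 1.414]
print(normalize(counts, sf)[0])  # both values approximately 14.14
```

Using the median makes the factors robust to a minority of strongly differentially expressed genes, which is why the method assumes most genes are not changing.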
    • 7) Functional Interpretation & Enrichment (clusterProfiler)
      • Gene ID mapping (ENSEMBL/ENTREZ) and gene universe selection
      • ORA vs GSEA — when to use each; common biases and controls
      • GO/KEGG/Reactome enrichment; dotplots and reporting structure
      • How to write an interpretation narrative for results + limitations
      clusterProfiler · GO / KEGG / Reactome · MSigDB
      QC Gate: correct gene universe + correct IDs + biologically plausible themes.
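ORA asks whether a gene set is over-represented among DE genes relative to the gene universe, usually with a one-sided hypergeometric test (the test behind clusterProfiler's ORA functions). A minimal sketch assuming plain gene-ID sets; the function names are hypothetical, and BH correction across all tested gene sets is still required. Note how both the DE list and the gene set are first restricted to the universe, which is why the universe choice changes the p-value.

```python
from math import comb


def ora_pvalue(universe_size, set_size, n_selected, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(N=universe_size, K=set_size, n=n_selected)."""
    total = comb(universe_size, n_selected)
    tail = sum(
        comb(set_size, i) * comb(universe_size - set_size, n_selected - i)
        for i in range(overlap, min(set_size, n_selected) + 1)
    )
    return tail / total


def ora(de_genes, gene_set, universe):
    """Restrict both lists to the universe, then test the overlap."""
    universe = set(universe)
    de = set(de_genes) & universe
    members = set(gene_set) & universe
    k = len(de & members)
    return k, ora_pvalue(len(universe), len(members), len(de), k)


# Hypothetical IDs: "z" is outside the universe and is ignored.
print(ora(["a", "b"], ["a", "b", "z"], ["a", "b", "c", "d"]))  # overlap 2, p = 1/6
```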
    • 8) Non-Model Organism Lane — De novo Transcriptome Pipeline
      • Trinity assembly: compute planning, memory/CPU considerations, output structure
      • De-redundancy: CD-HIT clustering to reduce transcript inflation
      • Assembly assessment: Trinity-stats (N50, contig distribution), read representation
      • BUSCO: completeness and duplication interpretation; what “good” means
      • Quantification: Kallisto quant on assembled transcriptome → DE workflow
      Trinity CD-HIT Trinity-stats BUSCO Kallisto
      QC Gate: BUSCO completeness acceptable; duplication explained; stable quant.
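Trinity-stats reports N50 among its contig statistics. A short sketch of the standard definition, assuming a plain list of contig lengths: N50 is the length L such that contigs of length L or longer contain at least half of all assembled bases. For transcriptomes N50 alone says little about quality, which is why BUSCO and read representation sit alongside it in this module.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


print(n50([100, 200, 300, 400]))  # 300, since 400 + 300 = 700 >= 1000 / 2
```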
    • Deliverables (What You Take Home)
      • QC pack: FastQC + MultiQC reports, trimming decisions, acceptance notes
      • Processed data: trimmed FASTQs, aligned BAMs (model lane), count/TPM matrices
      • DE pack: DE tables, MA/volcano/heatmaps, sample clustering (PCA/MDS)
      • Enrichment pack: GO/KEGG/Reactome tables + dotplots + interpretation template
      • Non-model pack (Tier 3 emphasis): assembled transcriptome + CD-HIT + Trinity-stats + BUSCO report
      • Reproducibility pack: runbook (commands + parameters + versions) + folder template
    • What’s Included in Each Tier
    • Coverage | Tier 1 (Basic) | Tier 2 (Pro) | Tier 3 (Enterprise)
      Model Lane (Alignment) | Full practical | Full practical + QC gates + troubleshooting | Full practical + team SOP + audit checklist
      Alignment-Free Lane | Demo + interpretation | Full practical + tximport + comparisons | Full practical + standardization for teams
      Non-Model (De novo) | Concept overview | Guided demo + QC interpretation | Full practical: Trinity → CD-HIT → BUSCO → Kallisto → DE
      Capstone / Report | Templates + guided structure | Publication-ready plots + narrative framework | Team delivery pack + one dataset review
    Session 2
    Fee: Rs 16300
    QC & Preprocessing
    • FASTQ structure, quality scores & initial QC (MultiQC)
    • FastQC MultiQC seqtk
    • Adapter/quality trimming & contamination screening
    • Cutadapt fastp FastQ Screen
    • Design matrices, metadata & sample randomization
    • Theory
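A sketch of how metadata becomes a design matrix, assuming an additive model such as ~ batch + condition with treatment coding: an intercept column plus one indicator column per non-reference level. The alphabetically first level is taken as the reference, similar to R's default factor ordering. A seeded shuffle of processing order is included for the randomization point; the seed value and all function names are hypothetical.

```python
import random


def design_matrix(samples, factors):
    """Treatment-coded design for an additive model, e.g. ~ batch + condition."""
    # First level (alphabetical) of each factor is the reference and gets no column.
    levels = {f: sorted({s[f] for s in samples}) for f in factors}
    columns = ["Intercept"] + [f"{f}_{lvl}" for f in factors for lvl in levels[f][1:]]
    rows = []
    for s in samples:
        row = [1]
        for f in factors:
            row += [int(s[f] == lvl) for lvl in levels[f][1:]]
        rows.append(row)
    return columns, rows


def randomized_run_order(sample_ids, seed=2024):
    """Seeded shuffle of processing order so condition is not confounded with run order."""
    order = list(sample_ids)
    random.Random(seed).shuffle(order)
    return order


samples = [
    {"id": "S1", "batch": "b1", "condition": "control"},
    {"id": "S2", "batch": "b2", "condition": "control"},
    {"id": "S3", "batch": "b1", "condition": "treated"},
    {"id": "S4", "batch": "b2", "condition": "treated"},
]
cols, X = design_matrix(samples, ["batch", "condition"])
print(cols)  # ['Intercept', 'batch_b2', 'condition_treated']
```

Because every condition appears in every batch here, the batch and condition columns are not collinear, so both effects remain estimable.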
    Session 3
    Fee: Rs 18400
    Alignment/Quant & Differential Expression
    • Splice-aware alignment & count generation
    • STAR HISAT2 featureCounts
    • Alignment-free quantification & transcript-level summaries
    • Salmon Kallisto tximport
    • Differential expression pipelines & visualization
    • DESeq2 edgeR limma-voom
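DESeq2 (results), edgeR (topTags) and limma (topTable) report Benjamini-Hochberg adjusted p-values by default. To show what that adjustment does, a pure-Python sketch matching the intent of R's p.adjust(p, method = "BH"): scale each sorted p-value by m/rank, then enforce monotonicity from the largest rank down and cap at 1. The p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """BH-adjusted p-values (FDR), returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, keeping a cumulative minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))  # approximately [0.04, 0.0533, 0.0533, 0.2]
```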
    Session 4
    Fee: Rs 31200
    Batch, Splicing & Microarrays
    • Batch correction, QC metrics & exploratory analysis
    • sva/ComBat PCA/UMAP RSeQC
    • Differential splicing & isoform usage
    • DEXSeq rMATS SUPPA2
    • Microarray preprocessing & DE analysis (RMA/limma)
    • oligo/affy RMA limma
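RMA (as run by oligo::rma or affy::rma) combines background correction, quantile normalization across arrays and median-polish summarization of probes into probesets. Below is a sketch of the quantile step only, assuming a list of per-array intensity vectors of equal length. Ties are not averaged as some implementations do, so treat it as illustrative.

```python
def quantile_normalize(columns):
    """Give every array the same intensity distribution (the mean of the sorted arrays)."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean across arrays at each rank.
    rank_means = [sum(col[r] for col in sorted_cols) / len(columns) for r in range(n)]
    result = []
    for col in columns:
        # Indices of this array's values from smallest to largest.
        ranks = sorted(range(n), key=lambda i: col[i])
        normalized = [0.0] * n
        for r, i in enumerate(ranks):
            normalized[i] = rank_means[r]
        result.append(normalized)
    return result


print(quantile_normalize([[1, 3, 2], [4, 6, 8]]))  # [[2.5, 5.5, 4.0], [2.5, 4.0, 5.5]]
```

Each probe keeps its within-array rank while all arrays end up with identical distributions, which is what makes intensities comparable before limma.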
    Session 5
    Fee: Rs 44000
    Mini Capstone: Omics Report & Enrichment
    • GSEA/ORA & pathway interpretation (KEGG/Reactome/GO)
    • clusterProfiler MSigDB Enrichr
    • End-to-end automation with provenance & dashboards
    • Snakemake · R Markdown/Quarto · pandas
    • Publication-ready figures: MA, volcano, heatmaps, PCA
    • ggplot2 ComplexHeatmap EnhancedVolcano
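GSEA works on the full ranked gene list instead of a DE cut-off. Its core statistic is a running sum that steps up at gene-set members (weighted by |score|^p, with p = 1 in the weighted GSEA method) and down at non-members; the enrichment score is the running sum's maximum deviation from zero. The sketch computes only that score, under hypothetical inputs and names. Permutation-based significance and normalized scores (NES), which clusterProfiler's GSEA reports, are not shown.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score for one gene set.

    ranked_genes: genes sorted by their statistic, largest first.
    scores: the matching statistics (e.g. signed DE statistics).
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering all of it")
    hit_norm = sum(abs(s) ** p for s, hit in zip(scores, in_set) if hit)
    running, best = 0.0, 0.0
    for s, hit in zip(scores, in_set):
        # Members add their weighted share; non-members subtract an equal step.
        running += (abs(s) ** p) / hit_norm if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


genes = ["g1", "g2", "g3", "g4"]
stats = [2.0, 1.0, -1.0, -2.0]
print(enrichment_score(genes, stats, {"g1"}))  # 1.0: the member sits at the top
```

A positive score means members cluster at the top of the ranking (up-regulated), a negative one at the bottom.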