Online Internships

CAT-INDEX Online · In-Silico Only

Online Internship Categories — In-Silico Focus

C-001 — Bioinformatics Pipelines (NGS/Omics)
C-002 — Clinical Genomics & Variant Interpretation
C-003 — Single-Cell & Spatial Transcriptomics Analysis
C-004 — Metagenomics & Microbiome Analytics
C-005 — Proteomics/Metabolomics (LC-MS/MS) Data Analysis
C-006 — Systems Biology, Networks & Pathway Interactomics
C-007 — Computational Chemistry & Cheminformatics (QSAR/ADMET)
C-008 — Structural Biology, Docking & Molecular Dynamics
C-009 — Pharmacometrics & PK/PD Modeling
C-010 — In-Silico Toxicology & Safety Prediction
C-011 — AI/ML for Biosciences (LLMs, CV, Tabular)
C-012 — BioNLP & Literature Mining (Text Mining/NLP)
C-013 — Health Informatics, RWE & Epidemiological Modeling
C-014 — Biostatistics, Experimental Design & Reproducibility
C-015 — Scientific Dashboards & Data Visualization (R/Python/BI)
C-016 — Bio-Data Engineering (ETL, Pipelines, Cloud/HPC)
C-017 — Knowledge Graphs & Ontologies for Biomedicine
C-018 — Statistical Genetics, GWAS & Polygenic Risk
C-019 — Synthetic Biology (In-Silico Design & CAD)
C-020 — Biomedical Signal/Image Analysis (MRI/Pathology/CV)
C-021 — LIMS, ELN & Lab Automation Software (Simulation)
C-022 — Digital QA/QC, Compliance Analytics & e-Validation
C-023 — Oncology Informatics & Biomarker Analytics
C-024 — Agri/Plant Bioinformatics & Crop Informatics

In-Silico · Online NGS · Multi-Omics Pipelines · Cloud/HPC

Review the focus areas below and choose one to apply for.

NGS QC: FastQC / MultiQC / Adapter Trimming
Read Alignment (DNA) : BWA-MEM / Bowtie2
Read Alignment (RNA) : STAR / HISAT2
Reference Indexing & Genome Resources
WGS Variant Calling: GATK Best Practices
WES Variant Calling & Panel Workflows
DeepVariant & DRAGEN-style Pipelines (concepts)
Variant Filtering, Recalibration & Quality Gates
Structural Variants: Manta / DELLY
Copy Number Variation: CNVkit / GATK-gCNV
Loss of Heterozygosity & Purity/Ploidy Basics
RNA-seq Quantification: Salmon / Kallisto
DGE Analysis: DESeq2 / edgeR / limma-voom
Transcript Assembly & Isoforms: StringTie
Alternative Splicing Analysis: rMATS
ChIP-seq Peak Calling & QC: MACS2
ATAC-seq Footprinting & Motifs: HOMER
DNA Methylation/WGBS: Bismark Pipelines
Bisulfite QC & Differential Methylation
Single-Cell RNA-seq: Seurat Basics
Single-Cell RNA-seq: Scanpy Basics
Single-Cell QC/Integration/Batch Correction
Spatial Transcriptomics: Space Ranger → Seurat
Metagenomics 16S: QIIME2 End-to-End
Shotgun Metagenomics: HUMAnN / MetaPhlAn
De-novo Assembly: SPAdes / MEGAHIT
Genome Annotation: Prokka / PGAP
Functional Enrichment: GSEA / Enrichr / clusterProfiler
Gene Set Curation & Pathway Databases
Multi-Omics Integration: MOFA / mixOmics
Network Biology & Co-expression Modules
Clinical Variant Interpretation: ClinVar / COSMIC
Annotation & Databases: VEP / ANNOVAR
IGV / UCSC Track Hubs / Visualization
Reproducible Reports: Quarto / R Markdown
Data Versioning & FAIR: DVC / Metadata
Workflow Engines: Snakemake Fundamentals
Workflow Engines: Nextflow Fundamentals
Containerization: Conda / Docker / Singularity
HPC Scheduling: SLURM Job Arrays & Logs
Cloud Basics: AWS/GCP/Azure for Omics
File Orchestration: NF-Tower / Snakemake Reports
Quality Dashboards & MultiQC Custom Panels
Sample Sheet Design & Cohort Metadata
Batch Effects & Confounders Handling
Benchmarking & Synthetic Controls
Security & Compliance Considerations
Regulatory Data Submission (ENA/GEO/SRA)
Packaging & Sharing Pipelines (Git/GitHub)
Capstone: End-to-End Omics Pipeline Build

In-Silico · Online Clinical Genomics Variant Interpretation

Review the focus areas below and choose one to apply for.

Clinical NGS QC: coverage, uniformity, duplication
Target capture/WES/WGS assay characteristics
Read alignment & recalibration (BWA-GATK)
Germline SNV/Indel calling (GATK Best Practices)
Somatic SNV/Indel calling (Mutect2/VarDict)
Low-allele fraction variant detection
Structural variants (Manta/DELLY/LUMPY)
Copy-number variants (ExomeDepth/gCNV)
Mitochondrial variants & heteroplasmy
Phasing & compound heterozygosity
Trio analysis & inheritance models
Repeat expansions & STR detection
RNA-seq for splicing/allele-specific expression
Splice impact prediction & validation routes
Variant normalization & left-alignment
ClinVar/OMIM/HGMD interrogation (concepts)
Population frequencies (gnomAD) & sub-pops
In-silico predictors (REVEL, CADD, SpliceAI)
Gene-disease validity & constraint metrics
ACMG/AMP classification framework (germline)
ACMG evidence codes: PVS/PS/PM/PP/BA/BS/BP
Somatic guidelines (AMP/ASCO/CAP tiers)
Actionability (OncoKB/CIViC/COSMIC)
LOH, TMB, MSI pipelines (clinical context)
Copy-number driven biomarkers (HER2, MET, etc.)
Fusion detection (STAR-Fusion/Arriba) overview
Clinical reporting templates & phrasing
VUS management & reclassification policy
Secondary/incidental findings (ACMG SF v3)
Carrier screening panel curation
Pharmacogenomics (CPIC, PharmGKB)
Newborn screening & rare disease workflows
Mendelian gene panels (virtual panels)
Gene curation with ClinGen SOPs
Reference transcripts & MANE Select
HGVS nomenclature & transcript selection
Data traceability, audit & LIMS linkage
Verification/validation (CLSI, CAP) concepts
QC dashboards & run acceptance criteria
Proficiency testing & inter-lab concordance
Ethical/legal: consent, privacy, reporting scope
Regulatory submissions & documentation
Cloud/HPC in clinical settings (guardrails)
Automated evidence collection (VEP/ANNOVAR)
Knowledge base building & evidence tagging
Variant database hygiene & versioning
Family-based reanalysis strategy
Tumor-normal vs tumor-only trade-offs
Clinical MTB case write-ups
Capstone: end-to-end clinical interpretation

In-Silico · Online Single-Cell · Spatial scRNA-seq · STomics

Review the focus areas below and choose one to apply for.

Single-cell experimental design & sample types
Cell/nuclei isolation & quality considerations (concepts)
Unique molecular identifiers (UMIs) & barcodes
Library types: 3’/5’ tag, full-length, plate vs droplet
Raw data structure: FASTQ + feature-barcode matrices
Initial QC: sequencing depth & read structure checks
Cell-level QC: library size, feature counts, mito/ribo%
Empty droplets & ambient RNA (SoupX-style concepts)
Doublet detection strategies (DoubletFinder/Scrublet)
Normalization strategies (log, sctransform)
Feature selection & HVG detection
Dimensionality reduction: PCA, UMAP, t-SNE
Graph construction & clustering (Louvain/Leiden)
Cluster annotation with canonical markers
Differential expression across clusters/conditions
Pseudobulk aggregation for robust testing
Trajectory inference & pseudotime (Monocle/Slingshot)
RNA velocity (scVelo concepts)
Cell–cell communication analysis (CellChat/NicheNet)
Integration of multiple batches/experiments
Reference-based label transfer & mapping
CITE-seq & multi-modal data (RNA+ADT)
scATAC-seq QC & peak calling overview
Linking chromatin accessibility to gene expression
Regulon analysis (SCENIC-style concepts)
Single-cell multi-omics integration strategies
Data subsetting & re-clustering workflows
Handling large datasets & sparse matrices
Metadata management & experimental covariates
Spatial technologies overview (Visium, MERFISH, etc.)
Tissue image / spot alignment & QC
Spatial normalization & smoothing concepts
Spatially variable gene detection
Deconvolution of spots into cell-type mixtures
Spatial clustering & neighborhood analysis
Colocalization & ligand–receptor patterns in space
Integration of single-cell and spatial data
Building spatial maps of cell states
Creating marker panels & signatures
Benchmarking pipelines with public datasets
Report-ready figures for manuscripts
Reproducible single-cell workflows (best practices)
Project structure & versioning for scRNA-seq
Sharing count matrices & metadata (FAIR)
Ethical handling of clinical single-cell data
Automation with Snakemake/Nextflow (high level)
Cloud/HPC strategies for large scRNA-seq
Documentation & notebooks for reviewers
Quality dashboards & interactive exploration (Shiny/Dash)
Capstone: full single-cell + spatial analysis report

In-Silico · Online Metagenomics Microbiome Analytics

Review the focus areas below and choose one to apply for.

Microbiome study design & sample types
DNA extraction biases & quality checks (concepts)
16S/18S/ITS amplicon vs shotgun metagenomics
Library prep considerations & sequencing depth
Raw reads QC: quality, adapters, contaminants
Host read removal & decontamination strategies
Amplicon processing with QIIME2 end-to-end
DADA2/UNOISE pipelines for ASV inference
OTUs vs ASVs and choice of reference DB
Taxonomic assignment (SILVA, Greengenes, GTDB)
Alpha diversity metrics & rarefaction curves
Beta diversity, distance metrics & ordination
PERMANOVA and other community-level tests
Differential abundance (DESeq2/ANCOM-BC concepts)
Compositionality & appropriate normalisation
Contaminant detection (decontam-style workflows)
Longitudinal & repeated-measures microbiome data
Shotgun taxonomic profiling (Kraken2/Bracken)
Shotgun taxonomic profiling (MetaPhlAn)
Functional profiling of pathways (HUMAnN)
Resistome profiling (ARG databases concepts)
Viruses, phages & virome analysis basics
Mycobiome & fungal community profiling basics
Metagenome assembly (MEGAHIT/SPAdes overview)
Binning metagenome-assembled genomes (MAGs)
Bin refinement, de-duplication & quality checks
CheckM/GTDB-Tk for MAG quality & taxonomy
Pangenome & strain-resolved analysis (concepts)
Metatranscriptomics (RNA) workflows overview
Metaproteomics/metabolomics integration (high level)
Host–microbiome data integration
Microbiome & clinical covariates (metadata models)
Batch effects & technical confounders handling
Environment/soil/water microbiome specifics
Animal/human gut microbiome specifics
Food/fermentation microbiome analytics
AMR surveillance using metagenomics
Functional enrichment & pathway interpretation
Network analysis & co-occurrence patterns
Biomarker discovery & machine learning (intro)
Reproducible QIIME2 pipelines & artefacts
Reproducible shotgun pipelines (Nextflow/Snakemake)
Reporting standards & MIxS/FAIR principles
Submission to public repositories (ENA/SRA/EBI)
Dashboard-style visual summaries for stakeholders
Project structure, naming & version control
Privacy, ethics & human-associated microbiome data
Basic cloud/HPC strategies for large microbiomes
End-to-end case study: 16S workflow
End-to-end case study: shotgun microbiome study

In-Silico · Online Systems Biology Networks & Pathways

Review the focus areas below and choose one to apply for.

Systems biology principles: networks, feedback, robustness
Pathways vs networks vs gene sets (conceptual distinctions)
Interaction types: PPI, TF–target, metabolic, signalling
Pathway databases: KEGG, Reactome, WikiPathways
Interaction databases: STRING, BioGRID, IntAct
Gene set collections: GO, MSigDB, custom panels
Building co-expression networks (correlation-based)
WGCNA basics: adjacency, TOM & scale-free topology
Module detection & dynamic tree cut
Module–trait relationships & hub gene identification
Regulatory network enrichment (TFs & motifs)
Gene regulatory network inference (ARACNe/GENIE3 concepts)
Bayesian, ODE & logical network models (overview)
Over-representation analysis (ORA) for pathways
GSEA-style enrichment on ranked gene lists
Gene set variation analysis (GSVA, ssGSEA)
Topology-aware & gene-set testing methods (SPIA/CAMERA/ROAST concepts)
Combining multiple enrichment results coherently
Multi-omics integration in pathway context (MOFA/mixOmics concepts)
Integrating transcriptomics with proteomics in networks
Integrating metabolomics to metabolic networks
Differential network & network rewiring analysis
Module preservation across datasets/cohorts
Network propagation & diffusion concepts
Random walk with restart (RWR) on biological graphs
Disease gene prioritisation using interaction networks
Disease modules & subnetwork extraction
Network-based drug–target & drug–disease mapping
Drug repurposing via network proximity
Network-constrained feature selection & ML (intro)
Graph representations: adjacency matrices & edgelists
Graph metrics: degree, centrality, clustering, paths
Community detection & graph clustering (Louvain etc.)
Heterogeneous & bipartite biological networks
Single-cell networks & gene modules per cell type
Spatial transcriptomics neighborhood networks
Ligand–receptor signalling networks (CellChat/NicheNet concepts)
Host–pathogen interaction networks
Host–microbiome & multi-kingdom interaction graphs
Knowledge graphs & ontology-backed biomolecular networks
FAIR & reusable network objects (GraphML, igraph, tidygraph)
Network visualization styles & layout selection
Cytoscape workflows: import, style, export
Scripting Cytoscape (RCy3/py2cytoscape concepts)
Automated reporting from network analyses (R Markdown/Quarto)
Reproducible pipelines for enrichment + network analysis
Evaluating database bias, coverage & confidence scores
Best practices for network figure design in publications
Project organisation, versioning & documentation
Capstone: end-to-end systems biology & network report

In-Silico · Online Comp Chem Cheminformatics · QSAR/ADMET

Review the focus areas below and choose one to apply for.

Chemical structure formats & representations (SMILES, SDF, MOL)
2D structure drawing & curation best practices
Chemical databases & registries (ChEMBL, PubChem, internal)
Standardization, salt-stripping & tautomer handling
Physicochemical descriptors (logP, pKa, TPSA, HBD/HBA)
Topological & fragment-based molecular descriptors
3D descriptors & conformer generation concepts
Fingerprint types (MACCS, Morgan, Daylight-like)
Similarity searches & diversity analysis
Chemical space visualization (PCA, t-SNE, UMAP)
QSAR workflow design & dataset preparation
Endpoint curation & assay harmonization
Train/validation/test splits & time-split concepts
Class imbalance & resampling strategies
Linear QSAR (MLR, PLS) basics
Non-linear ML models (RF, GBM, SVM, NN) for QSAR
Regression vs classification QSAR models
Feature selection & dimensionality reduction
Model validation: internal & external metrics
Y-randomization & leakage checks
Applicability domain (AD) concepts & methods
Interpretability: feature importance & SHAP-style ideas
Virtual screening pipeline design
Hit triage and ranking strategies
Statistical vs physics-based scoring synergy
ADMET endpoints: solubility & permeability models
Metabolism & clearance (hepatic, renal) modelling
CYP450 & DDI risk in-silico predictions
Toxicity endpoints: hERG, hepatotoxicity, genotoxicity
Rule-based filters (Lipinski, Veber, PAINS)
Multi-parameter optimization (MPO) scores
Structure-based design overview (docking basics)
Binding site preparation & protonation states (concepts)
Ligand preparation & tautomers for docking
Docking workflow & scoring function concepts
Post-docking analysis & rescoring ideas
Free energy approaches (MM/GBSA, FEP) overview
Conformational analysis & flexibility
Molecular mechanics vs quantum mechanics basics
QM for pKa/tautomer & reactivity insight (high level)
Cheminformatics data engineering & pipelines
Compound ID management & tracking
Data quality, outliers & curation SOPs
Cheminformatics for library design & expansion
Scaffold analysis & series expansion
Integrating QSAR/ADMET with medicinal chemistry cycles
Regulatory context for in-silico models (OECD principles)
Documentation, versioning & model cards
Basic automation of QSAR/ADMET workflows
Capstone: build & document a QSAR/ADMET model

In-Silico · Online Structural Biology Docking · Molecular Dynamics

Review the focus areas below and choose one to apply for.

Structural biology techniques overview (X-ray, NMR, cryo-EM)
PDB/mmCIF formats, headers & annotations
Structure validation: clashes, geometry & Ramachandran
Missing residues, alternate conformations & occupancy
Biological assembly vs asymmetric unit
Protonation states, tautomers & pH-dependent features (concepts)
Binding site identification & pocket detection
Non-covalent interactions: H-bonds, hydrophobics, salt-bridges
Ligand structure preparation & standardisation
Protein preparation pipelines (cleanup & optimisation)
Grid generation & binding box definition
Docking search algorithms (systematic, stochastic, genetic)
Scoring functions (force-field, empirical, knowledge-based)
Covalent docking concepts & warhead considerations
Docking protocol validation & enrichment metrics
Redocking & cross-docking workflows
Consensus scoring & rescoring strategies
Post-docking filtering & visual triage
Virtual screening campaigns & library handling
Pose visualisation & interaction diagrams
Molecular dynamics fundamentals & force fields
Topology/building for proteins & ligands
Solvation, ion placement & box types
Energy minimisation & relaxation
Equilibration: NVT, NPT & restraints
Production MD setup & run management
Trajectory handling, stripping & storage
MD analysis: RMSD, RMSF, radius of gyration
Hydrogen bonds, contacts & distances over time
Binding pocket stability & induced-fit insight
MM/PBSA & MM/GBSA binding free energy estimates
Free energy methods overview (FEP, TI, umbrella sampling concepts)
Enhanced sampling basics (metadynamics, replica exchange concepts)
Coarse-grained MD concepts & when to use
Membrane protein system setup (lipid bilayers) overview
Nucleic acids & protein–DNA/RNA complexes handling
Allosteric site identification & analysis
Fragment-based design in a structural context
Water networks & displacement analysis
Structure-based pharmacophore modelling
Ensemble docking & induced-fit strategies
Homology modelling & template selection
Loop modelling & structure refinement
Protein–protein docking overview & scoring
Variant & resistance mutation analysis (structural)
Integrating cryo-EM maps & models (concepts)
Automating structural workflows via scripts
Best practices for reproducibility in SBDD work
Documenting protocols, parameters & seeds
Preparing high-quality structural figures for publication
Capstone: SBDD mini-project combining docking + MD

In-Silico · Online Pharmacometrics PK/PD Modeling · MIDD

Review the focus areas below and choose one to apply for.

PK fundamentals: ADME, concentration–time profiles
Noncompartmental analysis (NCA) basics
One- and two-compartment PK models (IV bolus)
Infusion and extravascular dosing models
Clearance, volume, half-life & exposure relationships
Absorption models: first-order, zero-order, transit compartments
Lag time, flip–flop kinetics & absorption complexities
Bioavailability (F) and bioequivalence metrics (AUC, Cmax)
Linear vs nonlinear PK (capacity-limited, Michaelis–Menten)
Time-varying processes (autoinduction, tolerance) basics
Population PK (popPK) concepts & variability sources
Random effects: inter-individual, inter-occasion, residual error
Covariate model building strategies (graphical, stepwise, full)
Handling body size, age & organ function as covariates
Allometric scaling & maturation functions in paediatrics
Dataset structure for popPK/PKPD (wide vs long formats)
BLQ data handling (M1–M4 methods, concepts)
Model diagnostics: GOF plots, residuals, IWRES, CWRES
Prediction- and simulation-based diagnostics (VPC, pcVPC)
Bootstrap & parameter precision assessment
Shrinkage, identifiability & parameter correlation
PK/PD structural models: direct effect models
Indirect response & turnover models
Emax & sigmoid Emax models for continuous PD
Effect compartment & hysteresis loop concepts
Exposure–response relationships for efficacy
Exposure–safety & tolerability modelling
Dose–response modelling for binary & ordered endpoints
Time-to-event (TTE) modelling basics (hazard, survival)
Therapeutic drug monitoring (TDM) & Bayesian forecasting
Drug–drug interaction (DDI) modelling (inhibition/induction)
Special populations: renal/hepatic impairment, paediatrics
Physiologically based PK (PBPK) concepts & use-cases
Model-based meta-analysis (MBMA) basics
Clinical trial simulation for design optimisation
Adaptive and seamless design concepts (model-informed)
Regulatory guidance (FDA/EMA) on PK/PD & ER analyses (overview)
Model-informed drug development (MIDD) case-study flow
Workflows in NONMEM/Monolix/nlmixr (high level)
Using PsN-style tools for automation & diagnostics
R-based workflows: data prep, fitting, diagnostics & plots
Good modelling practice & analysis plans (SAPs/MAPs)
Documentation, model history & decision traceability
Version control for scripts, models & datasets
Reproducible reports in R Markdown/Quarto
Communicating PK/PD results to non-modellers
Visual storytelling: spaghetti plots, CI bands, forest plots
Simple PK/PD dashboards for clinicians & teams
Cross-functional collaboration with clinicians & statisticians
Capstone: end-to-end popPK + exposure–response analysis

In-Silico · Online Toxicology Safety Prediction · Risk

Review the focus areas below and choose one to apply for.

Basics of toxicology: dose–response & risk concepts
In-silico toxicology landscape & applications
Chemical structure curation for tox modelling
Toxicity endpoints: acute, chronic, genotoxicity, carcinogenicity
Organ-specific toxicity: hepatotoxicity, nephrotoxicity, cardiotoxicity
hERG & QT prolongation risk prediction (concepts)
In-silico mutagenicity & Ames test surrogates
Skin sensitisation & irritation/corrosion models
Respiratory & inhalation toxicity prediction
Developmental & reproductive toxicity (DART) concepts
Endocrine disruption & nuclear receptor alerts
Tox read-across principles & category formation
Structural alerts & rule-based profilers (PAINS/ToxAlerts style)
QSAR models for toxicity endpoints (classification/regression)
Descriptor & fingerprint choices for tox QSAR
Applicability domain (AD) for tox models
Model validation vs OECD principles (overview)
Consensus modelling & model stacking for safety
Handling imbalanced toxicity datasets
Negative vs positive control selection & curation
ADME vs tox: integrating clearance & exposure
Margin of safety & safety indices (basic calculations)
In-vitro to in-vivo extrapolation (IVIVE) concepts
Benchmark dose (BMD) modelling basics
PBPK/PD for safety assessment (intro)
Occupational & environmental exposure scenarios
REACH-style chemical safety assessment (overview)
ICH M7 & genotoxic impurities (concepts)
TTC (threshold of toxicological concern) concepts
In-silico DILI (drug-induced liver injury) risk
Idiosyncratic toxicity hypotheses & flags
Nanomaterial toxicity: in-silico considerations
Ecotoxicology endpoints & QSARs (fish, daphnia, algae)
Bioaccumulation & persistence (PBT/vPvB criteria, concepts)
Mixture toxicity & combination effects (intro)
Off-target & polypharmacology in safety space
Off-target activity mining from bioactivity databases
Using omics & pathway data in toxicology
Network-based toxicology & adverse outcome pathways (AOPs)
Automated profiling with open-source tox platforms
Uncertainty communication & model disclaimers
Regulatory acceptance of in-silico toxicology (high level)
Building transparent tox model reports
Reproducible pipelines for tox data & models
Visualisations: tox radar charts & traffic-light tables
Prioritising compounds for testing using in-silico scores
Integrating in-silico tox into medicinal chemistry cycles
Data management, versioning & audit trails in tox projects
Ethics of reducing animal use with computational methods
Capstone: design an in-silico safety screening cascade

In-Silico · Online AI/ML for Biosciences LLMs · CV · Tabular

Review the focus areas below and choose one to apply for.

Problem framing & ML workflow for bioscience use-cases
Bioscience data types: tabular, images, sequences, text, graphs
Data cleaning & QC for lab/clinical datasets
Handling missing values, outliers & batch effects
Train/validation/test splits & data leakage prevention
Feature engineering for omics & lab measurements
Classical ML: linear/logistic regression for biomarkers
Tree-based models: Random Forest & Gradient Boosting (concepts)
Regularisation (L1/L2, elastic net) & overfitting control
Model evaluation metrics (AUC, PR, accuracy, RMSE, etc.)
Imbalanced classes & rare-event modelling strategies
Cross-validation, nested CV & robust performance estimation
Calibration, decision thresholds & clinical utility curves
Uncertainty estimation & confidence intervals (concepts)
Dimensionality reduction (PCA, t-SNE, UMAP) for bioscience data
Clustering & unsupervised learning for patient/assay stratification
ML on tabular omics/clinical datasets end-to-end
Feature selection & stability analysis
Simple AutoML-style pipelines for small teams
Model interpretability: global feature importance & partial dependence
SHAP/LIME-style local explanations (concepts)
Computer vision basics for microscopy & pathology images
CNN concepts: convolutions, pooling & feature maps
Transfer learning with pre-trained vision models
Data augmentation & stain/illumination variability handling
Object detection & segmentation for cells/tissues (high level)
Quality control of imaging datasets & annotations
Evaluation of CV models (IoU, Dice & pixel-wise metrics)
Intro to sequence models (1D CNNs, RNNs, transformers – concepts)
LLM foundations & prompt design for biosciences
Using LLMs for protocol summarisation & documentation assistance
LLM-assisted EDA & coding in R/Python notebooks
Text classification & entity extraction for biomedical text (concepts)
Embeddings & semantic search for scientific literature
Domain-specialised models (biomedical LLMs – concepts)
Basic RAG-style workflows with scientific PDFs (high level)
Multimodal ideas: combining tabular + text or tabular + images
Data governance, anonymisation & PHI/PII basics
Bias, fairness & robustness checks in medical ML
ML model lifecycle: experiment tracking & versioning
Reproducible pipelines with notebooks & scripts
Intro to MLOps: packaging, environments & deployment options (concepts)
Monitoring model drift & refreshing datasets (concepts)
Documentation & model cards for bioscience models
Visual reporting: dashboards & interpretability reports
Collaboration patterns with biologists, clinicians & engineers
Reading & critiquing ML for health/bio papers
Validation & translation of ML into lab/clinic workflows
Regulatory & ethical overview for AI in health & biotech
Capstone: end-to-end ML workflow on a bioscience dataset

In-Silico · Online BioNLP · Text Mining Literature · Clinical Notes

Review the focus areas below and choose one to apply for.

Biomedical text sources: PubMed, preprints, patents & clinical notes
Data acquisition and APIs (Entrez/Europe PMC concepts)
Text pre-processing: tokenisation, sentence splitting, normalisation
Handling Unicode, punctuation & abbreviations in biomedical text
Stopwords, stemming, lemmatisation vs domain vocabulary
Bag-of-words and tf–idf representations
Word embeddings (word2vec, GloVe – concepts)
Contextual embeddings (BioBERT, ClinicalBERT – concepts)
Subword tokenisation (BPE/WordPiece) & vocabulary handling
Biomedical corpora & annotation schemes basics
Named Entity Recognition (NER) for genes, diseases, drugs, variants
Dictionary & rule-based NER (lexicons, regex) foundations
ML/neural NER pipelines overview
Entity normalisation to UMLS, MeSH, SNOMED CT (concepts)
Abbreviation detection & disambiguation in biomedical text
Relation extraction: gene–disease, drug–drug, drug–event
Co-occurrence vs supervised relation extraction
Event extraction for biological processes (high level)
Negation, speculation & assertion detection
Document classification for topics, trial phases & study types
Multi-label tagging (e.g., MeSH/subheading assignment)
Sentence/passage ranking for evidence retrieval
Information retrieval & BM25 basics
Semantic search with dense embeddings (concepts)
Question answering over biomedical literature (high level)
Summarisation of biomedical articles (extractive/abstractive – concepts)
RAG-style workflows with PDFs and databases (overview)
Knowledge graph construction from text (entities + relations)
Ontology & terminology integration into pipelines
Text mining for systematic reviews & evidence synthesis
Pharmacovigilance signal detection from case reports/text (concepts)
Clinical note processing: de-identification & PHI basics
Text classification for triage, routing & alerts (concepts)
Bias, fairness & domain shift in clinical NLP
Evaluation metrics for NER/RE/classification (precision, recall, F1, etc.)
Annotation tools & workflows (BRAT-style concepts)
Inter-annotator agreement & guidelines design
Active learning & weak supervision for annotation (concepts)
Pipeline design & orchestration for large-scale text mining
Working with rate-limited APIs & big literature pulls
Storage/indexing for text & embeddings (overview)
Visualising entities, relations & evidence networks
Reproducible notebooks & scripts for BioNLP pipelines
Model documentation and “model cards” for BioNLP systems
Prompt design for LLMs on biomedical text
Hallucinations, verification & human-in-the-loop review
Integrating text mining with omics/clinical data projects
Reporting mined evidence to scientists & clinicians
Maintaining/updating models, dictionaries & ontologies
Packaging workflows for collaboration (repos, configs, docs)
Capstone: mini literature-mining pipeline for a chosen disease/target

In-Silico · Online Health Informatics & RWE RWD · Epidemiological Modeling

Review the focus areas below and choose one to apply for.

Health informatics landscape & data sources (EHR, claims, registries)
Data models & standards (HL7 v2/v3 basics, FHIR concepts)
Terminologies & coding (ICD, SNOMED CT, LOINC, RxNorm overview)
Clinical data quality: completeness, correctness, consistency
Phenotyping from EHR: rule-based cohort definitions
Extract–transform–load (ETL) pipelines for clinical data
De-identification & pseudonymisation basics
HIPAA/GDPR-style privacy concepts (non-legal)
Common data models (CDM) overview: OMOP, PCORnet, i2b2
Mapping source data to a CDM (high level)
Time-series structures: visits, episodes, person-time
Basic epidemiologic measures: incidence, prevalence, risk, rate
Person-time and dynamic cohorts
Confounding, bias & effect modification (concepts)
Study designs: cohort, case–control, case–crossover
Target trial emulation using observational data (high level)
Propensity scores & balancing methods (concepts)
Matching/stratification & inverse probability weighting basics
Survival analysis (Kaplan–Meier, Cox model overview)
Competing risks & multi-state concepts
Missing data mechanisms & simple handling strategies
Sensitivity analyses for unmeasured confounding (concepts)
RWE for safety: signal detection from real-world data
RWE for effectiveness & comparative effectiveness research
Pragmatic trials & hybrid designs (high level)
Registries & post-marketing studies (overview)
RWE in regulatory decision-making (FDA/EMA concepts)
Vaccine effectiveness & safety monitoring (epidemiologic view)
Infectious disease modelling: deterministic SIR-type models
Stochastic & agent-based modelling ideas (concepts)
Reproduction number (R₀, Rₜ) and epidemic curves
Scenario analysis & intervention modelling (NPIs, vaccination)
Spatial epidemiology basics: mapping incidence & risk
Cluster detection & hotspot analysis concepts
Dashboards for surveillance & situational awareness
Data pipelines for automated refresh & QC checks
Visual analytics: trend plots, heatmaps, small multiples
Communicating risk, uncertainty & limitations to stakeholders
Governance: data access boards, SOPs & audit trails
Metadata, data dictionaries & lineage documentation
Reproducible R/Python pipelines for RWE studies
Notebook vs script workflows & project templates
Version control for code, cohorts & definitions
Simple containerisation for deployable analytic pipelines
ETL + analytics orchestration (Airflow/Prefect-style concepts)
Quality management & validation of analytic code
Collaboration with clinicians, epidemiologists & IT teams
Writing RWE/epi study reports & technical appendices
Health informatics career paths and role definitions
Capstone: RWE/epidemiology analysis plan + mini pipeline

In-Silico · Online Biostatistics Design · Reproducibility

Review the focus areas below and choose one to apply for

Types of data, scales of measurement & study variables
Descriptive statistics: centre, spread & shape
Visualising data: histograms, boxplots, scatterplots
Probability basics & common distributions (normal, binomial, Poisson)
Sampling distributions & Central Limit Theorem (CLT)
Point estimation & confidence intervals (means, proportions)
Hypothesis testing concepts: null, alternative, p-values, errors
t-tests: one-sample, paired & two-sample comparisons
ANOVA: one-way & simple post-hoc comparisons
Basics of non-parametric tests (Wilcoxon, Mann–Whitney, Kruskal–Wallis)
Chi-square tests for independence & goodness-of-fit
Correlation & simple linear regression
Multiple linear regression & model diagnostics (concepts)
Logistic regression for binary outcomes
Odds ratios, risk ratios & interpreting regression output
Time-to-event data & survival endpoints
Kaplan–Meier curves & log-rank tests (overview)
Cox proportional hazards model (concepts)
Power & sample size for means & proportions
Power & sample size for survival/clinical endpoints (high level)
Parallel-group vs crossover, factorial & cluster trial designs
Randomisation methods: simple, block, stratified (overview)
Blinding, allocation concealment & protocol adherence
Diagnostic test evaluation: sensitivity, specificity, PPV, NPV
ROC curves & AUC interpretation
Repeated measures & mixed-effects concepts
Longitudinal data basics & correlated observations
Handling missing data: MCAR/MAR/MNAR concepts
Complete case, simple imputation & multiple imputation (overview)
Multiple testing, family-wise error & FDR concepts
Interim analyses & stopping rules (high level)
Bias types: selection, information, confounding
Effect modification vs confounding (conceptual)
Causal diagrams (DAGs) for design & adjustment planning (overview)
Design of Experiments (DoE) for lab/biotech studies
Factorial & fractional factorial designs (concepts)
Blocking, randomisation & replication in DoE
Response surface methods & optimisation (high level)
Experimental design for omics & high-throughput assays
Pre-registration & statistical analysis plans (SAPs)
Reporting standards: CONSORT, STROBE, PRISMA (concepts)
Reproducible workflows: scripts vs point-and-click
Literate programming: R Markdown / Quarto / Jupyter
Version control for code & analysis (Git basics)
Data provenance, metadata & tidy data principles
Simulation-based power & design checking
Bootstrap & permutation tests (conceptual)
Sensitivity analyses & robustness checks for key assumptions
Collaboration between biostatisticians & domain scientists
Capstone: design + analysis + reproducible report for a study

In-Silico · Online Dashboards & Data Viz R · Python · BI

Review the focus areas below and choose one to apply for

Foundations of tidy data and data cleaning for visualisation
Choosing appropriate chart types for different biomedical questions
Univariate plots: histograms, density plots and boxplots
Bivariate plots: scatterplots, line charts and bar charts
Multivariate visualisation with colour, facets and small multiples
Perceptual principles and avoiding misleading scales/encodings
Designing publication-quality figures for manuscripts and theses
Colour palettes, accessibility & colour-blind–friendly design
Annotating plots with statistical summaries & uncertainty
Visualising distributions, outliers and batch effects in omics data
Time-series plots for longitudinal and monitoring data
Visualising survival curves, risk tables and confidence bands
Forest plots for effect sizes and meta-analyses (concepts)
Correlation matrices and pair-plots for quick EDA
Heatmaps for gene expression and high-dimensional matrices
Clustered heatmaps & dendrograms (concepts)
Volcano and MA plots for differential expression results
Visualising dimensionality reduction (PCA, t-SNE, UMAP)
Geospatial maps for public health and epidemiology
Dashboards vs static reports: when to use which
Wireframing a scientific dashboard (layout & UX basics)
KPI tiles, summary cards and drill-down design
Filter panels, slicers and linked views for exploration
Designing dashboards for non-technical stakeholders
R/ggplot2 grammar of graphics for layered plots
R-based interactive plots (plotly/ggplotly-style concepts)
Python matplotlib foundations for scientific plots
Python seaborn-style high-level statistical plots (concepts)
Python plotly-style interactive visualisations (concepts)
R Shiny-style dashboard concepts for interactive apps
Python Dash/Streamlit-style dashboard concepts
BI tools (Power BI/Tableau-style) in scientific contexts (overview)
Embedding statistical models & uncertainty into dashboards
Parameter controls & scenario sliders for "what-if" analysis
Visualising model performance (ROC, PR, calibration plots)
Visualising ML feature importance and SHAP-style outputs (concepts)
Designing QC dashboards for labs and NGS pipelines (concepts)
Designing monitoring dashboards for clinical/operational metrics
Handling large datasets: sampling & aggregation strategies
Performance considerations for interactive dashboards (high level)
Exporting figures for journals (size, DPI, formats)
Exporting dashboard views and snapshots for reports
Automating report generation with R Markdown/Quarto
Automating PowerPoint/PDF exports from R/Python (concepts)
Reusable plotting functions & theming for consistent branding
Version-controlling dashboards & visual assets with Git
Documenting dashboard logic & data lineage for auditability
Collecting feedback and iterating on dashboard design
Communicating limitations, caveats & uncertainty in visuals
Capstone: design and implement a small dashboard/report for a bio/health dataset

In-Silico · Online Bio-Data Engineering ETL · Pipelines · Cloud/HPC

Review the focus areas below and choose one to apply for

Fundamentals of data engineering for bio/health domains
Source systems for bio data: instruments, LIMS, EHR, external databases
Data formats: CSV, TSV, JSON, XML, Parquet basics
Bio-specific formats: FASTQ, BAM/CRAM, VCF, HDF5, AnnData (concepts)
Schema design for experimental and clinical datasets
Data modelling: star/snowflake vs wide tables (overview)
ETL vs ELT concepts and patterns
Batch vs streaming ingestion (high level)
File naming conventions and folder hierarchies for labs
Metadata capture and data dictionaries
Using checksums and manifests for file integrity
Data validation rules and schema checks (concepts)
Handling missing, inconsistent and out-of-range values
ID management, primary keys and foreign keys
Patient/sample IDs and pseudonymisation (non-legal overview)
Log design: capturing run, pipeline and audit logs
Data lineage and provenance tracking (overview)
Designing staging, raw, curated and analytics layers
Partitioning strategies for large tables and object stores
Indexing and query performance basics
Scheduling ETL jobs with cron-style tools (concepts)
Workflow orchestration: Airflow/Prefect-like concepts
Retry, backoff and idempotency in pipelines
Alerting and monitoring for failed jobs
Using message queues/pub-sub (high level)
Interfacing with NGS pipelines and QC outputs
Aggregating MultiQC-style metrics for dashboards
Loading data into analytic databases/warehouses (concepts)
SQL patterns for cohort extraction and summaries
Designing APIs for data access (REST-style concepts)
Object storage (S3/GCS/Azure) and folder layouts
Permissions, IAM roles and principle of least privilege
Basic encryption in transit and at rest (conceptual)
Cloud cost awareness: storage vs compute trade-offs
HPC clusters vs cloud VMs: when to use which
SLURM-style schedulers: jobs, queues and job arrays (concepts)
Container images for pipelines (Docker/Singularity-style concepts)
Environment management and reproducibility (Conda-style)
Config-driven pipelines (YAML/JSON configs)
Template repos and cookiecutter-style project skeletons
Testing ETL code and pipeline components (unit/smoke tests)
Sample data and synthetic datasets for development
Documentation for pipelines and datasets (README, ADRs)
Data catalog and discovery concepts
Governance checklists and access request workflows
Backup, archiving and retention for research/clinical data
Migrating pipelines between on-prem and cloud (high level)
Handover practices for data engineering artefacts
Collaboration between data engineers, bioinformaticians and IT
Capstone: design a simple bio-data lake with ETL and an analytics layer

In-Silico · Online Knowledge Graphs Ontologies · Semantics

Review the focus areas below and choose one to apply for

Introduction to knowledge graphs (KGs) and semantic networks
Nodes, edges, labels and properties: KG data model basics
Graph vs relational vs document databases: when to use what (concepts)
RDF triples vs property graphs: comparative concepts
URIs/IRIs and identifiers in biomedical data
Ontologies, taxonomies and controlled vocabularies: key differences
Basic description logics (DL) intuition for practitioners
OWL and RDFS: classes, properties and individuals (concepts)
Reusing standard biomedical ontologies (GO, HPO, DO, etc.) — overview
OBO Foundry principles and ontology ecosystem (high level)
Terminology servers and mapping services (concepts)
Designing simple domain models for diseases, drugs and genes
Representing pathways, interactions and phenotypes in graphs (conceptual)
Entity–relationship and concept modelling prior to graph design
Ontology engineering lifecycle: requirements & competency questions
Ontology editing tools (Protégé-style usage and patterns)
Naming conventions, IDs and annotation properties
Logical vs annotation axioms: keeping models clean (concepts)
Reasoning and classification: what DL reasoners actually do (high level)
Consistency checks and debugging unsatisfiable classes (conceptual)
Integrating heterogeneous datasets into a unified KG
Schema/ontology alignment and mapping strategies
Cross-references, synonyms and equivalence mappings
Normalising identifiers with CURIE patterns (HGNC, UniProt, etc.): concepts
SPARQL querying for RDF-style knowledge graphs (basic patterns)
Graph query languages (Cypher/Gremlin/PGQL-style) — overview
Graph patterns for gene–disease–drug queries
Path, neighbourhood and subgraph queries for hypothesis exploration
Inference-enriched querying: leveraging reasoners (conceptual)
Provenance and evidence modelling (e.g. ECO-style) overview
Attaching scores, confidence and provenance to edges
Modelling temporal/contextual qualifiers (time, tissue, species)
Graph design for clinical concepts: diagnoses, labs, medications (high level)
FAIR data principles and how KGs support FAIRness (overview)
KG construction pipelines from tabular/relational data
ETL for KGs: mapping CSV/SQL to triples/edges
Mapping languages (R2RML/OBDA-style) — conceptual view
Incremental updates and versioning strategies for ontologies/KGs
Quality metrics for KGs: coverage, connectivity, consistency
Graph visualisation tools and layout choices
Integrating KGs with ML (node embeddings, link prediction concepts)
Using KGs to power search, recommendation and Q&A (conceptual)
Working with public biomedical KGs (Bio2RDF-style, etc.): overview
API design for KG-backed applications (REST/GraphQL concepts)
Security, access control and governance in KG deployments (high level)
Collaboration workflows: curators, modellers and engineers
Documentation and onboarding for ontology/KG reusers
Evaluating KG usefulness with real user queries and feedback
Lightweight semantic models for small teams and projects
Capstone: scoped biomedical KG/ontology design + example queries

In-Silico · Online GWAS · Stat Genetics Polygenic Risk

Review the focus areas below and choose one to apply for

Foundations of population and quantitative genetics (conceptual)
Hardy–Weinberg equilibrium, allele/genotype frequencies
Linkage disequilibrium (LD), haplotypes and LD decay (concepts)
Common study designs: case–control, cohort, trio and GWAS meta-analysis
Genotyping technologies and SNP arrays (high level overview)
Genotype calling, quality control and imputation (concepts)
Sample- and variant-level QC metrics and thresholds (conceptual)
Population structure, ancestry estimation and PCA plots
Relatedness, kinship and cryptic relatedness checks
Basic association testing: allelic, genotypic and trend tests
Logistic regression for binary traits (GWAS context)
Linear regression for quantitative traits (GWAS context)
Covariate adjustment: age, sex, ancestry PCs and batch effects
Multiple testing and genome-wide significance thresholds
Manhattan and Q–Q plots: construction and interpretation
Inflation factors (λGC) and genomic control concepts
Mixed models and LMM-style association (high level)
Handling stratification and relatedness in association studies
Conditional and joint association analyses (concepts)
Fine-mapping and credible sets (high level overview)
Gene-based and region-based association testing (concepts)
Pathway and enrichment-style analyses for GWAS hits (conceptual)
Rare-variant and burden test basics (overview)
Imputation reference panels and reference bias (high level)
Trans-ethnic GWAS and transferability challenges
eQTL and QTL-style association concepts
Colocalisation of GWAS and QTL signals (concepts)
Post-GWAS functional annotation of variants (conceptual)
Variant-to-gene linking strategies (overview)
Polygenic inheritance and SNP heritability (concepts)
SNP-heritability estimation and LD score regression (conceptual)
Construction of polygenic risk scores (PRS) from GWAS summary stats
Clumping and thresholding for PRS (high level)
Bayesian and shrinkage methods for PRS (concepts)
Evaluating PRS: AUC, R² and calibration (overview)
Portability of PRS across ancestries (issues and considerations)
Translational aspects of PRS (screening, stratification; non-clinical overview)
Simulating genotype–phenotype data for teaching and validation
Data formats for GWAS: PLINK, VCF, text summary stats
Working with the public GWAS Catalog and summary-statistics resources
Basic scripting workflows to run QC and association steps
Reproducible pipelines for GWAS and PRS analysis (high level)
Visualising GWAS and PRS results for presentations and reports
Interpreting and communicating GWAS findings responsibly
Understanding common pitfalls and over-interpretation risks
Ethical, legal and social considerations in genetic risk analysis (non-legal)
Integrating GWAS with other omics (brief conceptual overview)
Documentation, metadata and analysis plans for GWAS projects
Collaboration between statisticians, geneticists and clinicians
Capstone: mini GWAS/PRS analysis using public summary data (conceptual pipeline)

In-Silico · Online Synthetic Biology Design & CAD

Review the focus areas below and choose one to apply for

Overview of synthetic biology and design-build-test-learn cycles
DNA as a programmable substrate: parts, devices and systems (concepts)
Biological parts libraries: promoters, RBSs, CDS, terminators (overview)
Standards for genetic parts and assemblies (conceptual)
Design constraints: host, chassis, context and burden
Gene circuit motifs: toggle switches, oscillators and logic gates (concepts)
In-silico prototyping of simple gene circuits
Concepts of metabolic engineering within synthetic biology
Pathway selection and retrosynthesis-style route planning (high level)
Flux and cofactor considerations (conceptual)
DNA sequence design: codon usage, GC content and constraints
Minimising off-target effects and unintended homology (overview)
Basic design rules to avoid unwanted secondary structures
Insulation, orthogonality and composability concepts
Host chassis options: bacteria, yeast, mammalian cells (conceptual)
Genome-scale design vs plasmid-level design (high level)
Genome editing design concepts (CRISPR guide design, high level)
In-silico design of gRNAs and off-target scanning (conceptual)
Regulatory element tuning: promoter/RBS strength design (concepts)
Transcriptional, translational and post-translational control layers
Signal processing and biosensor circuit concepts
Kill-switches and biocontainment designs (conceptual)
Modular cloning and assembly strategy planning (Golden Gate-style concepts)
DNA assembly maps, compatibility and overhang planning
Basic SBML-style model concepts for gene circuits
Deterministic vs stochastic modelling of gene networks (high level)
Parameter ideas: transcription, translation, degradation rates (conceptual)
Simulating circuit dynamics to evaluate design behaviour (conceptual)
Sensitivity-style thinking: which parameters influence behaviour most
Constraint-based modelling ideas for metabolic pathways (conceptual)
Multi-objective trade-offs: productivity vs growth vs stability
Digital twins and in-silico strain design concepts
Data needed to calibrate and refine synbio models (overview)
Integrating omics data into design decisions (conceptual)
CAD workflows for DNA constructs (design to sequence file)
Annotation of designs with features, landmarks and metadata
Version control for constructs and design iterations
Bill of materials (BOM) for DNA synthesis and cloning (concepts)
Design review checklists before sending constructs for synthesis
Laboratory protocols as structured design outputs (conceptual)
Design of experiments (DoE) ideas for testing circuit variants
Recording build-test results for feedback into design
Pipelines linking CAD tools to LIMS/ELN-style systems (conceptual)
Graph representations of parts and constructs (high level)
Risk thinking: failure modes in constructs and circuits
Ethical, biosafety and biosecurity considerations in synbio design (non-regulatory overview)
Applications: biosensors, biomanufacturing, cell-based therapies (conceptual survey)
Communication of synthetic biology designs to wet-lab teams
Documentation packages: maps, sequence files and simulation notes
Capstone: scoped in-silico design and CAD for a simple synthetic biology construct or circuit

In-Silico · Online Signal & Image MRI · Pathology · CV

Review the focus areas below and choose one to apply for

Basics of biomedical signals (ECG, EEG, EMG) and images (MRI, CT, microscopy)
Sampling, Nyquist concepts and anti-aliasing in biomedical acquisition
Noise sources in biomedical data and denoising strategies (conceptual)
Time-domain features: peaks, intervals, morphology descriptors
Frequency-domain analysis: FFT and power spectra (high level)
Time–frequency concepts: STFT/wavelet thinking (overview)
Digital filtering concepts: low/high/band-pass and notch filters
Baseline wander, motion artefact and powerline interference (ECG/EEG)
ECG waveform segmentation: P–QRS–T detection (concepts)
Heart rate variability (HRV) feature families (conceptual overview)
EEG channel layouts and basic rhythm bands (δ, θ, α, β, γ concepts)
Event-related potentials (ERP) and simple averaging concepts
Signal quality indices (SQI) and QC thinking for biosignals
2D image basics: pixels, resolution, bit depth and colour models
Image histograms, contrast stretching and basic enhancement
Smoothing, sharpening and edge detection filters (conceptual)
Segmentation concepts: thresholding, region-based, clustering approaches
Connected components and basic morphology (erosion/dilation) concepts
Feature extraction: shape, texture and intensity descriptors
Classical ML for classification/regression on engineered features
Introduction to computer vision with biomedical examples
Deep learning concepts for images: CNN intuition (no heavy math)
Segmentation networks (U-Net-style ideas) for lesion/tissue masks
Detection and localisation concepts (bounding boxes, heatmaps)
Patch- and tile-based analysis for whole-slide pathology images (conceptual)
Registration concepts: aligning multimodal images (e.g. MRI/CT)
Motion correction concepts for dynamic imaging
Region-of-interest (ROI) selection and feature summarisation
Radiomics-style feature families (shape, intensity, texture concepts)
Basic MRI sequences overview (T1/T2/FLAIR; non-physics focus)
Simple brain MRI workflows: skull strip → segment → quantify (conceptual)
Quantitative metrics: volumes, thickness, signal ratios (overview)
Pathology image colour normalisation (concepts)
Basic pipelines for nuclei or cell segmentation in histology
Quality assurance for imaging pipelines and outputs
Dataset curation: de-identification concepts for images/signals
Train/validation/test splits and leakage pitfalls (conceptual)
Evaluation metrics: accuracy, ROC/AUC, Dice, IoU (overview)
Cross-validation and robustness thinking for biomedical ML
Simple explainability concepts (saliency/heatmap intuition)
Pipeline design: from raw DICOM/waveforms to analysis-ready datasets
File formats: DICOM, NIfTI, TIFF/WSI, EDF-style signals (high level)
Organising datasets and metadata for reproducibility
Basic scripting workflows to chain processing steps
Documenting pipelines: configs, logs and reports
Common pitfalls and artefacts in MRI and pathology image analysis
Bias, generalisation and domain shift considerations (conceptual)
Ethical/clinical caveats: decision support vs diagnosis (non-clinical training)
Communicating results with clear caveats and limitations
Capstone: design a scoped analysis pipeline for one signal or image use-case

In-Silico · Online LIMS · ELN Lab Automation · Digital

Review the focus areas below and choose one to apply for

Overview of LIMS, ELN and lab automation ecosystems
Sample lifecycle concepts: accessioning → testing → storage → disposal
Sample identification, barcoding and labelling strategies (conceptual)
Aliquots, derivatives, batches and pooling in digital workflows
Test definitions, panels and method metadata in LIMS
Instrument worklists and basic instrument integration concepts
Result entry, verification and validation workflows (high level)
QC samples, controls and flags in digital workflows (concepts)
Difference between LIMS vs ELN vs inventory tools (roles)
Designing structured ELN templates for experiments and assays
Free-text vs structured fields: balance & trade-offs
Metadata standards and ontologies for lab records (conceptual)
Inventory management: reagents, consumables, lots and expiry
Storage location hierarchies: room → freezer → rack → box → position
Chain-of-custody logging and traceability concepts
Scheduling and capacity: planners, calendars and resource booking
User roles, permissions and segregation of duties (high level)
Master data: tests, instruments, locations, units and reference ranges
Configuration vs customisation: what to tweak vs what to avoid
Workflow engines and state machines inside LIMS-style systems (concepts)
Designing a sample accessioning workflow (from request to label)
Stability studies and sample retention tracking (conceptual)
Deviation, incident and CAPA logging in digital systems (overview)
Audit trails and e-signatures: 21 CFR Part 11-style concepts (non-legal)
Review and approval workflows for results and reports
Basic validation and UAT thinking for LIMS/ELN changes
Import/export patterns: CSV templates and simple APIs (conceptual)
HL7/FHIR-style interfacing concepts for hospital/lab connectivity
Barcode template design and label layout simulation
Parsing simple instrument data files into LIMS-friendly structures
Rules engines for auto-accept, auto-flag and reflex testing (high level)
Dashboards for sample counts, TAT and workload monitoring
Key KPIs for lab operations: TAT, pending, re-run, rejection metrics (conceptual)
ELN templates for SOPs, methods and experiment notes
Linking ELN pages to samples, runs, attachments and reports
Template versioning, approvals and change history
Scripting repetitive tasks and simple automations around LIMS data
Logical scheduling of instruments and robots (simulation concepts)
Simulating sample routing across benches, rooms and instruments
What-if simulations: workload, capacity and staffing scenarios
Designing a core data model for a small lab (entities & relationships)
Choosing fields, constraints and validations in forms
Test case design for new workflows and configuration changes
Migration from spreadsheets/manual logs to LIMS/ELN (conceptual roadmap)
Change control and configuration management basics for labs
Backup, restore and archiving concepts for lab data
Data integrity and ALCOA+ principles (high-level overview)
Vendor-neutral selection checklists and RFP-style thinking (concepts)
User training, SOPs and adoption strategies for digital lab systems
Capstone: design and simulate a mini LIMS/ELN workflow for one lab scenario

In-Silico · Online Digital QA/QC Compliance & e-Validation

Review the focus areas below and choose one to apply for

Role of QA vs QC vs digital teams in regulated-style environments (conceptual)
Basics of GxP-style thinking for labs, manufacturing and R&D (non-legal overview)
ALCOA+ data integrity principles and examples
Master data, reference data and controlled vocabularies for QA/QC
Capturing QC data digitally: checklists, forms and structured logs
Deviation, incident and OOS/OOT logging concepts
Change control records and impact assessment thinking (high level)
CAPA lifecycle: root cause → actions → effectiveness checks (conceptual)
Risk-based thinking: FMEA-style concepts for processes and systems
Digital SOPs and controlled document management workflows
Version control, approvals and training assignment concepts for SOPs
Training records and competency tracking in digital systems
Audit trail concepts: who changed what, when and why
Basics of the computerised system lifecycle (plan → spec → build → test → release)
User requirements vs functional specifications vs configuration specs (conceptual)
Configuration vs customisation in lab/QA systems (trade-offs)
Test plan, test case and test script design concepts
Static vs dynamic testing; installation, operational and performance test ideas
Traceability matrix concepts: linking requirements to tests and evidence
Electronic signatures and identity verification concepts (non-legal overview)
Part 11-style control concepts: access, audit trails, records (high level)
Data classification and retention concepts for QA/QC data
Backup, restore and archival testing concepts
Configuring checks and limits: specifications, ranges and QC rules
Digital QC calculations, rounding and significant figures (conceptual)
Control charts and trend monitoring basics (X-bar and R chart concepts)
Using dashboards to monitor deviations, CAPA, complaints and KPIs
Key QA/QC KPIs: deviations, CAPA closure, investigation times, OOS rates
Sampling plans and acceptance criteria concepts (non-statistical overview)
Linking equipment, instruments and calibration records to QA/QC data
Template design for forms: required fields, checks and picklists
Valid values, lookup lists and reference tables for quality data
Workflows for complaint intake, triage and investigation logging
Internal and external audit planning and follow-up tracking (digital)
Vendor and supplier qualification records (conceptual)
Risk registers and mitigation tracking for digital systems
Spreadsheet risk assessment and control concepts
Computerised system risk assessment and classification ideas
Data migration and cutover checklists for QA/QC systems
Periodic review concepts for systems, configurations and data
Using queries and simple analytics to detect anomalies in QA/QC datasets
Basic statistical summaries for quality metrics (no heavy math)
Storyboarding the e-validation journey for a small system (conceptual)
Templates for validation plans, reports and test summaries
Collaboration between QA, IT, vendors and end-users
Common pitfalls in e-validation and compliance analytics (conceptual)
Non-compliance scenarios and remediation planning (high level)
Readiness for inspections: digital evidence, audit trails and reports (conceptual)
Communication and training strategies for digital QA/QC initiatives
Capstone: scoped digital QA/QC and e-validation concept for one system or process

In-Silico · Online Oncology Informatics Biomarker Analytics

Review the focus areas below and choose one to apply for

Landscape of oncology informatics: clinical, molecular & real-world data (overview)
Cancer biology & hallmarks (high-level, non-clinical)
Tumour classification, staging & grading concepts (non-clinical)
Structured data in oncology: diagnosis, procedures, drugs, outcomes
Coding systems & terminologies in oncology records (conceptual)
Tumour boards, EMR and imaging systems as oncology data sources (overview)
Cancer registry concepts: case capture, follow-up & outcomes (high level)
Data models for patient, tumour, episode and treatment lines
Time-to-event data structures: index dates and censoring fields (concepts)
Handling longitudinal therapies, dose changes and regimen switches
Real-world evidence (RWE) in oncology: opportunities & caveats
Clinical trial data structures: arms, visits and endpoints (overview)
Eligibility, inclusion/exclusion and line-of-therapy derivation concepts
Oncology outcome endpoints (response, progression, survival – definitions only)
Basic survival analysis concepts: KM curves & hazard thinking (non-math)
Confounding & bias in observational oncology datasets
Data quality checks specific to oncology (dates, stage, sites, regimens)
Basic biomarker concepts: diagnostic, prognostic & predictive markers
Genomic biomarkers: variants, fusions & signatures (high-level view)
Immuno-oncology biomarkers: TMB, tumour microenvironment, etc. (conceptual)
Multi-omics biomarkers: genomics, transcriptomics, proteomics (overview)
Companion diagnostics & assay report structures (non-clinical)
Integrating molecular reports with EMR/registry records (concepts)
Curation pipelines for variant & biomarker annotations (high level)
Knowledge bases & guidance for cancer variants (conceptual overview)
Real-world biomarker testing patterns & adoption analytics
Cohort definition for biomarker-enriched populations (concepts)
Feature engineering for oncology models: lines, burden, prior therapies
Building descriptive dashboards for oncology programmes
Visualising timelines: swim-lane plots for treatment journeys
Plotting response and burden-of-disease trajectories (high level)
Basic ML concepts in oncology informatics (risk scores, stratification)
Model evaluation metrics for oncology predictions (ROC, PR, calibration – overview)
Fairness & subgroup performance considerations in oncology models (conceptual)
Privacy & de-identification concepts for oncology datasets
Data sharing frameworks & federated thinking (high-level)
Data pipelines from source systems to oncology data marts (ETL concepts)
Data dictionaries & metadata for oncology analytics
QA/QC checks for survival & response-based analyses
Change management when updating coding, staging or biomarker rules
Collaborative workflows across clinicians, data teams & statisticians
Documentation standards for oncology analysis artefacts
Reporting templates for internal tumour boards & strategy teams
Communicating limitations & uncertainty in oncology analytics
High-level view of regulatory & HTA use of oncology data (non-advisory)
Road-mapping an oncology informatics programme in an organisation
Benchmarking and external comparison concepts (registries, publications)
Role of AI/ML & NLP in extracting oncology variables from text (overview)
Common pitfalls and “gotchas” in oncology & biomarker analytics
Capstone: scoped oncology informatics or biomarker analytics mini-project

In-Silico · Online Agri & Plant Omics Crop Informatics

Review the focus areas below and choose one to apply for

Landscape of agri/plant bioinformatics and crop informatics (overview)
Plant genome organisation, ploidy and reference resources (high level)
Crop pan-genomes and germplasm diversity concepts
Reference genome databases and browsers for major crops (conceptual)
Gene and transcript annotation concepts for plants
Functional annotation sources for plant genes and proteins (overview)
Read mapping and variant calling workflows for crop genomes (conceptual)
Handling polyploidy, homeologs and duplicated regions (high level)
Variant types in crops: SNPs, InDels, SVs and CNVs (conceptual)
Quality control and filtering of variant calls in plant datasets
Constructing and managing variant panels for breeding programmes
Genotyping-by-sequencing (GBS) and array data concepts
Genotype matrices, missingness and imputation ideas
Population structure and relatedness in germplasm panels (conceptual)
Linkage disequilibrium and haplotype blocks (intuitive overview)
QTL mapping concepts for agronomic traits (non-mathematical)
GWAS-style association analysis for crop traits (conceptual)
Multi-environment trial (MET) data structures and covariates
Phenotyping data formats: field, greenhouse and high-throughput phenotyping (overview)
Data cleaning and harmonisation for phenotypic traits
Trait definitions, units and scales; basic transformations (conceptual)
Integrating environmental and management data with phenotypes
Genomic selection (GS) concepts and typical workflows (high level)
Model inputs for GS: markers, kinship, environmental covariates
Prediction accuracy, cross-validation and bias considerations (conceptual)
Marker-assisted selection (MAS) vs genomic selection (comparison)
Intro to crop-specific decision-support dashboards (conceptual)
Designing simple dashboards for lines, traits and locations
Basic spatial and GIS concepts for field experiments (overview)
Plot-level vs line-level vs genotype-level aggregations
Gene expression and RNA-seq concepts for plant stress/trait studies
Co-expression networks and modules for plant genes (conceptual)
Pathway and GO term enrichment for crop trait candidates
Multi-omics concepts: linking genomics, transcriptomics and metabolomics in crops
Intro to plant–pathogen interactome and resistance gene analytics (high level)
Plant pan-genome presence/absence variation (conceptual)
Curating metadata for accessions, locations and seasons
Data standards and ontologies in plant breeding and trials (overview)
File formats and organisation for multi-season, multi-location datasets
Basic QC checklists for genotypes, phenotypes and environments
Scenario thinking: benchmarking varieties across locations and years
Intro to crop modelling concepts and linking with informatics outputs
Communicating limitations and uncertainties in agronomic analytics
Ethical and data-sharing considerations in breeding programmes (conceptual)
Collaboration patterns between breeders, bioinformaticians and data teams
Road-mapping data infrastructure for a breeding or crop-research unit
Smallholder vs large-scale contexts: data implications (high-level)
Opportunities for AI/ML and remote-sensing data in crop informatics (overview)
Common pitfalls and misunderstandings in agri/plant bioinformatics
Capstone: scoped crop-informatics or plant-bioinformatics mini-project design

Online Internships

Explore online internships at NTHRYS BIOTECH LABS, offering research and industry-oriented programs in Bioinformatics, Cheminformatics, and more. Enhance your skills in cutting-edge fields with expert-led training and real-world projects.
