Benchmarking Datasets & Metrics (MoleculeNet/TDC) | Robust Splits, Metrics & Reproducibility

Benchmarking Datasets & Metrics — Hands-on

Build trustworthy model benchmarks for QSAR/ADMET and related tasks using MoleculeNet and the Therapeutics Data Commons (TDC). This hands-on module covers dataset curation, split strategies (random/scaffold/time/geography), metric selection for imbalanced data, calibration & uncertainty, and leaderboard-quality reporting with complete reproducibility.

Session 1
Fee: Rs 18800
Datasets, Curation & Splits
  • MoleculeNet & TDC overview: tasks (cls/reg/multi-task); data cards & provenance; licensing & ethics
  • Curation & hygiene: standardization & dedupe; unit/assay harmonization; scaffold/time-aware splits
  • Leakage prevention: series & near-duplicate checks; vendor/batch effects; split diagnostics
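
As an illustration of the scaffold-aware splitting and leakage checks listed above, here is a minimal sketch in plain Python. It assumes each molecule already carries a group key such as a scaffold SMILES string (computing scaffolds would need a cheminformatics toolkit and is not shown). The names group_split and shared_groups and the toy keys are hypothetical, not part of MoleculeNet or TDC.

```python
from collections import defaultdict


def group_split(group_keys, frac_train=0.8):
    """Split indices so that no group key lands in both train and test.

    group_keys: one key per molecule (for example a scaffold string).
    Groups are assigned whole, largest first, to train while they fit.
    """
    groups = defaultdict(list)
    for idx, key in enumerate(group_keys):
        groups[key].append(idx)

    # Largest groups first; ties broken by first index so the result is deterministic.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    cutoff = frac_train * len(group_keys)
    train, test = [], []
    for members in ordered:
        if len(train) + len(members) <= cutoff:
            train.extend(members)
        else:
            test.extend(members)
    return sorted(train), sorted(test)


def shared_groups(group_keys, train, test):
    """Leakage diagnostic: group keys that appear on both sides of the split."""
    return {group_keys[i] for i in train} & {group_keys[i] for i in test}


# Toy group labels standing in for scaffolds (illustrative only).
keys = ["A", "A", "A", "A", "B", "B", "B", "C", "C", "D"]
train, test = group_split(keys, frac_train=0.8)
leaks = shared_groups(keys, train, test)
```

An empty result from shared_groups is a necessary check, not a sufficient one: near-duplicates with different scaffolds can still leak and need a separate similarity check.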
Session 2
Fee: Rs 21800
Metrics, Imbalance & Calibration
  • Choosing the right metrics: ROC-AUC vs PR-AUC; AUROC pitfalls on imbalance; RMSE/MAE/R² for regression
  • Thresholds & calibration: Youden/F1/expected cost; reliability/Brier/ECE; cost-sensitive curves
  • Imbalance handling: stratified CV; class weights & resampling; confidence bands
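
To make the calibration metrics named above concrete, the following sketch computes the Brier score and a binary expected calibration error (ECE) in plain Python. It assumes binary labels in {0, 1}, predicted positive-class probabilities in [0, 1], and equal-width bins; other ECE variants (equal-mass bins, top-label confidence) exist and give different numbers. The function names and toy data are illustrative.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Binary ECE with equal-width bins over the predicted positive probability.

    For each non-empty bin, compare the observed positive fraction with the
    mean predicted probability, then average the gaps weighted by bin size.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[b].append((y, p))

    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        frac_pos = sum(y for y, _ in members) / len(members)
        mean_prob = sum(p for _, p in members) / len(members)
        ece += len(members) / n * abs(frac_pos - mean_prob)
    return ece


# Toy predictions (illustrative only).
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.6, 0.9]
brier = brier_score(y_true, y_prob)
ece = expected_calibration_error(y_true, y_prob, n_bins=10)
```

With few samples per bin, ECE is noisy, so it is usually reported alongside a reliability plot and the bin count used.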
Session 3
Fee: Rs 24800
Baselines, Reproducibility & Uncertainty
  • Model baselines & controls: linear/RF/XGB; graph & transformer (lite); ablation & sanity checks
  • Reproducibility & seeds: config-managed runs; fixed seeds & CI checks; data/version hashes
  • Uncertainty & applicability domain (AD): ensembles & bootstraps; conformal intervals; applicability domain flags
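
One way to read "conformal intervals" for a regression task is split conformal prediction, sketched below in plain Python. It assumes absolute residuals |y - prediction| from a held-out calibration set that the model never trained on, and exchangeability between calibration and test data. The function names and numbers are hypothetical; this is a sketch of the general technique, not the course's specific implementation.

```python
import math


def conformal_quantile(abs_residuals, alpha=0.1):
    """Half-width of a split-conformal interval at miscoverage level alpha.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest calibration residual.
    Returns infinity when the calibration set is too small for that level.
    """
    n = len(abs_residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(abs_residuals)[k - 1]


def conformal_interval(prediction, half_width):
    """Symmetric interval around a point prediction."""
    return prediction - half_width, prediction + half_width


# Toy calibration residuals (illustrative only).
calib_residuals = [0.4, 0.1, 0.3, 0.2]
q = conformal_quantile(calib_residuals, alpha=0.5)
low, high = conformal_interval(2.0, q)
```

The interval width here is the same for every molecule; flagging compounds outside the applicability domain is a separate step that this sketch does not cover.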
Session 4
Fee: Rs 28800
Mini Capstone: Reproducible Leaderboard
  • Create a benchmark suite on 2–3 datasets (cls + reg): theory + practical
  • Report with proper metrics & calibration: ROC/PR & RMSE/MAE; reliability/ECE plots; uncertainty/AD overlays
  • Deliverables: leaderboard (CSV/HTML); configs/notebooks & seeds; dataset/data-card bundle

