NTHRYS

Feature Engineering — Scaling, Encoding & Leakage Avoidance Training | Biostatistics & ML for Omics

Feature Engineering — Scaling, Encoding & Leakage Avoidance — Hands-on

Learn how to turn messy biomedical and omics data into robust, model-ready features. This module covers scaling, transformations, encoding strategies and systematic leakage avoidance, so that your train/validation/test pipelines remain honest, reproducible and deployment-ready in R and Python.

Session 1
Fee: Rs 8800
Fundamentals of Features & Data Representation
  • From raw measurements to features
      • wide vs long omics tables
      • targets, covariates, IDs
      • feature types and roles
  • Numeric, categorical and ordinal variables
      • continuous vs discrete
      • ordinal scales
      • date/time and time-since features
  • Feature quality checks
      • missingness patterns
      • low variance and constants
      • correlated and redundant features
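
As a rough illustration of the feature quality checks listed above, the sketch below computes a missing-value fraction, flags constant columns and lists highly correlated feature pairs. It uses only the Python standard library; the toy table, the column names and the 0.95 correlation threshold are assumptions made for this example, not part of the course material.

```python
import math
from statistics import mean, pvariance


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_fraction(values):
    """Share of entries in a column that are missing."""
    return sum(is_missing(v) for v in values) / len(values)


def is_constant(values, tol=1e-12):
    """True when the observed values have (near-)zero variance."""
    observed = [v for v in values if not is_missing(v)]
    return len(observed) < 2 or pvariance(observed) <= tol


def pearson(x, y):
    """Pearson correlation on pairwise-complete observations."""
    pairs = [(a, b) for a, b in zip(x, y)
             if not is_missing(a) and not is_missing(b)]
    if len(pairs) < 2:
        return float("nan")
    mx = mean(a for a, _ in pairs)
    my = mean(b for _, b in pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant column
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy table: feature name -> one value per sample.
table = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 4.0, 6.0, 8.0],       # exact multiple of gene_a
    "batch_const": [5.0, 5.0, 5.0, 5.0],  # carries no information
    "age": [30.0, None, 50.0, 41.0],      # one missing value
}

report = {
    name: {"missing": missing_fraction(col), "constant": is_constant(col)}
    for name, col in table.items()
}

threshold = 0.95  # assumed cut-off for calling two features redundant
names = list(table)
redundant_pairs = []
for i, first in enumerate(names):
    for second in names[i + 1:]:
        r = pearson(table[first], table[second])
        if not math.isnan(r) and abs(r) >= threshold:
            redundant_pairs.append((first, second))
```

On real omics tables the same checks are usually run with a data-frame library; the point here is only the logic of each check.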
Session 2
Fee: Rs 11800
Scaling, Transformation & Normalization
  • Why scaling matters for ML algorithms
      • distance-based vs tree-based models
      • gradient descent stability
      • clinical interpretability
  • Common scaling and transformation choices
      • standardization and min-max
      • robust and quantile transforms
      • log, Box-Cox, Yeo-Johnson
  • Omics-specific normalization ideas
      • library size concepts
      • z-scores and ranks
      • batch-aware scaling caveats
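
A minimal sketch of three of the transformations named above (standardization, min-max scaling and a log transform), written with the Python standard library only. The helper names and the toy numbers are assumptions for illustration. The parameters are estimated on the training values and then reused on the test values, which is the habit the leakage session builds on.

```python
import math
from statistics import mean, pstdev


def fit_standardizer(train):
    """Estimate mean and standard deviation from training values only."""
    mu = mean(train)
    sigma = pstdev(train)
    return mu, (sigma if sigma > 0 else 1.0)  # guard against constant columns


def standardize(values, mu, sigma):
    """z = (x - mean) / sd, using previously fitted parameters."""
    return [(v - mu) / sigma for v in values]


def fit_min_max(train):
    """Estimate minimum and range from training values only."""
    lo, hi = min(train), max(train)
    return lo, (hi - lo if hi > lo else 1.0)


def min_max(values, lo, span):
    """Rescale relative to the training range; new data may fall outside [0, 1]."""
    return [(v - lo) / span for v in values]


def log1p_counts(counts):
    """log(1 + x), a common choice for skewed, non-negative counts."""
    return [math.log1p(c) for c in counts]


# Hypothetical toy values for one feature.
train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]

mu, sigma = fit_standardizer(train)
test_z = standardize(test, mu, sigma)

lo, span = fit_min_max(train)
test_mm = min_max(test, lo, span)  # second value exceeds 1: it lies above the training max

log_counts = log1p_counts([0, 9, 99])
```

Box-Cox, Yeo-Johnson and quantile transforms follow the same fit-then-apply pattern but need parameter estimation that is better left to a library.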
Session 3
Fee: Rs 14800
Encoding Categorical & Structured Data
  • Basic encoders and when to use them
      • one-hot and dummy coding
      • ordinal encoding
      • hashing tricks
  • Advanced encoders for high-cardinality data
      • target and impact encoding
      • frequency and likelihood encoders
      • leakage risks with target encoders
  • Dates, times and grouped features
      • cyclical encodings
      • aggregations per patient or sample
      • group-level summaries
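
The encoders listed above can be sketched in a few lines of standard-library Python. Everything below is illustrative: the function names, the toy categories and the smoothing constant are assumptions, and the smoothed target encoder shown is one common variant (an m-estimate that shrinks each category mean toward the overall mean), not necessarily the one taught in the session. All mappings are learned from training rows only.

```python
import math
from collections import defaultdict


def fit_one_hot(train_values):
    """Learn the category list from training data only."""
    return sorted(set(train_values))


def one_hot(value, categories):
    """Unseen categories map to an all-zero vector."""
    return [1 if value == c else 0 for c in categories]


def cyclical(value, period):
    """Map a periodic quantity (hour, month) to a point on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def fit_target_encoder(train_categories, train_targets, smoothing=2.0):
    """Smoothed per-category target mean, shrunk toward the overall mean."""
    prior = sum(train_targets) / len(train_targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(train_categories, train_targets):
        sums[category] += target
        counts[category] += 1
    mapping = {
        c: (sums[c] + smoothing * prior) / (counts[c] + smoothing)
        for c in counts
    }
    return mapping, prior


def target_encode(value, mapping, prior):
    """Unseen categories fall back to the training prior."""
    return mapping.get(value, prior)


# Hypothetical toy data.
categories = fit_one_hot(["liver", "kidney", "liver", "lung"])
liver_vec = one_hot("liver", categories)  # known category
brain_vec = one_hot("brain", categories)  # not seen in training

sin_6h, cos_6h = cyclical(6, 24)          # 06:00 on a 24-hour clock

sites = ["A", "A", "B", "B", "B"]
outcomes = [1, 0, 1, 1, 1]
mapping, prior = fit_target_encoder(sites, outcomes)
site_c = target_encode("C", mapping, prior)  # unseen site
```

Because the target encoder looks at the outcome, fitting it on rows that are later used for evaluation leaks label information; fitting it inside each training fold avoids this.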
Session 4
Fee: Rs 18800
Leakage Avoidance & Safe ML Pipelines
  • What data leakage is and how it arises
      • preprocessing on the full dataset vs the training fold only
  • Safe splitting and cross-validation practices
      • train/validation/test protocols
      • pipeline objects and column transformers
      • grouped and patient-level splits
  • Deliverables: leak-safe feature pipeline
      • R recipes and tidymodels
      • Python scikit-learn pipelines
      • documented transformation map
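
To make the two central ideas of this session concrete, the sketch below splits samples at the patient level and fits the scaler on the training fold only. It is a standard-library illustration under stated assumptions: the helper names, patient IDs and expression values are hypothetical, and in practice scikit-learn's `Pipeline`, `ColumnTransformer` and `GroupKFold` (or R recipes within tidymodels) would handle this bookkeeping.

```python
import random
from statistics import mean, pstdev


def grouped_folds(groups, n_folds, seed=0):
    """Yield (train_idx, test_idx) so that no group spans both sides."""
    unique_groups = sorted(set(groups))
    random.Random(seed).shuffle(unique_groups)
    fold_of = {g: i % n_folds for i, g in enumerate(unique_groups)}
    for k in range(n_folds):
        train_idx = [i for i, g in enumerate(groups) if fold_of[g] != k]
        test_idx = [i for i, g in enumerate(groups) if fold_of[g] == k]
        yield train_idx, test_idx


def fit_scaler(values):
    """Estimate scaling parameters from the values given (training fold only)."""
    sigma = pstdev(values)
    return mean(values), (sigma if sigma > 0 else 1.0)


def apply_scaler(values, mu, sigma):
    return [(v - mu) / sigma for v in values]


# Hypothetical data: two samples per patient, one expression value per sample.
patient_ids = ["p1", "p1", "p2", "p2", "p3", "p3", "p4", "p4"]
expression = [5.1, 5.3, 7.8, 8.0, 6.2, 6.0, 9.1, 9.4]

folds = []
for train_idx, test_idx in grouped_folds(patient_ids, n_folds=2):
    # Leak-safe: parameters come from the training fold and are reused on the test fold.
    mu, sigma = fit_scaler([expression[i] for i in train_idx])
    folds.append({
        "train_idx": train_idx,
        "test_idx": test_idx,
        "train_scaled": apply_scaler([expression[i] for i in train_idx], mu, sigma),
        "test_scaled": apply_scaler([expression[i] for i in test_idx], mu, sigma),
    })
```

The leaky alternative would call `fit_scaler(expression)` once on all samples before splitting, which lets test-fold statistics influence the training features.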

