140.617.79
Statistical Modeling for Public Health: Techniques for Model Diagnosis and Selection

Course Status
Cancelled

Location
Internet
Term
Summer Institute
Department
Biostatistics
Credit(s)
2
Academic Year
2025 - 2026
Instruction Method
Synchronous Online
Start Date
Tuesday, June 10, 2025
End Date
Friday, June 13, 2025
Class Time(s)
Tu, W, Th, F, 1:00 - 5:00pm
Auditors Allowed
Yes, with instructor consent
Available to Undergraduate
No
Grading Restriction
Letter Grade or Pass/Fail
Course Instructor(s)
Contact Name
Frequency Schedule
One Year Only
Resources
Prerequisite

- At least one (1) introductory statistics/biostatistics course including experience with linear and logistic regression models
- Experience coding in R/RStudio

Enrollment Restriction
This course is not restricted.
Description
This course dives deep into statistical modeling techniques tailored for public health research, equipping you with the tools to assess model assumptions, select appropriate models for association and prediction, and evaluate model performance. Gain hands-on experience using a variety of R packages to build and refine models applied to real-world health data.
Introduces the purpose of statistical models and reviews key concepts in linear and logistic regression. Discusses principles of model selection, focusing on parsimony, multicollinearity, and the bias-variance tradeoff. Introduces methods for comparing models to assess fit and performance. Explores approaches for predictive modeling and cross-validation. Addresses challenges related to missing data and reviews strategies for maintaining model validity in public health applications. Uses the tidymodels package in R throughout.
Learning Objectives
Upon successfully completing this course, students will be able to:
  1. Evaluate key assumptions underlying linear and logistic regression models using diagnostic tools such as residual plots and goodness-of-fit tests.
  2. Apply model selection strategies to investigate associations between predictors and health outcomes.
  3. Assess prediction model performance using metrics such as root mean square error (RMSE), receiver operating characteristic (ROC) curves, and the area under the ROC curve (AUC), and determine optimal cut points for classification tasks.
  4. Perform k-fold cross-validation to estimate out-of-sample performance and assess model reliability and generalizability.
  5. Analyze the impact of missing data on model development and apply appropriate strategies to address different types of missingness.
  6. Construct, refine, and evaluate models using R and the tidymodels package to promote reproducibility and transparency.
Methods of Assessment
This course is evaluated as follows:
  • 75% Assignments
  • 25% Final Project
Special Comments

7 hours of pre-course homework, 16 hours of in-class learning activities (including three 15-minute breaks and labs), 15 hours of onsite homework, and 10 hours for the final project.