This contribution is from David Corliss. David teaches a class on this subject, giving a (very brief) description of 23 regression methods in just an hour, with an example and the package and procedures used for each case.

Here you can check the webcast done for Central Michigan University. The slide deck can be found here. Below is the presentation transcript. If you know of other types of regression, you can list them in the comment section below. For instance, I would add piecewise linear regression, as well as regression on unusual domains (on a sphere or on the simplex). For more on regression, click here.

Source for picture: Isotonic regression

Presentation transcript

1 Speed Dating with Regression Procedures. David J Corliss, PhD, Wayne State University, Physics and Astronomy / Public Outreach

2 Model Selection Flowchart. Branches: Non-Linear, Linear, Mixed, Non-Parametric.

3 Decision: Continuous or Discrete Outcome. Continuous outcomes: PROC REG. Discrete outcomes: PROC LOGISTIC.

4 Simple Linear Regression. Regression Type: Continuous, linear. General regression procedure with a number of options but limited specialized capabilities, for which other procedures or packages have been developed. Offers a choice of model variable selection methods (e.g., Forward, Backward, Best Subsets), can be coded for polynomial regression, and supports multiple model statements and interactive use. SAS = REG, R = lm function, regress.

5 Simple Linear Regression. Example: Homeless Students by State. Solid performance of the model across the range from low to high homelessness states indicates consistency of the factors correlated with the number of homeless students; r² = .652. Plot: actual percent vs. model percent of student population.
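
As a rough illustration of what slides 4 and 5 describe (not taken from the talk), here is a minimal sketch of ordinary least squares for a single predictor, including the r² statistic quoted on slide 5. The function name `fit_line` and the data are made up for the example; in practice PROC REG or R's `lm` would be used.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # squared correlation
    return intercept, slope, r_squared


# Made-up illustrative data (not the homeless-student data from the slide)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope, r_squared = fit_line(x, y)
print(intercept, slope, r_squared)
```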

6 Special Data Needs: Problems with Outliers. Robust Regression. Regression Type: Continuous, linear. Robust regression is achieved by identifying outliers, limiting their influence by assigning weights and then performing standard regression. Choice of methods for outlier detection, e.g. M, LTS, S and MM estimation; robust ANOVA. SAS = ROBUSTREG, R = robustbase, robust.

7 PROC ROBUSTREG. Example: Log-Log Regression With Weighted Outliers. In Robust Regression, the outliers need not be disregarded: weights can be assigned and incorporated in the regression. Cited on slide: SAS/STAT® 9.2 User's Guide, support.sas.com.
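
The idea on slides 6 and 7 (down-weight outliers, then run a standard regression) can be sketched with iteratively reweighted least squares. This is only one possible M-estimation scheme: the Huber weight function, the tuning constant 1.345, the MAD-based scale and the data are all assumptions made for illustration, and nothing here is meant to reproduce what PROC ROBUSTREG does internally.

```python
import statistics


def weighted_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def huber_line(x, y, k=1.345, iterations=50):
    """Huber M-estimate of a line via iteratively reweighted least squares."""
    w = [1.0] * len(x)
    for _ in range(iterations):
        a, b = weighted_line(x, y, w)
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual, rescaled for normal errors
        scale = max(statistics.median(abs(r) for r in resid) / 0.6745, 1e-12)
        # Full weight for small residuals, shrinking weight for large ones
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
    return a, b, w


# Made-up data: y = 1 + 2x plus small noise, with one gross outlier at the end
x = [float(i) for i in range(10)]
noise = [0.1, -0.2, 0.15, -0.1, 0.05, -0.15, 0.2, -0.05, 0.1]
y = [1 + 2 * xi + ni for xi, ni in zip(x, noise)] + [1 + 2 * 9 + 30.0]

ols_a, ols_b = weighted_line(x, y, [1.0] * len(x))
rob_a, rob_b, weights = huber_line(x, y)
print("OLS slope:", ols_b, "robust slope:", rob_b, "outlier weight:", weights[-1])
```

The outlier is kept in the fit but ends up with a small weight, which is the point made on slide 7.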

8 Special Data Needs: Ill-Conditioned Data. Regression Using Givens Rotations. Regression Type: Continuous, linear. Regression using the Gentleman-Givens procedure instead of collecting crossproducts. For ill-conditioned data, where small errors in the data may cause large errors in the results; more accurate than simple regression. SAS = ORTHOREG, R = givens.

9 Givens Rotation Regression. Example: Fitting a Higher-Order Polynomial. An example of fitting a 9th-degree polynomial, where near singularities must be distinguished from true ones. Cited on slide: SAS/STAT® 9.2 User's Guide, support.sas.com.
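
To make slides 8 and 9 more concrete, the sketch below solves a least squares problem by rotating the design matrix into triangular form with Givens rotations, so the crossproduct matrix X'X is never formed. It is a textbook-style illustration under my own assumptions (function name, a quadratic rather than a 9th-degree polynomial, made-up data), not the Gentleman-Givens implementation used by PROC ORTHOREG.

```python
import math


def givens_least_squares(X, y):
    """Solve min ||X b - y|| by reducing [X | y] to upper-triangular form
    with Givens rotations, then back-substituting."""
    rows = [[float(v) for v in row] + [float(yi)] for row, yi in zip(X, y)]
    m, n = len(rows), len(X[0])
    for j in range(n):                  # column being cleared
        for i in range(m - 1, j, -1):   # zero rows[i][j] using rows[i-1][j]
            a, b = rows[i - 1][j], rows[i][j]
            if b == 0.0:
                continue
            r = math.hypot(a, b)
            c, s = a / r, b / r
            for k in range(j, n + 1):
                upper, lower = rows[i - 1][k], rows[i][k]
                rows[i - 1][k] = c * upper + s * lower
                rows[i][k] = -s * upper + c * lower
    # Back substitution on the top n-by-n triangle
    beta = [0.0] * n
    for j in range(n - 1, -1, -1):
        acc = rows[j][n] - sum(rows[j][k] * beta[k] for k in range(j + 1, n))
        beta[j] = acc / rows[j][j]
    return beta


# Made-up data lying exactly on y = 1 + 2x + 3x^2
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
X = [[1.0, v, v * v] for v in xs]
y = [1 + 2 * v + 3 * v * v for v in xs]
beta = givens_least_squares(X, y)
print(beta)
```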

10 Special Data Needs: Transformation. Regression with Data Transformation. Regression Type: Continuous, linear. Regression with a number of data transformations, including smooth, spline, Box-Cox and other non-linear forms. Supports fitting splines with a user-specified degree and number of knots; capable of piece-wise solutions. SAS = TRANSREG, R = reg, betareg.

11 Regression with Data Transformation. Example: Spline Regression to a Complex Form. Splines are fitted to a spectrographic line profile to determine the radial velocity of gas erupting from a star.

12 Special Model Types: General Linear. General Linear Models. Regression Type: Continuous, linear. General-purpose procedure for continuous least squares regression using classification predictor variables as well as continuous ones. While capable of many types of models and analysis, another procedure is often better for a specific task. SAS = GLM, R = glm function.

13 General Linear Model. Example: Age Group as a Categorical Predictor Variable. GLM used with Box and Whisker output (plot: distribution of response by agegroup). Cited on slide: An Overview of ODS Statistical Graphics in SAS® 9.3, Robert N. Rodriguez, SAS Institute Inc., Cary, NC.

14 Special Model Types: By Quantile. Quantile Regression. Regression Type: Continuous, linear. While other procedures model the mean, quantile regression models the median and other specified quantiles to provide a more complete picture of the response variable. Uncertainties for individual quantiles can be estimated by bootstrapping. SAS = QUANTREG, R = quantreg.

15 Quantile Regression. Example: 5/10/25/50/75/90/95% Quantiles. An example of Quantile Regression demonstrating greater detail than is possible with ordinary regression (plot: predicted birth weight by maternal weight gain). Cited on slide: Quantile regression with PROC QUANTREG, Peter L. Flom, Peter Flom Consulting, New York, NY.
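
The contrast drawn on slide 14 between modeling the mean and modeling a quantile comes down to the loss being minimized. The sketch below uses the usual "pinball" (check) loss for an intercept-only model, where the best constant is simply a sample quantile; a full quantile regression minimizes the same loss over regression coefficients. The function names and data are hypothetical and this is not how PROC QUANTREG is implemented.

```python
def pinball_loss(y, predictions, tau):
    """Average check loss for quantile level tau (0 < tau < 1)."""
    total = 0.0
    for yi, pi in zip(y, predictions):
        d = yi - pi
        # Under-predictions cost tau per unit, over-predictions cost (1 - tau)
        total += tau * d if d >= 0 else (tau - 1.0) * d
    return total / len(y)


def best_constant(y, tau):
    """Best intercept-only model: for this loss an optimum always lies
    at one of the observed values, so a search over them is enough."""
    return min(y, key=lambda c: pinball_loss(y, [c] * len(y), tau))


# Made-up data with one extreme value
y = [1, 2, 3, 4, 100]
mean_y = sum(y) / len(y)          # pulled upward by the extreme value
median_fit = best_constant(y, 0.5)
upper_fit = best_constant(y, 0.9)
print(mean_y, median_fit, upper_fit)
```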

16 Special Model Types: PLS, PCA Regression. Partial Least Squares & Principal Components. Regression Type: Continuous, linear. Partial Least Squares and Principal Component regression: predictor and response variables are projected into a new coordinate system, possibly with reduced complexity. Supports reduced rank regression with cross validation of the number of components. SAS = PLS, R = pls.

17 Partial Least Squares / Principal Components. Example: Variable Importance Plot. Principal Component variables derived from the original, observed variables. Cited on slide: Quantile regression with PROC QUANTREG, Peter L. Flom, Peter Flom Consulting, New York, NY.

18 Special Model Types: Survey Data. Survey Regression. Regression Type: Continuous, linear. Special capabilities for analysis in the presence of common survey data features, including stratification, clustering and weighting. Supports several methods for sampling and estimation of sampling error using either Taylor series or primary sample units. SAS = SURVEYREG, R = survey.

19 Survey Regression. Example: Regression with Stratified Sampling. Example output from application to survey data, with summary statistics and model parameters. Cited on slide: PROC SURVEYREG, support.sas.com, example 98.4. Output tables shown: Stratum Information (columns: Stratum Index, State, Region, N Obs, Population Total, Sampling Rate; states include Iowa and Nebraska); Tests of Model Effects (columns: Effect, Num DF, F Value, Pr > F; effects: Model, Intercept, FarmArea; the denominator degrees of freedom for the F tests is 14); Estimated Regression Coefficients (columns: Parameter, Estimate, Standard Error, t Value, Pr > |t|; parameters: Intercept, FarmArea); Covariance of Estimated Regression Coefficients (Intercept, FarmArea).

20 Special Model Types: PH on Survey Data. Proportional Hazards with Survey Data. Regression Type: Continuous, linear. Performs Cox Proportional Hazards modeling on survey data with truncation, supporting stratification, clustering and weighting. Estimates the variance of model parameters by Taylor series, BRR or jackknife. SAS = SURVEYPHREG, R = survey.

21 Proportional Hazards with Survey Data. Example: Stratified Sampling with Truncated Data. Example output for Proportional Hazards regression on survey data with truncation: summary statistics and model parameters. Cited on slide: PROC SURVEYPHREG, support.sas.com, example 97.2. Output tables shown: Analysis of Maximum Likelihood Estimates (columns: Parameter, DF, Estimate, Standard Error, t Value, Pr > |t|, Hazard Ratio; rows: BodyWeight and four Smoke rows); Type III Tests of Model Effects (columns: Effect, Num DF, Den DF, F Value, Pr > F; effects: BodyWeight, Smoke); Estimate (columns: Label, Estimate, Standard Error, DF, t Value, Pr > |t|, Exponentiated; row label: Row).

22 Special Model Types: Categorical. Regression on Categorical Data. Regression Type: Continuous, linear. A generalization of continuous methods to categorical data; performs linear regression and other analyses on data that can be expressed in a contingency table. Supports both ordinary and logistic regression, log-linear models and repeated measures. SAS = CATMOD, R = catdata, vgam.

23 Regression on Categorical Data. Example: Bartlett's Data, No 3-Variable Interaction. Example output from regression on categorical data, with summary statistics and model parameters. Cited on slide: PROC CATMOD, support.sas.com, example 28.4. Data Summary: Response = Length*Time*Status, Response Levels = 8, Weight Variable = wt, Populations = 1, Data Set = BARTLETT, Total Frequency = 960, Frequency Missing = 0, Observations = 8. Other output shown: Response Profiles (columns: Response, Length, Time, Status) and Maximum Likelihood Analysis of Variance (columns: Source, DF, Chi-Square, Pr > ChiSq; sources: Length, Time, Length*Time, Status, Length*Status, Time*Status, Likelihood Ratio). Values shown include: Status, DF 1, Chi-Square 48.94, Pr < .0001; Length*Status, DF 1, Chi-Square 48.94, Pr < .0001; Time*Status, DF 1, Chi-Square 95.01, Pr < .0001.

24 Special Model Types: Complex Optimization. Response Surface Regression. Regression Type: Continuous, linear. Linear regression for fitting quadratic Response Surface Models, a type of general linear model that identifies where optimal response values occur more efficiently than ordinary regression or GLM. Output displays the Response Surface and identifies ridges of optimum response. SAS = RSREG, R = rsm.

25 Response Surface Regression. Example: A Response Surface with Optimal Solution. An example of a response surface with the optimal solution found at the minimum; multiple minima and maxima are possible. Cited on slide: Quantile regression with PROC QUANTREG, Peter L. Flom, Peter Flom Consulting, New York, NY.

26 Special Model Types: Time to Failure. Survival Analysis. Regression Type: Continuous, linear. Models time-to-failure data as a linear combination of predictors and a random disturbance term, which can be described by many different distributions. Supports standard survival analysis data censored on the right, left, both or neither. SAS = LIFEREG, R = survival.

27 Survival Analysis. Example: A Cumulative Hazard Model. This example plots the log-logistic vs. the Kaplan-Meier Cumulative Hazard. Cited on slide: Quantile regression with PROC QUANTREG, Peter L. Flom, Peter Flom Consulting, New York, NY.
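
Slide 27 compares a fitted log-logistic model with the Kaplan-Meier cumulative hazard. As a hedged illustration of the non-parametric side of that comparison only, the sketch below computes a Kaplan-Meier survival curve with right-censoring and converts it to a cumulative hazard using H(t) = -log S(t). The function name and the five observations are made up; fitting the parametric model is what PROC LIFEREG or R's survival package would be used for.

```python
import math


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct failure time.

    events[i] is True for an observed failure and False for a
    right-censored observation.
    """
    curve = []
    survival = 1.0
    at_risk = len(times)
    for t in sorted(set(times)):
        failures = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)
        if failures:
            survival *= 1.0 - failures / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # failures and censored cases both leave the risk set
    return curve


# Made-up data: failures at t = 1, 3, 4; censored at t = 2 and 5
times = [1, 2, 3, 4, 5]
events = [True, False, True, True, False]
curve = kaplan_meier(times, events)
cumulative_hazard = [(t, -math.log(s)) for t, s in curve]
print(curve)
print(cumulative_hazard)
```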

28 Special Model Types: Time-dependent Risk. Proportional Hazards Model. Regression Type: Continuous, linear. Cox Proportional Hazards modeling, where a unit increase in a predictor multiplies the risk by a factor determined by the model. Supports proportional hazards models with data censored on the right, left, both or neither, and variable selection by multiple methods, including best subset. SAS = PHREG, R = coxph.

29 Proportional Hazards Model. Example: Model With Time-Dependent Predictors. Example output from a Proportional Hazards model, with summary statistics and model parameters.
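
The statement on slide 28 that a unit increase in a predictor multiplies the risk by a model-determined factor can be written out using the standard form of the Cox model. The notation below (baseline hazard h_0, coefficients β) is the usual textbook notation, not something shown in the transcript:

```latex
h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \dots + \beta_p x_p),
\qquad
\frac{h(t \mid x_1, \dots, x_j + 1, \dots, x_p)}{h(t \mid x_1, \dots, x_j, \dots, x_p)} = \exp(\beta_j)
```

The ratio on the right is the hazard ratio for predictor j: it does not depend on t or on the baseline hazard, which is what "proportional hazards" refers to.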

30 Special Model Types: Simultaneous Outcomes. Structural Equation Models. Regression Type: Continuous, linear. In Structural Equation Modeling, a linear combination of predictors describes a vector equal to a linear combination of outcome variables. Supports latent variables, multiple and multivariate regression, path analysis and canonical correlation. SAS = CALIS, R = sem.

31 Structural Equation Model. Example: Linear Relations among Factor Loadings. Example output from a Structural Equation model, with matrices of model parameters.

32 Discrete Outcomes: Simple Logistic. Logistic Regression. Regression Type: binary & ordinal outcomes, linear. General procedure for logistic regression with a number of options; other procedures may offer more capabilities for specific types of discrete models. Supports many model variable selection methods and diagnostic tests. SAS = LOGISTIC, R = glm function.

33 Discrete Outcomes: Simple Logistic. Logistic Regression. Example data and output from a Logistic Regression model, with summary statistics and model parameters. Data: IDRE / UCLA.
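
For readers who want to see what a logistic fit like the one on slides 32 and 33 involves, here is a minimal sketch for one predictor and a binary outcome, fitted by plain gradient ascent on the log-likelihood. The optimizer, learning rate, iteration count and data are assumptions chosen for simplicity; PROC LOGISTIC and R's `glm` use more refined algorithms and report far more diagnostics.

```python
import math


def sigmoid(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, learning_rate=0.1, iterations=5000):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by gradient ascent."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(iterations):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Gradient of the mean log-likelihood with respect to b0 and b1
        g0 = sum(yi - pi for yi, pi in zip(y, p)) / n
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p)) / n
        b0 += learning_rate * g0
        b1 += learning_rate * g1
    return b0, b1


# Made-up data: the outcome becomes more likely as x grows, with some overlap
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(x, y)
print(b0, b1, sigmoid(b0 + b1 * 0), sigmoid(b0 + b1 * 7))
```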

34 Discrete Outcomes: Generalized. Generalized Linear Models. Regression Type: discrete outcomes, linear. Generalized linear models with discrete outcomes, appropriate where the data are not normally distributed or the variance is not the same for all observations. Supports Poisson Regression and Repeated Measures. SAS = GENMOD, R = glm function.

35 Discrete Outcomes: Generalized. Generalized Linear Models. Example output from a Generalized Linear Regression model of a discrete outcome, with summary statistics and model parameters.

36 Discrete Outcomes: Outcome Probability. PROBIT Models. Regression Type: discrete outcomes, linear. Models the probability that an observation will have a particular outcome. Supports probit, logit, ordinal logistic, and extreme value / gompit. SAS = PROBIT, R = glm, family = binomial(link = "probit").

37 Discrete Outcomes: Outcome Probability. PROBIT Models. Example data and output from a PROBIT model, with summary statistics and model parameters.
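
The link choices listed on slide 36 differ only in how a linear predictor is turned into a probability. The sketch below shows the three response functions side by side, assuming the usual definitions (standard normal CDF for probit, logistic CDF for logit, and 1 - exp(-exp(eta)) for the extreme value / gompit case); the function names are made up for the example.

```python
import math
from statistics import NormalDist


def probit_probability(eta):
    """Probit: standard normal CDF of the linear predictor."""
    return NormalDist().cdf(eta)


def logit_probability(eta):
    """Logit: logistic CDF of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-eta))


def gompit_probability(eta):
    """Extreme value / gompit (complementary log-log) response."""
    return 1.0 - math.exp(-math.exp(eta))


for eta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(eta, probit_probability(eta), logit_probability(eta), gompit_probability(eta))
```

Probit and logit are symmetric around a probability of 0.5 at eta = 0, while the gompit curve is asymmetric.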

38 Non-Linear Models: General. Non-Linear Models. Regression Type: non-linear. Performs non-linear regression with the dependent variable divided into a mean component and a (random) error component; the process is iterative. Supports steepest-descent, Newton, modified Gauss-Newton and Marquardt methods. SAS = NLIN, R = nls function, nleqslv.

39 Non-Linear Models. Example: Fitting a Model to a Complex Curve. In this example, observations are normally distributed about a non-linear function, in this case a Morlet wavelet.
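
The iterative process mentioned on slide 38 can be sketched with a Gauss-Newton loop that halves the step whenever the sum of squared errors would increase. To keep it short, the model here is a two-parameter exponential, y = a * exp(b * x), rather than the Morlet wavelet from slide 39; the function names, starting values and data are assumptions for illustration, not the NLIN algorithm itself.

```python
import math


def sse(a, b, x, y):
    """Sum of squared errors for the model y = a * exp(b * x)."""
    return sum((yi - a * math.exp(b * xi)) ** 2 for xi, yi in zip(x, y))


def gauss_newton_exp(x, y, a, b, iterations=100):
    """Fit y = a * exp(b * x) by Gauss-Newton with step halving."""
    for _ in range(iterations):
        e = [math.exp(b * xi) for xi in x]
        resid = [yi - a * ei for yi, ei in zip(y, e)]
        ja = e                                        # d(model)/da
        jb = [a * xi * ei for xi, ei in zip(x, e)]    # d(model)/db
        # Normal equations (J'J) delta = J'r for the 2-parameter case
        saa = sum(u * u for u in ja)
        sab = sum(u * v for u, v in zip(ja, jb))
        sbb = sum(v * v for v in jb)
        ga = sum(u * r for u, r in zip(ja, resid))
        gb = sum(v * r for v, r in zip(jb, resid))
        det = saa * sbb - sab * sab
        if det == 0.0:
            break
        da = (sbb * ga - sab * gb) / det
        db = (saa * gb - sab * ga) / det
        # Halve the step until the fit does not get worse
        step, current = 1.0, sse(a, b, x, y)
        while step > 1e-10 and sse(a + step * da, b + step * db, x, y) > current:
            step /= 2.0
        a += step * da
        b += step * db
        if abs(step * da) + abs(step * db) < 1e-12:
            break
    return a, b


# Made-up noise-free data generated from a = 2, b = 0.5
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 * math.exp(0.5 * xi) for xi in x]
a_hat, b_hat = gauss_newton_exp(x, y, a=1.5, b=0.4)
print(a_hat, b_hat)
```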

40 Non-Linear Models: Mixed Effects. Non-Linear Mixed-Effects Models. Regression Type: non-linear. Performs non-linear regression where both the mean and error components of the dependent variable are non-linear; the process uses a Taylor series expansion about zero. Supports normal, binomial and Poisson distributions, with the capability to program a general distribution. SAS = NLMIXED, R = nlme.

41 Non-Linear Mixed-Effects Models. Example: Plot of Profile of Trees Over Time. In this example, variability in the shape of observed trees increases over time.

42 Linear Mixed: Fixed and Random Effects. Mixed Models. Regression Type: linear, fixed and random effects. Performs linear regression using a linear combination of fixed effects added to a second linear combination of random effects. Supports repeated measures in longitudinal studies; especially useful for dealing with missing data. SAS = MIXED, R = lme4, coxme.

43 Linear Mixed-Effects Models. Example: Repeated Measures. Example of a Mixed Effects Model, incorporating both fixed and random effects to improve the predictive power.

44 Linear Mixed: General. General Mixed Models. Regression Type: linear mixed. Generalization of mixed models to permit normally-distributed random effects and non-normal error terms. Supports fitting models to correlated data or where the variability is not constant. SAS = GLIMMIX, R = lme4.

45 General Mixed Models. Example: Crossed Random Effects. LOESS with crossed random effects analyzes in-breeding in an isolated population, allowing generalization to all populations.

46 Non-Parametric Models: Localized. Local Regression. Regression Type: linear, non-parametric. Develops a model by applying non-parametric regression to segments of the data and calculates confidence limits for the outcome; computationally intensive. Supports multiple dependent variables, multidimensional predictors and interpolation using kd trees. SAS = LOESS, R = locfit.

47 Local Regression. Example: Periodicities in Weather Data. In this example, Local Regression is used to identify potential periodicities at 12 and 42 months.
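
Fitting to segments of the data, as slide 46 puts it, can be sketched as a weighted straight-line fit around each target point, using only the nearest fraction of observations and giving closer points more weight. The tricube weight function, the `frac` parameter and the data are assumptions borrowed from the common description of LOESS, not details from the talk, and the sketch omits the confidence limits and kd-tree interpolation mentioned on the slide.

```python
import math


def loess_point(x0, x, y, frac=0.5):
    """Local linear estimate at x0 using tricube weights on the
    nearest frac * n observations."""
    n = len(x)
    k = max(2, math.ceil(frac * n))
    distances = sorted(abs(xi - x0) for xi in x)
    h = distances[k - 1] or 1e-12           # bandwidth: k-th nearest distance
    w = [(1.0 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
    sw = sum(w)
    if sw == 0.0:
        raise ValueError("no observations received positive weight")
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    if sxx == 0.0:
        return my                            # fall back to a local average
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    return my + (sxy / sxx) * (x0 - mx)


# Sanity check on made-up data lying exactly on y = 3x + 1
x = [float(i) for i in range(10)]
y_line = [3 * xi + 1 for xi in x]
fit_on_line = loess_point(2.5, x, y_line)

# A smooth curve: the local fits follow the bends without a global formula
y_wave = [math.sin(xi) for xi in x]
smoothed = [loess_point(xi, x, y_wave) for xi in x]
print(fit_on_line, smoothed)
```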

48 Non-Parametric Models: Additive. Generalized Additive Models. Regression Type: linear, non-parametric. Generalized Additive Models, with multiple independent non-parametric predictors; univariate smoothing provides finer detail than is possible with the piece-wise LOESS procedure. Supports non-parametric and semi-parametric models and multidimensional predictors. SAS = GAM, R = gam.

49 Additive Model. Example: Segmented Response Surface. An Additive Model used to fit a complex response surface without the loss of detail due to piece-wise fitting in local regression.
