π Regression Analysis
Definition: A statistical method for estimating the relationships between a dependent variable (outcome) and one or more independent variables (predictors). It is the foundation of quantitative business analysis.
Key courses: Wharton STAT 613, Booth BUSN 41100, HBS Analytics
π The Core Intuition
Question regression answers: βHow does Y change when X changes, holding everything else constant?β
This βholding everything else constantβ (ceteris paribus) is what makes regression powerful β it controls for confounders.
π Simple Linear Regression
| Term | Meaning |
|---|---|
Y | Dependent variable (what weβre predicting) |
X | Independent variable (the predictor) |
Ξ²β | Intercept (Y when X = 0) |
Ξ²β | Slope (change in Y for 1-unit change in X) |
Ξ΅ | Error term (unexplained variation) |
Example: Predicting Sales from Advertising Spend
Interpretation: Each additional 3.2K more in sales.
π Multiple Regression
Controls for multiple variables simultaneously:
π Evaluating a Regression
RΒ² (R-squared) β Goodness of Fit
- RΒ² = 0: Model explains nothing
- RΒ² = 1: Model explains everything perfectly
- RΒ² = 0.72: Model explains 72% of variation in Y
Warning: RΒ² always increases when adding variables β use Adjusted RΒ² for multiple regression.
Statistical Significance (p-value)
- p < 0.05: The coefficient is statistically significant β X reliably predicts Y
- p > 0.05: Could be noise
Confidence Intervals
- 95% CI: The true Ξ²β lies in this range with 95% confidence
- Narrow CI = more precise estimate
β οΈ Assumptions (OLS)
The Ordinary Least Squares (OLS) estimator requires:
| Assumption | If Violated |
|---|---|
| Linearity | Use polynomial or log transformation |
| Independence of errors | Autocorrelation in time series data β use time series models |
| Homoscedasticity (constant variance) | Heteroscedasticity β use robust standard errors |
| No multicollinearity | Correlated predictors β unstable coefficients |
| Normality of errors | Mostly needed for small samples |
π’ Logistic Regression (Binary Outcomes)
When Y is binary (0 or 1), use logistic regression:
- Outputs a probability between 0 and 1
- Common in: Credit default prediction, customer churn, fraud detection, disease diagnosis
πΌ Business Applications
| Business Problem | Regression Type | Y Variable |
|---|---|---|
| Sales forecasting | Multiple regression | Revenue |
| Price elasticity | Log-log regression | Quantity sold |
| Customer churn | Logistic regression | Churned (0/1) |
| House pricing model | Multiple regression | Home price |
| Ad attribution | Multiple/ridge regression | Conversions |
| Credit scoring | Logistic regression | Default (0/1) |
π§ Key Business Interpretation Rules
- Coefficients show marginal effects β the impact of one variable holding all others fixed
- Signs matter: Positive Ξ² = positive relationship, negative Ξ² = negative
- Scale matters: Compare standardized coefficients for relative importance
- Causation β correlation: Regression shows association; need good design for causality (A-B Testing)
- Out-of-sample validity: Always test on held-out data
π Connected Concepts
- A-B Testing β When you need causal inference, not just correlation
- Monte Carlo Simulation β Simulation-based complement to regression
- Hypothesis Testing β p-values and confidence intervals underlying regression
- Decision Trees β Non-linear alternative for prediction
- Behavioral Economics Overview β Regression can confirm behavioral biases
β π Data & Analytics MOC | Related: A-B Testing Β· Hypothesis Testing Β· Decision Trees