Residual Sum of Squares (RSS)

Key takeaways
– RSS measures the total squared difference between observed values and values predicted by a regression model; smaller RSS indicates a closer fit.
– RSS is computed as the sum of squared residuals: RSS = Σ(yi − ŷi)^2.
– RSS is closely related to RSE, TSS, SSE, and R^2 but is not itself normalized for sample size or number of parameters.
– Minimizing RSS is the objective of ordinary least squares (OLS); care is needed to avoid overfitting and to handle outliers or heteroscedasticity.
– Practical computation can be done by hand for small datasets or quickly in Excel, R, or Python for larger ones.

What RSS is (intuitively)
The residual for an observation is the difference between the actual outcome (yi) and the model’s prediction (ŷi). RSS aggregates those differences by squaring them and summing across all observations. Squaring serves two purposes: it makes every contribution nonnegative and penalizes larger errors more heavily. RSS therefore quantifies the variation in the dependent variable that the model fails to explain.

Mathematical definition
RSS = Σ(yi − ŷi)^2, where:
– yi is the observed value for observation i,
– ŷi is the predicted value from the model for observation i,
– the sum is over all observations i = 1…n.

Note: In many texts SSE (sum of squared errors) is used interchangeably with RSS.

How to calculate RSS — step-by-step (manual)
1. Fit your model and compute predicted values ŷi for each observation.
2. Compute residuals: ei = yi − ŷi.
3. Square each residual: ei^2.
4. Sum the squared residuals: RSS = Σ ei^2.
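
Steps 2 to 4 can be sketched in Python with NumPy. This is a minimal illustration, not a reference implementation: the helper name residual_sum_of_squares and the demo numbers are assumptions made for this example, and step 1 (fitting the model) is taken as already done.

```python
import numpy as np

def residual_sum_of_squares(y, y_hat):
    """Return RSS = sum((y_i - y_hat_i)^2) for two equal-length sequences."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    if y.shape != y_hat.shape:
        raise ValueError("y and y_hat must have the same shape")
    residuals = y - y_hat        # step 2: e_i = y_i - y_hat_i
    squared = residuals ** 2     # step 3: e_i^2
    return float(squared.sum())  # step 4: RSS = sum of e_i^2

# Made-up observed and predicted values, used only to exercise the function.
rss_demo = residual_sum_of_squares([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
```

Raising an error on mismatched shapes is a design choice here: it prevents NumPy broadcasting from silently producing a meaningless sum.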

Worked numeric example
Observed values y: [3, 5, 4, 6]
Predicted values ŷ: [2.5, 5, 4, 6.5]

Residuals e = y − ŷ: [0.5, 0, 0, −0.5]
Squared residuals: [0.25, 0, 0, 0.25]
RSS = 0.25 + 0 + 0 + 0.25 = 0.5

Compute related measures from the same example:
– n = 4; residual degrees of freedom for a simple linear regression = n − p = 4 − 2 = 2.
– Residual standard error (RSE) = sqrt(RSS / (n − p)), where p is the number of estimated parameters (p = 2 for intercept + slope in simple linear regression). Here RSE = sqrt(0.5 / 2) = sqrt(0.25) = 0.5.
– Total sum of squares (TSS) = Σ(yi − ȳ)^2. Here ȳ = 4.5, TSS = 5.
– R-squared = 1 − RSS/TSS = 1 − 0.5/5 = 0.90.
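
The arithmetic in this worked example can be checked with a few lines of Python. The sketch below uses only the numbers given above and assumes p = 2, as the text does; the variable names are illustrative.

```python
import numpy as np

y = np.array([3.0, 5.0, 4.0, 6.0])      # observed values from the example
y_hat = np.array([2.5, 5.0, 4.0, 6.5])  # predicted values from the example
p = 2                                   # intercept + slope, as assumed in the text

rss = float(np.sum((y - y_hat) ** 2))       # residual sum of squares
rse = float(np.sqrt(rss / (len(y) - p)))    # residual standard error
tss = float(np.sum((y - y.mean()) ** 2))    # total sum of squares
r_squared = 1.0 - rss / tss                 # proportion of variance explained

print(f"RSS={rss:.2f}  RSE={rse:.2f}  TSS={tss:.2f}  R^2={r_squared:.2f}")
```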

Practical steps for common tools
– Excel:
1. Put yi and xi in columns.
2. Fit a regression using Data → Data Analysis → Regression (or compute slope/intercept with SLOPE/INTERCEPT).
3. Compute predicted values: =INTERCEPT + SLOPE*xi.
4. Compute residuals: =yi − ŷi.
5. Compute squared residuals: =(residual)^2.
6. RSS = SUM(squared residuals).

– R:
1. Fit: model <- lm(y ~ x, data = df)
2. RSS: sum(resid(model)^2)
3. RSE (for general p parameters): sigma(model) (or sqrt(sum(resid(model)^2) / (n - p)))

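– Python (NumPy with statsmodels or scikit-learn):
1. Fit the model with statsmodels.api.OLS or sklearn.linear_model.LinearRegression (call .fit() before predicting; statsmodels does not add an intercept column unless you include one, e.g., with add_constant).
2. y_pred = model.predict(X)
3. rss = ((y - y_pred)**2).sum()
4. rse = np.sqrt(rss / (n - p)), where np is NumPy, n = len(y), and p is the number of estimated parameters including the intercept.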
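
For readers who want a runnable end-to-end version of the Python steps, here is a minimal sketch. To keep it dependency-light it fits the line with numpy.polyfit instead of statsmodels or scikit-learn (a substitution made for this example), and the x and y values are made up.

```python
import numpy as np

# Illustrative data (not from the article).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Step 1: fit a straight line; polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x, y, deg=1)

# Step 2: predicted values.
y_pred = intercept + slope * x

# Steps 3 and 4: RSS and RSE, with p = 2 estimated parameters.
n, p = len(y), 2
rss = float(((y - y_pred) ** 2).sum())
rse = float(np.sqrt(rss / (n - p)))
```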

How RSS is minimized (brief)
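Ordinary least squares (OLS) chooses the parameter values (e.g., intercept and slopes) that minimize RSS. For any model that is linear in its parameters, including multiple regression, the minimizer has a closed form (the normal equations); in simple linear regression this reduces to explicit formulas for the slope and intercept. For models that are nonlinear in their parameters, RSS is instead minimized iteratively with numerical optimization (e.g., gradient descent or other numerical solvers).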
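
As a small illustration of the closed-form case, the sketch below computes the simple-regression slope and intercept from the standard formulas (slope = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)^2, intercept = ȳ − slope·x̄) and then checks that nudging either parameter increases RSS. The function names and data are assumptions made for this example.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Closed-form intercept and slope that minimize RSS for y ~ a + b*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    slope = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    intercept = y_bar - slope * x_bar
    return float(intercept), float(slope)

def rss_of_line(intercept, slope, x, y):
    """RSS of the line intercept + slope*x against the observations y."""
    return float(np.sum((y - (intercept + slope * x)) ** 2))

# Illustrative data (not from the article).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.5, 4.5, 4.0, 6.0])

a, b = fit_simple_ols(x, y)
best_rss = rss_of_line(a, b, x, y)

# Any small change to the fitted parameters should give a larger RSS.
nudged_rss = [rss_of_line(a + da, b + db, x, y)
              for da in (-0.1, 0.1) for db in (-0.05, 0.05)]
```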

RSS vs. related quantities (clear comparisons)
– RSS vs. RSE: RSS is the total sum of squared residuals. RSE (residual standard error) scales RSS by the appropriate degrees of freedom and takes the square root: RSE = sqrt(RSS / (n − p)). RSE has the same units as the dependent variable and can be interpreted like a standard deviation of residuals.
– RSS vs. SSE: In many sources, SSE (sum of squared errors) = RSS (they are synonyms).
– RSS vs. TSS: TSS = Σ(yi − ȳ)^2 measures total variability in the responses. When the model is fit by OLS and includes an intercept, TSS = RSS + ESS, where ESS = Σ(ŷi − ȳ)^2 is the explained sum of squares. R^2 = 1 − RSS/TSS.
– RSS vs. R-squared: RSS is an absolute measure (depends on scale and n). R^2 is a relative measure (proportion of variance explained) and normalizes RSS by TSS.
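
The decomposition TSS = RSS + ESS, and its dependence on the intercept, can be checked numerically. The following is a sketch on made-up data; it fits one line with an intercept and one through the origin and compares the sums of squares for each.

```python
import numpy as np

# Illustrative data (not from the article).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.5, 4.5, 4.0, 6.0])
tss = float(np.sum((y - y.mean()) ** 2))

# OLS with an intercept: design matrix has a column of ones.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
rss = float(np.sum((y - y_hat) ** 2))
ess = float(np.sum((y_hat - y.mean()) ** 2))

# OLS through the origin (no intercept): slope = sum(x*y) / sum(x*x).
slope_origin = float(x @ y) / float(x @ x)
y_hat_origin = slope_origin * x
rss_origin = float(np.sum((y - y_hat_origin) ** 2))
ess_origin = float(np.sum((y_hat_origin - y.mean()) ** 2))

decomposition_holds = bool(np.isclose(tss, rss + ess))
decomposition_holds_origin = bool(np.isclose(tss, rss_origin + ess_origin))
```

On this data the identity holds for the intercept model but not for the through-origin fit, which is why R^2 = 1 − RSS/TSS should be read with care for models without an intercept.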

Can RSS be zero?
Yes, RSS = 0 is possible but uncommon in real data. It means the model predicts every observed yi exactly (yi = ŷi for all i). This happens when the relationship is deterministic and noise-free and the model captures it exactly, or when the model is overfit (e.g., it uses as many parameters as there are data points).
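
The overfitting case is easy to reproduce. In the sketch below, the four y values are taken from the worked example, the x values are hypothetical (the article gives none), and a cubic polynomial with four coefficients is fit to the four points. Because of floating-point rounding the computed RSS is a tiny number rather than an exact zero.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])  # hypothetical predictor values
y = np.array([3.0, 5.0, 4.0, 6.0])  # observed values from the worked example

def poly_rss(degree):
    """RSS of a least-squares polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, deg=degree)
    return float(np.sum((y - np.polyval(coeffs, x)) ** 2))

rss_line = poly_rss(1)   # 2 parameters for 4 points: some error remains
rss_cubic = poly_rss(3)  # 4 parameters for 4 points: the curve passes through every point
```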

Limitations and pitfalls of using RSS
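– Scale and model complexity: RSS depends on the units of the dependent variable and grows with the number of observations and the variance of the response, so raw RSS cannot be compared across datasets. On a fixed dataset, adding parameters to a nested model never increases RSS, so raw RSS also cannot fairly compare models with different numbers of parameters without adjustment.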
– Outlier sensitivity: Squaring residuals gives outliers disproportionate influence on RSS and hence on parameter estimates in OLS.
– Assumption dependence: Minimizing RSS via OLS relies on assumptions (linearity, independent errors, homoscedasticity, etc.). Violations can produce biased or inefficient estimates.
– Overfitting: A more complex model can reduce RSS without improving out-of-sample performance. Use penalized criteria (AIC/BIC), adjusted R^2, cross-validation, or regularization (ridge/lasso) to compare models.
– Limited interpretability: RSS is a single-number goodness-of-fit measure; it does not reveal model misspecification, omitted variable bias, or the nature of residual patterns.
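
The overfitting pitfall can be made concrete with a short simulation. This sketch (synthetic data, illustrative names) adds a predictor that is pure noise to a simple regression; in-sample RSS still does not go up, which is why raw RSS cannot be used on its own to choose between models of different size.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)  # synthetic response
noise_feature = rng.normal(size=n)                 # unrelated to y by construction

def ols_rss(X, y):
    """In-sample RSS of the least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

X_small = np.column_stack([np.ones(n), x])         # intercept + x
X_big = np.column_stack([X_small, noise_feature])  # intercept + x + noise column

rss_small = ols_rss(X_small, y)
rss_big = ols_rss(X_big, y)  # never larger than rss_small for nested OLS models
```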

Special considerations and robust alternatives
– Heteroscedasticity: If residual variance changes with predictors, consider weighted least squares (WLS) or robust standard errors.
– Outliers/influential points: Use robust regression (e.g., Huber, RANSAC, M-estimators) or inspect influence measures (Cook’s distance).
– Model selection: Prefer information criteria (AIC, BIC), adjusted R^2, or cross-validation error (e.g., mean squared error on holdout) for comparing models with different parameter counts.
– Regularization: Ridge and lasso minimize RSS plus a penalty term to reduce overfitting.
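
To illustrate the regularization point, here is a minimal sketch of ridge regression using its standard closed form, beta = (XᵀX + λI)⁻¹ Xᵀy, which minimizes RSS + λ·Σβj^2. It assumes centered data with no intercept term, and the function name, synthetic data, and penalty value are choices made for this example only.

```python
import numpy as np

def ridge_coefficients(X, y, penalty):
    """Minimize RSS + penalty * sum(beta^2); assumes centered X and y, no intercept."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + penalty * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
X = X - X.mean(axis=0)  # center the predictors
y = X @ np.array([1.5, -2.0, 0.0]) + rng.normal(scale=0.3, size=40)
y = y - y.mean()        # center the response

beta_ols = ridge_coefficients(X, y, penalty=0.0)     # penalty 0 reduces to OLS
beta_ridge = ridge_coefficients(X, y, penalty=10.0)  # shrunken coefficients

rss_ols = float(np.sum((y - X @ beta_ols) ** 2))
rss_ridge = float(np.sum((y - X @ beta_ridge) ** 2))
```

The ridge fit accepts a somewhat larger in-sample RSS in exchange for smaller coefficients; whether that trade improves out-of-sample error depends on the data and on the penalty chosen.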

Practical checklist for a regression analysis using RSS
1. Plot the data and initial fit; inspect residuals vs. fitted values and QQ plots for normality.
2. Compute RSS and RSE to quantify fit.
3. Compare candidate models using adjusted R^2, AIC/BIC, or cross-validation error rather than raw RSS.
4. Check for outliers and influential points; decide whether to investigate, remove, or use robust methods.
5. Test model assumptions (linearity, independence, homoscedasticity); if violated, apply appropriate remedies (transformations, WLS, generalized linear models).
6. Validate predictive performance on withheld data or via k-fold cross-validation.
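
Step 6 can be sketched with plain NumPy. The helper below, kfold_mse, is a hypothetical name introduced for this example; it estimates out-of-sample mean squared error for polynomial fits of a chosen degree on synthetic data. Which degree scores better depends on the data and the random split, so no particular outcome is claimed.

```python
import numpy as np

def kfold_mse(x, y, degree, k=5, seed=0):
    """Mean squared prediction error over k held-out folds for a polynomial fit."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(y))
    folds = np.array_split(indices, k)
    squared_errors = []
    for held_out in folds:
        train = np.setdiff1d(indices, held_out)              # all indices not held out
        coeffs = np.polyfit(x[train], y[train], deg=degree)  # fit on training folds only
        preds = np.polyval(coeffs, x[held_out])              # predict the held-out fold
        squared_errors.append((y[held_out] - preds) ** 2)
    return float(np.concatenate(squared_errors).mean())

# Synthetic data with a linear trend plus noise.
rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 30)
y = 1.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)

cv_mse_linear = kfold_mse(x, y, degree=1)
cv_mse_cubic = kfold_mse(x, y, degree=3)
```

Comparing cv_mse_linear with cv_mse_cubic gives a like-for-like measure of predictive error, unlike in-sample RSS, which can only fall as the polynomial degree rises.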

Use in finance and applied settings
– Investors and analysts commonly fit regressions (e.g., return predictors, factor models) and use RSS-related measures to assess fit.
– Because financial data often contains heteroscedasticity, autocorrelation, and outliers, supplement RSS-based assessment with robust diagnostics and out-of-sample testing.
– In machine learning, the closely related quantity is mean squared error (MSE = RSS / n), which is commonly used as a loss function; practitioners often regularize or cross-validate to prevent overfitting.

Fast fact
Minimizing RSS is the principle behind the ordinary least squares estimator. For models that are linear in their parameters the minimizer has a closed-form solution; nonlinear least squares problems are solved numerically.

The bottom line
RSS is a fundamental, easy-to-understand measure of the unexplained variation left by a regression model. It is central to OLS estimation and to many diagnostic metrics (RSE, R^2), but it must be interpreted carefully: RSS is sensitive to scale, outliers, and model complexity. For robust model building and comparison, combine RSS-based measures with diagnostics, regularization, and validation techniques.

Source
This article is based on standard regression theory and concepts summarized in: Investopedia, “Residual Sum of Squares (RSS)” (Zoe Hansen).
