Brushing up the Basics: Linear Regression

Nivethitha Somu
5 min read · Oct 17, 2022


In this "Brushing up the Basics" series, let us explore possible interview questions for each machine learning and deep learning model.


1. Explain linear regression and its assumptions.

Linear regression is a supervised machine learning algorithm that finds the best-fit line (typically a straight line) to establish a linear relationship between the independent variables/predictors and the dependent variable/target (continuous values). The fit is usually obtained by minimizing the sum of squared residuals, i.e., the ordinary least squares (OLS) method. The major assumptions of linear regression are listed below (a short fitting sketch follows the list):

· Linearity: There exists a linear relationship between the independent variables and the dependent variable.

· Additivity: The effect of a change in one independent variable on the target variable does not depend on the values of the other independent variables.

· No multicollinearity: The independent variables are not strongly correlated with each other.

· Homoscedasticity: The error term has constant variance.

· Normality: The error follows a normal distribution.

· Independence: The observations are independent of each other.
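To make this concrete, here is a minimal sketch of an OLS fit, assuming scikit-learn and synthetic data (both are illustration choices, not part of the question):

```python
# A minimal sketch: fit an OLS line on synthetic data and inspect residuals.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))                    # one predictor
y = 2.0 + 3.0 * X.ravel() + rng.normal(0, 1, size=100)   # linear signal + noise

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)     # recovered intercept and slope

# If homoscedasticity and normality hold, the residuals behave like
# zero-mean noise with constant variance.
residuals = y - model.predict(X)
print(residuals.mean(), residuals.std())
```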

2. How do you calculate the error in linear regression?

· Measure the difference between the actual and predicted target values for each value of x.

· Square each difference and calculate the mean of the squared differences.

Mean squared error: MSE = (1/n) · Σᵢ (yᵢ − ŷᵢ)², where yᵢ is the actual value and ŷᵢ the predicted value.

The smaller the MSE, the closer we are to finding the best-fit line.

Linear regression model: y = a + bx (simple) or y = b₀ + b₁x₁ + … + bₚxₚ (multiple).
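A minimal sketch of this computation (the numbers are made up for illustration), done both by hand with NumPy and with scikit-learn's mean_squared_error:

```python
# A minimal sketch of the MSE computation described above.
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 5.0, 7.5, 10.0])   # actual target values
y_pred = np.array([2.8, 5.3, 7.0, 10.4])   # model predictions

mse_manual = np.mean((y_true - y_pred) ** 2)      # square each difference, then average
mse_sklearn = mean_squared_error(y_true, y_pred)
print(mse_manual, mse_sklearn)                    # both print the same value
```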

3. List some evaluation metrics for regression models.

i. Mean squared error:

· Mean of the squared differences between the actual and predicted values.

· Differentiable function, so it is easy to perform mathematical operations on it.

· Penalizes outliers more heavily, since the errors are squared before the loss is minimized.

· Easy to apply optimizers.

ii. Root mean squared error:

· Square root of mean squared error.

· Not robust to outliers.

· Differentiable function.

· Easy to apply optimizers to minimize loss.

iii. Mean absolute error:

· Mean of the absolute differences between the actual and predicted values.

· Robust to outliers.

· Not differentiable at zero, which makes optimization harder.

iv. R-squared:

· Coefficient of determination.

· Measures how close the data are to the fitted regression line.

· Ranges from 0 (poor prediction) to 1 (perfect prediction).

· Increases with the addition of new features, irrespective of their usefulness.

v. Adjusted R-squared:

· A modified version of R-squared that has been adjusted for the number of predictors in the model.

· Its value increases or decreases based on the usefulness of the newly added feature (see the sketch below).
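Here is a minimal sketch computing all five metrics for a toy prediction (the arrays and the assumed number of predictors p are made up; scikit-learn has no built-in adjusted R-squared, so it is derived from r2_score):

```python
# A minimal sketch computing the five regression metrics listed above.
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = np.array([3.0, 5.0, 7.5, 10.0, 12.0])
y_pred = np.array([2.8, 5.3, 7.0, 10.4, 11.5])
n, p = len(y_true), 2       # n samples; p predictors (assumed for illustration)

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)   # adjusts R2 for predictor count

print(f"MSE={mse:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}  R2={r2:.3f}  adj.R2={adj_r2:.3f}")
```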

4. In what scenario would you prefer Gradient Descent over Ordinary Least Squares regression, and why?

Solving OLS in closed form requires computing and inverting the XᵀX matrix, which is computationally expensive when the number of features and observations is large. Therefore, for complex problems involving larger datasets, gradient descent is the preferred optimization algorithm.

5. How does gradient descent work in linear regression?

· Start by initializing random values for all the coefficients.

· Compute the sum of squared errors over all pairs of input and output values (the loss function).

· In each iteration, update the coefficients in the direction that reduces the error, using the learning rate as a scaling factor for the update.

· Iterate until a minimal sum of squared errors is achieved or no further improvement is possible (see the sketch below).

Gradient descent update rule: θ ← θ − α · ∇J(θ), where α is the learning rate and J(θ) is the sum of squared errors.
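A minimal sketch of these steps for simple linear regression (synthetic data; the learning rate and iteration count are illustrative choices, not tuned values):

```python
# A minimal sketch of batch gradient descent for simple linear regression.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 + 3.0 * x + rng.normal(0, 1, 100)   # true intercept 2.0, slope 3.0

a, b = 0.0, 0.0   # initial coefficients
lr = 0.01         # learning rate: scaling factor for each update

for _ in range(2000):
    error = (a + b * x) - y           # prediction error per sample
    grad_a = 2 * error.mean()         # d(MSE)/da
    grad_b = 2 * (error * x).mean()   # d(MSE)/db
    a -= lr * grad_a                  # step in the direction that reduces error
    b -= lr * grad_b

print(a, b)   # should approach the true intercept and slope
```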

6. Is linear regression suitable for time series analysis?

Linear regression can be applied to time series data and yields workable results, but it does not provide remarkable performance because:

· Time series data exhibits seasonality (peak hours, festive seasons, etc.) which might be treated as outliers in linear regression.

· Time series data are mainly used for future prediction, which requires extrapolation beyond the observed range, where linear regression rarely produces good predictions.

7. What are the limitations of linear regression?

· Due to the assumption of linearity, it fails to fit complex, non-linear problems

· Sensitive to outliers

· Affected by multicollinearity

8. What is the use of regularization?

Regularization is a technique used to handle overfitting by balancing bias and variance: it adds a penalty that grows as the complexity of the model increases. The regularization parameter lambda, a hyperparameter, penalizes all the parameters except the intercept so that the model generalizes well and does not overfit. The most common regularization techniques are:

L1 or Least Absolute Shrinkage and Selection Operator (LASSO) regression: Adds the absolute value of the magnitude of the coefficients as a penalty term to the cost function. Along with shrinking the coefficients, LASSO performs feature selection, as the less important coefficients become exactly zero.

L2 or Ridge regression: Adds the squared magnitude of the coefficients as a penalty term to the cost function. It forces the weights to be small but not exactly zero. L2 is not robust to outliers, because squaring amplifies large differences, and it performs best when the weights are of roughly equal size. Moreover, L2 can learn complex data patterns.

L1 (LASSO) cost function: Σ (yᵢ − ŷᵢ)² + λ Σ |bⱼ|
L2 (Ridge) cost function: Σ (yᵢ − ŷᵢ)² + λ Σ bⱼ²
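A minimal sketch contrasting the two penalties with scikit-learn on synthetic data (note that scikit-learn names the regularization parameter alpha rather than lambda; the value 0.5 is an arbitrary choice):

```python
# A minimal sketch comparing Lasso (L1) and Ridge (L2) regularization.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
# Only the first two features matter; the other three are pure noise.
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, 200)

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

print("Lasso:", lasso.coef_)   # irrelevant coefficients become exactly zero
print("Ridge:", ridge.coef_)   # all coefficients shrink but stay non-zero
```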

9. How do you know that linear regression is suitable for any given data?

Use a scatter plot for simple linear regression. For multivariate linear regression, use 2D pairwise scatter plots, rotating plots, and dynamic graphs. If the relationships look linear, opt for a linear model; otherwise, apply transformations to make them linear.
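For illustration, a minimal sketch of the pairwise scatter plot check, assuming pandas and matplotlib and synthetic data with one linear and one non-linear feature:

```python
# A minimal sketch: eyeball linearity with a pairwise scatter plot.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "x1": rng.uniform(0, 10, 100),
    "x2": rng.uniform(0, 10, 100),
})
df["y"] = 2 * df["x1"] + 0.5 * df["x2"] ** 2 + rng.normal(0, 1, 100)

scatter_matrix(df, figsize=(6, 6))   # y vs x1: noisy linear trend; y vs x2: clear curve
plt.show()
# A curved panel suggests transforming that feature (e.g. x2 -> x2**2) before fitting.
```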

10. If you have only one independent variable, how many coefficients will you need to estimate in a simple linear regression model?

Two: a simple linear regression with one independent variable (y = a + bx) requires two coefficients, the intercept a and the slope b.
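A minimal sketch confirming this with NumPy (made-up data; np.polyfit with degree 1 estimates exactly two coefficients, the slope and the intercept):

```python
# A minimal sketch: one independent variable -> two estimated coefficients.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])   # roughly y = 1 + 2x

b, a = np.polyfit(x, y, deg=1)   # degree-1 fit returns [slope, intercept]
print(a, b)                      # two coefficients: intercept a and slope b
```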

These are some basic interview questions on linear regression. The rest of the topics, related to bias, variance, overfitting, datasets, etc., have already been discussed in previous posts. For your reference:

  1. AI Buzzwords: All you need to know! Part 1
  2. AI Buzzwords: All you need to know! Part 2
  3. AI Buzzwords: All you need to know! Part 3


Written by Nivethitha Somu

AI Researcher || Energy analytics || Demand response || Service recommendation system || Anomaly detection