Data Scientist Tricky Interview Questions
R^2 is a goodness-of-fit measure: the variance explained by the regression divided by the total variance. Remember that R^2 never decreases as you add predictors, even irrelevant ones. Hence use adjusted R^2, which adjusts for the degrees of freedom, or compare error metrics on held-out data rather than on the training set.
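As a minimal sketch of that adjustment, assuming the usual formula 1 - (1 - R^2)(n - 1)/(n - p - 1) with n samples and p predictors (the function names are illustrative, not taken from any library):

```python
def r_squared(y_true, y_pred):
    """Share of the total variance explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R^2 for the degrees of freedom used by the predictors."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Same raw R^2, but the model with more predictors scores lower once adjusted.
few = adjusted_r_squared(0.9, n_samples=11, n_predictors=2)
many = adjusted_r_squared(0.9, n_samples=11, n_predictors=5)
```

With 11 samples and a raw R^2 of 0.9, the adjusted value falls from 0.875 with two predictors to 0.8 with five, so extra predictors have to earn their place.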
Statistically, it depends on the quality of your data. For example, if your data is biased, just getting more data won’t help. It also depends on your model: if it suffers from high bias, getting more data won’t improve your test results beyond a point, and you’d need to add more features or otherwise make the model more flexible.
Practically, there is also a tradeoff between having more data and the additional storage, computational power, and memory it requires. Hence, always think about the cost of having more data.
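The high-bias point can be illustrated with a toy experiment. The sketch below is an assumption-laden example, not a benchmark: it fits a straight line to noise-free data from y = x^2 on an evenly spaced grid in [-1, 1] and measures the error on a dense held-out grid. All names (fit_line, grid, heldout_mse) are hypothetical helpers.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def grid(n):
    """n evenly spaced points covering [-1, 1]."""
    return [-1 + 2 * i / (n - 1) for i in range(n)]


def heldout_mse(n_train, n_test=2001):
    """Train a line on n_train points of y = x^2, score it on a dense grid."""
    x_train = grid(n_train)
    y_train = [x * x for x in x_train]
    intercept, slope = fit_line(x_train, y_train)
    x_test = grid(n_test)
    return sum((intercept + slope * x - x * x) ** 2 for x in x_test) / n_test


# Held-out error for 10, 100 and 1000 training points.
errors = {n: heldout_mse(n) for n in (10, 100, 1000)}
```

Going from 100 to 1000 training points barely moves the held-out error, because a straight line cannot represent the curvature no matter how much data it sees. Adding x^2 as a feature would remove the error entirely.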
Multicollinearity refers to a situation in which two or more explanatory variables in a multiple regression model are highly linearly related.
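One common way to quantify this is the variance inflation factor (VIF), 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The sketch below covers only the two-predictor case, where R_j^2 reduces to the squared Pearson correlation. The data and function names are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(xs, ys):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r * r)  # undefined (division by zero) if |r| == 1


# Hypothetical predictors: x2 is almost exactly twice x1.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 11]
vif = vif_two_predictors(x1, x2)
```

Here the VIF comes out at about 122. Values above roughly 5 to 10 are often treated as a warning sign, though that threshold is a rule of thumb rather than a strict cutoff.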
Leave the model as is, despite multicollinearity. The presence of multicollinearity doesn’t affect the efficiency of extrapolating the fitted model to new data provided that the predictor variables follow the same pattern of multicollinearity in the new data as in the data on which the regression model is based.
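Use principal component regression, which regresses the response on the principal components of the predictors; these components are uncorrelated by construction.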
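A minimal sketch of principal component regression, restricted to two predictors so it needs no linear algebra library: for two standardized predictors with correlation r, the leading principal direction is (1, sign(r)) / sqrt(2). The function names and data are hypothetical, and a real analysis would normally use a numerical library and choose the number of components by validation.

```python
import math


def standardize(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def first_component_scores(x1, x2):
    """Project two standardized predictors onto their leading principal direction."""
    z1, z2 = standardize(x1), standardize(x2)
    r = sum(a * b for a, b in zip(z1, z2)) / (len(z1) - 1)
    sign = 1.0 if r >= 0 else -1.0
    return [(a + sign * b) / math.sqrt(2) for a, b in zip(z1, z2)]


def fit_pcr_one_component(x1, x2, y):
    """Regress y on the first principal component only."""
    scores = first_component_scores(x1, x2)
    intercept = sum(y) / len(y)  # the scores have mean zero
    slope = sum(s * (v - intercept) for s, v in zip(scores, y)) / sum(
        s * s for s in scores
    )
    return intercept, slope, scores


# Perfectly collinear predictors: ordinary least squares on (x1, x2) has no
# unique solution here, but the one-component regression is well defined.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 10]
y = [1, 2, 3, 4, 5]
intercept, slope, scores = fit_pcr_one_component(x1, x2, y)
fitted = [intercept + slope * s for s in scores]
```

Because the two predictors carry the same information, a single component is enough to reproduce y in this example.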
There are many tests; a few are:
-A/B Test
-Student’s t-Test
-Chi-Square Test
-Fisher’s Exact Test
-Mann-Whitney Test
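As an illustration of one test from the list, here is a minimal sketch of a two-sided Fisher’s exact test for a 2x2 table, written with the standard library only. The counts are hypothetical, and the two-sided rule used (sum the probabilities of all tables no more likely than the observed one) is one common convention, assumed here rather than taken from the text above.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so float ties count as "equally likely".
    total = sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


# Hypothetical counts: 3 of 4 successes in one group, 1 of 4 in the other.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

For this table the p-value is 34/70, about 0.486, so with samples this small the difference between the groups is not statistically significant at the usual levels.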