Data Scientist Tricky Interview Questions
##### What is R^2? What are some other metrics that could be better than R^2 and why?
R^2 is a goodness-of-fit measure: the variance explained by the regression divided by the total variance. Remember that adding predictors never lowers R^2 on the training data, so a higher R^2 alone does not mean a better model. Hence use adjusted R^2, which adjusts for the degrees of freedom, or error metrics computed on held-out data rather than on the training set.
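As a minimal sketch of the difference, the snippet below computes R^2 as 1 - SS_res / SS_tot (which equals explained variance over total variance for a least-squares fit with an intercept) and then applies the usual adjusted R^2 correction. The data values and the function names `r_squared` and `adjusted_r_squared` are made up for illustration.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R^2 for the number of predictors used in the fit."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Illustrative (invented) observations and fitted values.
y_true = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]
y_pred = [3.2, 4.7, 7.1, 9.3, 10.6, 13.1]

r2 = r_squared(y_true, y_pred)
# The same R^2 is worth less if it took more predictors to reach it.
adj_one = adjusted_r_squared(r2, n_samples=6, n_predictors=1)
adj_three = adjusted_r_squared(r2, n_samples=6, n_predictors=3)
print(r2, adj_one, adj_three)
```

For the same R^2, the adjusted value drops as the number of predictors grows, which is why it is a fairer way to compare models of different sizes.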
##### Is more data always better?
Statistically, it depends on the quality of your data: if your data is biased, just getting more of it won’t help. It also depends on your model: if the model suffers from high bias, getting more data won’t improve your test results beyond a point, and you’d need to add more features or otherwise change the model.
Practically, there’s also a tradeoff between having more data and the additional storage, computational power, and memory it requires. Hence, always think about the cost of having more data.
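The high-bias case can be sketched with a toy experiment: fit a straight line to data generated from a quadratic and watch the test error as the training set grows. Everything here (the quadratic target, the noise level, the sample sizes, the helper names) is an assumption chosen for illustration. For x uniform on [-1, 1], the best straight line still leaves a mean squared error of roughly 0.09, so more data cannot push the error below that floor.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def mse(xs, ys, slope, intercept):
    """Mean squared error of the line on the given points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def make_data(n, rng):
    """Quadratic target with a little Gaussian noise (an assumed setup)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [x * x + rng.gauss(0.0, 0.05) for x in xs]
    return xs, ys


rng = random.Random(0)
x_test, y_test = make_data(2000, rng)

test_mse = {}
for n in (100, 1000, 10000):
    x_train, y_train = make_data(n, rng)
    slope, intercept = fit_line(x_train, y_train)
    test_mse[n] = mse(x_test, y_test, slope, intercept)
    print(n, round(test_mse[n], 4))
```

The test error barely moves between 100 and 10,000 training points, because the limitation is the model's shape and not the amount of data.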
##### You have several variables that are positively correlated with your response, and you think combining all of the variables could give you a good prediction of your response. However, you see that in the multiple linear regression, one of the weights on the predictors is negative. What could be the issue?
The likely issue is multicollinearity, a situation in which two or more explanatory variables in a multiple regression model are highly linearly related. When predictors overlap this much, the individual weights become hard to pin down, and one of them can come out negative even though that variable is positively correlated with the response on its own.
Possible remedies:
- Leave the model as is, despite multicollinearity. The presence of multicollinearity doesn’t affect how well the fitted model predicts on new data, provided that the predictor variables follow the same pattern of multicollinearity in the new data as in the data on which the regression model is based.
- Use principal component regression.
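A small constructed example can show how a predictor that correlates positively with the response ends up with a negative weight. In this sketch `x2` is nearly a copy of `x1`, and the response is deliberately built as `2 * x1 - x2`; the numbers and helper names are invented for illustration, and the two-predictor least-squares solution is written out by hand from the normal equations.

```python
def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def corr(a, b):
    """Pearson correlation."""
    return cov(a, b) / (cov(a, a) * cov(b, b)) ** 0.5


def ols_two_predictors(x1, x2, y):
    """Least squares for y = b0 + b1*x1 + b2*x2 via the 2x2 normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2  # close to zero when x1 and x2 are collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return b0, b1, b2


# x2 is x1 plus a small wiggle, so the two predictors are highly correlated.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
wiggle = [0.3, -0.2, 0.1, -0.3, 0.2, -0.1, 0.3, -0.3]
x2 = [a + w for a, w in zip(x1, wiggle)]
y = [2.0 * a - b for a, b in zip(x1, x2)]

b0, b1, b2 = ols_two_predictors(x1, x2, y)
print("corr(x1, y) =", corr(x1, y))
print("corr(x2, y) =", corr(x2, y))
print("weights:", b1, b2)
```

Both predictors correlate positively with `y` on their own, yet the joint fit assigns `x2` a negative weight, because once `x1` is in the model only the small difference between the two predictors is left for `x2` to explain. With noisy real data the same near-collinearity also makes these weights unstable from sample to sample.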
##### What are the tests which are performed on data sets?
There are many tests; a few common ones are:
- A/B test
- Student’s t-test
- Chi-square test
- Fisher’s exact test
- Mann-Whitney test
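To make one of these concrete, here is a minimal sketch of a two-sided Fisher’s exact test on a 2x2 table, written with only the standard library. It sums the hypergeometric probabilities of every table that is no more likely than the observed one, which is one common definition of the two-sided p-value. The counts and the function name `fisher_exact_two_sided` are invented for illustration; in practice a statistics library would normally be used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of seeing k in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Add up every table at least as extreme as the observed one
    # (a small tolerance guards against floating-point ties).
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )


# Invented counts: variant A had 3 successes and 1 failure,
# variant B had 1 success and 3 failures.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```

With samples this small the p-value is large, so the apparent difference between the two variants is not statistically convincing. This is the kind of situation where an exact test is preferred over a chi-square approximation.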