\r\n| It requires labeled training data.<\/td>\r\n | It does not require labeled data.<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n\r\n \t- What is Bias?<\/strong><\/li>\r\n<\/ol>\r\n
Bias is the error introduced into a model by the simplifying assumptions of the machine learning algorithm, and it can lead to underfitting. During training, a high-bias model makes simplified assumptions so that the target function is easier to learn.<\/p>\r\n\r\n \r\n \t- What are the low-bias and high-bias Machine Learning algorithms?<\/strong><\/li>\r\n<\/ol>\r\n
\r\n \t- Low bias Machine learning algorithms include Decision Trees, SVM, and k-NN.<\/li>\r\n \t
- High bias Machine learning algorithms include Linear Regression and Logistic Regression. These are the Common Data Science Interview Questions that are asked to a fresher in an Interview.<\/li>\r\n<\/ul>\r\n
\r\n \t- What are the different kernel functions in SVM?<\/strong><\/li>\r\n<\/ol>\r\n
There are four types of kernels in SVM.<\/p>\r\n\r\n \r\n \t- Linear Kernel<\/li>\r\n \t
- Sigmoid kernel<\/li>\r\n \t
- Polynomial kernel<\/li>\r\n \t
- Radial basis kernel<\/li>\r\n<\/ul>\r\n
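The four kernels listed above can be sketched in plain Python. This is a minimal illustration, not a library implementation: the parameter names `gamma`, `coef0`, and `degree` and their default values are assumptions chosen for the example.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    # K(x, y) = x . y
    return dot(x, y)


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    # K(x, y) = (gamma * x . y + coef0) ** degree
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    # K(x, y) = tanh(gamma * x . y + coef0)
    return math.tanh(gamma * dot(x, y) + coef0)
```

Each function takes two feature vectors and returns a single similarity score; an SVM uses such scores in place of explicit high-dimensional feature maps.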
\r\n \t- What are Recurrent Neural Networks (RNNs)?<\/strong><\/li>\r\n<\/ol>\r\n
Recurrent nets are a type of artificial neural network designed to recognize patterns in sequences of data, such as time series, stock market data, and data from government agencies. To understand recurrent nets, you first need to understand the basics of feed-forward nets. Both RNNs and feed-forward networks are named after the way they channel information through a series of mathematical operations performed at the nodes of the network.<\/p>\r\n\r\n \r\n \t- What is ‘Naive’ in a Naive Bayes?<\/strong><\/li>\r\n<\/ol>\r\n
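The Naive Bayes algorithm is based on Bayes’ theorem.<\/p>\r\n Bayes’ theorem describes the probability of an event based on prior knowledge of conditions that may be related to the event. The algorithm is called ‘naive’ because it assumes that the features are conditionally independent of one another given the class, which rarely holds exactly in real data.<\/p>\r\n\r\n \r\n \t- What is Boosting?<\/strong><\/li>\r\n<\/ol>\r\n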
Boosting is an iterative technique that adjusts the weight of each observation based on the previous classification. If an observation was classified incorrectly, boosting increases its weight. In general, boosting reduces bias error and builds strong predictive models.<\/p>\r\n\r\n \r\n \t- What is Bagging?<\/strong><\/li>\r\n<\/ol>\r\n
Bagging trains similar learners on small samples of the population and then takes the mean of all the predictions. In generalized bagging, different learners can be used on different populations. This is expected to reduce the variance error.<\/strong><\/p>\r\n\r\n\r\n \t- List the classification algorithms.<\/strong><\/li>\r\n<\/ol>\r\n
The classification algorithms are as follows:<\/p>\r\n\r\n \r\n \t- SVM<\/li>\r\n \t
- Linear classifiers<\/li>\r\n \t
- Quadratic classifiers<\/li>\r\n \t
- Decision Trees<\/li>\r\n \t
- Neural Networks<\/li>\r\n \t
- Kernel Estimation<\/li>\r\n<\/ul>\r\n
\r\n \t- What is a Linear Regression?<\/strong><\/li>\r\n<\/ol>\r\n
Linear Regression is a statistical technique in which the score of a variable Y is predicted from the score of a second variable X.<\/p>\r\n Generally, X is known as the predictor variable and Y is referred to as the criterion variable.<\/p>\r\n\r\n \r\n \t- How are outlier values treated?<\/strong><\/li>\r\n<\/ol>\r\n
Outliers are identified using univariate or other graphical analysis methods. If there are only a few outliers, they can be assessed individually; if there are many, the values can be substituted with the 99th or the 1st percentile values.<\/p>\r\n\r\n \r\n \t- What are the steps involved in an Analytics project?<\/strong><\/li>\r\n<\/ol>\r\n
\r\n \t- First, understanding the Business problem.<\/li>\r\n \t
- Exploring the data and being familiar with it.<\/li>\r\n \t
- Preparing the data for modeling through detecting outliers, transforming variables, and treating missing values.<\/li>\r\n \t
- After the data preparation, we should start running the model, analyze the results and tweak the approach.<\/li>\r\n \t
- Validating the model using the new data set.<\/li>\r\n \t
- Implementing the model and tracking the results to analyze the model’s performance over time.<\/li>\r\n<\/ul>\r\n
\r\n \t- What are the common ways to treat the outlier value?<\/strong><\/li>\r\n<\/ol>\r\n
Not all extreme values are outlier values. The common ways to treat the outlier values are<\/p>\r\n\r\n \r\n \t- Changing the value and bringing it within a range.<\/li>\r\n \t
- Or by just removing the values.<\/li>\r\n<\/ul>\r\n
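Both treatments can be sketched in a few lines of Python. The example below is a minimal illustration that assumes the 1st and 99th percentiles as the cut-offs and uses linear interpolation between ranks; the function names are hypothetical.

```python
def percentile(values, p):
    """Return the p-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def cap_outliers(values, low_pct=1, high_pct=99):
    """Bring extreme values within the [low_pct, high_pct] percentile range."""
    low, high = percentile(values, low_pct), percentile(values, high_pct)
    return [min(max(v, low), high) for v in values]


def remove_outliers(values, low_pct=1, high_pct=99):
    """Drop values that fall outside the [low_pct, high_pct] percentile range."""
    low, high = percentile(values, low_pct), percentile(values, high_pct)
    return [v for v in values if low <= v <= high]
```

Capping keeps the number of observations unchanged, while removal shrinks the data set; which one is appropriate depends on whether the extreme values are errors or genuine observations.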
\r\n \t- Give some examples where the collaborative filtering concepts are used?<\/strong><\/li>\r\n<\/ol>\r\n
Collaborative filtering is used to recommend movies on Netflix, BookMyShow, and IMDb, videos on YouTube, and products on e-commerce sites such as Amazon, Flipkart, and eBay. It is also used for game recommendations on Xbox.<\/p>\r\n\r\n \r\n \t- When working on a data set, how do you choose the important variables?<\/strong><\/li>\r\n<\/ol>\r\n
The important variables can be selected using the following methods:<\/p>\r\n\r\n \r\n \t- Removing the correlated variables before selecting the important variables.<\/li>\r\n \t
- Using the linear regression and selecting the variables based on p values.<\/li>\r\n \t
- By using Forward Selection, Backward Selection, or Stepwise Selection.<\/li>\r\n \t
- Using Random Forest or XGBoost and plotting a variable importance chart.<\/li>\r\n \t
- Using the Lasso Regression.<\/li>\r\n \t
- Measuring the information gain for the available set of features and choosing the top features accordingly.<\/li>\r\n<\/ul>\r\n
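The last method in the list, ranking features by information gain, can be sketched as follows. This is a minimal illustration for categorical features and labels; the function names and the dictionary-of-columns input format are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on one categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels that remain after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(features, labels):
    """Return (feature name, information gain) pairs, highest gain first."""
    gains = {name: information_gain(column, labels) for name, column in features.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)
```

A feature that separates the classes perfectly has a gain equal to the entropy of the labels, while a feature that carries no information about the class has a gain of zero.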
\r\n \t- What is the time interval in Selection bias?<\/strong><\/li>\r\n<\/ol>\r\n
A trial may be terminated early at an extreme value (usually for ethical reasons), but the extreme value is likely to be reached by the variable with the largest variance, even if all the variables have a similar mean.<\/p>\r\n\r\n \r\n \t- Differentiate between covariance and correlation.<\/strong><\/li>\r\n<\/ol>\r\n
\r\n\r\n\r\nCovariance<\/strong><\/td>\r\nCorrelation<\/strong><\/td>\r\n<\/tr>\r\n\r\n| Covariances are difficult to compare. For example, the covariances of salary ($) and age (years) are measured on unequal scales, so they can’t be compared directly.<\/td>\r\n | Correlation is the standardized form of covariance. Its value always lies between -1 and 1, irrespective of the scales of the variables, so correlations can be compared.<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n\r\n \t- Can we capture the correlation between continuous and categorical variables? If yes, explain.<\/strong><\/li>\r\n<\/ol>\r\n
Yes, we can use the ANCOVA (analysis of covariance) technique to capture the association between continuous and categorical variables.<\/p>\r\n\r\n \r\n \t- Explain the Data Type in the Selection Bias.<\/strong><\/li>\r\n<\/ol>\r\n
It occurs when specific subsets of data are chosen to support a conclusion, or when bad data is rejected on arbitrary grounds, instead of according to previously stated or generally agreed-on criteria.<\/p>\r\n\r\n \r\n \t- Are the True Positive Rate and Recall related? Write the equation.<\/strong><\/li>\r\n<\/ol>\r\n
Yes, they are the same quantity: True Positive Rate = Recall. The formula is TP \/ (TP + FN).<\/p>\r\n\r\n \r\n \t- How will you deal with the missing values in a given set of data?<\/strong><\/li>\r\n<\/ol>\r\n
We can assign a unique category to the missing values, since the missing values themselves may reveal a trend, or we can remove them outright. We can also check their distribution against the target variable: where a pattern is found, the missing values are assigned to a new category, and the others are removed.<\/p>\r\n\r\n \r\n \t- Which cross-validation technique would you use on a time series data set, k-fold or LOOCV?<\/strong><\/li>\r\n<\/ol>\r\n
We won’t use either of these.<\/p>\r\n\r\n \r\n \t- Why won’t you use k-fold or LOOCV?<\/strong><\/li>\r\n<\/ol>\r\n
On a time series problem, k-fold can be troublesome because there may be a pattern in year 4 or 5 that is not present in year 3. Resampling the data set would separate these trends, and we may end up validating on past years, which is incorrect. Instead, we can use a forward chaining strategy with 5 folds. For example, fold 1 trains on year 1 and tests on year 2, fold 2 trains on years 1 and 2 and tests on year 3, and so on.<\/p>\r\n\r\n \r\n \t- Explain the Central Limit Theorem and why it is essential.<\/strong><\/li>\r\n<\/ol>\r\n
The Central Limit Theorem states that, given a sufficiently large sample size drawn from a population with a finite variance, the distribution of the sample means is approximately normal, and the mean of the sample means is approximately equal to the mean of the total population. It is essential because it justifies confidence intervals and hypothesis tests even when the population itself is not normally distributed.<\/p>\r\n\r\n \r\n \t- What are the types of Sampling methods?<\/strong><\/li>\r\n<\/ol>\r\n
The types of Sampling are as follows<\/p>\r\n\r\n \r\n \t- Cluster sampling<\/li>\r\n \t
- Stratified sampling<\/li>\r\n \t
- Multistage sampling<\/li>\r\n \t
- Systematic sampling<\/li>\r\n \t
- Simple random sampling method<\/li>\r\n<\/ul>\r\n
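Four of the sampling methods above can be sketched with Python's standard `random` module. This is a minimal illustration under simple assumptions: the population is a list, strata are defined by a caller-supplied `key` function, and clusters are given as a dictionary of lists. The function names are hypothetical.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k items so that every item is equally likely to be chosen."""
    return random.Random(seed).sample(population, k)


def systematic_sample(population, step, start=0):
    """Take every step-th item, beginning at index start."""
    return population[start::step]


def stratified_sample(population, key, fraction, seed=None):
    """Sample the same fraction from each stratum defined by key(item)."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one item per stratum
        sample.extend(rng.sample(members, k))
    return sample


def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random and keep every item inside them."""
    chosen = random.Random(seed).sample(sorted(clusters), n_clusters)
    return [item for name in chosen for item in clusters[name]]
```

Multistage sampling can be built by composing these functions, for example by choosing clusters first and then drawing a simple random sample within each chosen cluster.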
\r\n \t- What are the algorithms that are used in Supervised Learning?<\/strong><\/li>\r\n<\/ol>\r\n
\r\n \t- Regression<\/li>\r\n \t
- Naive Bayes<\/li>\r\n \t
- Decision Trees<\/li>\r\n \t
- Neural Networks<\/li>\r\n \t
- Support Vector Machines<\/li>\r\n \t
- K-nearest Neighbor Algorithm<\/li>\r\n<\/ul>\r\n
\r\n \t- What is the ROC curve?<\/strong><\/li>\r\n<\/ol>\r\n
ROC stands for Receiver Operating Characteristic. It is a plot of the true positive rate against the false positive rate. It helps in finding the right trade-off between the true positive and false positive rates at various probability thresholds of the predicted values. The closer the curve is to the upper left corner, the better the model. Put simply, the curve with the larger area under it belongs to the better model.<\/p>\r\n\r\n \r\n \t- What algorithms are used in Unsupervised learning?<\/strong><\/li>\r\n<\/ol>\r\n
\r\n \t- Clustering<\/li>\r\n \t
- Neural Networks<\/li>\r\n \t
- Anomaly Detection<\/li>\r\n \t
- Latent Variable Models<\/li>\r\n<\/ul>\r\n
\r\n \t- What are the various types of sorting algorithms that are available in the R language for Data Science?<\/strong><\/li>\r\n<\/ol>\r\n
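Three classic sorting algorithms that can be implemented in R are listed below. Note that R’s built-in sort() function offers its own methods, such as radix, quick, and shell.<\/p>\r\n\r\n \r\n \t- Bubble<\/li>\r\n \t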
- Insertion<\/li>\r\n \t
- Selection Sorting<\/li>\r\n<\/ul>\r\n
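The three algorithms can be sketched as follows. The example is written in Python rather than R, purely for illustration, and each function returns a new sorted list instead of sorting in place; these are choices made for the example, not part of the original answer.

```python
def bubble_sort(items):
    """Repeatedly swap adjacent out-of-order pairs until no swap is needed."""
    data = list(items)
    for end in range(len(data) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                swapped = True
        if not swapped:  # already sorted, stop early
            break
    return data


def insertion_sort(items):
    """Grow a sorted prefix by inserting each next item into its place."""
    data = list(items)
    for i in range(1, len(data)):
        current = data[i]
        j = i - 1
        while j >= 0 and data[j] > current:
            data[j + 1] = data[j]  # shift larger items one slot right
            j -= 1
        data[j + 1] = current
    return data


def selection_sort(items):
    """Repeatedly move the smallest remaining item to the front."""
    data = list(items)
    for i in range(len(data) - 1):
        smallest = min(range(i, len(data)), key=data.__getitem__)
        data[i], data[smallest] = data[smallest], data[i]
    return data
```

All three run in quadratic time in the worst case, which is why they are mainly used for teaching and for very small inputs.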
\r\n \t- Write the command that is used for storing R objects in a file?<\/strong><\/li>\r\n<\/ol>\r\n
save(x, file="x.Rdata")<\/p>\r\n\r\n \r\n \t | | | |