Week 1
Question 1: Supervised learning deals with unlabeled data, while unsupervised learning deals with labelled data.
- True
- False
Question 2: Which of the following is not true about Machine Learning?
- Machine Learning was inspired by the learning process of human beings.
- Machine Learning models iteratively learn from data, and allow computers to find hidden insights.
- Machine Learning models help us in tasks such as object recognition, summarization, and recommendation.
- Machine learning gives computers the ability to make decisions by writing down rules and methods and being explicitly programmed.
Question 3: Which of the following groups are not Machine Learning techniques?
- Classification and Clustering
- NumPy, SciPy and scikit-learn
- Anomaly Detection and Recommendation Systems
Question 4: The “Regression” technique in Machine Learning is a group of algorithms that are used for:
- Predicting a continuous value; for example predicting the price of a house based on its characteristics.
- Predicting the class/category of a case; for example, whether a cell is benign or malignant, or whether a customer will churn or not.
- Finding items/events that often co-occur; for example grocery items that are usually bought together by a customer.
Question 5: When comparing Supervised with Unsupervised learning, is this sentence True or False?
In contrast to Supervised learning, Unsupervised learning has more models and more evaluation methods that can be used in order to ensure the outcome of the model is accurate.
- False
- True
Week 2
Question 1: Multiple Linear Regression is appropriate for:
- Predicting the sales amount based on month
- Predicting whether a drug is effective for a patient based on her characteristics
- Predicting tomorrow’s rainfall amount based on the wind speed and temperature
Question 2: Which of the following is the meaning of “Out of Sample Accuracy” in the context of evaluation of models?
- “Out of Sample Accuracy” is the percentage of correct predictions that the model makes on data that the model has NOT been trained on.
- “Out of Sample Accuracy” is the accuracy of an overly trained model (which may have captured noise and produced a non-generalized model)
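As a rough illustration of the difference between in-sample and out-of-sample accuracy, the sketch below fits a toy threshold classifier on a training set and then scores it on held-out points. The data, the `fit_threshold` classifier, and all function names are hypothetical and chosen only for this example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def fit_threshold(xs, ys):
    """Toy classifier (an assumption for this sketch): place the decision
    threshold halfway between the mean of class 0 and the mean of class 1."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2


def predict(threshold, xs):
    """Label a point 1 if it lies above the threshold, otherwise 0."""
    return [1 if x > threshold else 0 for x in xs]


# Hypothetical toy data: the model only ever sees the training split.
x_train = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
y_train = [0, 0, 0, 1, 1, 1]
x_test = [2.5, 5.5, 6.0, 8.5]
y_test = [0, 0, 1, 1]

threshold = fit_threshold(x_train, y_train)

# Accuracy on data the model was trained on.
in_sample = accuracy(y_train, predict(threshold, x_train))
# Accuracy on data the model was NOT trained on.
out_of_sample = accuracy(y_test, predict(threshold, x_test))

print(in_sample, out_of_sample)
```

In this toy setup the held-out score is lower than the training score, which is the usual reason for evaluating on data kept out of training.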
Question 3: When should we use Multiple Linear Regression?
- When we would like to predict impacts of changes in independent variables on a dependent variable.
- When there are multiple dependent variables
- When we would like to identify the strength of the effect that the independent variables have on a dependent variable.
Question 4: Which of the following statements are TRUE about Polynomial Regression?
- Polynomial regression can use the same mechanism as Multiple Linear Regression to find the parameters.
- Polynomial regression fits a curved line to your data.
- Polynomial regression models can fit using the Least Squares method.
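One way to see how a polynomial fit relates to multiple linear regression is to treat each power of x as a separate feature and solve the ordinary least squares problem. The following is a minimal sketch in plain Python under that assumption; the helper names and the toy data are hypothetical, and real projects would more likely rely on a numerical library.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Each power of x (x**0, x**1, ..., x**degree) is treated as its own
    feature, so the problem has the same form as multiple linear regression.
    """
    features = [[x ** p for p in range(degree + 1)] for x in xs]
    k = degree + 1
    xtx = [[sum(f[i] * f[j] for f in features) for j in range(k)] for i in range(k)]
    xty = [sum(f[i] * y for f, y in zip(features, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from y = 1 + 2x + 3x^2 (no noise).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1 + 2 * x + 3 * x * x for x in xs]

coeffs = fit_polynomial(xs, ys, degree=2)  # [intercept, x coefficient, x^2 coefficient]
print(coeffs)
```

The model is curved in x but linear in its coefficients, which is why the linear least-squares machinery still applies.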
Question 5: Which sentence is NOT TRUE about Non-linear Regression?
- Non-linear regression is a method to model a non-linear relationship between the dependent variable and a set of independent variables.
- For a model to be considered non-linear, y must be a non-linear function of the parameters.
- Non-linear regression must have more than one dependent variable.
Week 3
Question 1: Which one IS NOT an example of a classification problem?
- To predict the category to which a customer belongs.
- To predict whether a customer switches to another provider/brand.
- To predict the amount of money a customer will spend in one year.
- To predict whether a customer responds to a particular advertising campaign or not.
Question 2: Which of the following statements are TRUE about Logistic Regression? (select all that apply)
- Logistic regression can be used both for binary classification and multi-class classification
- Logistic regression is analogous to linear regression but takes a categorical/discrete target field instead of a numeric one.
- In logistic regression, the dependent variable is binary.
Question 3: Which of the following examples is/are a sample application of Logistic Regression? (select all that apply)
- The probability that a person has a heart attack within a specified time period, using the person’s age and sex.
- Customer’s propensity to purchase a product or halt a subscription in marketing applications.
- Likelihood of a homeowner defaulting on a mortgage.
- Estimating the blood pressure of a patient based on her symptoms and biographical data.
Question 4: Which one is TRUE about the kNN algorithm?
- kNN is a classification algorithm that takes a bunch of unlabelled points and uses them to learn how to label other points.
- kNN algorithm can be used to estimate values for a continuous target.
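For reference, a nearest-neighbour estimate of a continuous target can be sketched by averaging the target values of the k closest training points. The function name `knn_regress`, the one-dimensional feature, and the toy data below are assumptions made for this illustration only.

```python
def knn_regress(train_x, train_y, query, k):
    """Estimate a continuous target as the mean of the k nearest neighbours.

    Distance is the absolute difference on a single numeric feature,
    which is a simplification for this sketch.
    """
    by_distance = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical labelled training data: feature value -> numeric target.
train_x = [1.0, 2.0, 3.0, 10.0, 11.0]
train_y = [1.0, 2.0, 3.0, 10.0, 11.0]

estimate = knn_regress(train_x, train_y, query=2.1, k=3)
print(estimate)
```

The training points carry known target values; the query point is the only one whose value is estimated.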
Question 5: What is “information gain” in decision trees?
- It is the information that can decrease the level of certainty after splitting in each node.
- It is the entropy of a tree before the split minus the weighted entropy after the split by an attribute.
- It is the amount of information disorder, or the amount of randomness in each node.
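The entropy-based quantities mentioned in these options can be computed directly. The sketch below assumes Shannon entropy with base-2 logarithms and a split described simply as a list of child label lists; the function names and the toy labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent node minus the size-weighted entropy of its children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical node with a balanced mix of two classes.
parent = ["yes"] * 4 + ["no"] * 4

# A split that separates the classes perfectly.
pure_split = [["yes"] * 4, ["no"] * 4]
# A split that leaves each child as mixed as the parent.
mixed_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

gain_pure = information_gain(parent, pure_split)
gain_mixed = information_gain(parent, mixed_split)
print(gain_pure, gain_mixed)
```

A split that produces purer children yields a larger value, while a split that leaves the class mix unchanged yields none.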
Week 4
Question 1: Which statement is NOT TRUE about k-means clustering?
- k-means divides the data into non-overlapping clusters without any cluster-internal structure.
- The objective of k-means is to form clusters in such a way that similar samples go into a cluster, and dissimilar samples fall into different clusters.
- As k-means is an iterative algorithm, it guarantees that it will always converge to the global optimum.
Question 2: Which of the following are characteristics of DBSCAN? Select all that apply.
- DBSCAN can find arbitrarily shaped clusters.
- DBSCAN can find a cluster completely surrounded by a different cluster.
- DBSCAN has a notion of noise, and is robust to outliers.
- DBSCAN does not require one to specify the number of clusters such as k in k-means.
Question 3: Which of the following is an application of clustering?
- Customer churn prediction
- Price estimation
- Customer segmentation
- Sales prediction
Question 4: Which approach can be used to calculate dissimilarity of objects in clustering?
- Minkowski distance
- Euclidean distance
- Cosine similarity
- All of the above
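The measures named in these options can be written in a few lines. This is a sketch using only the standard library; the function names are hypothetical, and turning cosine similarity into a dissimilarity as 1 minus the similarity is one common convention assumed here.

```python
import math


def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def euclidean(a, b):
    """Euclidean distance is the Minkowski distance with p = 2."""
    return minkowski(a, b, 2)


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_dissimilarity(a, b):
    """Assumed convention: dissimilarity = 1 - cosine similarity."""
    return 1 - cosine_similarity(a, b)


origin, point = [0.0, 0.0], [3.0, 4.0]
print(minkowski(origin, point, 1))   # Manhattan distance (p = 1)
print(euclidean(origin, point))      # straight-line distance (p = 2)
print(cosine_dissimilarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```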
Question 5: How is a center point (centroid) picked for each cluster in k-means?
- We can randomly choose some observations out of the data set and use these observations as the initial means.
- We can create some random points as centroids of the clusters.
- We can select it through correlation analysis.
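A bare-bones version of k-means makes the initialization step concrete. The sketch below assumes the variant that draws k observations from the data set as the initial means and then alternates assignment and update steps; the function names, the fixed iteration count, and the toy points are all hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=10, seed=0):
    """Plain k-means with observations drawn at random as initial centroids."""
    rng = random.Random(seed)
    # Initialization: pick k distinct observations from the data set.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical, well-separated toy data.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

Because the result depends on the starting centroids, the procedure is often repeated with different seeds and the run with the lowest within-cluster error is kept.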
Week 5
Question 1: What is/are the advantage/s of Recommender Systems?
- Recommender Systems provide a better experience for the users by giving them a broader exposure to many different products they might be interested in.
- Recommender Systems encourage users towards continual usage or purchase of their product
- Recommender Systems benefit the service provider by increasing potential revenue and better security for its consumers.
- All of the above.
Question 2: What is a content-based recommendation system?
- Content-based recommendation system tries to recommend items to the users based on their profile built upon their preferences and taste.
- Content-based recommendation system tries to recommend items based on similarity among items.
- Content-based recommendation system tries to recommend items based on the similarity of users when buying, watching, or enjoying something.
- All of the above.
Question 3: What is the meaning of “Cold start” in collaborative filtering?
- The difficulty in recommendation when we do not have enough ratings in the user-item dataset.
- The difficulty in recommendation when we have a new user for whom we cannot build a profile, or a new item that has not received any ratings yet.
- The difficulty in recommendation when the number of users or items increases and the amount of data expands, so algorithms will begin to suffer drops in performance.
Question 4: What is a “Memory-based” recommender system?
- In a memory-based approach, a recommender system is created using machine learning techniques such as regression, clustering, classification, etc.
- In a memory-based approach, a model of users is developed in an attempt to learn their preferences.
- In a memory-based approach, we use the entire user-item dataset to generate a recommendation system.
Question 5: What is the shortcoming of content-based recommender systems?
- Users will only get recommendations related to the preferences in their profile, and the recommender engine may never recommend any item with other characteristics.
- As it is based on similarity among items and users, it is not easy to find the neighbour users.
- It needs to find a similar group of users, so it suffers from drops in performance, simply due to growth in the similarity computation.