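Data science is a blend of tools, algorithms, and machine learning principles whose goal is to discover hidden patterns in raw data. It is therefore used mainly for decision-making and prediction, drawing on predictive causal analytics, prescriptive analytics (predictive analytics plus decision science), and machine learning.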
Below is a high-scoring sample from a data science exam:
1. Support Vector Machines
(a) Set up and describe the optimization problem that a soft-margin support vector machine classifier solves.
(b) True or false? State which of the following statements about support vector machines are true, and give your reasoning for each.
(i) Increasing the hyperparameter C tends to decrease the training error
(ii) The hard-margin SVM is a special case of the soft-margin with the hyperparameter C set to zero
(iii) Increasing the hyperparameter C tends to decrease the margin
(iv) Increasing the hyperparameter C tends to decrease the sensitivity to outliers
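For reference when working through part (a), one common textbook way to write the soft-margin problem is sketched below. The notation is an assumption, not taken from the exam: labels are coded as y_i in {-1, +1}, w and b are the weight vector and intercept, and the xi_i are slack variables measuring how far each point violates the margin.

```latex
\begin{aligned}
\min_{w,\, b,\, \xi} \quad & \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i \\
\text{subject to} \quad & y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \qquad i = 1, \dots, n, \\
& \xi_i \ge 0, \qquad i = 1, \dots, n.
\end{aligned}
```

In this form the margin width is 2 / ||w||, and C sets the price of slack: a larger C penalizes margin violations more heavily relative to the margin term, which is the trade-off the statements in part (b) ask about.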
2. Consider a naive Bayes classifier with 3 binary features X1, X2 and X3, and one binary output Y.
(a) How many parameters must be estimated to train such a naive Bayes classifier? Please list them.
(b) How many parameters would have to be estimated to learn the above classifier if we do not make the naive Bayes conditional independence assumption?
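The counting argument behind both parts can be sketched in a few lines of Python. This is only an illustration under the usual convention that a probability fixed by normalization is not counted as a free parameter; the function names are hypothetical.

```python
def naive_bayes_param_count(n_features: int) -> int:
    """Free parameters with binary features and a binary label,
    assuming conditional independence of the features given Y."""
    prior = 1  # P(Y = 1); P(Y = 0) follows by normalization
    # One Bernoulli parameter P(X_j = 1 | Y = y) per feature j and class y.
    likelihoods = 2 * n_features
    return prior + likelihoods


def full_joint_param_count(n_features: int) -> int:
    """Free parameters when P(X_1, ..., X_n | Y) is modeled without
    the conditional independence assumption."""
    prior = 1
    # Each class needs a distribution over 2**n feature combinations,
    # minus one because the probabilities sum to 1.
    per_class = 2 ** n_features - 1
    return prior + 2 * per_class


print(naive_bayes_param_count(3))  # 7
print(full_joint_param_count(3))   # 15
```

For three features the naive model needs P(Y = 1) plus P(Xj = 1 | Y = y) for j in {1, 2, 3} and y in {0, 1}, while the unrestricted model grows exponentially in the number of features.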
3. Trees
(a) Please explain the binary recursive splitting algorithm that is used to fit regression trees.
Can you provide a formal argument for how regression trees relate to ordinary least squares?
(b) Classification trees. You are given the following cloud of points with two separate labels reflecting a binary classification setting for a categorical dependent variable yi ∈ {0, 1} and two numeric features X1, X2. Can you illustrate what the decision boundary would look like when fitting trees?
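As a rough illustration of the recursive binary splitting asked about in part (a), the sketch below greedily picks the feature and threshold that minimize the total residual sum of squares, then recurses on each side. It is a minimal pure-Python version, not a production implementation: the function names, the stopping rule (`max_depth`, `min_samples`) and the toy data are all assumptions made for the example.

```python
from statistics import mean


def rss(values):
    """Residual sum of squares around the mean of the values."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(X, y):
    """Return (feature index, threshold, total RSS) of the best single
    split, or None if no feature has two distinct values."""
    best = None
    for j in range(len(X[0])):
        levels = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(levels, levels[1:]):
            s = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] < s]
            right = [yi for row, yi in zip(X, y) if row[j] >= s]
            total = rss(left) + rss(right)
            if best is None or total < best[2]:
                best = (j, s, total)
    return best


def fit_tree(X, y, depth=0, max_depth=2, min_samples=2):
    """Split recursively; each leaf predicts the mean response in its region."""
    split = None
    if depth < max_depth and len(y) >= min_samples:
        split = best_split(X, y)
    if split is None:
        return {"leaf": mean(y)}
    j, s, _ = split
    left = [(row, yi) for row, yi in zip(X, y) if row[j] < s]
    right = [(row, yi) for row, yi in zip(X, y) if row[j] >= s]
    return {
        "feature": j,
        "threshold": s,
        "left": fit_tree([r for r, _ in left], [v for _, v in left],
                         depth + 1, max_depth, min_samples),
        "right": fit_tree([r for r, _ in right], [v for _, v in right],
                          depth + 1, max_depth, min_samples),
    }


def predict(tree, row):
    """Follow the splits down to a leaf and return its mean."""
    while "leaf" not in tree:
        side = "left" if row[tree["feature"]] < tree["threshold"] else "right"
        tree = tree[side]
    return tree["leaf"]


# Toy data (made up for illustration): the response jumps when X1 passes 2.5.
X = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [4.0, 5.0]]
y = [1.0, 1.0, 5.0, 5.0]
tree = fit_tree(X, y)
print(best_split(X, y))
print(predict(tree, [1.2, 5.0]), predict(tree, [3.7, 5.0]))
```

The link to ordinary least squares shows up in `rss` and in the leaves: within a fixed region, the constant that minimizes squared error is the sample mean, so for a given partition the fitted tree coincides with an OLS regression of the response on one indicator variable per leaf. Because every split compares a single feature with a threshold, the regions are axis-aligned rectangles, which is also the shape the decision boundary takes in the classification setting of part (b).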
4. Asymptomatic Testing and Optimal Self-Isolation
You are an economist advising the government on how to devise a socially and economically optimal testing and self-isolation strategy.