1. What is Machine Learning?
1.1. Definition
- Arthur Samuel (1959): Field of study that gives computers the ability to learn without being explicitly programmed. <<A relatively high-level definition>>
- Tom Mitchell (1998), Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. <<A more practical definition; the difficulty lies in identifying the T/P/E of a given problem>>
1.2. Main Goal (Aim)
- Learning functions: given features x and labels y, ML learns the relationship between x and y. ML is the process of learning a function or hypothesis h(x) that best approximates y.
- Generalisation: the hypothesis can then make predictions on new data points x, such that h(x) is as close as possible to the real label y. For humans, it is exactly this ability to generalise that lets us use the known against the unknown, and the finite against the infinite.
1.3. Learning functions
- (a): Original data points;
- (b): Fit by three piecewise-linear segments (perfect fit);
- parameters: 3 x 2 = 6 variables;
- (c): Fit by third-order polynomial (perfect fit);
- parameters: 3 (polynomial) + 1 (bias) = 4 variables;
- (d): Fit by first-order polynomial (straight line - not perfect);
- parameters: 2 variables;
- Rule of thumb: the number of data points should be roughly 10 x the number of parameters (variables)
- Tips:
- Avoid overfitting
- Control the number of parameters
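The parameter-counting rule above can be sketched numerically. The data below is made up, and `numpy.polyfit` stands in for the fitting procedure; with 20 points, the rule of thumb only justifies about 2 parameters, so the higher-degree fits are overfitting the noise:

```python
import numpy as np

# Assumed noisy, roughly linear data: 20 points.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 0.5 + rng.normal(0.0, 0.05, 20)

# More parameters (higher degree) never increases the training error,
# but beyond ~2 parameters the extra capacity just fits the noise.
errors = {}
for degree in (1, 2, 5):
    coeffs = np.polyfit(x, y, degree)              # degree + 1 parameters
    errors[degree] = np.sum((np.polyval(coeffs, x) - y) ** 2)
```

The training error shrinks monotonically as parameters are added, which is precisely why training error alone cannot tell you when to stop.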
2. Representation
2.1. Supervised/Unsupervised Learning
- Machine Learning is learning from experience. It is also called supervised learning. E consists of features and labels, and P and T are well-defined.
- Pattern Recognition is finding patterns without experience. It is also called unsupervised learning. E consists of only features, and P and T are defined in much broader terms of finding 'interesting patterns'.
Supervised learning:
- The labels are known: what they are, what they look like, and what the correct answer/result is
- There is a fairly clear relationship between input and output
- Example: house area and house price
Unsupervised learning:
- The definition of the labels is unclear, or can be understood very broadly
- Data points are grouped (clustering) according to some relationship already present in the dataset
- Examples:
- Clustering genome data
- Separating music from dialogue in an audio recording
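Clustering can be sketched with a minimal k-means loop. The 2-D feature points below are made up, and no labels are used anywhere; the algorithm groups the points purely from structure in the features:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two assumed groups of 2-D feature points -- features only, no labels.
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(3.0, 0.3, (20, 2))])

k = 2
centroids = X[[0, 20]].copy()   # deterministic init: one point per group
for _ in range(10):
    # Assign every point to its nearest centroid.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    assignments = dists.argmin(axis=1)
    # Move each centroid to the mean of the points assigned to it.
    centroids = np.array([X[assignments == j].mean(axis=0) for j in range(k)])
```

The loop alternates between assigning points and re-estimating group centres; after convergence the two centroids sit near the two underlying groups.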
2.2. Representation view on ML (Pedro Domingos, 2012)
Machine Learning = Representation + Evaluation + Optimisation
- Representation: a way of describing the problem and data
- Evaluation: similar to the measure P proposed by Tom Mitchell
- Optimisation: an algorithm that drives the predicted values towards the true values
2.3. Classification
Classification is an ML task where T has a discrete set of outcomes.
- Often classification is binary: {0, 1}
- Examples:
- face detection
- smile detection
- spam classification
- hot/cold
2.4. Regression
Regression is an ML task where T has a real-valued outcome on some continuous sub-space:
- Examples:
- age estimation
- stock value prediction
- temperature prediction
- energy consumption prediction
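The difference between the two tasks can be sketched with a single assumed feature; the temperature values and the 20 °C threshold below are made up:

```python
# Assumed daily temperatures in °C.
temps = [3.0, 18.5, 27.2, 31.0, 9.4]

# Regression task: the outcome is real-valued (e.g. predict tomorrow's
# temperature), so the labels live on a continuous sub-space.
regression_labels = temps

# Classification task: the outcome is discrete, here binary {0, 1}
# ("hot" vs "cold"), using an assumed 20 °C threshold.
classification_labels = [1 if t >= 20.0 else 0 for t in temps]
```

The same features can serve either task; what changes is whether the label set is discrete or continuous.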
3. Features, labels, tasks
3.1. Features and Labels
- Data points or instances make up the data used to learn a hypothesis h or find a pattern g
- In Machine Learning, a data point consists of feature/label tuples {x, y}
- A single data point comes from one measurement/observation
- Many data points together make a dataset
3.2. Labels
Labels y are the values that h(x) aims to predict.
- Obtaining labels is usually an arduous task
- Often manual
- Repetitive
- Complicated experiments
- Difficult to obtain data
- Example:
- Facial expressions of pain
- Impact of diet on astronauts in space
- Predictions of house prices
3.3. Features/Attributes
Features/Attributes are measurable values of variables for which some form of pattern exists and that can be used to infer the associated label y.
- Sender domain in spam detection
- Mouth corner location in smile detection
- Temperature in forest fire prediction
- Pixel value in face detection
- Head pose estimation from facial point locations
3.4. Features Definition
- For a given problem, all data points must have the same, fixed-length set of features x, a row vector with d elements: x = [x_1, x_2, ..., x_d]
- A dataset with n data points is then denoted as an n x d matrix X, whose i-th row is the feature vector of the i-th data point
3.5. Labels Definition
For a given problem with a singular task, the set of labels y accompanying the set of features X is given as a column vector with one label per data point: y = [y_1, y_2, ..., y_n]^T
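A minimal sketch of these definitions with assumed numbers (house area and room count as features, price as the label):

```python
import numpy as np

# n = 3 data points, each a fixed-length row vector of d = 2 features.
X = np.array([[120.0, 3.0],    # area (m^2), number of rooms -- assumed
              [ 85.0, 2.0],
              [200.0, 5.0]])
# One label per data point (assumed prices): a length-n vector.
y = np.array([300.0, 210.0, 520.0])

n, d = X.shape
```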
4. Linear Regression Intro
4.1. Simplest Example - Latitude and Temperature
4.2. Training Set and Meaning of Symbols
4.3. Learning Flow
Univariate Linear Regression: One feature.
4.4. Training Algorithm - Minimises the Cost Function
Given a model h with a solution space of possible parameters, training searches this space for the parameters that minimise a cost function, e.g. the mean squared error between h(x) and the true labels y over the training set.
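This training loop can be sketched for univariate linear regression, assuming an MSE cost minimised by gradient descent; the latitude/temperature numbers below are made up for illustration:

```python
import numpy as np

# Assumed training set: latitude (degrees) -> average temperature (°C).
x = np.array([10.0, 25.0, 40.0, 55.0])
y = np.array([28.0, 20.0, 12.0, 4.0])

# Hypothesis h(x) = theta0 + theta1 * x; cost J = mean((h(x) - y)^2) / 2.
theta0, theta1 = 0.0, 0.0
alpha = 0.001                      # learning rate (a hyper-parameter)
for _ in range(100000):
    err = theta0 + theta1 * x - y  # h(x) - y on every data point
    theta0 -= alpha * err.mean()          # gradient of J w.r.t. theta0
    theta1 -= alpha * (err * x).mean()    # gradient of J w.r.t. theta1
```

With this data the loop converges to a slope of about -0.53 °C per degree of latitude and an intercept of about 33.3 °C, the least-squares line through the four points.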
4.5. Intrinsic/Hyper Parameters
Intrinsic parameters
- Can be efficiently learned on the training set
- Large in number
- E.g. weights in linear regression or Artificial Neural Network
- Intrinsic parameters are variables that the model can learn automatically from the data; e.g. the weights and biases in deep learning
Hyper-parameters
- Must be learned by establishing generalisation error
- No efficient search possible
- Smaller in number
- E.g. the number of nodes in an ANN or the degree of a polynomial linear regression model
- Hyper-parameters determine the model itself: different hyper-parameters give different models. They are usually set based on experience; e.g. the learning rate, the number of iterations, the number of layers, and the number of neurons per layer
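The distinction can be sketched with made-up data: the polynomial coefficients (intrinsic parameters) are fitted on the training split, while the degree (a hyper-parameter) is chosen by comparing error on a held-out validation split, i.e. by estimating generalisation error:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, 40)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, 40)   # assumed: true relation is linear

x_train, y_train = x[:30], y[:30]    # used to learn intrinsic parameters
x_val, y_val = x[30:], y[30:]        # used to estimate generalisation error

best_degree, best_err = None, float("inf")
for degree in (1, 3, 9):             # candidate hyper-parameter values
    coeffs = np.polyfit(x_train, y_train, degree)   # intrinsic parameters
    val_err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    if val_err < best_err:
        best_degree, best_err = degree, val_err
```

Note the asymmetry: the coefficients are found efficiently by least squares, whereas the degree must be tried value by value, which is why hyper-parameters are kept few in number.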
4.6. Brute Force Search
Note - questions to think about:
- You can’t do this for hyper-parameters using the above formulation (why not?)
- Clearly you can’t search all possible values (why not?)
- This is a very small formula, but there are some hidden caveats. Can you write matlab/pseudo code for this?
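One possible answer to the last question, in Python rather than MATLAB: evaluate the cost at every candidate parameter value on a finite grid and keep the best. The model, data, and grid ranges are assumed for illustration; the comments point at the hidden caveats the note mentions:

```python
import numpy as np

# Assumed data, roughly y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

def cost(theta0, theta1):
    """Mean squared error of the hypothesis h(x) = theta0 + theta1 * x."""
    return np.mean((theta0 + theta1 * x - y) ** 2)

best, best_cost = None, float("inf")
# Hidden caveats: the grid must actually contain (a point near) the optimum,
# the step size bounds the achievable accuracy, and the number of cost
# evaluations grows exponentially with the number of parameters searched.
for t0 in np.arange(-1.0, 1.0, 0.1):
    for t1 in np.arange(0.0, 4.0, 0.1):
        c = cost(t0, t1)
        if c < best_cost:
            best, best_cost = (t0, t1), c
```

This also hints at why the same recipe fails for hyper-parameters: each cost evaluation there requires training a whole model, and continuous hyper-parameters have no natural finite grid.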