
COMP3009 - 1. Introduction

1. What is Machine Learning?

1.1. Definition

  • Arthur Samuel (1959): Field of study that gives computers the ability to learn without being explicitly programmed. <<A relatively high-level definition.>>
  • Tom Mitchell (1998). Well-posed Learning Problem: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. <<A more practical definition; the difficulty lies in identifying the T/P/E of a given problem.>>

1.2. Main goal(Aim)

  • Learning functions:
    Machine learning learns the relationship between features x and labels y, i.e. a function f(x) = y.
    ML is the process of learning a function, or hypothesis, h(x) such that h(x) best approximates y.
  • Generalisation: the learned hypothesis can then make predictions h(x') on new data points x', such that h(x') is as close as possible to the real label y'.
  • For humans too, it is generalisation that makes it possible to confront the unknown with the known, and the infinite with the finite.

1.3. Learning functions

LearningFunctionsEg.png

  1. (a): Original data points;
  2. (b): Fit by three piecewise-linear segments (perfect fit);
    • parameters: 3 x 2 = 6 variables;
  3. (c): Fit by third-order polynomial (perfect fit);
    • parameters: 3 (polynomial coefficients) + 1 (bias) = 4 variables;
  4. (d): Fit by first-order polynomial (straight line - not perfect);
    • parameters: 2 variables;
  • Rule of thumb: number of data points ~= 10 x number of parameters
  • Tips:
    1. Avoid overfitting
    2. Control the number of parameters
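The trade-off in panels (c) and (d) can be sketched with NumPy, using invented data points (the values below are assumptions for illustration, not the slide's data; np.polyfit is just one convenient way to fit polynomials):

```python
import numpy as np

# Four invented data points (for illustration only)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.5, 1.8, 1.2, 3.1])

# Third-order polynomial: 4 parameters, so it can fit 4 points exactly
cubic = np.polyfit(x, y, deg=3)

# First-order polynomial: 2 parameters, a straight line (not a perfect fit)
line = np.polyfit(x, y, deg=1)

# Largest residual of each fit on the training points
cubic_err = np.max(np.abs(np.polyval(cubic, x) - y))
line_err = np.max(np.abs(np.polyval(line, x) - y))
```

A perfect fit on the training points (the cubic) is not necessarily the better model: this is exactly the overfitting the tips above warn about.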

2. Representation

2.1. Supervised/Unsupervised Learning

  • Machine Learning is learning from experience. It is also called supervised learning. E consists of features and labels, and P and T are well-defined.
  • Pattern Recognition is finding patterns without experience. It is also called unsupervised learning. E consists of only features, and P and T are defined in much broader terms of finding ‘interesting patterns’.

Supervised learning:

  1. The labels are known: we know what they look like and what the correct answer/outcome is
  2. There is a fairly well-defined relationship between input and output
  3. Example: house area and house price

Unsupervised learning:

  1. The labels are not clearly defined, or are defined very broadly
  2. Data points are grouped by relationships already present in the dataset (Clustering)
  3. Examples:
    • grouping genomes
    • separating music from speech in an audio recording
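Clustering can be sketched with a minimal k-means loop on toy 2-D data (the data and the choice of k-means are illustrative assumptions; the lecture has not introduced a specific clustering algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy unlabelled dataset: two well-separated blobs of 2-D points.
# No labels y are given - the structure must come from the data itself.
data = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),
                  rng.normal(5.0, 0.5, size=(20, 2))])

def kmeans(X, centres, iters=20):
    """Minimal k-means: assign points to the nearest centre, re-average."""
    centres = centres.astype(float).copy()
    for _ in range(iters):
        # Euclidean distance from every point to every centre
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(len(centres)):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return labels, centres

# Initialise with one point from each end of the dataset
labels, centres = kmeans(data, data[[0, -1]])
```

With no labels to supervise it, the algorithm still recovers the two groups, which is the sense in which P and T are only 'finding interesting patterns'.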

2.2. Representation view on ML (Pedro Domingos, 2012)

Machine Learning = Representation + Evaluation + Optimisation

  • Representation: a way of describing the problem and data
  • Evaluation: similar to the performance measure P proposed by Tom Mitchell
  • Optimisation: the algorithm that drives the predictions as close as possible to the true values

2.3. Classification

Classification is a ML task where T has a discrete set of outcomes.

  • Often classification is binary, with labels in {0, 1}
  • Examples:
    • face detection
    • smile detection
    • spam classification
    • hot/cold
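A classifier is just a hypothesis with a discrete output. A minimal sketch for the hot/cold example, where the 25-degree threshold and the temperatures are invented for illustration (in practice the threshold would be learned from data):

```python
# Binary classification: the hypothesis h(x) maps a temperature feature x
# to a discrete label in {0, 1}. The threshold is an assumed value.
HOT_THRESHOLD_C = 25.0

def h(temperature_c):
    """Return 1 ('hot') if above the threshold, else 0 ('cold')."""
    return 1 if temperature_c > HOT_THRESHOLD_C else 0

labels = [h(t) for t in [10.0, 30.0, 25.0, 40.0]]
```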


2.4. Regression

Regression is a ML task where T has a real-valued outcome on some continuous sub-space:

  • Examples:
    • age estimation
    • stock value prediction
    • temperature prediction
    • energy consumption prediction

3. Features, labels, tasks

3.1. Features and Labels

  • Data points or instances make up the data used to learn a hypothesis h or find a pattern g
  • In Machine Learning, a data point consists of feature/label tuples {x, y}
  • A single data point comes from one measurement/observation
  • Many data points together make a dataset

3.2. Labels

Labels y are the values that h(x) aims to predict.

  • Obtaining labels is usually an arduous task
    • Often manual
    • Repetitive
    • Complicated experiments
    • Difficult to obtain data
  • Example:
    • Facial expressions of pain
    • Impact of diet on astronauts in space
    • Predictions of house prices

3.3. Features/Attributes

Features/Attributes are measurable values of variables for which some form of pattern exists, that can be used to infer the associated label y.

  • Sender domain in spam detection
    • Mouth corner location in smile detection
    • Temperature in forest fire prediction
  • Pixel value in face detection
  • Head pose estimation from facial point locations

3.4. Features Definition

  • For a given problem, all data points must have the same, fixed-length set of features x: a row vector with d elements, x = [x_1, x_2, ..., x_d]
  • A dataset with n data points is then denoted as an n x d matrix X, with one data point per row

3.5. Labels Definition

For a given problem with a singular task, the set of labels y accompanying the set of features X is given as an n x 1 column vector y = [y_1, y_2, ..., y_n]^T, with one label per data point.
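The two definitions can be made concrete with NumPy (the values below are hypothetical, e.g. house area and room count as features, price as label):

```python
import numpy as np

# Hypothetical dataset: n = 3 data points, each a row vector of d = 2
# features (e.g. house area, number of rooms); values are invented.
X = np.array([[120.0, 3.0],
              [ 85.0, 2.0],
              [150.0, 4.0]])          # n x d feature matrix

y = np.array([300.0, 210.0, 390.0])   # n labels, one per data point (row)

n, d = X.shape
```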

4. Linear Regression Intro

4.1. Simplest Example - Latitude and Temperature

SimplestExampleLinearRegression.png

4.2. Training Set and Meaning of Symbol

TrainingSetAndMeaningofSymbol.png

4.3. Learning Flow

Univariate Linear Regression: One feature.
LearningFlow.png

4.4. Training Algorithm - Minimises the Cost Function

Given a model h with solution space S and a training set {X, y}, a learning algorithm finds the solution s in S that minimises the cost function J(s).
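For univariate linear regression this can be sketched as follows, assuming a mean-squared-error cost (the lecture's exact form of J is not given here; the toy training set is invented):

```python
import numpy as np

def h(theta0, theta1, x):
    """Univariate linear hypothesis h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

def cost(theta0, theta1, x, y):
    """Mean squared error cost J over the training set {x, y}."""
    return np.mean((h(theta0, theta1, x) - y) ** 2)

# Toy training set lying exactly on y = 2x + 1
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0])
```

The true parameters (theta0 = 1, theta1 = 2) give zero cost; any other solution gives a larger J, which is what the learning algorithm exploits.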

4.5. Intrinsic/Hyper Parameters

  1. Intrinsic parameters

    • Can be efficiently learned on the training set
    • Large in number
    • E.g. weights in linear regression or Artificial Neural Network
    • Intrinsic parameters are variables that the model learns automatically from the data; e.g. the weights and biases in deep learning
  2. Hyper-parameters

    • Must be learned by establishing generalisation error
    • No efficient search possible
    • Smaller in number
    • E.g. the number of nodes in an ANN or the degree of a polynomial linear regression model
    • Hyper-parameters determine the model itself: different hyper-parameter values give different models. They are usually set from experience; e.g. the learning rate, the number of iterations, the number of layers, the number of neurons per layer

4.6. Brute Force Search (exhaustive search)

The search itself is simply s* = argmin_{s in S} J(s): evaluate J for every candidate solution s and keep the best one.

Note - Questions to think about:

  • You can’t do this for hyper-parameters using the above formulation (why not?)
  • Clearly you can’t search all possible values (why not?)
  • This is a very small formula, but there are some hidden caveats. Can you write matlab/pseudo code for this?
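One possible answer to the last question, as a hedged sketch: brute-force search over a finite grid of candidate parameters for the univariate linear model, with an assumed squared-error cost. The grid bounds and step size are arbitrary choices, which also hints at why searching all real-valued parameters is impossible:

```python
import numpy as np

# Toy training set lying on y = 2x + 1 (invented for illustration)
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0])

# Evaluate the cost J on every candidate (theta0, theta1) in a finite
# grid and keep the minimiser; a real exhaustive search over continuous
# parameters cannot terminate, hence the discretisation.
best, best_cost = None, float("inf")
for t0 in np.arange(-5.0, 5.0, 0.5):
    for t1 in np.arange(-5.0, 5.0, 0.5):
        J = np.mean((t0 + t1 * x - y) ** 2)
        if J < best_cost:
            best, best_cost = (t0, t1), J
```

Note the hidden caveats the slide alludes to: the result depends on the grid resolution and bounds, and the cost of the search grows exponentially with the number of parameters.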