perceptron

Chapter 2: Perceptron

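The perceptron is a linear model for binary classification. Its input is the feature vector of an instance and its output is the class of that instance, either +1 or -1. The perceptron corresponds to a separating hyperplane that divides the input space into a positive and a negative class, and it is a discriminative model.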

The perceptron model

The function from the input space $X \subseteq R^n$ to the output space $Y=\{+1,-1\}$ is $$f(x)=sign(w \cdot x+b)$$ where:

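- $w$ is the weight vector
- $b$ is the bias
- $w \cdot x$ is the inner product of $w$ and $x$
- $sign(x)$ is the sign function: it returns +1 when $x \geq 0$ and -1 otherwise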

The perceptron model corresponds to the separating hyperplane $w \cdot x+b = 0$ in the input space.
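As a minimal sketch of the decision function above, assuming NumPy; the helper name `predict` and the values of `w` and `b` are chosen here for illustration only:

```python
import numpy as np

def predict(x, w, b):
    """Return +1 if w.x + b >= 0, otherwise -1 (the sign convention above)."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# Illustrative parameters (assumed for this sketch).
w = np.array([1.0, 1.0])
b = -3.0

print(predict(np.array([3.0, 3.0]), w, b))  # point on the positive side
print(predict(np.array([1.0, 1.0]), w, b))  # point on the negative side
```

A point lying exactly on the hyperplane is assigned +1 under this convention.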

Learning strategy

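The goal of perceptron learning is to find a hyperplane that separates the positive instances from the negative ones, which means determining the two model parameters $w$ and $b$. The distance from any point $x_0$ in the input space to the hyperplane $S$ is $$\frac {1}{||w||}|w \cdot x_0 + b|$$ where $||w||$ is the $L_2$ norm of $w$.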
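The distance formula can be checked numerically. This is a small sketch assuming NumPy; the function name `distance_to_hyperplane` and the numbers are illustrative, not from the text:

```python
import numpy as np

def distance_to_hyperplane(x0, w, b):
    """Distance from x0 to the hyperplane w.x + b = 0, i.e. |w.x0 + b| / ||w||."""
    return abs(np.dot(w, x0) + b) / np.linalg.norm(w)

# Illustrative hyperplane 3*x1 + 4*x2 - 5 = 0, so ||w|| = 5.
w = np.array([3.0, 4.0])
b = -5.0

d = distance_to_hyperplane(np.array([3.0, 4.0]), w, b)
print(d)  # |9 + 16 - 5| / 5
```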

For a misclassified point $(x_i,y_i)$ we always have $$-y_i(w \cdot x_i + b) > 0$$ because:

- when $w \cdot x_i + b > 0$, $y_i=-1$
- when $w \cdot x_i + b < 0$, $y_i= +1$

Therefore the distance from a misclassified point $x_i$ to the hyperplane is $$- \frac {1}{||w||}y_i(w \cdot x_i + b)$$

Let $M$ be the set of points misclassified by the hyperplane $S$. The total distance from all misclassified points to $S$ is

$$-\frac {1}{||w||}\sum_{x_i \in M}y_i{(w \cdot x_i+b)}$$

Dropping the factor $\frac {1}{||w||}$, the perceptron learning strategy is to minimize the loss function:

$$\min_{w, b} L(w, b)=-\sum_{x_{i} \in M} y_{i}\left(w \cdot x_{i}+b\right)$$
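The loss can be evaluated directly from its definition. The sketch below assumes NumPy and uses the three points that appear later in Example 2.1; the function name `perceptron_loss` and the two candidate parameter settings are illustrative:

```python
import numpy as np

def perceptron_loss(X, y, w, b):
    """Sum of -y_i (w.x_i + b) over the misclassified points only."""
    margins = y * (X @ w + b)
    misclassified = margins <= 0      # the set M
    return np.sum(-margins[misclassified])

X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])

# With b = 0 the point (1, 1) lies on the wrong side, so the loss is positive.
loss_bad = perceptron_loss(X, y, np.array([1.0, 1.0]), 0.0)
# With b = -3 every point is on the correct side, so M is empty and the loss is 0.
loss_good = perceptron_loss(X, y, np.array([1.0, 1.0]), -3.0)
print(loss_bad, loss_good)
```

The loss is non-negative and reaches 0 exactly when no point is misclassified.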

The perceptron algorithm

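The perceptron learning algorithm minimizes the loss function by stochastic gradient descent. It has two forms: the primal form and the dual form.

(Figure: illustration of gradient descent.)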

Primal form

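Input: a training set $$T=\{(x_1,y_1),(x_2,y_2),\ldots,(x_N,y_N)\},\quad x_i \in X=R^n,\quad y_i\in Y=\{+1,-1\}$$
Output: $w,b$; the perceptron model $f(x)=sign(w \cdot x + b)$
Procedure:
- Choose initial values $w_0,b_0$
- Randomly select a misclassified point $(x_i,y_i)$ and use it to update $w,b$, where the learning rate $\eta$ satisfies $0 < \eta \leq 1$
- Repeat the update until no training point is misclassified:
  
  $w=w+\eta y_ix_i$
  
  $b = b+\eta y_i$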
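The two update rules are a single stochastic gradient step. For a fixed misclassified set $M$, the gradients of the loss $L(w,b)=-\sum_{x_i \in M} y_i(w \cdot x_i+b)$ are

$$\nabla_w L(w,b)=-\sum_{x_i \in M} y_i x_i, \qquad \nabla_b L(w,b)=-\sum_{x_i \in M} y_i$$

Moving against the gradient contributed by one misclassified point $(x_i,y_i)$, scaled by $\eta$, adds $\eta y_i x_i$ to $w$ and $\eta y_i$ to $b$.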

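Intuition behind the algorithm:

When an instance is misclassified, it lies on the wrong side of the separating hyperplane. The algorithm adjusts $w$ and $b$ so that the hyperplane moves toward the misclassified point, shrinking the distance between them, until the hyperplane passes over the point and classifies it correctly.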
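The effect of one update can be made concrete: since $y_i^2=1$, a single step raises $y_i(w \cdot x_i+b)$ by exactly $\eta(||x_i||^2+1)$. The sketch below assumes NumPy, and all numbers are illustrative:

```python
import numpy as np

eta = 0.5                      # learning rate (illustrative)
w = np.array([1.0, 1.0])
b = 0.0
x_i = np.array([1.0, 1.0])     # a misclassified point with label -1
y_i = -1

margin_before = y_i * (np.dot(w, x_i) + b)

# One perceptron update on this point.
w_new = w + eta * y_i * x_i
b_new = b + eta * y_i
margin_after = y_i * (np.dot(w_new, x_i) + b_new)

# Predicted gain from the algebra: eta * (||x_i||^2 + 1).
gain = eta * (np.dot(x_i, x_i) + 1)
print(margin_before, margin_after, gain)
```

Here the point is still misclassified after one step, but it is closer to the hyperplane; repeated updates eventually push it to the correct side.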

Convergence

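When the training data are linearly separable, the perceptron learning algorithm converges. The number of misclassifications (updates) $k$ satisfies $$k \leq \left(\frac {R}{\gamma}\right)^2$$ where $R=\max_i ||\hat{x}_i||$ with $\hat{x}_i=(x_i^T,1)^T$, and $\gamma>0$ is the smallest value of $y_i(w_{opt} \cdot x_i+b_{opt})$ over the training set for a separating hyperplane normalized so that $||\hat{w}_{opt}||=1$, with $\hat{w}_{opt}=(w_{opt}^T,b_{opt})^T$.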
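The bound can be checked on a small example. This sketch assumes NumPy, zero initial values, a learning rate of 1, and always updating on the first misclassified point. As a further assumption, $\gamma$ is computed from the learned hyperplane after normalization, which is only one valid choice of separating hyperplane:

```python
import numpy as np

# The three points used later in Example 2.1.
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([1, 1, -1])

w = np.zeros(2)
b = 0.0
k = 0  # number of updates, i.e. misclassifications
while True:
    wrong = np.flatnonzero(y * (X @ w + b) <= 0)
    if wrong.size == 0:
        break
    i = wrong[0]               # first misclassified point
    w = w + y[i] * X[i]
    b = b + y[i]
    k += 1

# Augmented vectors: x_hat = (x, 1), w_hat = (w, b).
X_hat = np.hstack([X, np.ones((len(X), 1))])
w_hat = np.append(w, b)

R = np.linalg.norm(X_hat, axis=1).max()
gamma = (y * (X_hat @ w_hat)).min() / np.linalg.norm(w_hat)
bound = (R / gamma) ** 2
print(k, bound)
```

The bound is loose here, which is expected: it is a worst-case guarantee, not an estimate of the actual number of updates.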

Code

The iris dataset

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.datasets import load_iris

# load data
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['label'] = iris.target

df.head()

	sepal length (cm)	sepal width (cm)	petal length (cm)	petal width (cm)
0	5.1	3.5	1.4	0.2
1	4.9	3.0	1.4	0.2
2	4.7	3.2	1.3	0.2
3	4.6	3.1	1.5	0.2
4	5.0	3.6	1.4	0.2

df.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'label'] # set the column names

df.label.value_counts() # count how many times each class label occurs

2    50
1    50
0    50
Name: label, dtype: int64

plt.scatter(df[:50]['sepal_length'], df[:50]['sepal_width'],
            label = '0')   # plot the samples of class 0

plt.scatter(df[50:100]['sepal_length'], df[50:100]['sepal_width'],
            label = '1')   # plot the samples of class 1
plt.xlabel("sepal_length")  # axis labels
plt.ylabel("sepal_width")
plt.legend()


df[:50]['sepal_length'][:5] # first 5 values of the sepal_length column within the first 50 rows of df

0    5.1
1    4.9
2    4.7
3    4.6
4    5.0
Name: sepal_length, dtype: float64

# Take columns 0, 1 and -1 (the last column) from the first 100 rows of df
data = np.array(df.iloc[:100, [0,1,-1]])

data[:5] # show the first 5 rows; NumPy arrays have no head() method

array([[5.1, 3.5, 0. ],
       [4.9, 3. , 0. ],
       [4.7, 3.2, 0. ],
       [4.6, 3.1, 0. ],
       [5. , 3.6, 0. ]])

# X is every column except the last, for all rows; y is the last column
X, y = data[:, :-1], data[:, -1]
y

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

# Map label 0 to -1 and keep label 1 as +1
y = np.array([1 if i == 1 else -1 for i in y])
y

array([-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
       -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
       -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1])

Perceptron

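# The data are linearly separable and have two classes

class Model:
    def __init__(self):
        # Initial values of the three parameters: weights, bias, learning rate
        self.w = np.ones(len(data[0]) - 1, dtype = np.float32)   # one weight per feature
        self.b = 0
        self.l_rate = 0.1

    def sign(self, x, w, b):
        # Returns the raw score w.x + b rather than its sign;
        # fit() only needs the sign of y * score
        y = np.dot(x, w) + b
        return y

    # Stochastic gradient descent
    def fit(self, X_train, y_train):
        # Loops until a full pass makes no mistakes, so it terminates
        # only when the data are linearly separable
        flag = False
        while not flag:
            wrong_count = 0
            for d in range(len(X_train)):  # iterate over the samples
                X = X_train[d]
                y = y_train[d]
                if y * self.sign(X, self.w, self.b) <= 0:
                    self.w = self.w + self.l_rate * y * X  # update the weights
                    self.b = self.b + self.l_rate * y  # update the bias
                    wrong_count += 1

            if wrong_count == 0 :
                flag = True
        return "Perceptron Model!"

    def score(self):
        pass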

perceptron = Model()
perceptron.fit(X, y)

'Perceptron Model!'

x_points = np.linspace(4,7,10)
y_ = -(perceptron.w[0] * x_points + perceptron.b) / perceptron.w[1]
plt.plot(x_points, y_)

plt.plot(data[:50, 0], data[:50, 1], 'o', color='blue', label='0')
plt.plot(data[50:100, 0], data[50:100, 1], 'o', color='orange', label='1')

plt.xlabel('sepal_length')
plt.ylabel('sepal_width')
plt.legend()

scikit-learn example

from sklearn.linear_model import Perceptron

# fit_intercept=False fixes the intercept at 0, so the hyperplane passes through the origin
clf = Perceptron(fit_intercept=False, max_iter=1000, shuffle=False)
clf.fit(X, y)

Perceptron(alpha=0.0001, class_weight=None, early_stopping=False, eta0=1.0,
           fit_intercept=False, max_iter=1000, n_iter_no_change=5, n_jobs=None,
           penalty=None, random_state=0, shuffle=False, tol=0.001,
           validation_fraction=0.1, verbose=0, warm_start=False)

print(clf.coef_)

[[ 16.3 -24.2]]

# Intercept
print(clf.intercept_)

[0.]

x_points = np.arange(4,8)
y_ = -(clf.coef_[0][0] * x_points + clf.intercept_) / clf.coef_[0][1]
plt.plot(x_points, y_)

plt.plot(data[:50, 0], data[:50, 1], 'o', color='blue', label='0')
plt.plot(data[50:100, 0], data[50:100, 1], 'o', color='orange', label='1')

plt.xlabel('sepal_length')
plt.ylabel('sepal_width')
plt.legend()

Implementation (Example 2.1)

# -*- coding: utf-8 -*-
import numpy as np

class Perceptron:
    def __init__(self):  # initializer
        self.weights = None  # the two parameters: weights and bias
        self.bias = None

    def sign(self, value):  # sign function: 1 if value >= 0, otherwise -1
        return 1 if value >= 0 else -1

    def train(self, dataset, labels):
        learning_rate = 1  # learning rate fixed at 1
        dataset = np.array(dataset)  # convert the input to an np.array
        n = dataset.shape[0]  # the data has shape n * m
        m = dataset.shape[1]

        weights = np.zeros(m) # initial weights: an all-zero vector of length m; the bias starts at 0
        bias = 0
        i = 0

        while i < n:
            # A point counts as misclassified when y * sign(w.x + b) <= 0; then update the weights and bias
            if (labels[i] * self.sign(np.dot(weights, dataset[i]) + bias) <= 0):
                weights = weights + learning_rate * labels[i] * dataset[i] # dataset[i] plays the role of x[i]
                bias = bias + learning_rate * labels[i]  # labels[i] plays the role of y[i]
                i = 0  # restart the scan from the first sample after every update
            else:  # not misclassified: move on; the loop ends once a full pass makes no mistakes
                i += 1
        self.weights = weights # store the learned weights and bias
        self.bias = bias

    def predict(self, data):
        if (self.weights is not None and self.bias is not None):
            # Both parameters are set: predict with sign(w.x + b)
            return self.sign(np.dot(self.weights, data) + self.bias)
        else:
            return 0


if __name__ == "__main__":
    dataset = [[3,3],
               [4,3],
               [1,1]]
    labels = [1,1,-1]

    perceptron = Perceptron()
    perceptron.train(dataset, labels)
    print("weight is:", perceptron.weights)
    print("bias is:", perceptron.bias)

    result = perceptron.predict([3,3])
    print("prediction:", result)

weight is: [1. 1.]
bias is: -3
prediction: 1