Hui Lin @Google
Fun video: https://www.youtube.com/watch?v=cNxadbrN_aI
Two features (\(x_1\) and \(x_2\)); use a linear function to separate the classes; find weights (\(w_0\), \(w_1\), \(w_2\)) such that:
\[z^{(i)} = w_0 + w_1x_1^{(i)} + w_2x_2^{(i)}\]
\[pred^{(i)}=\begin{cases} 1 & \text{if } z^{(i)}>0\\ -1 & \text{if } z^{(i)}\le 0 \end{cases}\]
Start with random weights. Set a maximum number of epochs \(M\); for each epoch (a permutation of the training samples) and for each sample \(i\), compute:
\[z^{(i)} = w_0 + w_1x_1^{(i)} + w_2x_2^{(i)}\]
\[pred^{(i)}=\begin{cases} 1 & \text{if } z^{(i)}>0\\ -1 & \text{if } z^{(i)}\le 0 \end{cases}\]
\[w_0 = w_0 + \eta(actual^{(i)} - pred^{(i)})\]
\[w_1 = w_1 + \eta(actual^{(i)} - pred^{(i)})x_1^{(i)}\] \[w_2 = w_2 + \eta(actual^{(i)} - pred^{(i)})x_2^{(i)}\]
The perceptron is a linear classifier, and the weights are updated after each data point is fed to the algorithm (a concept similar to stochastic gradient descent).
The algorithm keeps updating as the same data set is fed in repeatedly (i.e., over multiple epochs).
It cannot solve problems that are not linearly separable.
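A minimal sketch of this training loop in Python (NumPy), assuming a two-feature data set `X` of shape `(m, 2)` with labels in \(\{-1, 1\}\); the learning rate `eta`, epoch count `M`, and the toy data are illustrative choices, not values from the notes:

```python
import numpy as np

def perceptron_train(X, y, eta=0.1, M=20, seed=0):
    """Perceptron sketch: X is (m, 2) with columns x1, x2; y holds labels in {-1, 1}."""
    rng = np.random.default_rng(seed)
    w0, w1, w2 = rng.normal(size=3)           # start with random weights
    for _ in range(M):                         # at most M epochs
        for i in rng.permutation(len(y)):      # visit samples in a random order (permutation)
            z = w0 + w1 * X[i, 0] + w2 * X[i, 1]
            pred = 1 if z > 0 else -1
            # update after every data point, as in stochastic gradient descent
            w0 += eta * (y[i] - pred)
            w1 += eta * (y[i] - pred) * X[i, 0]
            w2 += eta * (y[i] - pred) * X[i, 1]
    return w0, w1, w2

# Hypothetical usage on a small linearly separable toy set
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [4.0, 2.0]])
y = np.array([-1, -1, 1, 1])
print(perceptron_train(X, y))
```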
\[X=\left[\begin{array}{cccc} x_{1}^{(1)} & x_{1}^{(2)} & \cdots & x_{1}^{(m)}\\ x_{2}^{(1)} & x_{2}^{(2)} & \cdots & x_{2}^{(m)}\\ \vdots & \vdots & \ddots & \vdots\\ x_{n_{x}}^{(1)} & x_{n_{x}}^{(2)} & \cdots & x_{n_{x}}^{(m)} \end{array}\right]\in\mathbb{R}^{n_{x}\times m}\]
\[y=[y^{(1)},y^{(2)},\dots,y^{(m)}] \in \mathbb{R}^{1 \times m}\]
\(\hat{y}^{(i)} = \sigma(w^Tx^{(i)} + b)\) where \(\sigma(z) = \frac{1}{1+e^{-z}}\)
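A small sketch of the vectorized prediction, assuming `X` is stored column-wise as above (shape \(n_x \times m\)) and `w` is a column vector of shape \((n_x, 1)\); the function and variable names are illustrative:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, X):
    """Compute y_hat = sigma(w^T X + b) for all m samples at once.

    X: (n_x, m) matrix with one sample per column, as in the notes.
    w: (n_x, 1) weight vector; b: scalar bias.
    Returns a (1, m) row vector of predicted probabilities.
    """
    return sigmoid(w.T @ X + b)

# Hypothetical usage: n_x = 2 features, m = 3 samples
X = np.array([[0.5, -1.2, 3.0],
              [1.0,  0.3, -0.7]])
w = np.array([[0.4], [-0.6]])
b = 0.1
print(predict(w, b, X))   # shape (1, 3)
```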