Recurrent Neural Networks
Hui Lin @Google
Types of Neural Networks
Why sequence models?
| Task | Input | Output |
|---|---|---|
| Speech recognition | (audio clip) | "Get your facts first, then you can distort them as you please." |
| Music generation | ∅ | (music clip) |
| Sentiment classification | "Great movie? Are you kidding me! Not worth the money." | (star rating) |
| DNA sequence analysis | ACGGGGCCTACTGTCAACTG | AC GGGGCCTACTG TCAACTG |
| Machine translation | 网红脸 | Internet celebrity face |
| Video activity recognition | (video frames) | Running |
| Named entity recognition | Use Netlify and Hugo. | Use **Netlify** and **Hugo**. |
RNN types
- rectangle: a vector
- green: input vector
- blue: output vector
- red: intermediate state vector
- arrow: matrix multiplications
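Combining these elements, such diagrams contrast the common input/output shapes: one-to-one (a standard feed-forward network), one-to-many (e.g., music generation), many-to-one (e.g., sentiment classification), and many-to-many (e.g., named entity recognition or machine translation).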

Notation
x: Use ($x^{\langle 1\rangle}$) Netlify ($x^{\langle 2\rangle}$) and ($x^{\langle 3\rangle}$) Hugo ($x^{\langle 4\rangle}$) . ($x^{\langle 5\rangle}$)

y: 0 ($y^{\langle 1\rangle}$) 1 ($y^{\langle 2\rangle}$) 0 ($y^{\langle 3\rangle}$) 1 ($y^{\langle 4\rangle}$) 0 ($y^{\langle 5\rangle}$)

- $x^{(i)\langle t\rangle}$: element $t$ of the $i$th input sequence, whose length is $T_x^{(i)}$
- $y^{(i)\langle t\rangle}$: element $t$ of the $i$th output sequence, whose length is $T_y^{(i)}$
Representing words
With a 10,000-word vocabulary, each word is assigned an index:

$$\text{vocabulary} = [\,\text{a}\,(1),\ \text{aaron}\,(2),\ \ldots,\ \text{and}\,(360),\ \ldots,\ \text{Hugo}\,(4075),\ \ldots,\ \text{Netlify}\,(5210),\ \ldots,\ \text{use}\,(8320),\ \ldots,\ \text{Zulu}\,(10000)\,]$$

and is represented as a one-hot vector with a single 1 at that index: use has the 1 at position 8320, Netlify at 5210, and at 360, Hugo at 4075.
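A minimal sketch of this one-hot encoding in NumPy; the vocabulary indices follow the slide (1-based there, 0-based here), everything else is an illustrative assumption. The "." token is omitted because its index is not listed.

```python
import numpy as np

# Toy vocabulary -> index map; indices follow the slide, shifted to 0-based.
vocab = {"and": 359, "Hugo": 4074, "Netlify": 5209, "use": 8319}
V = 10_000  # vocabulary size |V|

def one_hot(word: str) -> np.ndarray:
    """Return the |V|-dimensional one-hot vector for `word`."""
    v = np.zeros(V)
    v[vocab[word]] = 1.0
    return v

x = [one_hot(w) for w in ["use", "Netlify", "and", "Hugo"]]  # x<1>..x<4>
print(x[0].argmax())  # 8319, i.e. position 8320 in the slide's 1-based indexing
```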
What is an RNN?

An RNN reads a sequence one element at a time, $x^{\langle 1\rangle}, x^{\langle 2\rangle}, \ldots$, and carries a hidden state forward from step to step, so the prediction $\hat{y}^{\langle t\rangle}$ can depend on everything seen up to step $t$.
Forward Propagation

$$a^{\langle 0\rangle} = \vec{0}; \qquad a^{\langle 1\rangle} = g(W_{aa}a^{\langle 0\rangle} + W_{ax}x^{\langle 1\rangle} + b_a)$$

$$\hat{y}^{\langle 1\rangle} = g'(W_{ya}a^{\langle 1\rangle} + b_y)$$

In general, with $g$ the state activation (often $\tanh$) and $g'$ a second activation for the output (e.g., sigmoid or softmax; not the derivative of $g$):

$$a^{\langle t\rangle} = g(W_{aa}a^{\langle t-1\rangle} + W_{ax}x^{\langle t\rangle} + b_a)$$

$$\hat{y}^{\langle t\rangle} = g'(W_{ya}a^{\langle t\rangle} + b_y)$$
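A minimal NumPy sketch of this forward pass; the hidden size, input size, and random initialization are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_a, n_x = 16, 10_000          # hidden size and input size (assumed)
W_aa = rng.normal(scale=0.01, size=(n_a, n_a))
W_ax = rng.normal(scale=0.01, size=(n_a, n_x))
W_ya = rng.normal(scale=0.01, size=(1, n_a))
b_a, b_y = np.zeros(n_a), np.zeros(1)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def forward(xs):
    """xs: list of x<t> vectors; returns predictions yhat<t> and states a<t>."""
    a = np.zeros(n_a)                            # a<0> = 0
    a_hist, yhat = [a], []
    for x in xs:                                 # t = 1, ..., T_x
        a = np.tanh(W_aa @ a + W_ax @ x + b_a)   # a<t>,    g  = tanh
        yhat.append(sigmoid(W_ya @ a + b_y))     # yhat<t>, g' = sigmoid
        a_hist.append(a)
    return yhat, a_hist
```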
Forward Propagation

The loss at each step is the cross-entropy, summed over the sequence:

$$L^{\langle t\rangle}(\hat{y}^{\langle t\rangle}, y^{\langle t\rangle}) = -y^{\langle t\rangle}\log\hat{y}^{\langle t\rangle} - (1-y^{\langle t\rangle})\log(1-\hat{y}^{\langle t\rangle})$$

$$L(\hat{y}, y) = \sum_{t=1}^{T_y} L^{\langle t\rangle}(\hat{y}^{\langle t\rangle}, y^{\langle t\rangle})$$
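Continuing the NumPy sketches above (reusing `forward` and the one-hot `x`), the summed loss for the running example with the first four labels $y = (0, 1, 0, 1)$ from the notation slide:

```python
def loss(yhat, ys):
    """L = sum_t L<t>(yhat<t>, y<t>), the summed cross-entropy."""
    L = 0.0
    for p, y in zip(yhat, ys):
        p = p.item()                    # yhat<t> is a length-1 array
        L += -y * np.log(p) - (1 - y) * np.log(1 - p)
    return L

yhat, _ = forward(x)                    # x from the one-hot sketch above
print(loss(yhat, [0, 1, 0, 1]))         # y<1>..y<4>
```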
Backpropagation through time

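Training minimizes $L$ by running the forward pass left to right, then propagating gradients right to left through the same chain of time steps. Because $W_{aa}$, $W_{ax}$, and $W_{ya}$ are shared across steps, each weight's gradient accumulates contributions from every $t$.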
Deep RNN

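A deep RNN simply stacks recurrent layers: the sequence of states produced by one layer becomes the input sequence of the layer above it. A minimal Keras sketch; the layer sizes are illustrative assumptions.

```python
import tensorflow as tf

# A 2-layer deep RNN: each SimpleRNN returns its full state sequence,
# which is consumed as the input sequence of the layer above it.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, 10_000)),            # T_x steps of one-hot x<t>
    tf.keras.layers.SimpleRNN(64, return_sequences=True),   # layer 1 states
    tf.keras.layers.SimpleRNN(64, return_sequences=True),   # layer 2 stacked on top
    tf.keras.layers.Dense(1, activation="sigmoid"),         # yhat<t> at every step
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```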
Vanishing gradients with RNNs
- The cat, which ate already, was full.
- The cats, which ate already, were full.

Whether the verb is "was" or "were" depends on a word many steps earlier ("cat" vs. "cats"). In a basic RNN, the gradient signal connecting such distant steps is multiplied through every intermediate step and shrinks toward zero, so long-range dependencies are hard to learn.

LSTM
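A standard formulation of the LSTM cell, assuming the common variant with update, forget, and output gates ($\sigma$ is the sigmoid, $\odot$ the elementwise product):

$$\tilde{c}^{\langle t\rangle} = \tanh(W_c[a^{\langle t-1\rangle}, x^{\langle t\rangle}] + b_c)$$

$$\Gamma_u = \sigma(W_u[a^{\langle t-1\rangle}, x^{\langle t\rangle}] + b_u), \quad \Gamma_f = \sigma(W_f[a^{\langle t-1\rangle}, x^{\langle t\rangle}] + b_f), \quad \Gamma_o = \sigma(W_o[a^{\langle t-1\rangle}, x^{\langle t\rangle}] + b_o)$$

$$c^{\langle t\rangle} = \Gamma_u \odot \tilde{c}^{\langle t\rangle} + \Gamma_f \odot c^{\langle t-1\rangle}$$

$$a^{\langle t\rangle} = \Gamma_o \odot \tanh(c^{\langle t\rangle})$$

When $\Gamma_f \approx 1$ and $\Gamma_u \approx 0$, the memory cell $c^{\langle t\rangle}$ is copied almost unchanged from $c^{\langle t-1\rangle}$, giving gradients a path that does not shrink at every step; this is how the LSTM mitigates the vanishing-gradient problem above.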
Word representation
- Vocabulary = [a, aaron, …, zulu], $|V| = 10{,}000$
- One-hot representation: Man (5391), Woman (9853), King (4914), Queen (7157), Apple (456), Pumpkin (6332) — each word is a one-hot vector with a single 1 at its vocabulary index.
Word representation
- My favourite Christmas dessert is pumpkin ____
- My favourite Christmas dessert is apple ____

With one-hot vectors, the inner product between any two different words is zero, so whatever the model learns about completing the pumpkin sentence transfers nothing to the apple one: the representation encodes no notion of word similarity.
Featurized representation: word embedding

Instead of a sparse one-hot vector, represent each word by a dense vector of learned features (dimensions that might loosely capture gender, royalty, age, or being a food), so that related words such as apple and pumpkin end up with similar vectors.
Analogies
- man ⟶ woman ≈ king ⟶ ?
Analogies
- $e_{man} - e_{woman} = [-2, -0.01, 0.03, 0]^T \approx [-2, 0, 0, 0]^T$
- $e_{king} - e_{queen} = [-1.92, -0.02, 0.01, -0.01]^T \approx [-2, 0, 0, 0]^T$

Analogies
$$e_{man} - e_{woman} \approx e_{king} - e_?$$

$$\Rightarrow \arg\max_w \; \text{sim}(e_w,\; e_{king} - e_{man} + e_{woman})$$

Cosine similarity

How do we compute $\text{sim}(e_w,\; e_{king} - e_{man} + e_{woman})$? Use the cosine similarity:

$$\text{sim}(a, b) = \frac{a^T b}{\|a\|_2 \, \|b\|_2}$$


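A minimal NumPy sketch of the analogy search; the tiny 4-dimensional embeddings are assumed values for illustration (with the first dimension loosely playing the "gender" role, as in the vectors above).

```python
import numpy as np

def cosine(a, b):
    """sim(a, b) = a^T b / (||a||_2 ||b||_2)"""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Illustrative 4-dim embeddings (assumed values, not real trained vectors).
E = {
    "man":   np.array([-1.00, 0.01, 0.03, 0.09]),
    "woman": np.array([ 1.00, 0.02, 0.02, 0.01]),
    "king":  np.array([-0.95, 0.93, 0.70, 0.02]),
    "queen": np.array([ 0.97, 0.95, 0.69, 0.01]),
    "apple": np.array([ 0.00, -0.01, 0.03, 0.95]),
}

# argmax_w sim(e_w, e_king - e_man + e_woman), excluding the query words
target = E["king"] - E["man"] + E["woman"]
best = max((w for w in E if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(E[w], target))
print(best)  # queen
```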
Embedding matrix

The embeddings are stored as the columns of a matrix $E \in \mathbb{R}^{n \times |V|}$. Multiplying $E$ by the one-hot vector $o_w$ selects column $w$: $E\, o_w = e_w$.
- In practice, we look up embedding instead of doing matrix multiplication.
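A sketch of why the lookup is preferred, with assumed sizes: the matrix product touches all $n \times |V|$ entries just to pick out one column.

```python
import numpy as np

rng = np.random.default_rng(0)
n, V = 4, 10_000
E = rng.normal(size=(n, V))     # embedding matrix, one column per word

idx = 5209                      # e.g. "Netlify" (0-based index, assumed)
o = np.zeros(V); o[idx] = 1.0   # one-hot vector o_w

e_mult = E @ o                  # E o_w: O(n|V|) multiply-adds
e_look = E[:, idx]              # direct lookup: O(n) copy

assert np.allclose(e_mult, e_look)
```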
Data Preprocessing
