Modern Advances in Neural Networks

by Thomas Lotze
on 21 June 2013


Transcript of Modern Advances in Neural Networks

Training a Neural Network
The Backpropagation Algorithm
A Basic Neural Network
Neural Networks
Modern Advances
RBMs
Image credits:
digital brain: wikimedia user Gengiskanhg
simple neural network: wikimedia user Miso
neural network activation diagram: Artificial Neural Networks wikibook
RBM example from Edwin Chen's Blog
Dropout comparison graphs: Geoffrey Hinton Google Tech Talk: "Brains, Sex, and Machine Learning"

Copyright (c) 2013 Thomas Lotze
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation
Restricted Boltzmann Machines
Dropout
Learning More
Input x0: merchant's MCC is "individual_use" (no: x0 = 0, yes: x0 = 1)
Input x1: merchant's avg. daily $, scaled as x1 = (daily $) / (maximum daily $), so x1 ∈ [0, 1]
Weights: w01 = 0.7, w11 = 15
net1 = x0*w01 + x1*w11
o1 = σ(net1)
output activation ∈ [0, 1]
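This forward pass is small enough to sketch in code. Below is a minimal illustrative Python version, not code from the talk; the logistic sigmoid is assumed as the activation, and the weight and input values simply mirror the slide.

import math

def sigmoid(net):
    # Logistic activation: squashes the net input into (0, 1).
    return 1.0 / (1.0 + math.exp(-net))

# Inputs: MCC flag and scaled average daily dollars (values from the slide).
x0 = 1                       # merchant's MCC is "individual_use"
x1 = 0.2745                  # (daily $) / (maximum daily $)

# Weights from the slide.
w01 = 0.7
w11 = 15.0

net1 = x0 * w01 + x1 * w11   # weighted sum of inputs
o1 = sigmoid(net1)           # output activation in (0, 1)
print(o1)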
List of examples:
individual_use, scaled_gpv, ... , is_fraudster

0, 0.2745, ..., 0
1, 0.0000, ..., 1
1, 0.4817, ..., 0
0, 0.0123, ..., 1
...
[Figure: forward-pass activation diagram for one example; values shown include inputs 0 and 0.2745, node activations 0.98, 0.02, 0.76, 0.94, output 0.19, ACTUAL = 0]
Compute the error and its derivative (the gradient)
Adjust the weights along the negative gradient (to move the output closer to the target)
Iterate to convergence
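As a concrete illustration of these three steps, here is a minimal sketch of gradient descent on the single-output model from the earlier slide, using squared error; the learning rate, iteration cap, and initialization are assumptions for illustration, not values from the presentation.

import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

# (x0, x1, target) triples, mirroring the example list above.
examples = [
    (0, 0.2745, 0),
    (1, 0.0000, 1),
    (1, 0.4817, 0),
    (0, 0.0123, 1),
]

w = [0.7, 15.0]           # initial weights w01, w11
learning_rate = 0.1       # assumed value

for _ in range(1000):     # "iterate to convergence" (fixed cap here)
    for x0, x1, target in examples:
        o = sigmoid(x0 * w[0] + x1 * w[1])
        # Gradient of squared error through the sigmoid:
        # dE/dw_i = (o - target) * o * (1 - o) * x_i
        delta = (o - target) * o * (1 - o)
        # Adjust weights along the negative gradient.
        w[0] -= learning_rate * delta * x0
        w[1] -= learning_rate * delta * x1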
Basic Neural Networks, Unsupervised Feature Learning and Deep Learning
https://www.coursera.org/course/ml
http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial
Neural Networks win the Merck Kaggle Competition
http://blog.kaggle.com/2012/11/01/deep-learning-how-i-did-it-merck-1st-place-interview/
RBMs
http://blog.echen.me/2011/07/18/introduction-to-restricted-boltzmann-machines/
http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf
Coursera Neural Network course from Hinton:
https://www.coursera.org/course/neuralnets
Deep Learning Tutorials (with Python examples)
http://deeplearning.net/tutorial/
Dropout
youtube.com/watch?v=DleXA5ADG78
http://www.cs.toronto.edu/~nitish/msc_thesis.pdf
http://www.stanford.edu/~sidaw/cgi-bin/home/lib/exe/fetch.php?media=papers:fastdropout.pdf
More Code Examples and Libraries in Progress
http://www.r-bloggers.com/restricted-boltzmann-machines-in-r/
https://github.com/lisa-lab/pylearn2
For each example, randomly "drop" half the hidden nodes in each layer
compute activations, then randomly set half the nodes to 0
update weights as normal
after training, multiply all output weights from hidden nodes by 0.5
Significantly reduces overfitting
Breaks up complex co-adapted hidden nodes
Makes the overall network more robust
Shared weights result in very strong regularization (reduction when a node is dropped)
Like training an ensemble of 2^N networks and averaging via the geometric mean
Learn Features Before Labels:
Quickly Identify Common Structure
Also helps in input layer (with lower dropout rate)
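The dropout procedure described above can be sketched in a few lines of NumPy. This is a generic illustration, not code from the talk: the 0.5 drop probability and the 0.5 test-time weight scaling follow the slides, while the layer sizes, initialization, and sigmoid activations are assumptions.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed sizes: 10 inputs, 20 hidden units, 1 output.
W1 = rng.normal(scale=0.1, size=(10, 20))
W2 = rng.normal(scale=0.1, size=(20, 1))

def forward_train(x):
    # Compute activations, then randomly set half the hidden nodes to 0.
    h = sigmoid(x @ W1)
    mask = rng.random(h.shape) < 0.5     # each hidden node kept with probability 0.5
    return sigmoid((h * mask) @ W2)      # weights are then updated as normal

def forward_test(x):
    # After training: use all hidden nodes, but halve their outgoing weights.
    h = sigmoid(x @ W1)
    return sigmoid(h @ (W2 * 0.5))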
Stochastic Spikes
Instead of propagating a value p with probability 0.5, what about propagating a value 0.5 with probability p?
This is what actual neurons do -- and it has always been a neurological puzzle why.
Initial results from Hinton suggest that this is similar to dropout...
takes slightly longer to learn
requires a slightly larger network
and generalizes slightly better
Variance is p(1-p)/4 rather than p^2/4; for small p this approaches Poisson behavior, which is also seen in actual neurons
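The two variances follow directly from the two Bernoulli schemes; a short check (my derivation, not from the slides), with dropout-style propagation on the left and stochastic spiking on the right:

\operatorname{Var}[\,p \text{ w.p. } \tfrac12\,] = \tfrac{p^2}{2} - \tfrac{p^2}{4} = \tfrac{p^2}{4}
\qquad
\operatorname{Var}[\,\tfrac12 \text{ w.p. } p\,] = \tfrac{p}{4} - \tfrac{p^2}{4} = \tfrac{p(1-p)}{4}

Both schemes have mean p/2; for small p the spiking variance p(1-p)/4 is approximately p/4, i.e. proportional to the mean, which is the Poisson-like behavior noted above.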
MNIST
(digit classification)
Boltzmann Machines
Binary-valued (0/1) nodes
Undirected edges with weights
Probabilistic activation based on neighbors
Overall "energy" ~ weighted disagreement between nodes
RBM movie-preference example (visible units: Harry Potter, Avatar, LOTR, Gladiator, Titanic, Star Trek)
Deepening Layers
Pre-training for Neural Network
Train the weights using Contrastive Divergence (a fast approximation to gradient descent). For each example:
set the visible activations
construct the hidden activations
reconstruct ("imagine") the visible units (Gibbs-like sampling)
update the weights: w_ij' = w_ij + (v_i*h_j - r_i*h_j)
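Here is a minimal NumPy sketch of one CD-1 update following the four steps above; the layer sizes, learning rate, and sampling details are illustrative assumptions (biases omitted for brevity), not code from the talk.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(v, W, learning_rate=0.1):
    # One Contrastive Divergence (CD-1) step for a single binary example v.
    # 1. Set the visible activations (v is given).
    # 2. Construct hidden activations by sampling from p(h = 1 | v).
    h_prob = sigmoid(v @ W)
    h = (rng.random(h_prob.shape) < h_prob).astype(float)
    # 3. Reconstruct ("imagine") the visible units from the hidden sample (Gibbs-like).
    r_prob = sigmoid(h @ W.T)
    r = (rng.random(r_prob.shape) < r_prob).astype(float)
    # 4. w_ij' = w_ij + eps * (v_i*h_j - r_i*h_j)
    return W + learning_rate * (np.outer(v, h) - np.outer(r, h))

# Example: 6 visible units (movies), 2 hidden features, two preference vectors.
W = rng.normal(scale=0.1, size=(6, 2))
for v in np.array([[1, 1, 1, 0, 0, 0], [0, 0, 1, 1, 1, 1]], dtype=float):
    W = cd1_update(v, W)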
More Improvements
Momentum
Weight Decay (L2 Regularization)
Weight Scaling (Max-Norm Regularization)
More Gibbs Sampling
Rectified Linear Units
Parallel Batch Updates
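As an illustration of the first three items, here is a generic sketch of how momentum, L2 weight decay, and max-norm scaling modify a plain gradient step; the coefficients are common placeholder values, not ones from the talk.

import numpy as np

def sgd_step(W, grad, velocity, learning_rate=0.1, momentum=0.9, weight_decay=1e-4):
    # Weight decay: add the L2 penalty's gradient, pulling weights toward zero.
    grad = grad + weight_decay * W
    # Momentum: keep a running velocity so updates are smoothed across steps.
    velocity = momentum * velocity - learning_rate * grad
    return W + velocity, velocity

def max_norm(W, c=3.0):
    # Max-norm regularization: rescale each column so its L2 norm is at most c.
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    return W * np.minimum(1.0, c / np.maximum(norms, 1e-12))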

"Deep Learning"