Hi everyone, it's a pleasure to be here. I've really enjoyed the workshop so far.

So, deep learning: what is it, and what is it good for? The key challenge for which deep learning was a solution was first evident in the so-called perceptual tasks, such as object recognition.

You see here four different categories of objects: airplanes, bicycles, birds, and cars, and within each category you see a sheer amount of variation. We call this nuisance variation, in the sense that for the task of object recognition we only care about the category of the object; all this other variation is just causing us problems.

You can imagine this generalizes to many other kinds of perceptual tasks. In speech recognition, for example, if you want to recognize the words or phonemes coming out of someone's mouth, the nuisances might be a change in pitch or a change in volume; if I start to speak very quietly, or if I say something really fast, you can still understand me, which means your brain has the ability to deal with this nuisance variation. It turned out that this nuisance variation, which is task-dependent and can be quite nonlinear and complicated, is the core reason why traditional, or shallow, machine learning techniques did not suffice.

So how do we deal with this nuisance variation? One way to think about it is a nice picture from Jim DiCarlo and others that we keep in mind; we don't know how true it is, but the picture is this. Say we want to distinguish between cars and airplanes. There is a manifold, a low-dimensional space, for cars, and there is also a manifold for airplanes, depicted here on the right. The two manifolds are curvy, nonlinear, and entangled, but in order to distinguish them with a simple linear classifier, meaning I can put a nice linear boundary between the two classes, I need to, so to speak, disentangle these two manifolds.
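To make this manifold picture concrete, here is a toy sketch of disentangling (my own illustration, not from the talk's slides): two classes lying on concentric circles in 2-D overlap in every raw coordinate, so no axis-aligned linear threshold separates them, but one nonlinear feature, the squared radius, untangles them so that a single linear threshold works.

```python
import math

def make_class(radius, n=100):
    # points on a circle of the given radius: a 1-D "manifold" in 2-D space
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

inner = make_class(1.0)  # stand-in for "cars"
outer = make_class(3.0)  # stand-in for "airplanes"

# The raw x-coordinates of the two classes overlap, so no threshold on x
# alone can separate them.
raw_overlap = min(x for x, _ in outer) < max(x for x, _ in inner)

# A nonlinear feature map, squared distance from the origin, disentangles them.
def feature(point):
    x, y = point
    return x * x + y * y

threshold = 5.0  # any value between 1.0**2 and 3.0**2 works
separated = (all(feature(p) < threshold for p in inner) and
             all(feature(p) > threshold for p in outer))
```

In the real task, of course, the right feature map is not given in advance; the layers of a deep network have to learn it.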

It is posited that the ventral stream in our brain, running from the back of the head all the way to the front, is responsible for this transformation. If we take this as our basic setup, and our goal is to disentangle these intertwined manifolds, how can we do that? To a first approximation, we need to build representations, features of the inputs, that are sensitive to the task-relevant or target features, in this case the category of the object, while simultaneously being robust, or invariant, to all the nuisance variation, such as the change in pose in the cars. This dance of being selective and invariant at the same time is what's really difficult, and it's what we are going to look to the brain for a little bit of inspiration

about how to solve.

Early on, in the 1960s, Hubel and Wiesel, who eventually won a Nobel Prize for their work, discovered two different types of cells that served as inspiration for the early predecessors of modern-day neural networks. These cells suggested an idea: alternate between two kinds of processing layers. The first kind, depicted here, is a selectivity layer: it looks for specific conjunctions of features. The second kind, shown in blue, is a tolerance or invariance layer: it builds a little bit of tolerance, or invariance, to the nuisance variations that plague the task. By alternating between selectivity and invariance over many, many layers, researchers hypothesized that they could build a deep architecture, a deep neural network, that would be both selective and invariant with respect to the features we want. So this was a key inspiration from neuroscience: from the Neocognitron architecture of the late 1970s, through the 1990s, down to modern-day deep convolutional nets, this basic architecture was in all four of these.
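That alternation can be sketched in a few lines of code. This is a hedged toy version, my own and not the actual Hubel and Wiesel model: a 1-D "simple cell" stage correlates the input with a template (selectivity), and a "complex cell" stage max-pools those responses, so a small shift of the pattern leaves the pooled output unchanged (invariance), as long as the shift stays within a pooling window.

```python
# Selectivity: "simple cells" respond to a specific feature template.
def simple_cells(signal, template):
    k = len(template)
    return [sum(signal[i + j] * template[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# Tolerance/invariance: "complex cells" max-pool over nearby positions.
def complex_cells(responses, pool=3):
    return [max(responses[i:i + pool])
            for i in range(0, len(responses) - pool + 1, pool)]

template = [1, -1]                     # detects a falling edge
signal_a = [0, 1, 0, 0, 0, 0, 0, 0]
signal_b = [0, 0, 1, 0, 0, 0, 0, 0]   # same pattern, shifted by one

resp_a = simple_cells(signal_a, template)   # position-sensitive
resp_b = simple_cells(signal_b, template)
pooled_a = complex_cells(resp_a)            # position-tolerant
pooled_b = complex_cells(resp_b)
```

The simple-cell responses differ between the two signals, but the pooled responses are identical: a one-sample shift has been absorbed.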

Now, having this basic idea is not by itself sufficient to make things work, but in this case it was necessary. So let's step back for a second. The "deep" in deep learning refers to deep neural networks, so what is a neural network? Here is a depiction of one. It takes some kind of input; in this case it's a facial recognition network, so it takes in pictures of faces, shown here on the left. Then, after layers and layers of linear and nonlinear processing, depicted as these intermediate blocks of images, or feature maps, a decision is made at the very end of the network; in this case the decision is which actor or celebrity is detected in the image. A deep neural network has to have nonlinearities in it; otherwise it collapses into a single linear transformation, and a linear transformation, just like a linear regression, does not have enough expressive power: it cannot disentangle all those complicated nuisances that I talked about earlier.
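That point about expressive power can be checked numerically. Here is a minimal sketch, my own toy example rather than anything from the talk: two stacked linear layers with no nonlinearity in between collapse into a single linear map, while inserting a simple ReLU lets a tiny network compute absolute value, a bent function no single linear map of one input can produce.

```python
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

W1 = [[2.0, 1.0], [0.0, 1.0]]
W2 = [[1.0, -1.0], [3.0, 0.5]]
x = [1.5, -2.0]

# Two linear layers with nothing in between are exactly one linear layer.
two_layers = matvec(W2, matvec(W1, x))
collapsed = matvec(matmul(W2, W1), x)

# With a ReLU in between, a tiny network computes abs(t) = relu(t) + relu(-t).
def relu(v):
    return [max(0.0, vi) for vi in v]

def abs_net(t):
    hidden = relu(matvec([[1.0], [-1.0]], [t]))  # [relu(t), relu(-t)]
    return matvec([[1.0, 1.0]], hidden)[0]
```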

So the nonlinearities are essential: every additional nonlinearity lets the function express more curves, more wiggles if you will, so the more nonlinearities you have, the more complicated the functions you can express. The basic artificial neural network architecture was of course directly inspired by the brain, but then we can deviate

as we see fit.

So what are some of the successes of modern-day deep learning, before we get to the failures and limitations that are the main focus of this session? The shot heard around the world came in 2012, when convolutional nets became the first neural network architecture to really dominate the so-called ImageNet challenge. This is a depiction of that network, one of the largest neural networks ever trained at the time it was invented: eight layers, seven of them hidden, 650,000 units, or artificial neurons, and 60 million parameters. It was trained on more data than any other neural network in history up to that point, millions and millions of images, and that data was expanded out to billions of images using a trick called data augmentation. We are now in the big-data phase of machine learning, where the datasets are enormous. Also critical was the advent of much faster computation, including GPUs, graphical processing units, as hardware; these gave speedups anywhere from 5x to 50x over CPUs and were essential to these developments. And of course we cannot forget the dataset itself: ImageNet, millions of hand-labelled images. How was that even possible? It was Amazon Mechanical Turk that allowed the labels for all those millions of images to be crowdsourced; this was a critical part of the revolution, which would not have been possible otherwise.

So, after this initial success with object recognition, which is depicted here.
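The data augmentation trick mentioned a moment ago can be sketched very simply. This is a generic illustration of the idea, not the exact recipe used for ImageNet: each labeled image spawns several label-preserving variants, such as horizontal flips and crops, multiplying the effective size of the training set.

```python
# A tiny grayscale "image" as a list of rows.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

def hflip(img):
    # mirror each row: label-preserving for most object categories
    return [list(reversed(row)) for row in img]

def crops(img, size):
    # every size x size window of the image
    h, w = len(img), len(img[0])
    return [[row[c:c + size] for row in img[r:r + size]]
            for r in range(h - size + 1) for c in range(w - size + 1)]

def augment(img):
    # original + mirrored, then every 2x2 crop of each: 1 image -> 8
    return [crop for variant in (img, hflip(img))
            for crop in crops(variant, 2)]

augmented = augment(image)
```

With larger images and more transforms (rotations, color jitter, and so on), the multiplier grows quickly, which is how millions of labeled images can act like billions.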

You can see several things about object recognition here. One is the sheer amount of variation you have to deal with: look at all these pumpkins, for example; there are many different views of the pumpkin, and many different pictures of the aircraft carrier with different brightnesses and viewpoints, lots of nuisance variation. Same thing with the elephant. Previously, people tried to hand-design features and architectures that could deal with these nuisances; now this is all learned directly from the data. On the left-hand side you see the kind of output a convolutional net gives for this categorization task: a probability distribution over the different categories. You get these probabilities for any given input; sometimes the probabilities make sense, in which case we call them calibrated, and other times they really don't make sense at all, and we call them uncalibrated. The interesting thing about neural networks is that whether the probabilities are calibrated or uncalibrated, they seem to learn to make the right decisions.
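To make the calibration point concrete, here is a hedged sketch. A network's raw scores, its logits, are turned into a probability distribution by a softmax, and one common post-hoc fix for overconfident probabilities, temperature scaling, softens them without ever changing which class wins, consistent with the observation that the decision can be right even when the confidence is not.

```python
import math

def softmax(logits, temperature=1.0):
    # higher temperature -> softer, less confident probabilities
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [5.0, 2.0, 0.5]  # made-up raw scores for three categories

p_raw = softmax(logits)                   # possibly overconfident
p_cal = softmax(logits, temperature=3.0)  # softened by temperature scaling

top_raw = max(range(3), key=lambda i: p_raw[i])
top_cal = max(range(3), key=lambda i: p_cal[i])
```

Because dividing all logits by the same positive constant preserves their ordering, the predicted class is identical before and after scaling; only the stated confidence changes.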

But that's already a first indication that these models cannot be treated as black boxes: the probabilities coming out are not necessarily calibrated to reality.

After this initial success there was a huge, if you will, Cambrian explosion of different kinds of neural network architectures. The evolution of these different species of neural networks became really easy: you could design your own network, train it on some dataset, and test it out within days, and this ability to rapidly define, train, and iterate on neural networks led to a plethora of different species. We can't cover them all today, but we can look at a few.

One particular architecture, developed by Facebook for facial recognition, is called the DeepFace architecture. It's a particularly nice example because you can see all the elements of modern-day deep learning in this one picture. For example, there is still some pre-processing: I don't just pass in a raw image; I first crop out the part of the image that is going to be related to the face. That is still needed: you still need to pre-process, you still need to focus. You also need to do what's called frontalization: rotating the face, using a computer vision procedure unrelated to neural networks, to straighten it out and make it look forward. Only after that point do you pass it into a neural network with many layers of units. Notice that each layer can be visualized as if it were an image: there is an x and a y component, plus a bunch of channels. These are called feature maps, and you can think of them as a generalization of images. As you go deeper you can still see remnants of the original image in the second layer, but deeper still, the activations, the responses of the artificial neurons, become more and more abstract, and we can no longer interpret what is really going on. It is an active area of research to figure out whether there is some

mathematical interpretation of these.

Using these kinds of deep neural networks, many amazing applications are possible. One can combine the content of one image with the style of another: you can take your favorite style from a painting you like, say van Gogh's Starry Night, and combine it with the content of some other image. We won't have time to go into the details, but the fact that this is possible, and yields images of high perceptual quality, was one of the amazing discoveries.

On the other hand, there are limitations. On the left-hand side you can see that when we train neural networks to generate digits, they do a pretty good job; they even learn, implicitly, that slant is a variable that matters for generating the different digits. They pick up these kinds of things automatically, just by learning to generate digits. However, when you ask them to generate dogs, a much more complicated set, well, I don't know about you, but that is the scariest-looking dog I've ever seen, and this begins to show the limitations of these networks. They are capturing something essential, something dog-like; indeed, if you look at a local patch of pixels, it has a texture associated with dog-ness, but globally the image doesn't quite make sense. So already we can guess that the convolutional nets used to generate these were modeling local correlations and somehow missing long-range correlations, and we start to see some of the failure modes.
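One back-of-the-envelope way to see why such a net can capture local texture yet miss long-range structure is its receptive field: the patch of input pixels that can influence a single output unit. The layer settings below are made up for illustration; the arithmetic is the standard receptive-field recurrence.

```python
def receptive_field(layers):
    """Receptive-field width of one output unit for a stack of 1-D
    (kernel_size, stride) convolution layers, listed front to back."""
    rf, jump = 1, 1  # start from a single output pixel
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field
        jump *= stride             # stride spaces out the taps
    return rf

# Five 3-wide stride-1 conv layers see only an 11-pixel window...
rf_local = receptive_field([(3, 1)] * 5)
# ...while stride-2 layers widen the view much faster.
rf_strided = receptive_field([(3, 2)] * 5)
```

If an output unit's receptive field never spans the whole image, correlations between distant parts of the image simply cannot be modeled at that layer, which is consistent with locally dog-like but globally incoherent samples.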

Nevertheless, people continued to make progress, designing better and better neural networks, and after many years of iteration they started generating celebrity faces. Now, these two celebrities don't actually exist; they are completely made-up people. I couldn't get YouTube to work in China, but I highly encourage you to go watch the video afterwards and see these amazing-looking faces.

Moving away from the visual domain, you can generate not only images but also text that looks realistic: here is generated wiki markup, generated C++ code, and, if you're a bit of a masochist, generated algebraic topology. All three of these languages, of which I only understand a couple, have their own grammar and their own syntax, and you can see that the neural networks learn a lot of that syntax and grammar. However, if you look closely, the text is semantically meaningless. The code does not even compile; it looks like C++, but it is completely meaningless. Same with the algebraic topology, though there I wouldn't be able to tell. So again we see the limits of deep learning: it can learn a lot of syntactic structure, but semantically the output is meaningless.

Of course there are many other applications of deep learning which we don't have time to go into, one of which is a neuroscience application; I'll skip a slide due to time. It turns out, much to the chagrin of many neuroscientists, that our best models of responses in the visual cortex are currently deep convolutional neural networks, achieving an R-squared of roughly 50%. This means showing natural or synthetically rendered images to a monkey and also to a neural network, both doing the same object recognition task, and then seeing how well you can predict the responses in the brain from the responses in the neural network; that is where the R-squared of roughly 50% comes in. And roughly, later layers of the network correspond to later areas in the visual hierarchy. This seems to suggest that even though deep convolutional nets may not be neurally plausible, there is something similar between the representations they end up learning and those of the ventral stream in the brain. This was a bit of a surprising finding.
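The R-squared figure can be unpacked with a small sketch. The numbers here are synthetic and purely illustrative: fit a linear map from a model "feature" to a recorded "response", then report the fraction of response variance the prediction explains.

```python
def fit_line(xs, ys):
    # ordinary least squares for y ~ a*x + b
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

def r_squared(xs, ys):
    # fraction of variance in ys explained by the linear prediction
    a, b = fit_line(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-ins: a network feature that partly predicts a neuron.
feature  = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
response = [0.1, 1.3, 1.8, 3.4, 3.6, 5.2]  # roughly linear, plus noise

r2 = r_squared(feature, response)
```

The real analyses regress many units of cortical data on many network features, but the quantity reported is the same: an R-squared near 0.5 means about half of the response variance is predictable from the network's representation.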

Nevertheless, now for some cold water. Deep learning still struggles with many things, and one of them will be very important for its applications in the natural sciences. In particular, you've probably heard about the Tesla Autopilot accident (I couldn't even find an image of it on the internet anymore; they've taken them all down). A neural network in the Autopilot miscategorized a tractor-trailer as a billboard, and because it was "a billboard" it vetoed the radar and said to ignore the radar's warnings; unfortunately, the Tesla driver ran right into the tractor-trailer and died. That was, I think, the most prominent error a neural network has made, but it goes to show that you cannot just trust the black box. It is not interpretable, and before you put it into something as dangerous as a car, we really need ways of formally interpreting and verifying what is going on under the hood.

Another disturbing aspect of deep neural networks, and one that seems extremely robust, is that they are prone to so-called adversarial examples. You see here on the right: I can take a picture of a panda, add some adversarially chosen noise to it, and get a new image that looks identical; the differences between the input and the perturbed image are humanly imperceptible, and yet the convolutional net is 99% confident that this is now another animal. You can make it think the panda is a school bus if you want to; it's really, really bad. Obviously our brains don't do this; as far as we can tell, our brains are not prone to such adversarial examples. But it suggests that there is some really weird input-output instability in these neural networks that we have not been able to

alleviate.

And finally, the most critical issue for natural science applications is that deep learning generalizes poorly when extrapolating outside of the training set. Let's go a bit into what we mean by that. What are some applications of deep learning in the natural sciences? The key one is automation: trying to automate away tedious, time-consuming, or costly perceptual tasks. But the more flexible application, the one we really care about, is modeling: how do we model large, complex, nonlinear systems? I'll jump ahead and show you an example that pits a physics-based model against a recurrent neural network, which will motivate the next two talks. Here on the left I've trained a system that really understands the dynamics: basically a bunch of coupled oscillators, a bunch of pendulums; think of a bunch of grandfather clocks coupled by air. You can see, in orange, that I don't have enough oscillators, so I can only partially mimic the true behavior of the system, and I have some errors after time t = 10. Now imagine doing the same thing with a recurrent neural network. I'm going to use a very high-capacity recurrent network that's very popular, called an LSTM. You notice that the LSTM matches the first ten time units perfectly, and then, when it has to extrapolate, it completely fails, and we've tried this extensively over many different architectures. This is the key problem: deep learning has a lot of expressive power, enormous amounts of flexibility, but it has a hard time extrapolating because it doesn't understand the true physics of the system.

So how do we get it to understand the true physics, the true domain constraints, of a particular system? That is the point of our next two talks. Guofeng Zhang will show how to impose this kind of domain knowledge in the area of computer vision, using constraints from multi-view geometry, and Kyle will show how to use domain-specific constraints in particle physics applications: how do we impose conservation of energy and momentum, the different particle types, all the knowledge from particle physics, while still retaining the flexibility and power of deep learning? And with that I will definitely end my way-too-long talk. If anybody wants to talk about how to do this in neuroscience, how to impose domain knowledge extracted directly from the visual cortex, including how neurons fire and which neurons they are connected to, which is a massive effort, I'd love to continue that conversation offline. Okay, thank you.