Episode Transcript
0:05
Welcome to
0:05
Episode 25 of the Language
0:08
Neuroscience Podcast. I'm
0:08
Stephen Wilson. Thank you for
0:11
listening. It's been a while
0:11
since I recorded an episode.
0:14
Sorry for the delay. But I'm
0:14
very excited to be back to the
0:16
podcast and happy to say that
0:16
the first episode back is a
0:19
really great one. I've got some
0:19
more guests lined up too. So
0:21
there shouldn't be such a long
0:21
wait for the next episodes. To
0:25
make a long story short, the
0:25
reason I didn't record any
0:27
episodes lately is that I moved
0:27
back to Australia with my family
0:30
at the start of this year. There
0:30
has been a lot going on. I moved
0:34
to the US when I was twenty
0:34
three with nothing but two
0:36
suitcases. It was easy. I
0:36
thought nothing of it. But
0:39
moving with my whole family at
0:39
this stage of life is a whole
0:41
other thing. We're now living in
0:41
Brisbane, which is where my
0:44
parents and many extended family
0:44
members live. And it's lovely to
0:46
see them all the time. Instead
0:46
of having to wait years at a
0:49
time. I started a new position
0:49
at the University of Queensland,
0:53
where I'm in the School of Health and Rehabilitation Sciences. There are great new
0:54
colleagues here who I'm enjoying
0:58
getting to know and I'm looking
0:58
forward to developing some new
1:00
lines of research. I need to
1:00
build up a new lab. So if you're
1:04
a potential grad student or
1:04
postdoc, please don't hesitate
1:06
to get in touch. I really miss
1:06
my friends and colleagues at
1:09
Vanderbilt, which was an awesome
1:09
place to work. But fortunately,
1:12
we're continuing to collaborate
1:12
on our Aphasia Recovery Project,
1:15
even though I'm living on the
1:15
other side of the world. Okay,
1:18
on to the podcast. My guest
1:18
today is Alex Huth, Assistant
1:22
Professor of Neuroscience and
1:22
Computer Science at the
1:24
University of Texas, Austin.
1:24
Alex uses functional imaging and
1:28
advanced computational methods
1:28
to model how the brain processes
1:31
language and represents meaning.
1:31
He's done a series of extremely
1:35
elegant studies over the last 10
1:35
or so years. And we're going to
1:37
talk about a few of the
1:37
highlights, including a really
1:40
cool study that is just about to
1:40
come out in Nature Neuroscience
1:43
with first author Jerry Tang, by
1:43
the time you hear this, it will
1:46
be out. Okay, let's get to it.
1:46
Hi, Alex. How are you today?
1:51
Hey, Stephen. I'm doing well. Thanks for having me on your podcast.
1:54
Yeah. Well, thanks very much for joining me, and I'm pretty sure that we've
1:55
never met before, right?
1:58
I don't think so. No.
2:00
Yeah. I mean, do
2:00
you go to a lot of conferences?
2:02
I mean, you
2:02
know, in the before times, I
2:05
went to like, SFN frequently and
2:05
SNL every now and then.
2:10
Ok.
2:12
But yeah, hoping
2:12
to start going back again. But
2:14
uh, yeah.
2:15
Yeah, I also
2:15
haven't been to many for a few
2:17
years and when I did, it would
2:17
mostly be SNL, and usually only
2:21
when it was in the US. So, I'm
2:21
not very much out and about, you
2:26
know. And yeah, so where are you
2:26
joining me from today?
2:30
I'm in Austin,
2:30
Texas.
2:32
Yeah.
2:33
Sunny Austin.
2:34
It looks nice. I
2:34
can see outside your window and
2:36
it looks like a beautiful day.
2:37
It is. Spring
2:37
has sprung. Allergens are in the
2:40
air, you know.
2:41
Right. Yeah. I’m
2:41
now in Brisbane, Australia. And
2:44
it's also a beautiful day here.
2:44
First thing in the morning and
2:49
it's apparently fall, which we
2:49
call autumn. I'm just trying to
2:53
get my head back around the
2:53
Aussie lingo. But you know, all
2:56
the seasons are pretty much the same here.
2:59
Lovely. That's that's how I like it. I grew up in Southern California. That's
3:01
par for the course there.
3:03
All right. So you don't know anything about weather?
3:05
I've learned a
3:05
little bit in Texas, which kind
3:07
of surprised me. But yeah.
3:09
Yeah. Cool. So,
3:09
I always like to kind of start
3:13
talking with people by learning
3:13
about like, how they came to be
3:15
the kind of scientist that they
3:15
are and like, basically
3:18
childhood interests and how that
3:18
led you to where you are. So,
3:21
you know, were you a nerdy
3:21
kid? What were you into when you
3:23
were a kid?
3:24
A very nerdy
3:24
kid. I don't know. That's how it goes, right? I was
3:26
very into Star Trek. I was like
3:31
The Next Generation. That was,
3:33
that was my jam. Especially
3:33
Data. Like, I loved Data. He was
3:40
he was just, I don't know, a
3:40
weird robot hero to me. So, I
3:45
don’t know, I got this like kind
3:45
of fascination with artificial
3:48
intelligence based on that, and
3:48
just generally liking science
3:51
fiction. But when I was starting
3:51
college, AI was kind of in a
3:59
slump. It was in a, it was in a
3:59
bad period. This is like the
4:01
early 2000s. So, you know, I
4:01
sort of looked around and it
4:09
seemed like one of the really
4:09
interesting ways forward, if
4:13
we want to build machines that
4:13
think like humans, is to figure
4:18
out how humans think. So, I
4:18
started getting into
4:20
neuroscience. I remember the the
4:20
first neuroscience talk I ever
4:24
saw was Mark Konishi describing
4:24
the binaural auditory
4:29
localization circuit and I was
4:29
like, this is really cool. I
4:32
want to, I want to do this kind
4:32
of stuff. So, I started sort of
4:35
getting into, interested in
4:35
neuroscience through that.
4:40
Okay, so you went to undergrad at Caltech, is that right?
4:40
Yeah, yeah. That's right.
4:42
Okay. And is
4:42
that close to where you're from?
4:45
Like you mentioned Southern California.
4:46
Yeah. Yeah. So I
4:46
grew up a little bit north of
4:49
Los Angeles in a little town
4:49
called Ojai, California.
4:51
Oh, right. Yeah.
4:51
My wife is from Thousand Oaks,
4:54
so I do know that area. Yeah.
4:56
Right next door. Yeah.
4:57
Yeah.
4:58
Yeah, so
4:58
Pasadena was like a little ways
5:00
down the road. And Caltech is a
5:00
good school. I was excited to go
5:04
there. So yeah, I like started
5:04
doing neuroscience stuff. But I
5:05
Yeah.
5:05
really kind of started enjoying
5:08
things when I started doing
5:12
research. So, that was in
5:12
Christof Koch's lab, when he was
5:16
at Caltech. We were working
5:16
with blind subjects, we were
5:20
looking at auditory motion
5:20
perception in blind subjects,
5:23
which was exciting and
5:23
interesting and ended up being
5:27
like, a J Neurosci paper in 2008.
5:31
And then from
5:31
there, there was, there was one
5:34
real moment that was like a sort
5:34
of eye opener, like, changed
5:39
things for me was, Jack Gallant,
5:39
who I ended up doing my PhD
5:43
with. He came to Caltech, and he
5:43
gave a talk, and it was about
5:48
vision. It was about like, V2:
5:48
what does V2 do? How do we
5:51
model V2? But the thing that he
5:51
really talked about was, was
5:57
this approach of just record
5:57
what the neurons do when they
6:04
see natural stimuli. Right, show
6:04
images. Show images that
6:07
actually are things that, you
6:07
know, an animal might see,
6:10
record what neurons do and then
6:10
build models from that. Try to
6:13
figure out, you know, if you get
6:13
thousands of these images, thousands of
6:17
responses from these neurons,
6:17
can you figure out like, what is
6:20
it about the image that makes
6:20
this neuron fire, and something
6:23
about, like, just that
6:23
perspective of doing things, not
6:28
from the very like, controlled
6:28
experiment, kind of deductive
6:35
approach, but doing this, this
6:35
inductive thing, where you just
6:38
kind of lean on computational
6:38
modeling, and say, like, I'm
6:40
going to let the model figure
6:40
out, like, what this neuron is
6:43
actually doing. I, I just became
6:43
like, insanely excited about
6:47
this. So when I was applying to
6:47
grad schools, Berkeley and
6:52
working with Jack was like, one
6:52
of my kind of top interests.
6:55
Yeah. Okay. So,
6:55
you know, I definitely have
6:57
noticed that like you, you've
6:57
leaned in very much to natural
7:00
stimuli throughout your career.
7:00
You even have a paper on it
7:03
with, with Liberty Hamilton,
7:03
specifically about that, but
7:06
like, so that was really
7:06
actually a driving force for you
7:09
from the very beginning, right?
7:09
And so, you know, what did you
7:12
see as the advantages? And was
7:12
there anything you were worried
7:15
about giving up as in like, kind
7:15
of, well, you weren't really
7:18
getting away from controlled
7:18
designs, because you never did
7:21
controlled designs? But like,
7:21
I'm sure you knew about the, you
7:24
know, the literature and what
7:24
were you worried about giving up
7:27
anything as you moved in this
7:27
new direction?
7:32
I had been like,
7:32
kind of familiar with this, and
7:43
learning how to do things there.
7:43
But just the, the idea of this
7:48
natural stimulus thing, where
7:48
it's like, the work of doing the
7:53
science of like figuring out
7:53
what's going on, you kind of
7:56
move it from one place to
7:56
another, or you move it from the
7:58
experimental design kind of
7:58
phase into this modeling phase.
8:02
And I just really liked that
8:02
idea. Of course, you know, there
8:07
are definitely things we give
8:07
up, right? So there are
8:10
correlations in natural stimuli
8:10
and those are hard to break. And
8:15
sometimes you can't tell like
8:15
what is actually causing
8:19
this, this response. And
8:19
sometimes you have to go in and
8:24
like do kind of focused
8:24
experiments to break those
8:26
correlations, and then figure
8:26
out like, what is actually
8:28
responsible for, you know, what
8:28
this brain area is doing, or
8:31
what this neuron is doing or
8:31
whatever. But overall, this
8:36
idea of just kind of, like
8:36
replacing this elaborate, like,
8:41
you know, let's test one little
8:41
thing at a time with the big
8:45
picture, like, let's just see
8:45
how the whole system works. And
8:47
then, you know, kind of let it
8:47
sort itself out as we as we
8:50
figure out how to model it. It
8:50
just really, like, resonated with
8:53
me. I love that idea.
8:54
Yeah, well, it's
8:54
worked out really well. So let's
8:57
kind of give our listeners a
8:57
more concrete idea about what
9:01
you're talking about when you're
9:01
talking about these experiments
9:05
using natural stimuli. And like
9:05
I told you in an email, we'll
9:09
kind of go through some of
9:09
your work chronologically,
9:13
because I do think
9:13
each big paper kind of
9:17
builds on the previous ones in a
9:21
very satisfying way. So, I thought
9:21
we might start with the paper
9:25
which appears to be your
9:25
dissertation research, which is
9:29
2012, published in Neuron. Is
9:29
that right?
9:32
Yep.
9:32
And it's called
9:32
‘A continuous semantic space
9:37
describes the representation of
9:37
thousands of object and action
9:40
categories across the human
9:40
brain’. It's not quite a
9:42
language paper, but that's okay.
9:42
Like it’s, it’s, it’s semantics
9:45
and that's, that's close enough
9:45
for me.
9:48
Yeah. Yeah. So I
9:48
can give you kind of the
9:50
backstory of this. So when I
9:50
joined Jack's lab, this is like
9:54
2009. I thought I was going to
9:54
be doing vision. I was
9:58
interested in vision. I was
9:58
like, this is, this is where the
10:00
cool stuff is happening. It’s,
10:00
it’s the vision world. I thought
10:03
like, you know, the interesting
10:03
problem is this mid-level vision
10:06
problem. Like, we know how V1
10:06
works. We know, you know that
10:08
there's FFA and PPA and whatever
10:08
the high level visual areas, but
10:12
like what's happening in the middle, like, what are the transformations that actually
10:14
get us there? But Jack sat me
10:18
down, like when I, when I joined
10:18
the lab, this was just after
10:24
Tom Mitchell's Science paper had
10:24
come out. This 2008 paper, where
10:28
they were building these
10:28
encoding models in a lot of the
10:32
same way as we do, for words. So
10:32
they had shown people individual
10:36
words, and then they had these
10:36
feature vectors for the words
10:38
and they were you know,
10:38
predicting the brain responses
10:41
to the words and so on. And I
10:41
think Jack had seen this and was
10:45
really excited by this. And so
10:45
he sat me down when I first
10:48
joined the lab, and he said, you
10:48
know, I have a plan for you. I
10:53
want you to work on language.
10:53
And I said wait, I don't know
10:55
anything about language like
10:55
I've never done this before. And
10:58
he said, Okay, so we have a good
10:58
collaborator, Tom Griffiths, he
11:01
knows a lot about language,
11:01
you're going to learn this
11:03
stuff, and you're going to do
11:03
it. And I said, Okay, fine,
11:06
let's do it. So, we started kind
11:06
of going down that road, and
11:10
like designing language
11:10
experiments, which were a lot
11:13
crummier than what we ended up
11:13
eventually publishing. But sort
11:18
of along that road, we started
11:18
trying to build models of word
11:24
representation, that was kind of
11:24
what we were interested in at
11:27
the time was like, these really
11:27
like word embedding models. And
11:30
it was early days of word
11:30
embedding models, were mostly
11:33
focused on things like LSA, that is,
11:33
latent semantic analysis, which
11:36
is, like 1989, or something like
11:36
this right? Sort of classic word
11:41
embedding models. So, I was
11:41
building these models and I was
11:46
collecting corpora of text and
11:46
doing this kind of early
11:48
computational linguistic stuff.
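For readers who want a concrete picture of the LSA-style word embeddings Alex mentions, here is a minimal Python sketch: build a word-by-document count matrix from a toy corpus and take a truncated SVD, so each word gets a low-dimensional vector. The corpus and dimensions are invented for illustration, and classic LSA also applies a weighting step (e.g., log-entropy) that this sketch skips.

import numpy as np

docs = ["the dog chased the cat",
        "the wolf is like a dog",
        "cars drive on city roads"]
vocab = sorted({w for d in docs for w in d.split()})
row = {w: i for i, w in enumerate(vocab)}

# Word-by-document count matrix.
counts = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        counts[row[w], j] += 1

# Truncated SVD: rows of U, scaled by the singular values, are word vectors.
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
embeddings = U[:, :k] * S[:k]
print(embeddings[row["dog"]])  # a k-dimensional meaning vector for "dog"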
11:48
And then, we didn't have good
11:53
fMRI data for that yet. So we
11:53
thought like, how do I actually
11:58
test this? How do I figure it
11:58
out? Figure out if it's actually
12:00
working. And what we did have in
12:00
the lab was a lot of data from
12:04
people watching movies. Data
12:04
from people like looking at
12:07
images. So we just
12:08
thought, like hey, let's, let's
12:08
Yup.
12:08
try this out on somebody
12:10
watching a movie. Like, you
12:13
know, if we're looking at
12:13
images, right, we have some
12:15
labels of the things that are in
12:15
these images. Let's just plug
12:18
that into a word embedding model
12:18
and see if it works. And it
12:21
worked really well, it turned
12:21
out. It was it was quite
12:23
effective. And so this kind of
12:23
led, led me down this alternate
12:27
path of like, fitting these
12:27
kinds of models, we ended up
12:31
actually kind of changing things
12:31
up quite a bit instead of word
12:33
embeddings, we were using these
12:33
WordNet labels, which are…
12:38
Yeah, I mean,
12:38
because in that first paper,
12:42
it's not trivial how you would
12:42
go from these movies that you
12:44
had people watching these little
12:44
movie clips, to, you know, your
12:48
semantic representations, right?
12:48
I mean, that's actually not
12:50
trivial. Can you, can you can
12:50
you sort of describe how that
12:54
works, just so that people can
12:54
understand the, the paper?
12:58
Yeah, yeah. So,
12:58
the experiment was we had people
13:01
watch a little more than two
13:01
hours of natural video, natural
13:06
movies. This was a stimulus set
13:06
that I actually didn't design.
13:10
This is designed by Shinji
13:10
Nishimoto, who had a fantastic
13:14
paper in 2011, that was decoding
13:14
video from, from human visual
13:18
cortex, which is, like, just an
13:18
incredibly impressive thing,
13:21
even whatever it is, like 12
13:21
years later, I'm blown away by
13:24
that work. And a lot of like,
13:24
his ideas from that have carried
13:28
forward into our current work.
13:29
Right.
13:29
But, um, so we
13:29
had this data around, we just
13:33
didn't know like, what was actually happening in these movies, right, we needed to know
13:34
like, what is this a movie of,
13:38
so like, so we can use these
13:38
techniques.
13:41
Yeah, I mean, as a human you know what the movie is of, but you need to quantify
13:43
it for your models, right?
13:45
Exactly, exactly. We need to turn this into numbers so that we can
13:46
model it, right? So, I started
13:51
exploring, like, you know, do I
13:51
use Mechanical Turk to label
13:55
this or something. And we tried
13:55
that, and it ended up being just
13:58
kind of messy, and the results
13:58
were not great. So we ended up
14:02
doing a thing, which I, I tell
14:02
my students the story all the
14:06
time, because I think it's a,
14:06
it’s an example of like, just be
14:09
a little bit less lazy than the
14:09
next person and things can work
14:12
out for you. I just sat down and
14:12
labeled all the movies myself.
14:15
So it took I don't know, like,
14:15
two months or something, just
14:18
like spend an hour, an hour or
14:18
two a day like labeling, you
14:24
know, a few dozen seconds of
14:24
movie or something like this. So
14:26
we labeled it one second at a
14:26
time. Each second, we like wrote
14:29
down, I wrote down like, what
14:29
are all the things, and like,
14:33
are there verbs that are happening or are there actions that are happening? So…
14:37
Yeah.
14:37
This process took a while.
14:38
I thought so! (Laughter)
14:39
But I felt
14:39
confident in the labels because
14:43
I did them. And I knew that
14:43
they're like consistent across
14:46
the whole sets, unlike the
14:46
Mechanical Turk labels, which
14:49
were very again, like messy in a
14:49
lot of ways. So, once we had
14:54
these labels, then, we could
14:54
very easily kind of fit these
14:58
models so we can convert, you
14:58
know, if a, if a scene has like
15:02
a, you know, a dog in it. We
15:02
know that there's a dog there,
15:07
we can convert that into some
15:07
numerical vector that says,
15:09
like, what is the dog? And then
15:09
we can fit these regression
15:12
models that predict the fMRI
15:12
data, predict how the brain
15:15
responds based on these vectors
15:15
of like semantic information.
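To make that pipeline concrete, here is a toy Python sketch of what Alex describes: each second of movie becomes a binary category vector, and a linear regression maps those vectors to per-voxel responses. The labels and data below are simulated stand-ins, and the actual study used far more time points, regularized regression, and the delayed features discussed later.

import numpy as np

# Toy labels: one list of WordNet-style categories per one-second clip.
labels_per_second = [["dog.n.01"], ["dog.n.01", "car.n.01"], ["wolf.n.01"],
                     ["car.n.01"], ["dog.n.01"], ["wolf.n.01", "car.n.01"]]
vocab = sorted({lab for sec in labels_per_second for lab in sec})
col = {lab: i for i, lab in enumerate(vocab)}

# Indicator design matrix: seconds x categories, 1 if the category is present.
X = np.zeros((len(labels_per_second), len(vocab)))
for t, sec in enumerate(labels_per_second):
    for lab in sec:
        X[t, col[lab]] = 1.0

# Simulated BOLD data: seconds x voxels.
rng = np.random.default_rng(0)
bold = rng.standard_normal((len(labels_per_second), 10))

# Per-voxel least squares fit (the real work uses regularized regression).
weights, *_ = np.linalg.lstsq(X, bold, rcond=None)
print(weights.shape)  # (categories, voxels): one weight per feature per voxel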
15:19
Yeah, okay. Just
15:19
quick meta point, like, I really
15:23
hear you about, like, the time
15:23
the laboriousness right, when I
15:26
read the paper, I was like, that must have taken a really long time, and I wondered,
15:28
did he do that himself? Or does
15:31
he, like, browbeat somebody into
15:31
doing it. But it's so true,
15:34
right? You have to especially
15:34
like, early in your career, like
15:36
you really do sometimes have to, like, suck it up and do something really time-consuming.
15:38
And I've done that a bunch when
15:42
I was, you know, earlier in my
15:42
career, and now like, whenever I
15:45
give my students some, some
15:45
awful task, I sort of have this
15:49
like, you know, pocketful of things I can tell them: well, you know, when I was a kid, I
15:51
did this, I carried my father
15:54
on my back through the
15:54
snow. So you can, you know,
15:57
transcribe this speech sample.
15:59
Exactly,
15:59
exactly. There's, there is a lot
16:03
of value in that. I think, like,
16:03
a lot of people just see
16:05
something hard and like, give up
16:05
or like, can I find some
16:08
shortcut around this? When just
16:08
sitting down and doing it, oftentimes,
16:11
is not that bad.
16:11
Yeah, sometimes
16:11
you just need to do it. Yeah. So
16:16
okay, so you've got the movies,
16:16
you've kind of like, you know,
16:18
you've labeled them with words
16:18
as to what's in them, and then
16:20
you and then you mentioned you
16:20
from those words, you get
16:23
vectors of numbers that are
16:23
going to describe the meaning of
16:25
the words. So that's what you
16:25
know, that's kind of an encoding
16:29
like a, what do you call it? A…
16:31
An encoding model.
16:31
Encoding model.
16:31
So can you describe how you get
16:34
that vector of numbers like, for
16:34
those who have not seen that
16:38
approach before?
16:39
Yeah. Yeah. So,
16:39
for this paper, we used
16:45
WordNet, which some people
16:45
might be familiar with. It's
16:50
essentially a dictionary. So we
16:50
have these unique entries that
16:54
are tied to the definition of
16:54
like this entry. Which, which is
17:00
nice because then, this
17:00
disambiguates like words that
17:03
have different senses, right? So
17:03
dog can be a verb, it can be a
17:07
noun, there's like 10 different
17:07
senses of dog, the noun. But you
17:11
know, I know that like dog.n.01
17:11
is the label I use for
17:14
like, this is a, you know, a
17:14
mammal, of the canid family,
17:17
whatever.
17:18
Yeah.
17:18
So, with WordNet
17:18
you get that kind of sort of
17:23
detail, but then you also get
17:23
information about what kind of
17:27
thing that is. So WordNet
17:27
contains a lot of these hyponymy
17:32
relationships. So like, a dog is
17:32
a canid, a canid is a carnivore, a
17:39
carnivore is a placental mammal, a
17:39
placental mammal is a vertebrate,
17:44
etc. So you have this kind of
17:44
chain. That's sort of an easy
17:47
one, because it's taxonomy. But
17:47
it covers all kinds of things,
17:50
right? All, all, I don't know
17:50
how many, like, tens of thousands of
17:54
definitions there are in WordNet, but it’s very extensive. So, we could label
17:55
all these things in WordNet. And
18:01
then we could use actually that
18:01
information to help us out a
18:03
little bit. So what we did is,
18:03
one kind of simple thing you
18:08
could do, is, say just there
18:08
were 1300 something unique
18:12
labels in this movie dataset.
18:12
So, let's just convert each
18:17
frame or each second of the
18:17
movie into like a length 1300
18:20
vector, that is zeros, except
18:20
for the places that correspond
18:26
to like the categories that are present.
18:27
Right. So say
18:27
there's a dog or not dog.
18:30
Right, right. So there'll be a one if there's a dog in the scene and zero if
18:32
there's not. So that's fine.
18:36
That's that's an OK model. It
18:36
turns out, it doesn't work
18:40
terribly well as a model.
18:40
Because it doesn't have a lot of
18:44
information that is actually
18:44
quite important. Right, so like,
18:48
say, there's a dog in some
18:48
scenes, and a wolf in other
18:51
scenes, right? If you just had
18:51
these as like one zero labels,
18:55
then your model has no idea that
18:55
like a dog is like a wolf. You
18:58
have to separately fit weights
18:58
for, you know, how does the
19:01
brain respond to dog? And how
19:01
does the brain respond to wolf?
19:04
Which is kind of inefficient.
19:04
But also it means that you can't
19:09
like, generalize to new
19:09
categories that you didn't see.
19:12
Right? So if your training data
19:12
contains a dog, but then you're
19:15
testing this model on some video
19:15
that contains a wolf, if you had
19:18
no wolf training data, you just
19:18
wouldn't be able to predict that at all.
19:21
Right.
19:21
But if your model knew that, like, a wolf is actually a lot like a dog, these
19:24
are very similar things, then
19:26
maybe you could guess that the
19:26
response to wolf should be like
19:29
dog, and then everything works
19:29
better. So what we did in this
19:33
model is essentially just add
19:33
these hypernyms. So we extended
19:38
it from like the 1300 categories
19:38
that were actually labeled to
19:41
1705 categories, I think that
19:41
were the full set, which
19:46
included all the hypernyms of
19:46
the labels that we had. So
19:49
instead of, you know, if the
19:49
scene just had a dog in it,
19:52
instead of just having one sort
19:52
of indicator that there was a
19:56
dog there, there would also be
19:56
an indicator that there's a
20:00
canine, that there's a carnivore,
20:00
a mammal.
20:03
A mammal. Yeah.
20:03
Yeah.
20:05
Which, like
20:05
later, we kind of actually
20:09
worked out what this kind of
20:09
meant mathematically in an
20:12
interesting way. In that, this
20:12
really actually kind of
20:15
represented, we could think of
20:15
it as like a prior on the
20:19
weights in the model, that we
20:19
kind of push closer together,
20:24
weights for categories that are
20:24
related. So, it would kind of
20:28
enforce that, like the response
20:28
to Dog and Wolf should be
20:31
similar.
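Listeners who want to try that hypernym expansion can sketch it with NLTK's WordNet interface (a real API; run nltk.download("wordnet") once first). The indicator-vector step is a toy illustration of the idea, not the paper's exact code.

from nltk.corpus import wordnet as wn

def with_hypernyms(synset_name):
    """A synset plus every hypernym above it in the WordNet hierarchy."""
    syn = wn.synset(synset_name)
    return [syn] + list(syn.closure(lambda s: s.hypernyms()))

for s in with_hypernyms("dog.n.01"):
    print(s.name())  # dog.n.01, canine.n.02, ... on up the hierarchy

# Switching on all of these in the indicator vector means "dog" and "wolf"
# share features through their common ancestors, so their fitted weights
# get pulled toward each other.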
20:32
Yeah, by having
20:32
a common covariate really with
20:36
them. Right?
20:37
Exactly. Yeah.
20:37
It turns out that that model
20:40
works much better. Like it's
20:40
much better, like predicting
20:43
brain responses. So, this is
20:43
another sort of critical part of
20:45
the kind of natural stimulus
20:45
paradigm, I think, is that we
20:49
build these models on natural
20:49
stimuli and then we also test
20:53
them by predicting brain
20:53
responses to like new natural
20:56
stimuli, which I think, you
20:56
know, I argue this strongly in
20:59
some papers. I think this is
20:59
really kind of a gold standard
21:03
for testing theories of how the
21:03
brain does things. Right? It's
21:08
like, we want to understand how
21:08
the brain processes visual
21:11
information, or how it processes
21:11
language. Let's just record
21:14
like, what happens when the
21:14
brain is processing language and
21:17
then let's say how well can we predict that?
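That gold-standard test boils down to a per-voxel correlation between predicted and actual held-out time courses. A minimal sketch, with simulated arrays standing in for real predictions and data:

import numpy as np

rng = np.random.default_rng(0)
actual = rng.standard_normal((120, 1000))              # held-out BOLD: time x voxels
predicted = actual + rng.standard_normal((120, 1000))  # stand-in model predictions

def voxelwise_corr(a, b):
    a = (a - a.mean(0)) / a.std(0)
    b = (b - b.mean(0)) / b.std(0)
    return (a * b).mean(0)  # Pearson r for each voxel

r = voxelwise_corr(actual, predicted)
print(r.shape, r.mean())  # one prediction score per voxel, mappable on cortex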
21:19
Yeah.
21:19
How well can we
21:19
guess how the brain is going to
21:23
respond to this thing? So….
21:25
It’s fascinating
21:25
how quantifiable it is, right?
21:27
You know whether you're understanding things or not?
21:30
Yeah, it gives
21:30
you a score to just like, make
21:33
it go up. Right? You can keep
21:33
tweaking things and make that….
21:35
And I think
21:35
that's what we're going to see,
21:37
as we discuss some of these
21:37
papers today. Your models keep
21:39
on getting more and more
21:40
sophisticated, right? And so
21:43
this is a pretty old school
21:43
model at this point, this paper
21:46
is ten years old, maybe even
21:46
older, when you actually did the
21:48
work. We've come a long way
21:48
since then. But I do want to
21:51
start here, because a lot of the
21:51
concepts kind of run through it
21:54
all, and just the models get better.
22:00
Yeah, yeah, absolutely. And I think they get better in a way that is quantifiable to us.
22:04
Which I think
22:04
is very nice. Because
22:04
Yeah.
22:04
it's easy in a lot of ways. You
22:07
know, to think of like, you
22:12
know, for studying some
22:12
psychological phenomenon, we can
22:14
say like, here's a simple model, and then we're going to elaborate, elaborate and
22:16
elaborate that model. And maybe
22:19
it predicts like other little
22:19
things about it, but it becomes
22:23
unwieldy in a way. It's like,
22:23
it’s unclear with like big,
22:25
elaborate models. How well do
22:25
they actually explain things?
22:29
Whereas here, we have this like
22:29
quantifiable metric, right? We
22:31
can say like, it makes the
22:31
number go up, that makes our
22:34
prediction of how the brain
22:34
actually responds to real things
22:37
that we care about, better.
22:37
Yeah. Very cool.
22:39
Okay. So, you've kind of
22:43
explained how you get the
22:43
numbers, how you turn the movies
22:46
into numbers, that represent
22:46
their meaning. Now, I can't
22:50
remember if you said, you had, I think you have two hours of movies in the study. Is that
22:51
right?
22:54
That’s right.
22:54
And you don't have a huge number of participants. I think you have
22:55
five participants. One is
23:00
yourself and you've got another
23:00
couple of co-authors in there. I
23:03
noticed there is a participant
23:03
called Jay G. But on careful
23:06
examination, it is not Jack
23:06
Gallant.
23:09
It is not. That's right.
23:11
Did you try to
23:11
get him into the scanner to be
23:13
one of the….(Laughter)
23:17
Yeah, he was scanned for some things, but not for this one.
23:19
So, a substitute JG in his place?
23:22
Yeah. That’s right.
23:22
So yeah, kind of
23:22
the classic psychophysics
23:25
tradition of you know, small
23:25
numbers of participants, most of
23:29
whom are the authors of the study, because that's who will tolerate two hours of scanning.
23:34
Yeah. Yeah.
23:35
Two hours might
23:35
seem like a lot, but it's going
23:37
to get more. So, anyway, how
23:37
many explanatory variables do
23:40
you end up having, at the end of
23:40
the day, when you fit that to
23:43
your data? Like it must be well
23:43
over a thousand, right?
23:45
Yeah, yeah. So,
23:45
the feature space in that paper,
23:52
goes to 1705 parameters. And
23:52
then we also do a thing where,
23:58
you know, if you're if you're trying to predict how the brain is responding to this, like
24:00
ongoing natural movie, you also
24:02
need to capture the hemodynamic
24:02
response function. Right? And
24:07
the standard way to do this, is
24:07
just to like convolve your
24:09
design matrix, which would be,
24:09
you know, the 1700 dimensional
24:11
thing with, with a canonical
24:11
HRF. But, you know, thing that
24:17
they had found in Jack's lab,
24:17
before I even got there, this is
24:20
really, I think, work by
24:20
Kendrick Kay, that showed this
24:23
very nicely. That doesn't
24:23
actually work terribly well,
24:26
especially if you have this
24:26
opportunity to like measure how
24:30
well are you actually doing? Like, how are you predicting? And it turns out that using the
24:33
canonical HRF is kind of bad, or
24:37
you're leaving a lot of like
24:37
variance on the table. So, we
24:42
use this approach the finite
24:42
impulse response model, where
24:46
essentially we fit separate
24:46
weights for each feature for
24:51
several different delays. So
24:51
we're kind of fitting a little
24:53
HRF for each feature. So…
24:54
Yeah.
24:54
For the dog
24:54
feature, we get a little HRF and
24:57
so on.
24:58
Yeah. Okay, you
24:58
have four different delays,
25:00
like two seconds each. So you're basically modeling the first eight seconds and letting
25:02
the response take any shape it
25:05
does in that time.
25:06
Exactly. I think it’s really actually three delays. It was like, four, six
25:08
and eight seconds.
25:11
Okay.
25:12
We expanded to
25:12
four delays later for language.
25:14
Because language actually has
25:14
earlier, earlier kind of take
25:17
off, it turns out. Visual cortex
25:17
has the slow HRF. Which, it’s
25:23
kind of weird when you think about it because the canonical HRF is built based on like V1,
25:25
and it turns out V1 doesn't have
25:29
like the most standard HRF
25:29
across the brain. It's quite
25:33
different. Auditory cortex has
25:33
like a very short HRF in
25:35
comparison. Motor cortex has a
25:35
different sort of style of HRF.
25:39
Yeah.
25:39
There’s different things happening everywhere. But using this kind
25:40
of method, it blows up your
25:44
parameter space.
25:44
You have to multiply it by three in this case,
25:47
Exactly. Yeah.
25:47
The 1705 times three features.
25:51
Yeah.
25:51
But it still
25:51
works better than using the
25:54
canonical HRF. Because you're
25:54
capturing all these sort of
25:57
details in, in the hemodynamic
25:57
responses across the brain.
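A minimal sketch of the FIR trick just described: stack time-shifted copies of the stimulus matrix so every feature gets its own weight at every delay. Shifts of 2, 3, and 4 samples at a 2-second TR give the 4-, 6-, and 8-second delays; the shapes here are toy values, not the study's.

import numpy as np

def make_delayed(X, delays):
    """Stack shifted copies of X: (time, features) -> (time, features * delays)."""
    out = []
    for d in delays:
        Xd = np.zeros_like(X)
        Xd[d:] = X[:-d]  # the stimulus at time t predicts BOLD at time t + d
        out.append(Xd)
    return np.hstack(out)

X = np.random.default_rng(0).standard_normal((300, 1705))  # toy feature matrix
X_delayed = make_delayed(X, delays=[2, 3, 4])              # 4, 6, 8 s at TR = 2 s
print(X_delayed.shape)  # (300, 5115): 1705 features x 3 delays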
26:02
Yeah, I mean, I'd love to talk about HRF. I could talk about HRF for like an
26:03
hour with you, but we
26:07
probably should focus on
26:07
language. (Laughter). But yeah,
26:10
no, I mean, the relevant
26:10
thing here is that it turns
26:13
what's already a large number of
26:13
explanatory variables into a
26:15
very large number. So, are there
26:15
issues with fitting linear
26:19
models that have 5000
26:19
explanatory variables? Or do you
26:22
have enough data to do that?
26:24
So, there are definitely issues. There’s always issues. There’s always
26:25
issues in fitting linear models. I don't know. This is, in the
26:27
time that I was in Jack's lab,
26:32
I'd say maybe a good like, 80%
26:32
of everyone's time and effort
26:37
was devoted to this question of
26:37
like, how do we fit better
26:39
linear models? Like that was,
26:39
that was really central to like,
26:43
everything we did. It's weird,
26:43
because like, that didn't look
26:46
like the scientific output.
26:46
Right? It's like, we didn't
26:48
publish a lot of papers about like, how do you fit these linear models? It just ended up
26:50
being like a tool that we used.
26:54
But that was a massive amount of
26:54
the like, intellectual effort
26:57
there. It was just like, how do
26:57
we do this well? Yeah. So you
27:01
know, we have to use regularized
27:01
linear regression. That's like a
27:04
super important tool. There were
27:04
a lot of different like styles
27:07
of this that were used for
27:07
different projects in the lab.
27:09
It turned out for sort of vision
27:09
models, if you're modeling like
27:13
visual cortex, then one style of
27:13
model works terrifically well.
27:17
This like sparse linear
27:17
regression, because visual
27:21
cortex responses are actually kind of sparse, they only care about like one little part of
27:23
the visual field. Whereas, for
27:26
these like semantic models, that
27:26
I was using, that actually
27:30
didn't work that well, which was
27:30
surprising. And the thing that
27:33
worked really well was ridge
27:33
regression, and that's been kind
27:35
of the mainstay of everything we
27:35
do since then. So this is an L2
27:40
regularized regression. I don’t know. We can….
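A sketch of that ridge step using scikit-learn; the alpha grid and array shapes are assumptions for illustration, and encoding-model code often searches for a separate alpha per voxel rather than one global value.

import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
X_train = rng.standard_normal((300, 5115))  # delayed stimulus features
Y_train = rng.standard_normal((300, 1000))  # BOLD data: one column per voxel

# L2-regularized regression; RidgeCV picks the penalty by cross-validation.
model = RidgeCV(alphas=np.logspace(0, 4, 10))
model.fit(X_train, Y_train)
print(model.alpha_, model.coef_.shape)  # chosen alpha, (voxels, features)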
27:42
It’s definitely
27:42
interesting. Yeah. And I think
27:44
that the big picture point is
27:44
clear. And it's just a kind of
27:48
a, it's just an interesting
27:48
general point of just how much
27:51
of science often is just these
27:51
behind the scenes details that
27:54
make or break the papers. And
27:54
when you read the paper, it just
27:58
basically says, we did L2
27:58
regularize regression and what
28:01
the reader doesn't always know
28:01
is that like, that was a year of
28:05
pain to get to that or, you
28:05
know. So that's very
28:08
interesting. I always, it’s just
28:08
always fascinating to hear about
28:11
the process behind these papers.
28:11
Because like, I think a good
28:15
paper just reads, it reads
28:15
like that was the most obvious
28:18
thing in the world to do. But
28:18
like, sometimes it wasn't
28:22
obvious at all. Okay, so you fit
28:22
models, at every voxel, you fit
28:25
these models and have 5000
28:25
explanatory variables based on
28:29
the semantic feature
28:29
representation and flexible HRF.
28:32
And then what you find is that
28:32
different parts of the brain or
28:35
different voxels have very
28:35
different response profiles. And
28:39
you demonstrate this in the
28:39
paper with a voxel from the PPA,
28:42
which stands for
28:42
parahippocampal place area, and
28:45
another voxel in the precuneus.
28:45
Can you kind of talk about the
28:49
different responses that you saw
28:49
across the brain?
28:52
Yeah, yeah. So
28:52
that's kind of the second stage
28:57
of this style of like encoding
28:57
model science is, you know, we
29:00
can fit the models, we can test
29:00
them, we see they work well. And
29:03
then we can say, like, what is
29:03
it that’s actually causing some
29:06
piece of brain to activate?
29:06
Right? Like, what what are the
29:08
features that are important to
29:08
this, you know, this voxel this
29:11
chunk of brain? So, you know,
29:11
one thing we can do is just like
29:15
look at the weights, right? So
29:15
we just pick out a voxel and say
29:18
like, what do the weights look
29:18
like? There are high weights for
29:20
the PPA case. I don't have it in
29:20
front of me, but it's like high
29:25
weights for buildings and cars.
29:25
So it like it likes sort of
29:29
seeing things that are
29:29
constructed, and constructed things in
29:32
motion. Whereas the precuneus
29:32
voxel, I think it's much more
29:35
selective for like other people
29:35
and animals and this kind of
29:38
thing. So, you know, we can do
29:38
that we can look in detail,
29:42
quite a bit of detail at like
29:42
individual voxels and say, like,
29:45
you know, what does this voxel
29:45
care about? What is this voxel
29:47
care about? But that has its
29:47
limits. There's a lot of numbers
29:52
to look at there and there's a
29:52
lot of voxels in the
29:54
brain, right? And this is, you
29:54
know, we're not doing this on
29:56
groups of subjects. We're
29:56
fitting this separately on each
29:58
individual subject. Um, there’s
29:58
a lot of voxels to look at. So,
30:03
what we did instead was, we kind
30:03
of tried to summarize the
30:06
weights by reducing their
30:06
dimensionality. So, we just
30:10
applied like a standard sort of
30:10
machine learning data mining
30:14
technique, Principal Component
30:14
Analysis, use that to squeeze
30:17
down these things from 1705
30:17
weights, averaged across the
30:21
HRF, down to just three or four
30:21
dimensions. And say, like, you
30:25
know, what are the major kind of
30:25
axes of variation across,
30:28
across the brain? Right? If we
30:28
had to summarize what these
30:31
voxels are doing, like three or
30:31
four numbers, what does that
30:35
what does that say?
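A sketch of that summarization step: run PCA on the voxels-by-categories weight matrix and keep the top few components. The shapes are toy values, not the study's.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
W = rng.standard_normal((5000, 1705))  # voxels x categories, delay-averaged weights

pca = PCA(n_components=4)
scores = pca.fit_transform(W)           # each voxel summarized by four numbers
print(pca.components_.shape)            # (4, 1705): loadings over categories
print(pca.explained_variance_ratio_)    # tuning variance each axis captures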
30:37
And those, those
30:37
top three or four dimensions,
30:41
capture kind of interpretable
30:41
aspects of semantics, right, in
30:45
this paper, at least
30:46
interpretable ‘ish’. So like, then interpreting those ends up being
30:48
a whole, a whole thing. Like,
30:51
that's, that's difficult and,
30:51
like, contentious and yeah,
30:56
yeah. So, I mean, they do
30:56
separate some things that are
31:00
sort of natural, and we would
31:00
expect to see, so like animate
31:03
versus inanimate categories.
31:03
It's a big distinction,
31:08
essentially, human versus non
31:08
human. I think like, sort of
31:13
buildings, artifacts, and
31:13
vehicles versus other things. So
31:18
these are the kind of like major
31:18
dimensions that we see pop out.
31:22
And one thing we did in that
31:22
paper was also like we compared
31:25
these dimensions to, there‘s
31:25
been a lot of literature in the,
31:31
the field of people looking at
31:31
ventral, ventral temporal cortex
31:34
of like, what are the major
31:34
dimensions of representations.
31:38
So, one of those was animacy. So
31:38
this is from Jim Haxby’s lab.
31:44
Andy Connolly had this great
31:44
work showing that like, there
31:47
seems to be this like gradient
31:47
of animacy, across ventral
31:51
temporal cortex. And that came
31:51
out like very naturally, in our
31:54
data. We really saw that, that
31:54
there was like this, this
31:57
beautiful, you know, animacy
31:57
kind of dimension. Other things
32:01
we found, like less evidence
32:01
for, but yeah, we still got to
32:06
kind of explore that space.
32:07
Yeah, like my
32:07
notes of it. And this is just
32:10
like my notes that I made when I read
32:10
it, which is a couple of weeks
32:14
ago. And I wrote: first
32:14
component, motion/animacy;
32:19
second component, social
32:19
interaction; third component,
32:22
civilization versus nature;
32:22
fourth component, biological
32:25
versus non-biological. So, like
32:25
you said, I'm not like trying to
32:28
hold you to those. But I think
32:28
it's kind of interesting that
32:30
like, these are the big, big
32:30
organizing principles. It kind
32:35
of like a bit of an, it's a
32:35
window into like, what's really
32:38
important for humans, right?
32:38
Like, these are the major axes,
32:41
on which the semantic system is,
32:41
you know, has the most variance.
32:45
Yeah, yeah. And
32:45
that's, that's something that I
32:50
really like about this approach,
32:50
too, right? Is that, like, we're
32:52
not taking those things as given. We’re not baking
32:55
those things into our
32:55
experimental design.
32:56
No.
32:56
We're just
32:56
saying, like, watch a bunch of
32:59
videos, and let's see what falls
32:59
out. Right? Like, yeah, what are
33:02
the differences across the brain? What are the major distinctions the brain makes?
33:05
Yeah. And the brain did. And you were surprised in this paper that the brain
33:07
didn't care about object size,
33:10
which actually, maybe, is not
33:10
that surprising. Like, maybe it
33:14
shouldn't, right?
33:16
Maybe it shouldn't. That was one of these, like, sort of proposals, that the
33:17
major organizing kind of
33:21
feature of visual cortex was
33:21
object size, and we found less
33:25
evidence for that. I don't know
33:25
what the current kind of status
33:28
of that theory is.
33:29
Yeah. Me
33:29
neither. Okay, so like, just
33:33
kind of big picture was like,
33:33
you know, can you describe,
33:36
like, the what, which brain
33:36
areas were responsive to these
33:40
semantic distinctions in this study?
33:43
Yeah, so big
33:43
picture. You know, what we saw
33:48
here was that this sort of
33:48
semantic selectivity for visual
33:53
concepts, was not isolated to
33:53
just like these few areas in
33:58
higher visual cortex. Which is
33:58
kind of the picture that we had,
34:02
loosely, from a lot of studies
34:02
that came before this, right?
34:05
So, we knew about like, the
34:05
place selective areas, we knew
34:08
about face selective areas, body
34:08
selective areas. And those were
34:12
kind of the major, you know,
34:12
sets of things that we knew
34:15
about.
34:15
Yeah.
34:16
And those turned
34:16
out to be quite important and
34:19
very clearly, like come out in
34:19
this kind of data. But we think
34:22
of those more as like, you know,
34:22
peaks in a terrain, right? So
34:28
there's this actually
34:28
complicated terrain, kind of all
34:31
across higher visual cortex.
34:31
Like if you go outside of
34:33
retinotopic visual cortex,
34:33
there's kind of a band of, of
34:36
cortex that stretches around
34:36
that, like all the way around
34:38
the brain. And all across this
34:38
band, we see selectivity for
34:41
something for like some kind of,
34:41
you know, specific semantic
34:46
features. So that's kind of the
34:46
majority of where this happens
34:49
is sort of in this band, like
34:49
around higher visual cortex. We
34:54
also see quite a bit of stuff
34:54
happening. You know, there's
34:57
there's kind of these weird, like, spears that come out of visual cortex. So, up to the
34:59
pSTS, there is some visual
35:04
representations up there,
35:04
through the intraparietal sulcus
35:08
and sort of onto the medial
35:08
surface of the cortex, there’s
35:12
another kind of spear of visual
35:12
cortex. And then up in
35:16
prefrontal cortex, there's also
35:16
like a few selected areas that
35:20
are quite visually selective. So
35:20
there's some face patches and
35:22
like the frontal eye fields are
35:22
very visually responsive.
35:26
It's funny, you're calling them visual, but like, don't you think they're
35:27
semantic?
35:31
Yes. (Laughter)
35:31
Because all of these things, you
35:35
know, we've seen later are like,
35:35
you know, they don't really care
35:38
that it's vision per se. That's
35:38
not quite true. So, FEF does
35:43
care that it’s vision. FEF kind
35:43
of only responds to visual stuff.
35:46
Frontal eye field. Yeah.
35:48
Yeah. Intraparietal
35:48
sulcus, it’s quite visual. So,
35:51
that's kind of a gap in the
35:51
other maps that we see. Well,
35:55
a lot of this stuff in higher
35:55
visual cortex, especially, we
35:58
call it visual cortex, because
35:58
that's how we were studying it,
36:01
but it turns out that that
36:01
overlaps, like very heavily with
36:05
other modalities of
36:05
representation.
36:09
Cool. Yeah, I want to talk more about like, the anatomy of the areas that
36:11
you find, but maybe best in the
36:15
context of the next paper where I think it comes out more clearly. So shall we move on to
36:17
the next one now?
36:21
Sure thing.
36:22
So this is your
36:22
2016 Nature paper. And, you
36:27
know, I've always noticed that,
36:27
like, Nature doesn't publish
36:30
fMRI, you know? Like Science
36:30
does, or they did you know,
36:34
like, back in the heyday of
36:34
fMRI, where, you know, every man
36:37
and his dog was getting, you
36:37
know, these high profile papers,
36:40
it was only Science that was
36:40
buying the Kool-Aid, you know?
36:44
Nature, the only fMRI papers
36:44
they ever published was like,
36:49
sorry, I'm blanking on the one,
36:49
Logothetis et al., where, you
36:55
know, they actually do
36:55
simultaneous fMRI and, you know,
37:00
direct cortical recording. I
37:00
mean, that was good enough for
37:02
Nature. But generally speaking,
37:02
Nature does not publish fMRI.
37:05
So….
37:05
We were pleased that this happened.
37:06
Congratulations.
37:06
Yeah. If any paper was going to
37:11
be in Nature, I think this is a
37:11
worthy, a worthy one.
37:15
Thank you. So,
37:15
it was, it was a big effort,
37:18
yeah.
37:19
Definitely. So
37:19
in this one, like, it's
37:21
definitely a bit more language
37:21
than the other one. So, because
37:24
the stimuli are language, right?
37:24
So, can you tell us about the
37:26
stimuli?
37:28
Yeah, yeah. So
37:28
this was, you know, what I said
37:32
before is like, you know, I kind
37:32
of started off, you know, we
37:34
wanted to do language, we wanted
37:34
to do encoding models for
37:36
language. I did this kind of
37:36
offshoot project into vision
37:39
that was kind of using our
37:39
models that we were designing to
37:43
study language to do
37:43
vision. But at the same time, we
37:45
were sort of continuing down
37:45
this path of doing language. So,
37:49
we’d done some, some tests with
37:49
other kinds of stimuli. So, we
37:52
tried things like having people
37:52
read sentences with RSVP, just
37:57
like one word at a time. That
37:57
didn't work terribly well, it
38:01
turns out, it does work fine. We
38:01
were just kind of doing things poorly at the time.
38:03
It teaches us something about prosody, the fact that that's not going to work.
38:04
We were worried about timing. So we were worried about, you know, like,
38:07
when did the words happen? So,
38:13
my co-author, Wendy de Heer and
38:13
I, we really, like developed
38:16
this experiment together. We
38:16
spent a lot of time doing things
38:19
like recording ourselves
38:19
speaking stories, at a rate of
38:24
one word per second. (Laughter)
38:24
Which is the most mind
38:29
bendingly, like awful thing to
38:29
listen to. Like imagine, it's
38:34
shocking, like just how boring
38:34
like, verging on like painfully
38:38
boring, that is. It’s awful.
38:44
Right? So, we
38:44
did these, like one word per
38:49
second experiments, terrible.
38:49
But it was it was controlled in
38:54
a sense that, you know, we knew
38:54
when every word happened. And
38:58
then Wendy said, why don't we
38:58
just try something like actually
39:01
natural. She had been listening
39:01
a lot to The Moth, which is a
39:05
storytelling, podcast and radio
39:05
show. And she said, why don't we
39:09
try listening to one of these
39:09
stories? And I was like, you
39:13
know, how are we going to deal
39:13
with the timing? And she said,
39:15
Oh, this is the thing that
39:15
linguists have totally figured
39:18
out. You need to transcribe it,
39:18
use this Forced Aligner, you can
39:20
figure out when each word is.
39:20
Fine. And I was like, Okay, if
39:23
you know how to do that, that's
39:23
great. So we did it. We
39:26
collected some data with, I
39:26
think just the two of us
39:28
listening to these Moth stories.
39:28
In fact, just like one Moth
39:31
story to start. And it just
39:31
immediately worked extremely
39:36
well. We got beautiful signal
39:36
quality, like all across the
39:39
brain. We do a sort of pretest
39:39
on a lot of the stimuli that we
39:43
use, which is we just have
39:43
someone listen to the same
39:45
stimulus multiple times. Right?
39:45
We just like, have this person
39:49
listen to the same story twice.
39:52
Intra-subject
39:52
correlation, huh?
39:55
Intra-subject. Yeah, exactly. So it’s not inter-subject correlation,
39:56
intra-subjects. It’s just within
39:59
each voxel, how correlated are
39:59
the responses? Kind of a measure
40:02
of like, how big are the
40:02
functional signals that we're
40:04
getting. Right? We've done that
40:04
for these like, you know, one
40:07
word per second stimuli. It was
40:07
God awful. And then we did it
40:11
for The Moth story, and it was
40:11
just beautiful, like the whole
40:13
brain was trying to respond…
40:14
But you don't
40:14
know, yeah, sorry. You don't
40:16
know yet. What's driving that?
40:16
Right? If you're…
40:19
Right.
40:19
I mean, it could be anything. It could be something really trivial, like,
40:21
you know, auditory amplitude,
40:24
right? But it's not, but you
40:24
don't know yet, right?
40:28
That was actually, what Wendy was interested in. So she was a grad
40:30
student in Frederic Theunissen’s
40:32
lab, they study sort of low
40:32
level auditory processing,
40:35
mostly in songbirds, but now
40:35
also in fMRI. So she was
40:38
interested in sort of building
40:38
these acoustic models, like
40:42
spectral representation models
40:42
of auditory cortex. So, she was
40:45
totally fine with that. Yeah,
40:45
but we didn't know why this
40:49
activation was happening. So, we
40:49
got these maps. We are like,
40:51
this is beautiful. Let's keep
40:51
doing this. So we just collected
40:54
a bunch of data of us at first,
40:54
just listening to these Moth
40:57
stories, listening to lots of
40:57
Moth stories. We collected
41:03
two sessions of five stories
41:03
each of listening to these Moth
41:07
stories. We went through this
41:07
transcription and alignment
41:12
process, which I learned
41:12
eventually. So then, you know,
41:17
we had this information about
41:17
like, every single word in the
41:19
stories, and exactly when all
41:19
those words were, were spoken,
41:23
right? So we had this aligned
41:23
transcript. And then we could
41:27
start to do kind of interesting
41:27
things, right? So, we also have
41:30
phonetic transcripts, we can
41:30
build phonetic models. So, just
41:33
say, you know, our feature
41:33
space is the English phonemes.
41:38
What does each voxel respond to
41:38
in phoneme space? We could build
41:42
acoustic models with this data.
41:42
So get sound spectra for the
41:45
stories, and then use that to
41:45
model the brain data. And we
41:48
could get these semantic models,
41:48
right. So this is using these,
41:53
again, kind of primitive word
41:53
embedding models, latent
41:55
semantic analysis, and so on.
41:55
And those worked very well
41:59
across like a big chunk of
41:59
brain. And that was, that was
42:02
very exciting. That actually
42:02
happened, I think in like 2012.
42:04
So that was around the time that
42:04
the earlier like movie paper was
42:07
coming out when we had these
42:07
first results that were showing
42:10
that this is really starting to
42:10
work and starting to show us
42:13
something.
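A toy sketch of how an aligned transcript becomes a model-ready matrix: sum the feature vector of every word whose onset falls inside each TR. The alignment times and random stand-in "embeddings" here are invented; real features would be LSA-style vectors, phoneme indicators, or sound spectra.

import numpy as np

TR = 2.0  # seconds per fMRI volume
aligned = [("i", 0.3), ("remember", 0.5), ("the", 1.1),
           ("summer", 1.4), ("storm", 3.2), ("hit", 3.6)]
dim = 8
rng = np.random.default_rng(0)
embed = {w: rng.standard_normal(dim) for w, _ in aligned}  # stand-in word vectors

n_trs = int(max(t for _, t in aligned) // TR) + 1
X = np.zeros((n_trs, dim))
for word, onset in aligned:
    X[int(onset // TR)] += embed[word]  # accumulate word features per TR bin

print(X.shape)  # (TRs, feature dims), ready for the delays + ridge steps above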
42:13
Okay, so in this
42:13
case, you're not having to go
42:16
through and laboriously identify
42:16
what's in them, like, because
42:20
previously, you were looking at the movies and saying what is there semantically, right? Like,
42:21
whereas here, you're just
42:25
actually taking the words that
42:25
are transcribable and then
42:28
deriving semantic
42:28
representations for the words
42:33
automatically from that point,
42:33
right? Like…
42:35
Yeah.
42:35
So, it's a lot less labor intensive.
42:38
Definitely. And I mean, the transcription process, I think it's the most
42:40
similar to like the labeling
42:43
process.
42:43
But it’s sort of
42:43
much more deterministic.
42:45
Yeah. It’s much
42:45
easier, much easier. No, no
42:48
judgment calls to be made,
42:48
really. You can listen to
42:50
something at half speed, if
42:50
you're pretty good and just bang
42:53
it out. Yeah.
42:57
Okay. So, yeah,
42:57
I know. And it works a lot
43:00
better, right? Like then,
43:00
compared to the previous paper,
43:04
like the semantic maps here, are
43:04
just a lot more clear and
43:11
expansive, right?
43:13
Yeah, yeah. So,
43:13
where the, you know, the visual
43:17
stimuli really elicited these
43:17
responses, like in this band
43:20
around visual cortex, these
43:20
stories stimuli, we got stuff
43:24
everywhere, got stuff
43:24
everywhere. It was all over,
43:29
sort of, Temporoparietal
43:29
Association Cortex, all the way
43:32
from like, ventral temporal
43:32
cortex, up through sort of
43:38
lateral parietal cortex and down
43:38
into the precuneus. And then
43:41
also, all across prefrontal
43:41
cortex, we found these strong,
43:45
predictable responses to
43:45
language, with these semantic
43:49
features. So, this um, just
43:49
worked really well, kind of
43:53
quickly. One thing that at this
43:53
point that we were like, kind of
43:56
freaked out by, was, we just
43:56
didn't find any asymmetry in
43:59
terms of model performance
43:59
between the two hemispheres, it
44:01
was like the right hemisphere
44:01
was responding just as much or,
44:04
you know, our models were working just as well in the right hemisphere, as in the left
44:06
hemisphere.
44:09
Sorry, when you talk about working, I really want to be very precise when I'm
44:10
talking about this, because it's super interesting. I just wanna
44:12
make sure we understand what
44:15
we're saying exactly. So, when
44:15
you say working well, or you're
44:18
talking about predicting held
44:18
out data, or you're just talking
44:21
about having a lot of variance
44:21
by semantics?
44:24
Yes, yes. Sorry,
44:24
I mean, predicting held out
44:26
data. So we can….
44:28
So, yeah, so you
44:28
can, you're listening to two
44:30
hours of stories and then
44:30
predicting another 10 minutes.
44:34
Right.
44:35
What the brain
44:35
should look like, on a story
44:37
that the model hasn't seen
44:37
before and many voxels in the
44:40
brain are really good at this,
44:40
but not all, and then you make
44:44
these maps of like, where are
44:44
the voxels that are good at
44:46
predicting where you, where
44:46
you're able to predict. Because
44:49
I mean, I guess, I mean,
44:49
forgive me for breaking it down,
44:52
like really simply, but like,
44:52
if a voxel is not able
44:56
to predict held out data, that
44:56
probably means it doesn't have
44:59
semantic representations. Because if it doesn't have semantic representations
45:01
then it wouldn't be able to,
45:03
because why would it? And then
45:03
if it does, then it should,
45:06
right? So, it's kind of a really
45:06
good window into like, where in
45:08
the brain there are semantic representations.
45:11
Exactly. That's
45:11
kind of one of, you just
45:13
verbalized one of the core kind
45:13
of tenets of this kind of
45:17
encoding model style science.
45:17
Which is that, if you can
45:21
predict what a voxel is doing,
45:21
on held out data, using some
45:24
feature space, then we take that
45:24
as evidence that, that voxel’s
45:30
representation is tied up to
45:30
that feature space. That it is
45:32
related to that feature space in
45:32
some way. And of course, there
45:34
can be spurious correlations,
45:34
and we see this and you know, we
45:36
can try to explain those away in
45:36
various ways. But basically,
45:40
that's the kind of inference
45:40
that we try to make. Right? So,
45:44
so we found that like, right
45:44
hemisphere, we could predict
45:47
right hemisphere, just as well, as we could predict left hemisphere, there was no real
45:49
asymmetry in prediction there. I
45:51
remember showing this to another
45:51
grad student when I’d first
45:55
found this and he said, nobody's
45:55
gonna believe you in the
45:59
language world. (Laughter) Like,
45:59
too weird. If you don't find
46:02
left lateralization, like, nobody's gonna believe you. Which, I don’t know, it has
46:05
ended up being very interesting
46:08
in terms of how people think
46:08
about lateralization. So, um,
46:13
Liberty Hamilton, who's a
46:13
longtime collaborator of mine,
46:18
and who I'm married to, she also,
46:18
you know, this is kind of a
46:24
bugaboo that we have together
46:24
is, you know, she's seen in a
46:28
lot of her work in
46:28
electrophysiology, that, right
46:30
hemisphere, auditory cortex,
46:30
like definitely represents
46:33
language stuff, for perception,
46:33
at least. And that's really what
46:38
we saw here, too, is that like,
46:38
for language perception, right
46:41
hemisphere, it was engaged in
46:41
sort of semantic representation
46:45
to the same kind of degree as left hemisphere.
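(To make that encoding-model recipe concrete, here is a minimal Python sketch, with illustrative sizes and random placeholder data rather than the actual pipeline from the paper: fit a regularized regression from stimulus features to every voxel on the training stories, then score each voxel by how well it predicts the held-out story.)

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    X_train = rng.standard_normal((3600, 985))   # stimulus features, ~2 hours of stories
    Y_train = rng.standard_normal((3600, 2000))  # voxel responses (TRs x voxels)
    X_test = rng.standard_normal((300, 985))     # features for the held-out story
    Y_test = rng.standard_normal((300, 2000))    # responses to the held-out story

    # One linear model per voxel, fit jointly; ridge regularization keeps
    # the high-dimensional fit stable.
    model = Ridge(alpha=100.0).fit(X_train, Y_train)
    Y_pred = model.predict(X_test)

    # Score each voxel: correlation between predicted and actual held-out
    # time courses. High-scoring voxels are the ones whose activity is
    # tied up with the feature space.
    def voxelwise_corr(a, b):
        a = (a - a.mean(0)) / a.std(0)
        b = (b - b.mean(0)) / b.std(0)
        return (a * b).mean(0)

    scores = voxelwise_corr(Y_pred, Y_test)  # one r value per voxel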
46:47
Yeah. And so
46:47
why do you, I mean, why do you
46:51
think that is? I mean, I feel
46:51
like, when I first saw your
46:55
paper, that was what jumped out
46:55
at me too. And I struggled with
47:00
it briefly and then I came to
47:00
terms with it. Um, so, it
47:05
doesn't trouble my mental model
47:05
anymore. How do you, how do you
47:09
interpret it? I mean,
47:09
specifically, how do you square
47:13
it with the fact that aphasia
47:13
only results from left
47:16
hemisphere damage?
47:18
Yeah. So, I
47:18
mean, I think there's a broader
47:21
question of like, how we square
47:21
a lot of these results with the
47:24
Aphasia literature, which is
47:24
difficult, right? Because the
47:28
literature says that, there's
47:28
only kind of a small selection
47:30
of areas that if you have
47:30
damaged those areas that it
47:33
causes, you know, this loss of
47:33
semantic information, right?
47:38
Loss of like, word meaning
47:38
information. But we saw these,
47:41
like, much broader, like big
47:41
distributed things all across
47:45
like prefrontal cortex, parietal
47:45
cortex, whatever, this just
47:48
really did not match what people
47:48
had seen in the aphasia
47:50
literature. And, you know,
47:50
especially then for the right
47:54
hemisphere, as well, right, we see it all over the right hemisphere, and that, again,
47:55
just didn't match. So, I think
48:01
of this in kind of two ways. So,
48:01
one is that, you know, what
48:07
we're showing here is kind of
48:07
what types of information are
48:14
correlated with activity in
48:14
these voxels? Right? So, you
48:18
know, if somebody is listening
48:18
to a story, and the story is
48:22
talking about, you know, the
48:22
relationships between people and
48:25
so on, and you're trying to
48:25
process that information, then
48:27
there's a bunch of voxels that become active, there's a bunch of brain areas that become
48:29
active that start to turn on in
48:33
the presence of this kind of
48:33
social information. That doesn't
48:36
necessarily mean that like,
48:36
those are the areas that, you
48:40
know, link, the words that
48:40
you're hearing to the meaning
48:44
that you're hearing. That may be
48:44
downstream of that, and I think
48:46
most of them actually are downstream of that. They’re involved in like, some kind of
48:48
cognition around this concept,
48:51
but not necessarily like, just
48:51
the process of, like, linking
48:55
words to meaning, or linking
48:55
meanings of words together.
48:58
I think the word linking is so critical here. Yeah.
49:02
Yeah. So we kind
49:02
of can't disentangle that here
49:05
and I think that is probably one
49:05
of the real kind of drivers of
49:10
the mismatch with the aphasia literature. And I know this is a popular topic there. The other
49:12
thing that I kind of point to,
49:18
in trying to square this with
49:18
with aphasia, is the fact that
49:22
we see quite a bit of like
49:22
redundant representation across
49:25
cortex, right? We don't see,
49:25
there's just, you know, one
49:28
patch of cortex that cares about
49:28
the social information, or one
49:31
patch of cortex that cares
49:31
about, I don't know, time words,
49:35
for example. That's another category that we saw kind of pop out. Actually, you have
49:37
many patches that care about these
49:41
things, right? We have a whole
49:41
kind of network for each of
49:43
these kinds of concepts, kinds of
49:43
topics. So, I think it's very
49:49
plausible that like, even if
49:49
these areas are really like
49:53
causally involved in
49:53
representing and processing that
49:56
kind of information that
49:56
damaging some of them won't be
50:01
sufficient to actually cause the
50:01
deficits that you would see in
50:04
aphasia.
50:05
Yeah. Okay.
50:05
Yeah. I mean, I definitely agree
50:08
with your interpretation. I
50:08
mean, I think I would put it
50:12
like you are really studying
50:12
thought here, right? In a way,
50:17
like it's, like, downstream of
50:17
these links, right. So I think
50:21
we think with both hemispheres,
50:21
and you know, thought links with language, but the links are in the
50:28
left hemisphere, right? So if
50:28
you took a person with aphasia,
50:31
and did your study on them, like
50:31
a severe receptive aphasia,
50:35
maybe, I think you probably
50:35
wouldn't see those semantic
50:40
representations in the right hemisphere, either, even though the right hemisphere would be
50:42
intact, right? Because it would
50:45
never get there, because the
50:45
links which are left lateralized
50:48
would not exist, and that would,
50:48
and therefore you wouldn't be
50:51
able to generate those
50:51
semantic representations from
50:53
the linguistic input. Right? So
50:53
I actually don't think it's
50:56
inconsistent at all on, on
50:56
second thought, although like,
51:00
you know, before your paper came
51:00
out, I think a lot of us kind of
51:04
thought, oh, yeah, the semantic
51:04
representations are left lateralized,
51:06
but I don't know if I thought, I
51:06
hope I didn't think that because
51:10
it would be silly. Because, you
51:10
know, one thing that I know from
51:13
working with people with aphasia
51:13
is that they understand the
51:15
world that they live in, right?
51:15
They're not like walking around
51:18
confused, and not knowing what's going on.
51:20
Absolutely.
51:21
It's a language,
51:21
I mean, it's a language deficit.
51:25
And the only patients that don't
51:25
understand the world around
51:28
them are neurodegenerative
51:28
patients who have bilateral
51:31
damage, specifically like
51:31
semantic dementia, like when
51:34
it's advanced, or you know,
51:34
Alzheimer's when it's advanced,
51:36
right. But like, with any kind
51:36
of lateralized brain damage,
51:39
you don't really get real
51:39
semantic deficits, like, you get
51:42
like semantic control deficits.
51:42
They can't do semantic tasks,
51:46
maybe for a myriad of reasons.
51:46
But you know, you never get
51:50
somebody that doesn't understand
51:50
what's going on. So it actually
51:55
makes total sense.
51:57
It does.
51:58
It's really,
51:58
it's nice to kind of square it.
52:00
Because yeah, it definitely like
52:00
struck me. That was the thing
52:04
that really leapt out at me was
52:04
how bilateral it was and
52:07
symmetrical.
52:09
Yeah, so it's
52:09
definitely symmetrical in terms
52:12
of what we've been talking about
52:12
of just like how well can we
52:15
predict these areas. But it
52:15
turns out that it's actually not
52:18
super symmetrical in terms of
52:18
exactly what information is
52:21
represented there.
52:22
Oh, really?
52:22
So, we see a
52:22
little bit of a hint of that in,
52:25
in this 2016 paper, where
52:25
there's an asymmetry in terms of
52:30
representation of specifically,
52:30
like concrete words, concrete
52:35
concepts, that seems to be more
52:35
left lateralized than right
52:39
lateralized. So, it's like the
52:39
representations are as strong,
52:43
in a way, but they're just kind
52:43
of different things. In more
52:46
recent work that we've yet to
52:46
publish, but we're very excited
52:50
about which is doing a similar
52:50
kind of thing, like we did in
52:53
this paper, but using more
52:53
modern like language models. So
52:57
we're looking at sort of phrase
52:57
representations instead of word
53:00
representations. We see that
53:00
this asymmetry is really
53:04
pronounced in terms of sort of
53:04
representational timescale,
53:08
where the right hemisphere seems
53:08
to represent sort of longer
53:10
timescale information than the
53:10
left hemisphere.
53:13
That's interesting.
53:15
It’s maybe tied
53:15
into this like concrete abstract
53:17
distinction, which is also sort
53:17
of associated with timescale.
53:20
This is my student Shailee Jain,
53:20
who's working on this. We're
53:24
very excited about what this is gonna show.
53:26
Okay, cool.
53:26
Yeah, actually, the next thing I
53:29
want to talk about is one by
53:29
her, but not the one you're
53:32
mentioning, I think. But um,
53:32
yeah, one more thing about this
53:35
one, before we move on from it,
53:35
well, two more things, actually.
53:39
So, the lateralization might
53:39
have come as a surprise to
53:42
language people, but within
53:42
each hemisphere, the areas
53:48
where you see the semantic
53:48
predictability match pretty well
53:52
on to what the language
53:52
neuroscience community had kind
53:57
of settled on as the semantic
53:57
network of the brain, right?
54:00
Absolutely.
54:00
Yeah. No, I really love this, I
54:04
think it’s the 2009 review paper
54:04
from Jeff Binder.
54:06
Yeah, it's one
54:06
of the most useful papers that I
54:10
go back to again and again. Just
54:10
love that one.
54:13
It's beautiful.
54:13
I love how they break down, you
54:15
know, the different types of
54:15
experiments and you know, what,
54:18
what kind of approaches they
54:18
like and don't like. But there's
54:20
a figure there that I often show
54:20
in talks, which is, it's just a
54:25
brain with little dots on it in
54:25
every place that's been reported
54:28
as like an activation in some
54:28
paper for some semantic task and
54:32
it matches so well what we find in
54:32
this work, right?
54:36
It does.
54:36
The entire prefrontal cortex, the entire kind of parietotemporal cortex.
54:38
I often say this is…
54:42
Midline areas,
54:42
too, you know? You have the same
54:45
midline areas, like you both
54:45
have like, you know, you've got
54:47
your medial prefrontal, and then
54:47
precuneus in the middle. And
54:51
yet, like, it's not obvious at
54:51
first glance, because you guys
54:55
use flat maps right? So
54:55
everything is like kind of
54:57
flattened out. Because,
54:57
this is a way you can tell
55:00
you grew up in vision,
55:00
because like…
55:03
Exactly.
55:03
You use flat
55:03
maps, but as soon as you, like,
55:06
just take a step back, you
55:06
realize, oh, that's just the
55:08
semantic network.
55:10
Exactly. Yeah, I
55:10
don't know, sometimes I say it's
55:13
like, it's easier to say the
55:13
parts of the brain that don't
55:16
represent semantic information,
55:16
which is like somatomotor cortex
55:19
and visual cortex. Like those
55:19
are, those are the big ones.
55:22
There's like two big holes in
55:22
this map. Everything else, kind
55:25
of cares to some degree or
55:25
another.
55:29
Yeah. Okay. Oh,
55:29
yeah. The other thing I wanted
55:32
to talk to you about with this paper,
55:32
like, you know, it's just like a
55:37
masterpiece of visualization
55:37
among all, among everything
55:41
else, right? Like, it's just so,
55:41
the figures are so beautiful,
55:44
and you've got all these like….
55:45
Thank you.
55:46
You’ve got all these three dimensional animations that can be found on
55:47
the web. Can you kind of talk
55:51
about that aspect of it? Like,
55:51
is that something you really
55:54
enjoy? Like, what kind of, how
55:54
did you develop those skills?
55:57
Like, you know, there's definitely a piece there that’s like, pretty special.
56:01
Yeah, I I love
56:01
it. I love visualization. I,
56:05
yeah, I think partway through
56:05
writing this paper, I went to
56:09
a seminar, an Edward Tufte
56:09
seminar. And so I started trying
56:13
to do everything in Tufte style,
56:13
whatever, I love it, I love his
56:16
idea of, what does he call it,
56:16
supergraphics, super
56:22
infographics, like something
56:22
that's just like, incredibly
56:25
dense with information that you
56:25
can stare at for a long time,
56:27
and you keep seeing new stuff. I
56:27
really liked that idea. And so
56:30
that's what I kind of tried to
56:30
replicate here. So a lot of this
56:34
work, really, the
56:34
visualizations are based on a
56:38
software package that we
56:38
developed in Jack's lab,
56:41
pycortex. So this is really led by
56:41
James Gao, who was a grad student
56:45
there with me, like, brilliant
56:45
programmer, like, polymath, he
56:53
can do so many things. So you
56:53
know, he's a neuroscientist, but
56:55
then he's also, you know, able
56:55
to write massive
57:00
amounts of low-level code for
57:00
showing these brain images in a
57:03
web browser. Really just like,
57:03
fantastic stuff. So I worked
57:08
with James on developing this
57:08
pycortex package, and then a
57:11
couple other people in lab,
57:11
especially Mark Lescroart, was
57:13
a big driver of this as well.
57:13
And, you know, so because we
57:18
were also developing the
57:18
visualization package, alongside
57:23
like doing the science, we could
57:23
make it do anything that we
57:26
wanted to do, right, like any
57:26
idea that we had, like, we
57:29
should make it look like this,
57:29
we could just do that, we could
57:31
like spend some time and
57:31
implement that. So I really
57:35
liked the sort of close
57:35
commingling of those two things
57:38
like developing the
57:38
visualization software and the
57:40
science at the same time. And I
57:40
think that's, I think it's nice.
57:44
I think it's powerful.
57:45
Yeah, it's very powerful. I mean, because the data is so multidimensional,
57:47
right? And like, you can't
57:49
really use conventional, if you
57:49
try to display it with
57:52
conventional tools, you're
57:52
not going to be able to convey
57:55
it. You know, it's so funny that
57:55
you mentioned Tufte, right?
57:58
Because like, I love Tufte as
57:58
well, like my wife, actually,
58:02
she's a librarian, and she, I
58:02
think she introduced me to
58:05
Tufte, a long time ago. Like,
58:05
she gave me a Tufte book for
58:08
Christmas, probably 15 years
58:08
ago. It's ‘The Visual Display of
58:12
Quantitative Information’. You've probably read it.
58:14
Wonderful book. Wonderful book. Yeah, I read that in college. My roommate got
58:15
it, and I was like, this is cool.
58:17
Yeah, so she gave it to me for Christmas. And I was just like, transfixed by
58:19
it. I spent the rest of the
58:22
holiday reading it and studying
58:22
it. And I just went back to my
58:26
work, so invigorated and I
58:26
worked on this paper, it was a
58:29
2009 paper in NeuroImage, about
58:29
support vector machines, predicting PPA
58:36
subtypes. But all the
58:36
visualizations like I was now
58:39
like, really inspired by this
58:39
Tufte book and I like, spent a
58:42
lot of time on those figures and
58:42
I’d be thinking how would Tufte
58:44
make this figure.
58:46
It's nice.
58:46
That's cool. And
58:46
so yeah, you guys use Python,
58:49
like, what do you have like a,
58:49
what other kinds of technical
58:54
infrastructure do you use for
58:54
developing your stuff?
58:59
Yeah, so um,
58:59
it’s mostly Python, but, you
59:03
know, a lot of the visualization
59:03
is actually done in the web
59:06
browser, like through
59:06
JavaScript. So this was, I
59:11
forget, back in like, 2013,
59:11
maybe. James had been working on
59:16
pycortex for a little while and
59:16
we were trying to figure out,
59:20
like, we were trying to move
59:20
away from Mayavi, which is like
59:23
a 3D display library that is
59:23
very powerful, but also, like,
59:27
very clunky and like hard to use
59:27
in a lot of ways. And I said,
59:32
like, let's make a standalone,
59:32
you know, like, 3D application
59:35
that you can run on your
59:35
computer, you know, in Python,
59:38
whatever. And James said, no, no, let's do it in the browser, let's like actually
59:40
send information to the browser.
59:42
And I was like that, that's
59:42
crazy. That seems really hard.
59:44
Like, why would we do that? And
59:44
he completely ignored me, which
59:46
was like 100% the right thing
59:46
to do, and he wrote this thing,
59:51
that was Python interacting with
59:51
the browser and tens of thousands of
59:56
lines of JavaScript code to
59:56
display these brain images. But
1:00:00
it's, it’s just fantastic. So,
1:00:00
you can interact with the brain
1:00:03
images in all kinds of fun ways.
1:00:03
We can build these like
1:00:07
interactive viewers like we have
1:00:07
for this paper where you can
1:00:09
click on different parts of the
1:00:09
brain, and it'll show you
1:00:11
exactly what that little piece
1:00:11
of brain is doing. So, I think
1:00:14
that was, that was a really big
1:00:14
part of it, is like getting it
1:00:17
all to work in the browser, because that it's also very easy to share with other people.
1:00:20
It’s
1:00:20
transportable. Yeah. Yeah, that
1:00:23
makes sense. Like, it's funny in
1:00:23
my, in my lab, we're also
1:00:27
developing this portal into our
1:00:27
aphasia data, which is not
1:00:31
released yet. But we will be
1:00:31
working on it for a while. And
1:00:34
just like you, I was like,
1:00:34
originally envisaging a standalone
1:00:38
application and I was telling
1:00:38
the SLPs in my lab, who collect
1:00:42
all this data, like, oh, this is what I want to do, I want to like have an application, you
1:00:44
know, you'll install it on your
1:00:46
computer, and then you can look at that data. And they just looked at me like, what's
1:00:48
wrong with you? You know, like,
1:00:50
you'd have to install an application, like what, you're going to download? How would
1:00:52
you, you know, what is that even
1:00:55
like?
1:00:55
What is that, the 90s? God!
1:00:57
And they convinced me to do it in the web. And yeah, like, that's what
1:00:58
we've done. And it's now built
1:01:02
in JavaScript, and it's going to
1:01:02
be a lot more accessible as a
1:01:05
result.
1:01:06
Nice! Yeah, absolutely.
1:01:08
Okay, cool. So,
1:01:08
let’s move on to the next paper,
1:01:12
if we can. This is by Shailee
1:01:12
Jain and yourself in 2018.
1:01:18
Called ‘Incorporating context
1:01:18
into language encoding models
1:01:20
for fMRI’. It's, I think it's an
1:01:20
underappreciated paper. I mean,
1:01:25
it's got like a bunch of
1:01:25
citations, but I hadn't seen it
1:01:27
before. It's published in, you
1:01:27
know, kind of a CS…
1:01:33
Conference, yeah, NeurIPS.
1:01:35
But very
1:01:35
interesting. Because I think
1:01:39
it's like the first fMRI paper,
1:01:39
fMRI encoding paper that like
1:01:46
actually takes context into
1:01:46
account, goes beyond the word
1:01:48
level. Right? Is that right? I
1:01:48
don't think there is anything
1:01:51
previous.
1:01:51
I think so, there had been one CS paper.
1:01:53
A MEG study, by
1:01:53
Leila Wehbe. A few, but that's MEG.
1:01:58
Yeah. Leila had
1:01:58
done this in MEG in like 2014.
1:02:01
Leila is a close collaborator of
1:02:01
our lab, we do a lot of work
1:02:05
together on this kind of stuff.
1:02:05
There was one other like CS
1:02:10
conference paper, I think, from
1:02:10
like the year before, that, it
1:02:13
was a little bit messy. The
1:02:13
results were kind of mixed. But
1:02:17
yeah, I think we were at least
1:02:17
one of the first to really use
1:02:22
these neural network language
1:02:22
models, which were new and
1:02:24
exciting at the time. So, this
1:02:24
is actually the first paper out
1:02:28
of my lab, like Shailee was my
1:02:28
first grad student. This is her
1:02:31
first project in the lab.
1:02:31
(Laughter) It was to try out
1:02:35
these new things, which are neural network language models. Like, let's see what they do.
1:02:37
We've been doing everything with these like word embedding
1:02:39
vectors before that, which are
1:02:41
great, they're beautiful, you
1:02:41
can reason about them in really
1:02:45
nice ways. You can interpret
1:02:45
them in nice ways. But of
1:02:49
course, they, you know, they're
1:02:49
just words, right? You're
1:02:51
predicting the brain response to
1:02:51
somebody telling a story. But
1:02:55
you're just actually
1:02:55
individually predicting the
1:02:58
response to each word then
1:02:58
saying that, like the response,
1:03:00
the story is the sum of the
1:03:00
responses to the words, which is
1:03:03
just obviously blatantly false,
1:03:03
right? So those models couldn't
1:03:07
capture the kind of richness of
1:03:07
language. So, this was right at
1:03:11
the time, when language models
1:03:11
were starting to become exciting
1:03:16
in the computer science like
1:03:16
natural language processing
1:03:19
worlds. Right around when these
1:03:19
first language models, ELMo and
1:03:25
BERT came out. And so this is
1:03:25
the days of like Sesame Street
1:03:28
language models, which, people
1:03:28
have found were really
1:03:33
interestingly useful. But if you
1:03:33
just train this neural network
1:03:36
to like, predict what the next
1:03:36
word is in a piece of text, or
1:03:39
predict like a random masked out
1:03:39
word from a piece of text, then
1:03:43
it's kind of forced to learn a
1:03:43
lot of interesting stuff about
1:03:46
how language works.
1:03:47
Yeah, can we
1:03:47
just, yeah, can we just flesh
1:03:50
that out a little bit, because
1:03:50
this is so fundamental to all
1:03:52
this work. And I think
1:03:52
everybody's heard of ChatGPT,
1:03:56
and so on, but I don't know how
1:03:56
many of us, sort of understand
1:04:01
how much the fundamental
1:04:01
technology is built on this
1:04:03
concept of predicting the next
1:04:03
word. So, can you like just kind
1:04:06
of try to explain that in a little
1:04:06
more detail like, what these
1:04:09
models, what's their input?
1:04:09
What's their? I mean, okay, not
1:04:13
that much detail. Obviously,
1:04:13
it's a podcast, not a CS paper.
1:04:17
But, but so what's the input?
1:04:17
What's the output? And then I
1:04:21
guess, why? And what's the
1:04:21
architecture?
1:04:25
Yeah. So, it's
1:04:25
pretty simple. You have a big
1:04:31
corpus of text, right? So like
1:04:31
documents that are 1000s of
1:04:34
words long potentially, say all
1:04:34
of Wikipedia, for example. That's what
1:04:37
we used to train some of these
1:04:37
models. You feed the model the
1:04:42
words, one by one. It reads
1:04:42
the words and at every step,
1:04:46
every time you feed it a word,
1:04:46
you ask it to guess what the
1:04:49
next word is. And it guesses
1:04:49
like a probability distribution
1:04:52
across all the words of like
1:04:52
what it thinks the next word
1:04:54
might be. In these early models,
1:04:54
we used recurrent neural networks,
1:05:00
specifically LSTMs, long short
1:05:00
term memory networks, which were
1:05:04
pretty popular as language
1:05:04
models at the time. So, this
1:05:08
network, you kind of feed it
1:05:08
words one at a time, and it
1:05:13
keeps its state, right? So, what
1:05:13
it will sort of produce, what it
1:05:19
computes at each time step is a
1:05:19
function of like, what the word
1:05:22
was that came in, and what its
1:05:22
state was at the previous time
1:05:25
step. So, it combines those two
1:05:25
things to try to guess what the
1:05:28
next word is going to be. So,
1:05:28
this seems kind of elementary,
1:05:33
right? It's pretty simple. You
1:05:33
just guess what the next word
1:05:35
is. But, you know, what turned
1:05:35
out to be really cool about this
1:05:40
and why people were so excited
1:05:40
about this in the natural language
1:05:43
processing world was, you know,
1:05:43
that in order to do this
1:05:47
effectively, in order to guess
1:05:47
the next word, accurately, you
1:05:52
need to do a lot of stuff, you
1:05:52
need to know how syntax works,
1:05:55
you need to know about parts of
1:05:55
speech, you need to know a lot
1:05:57
about how semantics works,
1:05:57
right? You need to know like,
1:05:59
which words go together? You
1:05:59
know, what are…
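(A minimal PyTorch sketch of the kind of LSTM language model being described: an embedding layer, a recurrent layer that carries its state forward word by word, and a guess, at every step, of a probability distribution over the whole vocabulary for the next word. The sizes and names are illustrative, not the actual model from the paper.)

    import torch
    import torch.nn as nn

    class LSTMLanguageModel(nn.Module):
        def __init__(self, vocab_size=10000, embed_dim=400, hidden_dim=1000):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)  # learned with the rest
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)      # a score for every word

        def forward(self, tokens, state=None):
            # tokens: (batch, time) word indices, fed one step at a time or in a chunk
            vecs = self.embed(tokens)
            hidden, state = self.lstm(vecs, state)  # state carries the memory forward
            logits = self.out(hidden)               # at each step: guess the next word
            return logits, state

    model = LSTMLanguageModel()
    words = torch.randint(0, 10000, (1, 20))         # a 20-word context
    logits, _ = model(words)
    next_word_probs = logits[0, -1].softmax(dim=-1)  # distribution over the vocabulary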
1:06:02
What are the representations of the individual words here? Because
1:06:04
they, you still have to vectorize the individual words,
1:06:05
right?
1:06:08
Yeah. Yeah. So,
1:06:08
so, in this model, there's
1:06:14
like a word embedding layer.
1:06:14
That's like the initial thing.
1:06:16
So we go from a vocabulary of, I
1:06:16
don't know, 10,000 words, down
1:06:20
to like a 400 dimensional
1:06:20
vector. And the embedding is
1:06:22
also learned as a part of this
1:06:22
model.
1:06:25
How does it learn that?
1:06:27
Back
1:06:27
propagation. It's the key to all
1:06:30
this. So, you start off with the
1:06:30
words being one-hot vectors.
1:06:38
That's actually a lie. Let me,
1:06:38
let me back up. You start off
1:06:41
with assigning a random
1:06:41
embedding vector for each word.
1:06:44
Oh, okay.
1:06:45
And then those
1:06:45
embedding parameters, just like
1:06:48
the values in that embedding,
1:06:48
you can compute a gradient, you
1:06:52
compute this derivative that's
1:06:52
like, you take the loss at the
1:06:55
end of the model, which is, how
1:06:55
wrong was it in predicting the
1:06:58
next word, and then just take
1:06:58
the derivative of that loss with
1:07:01
respect to these embedding
1:07:01
parameters, and then use that
1:07:05
to, you know, change the embedding parameters a little bit, and you keep doing this for
1:07:06
1000s and 1000s of steps, and it
1:07:09
learns these word embeddings. It
1:07:09
learns like, very effective word
1:07:11
embeddings.
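(A sketch of that training loop, reusing the toy LSTMLanguageModel sketched a moment ago: the loss is how wrong the next-word guess was, and backpropagation nudges every parameter, the randomly initialized embedding vectors included, a little bit at each of thousands of steps. Random indices stand in for real text here.)

    import torch
    import torch.nn as nn

    model = LSTMLanguageModel()   # the toy model from the earlier sketch
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(10000):                       # thousands and thousands of steps
        tokens = torch.randint(0, 10000, (32, 21))  # stand-in for batches of real text
        inputs, targets = tokens[:, :-1], tokens[:, 1:]
        logits, _ = model(inputs)
        # The loss: how wrong was the model about each next word?
        loss = loss_fn(logits.reshape(-1, 10000), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()    # gradients flow all the way back into the embeddings
        optimizer.step()   # every parameter, embeddings included, moves a little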
1:07:12
Okay, I didn't
1:07:12
understand that. I had thought
1:07:14
that you still had to kind
1:07:14
of put in like, kind of an old,
1:07:19
old school like word embedding
1:07:19
as the first step. But you're
1:07:22
saying that's actually part of
1:07:22
the whole architecture, it kind
1:07:25
of just comes out of the same
1:07:25
algorithm that gives you the
1:07:29
sensitivity to past words, is
1:07:29
also developing the
1:07:32
representation of each
1:07:32
individual word. Okay, I didn’t
1:07:35
understand that as well.
1:07:36
In the early days, a lot of people did this, where they initialized with,
1:07:38
with preset word embeddings.
1:07:41
But, it was found pretty quickly
1:07:41
that as long as you have
1:07:44
sufficient data to train the
1:07:44
model, things work much better,
1:07:46
if you like, train the word
1:07:46
embeddings at the same time as
1:07:49
the rest of the model.
1:07:50
Okay. Yeah. So,
1:07:50
just I'm sure you've looked at
1:07:55
this. I don't know if it's in any of your papers, but it probably is. But like, if you
1:07:57
use, kind of just getting
1:08:00
away from the contextual aspect
1:08:00
of this paper, if you use word
1:08:03
embeddings that are derived in
1:08:03
this way, does that work much
1:08:06
better than the sort of ones
1:08:06
that you used in your 2016
1:08:09
Nature paper? Or does it work
1:08:09
much the same?
1:08:12
It's about the same, they end up being actually very similar in a lot of ways.
1:08:16
Okay.
1:08:16
Yeah,
1:08:16
it's interesting. There are,
1:08:20
like, a bunch of
1:08:20
different ways of generating
1:08:22
word embeddings. I could geek
1:08:22
out about this for a long time.
1:08:25
But the very old
1:08:25
school word embeddings, like
1:08:30
latent semantic analysis, are
1:08:30
generated by looking at how
1:08:34
words co-occur across documents.
1:08:34
The newer things like Word2Vec
1:08:38
and GloVe are also looking at
1:08:38
word co-occurrence. Word2Vec is
1:08:41
actually like a neural network model that was trained to do this. GloVe is just a statistical
1:08:43
thing. The word embeddings that
1:08:46
we used in my 2016 paper were
1:08:46
a bespoke thing that I came up
1:08:49
with, but they capture really
1:08:49
the same kind of thing, right?
1:08:54
It was just using kind of
1:08:54
predefined dimensions instead of
1:08:58
these, like, learnt dimensions.
1:08:58
And the word embeddings that you
1:09:01
get out of neural network models
1:09:01
are super similar. Like, they
1:09:03
they act very similar because
1:09:03
they're, they're just capturing
1:09:06
the same thing, which is…
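(For the old-school flavor, a toy Python sketch of the co-occurrence idea behind something like latent semantic analysis: count how often each word appears in each document, then compress those counts into a short vector per word, so words used in similar documents end up with similar vectors. Purely illustrative, not the pipeline from any of these papers.)

    import numpy as np

    # Toy corpus: rows of the count matrix are words, columns are documents.
    docs = ["the dog chased the cat", "the cat slept", "stocks fell sharply today"]
    vocab = sorted({w for d in docs for w in d.split()})
    counts = np.array([[d.split().count(w) for d in docs] for w in vocab], float)

    # SVD compresses the co-occurrence pattern; each word's row becomes its
    # embedding. Words that co-occur in similar documents get similar vectors.
    U, S, Vt = np.linalg.svd(counts, full_matrices=False)
    embeddings = U[:, :2] * S[:2]  # 2-dimensional word vectors for the toy example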
1:09:07
So, co-occurrence is a big part of it, right?
1:09:09
Yeah, definitely.
1:09:10
Okay. Okay, so
1:09:10
now I understand that. So, you
1:09:15
start, you have random representations of the words that are learned to become
1:09:17
differentiated, kind of based on
1:09:20
context, to represent the
1:09:20
semantics of each of your
1:09:23
words, and then you've got
1:09:23
hidden layers that are
1:09:27
representing previous words that
1:09:27
have been seen. Like, what
1:09:29
kind of range do you look
1:09:29
at in this paper? Like, how far
1:09:32
back can it look?
1:09:33
Yeah, um, I
1:09:33
think, we look back only like 20
1:09:37
words, here. So, we're
1:09:37
manipulating like, how many
1:09:41
words go into the model? Like
1:09:41
how many words of context is it
1:09:44
see before, before the current
1:09:44
word? And what we found is it
1:09:47
basically like the more words we
1:09:47
feed in, the better it gets,
1:09:51
right? As we, as it sort of sees
1:09:51
more context, the
1:09:54
representations are better
1:09:54
matched to whatever's happening
1:09:57
in the brain. Our model
1:09:57
predictions get better
1:10:00
there. So we can use this to
1:10:00
kind of like, look at, in a
1:10:03
coarse way, context sensitivity
1:10:03
across cortex, like which parts
1:10:06
of cortex, you know, really, you
1:10:06
know, are affected by
1:10:10
information, 20 words back and
1:10:10
which ones maybe only care
1:10:13
about, like what the most recent
1:10:13
word is. So, you know, we see
1:10:17
kind of things that we'd expect
1:10:17
to see, like auditory cortex
1:10:19
only cares about the current
1:10:19
word, for the most part, right?
1:10:22
It's mostly caring about like
1:10:22
the sound of the word. So that's
1:10:24
not unexpected. Whereas areas
1:10:24
like precuneus, TPJ really care
1:10:30
more about, like, what happened
1:10:30
a while back, or maybe some
1:10:34
integration of information
1:10:34
across a couple dozen words.
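(A schematic of that manipulation, with a stand-in language model so the sketch is self-contained: get a representation of each word given only its previous n words of context, fit an encoding model for each context length, and see where in cortex more context helps. Everything here is illustrative.)

    import numpy as np

    rng = np.random.default_rng(0)

    def hidden_state(window):
        # Stand-in for the language model: a vector for the last word given
        # its preceding context (random here, purely for illustration).
        return rng.standard_normal(400)

    def context_features(words, n_context):
        # The representation of each word, seeing only the previous n_context words.
        return np.array([hidden_state(words[max(0, i - n_context): i + 1])
                         for i in range(len(words))])

    story_words = ["i", "begin", "to", "tell", "a", "story"] * 100  # toy stimulus

    for n in [0, 1, 3, 7, 15, 19]:
        X = context_features(story_words, n)
        # ...then fit a ridge encoding model from X to the voxel responses and
        # score it on a held-out story, comparing per-voxel scores across n.
        # Voxels whose score keeps climbing with n integrate over long contexts;
        # voxels that are flat after n = 0 mostly care about the current word.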
1:10:38
Right. Yeah. So you can kind of map out the language, well the semantic
1:10:39
network, I guess, I'd say, in
1:10:43
terms of how deep it is, and
1:10:43
like how, how contextually
1:10:48
dependent it is, like going from
1:10:48
single words to longer strings.
1:10:53
And most of these models are
1:10:53
outperforming the single word
1:10:56
models pretty much throughout the brain, right?
1:10:59
Yeah, very
1:10:59
handily, which was very, like
1:11:01
that was just a central exciting
1:11:01
result to us, right, that we'd
1:11:04
had these word embeddings.
1:11:04
Honestly, our word embedding
1:11:06
models had been fixed since
1:11:06
2013, at that point, so it was
1:11:09
like five years of just messing
1:11:09
with word embeddings and finding
1:11:12
nothing that worked better, right?
1:11:12
We tried 1000 different
1:11:16
variations of word embeddings.
1:11:16
And like, nothing actually…
1:11:18
I guess. Yeah,
1:11:18
my previous question, you just
1:11:20
basically told me like,
1:11:20
yeah, these state of the art
1:11:22
single word representations
1:11:22
don't really do better than your
1:11:25
bespoke ones from the 2016
1:11:25
paper. So yeah, so that
1:11:29
wasn't really a, an avenue for
1:11:29
improvement. But this is.
1:11:32
Yeah, yeah. But
1:11:32
this, like, instantly was just
1:11:36
better. Was like head and
1:11:36
shoulders better than the word
1:11:38
embeddings. That was really
1:11:38
exciting, right? This was
1:11:42
the first time that we were
1:11:42
getting, kind of getting at,
1:11:45
representations of longer term
1:11:45
meaning, which is something that
1:11:50
of course, we want to look at…
1:11:52
Yeah. And
1:11:52
combinatorial meaning
1:11:55
presumably, yeah, it's getting
1:11:55
more and more like, like, as you
1:11:58
go further and further, you're
1:11:58
getting more and more language.
1:12:01
You know? (Laughter)
1:12:02
Exactly. And it
1:12:02
keeps working better and better at
1:12:04
predicting the brain. Right?
1:12:04
That number go up….
1:12:06
Yeah. Not
1:12:06
surprising. Yeah. So this is a
1:12:08
very cool paper. Showing, you
1:12:08
know, how much you gain by
1:12:14
adding context?
1:12:15
Yeah. Yeah. It's
1:12:15
kind of like I think it's like,
1:12:18
kind of known to people in our
1:12:18
like, little fields. But um,
1:12:21
most neuroscientists don't read
1:12:21
these CS Conference papers. So I
1:12:24
think a lot of people just like
1:12:24
haven't, haven't seen it.
1:12:27
No, it has
1:12:27
plenty. Like I said, it has
1:12:29
plenty of citations like well
1:12:29
over 100. But like, I don't
1:12:34
think that, I mean, I hadn't
1:12:34
seen it until I started, like,
1:12:37
you know, looking in more depth,
1:12:37
so that I could talk to you.
1:12:41
That's a shame, because it's really good.
1:12:42
Good. Thank you.
1:12:44
So let's move
1:12:44
now to a paper that is not
1:12:49
currently published, but will be
1:12:49
published by the time people
1:12:51
hear this podcast. This is Tang
1:12:51
et al. And by the time you hear
1:12:56
this, it will be just out in
1:12:56
Nature Neuroscience.
1:13:00
Yep.
1:13:01
Super cool
1:13:01
paper. Can you tell me about
1:13:04
what you've done in this one?
1:13:06
Yeah, yeah. So,
1:13:06
this is, this is our decoding
1:13:09
paper. So, this is, we're no
1:13:09
longer focused on encoding, on
1:13:13
just trying to predict how the
1:13:13
brain responds to language.
1:13:16
We're now trying to reverse that
1:13:16
right, to take the brain
1:13:18
responses and turn that into
1:13:18
like, what were the words that
1:13:21
the person was hearing?
1:13:23
Okay, you're
1:13:23
reading minds? Basically.
1:13:26
We try to avoid
1:13:26
that term. But, but yeah, same
1:13:29
idea. So, our approach here is
1:13:29
really, it's driven by things
1:13:36
that were done back in Jack
1:13:36
Gallant's lab. So, Shinji
1:13:40
Nishimoto, in particular, who'd
1:13:40
done this like video decoding
1:13:42
work, he developed this whole
1:13:42
framework for, he and some other
1:13:47
folks there, Thomas Naselaris
1:13:47
and Kendrick Kay in particular,
1:13:50
they developed this framework for
1:13:50
how do you turn an encoding
1:13:53
model into a decoding model?
1:13:53
Right? We know how to build
1:13:55
these very good encoding models.
1:13:55
But if you want to do decoding,
1:13:58
if you want to figure out like, what was the stimulus from brain activity? How do you do that?
1:14:00
And, just, if you fit a direct
1:14:04
decoding model, where you like,
1:14:04
just try to do regression in the
1:14:07
opposite direction. So, take the
1:14:07
brain data as your input and
1:14:10
your stimulus as the output.
1:14:10
That ends up not working, or being
1:14:14
like very difficult in a number
1:14:14
of ways, mostly to do with sort
1:14:19
of statistical dependence
1:14:19
between things in the stimulus,
1:14:22
like if you're predicting multiple stimulus features, you're not predicting them in a
1:14:25
way that actually respects like
1:14:28
the covariance between those
1:14:28
features. And that ends up being
1:14:30
pretty important for getting
1:14:30
this stuff to work. So, in this
1:14:35
paper, we use this kind of
1:14:35
Bayesian decoding framework that
1:14:38
they developed. The basic idea
1:14:38
is, you just kind of guess. So,
1:14:43
we guess like what might the
1:14:43
stimulus be? What words might
1:14:45
the person have heard? And then
1:14:45
we can check how good that
1:14:47
guess is by using our encoding
1:14:47
model. So, in this paper, we've had,
1:14:51
you know, a
1:14:51
couple of years of advancement
1:14:54
in language models. Just an
1:14:54
insanely rapidly developing
1:14:59
field right now. So, when we
1:14:59
started working on this decoding
1:15:04
stuff, we were using GPT, like
1:15:04
the original OG GPT, from 2018,
1:15:10
2019. And that's what we ended
1:15:10
up like, that’s in the published
1:15:13
paper. Of course, there's, you
1:15:13
know, things have changed a lot
1:15:17
in the intervening years, but it
1:15:17
still is good enough for, for
1:15:20
this to work. So, these GPT
1:15:20
based encoding models work
1:15:24
terribly well. It's doing more
1:15:24
or less the same thing as the
1:15:27
language models in Shailee’s
1:15:27
paper, in fact, she developed
1:15:30
these GPT encoding models.
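(The core of that guess-and-check idea, as a hedged Python sketch; the names here are hypothetical stand-ins, not the actual code: turn a guessed word sequence into language-model features, predict the brain response it would evoke using the encoding model, and score the guess by how well that prediction matches the recorded data.)

    import numpy as np

    def score_candidate(words, recorded_bold, encoding_model, lm_features):
        # lm_features: hypothetical function giving GPT-style features for a
        # word sequence; encoding_model: a fitted feature-to-brain regression.
        X = lm_features(words)
        predicted_bold = encoding_model.predict(X)
        # How well does the predicted response match what we actually recorded?
        err = recorded_bold - predicted_bold
        return -np.sum(err ** 2)  # higher = better match, likelihood-style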
1:15:33
Yeah, can we
1:15:33
just pause and get a little
1:15:35
detail on that? So, you know,
1:15:35
everybody's heard of ChatGPT,
1:15:40
but can you sort of explain like
1:15:40
(a) what does it stand for and
1:15:45
(b) how does it differ from
1:15:45
like, what's the crucial
1:15:49
difference between that and the
1:15:49
long short term memory models
1:15:54
that you used in the 2018 paper?
1:15:56
Yeah, so GPT is
1:15:56
a generative pre-trained
1:16:00
transformer. That was its
1:16:00
original moniker. And the basic
1:16:05
idea is, it's using this
1:16:05
different architecture. So, it’s
1:16:08
no longer using a recurrent
1:16:08
neural network. It's using a
1:16:10
network called a transformer
1:16:10
that was invented in 2017.
1:16:15
By Google, right?
1:16:16
Yeah , yeah.
1:16:16
Ashish Vaswani, who is actually
1:16:19
an author on Leila’s RNN
1:16:19
predicting MEG paper back in
1:16:23
2014, which is an interesting
1:16:23
detail. And he was like one of
1:16:29
the authors on the
1:16:29
original transformer paper too.
1:16:31
So, transformers, they use very
1:16:31
different mechanisms than
1:16:38
recurrent neural networks. They
1:16:38
use what we call an attention
1:16:42
mechanism, the self attention
1:16:42
mechanism, where essentially, to
1:16:45
try to build up some
1:16:45
representation of each word or
1:16:48
each token in an input. The
1:16:48
model can like attend across its
1:16:53
inputs, attend across its
1:16:53
context. So it can pick out you
1:16:55
know, information from many
1:16:55
words that have come before and
1:16:59
use that to inform what it's
1:16:59
doing right now. And, you know,
1:17:04
what's, what's really different
1:17:04
about this, compared to
1:17:08
recurrent neural networks, is
1:17:08
that transformers don't have the
1:17:12
kind of limited memory capacity
1:17:12
that RNNs have. And I think
1:17:15
that's really one of the fundamental things that makes them work so darn well, for so
1:17:16
many things. You know, the
1:17:21
recurrent neural network, you
1:17:21
feed it one word at a time, and
1:17:24
then it has to pack information
1:17:24
into its internal state. So, it
1:17:28
maybe has like 1000 dimensions
1:17:28
of like internal states, right?
1:17:30
That are like, that's its entire
1:17:30
memory that's like everything it
1:17:33
knows is in those 1000
1:17:33
dimensions. And you know, if you
1:17:36
feed it hundreds of words, it
1:17:36
has to, you know, if it wants to
1:17:39
remember something about those
1:17:39
hundreds of words, it has to
1:17:41
pack it all into that 1000
1:17:41
dimensional vector.
1:17:44
Right.
1:17:44
So, it's hard to do, it's hard to do. And especially because the kind of
1:17:46
supervisory signals at long
1:17:48
ranges end up being very sparse
1:17:48
in language. Like, it's rare for
1:17:52
something 200 words ago to
1:17:52
really influence like what the
1:17:55
next word is in a piece of text.
1:17:55
It's very important when it
1:17:58
does, but it's pretty rare.
1:17:58
So, that ends up being kind of
1:18:02
too weak a signal for RNNs to
1:18:02
really pick up on it. But these
1:18:05
transformer models, they can
1:18:05
just kind of arbitrarily look
1:18:07
back, they can say like, you
1:18:07
know, what from anything in my
1:18:11
past was relevant to this thing
1:18:11
that I'm looking at right now.
1:18:13
And then just pick that thing
1:18:13
out and use it. And that means
1:18:17
that it doesn't have this
1:18:17
limited capacity, in the same
1:18:19
way. It has, like, a much greater
1:18:19
memory capacity, working memory
1:18:23
capacity, effectively. Which
1:18:23
just makes it like incredibly
1:18:29
powerful at doing these things.
1:18:29
There's also other reasons why
1:18:32
transformers have kind of taken
1:18:32
over this world now. They end up
1:18:36
being extremely efficient to
1:18:36
train and run on our current GPU
1:18:42
hardware, which is kind of like
1:18:42
a weird reason why this model
1:18:45
would be very good. But it's a
1:18:45
you know, a technical reason why
1:18:48
people could train much bigger
1:18:48
transformer models much more
1:18:51
effectively than like big RNN
1:18:51
models. They've really like
1:18:55
taken over this world now.
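(A minimal numpy sketch of the self-attention step being described: every position scores its similarity to every earlier position, then pulls in a weighted mix of the whole context, so word 200 can use word 1 as directly as word 199. Real transformers add learned query/key/value projections and many stacked layers; the dimensions here are toy.)

    import numpy as np

    def self_attention(x):
        # x: (time, dim) token representations. Each position attends over
        # everything up to and including itself; there is no fixed-size
        # memory bottleneck the way there is with an RNN's internal state.
        t, d = x.shape
        scores = x @ x.T / np.sqrt(d)          # similarity between positions
        mask = np.triu(np.ones((t, t)), k=1)   # hide the future
        scores = np.where(mask == 1, -1e9, scores)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ x                     # weighted pull from the whole context

    tokens = np.random.default_rng(0).standard_normal((200, 64))  # a 200-word context
    out = self_attention(tokens)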
1:18:57
Okay, now that was really useful.
1:18:59
I teach a class on neural networks in the computer science department
1:19:01
here. I just finished that like
1:19:03
last week. And our last module
1:19:03
was on transformers, which was a
1:19:08
lot of fun. So we talked about how transformers work.
1:19:10
Okay, I would
1:19:10
like to, I'd like to audit that.
1:19:14
It's fun. It's a fun class.
1:19:15
Yeah. I bet.
1:19:15
Okay. So let's start. Yeah,
1:19:20
let's get back to your study.
1:19:20
So, you know, how are you
1:19:25
using GPT models instead of the
1:19:25
RNNs. And yeah, you were just
1:19:30
telling me, you're about to tell
1:19:30
me about, well, you were telling
1:19:34
me about the challenges of
1:19:34
decoding rather than encoding,
1:19:36
right?
1:19:37
Yeah, yeah. So
1:19:37
um, you know, we replaced the
1:19:40
RNNs with these GPT models, the
1:19:40
transformer based models, again,
1:19:43
we get a big boost in encoding
1:19:43
performance, we can predict the
1:19:46
brain much better. But now what
1:19:46
we're going to do is try to
1:19:50
reverse this, do the decoding.
1:19:50
So, we are using this Bayesian
1:19:55
decoding approach where
1:19:55
essentially we're just like
1:19:57
guessing sequences of words and
1:19:57
then for any guessed sequence,
1:20:01
we can use our encoding model to
1:20:01
say like, how, how well does
1:20:05
this match the actual brain data
1:20:05
that we see? Right? So, we get a
1:20:08
sequence of words, we predict
1:20:08
how the brain would respond to
1:20:10
that sequence of words and then
1:20:10
we compare that prediction to
1:20:13
the actual brain response that we observe.
1:20:15
Yep.
1:20:16
Like, this is
1:20:16
kind of the core loop in this
1:20:18
method. And then….
1:20:20
It's called a
1:20:20
beam search algorithm, right?
1:20:23
Yeah. So beam
1:20:23
search is really like, we keep
1:20:25
multiple guesses sort of active
1:20:25
at a time. We guess like, what's
1:20:29
the next word in each of these
1:20:29
multiple guesses, and then we
1:20:31
throw out like the worst ones,
1:20:31
but we keep this sort of active
1:20:34
set of 20 to 100 different like
1:20:34
current hypotheses for what the
1:20:40
text was that we're
1:20:40
trying to decode. This ends up
1:20:43
being kind of important because
1:20:43
it helps us correct for the
1:20:47
sluggishness of the hemodynamic
1:20:47
response function, which is one
1:20:49
of the real challenges in doing
1:20:49
this kind of decoding, right? So
1:20:53
we're trying to pick out, you
1:20:53
know, the language that somebody
1:20:56
is hearing. Lots of words happen
1:20:56
in language, right? Like words
1:20:59
can happen pretty quickly. And
1:20:59
with fMRI, like one snapshot
1:21:03
brain image is summing across,
1:21:03
like 20, 30 words, maybe if
1:21:07
somebody's speaking at a pretty
1:21:07
rapid clip.
1:21:10
So doing this beam search, where we have multiple hypotheses…

Yeah.

So that
1:21:16
means the
1:21:19
model can make a mistake, and
1:21:19
then kind of go back and correct
1:21:21
it, right? Because it's not
1:21:21
locked into like one best guess.
1:21:26
That ends up being really
1:21:26
important for like, being able
1:21:30
to correct for the fact that it
1:21:30
has this slow sort of
1:21:35
information that it can get
1:21:35
something at first and then see
1:21:38
later information that can make
1:21:38
it sort of update what happened
1:21:40
before.

Right.
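(A sketch of that beam-search loop, with the language model and the encoding-model comparison passed in as hypothetical helpers: keep a set of running hypotheses, extend each with plausible next words, rescore every whole hypothesis against the recorded fMRI, and prune back down to the beam. Because whole sequences get rescored, a later word can rescue or sink an earlier guess.)

    def beam_search_decode(recorded_bold, propose_next_words, score_candidate,
                           n_steps, beam_width=50):
        # propose_next_words(words): the LM's plausible continuations (hypothetical).
        # score_candidate(words, bold): encoding-model match to the data (hypothetical).
        beams = [([], 0.0)]  # (word sequence so far, score)
        for _ in range(n_steps):
            candidates = []
            for words, _ in beams:
                for next_word in propose_next_words(words):
                    extended = words + [next_word]
                    candidates.append(
                        (extended, score_candidate(extended, recorded_bold)))
            # Keep only the best 20 to 100 hypotheses; the rest are thrown out.
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        return beams[0][0]  # the single best hypothesis at the end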
1:21:41
Can we, can we talk about the, sorry,
1:21:47
I'm just trying to like, think
1:21:47
about how to, to make all this
1:21:51
clear? Can we talk about the
1:21:51
sort of structure of the
1:21:56
experiment, because I think it
1:21:56
will really help to understand
1:21:58
like, what the participants do,
1:21:58
and then you know, what the task
1:22:03
that you set yourself in terms
1:22:03
of decoding their brains like,
1:22:06
because then I think the
1:22:06
mechanisms will then make more
1:22:08
sense. Do you know what I mean?
1:22:10
Yeah,
1:22:10
absolutely. So the basic
1:22:14
experiment that we do is just
1:22:14
the same that we've been doing
1:22:16
before: we have people lying in
1:22:16
the scanner and listening to
1:22:19
podcasts, mostly the Moth,
1:22:19
still.
1:22:21
You really should have them listen to the Language Neuroscience Podcast.
1:22:25
Oh, yeah, that'd be good.
1:22:25
I think it would
1:22:25
work a lot better. But in this
1:22:29
one, you've got them doing it
1:22:29
for 16 hours, right?
1:22:32
Yeah. Yeah. So…
1:22:33
That’s a lot of data. Okay. So just to
1:22:34
make it clear, you train them on…
1:22:34
We’re making
1:22:34
bigger datasets, which ends up
1:22:37
being really important when we're looking at something high level, like semantics, right? So
1:22:39
if we were just trying to build
1:22:42
models of like phonetic
1:22:42
representation, there's only
1:22:46
like 40 phonemes in English, you
1:22:46
know. You can hear them in many
1:22:49
combinations, but like, you only
1:22:49
need sort of so much variability
1:22:49
like, not 16 hours of data, to build
1:22:49
these models, and then you take
1:22:52
in your data to kind of map out
1:22:52
a phonetic representation. TIMIT
1:22:55
is great for this, right, the
1:22:55
TIMIT corpus, it's like every
1:22:58
phoneme in many different
1:22:58
combinations. So you can get
1:23:00
good stuff from TIMIT. This is
1:23:00
what like Eddie Chang's lab does
1:23:03
a lot of. But there's a lot more
1:23:03
different kinds of semantics,
1:23:07
right? There's a lot more different kinds of ideas that can be expressed there, you can
1:23:09
think a lot of different kinds
1:23:11
of thoughts, right? So, to kind
1:23:11
of map this out in detail, you
1:23:18
need to go much deeper, you need
1:23:18
to go like much broader in terms
1:23:21
of what you look at. So that's
1:23:21
why we just keep adding more
1:23:25
data. In this case, yeah, we had
1:23:25
people come back more than a
1:23:29
dozen times, over the course of
1:23:29
months, we just keep scanning
1:23:32
them over and over again, each
1:23:32
time listening to like new
1:23:35
stories, different stories,
1:23:35
right? So we see like, different
1:23:38
aspects of these ideas and how
1:23:38
they're represented in these
1:23:40
people's brains. And what we
1:23:40
find is that, like, encoding,
1:23:45
performance really relies on the
1:23:45
amount of data, especially for
1:23:47
these like semantic encoding
1:23:47
models, like as you increase the
1:23:49
amount of data, it just keeps
1:23:49
getting better.

And then you test on, like, let's say, 10 minutes of data? I don't know exactly how much it is, but something small.

Exactly, yeah. The same for this decoding performance. It just keeps getting better as we
1:23:55
add more data. But so, you know,
1:23:55
we have this ton of data of
1:23:59
people listening to stories, we
1:23:59
can build our encoding models,
1:23:59
that's well and good. You know,
1:24:01
our initial tests of the decoder
1:24:06
were basically just like, we had
1:24:06
somebody, you know, listen to a
1:24:09
story. That's our test story
1:24:09
that we use for predicting brain
1:24:13
responses. And we just tried to
1:24:13
decode the words in that story
1:24:16
instead of, you know, predict
1:24:16
the brain responses to that
1:24:19
story. And eventually, through
1:24:19
like quite a bit of trial and
1:24:23
error and figuring out what were
1:24:23
the important aspects here, we got that to work pretty well. We
1:24:28
were pretty excited by this. It
1:24:30
was, uh, you know, it started
1:24:30
spitting out you know, not like
1:24:36
exactly the words in the story,
1:24:36
but, like a pretty decent
1:24:39
paraphrase of what the words in
1:24:39
the story were.
1:24:55
And you then basically feed the model the brain data from the person
1:24:55
listening to this unseen story
1:24:59
and you try and get the model to
1:24:59
generate what the story was that
1:25:03
the person heard. Right?
1:25:05
Yes.
1:25:06
So in other words, the only way that's going to work is if you're, I know you
1:25:08
hate to use the phrase, but
1:25:11
like, if you're, you have to
1:25:11
read their mind, because the
1:25:13
model has no access to what
1:25:13
story they were played. So, the
1:25:15
only way the model is gonna know
1:25:15
the story is by reading their
1:25:17
mind.
1:25:19
Is it reading their brain? Like, we don't know where the mind is? It's
1:25:21
somewhere near to the brain. Definitely reading what's
1:25:22
happening in the brain.
1:25:26
It's the brain.
1:25:26
As you know. It's the same
1:25:30
thing. Okay. So yeah, so it
1:25:30
starts, you have some success
1:25:35
with that.
1:25:36
Yeah, yeah. So
1:25:36
um, there was kind of a
1:25:38
startling moment. I think this was during the pandemic. We're all like working at home. And
1:25:40
Jerry showed me some results that were like, Oh, my God,
1:25:41
this, this works. This is like giving us things that sound
1:25:43
like the story. It's actually
1:25:52
pretty accurate at this point.
1:25:52
This is very exciting to us.
1:25:56
Right? So you know, we can now
1:25:56
decode a story that somebody is
1:26:00
hearing, right, which is kind of
1:26:00
step one. Like that's, that's
1:26:03
interesting by itself. But
1:26:03
that's not even, really
1:26:05
potentially that useful. So at
1:26:05
that point, we went back and did
1:26:09
some follow up experiments. So
1:26:09
we took the same subjects that
1:26:11
we've been scanning and…
1:26:13
Oh, hang on. Can
1:26:13
we, can we like, talk about that
1:26:16
result from your paper? Can we kind of just share it with our listeners?
1:26:19
Yeah. Absolutely.
1:26:20
We're talking figure one here, right?
1:26:22
Yes.
1:26:22
Okay. So, you
1:26:22
say it's not that interesting. I
1:26:25
think it's very interesting.
1:26:25
(Laughter) Okay, I'm gonna say
1:26:29
the actual stimulus that the
1:26:29
subject heard, and you're going
1:26:33
to tell me the decoded stimulus
1:26:33
that your model produced based
1:26:37
on reading their mind, or whatever you want to call it. Okay. I got up from the air
1:26:39
mattress and press my face
1:26:43
against the glass of the bedroom
1:26:43
window, expecting to see eyes
1:26:46
staring back at me. But instead
1:26:46
of finding only darkness,
1:26:50
I just continued
1:26:50
to walk up to the window and
1:26:52
open the glass. I stood on my
1:26:52
toes and peered out. I didn't
1:26:56
see anything and looked up
1:26:56
again. I saw nothing.
1:26:59
Wow. Okay, let's
1:26:59
do some more. This is good. I
1:27:03
didn't know whether to scream or
1:27:03
cry or run away. Instead, I
1:27:07
said, leave me alone. I don't
1:27:07
need your help. Adam
1:27:09
disappeared, and I cleaned up
1:27:09
alone crying.
1:27:13
started to
1:27:13
scream and cry. And then she
1:27:15
just said, I told you to leave
1:27:15
me alone. You can't hurt me. I'm
1:27:19
sorry. And then he stormed off.
1:27:19
I thought he had left. I started
1:27:22
to cry.
1:27:24
Let's do one
1:27:24
more. That night, I went
1:27:27
upstairs to what had been our
1:27:27
bedroom and not knowing what
1:27:29
else to do. I turned out the
1:27:29
lights and lay down on the
1:27:32
floor.
1:27:34
We got back to
1:27:34
my dorm room. I had no idea
1:27:36
where my bed was. I just assumed
1:27:36
I would sleep on it. But instead
1:27:39
I lay down on the floor.
1:27:42
That's pretty
1:27:42
amazing. You know…
1:27:44
Can we do the last one? The last one always gets used as the demo.
1:27:47
Okay, last one.
1:27:47
I don't have my driver's license
1:27:51
yet. And I just jumped out right
1:27:51
when I needed to. And she says,
1:27:54
well, why don't you come back
1:27:54
to my house and I'll give you a
1:27:56
ride. I say okay.
1:28:00
She's not ready. She has not even started to learn to drive yet. I had to
1:28:02
push her out of the car. I said,
1:28:05
we will take her home now. And
1:28:05
she agreed.
1:28:07
It's incredible.
1:28:08
Right? It
1:28:08
actually works. We're getting
1:28:10
this out of fMRI data. fMRI,
1:28:10
which is like the worst of all
1:28:13
neuroimaging methods, except
1:28:13
for all the other ones.
1:28:16
Except it is the best. Yeah. Okay.
1:28:17
It is awful in
1:28:17
so many ways, and yet, we're
1:28:21
getting out like, it's not word
1:28:21
for word. In fact, the word
1:28:24
error rate is god awful, right?
1:28:24
Sorry, my dog is excited about
1:28:28
something. The word error
1:28:28
rate is like 94% here. It's not
1:28:32
getting the exact words for the
1:28:32
most part.
1:28:35
It's getting the
1:28:35
gist, right? It’s getting the
1:28:35
paraphrase of what’s happening.

Sure.
1:28:40
And you have
1:28:40
some, you know, some kind of
1:28:42
intuitive ways of quantifying
1:28:42
how well it's doing that don't
1:28:45
rely on it being a word-for-word
1:28:45
match. And it's, it's all quite
1:28:48
intuitive and explained well in
1:28:48
the paper.
1:28:52
Yeah, yeah. So
1:28:52
this was I mean, very exciting.
1:28:55
When we saw this, you know, we
1:28:55
could read out the story that
1:29:00
somebody was hearing, and kind
1:29:00
of the fact that it was a
1:29:03
paraphrase was also interesting
1:29:03
to us that it's like, you know,
1:29:05
we're not getting, it really
1:29:05
seems like some low level
1:29:09
representation, we're getting
1:29:09
something high level, right?
1:29:11
And you wouldn't
1:29:11
with fMRI, right? Like, I mean,
1:29:13
like, maybe with Eddie Chang's
1:29:13
data, you could read the
1:29:16
phonemes and get it that way.
1:29:18
Right. Which they do beautifully.
1:29:20
You're never, you're never gonna be able to do that with fMRI.
1:29:22
Yeah. Yeah. But
1:29:22
the ideas, right, like,
1:29:27
what's the thought
1:29:27
behind the sentence? That probably
1:29:32
like changes slowly enough that
1:29:32
we can see it, it can kind of be
1:29:35
isolated with fMRI. Right? The
1:29:35
individual words, they're all
1:29:37
mashed up. That's a
1:29:37
mess. But, uh, you
1:29:40
know, each idea kind of evolves
1:29:40
over a few seconds and that's
1:29:44
something that we have a hope of
1:29:44
pulling out with fMRI.
1:29:47
Yeah. Okay, so
1:29:47
very cool. And then you take it
1:29:52
in a lot of other directions
1:29:52
from there. Which one should we
1:29:57
talk about?

Let's talk about…
1:29:57
Okay. Do you need the whole
1:30:03
brain to do this? Or can you do
1:30:03
this with just parts of the
1:30:06
brain?