Episode Transcript
0:05
Welcome to
0:05
Episode 25 of the Language
0:08
Neuroscience Podcast. I'm
0:08
Stephen Wilson. Thank you for
0:11
listening. It's been a while
0:11
since I recorded an episode.
0:14
Sorry for the delay. But I'm
0:14
very excited to be back to the
0:16
podcast and happy to say that
0:16
the first episode back is a
0:19
really great one. I've got some
0:19
more guests lined up too. So
0:21
there shouldn't be such a long
0:21
wait for the next episodes. To
0:25
make a long story short, the
0:25
reason I didn't record any
0:27
episodes lately is that I moved
0:27
back to Australia with my family
0:30
at the start of this year. There
0:30
has been a lot going on. I moved
0:34
to the US when I was twenty
0:34
three with nothing but two
0:36
suitcases. It was easy. I
0:36
thought nothing of it. But
0:39
moving with my whole family at
0:39
this stage of life is a whole
0:41
other thing. We're now living in
0:41
Brisbane, which is where my
0:44
parents and many extended family
0:44
members live. And it's lovely to
0:46
see them all the time. Instead
0:46
of having to wait years at a
0:49
time. I started a new position
0:49
at the University of Queensland,
0:53
where I'm in the School of Health and Rehabilitation Sciences. There are great new
0:54
colleagues here who I'm enjoying
0:58
getting to know and I'm looking
0:58
forward to developing some new
1:00
lines of research. I need to
1:00
build up a new lab. So if you're
1:04
a potential grad student or
1:04
postdoc, please don't hesitate
1:06
to get in touch. I really miss
1:06
my friends and colleagues at
1:09
Vanderbilt, which was an awesome
1:09
place to work. But fortunately,
1:12
we're continuing to collaborate
1:12
on our Aphasia Recovery Project,
1:15
even though I'm living on the
1:15
other side of the world. Okay,
1:18
on to the podcast. My guest
1:18
today is Alex Huth, Assistant
1:22
Professor of Neuroscience and
1:22
Computer Science at the
1:24
University of Texas, Austin.
1:24
Alex uses functional imaging and
1:28
advanced computational methods
1:28
to model how the brain processes
1:31
language and represents meaning.
1:31
He's done a series of extremely
1:35
elegant studies over the last 10
1:35
or so years. And we're going to
1:37
talk about a few of the
1:37
highlights, including a really
1:40
cool study that is just about to
1:40
come out in Nature Neuroscience
1:43
with first author Jerry Tang, by
1:43
the time you hear this, it will
1:46
be out. Okay, let's get to it.
1:46
Hi, Alex. How are you today?
1:51
Hey, Stephen. I'm doing well. Thanks for having me on your podcast.
1:54
Yeah. Well, thanks very much for joining me, and I'm pretty sure that we've
1:55
never met before, right?
1:58
I don't think so. No.
2:00
Yeah. I mean, do
2:00
you go to a lot of conferences?
2:02
I mean, you
2:02
know, in the before times, I
2:05
went to like, SFN frequently and
2:05
SNL every now and then.
2:10
Ok.
2:12
But yeah, hoping
2:12
to start going back again. But
2:14
uh, yeah.
2:15
Yeah, I also
2:15
haven't been to many for a few
2:17
years and when I did, it would
2:17
mostly be SNL, and usually only
2:21
when it was in the US. So, I'm
2:21
not very much out and about, you
2:26
know. And yeah, so where are you
2:26
joining me from today?
2:30
I'm in Austin,
2:30
Texas.
2:32
Yeah.
2:33
Sunny Austin.
2:34
It looks nice. I
2:34
can see outside your window and
2:36
it looks like a beautiful day.
2:37
It is. Spring
2:37
has sprung. Allergens are in the
2:40
air, you know.
2:41
Right. Yeah. I’m
2:41
now in Brisbane, Australia. And
2:44
it's also a beautiful day here.
2:44
First thing in the morning and
2:49
it's apparently fall, which we
2:49
call autumn. I'm just trying to
2:53
get my head back around the
2:53
Aussie lingo. But you know, all
2:56
the seasons are pretty much the same here.
2:59
Lovely. That's that's how I like it. I grew up in Southern California. That's
3:01
par for the course there.
3:03
All right. So you don't know anything about weather?
3:05
I've learned a
3:05
little bit in Texas, which kind
3:07
of surprised me. But yeah.
3:09
Yeah. Cool. So,
3:09
I always like to kind of start
3:13
talking with people by learning
3:13
about like, how they came to be
3:15
the kind of scientist that they
3:15
are and like, basically
3:18
childhood interests and how that
3:18
led you to where you are. So,
3:21
you know, were you a nerdy
3:21
kid? What were you into when you
3:23
were a kid?
3:24
A very nerdy
3:24
kid. I don't know. That's how it goes, right? I was
3:26
very into Star Trek. I was like
3:31
The Next Generation. That was,
3:33
that was my jam. Especially
3:33
Data. Like, I loved Data. He was
3:40
he was just, I don't know, a
3:40
weird robot hero to me. So, I
3:45
don’t know, I got this like kind
3:45
of fascination with artificial
3:48
intelligence based on that, and
3:48
just generally liking science
3:51
fiction. But when I was starting
3:51
college, AI was kind of in a
3:59
slump. It was in a, it was in a
3:59
bad period. This is like the
4:01
early 2000s. So, you know, I
4:01
sort of looked around and it
4:09
seemed like one of the really
4:09
interesting ways forward, if
4:13
we want to build machines that
4:13
think like humans, is to figure
4:18
out how humans think. So, I
4:18
started getting into
4:20
neuroscience. I remember the the
4:20
first neuroscience talk I ever
4:24
saw was Mark Konishi describing
4:24
the binaural auditory
4:29
localization circuit and I was
4:29
like, this is really cool. I
4:32
want to, I want to do this kind
4:32
of stuff. So, I started sort of
4:35
getting into, interested in
4:35
neuroscience through that.
4:40
Okay, so you went to undergrad at Caltech, is that right?
4:40
Yeah, yeah. That's right.
4:42
Okay. And is
4:42
that close to where you're from?
4:45
Like you mentioned Southern California.
4:46
Yeah. Yeah. So I
4:46
grew up a little bit north of
4:49
Los Angeles in a little town
4:49
called Ojai, California.
4:51
Oh, right. Yeah.
4:51
My wife is from Thousand Oaks,
4:54
so I do know that area. Yeah.
4:56
Right next door. Yeah.
4:57
Yeah.
4:58
Yeah, so
4:58
Pasadena was like a little ways
5:00
down the road. And Caltech is a
5:00
good school. I was excited to go
5:04
there. So yeah, I like started
5:04
doing neuroscience stuff. But I
5:05
Yeah.
5:05
really kind of started enjoying
5:08
things when I started doing
5:12
research. So, that was in
5:12
Christof Koch's lab, when he was
5:16
at Caltech. We were working
5:16
with blind subjects, we were
5:20
looking at auditory motion
5:20
perception in blind subjects,
5:23
which was exciting and
5:23
interesting and ended up being
5:27
like, a J Neurosci paper in 2008.
5:31
And then from
5:31
there, there was, there was one
5:34
real moment that was like a sort
5:34
of eye opener, like, changed
5:39
things for me was, Jack Gallant,
5:39
who I ended up doing my PhD
5:43
with. He came to Caltech, and he
5:43
gave a talk, and it was about
5:48
vision. It was about like, V2:
5:48
what does V2 do? How do we
5:51
model V2? But the thing that he
5:51
really talked about was, was
5:57
this approach of just record
5:57
what the neurons do when they
6:04
see natural stimuli. Right, show
6:04
images. Show images that
6:07
actually are things that, you
6:07
know, an animal might see,
6:10
record what neurons do and then
6:10
build models from that. Try to
6:13
figure out, you know, if you get
6:13
thousands of these images, thousands of
6:17
responses from these neurons,
6:17
can you figure out like, what is
6:20
it about the image that makes
6:20
this neuron fire, and something
6:23
about, like, just that
6:23
perspective of doing things, not
6:28
from the very like, controlled
6:28
experiment, kind of deductive
6:35
approach, but doing this, this
6:35
inductive thing, where you just
6:38
kind of lean on computational
6:38
modeling, and say, like, I'm
6:40
going to let the model figure
6:40
out, like, what this neuron is
6:43
actually doing. I, I just became
6:43
like, insanely excited about
6:47
this. So when I was applying to
6:47
grad schools, Berkeley and
6:52
working with Jack was like, one
6:52
of my kind of top interests.
6:55
Yeah. Okay. So,
6:55
you know, I definitely have
6:57
noticed that like you, you've
6:57
leaned in very much to natural
7:00
stimuli throughout your career.
7:00
You even have a paper on it
7:03
with, with Liberty Hamilton,
7:03
specifically about that, but
7:06
like, so that was really
7:06
actually a driving force for you
7:09
from the very beginning, right?
7:09
And so, you know, what did you
7:12
see as the advantages? And was
7:12
there anything you were worried
7:15
about giving up as in like, kind
7:15
of, well, you weren't really
7:18
getting away from controlled
7:18
designs, because you never did
7:21
controlled designs? But like,
7:21
I'm sure you knew about the, you
7:24
know, the literature and what
7:24
were you worried about giving up
7:27
anything as you moved in this
7:27
new direction?
7:32
I had been like,
7:32
kind of familiar with this, and
7:43
learning how to do things there.
7:43
But just the, the idea of this
7:48
natural stimulus thing, where
7:48
it's like, the work of doing the
7:53
science of like figuring out
7:53
what's going on, you kind of
7:56
move it from one place to
7:56
another, or you move it from the
7:58
experimental design kind of
7:58
phase into this modeling phase.
8:02
And I just really liked that
8:02
idea. Of course, you know, there
8:07
are definitely things we give
8:07
up, right? So there are
8:10
correlations in natural stimuli
8:10
and those are hard to break. And
8:15
sometimes you can't tell like
8:15
what is actually causing
8:19
this, this response. And
8:19
sometimes you have to go in and
8:24
like do kind of focused
8:24
experiments to break those
8:26
correlations, and then figure
8:26
out like, what is actually
8:28
responsible for, you know, what
8:28
this brain area is doing, or
8:31
what this neuron is doing or
8:31
whatever. But overall, this
8:36
idea of just kind of, like
8:36
replacing this elaborate, like,
8:41
you know, let's test one little
8:41
thing at a time with the big
8:45
picture, like, let's just see
8:45
how the whole system works. And
8:47
then, you know, kind of let it
8:47
sort itself out as we as we
8:50
figure out how to model it. It
8:50
just really, like, resonated with
8:53
me. I love that idea.
8:54
Yeah, well, it's
8:54
worked out really well. So let's
8:57
kind of give our listeners a
8:57
more concrete idea about what
9:01
you're talking about when you're
9:01
talking about these experiments
9:05
using natural stimuli. And like
9:05
I told you in an email, we'll
9:09
kind of go through some of
9:09
your work chronologically,
9:13
because I do think
9:13
each big paper kind of
9:17
builds on the previous ones in a
9:21
very satisfying way. So, I thought
9:21
we might start with the paper
9:25
which appears to be your
9:25
dissertation research, which is
9:29
2012, published in Neuron. Is
9:29
that right?
9:32
Yep.
9:32
And it's called
9:32
‘A continuous semantic space
9:37
describes the representation of
9:37
thousands of object and action
9:40
categories across the human
9:40
brain’. It's not quite a
9:42
language paper, but that's okay.
9:42
Like it’s, it’s, it’s semantics
9:45
and that's, that's close enough
9:45
for me.
9:48
Yeah. Yeah. So I
9:48
can give you kind of the
9:50
backstory of this. So when I
9:50
joined Jack's lab, this is like
9:54
2009. I thought I was going to
9:54
be doing vision. I was
9:58
interested in vision. I was
9:58
like, this is, this is where the
10:00
cool stuff is happening. It’s,
10:00
it’s the vision world. I thought
10:03
like, you know, the interesting
10:03
problem is this mid-level vision
10:06
problem. Like, we know how V1
10:06
works. We know, you know that
10:08
there's FFA and PPA and whatever
10:08
the high level visual areas, but
10:12
like what's happening in the middle, like, what are the transformations that actually
10:14
get us there? But Jack sat me
10:18
down, like when I, when I joined
10:18
the lab, this was just after
10:24
Tom Mitchell's Science paper had
10:24
come out. This 2008 paper, where
10:28
they were building these
10:28
encoding models in a lot of the
10:32
same way as we do, for words. So
10:32
they had shown people individual
10:36
words, and then they had these
10:36
feature vectors for the words
10:38
and they were you know,
10:38
predicting the brain responses
10:41
to the words and so on. And I
10:41
think Jack had seen this and was
10:45
really excited by this. And so
10:45
he sat me down when I first
10:48
joined the lab, and he said, you
10:48
know, I have a plan for you. I
10:53
want you to work on language.
10:53
And I said wait, I don't know
10:55
anything about language like
10:55
I've never done this before. And
10:58
he said, Okay, so we have a good
10:58
collaborator, Tom Griffiths, he
11:01
knows a lot about language,
11:01
you're going to learn this
11:03
stuff, and you're going to do
11:03
it. And I said, Okay, fine,
11:06
let's do it. So, we started kind
11:06
of going down that road, and
11:10
like designing language
11:10
experiments, which were a lot
11:13
crummier than what we ended up
11:13
eventually publishing. But sort
11:18
of along that road, we started
11:18
trying to build models of word
11:24
representation, that was kind of
11:24
what we were interested in at
11:27
the time was like, these really
11:27
like word embedding models. And
11:30
it was early days of word
11:30
embedding models, were mostly
11:33
focused on things like LSA, that is,
11:33
latent semantic analysis, which
11:36
is, like 1989, or something like
11:36
this right? Sort of classic word
11:41
embedding models. So, I was
11:41
building these models and I was
11:46
collecting corpora of text and
11:46
doing this kind of early
11:48
computational linguistic stuff.
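For readers who want a concrete picture of the LSA-style word embeddings Alex mentions, here is a minimal Python sketch: build a word-by-document count matrix from a toy corpus and take a truncated SVD, so each word gets a low-dimensional vector. The corpus and dimensions are invented for illustration, and classic LSA also applies a weighting step (e.g., log-entropy) that this sketch skips.

import numpy as np

docs = ["the dog chased the cat",
        "the wolf is like a dog",
        "cars drive on city roads"]
vocab = sorted({w for d in docs for w in d.split()})
row = {w: i for i, w in enumerate(vocab)}

# Word-by-document count matrix.
counts = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        counts[row[w], j] += 1

# Truncated SVD: rows of U, scaled by the singular values, are word vectors.
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
embeddings = U[:, :k] * S[:k]
print(embeddings[row["dog"]])  # a k-dimensional meaning vector for "dog"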
11:48
And then, we didn't have good
11:53
fMRI data for that yet. So we
11:53
thought like, how do I actually
11:58
test this? How do I figure it
11:58
out? Figure out if it's actually
12:00
working. And what we did have in
12:00
the lab was a lot of data from
12:04
people watching movies. Data
12:04
from people like looking at
12:07
images. So we just
12:08
thought, like hey, let's, let's
12:08
Yup.
12:08
try this out on somebody
12:10
watching a movie. Like, you
12:13
know, if we're looking at
12:13
images, right, we have some
12:15
labels of the things that are in
12:15
these images. Let's just plug
12:18
that into a word embedding model
12:18
and see if it works. And it
12:21
worked really well, it turned
12:21
out. It was it was quite
12:23
effective. And so this kind of
12:23
led, led me down this alternate
12:27
path of like, fitting these
12:27
kinds of models, we ended up
12:31
actually kind of changing things
12:31
up quite a bit instead of word
12:33
embeddings, we were using these
12:33
WordNet labels, which are…
12:38
Yeah, I mean,
12:38
because in that first paper,
12:42
it's not trivial how you would
12:42
go from these movies that you
12:44
had people watching these little
12:44
movie clips, to, you know, your
12:48
semantic representations, right?
12:48
I mean, that's actually not
12:50
trivial. Can you, can you can
12:50
you sort of describe how that
12:54
works, just so that people can
12:54
understand the, the paper?
12:58
Yeah, yeah. So,
12:58
the experiment was we had people
13:01
watch a little more than two
13:01
hours of natural video, natural
13:06
movies. This was a stimulus set
13:06
that I actually didn't design.
13:10
This is designed by Shinji
13:10
Nishimoto, who had a fantastic
13:14
paper in 2011, that was decoding
13:14
video from, from human visual
13:18
cortex, which is, like, just an
13:18
incredibly impressive thing,
13:21
even whatever it is, like 12
13:21
years later, I'm blown away by
13:24
that work. And a lot of like,
13:24
his ideas from that have carried
13:28
forward into our current work.
13:29
Right.
13:29
But, um, so we
13:29
had this data around, we just
13:33
didn't know like, what was actually happening in these movies, right, we needed to know
13:34
like, what is this a movie of,
13:38
so like, so we can use these
13:38
techniques.
13:41
Yeah, I mean, as a human you know what the movie is of, but you need to quantify
13:43
it for your models, right?
13:45
Exactly, exactly. We need to turn this into numbers so that we can
13:46
model it, right? So, I started
13:51
exploring, like, you know, do I
13:51
use Mechanical Turk to label
13:55
this or something. And we tried
13:55
that, and it ended up being just
13:58
kind of messy, and the results
13:58
were not great. So we ended up
14:02
doing a thing, which I, I tell
14:02
my students the story all the
14:06
time, because I think it's a,
14:06
it’s an example of like, just be
14:09
a little bit less lazy than the
14:09
next person and things can work
14:12
out for you. I just sat down and
14:12
labeled all the movies myself.
14:15
So it took I don't know, like,
14:15
two months or something, just
14:18
like spend an hour, an hour or
14:18
two a day like labeling, you
14:24
know, a few dozen seconds of
14:24
movie or something like this. So
14:26
we labeled it one second at a
14:26
time. Each second, we like wrote
14:29
down, I wrote down like, what
14:29
are all the things, and like,
14:33
are there verbs that are happening or are there actions that are happening? So…
14:37
Yeah.
14:37
This process took a while.
14:38
I thought so! (Laughter)
14:39
But I felt
14:39
confident in the labels because
14:43
I did them. And I knew that
14:43
they're like consistent across
14:46
the whole sets, unlike the
14:46
Mechanical Turk labels, which
14:49
were very again, like messy in a
14:49
lot of ways. So, once we had
14:54
these labels, then, we could
14:54
very easily kind of fit these
14:58
models so we can convert, you
14:58
know, if a, if a scene has like
15:02
a, you know, a dog in it. We
15:02
know that there's a dog there,
15:07
we can convert that into some
15:07
numerical vector that says,
15:09
like, what is the dog? And then
15:09
we can fit these regression
15:12
models that predict the fMRI
15:12
data, predict how the brain
15:15
responds based on these vectors
15:15
of like semantic information.
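To make that pipeline concrete, here is a toy Python sketch of what Alex describes: each second of movie becomes a binary category vector, and a linear regression maps those vectors to per-voxel responses. The labels and data below are simulated stand-ins, and the actual study used far more time points, regularized regression, and the delayed features discussed later.

import numpy as np

# Toy labels: one list of WordNet-style categories per one-second clip.
labels_per_second = [["dog.n.01"], ["dog.n.01", "car.n.01"], ["wolf.n.01"],
                     ["car.n.01"], ["dog.n.01"], ["wolf.n.01", "car.n.01"]]
vocab = sorted({lab for sec in labels_per_second for lab in sec})
col = {lab: i for i, lab in enumerate(vocab)}

# Indicator design matrix: seconds x categories, 1 if the category is present.
X = np.zeros((len(labels_per_second), len(vocab)))
for t, sec in enumerate(labels_per_second):
    for lab in sec:
        X[t, col[lab]] = 1.0

# Simulated BOLD data: seconds x voxels.
rng = np.random.default_rng(0)
bold = rng.standard_normal((len(labels_per_second), 10))

# Per-voxel least squares fit (the real work uses regularized regression).
weights, *_ = np.linalg.lstsq(X, bold, rcond=None)
print(weights.shape)  # (categories, voxels): one weight per feature per voxel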
15:19
Yeah, okay. Just
15:19
quick meta point, like, I really
15:23
hear you about, like, the time
15:23
the laboriousness right, when I
15:26
read the paper, I was like, that must have taken a really long time, and I wondered,
15:28
did he do that himself? Or does
15:31
he, like, browbeat somebody into
15:31
doing it. But it's so true,
15:34
right? You have to especially
15:34
like, early in your career, like
15:36
you really do sometimes have to, like, suck it up and do something really time-consuming.
15:38
And I've done that a bunch when
15:42
I was, you know, earlier in my
15:42
career, and now like, whenever I
15:45
give my students some, some
15:45
awful task, I sort of have this
15:49
like, you know, pocketful of things I can tell them: well, you know, when I was a kid, I
15:51
did this, I carried my father
15:54
on my back through the
15:54
snow. So you can, you know,
15:57
transcribe this speech sample.
15:59
Exactly,
15:59
exactly. There's, there is a lot
16:03
of value in that. I think, like,
16:03
a lot of people just see
16:05
something hard and like, give up
16:05
or like, can I find some
16:08
shortcut around this? When just
16:08
sitting down and doing it, oftentimes,
16:11
is not that bad.
16:11
Yeah, sometimes
16:11
you just need to do it. Yeah. So
16:16
okay, so you've got the movies,
16:16
you've kind of like, you know,
16:18
you've labeled them with words
16:18
as to what's in them, and then
16:20
you and then you mentioned you
16:20
from those words, you get
16:23
vectors of numbers that are
16:23
going to describe the meaning of
16:25
the words. So that's what you
16:25
know, that's kind of an encoding
16:29
like a, what do you call it? A…
16:31
An encoding model.
16:31
Encoding model.
16:31
So can you describe how you get
16:34
that vector of numbers like, for
16:34
those who have not seen that
16:38
approach before?
16:39
Yeah. Yeah. So,
16:39
for this paper, we used
16:45
WordNet, which some people
16:45
might be familiar with. It's
16:50
essentially a dictionary. So we
16:50
have these unique entries that
16:54
are tied to the definition of
16:54
like this entry. Which, which is
17:00
nice because then, this
17:00
disambiguates like words that
17:03
have different senses, right? So
17:03
dog can be a verb, it can be a
17:07
noun, there's like 10 different
17:07
senses of dog, the noun. But you
17:11
know, I know that like dog.n.01
17:11
is the label I use for
17:14
like, this is a, you know, a
17:14
mammal, of the canid family,
17:17
whatever.
17:18
Yeah.
17:18
So, with WordNet
17:18
you get that kind of sort of
17:23
detail, but then you also get
17:23
information about what kind of
17:27
thing that is. So WordNet
17:27
contains a lot of these hyponymy
17:32
relationships. So like, a dog is
17:32
a canid, a canid is a carnivore, a
17:39
carnivore is a placental mammal, a
17:39
placental mammal is a vertebrate,
17:44
etc. So you have this kind of
17:44
chain. That's sort of an easy
17:47
one, because it's taxonomy. But
17:47
it covers all kinds of things,
17:50
right? All, all, I don't know
17:50
how many, like, tens of thousands of
17:54
definitions there are in WordNet, but it’s very extensive. So, we could label
17:55
all these things in WordNet. And
18:01
then we could use actually that
18:01
information to help us out a
18:03
little bit. So what we did is,
18:03
one kind of simple thing you
18:08
could do, is, say just there
18:08
were 1300 something unique
18:12
labels in this movie dataset.
18:12
So, let's just convert each
18:17
frame or each second of the
18:17
movie into like a length 1300
18:20
vector, that is zeros, except
18:20
for the places that correspond
18:26
to like the categories that are present.
18:27
Right. So say
18:27
there's a dog or not dog.
18:30
Right, right. So there'll be a one if there's a dog in the scene and zero if
18:32
there's not. So that's fine.
18:36
That's that's an OK model. It
18:36
turns out, it doesn't work
18:40
terribly well as a model.
18:40
Because it doesn't have a lot of
18:44
information that is actually
18:44
quite important. Right, so like,
18:48
say, there's a dog in some
18:48
scenes, and a wolf in other
18:51
scenes, right? If you just had
18:51
these as like one zero labels,
18:55
then your model has no idea that
18:55
like a dog is like a wolf. You
18:58
have to separately fit weights
18:58
for, you know, how does the
19:01
brain respond to dog? And how
19:01
does the brain respond to wolf?
19:04
Which is kind of inefficient.
19:04
But also it means that you can't
19:09
like, generalize to new
19:09
categories that you didn't see.
19:12
Right? So if your training data
19:12
contains a dog, but then you're
19:15
testing this model on some video
19:15
that contains a wolf, if you had
19:18
no wolf training data, you just
19:18
wouldn't be able to predict that at all.
19:21
Right.
19:21
But if your model knew that, like, a wolf is actually a lot like a dog, these
19:24
are very similar things, then
19:26
maybe you could guess that the
19:26
response to wolf should be like
19:29
dog, and then everything works
19:29
better. So what we did in this
19:33
model is essentially just add
19:33
these hypernyms. So we extended
19:38
it from like the 1300 categories
19:38
that were actually labeled to
19:41
1705 categories, I think that
19:41
were the full set, which
19:46
included all the hypernyms of
19:46
the labels that we had. So
19:49
instead of, you know, if the
19:49
scene just had a dog in it,
19:52
instead of just having one sort
19:52
of indicator that there was a
19:56
dog there, there would also be
19:56
an indicator that there's a
20:00
canine, that there's a carnivore,
20:00
a mammal.
20:03
A mammal. Yeah.
20:03
Yeah.
20:05
Which, like
20:05
later, we kind of actually
20:09
worked out what this kind of
20:09
meant mathematically in an
20:12
interesting way. In that, this
20:12
really actually kind of
20:15
represented, we could think of
20:15
it as like a prior on the
20:19
weights in the model, that we
20:19
kind of push closer together,
20:24
weights for categories that are
20:24
related. So, it would kind of
20:28
enforce that, like the response
20:28
to Dog and Wolf should be
20:31
similar.
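Listeners who want to try that hypernym expansion can sketch it with NLTK's WordNet interface (a real API; run nltk.download("wordnet") once first). The indicator-vector step is a toy illustration of the idea, not the paper's exact code.

from nltk.corpus import wordnet as wn

def with_hypernyms(synset_name):
    """A synset plus every hypernym above it in the WordNet hierarchy."""
    syn = wn.synset(synset_name)
    return [syn] + list(syn.closure(lambda s: s.hypernyms()))

for s in with_hypernyms("dog.n.01"):
    print(s.name())  # dog.n.01, canine.n.02, ... on up the hierarchy

# Switching on all of these in the indicator vector means "dog" and "wolf"
# share features through their common ancestors, so their fitted weights
# get pulled toward each other.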
20:32
Yeah, by having
20:32
a common covariate really with
20:36
them. Right?
20:37
Exactly. Yeah.
20:37
It turns out that that model
20:40
works much better. Like it's
20:40
much better, like predicting
20:43
brain responses. So, this is
20:43
another sort of critical part of
20:45
the kind of natural stimulus
20:45
paradigm, I think, is that we
20:49
build these models on natural
20:49
stimuli and then we also test
20:53
them by predicting brain
20:53
responses to like new natural
20:56
stimuli, which I think, you
20:56
know, I argue this strongly in
20:59
some papers. I think this is
20:59
really kind of a gold standard
21:03
for testing theories of how the
21:03
brain does things. Right? It's
21:08
like, we want to understand how
21:08
the brain processes visual
21:11
information, or how it processes
21:11
language. Let's just record
21:14
like, what happens when the
21:14
brain is processing language and
21:17
then let's say how well can we predict that?
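That gold-standard test boils down to a per-voxel correlation between predicted and actual held-out time courses. A minimal sketch, with simulated arrays standing in for real predictions and data:

import numpy as np

rng = np.random.default_rng(0)
actual = rng.standard_normal((120, 1000))              # held-out BOLD: time x voxels
predicted = actual + rng.standard_normal((120, 1000))  # stand-in model predictions

def voxelwise_corr(a, b):
    a = (a - a.mean(0)) / a.std(0)
    b = (b - b.mean(0)) / b.std(0)
    return (a * b).mean(0)  # Pearson r for each voxel

r = voxelwise_corr(actual, predicted)
print(r.shape, r.mean())  # one prediction score per voxel, mappable on cortex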
21:19
Yeah.
21:19
How well can we
21:19
guess how the brain is going to
21:23
respond to this thing? So….
21:25
It’s fascinating
21:25
how quantifiable it is, right?
21:27
You know whether you're understanding things or not?
21:30
Yeah, it gives
21:30
you a score to just like, make
21:33
it go up. Right? You can keep
21:33
tweaking things and make that….
21:35
And I think
21:35
that's what we're going to see,
21:37
as we discuss some of these
21:37
papers today. Your models keep
21:39
on getting more and more
21:40
sophisticated, right? And so
21:43
this is a pretty old school
21:43
model at this point, this paper
21:46
is ten years old, maybe even
21:46
older, when you actually did the
21:48
work. We've come a long way
21:48
since then. But I do want to
21:51
start here, because a lot of the
21:51
concepts kind of run through it
21:54
all, and just the models get better.
22:00
Yeah, yeah, absolutely. And I think they get better in a way that is quantifiable to us.
22:04
Which I think
22:04
is very nice. Because
22:04
Yeah.
22:04
it's easy in a lot of ways. You
22:07
know, to think of like, you
22:12
know, for studying some
22:12
psychological phenomenon, we can
22:14
say like, here's a simple model, and then we're going to elaborate, elaborate and
22:16
elaborate that model. And maybe
22:19
it predicts like other little
22:19
things about it, but it becomes
22:23
unwieldy in a way. It's like,
22:23
it’s unclear with like big,
22:25
elaborate models. How well do
22:25
they actually explain things?
22:29
Whereas here, we have this like
22:29
quantifiable metric, right? We
22:31
can say like, it makes the
22:31
number go up, that makes our
22:34
prediction of how the brain
22:34
actually responds to real things
22:37
that we care about, better.
22:37
Yeah. Very cool.
22:39
Okay. So, you've kind of
22:43
explained how you get the
22:43
numbers, how you turn the movies
22:46
into numbers, that represent
22:46
their meaning. Now, I can't
22:50
remember if you said, you had, I think you have two hours of movies in the study. Is that
22:51
right?
22:54
That’s right.
22:54
And you don't have a huge number of participants. I think you have
22:55
five participants. One is
23:00
yourself and you've got another
23:00
couple of co-authors in there. I
23:03
noticed there is a participant
23:03
called Jay G. But on careful
23:06
examination, it is not Jack
23:06
Gallant.
23:09
It is not. That's right.
23:11
Did you try to
23:11
get him into the scanner to be
23:13
one of the….(Laughter)
23:17
Yeah, he was scanned for some things, but not for this one.
23:19
So, a substitute JG in his place?
23:22
Yeah. That’s right.
23:22
So yeah, kind of
23:22
the classic psychophysics
23:25
tradition of you know, small
23:25
numbers of participants, most of
23:29
whom are the authors of the study, because that's who will tolerate two hours of scanning.
23:34
Yeah. Yeah.
23:35
Two hours might
23:35
seem like a lot, but it's going
23:37
to get more. So, anyway, how
23:37
many explanatory variables do
23:40
you end up having, at the end of
23:40
the day, when you fit that to
23:43
your data? Like it must be well
23:43
over a thousand, right?
23:45
Yeah, yeah. So,
23:45
the feature space in that paper,
23:52
goes to 1705 parameters. And
23:52
then we also do a thing where,
23:58
you know, if you're if you're trying to predict how the brain is responding to this, like
24:00
ongoing natural movie, you also
24:02
need to capture the hemodynamic
24:02
response function. Right? And
24:07
the standard way to do this, is
24:07
just to like convolve your
24:09
design matrix, which would be,
24:09
you know, the 1700 dimensional
24:11
thing with, with a canonical
24:11
HRF. But, you know, thing that
24:17
they had found in Jack's lab,
24:17
before I even got there, this is
24:20
really, I think, work by
24:20
Kendrick Kay, that showed this
24:23
very nicely. That doesn't
24:23
actually work terribly well,
24:26
especially if you have this
24:26
opportunity to like measure how
24:30
well are you actually doing? Like, how are you predicting? And it turns out that using the
24:33
canonical HRF is kind of bad, or
24:37
you're leaving a lot of like
24:37
variance on the table. So, we
24:42
use this approach the finite
24:42
impulse response model, where
24:46
essentially we fit separate
24:46
weights for each feature for
24:51
several different delays. So
24:51
we're kind of fitting a little
24:53
HRF for each feature. So…
24:54
Yeah.
24:54
For the dog
24:54
feature, we get a little HRF and
24:57
so on.
24:58
Yeah. Okay, you
24:58
have four different delays,
25:00
like two seconds each. So you're basically modeling the first eight seconds and letting
25:02
the response take any shape it
25:05
does in that time.
25:06
Exactly. I think it’s really actually three delays. It was like, four, six
25:08
and eight seconds.
25:11
Okay.
25:12
We expanded to
25:12
four delays later for language.
25:14
Because language actually has
25:14
earlier, earlier kind of take
25:17
off, it turns out. Visual cortex
25:17
has the slow HRF. Which, it’s
25:23
kind of weird when you think about it because the canonical HRF is built based on like V1,
25:25
and it turns out V1 doesn't have
25:29
like the most standard HRF
25:29
across the brain. It's quite
25:33
different. Auditory cortex has
25:33
like a very short HRF in
25:35
comparison. Motor cortex has a
25:35
different sort of style of HRF.
25:39
Yeah.
25:39
There’s different things happening everywhere. But using this kind
25:40
of method, it blows up your
25:44
parameter space.
25:44
You have to multiply it by three in this case,
25:47
Exactly. Yeah.
25:47
The 1705 times three features.
25:51
Yeah.
25:51
But it still
25:51
works better than using the
25:54
canonical HRF. Because you're
25:54
capturing all these sort of
25:57
details in, in the hemodynamic
25:57
responses across the brain.
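A minimal sketch of the FIR trick just described: stack time-shifted copies of the stimulus matrix so every feature gets its own weight at every delay. Shifts of 2, 3, and 4 samples at a 2-second TR give the 4-, 6-, and 8-second delays; the shapes here are toy values, not the study's.

import numpy as np

def make_delayed(X, delays):
    """Stack shifted copies of X: (time, features) -> (time, features * delays)."""
    out = []
    for d in delays:
        Xd = np.zeros_like(X)
        Xd[d:] = X[:-d]  # the stimulus at time t predicts BOLD at time t + d
        out.append(Xd)
    return np.hstack(out)

X = np.random.default_rng(0).standard_normal((300, 1705))  # toy feature matrix
X_delayed = make_delayed(X, delays=[2, 3, 4])              # 4, 6, 8 s at TR = 2 s
print(X_delayed.shape)  # (300, 5115): 1705 features x 3 delays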
26:02
Yeah, I mean, I'd love to talk about HRF. I could talk about HRF for like an
26:03
hour with you, but we
26:07
probably should focus on
26:07
language. (Laughter). But yeah,
26:10
no, I mean, the relevant
26:10
thing here is that it turns
26:13
what's already a large number of
26:13
explanatory variables into a
26:15
very large number. So, are there
26:15
issues with fitting linear
26:19
models that have 5000
26:19
explanatory variables? Or do you
26:22
have enough data to do that?
26:24
So, there are definitely issues. There’s always issues. There’s always
26:25
issues in fitting linear models. I don't know. This is, in the
26:27
time that I was in Jack's lab,
26:32
I'd say maybe a good like, 80%
26:32
of everyone's time and effort
26:37
was devoted to this question of
26:37
like, how do we fit better
26:39
linear models? Like that was,
26:39
that was really central to like,
26:43
everything we did. It's weird,
26:43
because like, that didn't look
26:46
like the scientific output.
26:46
Right? It's like, we didn't
26:48
publish a lot of papers about like, how do you fit these linear models? It just ended up
26:50
being like a tool that we used.
26:54
But that was a massive amount of
26:54
the like, intellectual effort
26:57
there. It was just like, how do
26:57
we do this well? Yeah. So you
27:01
know, we have to use regularized
27:01
linear regression. That's like a
27:04
super important tool. There were
27:04
a lot of different like styles
27:07
of this that were used for
27:07
different projects in the lab.
27:09
It turned out for sort of vision
27:09
models, if you're modeling like
27:13
visual cortex, then one style of
27:13
model works terrifically well.
27:17
This like sparse linear
27:17
regression, because visual
27:21
cortex responses are actually kind of sparse, they only care about like one little part of
27:23
the visual field. Whereas, for
27:26
these like semantic models, that
27:26
I was using, that actually
27:30
didn't work that well, which was
27:30
surprising. And the thing that
27:33
worked really well was ridge
27:33
regression, and that's been kind
27:35
of the mainstay of everything we
27:35
do since then. So this is an L2
27:40
regularized regression. I don’t know. We can….
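A sketch of that ridge step using scikit-learn; the alpha grid and array shapes are assumptions for illustration, and encoding-model code often searches for a separate alpha per voxel rather than one global value.

import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
X_train = rng.standard_normal((300, 5115))  # delayed stimulus features
Y_train = rng.standard_normal((300, 1000))  # BOLD data: one column per voxel

# L2-regularized regression; RidgeCV picks the penalty by cross-validation.
model = RidgeCV(alphas=np.logspace(0, 4, 10))
model.fit(X_train, Y_train)
print(model.alpha_, model.coef_.shape)  # chosen alpha, (voxels, features)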
27:42
It’s definitely
27:42
interesting. Yeah. And I think
27:44
that the big picture point is
27:44
clear. And it's just a kind of
27:48
a, it's just an interesting
27:48
general point of just how much
27:51
of science often is just these
27:51
behind the scenes details that
27:54
make or break the papers. And
27:54
when you read the paper, it just
27:58
basically says, we did L2
27:58
regularize regression and what
28:01
the reader doesn't always know
28:01
is that like, that was a year of
28:05
pain to get to that or, you
28:05
know. So that's very
28:08
interesting. I always, it’s just
28:08
always fascinating to hear about
28:11
the process behind these papers.
28:11
Because like, I think a good
28:15
paper just reads, it reads
28:15
like that was the most obvious
28:18
thing in the world to do. But
28:18
like, sometimes it wasn't
28:22
obvious at all. Okay, so you fit
28:22
models, at every voxel, you fit
28:25
these models and have 5000
28:25
explanatory variables based on
28:29
the semantic feature
28:29
representation and flexible HRF.
28:32
And then what you find is that
28:32
different parts of the brain or
28:35
different voxels have very
28:35
different response profiles. And
28:39
you demonstrate this in the
28:39
paper with a voxel from the PPA,
28:42
which stands for
28:42
parahippocampal place area, and
28:45
another voxel in the precuneus.
28:45
Can you kind of talk about the
28:49
different responses that you saw
28:49
across the brain?
28:52
Yeah, yeah. So
28:52
that's kind of the second stage
28:57
of this style of like encoding
28:57
model science is, you know, we
29:00
can fit the models, we can test
29:00
them, we see they work well. And
29:03
then we can say, like, what is
29:03
it that’s actually causing some
29:06
piece of brain to activate?
29:06
Right? Like, what what are the
29:08
features that are important to
29:08
this, you know, this voxel this
29:11
chunk of brain? So, you know,
29:11
one thing we can do is just like
29:15
look at the weights, right? So
29:15
we just pick out a voxel and say
29:18
like, what do the weights look
29:18
like? There are high weights for
29:20
the PPA case. I don't have it in
29:20
front of me, but it's like high
29:25
weights for buildings and cars.
29:25
So it like it likes sort of
29:29
seeing things that are
29:29
constructed, and constructed things in
29:32
motion. Whereas the precuneus
29:32
voxel, I think it's much more
29:35
selective for like other people
29:35
and animals and this kind of
29:38
thing. So, you know, we can do
29:38
that we can look in detail,
29:42
quite a bit of detail at like
29:42
individual voxels and say, like,
29:45
you know, what does this voxel
29:45
care about? What is this voxel
29:47
care about? But that has its
29:47
limits. There's a lot of numbers
29:52
to look at there and there's a
29:52
lot of voxels in the
29:54
brain, right? And this is, you
29:54
know, we're not doing this on
29:56
groups of subjects. We're
29:56
fitting this separately on each
29:58
individual subject. Um, there’s
29:58
a lot of voxels to look at. So,
30:03
what we did instead was, we kind
30:03
of tried to summarize the
30:06
weights by reducing their
30:06
dimensionality. So, we just
30:10
applied like a standard sort of
30:10
machine learning data mining
30:14
technique, Principal Component
30:14
Analysis, use that to squeeze
30:17
down these things from 1705
30:17
weights, averaged across the
30:21
HRF, down to just three or four
30:21
dimensions. And say, like, you
30:25
know, what are the major kind of
30:25
axes of variation across,
30:28
across the brain? Right? If we
30:28
had to summarize what these
30:31
voxels are doing, like three or
30:31
four numbers, what does that
30:35
what does that say?
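A sketch of that summarization step: run PCA on the voxels-by-categories weight matrix and keep the top few components. The shapes are toy values, not the study's.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
W = rng.standard_normal((5000, 1705))  # voxels x categories, delay-averaged weights

pca = PCA(n_components=4)
scores = pca.fit_transform(W)           # each voxel summarized by four numbers
print(pca.components_.shape)            # (4, 1705): loadings over categories
print(pca.explained_variance_ratio_)    # tuning variance each axis captures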
30:37
And those, those
30:37
top three or four dimensions,
30:41
capture kind of interpretable
30:41
aspects of semantics, right, in
30:45
this paper, at least
30:46
interpretable ‘ish’. So like, then interpreting those ends up being
30:48
a whole, a whole thing. Like,
30:51
that's, that's difficult and,
30:51
like, contentious and yeah,
30:56
yeah. So, I mean, they do
30:56
separate some things that are
31:00
sort of natural, and we would
31:00
expect to see, so like animate
31:03
versus inanimate categories.
31:03
It's a big distinction,
31:08
essentially, human versus non
31:08
human. I think like, sort of
31:13
buildings, artifacts, and
31:13
vehicles versus other things. So
31:18
these are the kind of like major
31:18
dimensions that we see pop out.
31:22
And one thing we did in that
31:22
paper was also like we compared
31:25
these dimensions to, there‘s
31:25
been a lot of literature in the,
31:31
the field of people looking at
31:31
ventral, ventral temporal cortex
31:34
of like, what are the major
31:34
dimensions of representations.
31:38
So, one of those was animacy. So
31:38
this is from Jim Haxby’s lab.
31:44
Andy Connolly had this great
31:44
work showing that like, there
31:47
seems to be this like gradient
31:47
of animacy, across ventral
31:51
temporal cortex. And that came
31:51
out like very naturally, in our
31:54
data. We really saw that, that
31:54
there was like this, this
31:57
beautiful, you know, animacy
31:57
kind of dimension. Other things
32:01
we found, like less evidence
32:01
for, but yeah, we still got to
32:06
kind of explore that space.
32:07
Yeah, like my
32:07
notes of it. And this is just
32:10
like my notes that I made when I read
32:10
it, which is a couple of weeks
32:14
ago. And I wrote: first
32:14
component, motion/animacy;
32:19
second component, social
32:19
interaction; third component,
32:22
civilization versus nature;
32:22
fourth component, biological
32:25
versus non-biological. So, like
32:25
you said, I'm not like trying to
32:28
hold you to those. But I think
32:28
it's kind of interesting that
32:30
like, these are the big, big
32:30
organizing principles. It kind
32:35
of like a bit of an, it's a
32:35
window into like, what's really
32:38
important for humans, right?
32:38
Like, these are the major axes,
32:41
on which the semantic system is,
32:41
you know, has the most variance.
32:45
Yeah, yeah. And
32:45
that's, that's something that I
32:50
really like about this approach,
32:50
too, right? Is that, like, we're
32:52
not taking those things as given. We’re not baking
32:55
those things into our
32:55
experimental design.
32:56
No.
32:56
We're just
32:56
saying, like, watch a bunch of
32:59
videos, and let's see what falls
32:59
out. Right? Like, yeah, what are
33:02
the differences across the brain? What are the major distinctions the brain makes?
33:05
Yeah. And the brain did. And you were surprised in this paper that the brain
33:07
didn't care about object size,
33:10
which actually, maybe, is not
33:10
that surprising. Like, maybe it
33:14
shouldn't, right?
33:16
Maybe it shouldn't. That was one of these, like, sort of proposals, that the
33:17
major organizing kind of
33:21
feature of visual cortex was
33:21
object size, and we found less
33:25
evidence for that. I don't know
33:25
what the current kind of status
33:28
of that theory is.
33:29
Yeah. Me
33:29
neither. Okay, so like, just
33:33
kind of big picture was like,
33:33
you know, can you describe,
33:36
like, the what, which brain
33:36
areas were responsive to these
33:40
semantic distinctions in this study?
33:43
Yeah, so big
33:43
picture. You know, what we saw
33:48
here was that this sort of
33:48
semantic selectivity for visual
33:53
concepts, was not isolated to
33:53
just like these few areas in
33:58
higher visual cortex. Which is
33:58
kind of the picture that we had,
34:02
loosely, from a lot of studies
34:02
that came before this, right?
34:05
So, we knew about like, the
34:05
place selective areas, we knew
34:08
about face selective areas, body
34:08
selective areas. And those were
34:12
kind of the major, you know,
34:12
sets of things that we knew
34:15
about.
34:15
Yeah.
34:16
And those turned
34:16
out to be quite important and
34:19
very clearly, like come out in
34:19
this kind of data. But we think
34:22
of those more as like, you know,
34:22
peaks in a terrain, right? So
34:28
there's this actually
34:28
complicated terrain, kind of all
34:31
across higher visual cortex.
34:31
Like if you go outside of
34:33
retinotopic visual cortex,
34:33
there's kind of a band of, of
34:36
cortex that stretches around
34:36
that, like all the way around
34:38
the brain. And all across this
34:38
band, we see selectivity for
34:41
something for like some kind of,
34:41
you know, specific semantic
34:46
features. So that's kind of the
34:46
majority of where this happens
34:49
is sort of in this band, like
34:49
around higher visual cortex. We
34:54
also see quite a bit of stuff
34:54
happening. You know, there's
34:57
there's kind of these weird, like, spears that come out of visual cortex. So, up to the
34:59
pSTS, there is some visual
35:04
representations up there,
35:04
through the intraparietal sulcus
35:08
and sort of onto the medial
35:08
surface of the cortex, there’s
35:12
another kind of spear of visual
35:12
cortex. And then up in
35:16
prefrontal cortex, there's also
35:16
like a few selected areas that
35:20
are quite visually selective. So
35:20
there's some face patches and
35:22
like the frontal eye fields are
35:22
very visually responsive.
35:26
It's funny, you're calling them visual, but like, don't you think they're
35:27
semantic?
35:31
Yes. (Laughter)
35:31
Because all of these things, you
35:35
know, we've seen later are like,
35:35
you know, they don't really care
35:38
that it's vision per se. That's
35:38
not quite true. So, FEF does
35:43
care that it’s vision. FEF kind
35:43
of only responds to visual stuff.
35:46
Frontal eye field. Yeah.
35:48
Yeah. Intraparietal
35:48
sulcus, it’s quite visual. So,
35:51
that's kind of a gap in the
35:51
other maps that we see. Well,
35:55
a lot of this stuff in higher
35:55
visual cortex, especially, we
35:58
call it visual cortex, because
35:58
that's how we were studying it,
36:01
but it turns out that that
36:01
overlaps, like very heavily with
36:05
other modalities of
36:05
representation.
36:09
Cool. Yeah, I want to talk more about like, the anatomy of the areas that
36:11
you find, but maybe best in the
36:15
context of the next paper where I think it comes out more clearly. So shall we move on to
36:17
the next one now?
36:21
Sure thing.
36:22
So this is your
36:22
2016 Nature paper. And, you
36:27
know, I've always noticed that,
36:27
like, Nature doesn't publish
36:30
fMRI, you know? Like Science
36:30
does, or they did you know,
36:34
like, back in the heyday of
36:34
fMRI, where, you know, every man
36:37
and his dog was getting, you
36:37
know, these high profile papers,
36:40
it was only Science that was
36:40
buying the Kool-Aid, you know?
36:44
Nature, the only fMRI papers
36:44
they ever published was like,
36:49
sorry, I'm blanking on the one,
36:49
Logothetis et al., where, you
36:55
know, they actually do
36:55
simultaneous fMRI and, you know,
37:00
direct cortical recording. I
37:00
mean, that was good enough for
37:02
Nature. But generally speaking,
37:02
Nature does not publish fMRI.
37:05
So….
37:05
We were pleased that this happened.
37:06
Congratulations.
37:06
Yeah. If any paper was going to
37:11
be in Nature, I think this is a
37:11
worthy, a worthy one.
37:15
Thank you. So,
37:15
it was, it was a big effort,
37:18
yeah.
37:19
Definitely. So
37:19
in this one, like, it's
37:21
definitely a bit more language
37:21
than the other one. So, because
37:24
the stimuli are language, right?
37:24
So, can you tell us about the
37:26
stimuli?
37:28
Yeah, yeah. So
37:28
this was, you know, what I said
37:32
before is like, you know, I kind
37:32
of started off, you know, we
37:34
wanted to do language, we wanted
37:34
to do encoding models for
37:36
language. I did this kind of
37:36
offshoot project into vision
37:39
that was kind of using our
37:39
models that we were designing to
37:43
study language to do
37:43
vision. But at the same time, we
37:45
were sort of continuing down
37:45
this path of doing language. So,
37:49
we’d done some, some tests with
37:49
other kinds of stimuli. So, we
37:52
tried things like having people
37:52
read sentences with RSVP, just
37:57
like one word at a time. That
37:57
didn't work terribly well, it
38:01
turns out, it does work fine. We
38:01
were just kind of doing things poorly at the time.
38:03
It teaches us something about prosody, the fact that that's not going to work.
38:04
We were worried about timing. So we were worried about, you know, like,
38:07
when did the words happen? So,
38:13
my co-author, Wendy de Heer and
38:13
I, we really, like developed
38:16
this experiment together. We
38:16
spent a lot of time doing things
38:19
like recording ourselves
38:19
speaking stories, at a rate of
38:24
one word per second. (Laughter)
38:24
Which is the most mind
38:29
bendingly, like awful thing to
38:29
listen to. Like imagine, it's
38:34
shocking, like just how boring
38:34
like, verging on like painfully
38:38
boring, that is. It’s awful.
38:44
Right? So, we
38:44
did these, like one word per
38:49
second experiments, terrible.
38:49
But it was it was controlled in
38:54
a sense that, you know, we knew
38:54
when every word happened. And
38:58
then Wendy said, why don't we
38:58
just try something like actually
39:01
natural. She had been listening
39:01
a lot to The Moth, which is a
39:05
storytelling, podcast and radio
39:05
show. And she said, why don't we
39:09
try listening to one of these
39:09
stories? And I was like, you
39:13
know, how are we going to deal
39:13
with the timing? And she said,
39:15
Oh, this is the thing that
39:15
linguists have totally figured
39:18
out. You need to transcribe it,
39:18
use this Forced Aligner, you can
39:20
figure out when each word is.
39:20
Fine. And I was like, Okay, if
39:23
you know how to do that, that's
39:23
great. So we did it. We
39:26
collected some data with, I
39:26
think just the two of us
39:28
listening to these Moth stories.
39:28
In fact, just like one Moth
39:31
story to start. And it just
39:31
immediately worked extremely
39:36
well. We got beautiful signal
39:36
quality, like all across the
39:39
brain. We do a sort of pretest
39:39
on a lot of the stimuli that we
39:43
use, which is we just have
39:43
someone listen to the same
39:45
stimulus multiple times. Right?
39:45
We just like, have this person
39:49
listen to the same story twice.
39:52
Intra-subject
39:52
correlation, huh?
39:55
Intra-subject. Yeah, exactly. So it’s not inter-subject correlation,
39:56
intra-subjects. It’s just within
39:59
each voxel, how correlated are
39:59
the responses? Kind of a measure
40:02
of like, how big are the
40:02
functional signals that we're
40:04
getting. Right? We've done that
40:04
for these like, you know, one
40:07
word per second stimuli. It was
40:07
God awful. And then we did it
40:11
for The Moth story, and it was
40:11
just beautiful, like the whole
40:13
brain was trying to respond…
40:14
But you don't
40:14
know, yeah, sorry. You don't
40:16
know yet. What's driving that?
40:16
Right? If you're…
40:19
Right.
40:19
I mean, it could be anything. It could be something really trivial, like,
40:21
you know, auditory amplitude,
40:24
right? But it's not, but you
40:24
don't know yet, right?
40:28
That was actually, what Wendy was interested in. So she was a grad
40:30
student in Frederic Theunissen’s
40:32
lab, they study sort of low
40:32
level auditory processing,
40:35
mostly in songbirds, but now
40:35
also in fMRI. So she was
40:38
interested in sort of building
40:38
these acoustic models, like
40:42
spectral representation models
40:42
of auditory cortex. So, she was
40:45
totally fine with that. Yeah,
40:45
but we didn't know why this
40:49
activation was happening. So, we
40:49
got these maps. We are like,
40:51
this is beautiful. Let's keep
40:51
doing this. So we just collected
40:54
a bunch of data of us at first,
40:54
just listening to these Moth
40:57
stories, listening to lots of
40:57
Moth stories. We collected
41:03
two sessions of five stories
41:03
each of listening to these Moth
41:07
stories. We went through this
41:07
transcription and alignment
41:12
process, which I learned
41:12
eventually. So then, you know,
41:17
we had this information about
41:17
like, every single word in the
41:19
stories, and exactly when all
41:19
those words were, were spoken,
41:23
right? So we had this aligned
41:23
transcript. And then we could
41:27
start to do kind of interesting
41:27
things, right? So, we also have
41:30
phonetic transcripts, we can
41:30
build phonetic models. So, just
41:33
say, you know, our feature
41:33
space is the English phonemes.
41:38
What does each voxel respond to
41:38
in phoneme space? We could build
41:42
acoustic models with this data.
41:42
So get sound spectra for the
41:45
stories, and then use that to
41:45
model the brain data. And we
41:48
could get these semantic models,
41:48
right. So this is using these,
41:53
again, kind of primitive word
41:53
embedding models, latent
41:55
semantic analysis, and so on.
41:55
And those worked very well
41:59
across like a big chunk of
41:59
brain. And that was, that was
42:02
very exciting. That actually
42:02
happened, I think in like 2012.
42:04
So that was around the time that
42:04
the earlier like movie paper was
42:07
coming out when we had these
42:07
first results that were showing
42:10
that this is really starting to
42:10
work and starting to show us
42:13
something.
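A toy sketch of how an aligned transcript becomes a model-ready matrix: sum the feature vector of every word whose onset falls inside each TR. The alignment times and random stand-in "embeddings" here are invented; real features would be LSA-style vectors, phoneme indicators, or sound spectra.

import numpy as np

TR = 2.0  # seconds per fMRI volume
aligned = [("i", 0.3), ("remember", 0.5), ("the", 1.1),
           ("summer", 1.4), ("storm", 3.2), ("hit", 3.6)]
dim = 8
rng = np.random.default_rng(0)
embed = {w: rng.standard_normal(dim) for w, _ in aligned}  # stand-in word vectors

n_trs = int(max(t for _, t in aligned) // TR) + 1
X = np.zeros((n_trs, dim))
for word, onset in aligned:
    X[int(onset // TR)] += embed[word]  # accumulate word features per TR bin

print(X.shape)  # (TRs, feature dims), ready for the delays + ridge steps above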
42:13
Okay, so in this
42:13
case, you're not having to go
42:16
through and laboriously identify
42:16
what's in them, like, because
42:20
previously, you were looking at the movies and saying what is there semantically, right? Like,
42:21
whereas here, you're just
42:25
actually taking the words that
42:25
are transcribable and then
42:28
deriving semantic
42:28
representations for the words
42:33
automatically from that point,
42:33
right? Like…
42:35
Yeah.
42:35
So, it's a lot less labor intensive.
42:38
Definitely. And I mean, the transcription process, I think it's the most
42:40
similar to like the labeling
42:43
process.
42:43
But it’s sort of
42:43
much more deterministic.
42:45
Yeah. It’s much
42:45
easier, much easier. No, no
42:48
judgment calls to be made,
42:48
really. You can listen to
42:50
something at half speed, if
42:50
you're pretty good and just bang
42:53
it out. Yeah.
42:57
Okay. So, yeah,
42:57
I know. And it works a lot
43:00
better, right? Like then,
43:00
compared to the previous paper,
43:04
like the semantic maps here, are
43:04
just a lot more clear and
43:11
expansive, right?
43:13
Yeah, yeah. So,
43:13
where the, you know, the visual
43:17
stimuli really elicited these
43:17
responses, like in this band
43:20
around visual cortex, these
43:20
stories stimuli, we got stuff
43:24
everywhere, got stuff
43:24
everywhere. It was all over,
43:29
sort of, Temporoparietal
43:29
Association Cortex, all the way
43:32
from like, ventral temporal
43:32
cortex, up through sort of
43:38
lateral parietal cortex and down
43:38
into the precuneus. And then
43:41
also, all across prefrontal
43:41
cortex, we found these strong,
43:45
predictable responses to
43:45
language, with these semantic
43:49
features. So, this um, just
43:49
worked really well, kind of
43:53
quickly. One thing that at this
43:53
point that we were like, kind of
43:56
freaked out by, was, we just
43:56
didn't find any asymmetry in
43:59
terms of model performance
43:59
between the two hemispheres, it
44:01
was like the right hemisphere
44:01
was responding just as much or,
44:04
you know, our models were working just as well in the right hemisphere, as in the left
44:06
hemisphere.
44:09
Sorry, when you talk about working, I really want to be very precise when I'm
44:10
talking about this, because it's super interesting. I just wanna
44:12
make sure we understand what
44:15
we're saying exactly. So, when
44:15
you say working well, or you're
44:18
talking about predicting held
44:18
out data, or you're just talking
44:21
about having a lot of variance
44:21
by semantics?
44:24
Yes, yes. Sorry,
44:24
I mean, predicting held out
44:26
data. So we can….
44:28
So, yeah, so you
44:28
can, you're listening to two
44:30
hours of stories and then
44:30
predicting another 10 minutes.
44:34
Right.
44:35
What the brain
44:35
should look like, on a story
44:37
that the model hasn't seen
44:37
before and many voxels in the
44:40
brain are really good at this,
44:40
but not all, and then you make
44:44
these maps of like, where are
44:44
the voxels that are good at
44:46
predicting where you, where
44:46
you're able to predict. Because
44:49
I mean, I guess, I mean,
44:49
forgive me for breaking it down,
44:52
like really simply, but like,
44:52
if a voxel is not able
44:56
to predict held out data, that
44:56
probably means it doesn't have
44:59
semantic representations. Because if it doesn't have semantic representations
45:01
then it wouldn't be able to,
45:03
because why would it? And then
45:03
if it does, then it should,
45:06
right? So, it's kind of a really
45:06
good window into like, where in
45:08
the brain there are semantic representations.
45:11
Exactly. That's
45:11
kind of one of, you just
45:13
verbalized one of the core kind
45:13
of tenets of this kind of
45:17
encoding model style science.
45:17
Which is that, if you can
45:21
predict what a voxel is doing,
45:21
on held out data, using some
45:24
feature space, then we take that
45:24
as evidence that, that voxel’s
45:30
representation is tied up to
45:30
that feature space. That it is
45:32
related to that feature space in
45:32
some way. And of course, there
45:34
can be spurious correlations,
45:34
and we see this and you know, we
45:36
can try to explain those away in
45:36
various ways. But basically,
45:40
that's the kind of inference
45:40
that we try to make. Right? So,
45:44
so we found that like, right
45:44
hemisphere, we could predict
45:47
right hemisphere, just as well, as we could predict left hemisphere, there was no real
45:49
asymmetry in prediction there. I
45:51
remember showing this to another
45:51
grad student when I’d first
45:55
found this and he said, nobody's
45:55
gonna believe you in the
45:59
language world. (Laughter) Like,
45:59
too weird. If you don't find
46:02
left lateralization, like, nobody's gonna believe you. Which, I don’t know, it has
46:05
ended up being very interesting
46:08
in terms of how people think
46:08
about lateralization. So, um,
46:13
Liberty Hamilton, who's a
46:13
longtime collaborator of mine,
46:18
and who I'm married to, she also,
46:18
you know, this is kind of a
46:24
bugaboo that we have together
46:24
is, you know, she's seen in a
46:28
lot of her work in
46:28
electrophysiology, that, right
46:30
hemisphere, auditory cortex,
46:30
like definitely represents
46:33
language stuff, for perception,
46:33
at least. And that's really what
46:38
we saw here, too, is that like,
46:38
for language perception, right
46:41
hemisphere, it was engaged in
46:41
sort of semantic representation
46:45
to the same kind of degree as left hemisphere.
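(To make that encoding-model recipe concrete, here is a minimal Python sketch, with illustrative sizes and random placeholder data rather than the actual pipeline from the paper: fit a regularized regression from stimulus features to every voxel on the training stories, then score each voxel by how well it predicts the held-out story.)

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    X_train = rng.standard_normal((3600, 985))   # stimulus features, ~2 hours of stories
    Y_train = rng.standard_normal((3600, 2000))  # voxel responses (TRs x voxels)
    X_test = rng.standard_normal((300, 985))     # features for the held-out story
    Y_test = rng.standard_normal((300, 2000))    # responses to the held-out story

    # One linear model per voxel, fit jointly; ridge regularization keeps
    # the high-dimensional fit stable.
    model = Ridge(alpha=100.0).fit(X_train, Y_train)
    Y_pred = model.predict(X_test)

    # Score each voxel: correlation between predicted and actual held-out
    # time courses. High-scoring voxels are the ones whose activity is
    # tied up with the feature space.
    def voxelwise_corr(a, b):
        a = (a - a.mean(0)) / a.std(0)
        b = (b - b.mean(0)) / b.std(0)
        return (a * b).mean(0)

    scores = voxelwise_corr(Y_pred, Y_test)  # one r value per voxel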
46:47
Yeah. And so
46:47
why do you, I mean, why do you
46:51
think that is? I mean, I feel
46:51
like, when I first saw your
46:55
paper, that was what jumped out
46:55
at me too. And I struggled with
47:00
it briefly and then I came to
47:00
terms with it. Um, so, it
47:05
doesn't trouble my mental model
47:05
anymore. How do you, how do you
47:09
interpret it? I mean,
47:09
specifically, how do you square
47:13
it with the fact that aphasia
47:13
only results from left
47:16
hemisphere damage?
47:18
Yeah. So, I
47:18
mean, I think there's a broader
47:21
question of like, how we square
47:21
a lot of these results with the
47:24
Aphasia literature, which is
47:24
difficult, right? Because the
47:28
literature says that, there's
47:28
only kind of a small selection
47:30
of areas that if you have
47:30
damaged those areas that it
47:33
causes, you know, this loss of
47:33
semantic information, right?
47:38
Loss of like, word meaning
47:38
information. But we saw these,
47:41
like, much broader, like big
47:41
distributed things all across
47:45
like prefrontal cortex, parietal
47:45
cortex, whatever, this just
47:48
really did not match what people
47:48
had seen in the aphasia
47:50
literature. And, you know,
47:50
especially then for the right
47:54
hemisphere, as well, right, we see it all over the right hemisphere, and that, again,
47:55
just didn't match. So, I think
48:01
of this in kind of two ways. So,
48:01
one is that, you know, what
48:07
we're showing here is kind of
48:07
what types of information are
48:14
correlated with activity in
48:14
these voxels? Right? So, you
48:18
know, if somebody is listening
48:18
to a story, and the story is
48:22
talking about, you know, the
48:22
relationships between people and
48:25
so on, and you're trying to
48:25
process that information, then
48:27
there's a bunch of voxels that become active, there's a bunch of brain areas that become
48:29
active that start to turn on in
48:33
the presence of this kind of
48:33
social information. That doesn't
48:36
necessarily mean that like,
48:36
those are the areas that, you
48:40
know, link, the words that
48:40
you're hearing to the meaning
48:44
that you're hearing. That may be
48:44
downstream of that, and I think
48:46
most of them actually are downstream of that. They’re involved in like, some kind of
48:48
cognition around this concept,
48:51
but not necessarily like, just
48:51
the process of, like, linking
48:55
words to meaning, or linking
48:55
meanings of words together.
48:58
I think the word linking is so critical here. Yeah.
49:02
Yeah. So we kind
49:02
of can't disentangle that here
49:05
and I think that is probably one
49:05
of the real kind of drivers of
49:10
the mismatch with the aphasia literature. And I know this is a popular topic there. The other
49:12
thing that I kind of point to,
49:18
in trying to square this with
49:18
with aphasia, is the fact that
49:22
we see quite a bit of like
49:22
redundant representation across
49:25
cortex, right? We don't see,
49:25
there's just, you know, one
49:28
patch of cortex that cares about
49:28
the social information, or one
49:31
patch of cortex that cares
49:31
about, I don't know, time words,
49:35
for example. That's another category that we saw kind of pop out. Actually, you have
49:37
many patches that care about these
49:41
things, right? We have a whole
49:41
kind of network for each of
49:43
these kinds of concepts, kinds of
49:43
topics. So, I think it's very
49:49
plausible that like, even if
49:49
these areas are really like
49:53
causally involved in
49:53
representing and processing that
49:56
kind of information that
49:56
damaging some of them won't be
50:01
sufficient to actually cause the
50:01
deficits that you would see in
50:04
aphasia.
50:05
Yeah. Okay.
50:05
Yeah. I mean, I definitely agree
50:08
with your interpretation. I
50:08
mean, I think I would put it
50:12
like you are really studying
50:12
thought here, right? In a way,
50:17
like it's, like, downstream of
50:17
these links, right. So I think
50:21
we think with both hemispheres,
50:21
and you know, thought links with language, but the links are in the
50:28
left hemisphere, right? So if
50:28
you took a person with aphasia,
50:31
and did your study on them, like
50:31
a severe receptive aphasia,
50:35
maybe, I think you probably
50:35
wouldn't see those semantic
50:40
representations in the right hemisphere, either, even though the right hemisphere would be
50:42
intact, right? Because it would
50:45
never get there, because the
50:45
links which are left lateralized
50:48
would not exist, and that would,
50:48
and therefore you wouldn't be
50:51
able to generate those
50:51
semantic representations from
50:53
the linguistic input. Right? So
50:53
I actually don't think it's
50:56
inconsistent at all on, on
50:56
second thought, although like,
51:00
you know, before your paper came
51:00
out, I think a lot of us kind of
51:04
thought, oh, yeah, the semantic
51:04
representations are left lateralized,
51:06
but I don't know if I thought, I
51:06
hope I didn't think that because
51:10
it would be silly. Because, you
51:10
know, one thing that I know from
51:13
working with people with aphasia
51:13
is that they understand the
51:15
world that they live in, right?
51:15
They're not like walking around
51:18
confused, and not knowing what's going on.
51:20
Absolutely.
51:21
It's a language,
51:21
I mean, it's a language deficit.
51:25
And the only patients that don't
51:25
understand the world around
51:28
them are neurodegenerative
51:28
patients who have bilateral
51:31
damage, specifically like
51:31
semantic dementia, like when
51:34
it's advanced, or you know,
51:34
Alzheimer's when it's advanced,
51:36
right. But like, with any kind
51:36
of lateralized brain damage,
51:39
you don't really get real
51:39
semantic deficits, like, you get
51:42
like semantic control deficits.
51:42
They can't do semantic tasks,
51:46
maybe for a myriad of reasons.
51:46
But you know, you never get
51:50
somebody that doesn't understand
51:50
what's going on. So it actually
51:55
makes total sense.
51:57
It does.
51:58
It's really,
51:58
it's nice to kind of square it.
52:00
Because yeah, it definitely like
52:00
struck me. That was the thing
52:04
that really leapt out at me was
52:04
how bilateral it was and
52:07
symmetrical.
52:09
Yeah, so it's
52:09
definitely symmetrical in terms
52:12
of what we've been talking about
52:12
of just like how well can we
52:15
predict these areas. But it
52:15
turns out that it's actually not
52:18
super symmetrical in terms of
52:18
exactly what information is
52:21
represented there.
52:22
Oh, really?
52:22
So, we see a
52:22
little bit of a hint of that in,
52:25
in this 2016 paper, where
52:25
there's an asymmetry in terms of
52:30
representation of specifically,
52:30
like concrete words, concrete
52:35
concepts, that seems to be more
52:35
left lateralized than right
52:39
lateralized. So, it's like the
52:39
representations are as strong,
52:43
in a way, but they're just kind
52:43
of different things. In more
52:46
recent work that we've yet to
52:46
publish, but we're very excited
52:50
about which is doing a similar
52:50
kind of thing, like we did in
52:53
this paper, but using more
52:53
modern like language models. So
52:57
we're looking at sort of phrase
52:57
representations instead of word
53:00
representations. We see that
53:00
this asymmetry is really
53:04
pronounced in terms of sort of
53:04
representational timescale,
53:08
where the right hemisphere seems
53:08
to represent sort of longer
53:10
timescale information than the
53:10
left hemisphere.
53:13
That's interesting.
53:15
It’s maybe tied
53:15
into this like concrete abstract
53:17
distinction, which is also sort
53:17
of associated with timescale.
53:20
This is my student Shailee Jain,
53:20
who's working on this. We're
53:24
very excited about what this is gonna show.
53:26
Okay, cool.
53:26
Yeah, actually, the next thing I
53:29
want to talk about is one by
53:29
her, but not the one you're
53:32
mentioning, I think. But um,
53:32
yeah, one more thing about this
53:35
one, before we move on from it,
53:35
well, two more things, actually.
53:39
So, the lateralization might
53:39
have come as a surprise to
53:42
language people, but within
53:42
each hemisphere, the areas
53:48
where you see the semantic
53:48
predictability match pretty well
53:52
on to what the language
53:52
neuroscience community had kind
53:57
of settled on as the semantic
53:57
network of the brain, right?
54:00
Absolutely.
54:00
Yeah. No, I really love this, I
54:04
think it’s the 2009 review paper
54:04
from Jeff Binder.
54:06
Yeah, it's one
54:06
of the most useful papers that I
54:10
go back to again and again. Just
54:10
love that one.
54:13
It's beautiful.
54:13
I love how they break down, you
54:15
know, the different types of
54:15
experiments and you know, what,
54:18
what kind of approaches they
54:18
like and don't like. But there's
54:20
a figure there that I often show
54:20
in talks, which is, it's just a
54:25
brain with little dots on it in
54:25
every place that's been reported
54:28
as like an activation in some
54:28
paper for some semantic task and
54:32
it matches so well what we find in
54:32
this work, right?
54:36
It does.
54:36
The entire prefrontal cortex, the entire kind of parietotemporal cortex.
54:38
I often say this is…
54:42
Midline areas,
54:42
too, you know? You have the same
54:45
midline areas, like you both
54:45
have like, you know, you've got
54:47
your medial prefrontal, and then
54:47
precuneus in the middle. And
54:51
yet, like, it's not obvious at
54:51
first glance, because you guys
54:55
use flat maps right? So
54:55
everything is like kind of
54:57
flattened out. Because,
54:57
this is a way you can tell
55:00
you grew up in vision,
55:00
because like…
55:03
Exactly.
55:03
You use flat
55:03
maps, but as soon as you, like,
55:06
just take a step back, you
55:06
realize, oh, that's just the
55:08
semantic network.
55:10
Exactly. Yeah, I
55:10
don't know, sometimes I say it's
55:13
like, it's easier to say the
55:13
parts of the brain that don't
55:16
represent semantic information,
55:16
which is like somatomotor cortex
55:19
and visual cortex. Like those
55:19
are, those are the big ones.
55:22
There's like two big holes in
55:22
this map. Everything else, kind
55:25
of cares to some degree or
55:25
another.
55:29
Yeah. Okay. Oh,
55:29
yeah. The other thing I wanted
55:32
to talk to you about with this paper,
55:32
like, you know, it's just like a
55:37
masterpiece of visualization
55:37
among all, among everything
55:41
else, right? Like, it's just so,
55:41
the figures are so beautiful,
55:44
and you've got all these like….
55:45
Thank you.
55:46
You’ve got all these three dimensional animations that can be found on
55:47
the web. Can you kind of talk
55:51
about that aspect of it? Like,
55:51
is that something you really
55:54
enjoy? Like, what kind of, how
55:54
did you develop those skills?
55:57
Like, you know, there's definitely a piece there that’s like, pretty special.
56:01
Yeah, I I love
56:01
it. I love visualization. I,
56:05
yeah, I think partway through
56:05
writing this paper, I went to
56:09
a seminar, an Edward Tufte
56:09
seminar. And so I started trying
56:13
to do everything in Tufte style,
56:13
whatever, I love it, I love his
56:16
idea of, what does he call it,
56:16
supergraphics, super
56:22
infographics, like something
56:22
that's just like, incredibly
56:25
dense with information that you
56:25
can stare at for a long time,
56:27
and you keep seeing new stuff. I
56:27
really liked that idea. And so
56:30
that's what I kind of tried to
56:30
replicate here. So a lot of this
56:34
work, really, the
56:34
visualizations are based on a
56:38
software package that we
56:38
developed in Jack's lab,
56:41
pycortex. So this is really led by
56:41
James Gao, who was a grad student
56:45
there with me, like, brilliant
56:45
programmer, like, polymath, he
56:53
can do so many things. So you
56:53
know, he's a neuroscientist, but
56:55
then he's also, you know, able
56:55
to write massive
57:00
amounts of low-level code for
57:00
showing these brain images in a
57:03
web browser. Really just like,
57:03
fantastic stuff. So I worked
57:08
with James on developing this
57:08
pycortex package, and then a
57:11
couple other people in lab,
57:11
especially Mark Lescroart, was
57:13
a big driver of this as well.
57:13
And, you know, so because we
57:18
were also developing the
57:18
visualization package, alongside
57:23
like doing the science, we could
57:23
make it do anything that we
57:26
wanted to do, right, like any
57:26
idea that we had, like, we
57:29
should make it look like this,
57:29
we could just do that, we could
57:31
like spend some time and
57:31
implement that. So I really
57:35
liked the sort of close
57:35
commingling of those two things
57:38
like developing the
57:38
visualization software and the
57:40
science at the same time. And I
57:40
think that's, I think it's nice.
57:44
I think it's powerful.
57:45
Yeah, it's very powerful. I mean, because the data is so multidimensional,
57:47
right? And like, you can't
57:49
really use conventional, if you
57:49
try to display it with
57:52
conventional tools, you're
57:52
not going to be able to convey
57:55
it. You know, it's so funny that
57:55
you mentioned Tufte, right?
57:58
Because like, I love Tufte as
57:58
well, like my wife, actually,
58:02
she's a librarian, and she, I
58:02
think she introduced me to
58:05
Tufte, a long time ago. Like,
58:05
she gave me a Tufte book for
58:08
Christmas, probably 15 years
58:08
ago. It's ‘The Visual Display of
58:12
Quantitative Information’. You've probably read it.
58:14
Wonderful book. Wonderful book. Yeah, I read that in college. My roommate got
58:15
it, and I was like, this is cool.
58:17
Yeah, so she gave it to me for Christmas. And I was just like, transfixed by
58:19
it. I spent the rest of the
58:22
holiday reading it and studying
58:22
it. And I just went back to my
58:26
work, so invigorated and I
58:26
worked on this paper, it was a
58:29
2009 paper in NeuroImage, about
58:29
support vector machines, predicting PPA
58:36
subtypes. But all the
58:36
visualizations like I was now
58:39
like, really inspired by this
58:39
Tufte book and I like, spent a
58:42
lot of time on those figures and
58:42
I’d be thinking how would Tufte
58:44
make this figure.
58:46
It's nice.
58:46
That's cool. And
58:46
so yeah, you guys use Python,
58:49
like, what do you have like a,
58:49
what other kinds of technical
58:54
infrastructure do you use for
58:54
developing your stuff?
58:59
Yeah, so um,
58:59
it’s mostly Python, but, you
59:03
know, a lot of the visualization
59:03
is actually done in the web
59:06
browser, like through
59:06
JavaScript. So this was, I
59:11
forget, back in like, 2013,
59:11
maybe. James had been working on
59:16
pycortex for a little while and
59:16
we were trying to figure out,
59:20
like, we were trying to move
59:20
away from Mayavi, which is like
59:23
a 3D display library that is
59:23
very powerful, but also, like,
59:27
very clunky and like hard to use
59:27
in a lot of ways. And I said,
59:32
like, let's make a standalone,
59:32
you know, like, 3D application
59:35
that you can run on your
59:35
computer, you know, in Python,
59:38
whatever. And James said, no, no, let's do it in the browser, let's like actually
59:40
send information to the browser.
59:42
And I was like that, that's
59:42
crazy. That seems really hard.
59:44
Like, why would we do that? And
59:44
he completely ignored me, which
59:46
was like 100% the right thing
59:46
to do, and he wrote this thing,
59:51
that was Python interacting with
59:51
the browser and tens of thousands of
59:56
lines of JavaScript code to
59:56
display these brain images. But
1:00:00
it's, it’s just fantastic. So,
1:00:00
you can interact with the brain
1:00:03
images in all kinds of fun ways.
1:00:03
We can build these like
1:00:07
interactive viewers like we have
1:00:07
for this paper where you can
1:00:09
click on different parts of the
1:00:09
brain, and it'll show you
1:00:11
exactly what that little piece
1:00:11
of brain is doing. So, I think
1:00:14
that was, that was a really big
1:00:14
part of it, is like getting it
1:00:17
all to work in the browser, because that it's also very easy to share with other people.
1:00:20
It’s
1:00:20
transportable. Yeah. Yeah, that
1:00:23
makes sense. Like, it's funny in
1:00:23
my, in my lab, we're also
1:00:27
developing this portal into our
1:00:27
aphasia data, which is not
1:00:31
released yet. But we will be
1:00:31
working on it for a while. And
1:00:34
just like you, I was like,
1:00:34
originally envisaging a standalone
1:00:38
application and I was telling
1:00:38
the SLPs in my lab, who collect
1:00:42
all this data, like, oh, this is what I want to do, I want to like have an application, you
1:00:44
know, you'll install it on your
1:00:46
computer, and then you can look at that data. And they just looked at me like, what's
1:00:48
wrong with you? You know, like,
1:00:50
you'd have to install an application, like what, you're going to download? How would
1:00:52
you, you know, what is that even
1:00:55
like?
1:00:55
What is that, the 90s? God!
1:00:57
And they convinced me to do it in the web. And yeah, like, that's what
1:00:58
we've done. And it's now built
1:01:02
in JavaScript, and it's going to
1:01:02
be a lot more accessible as a
1:01:05
result.
1:01:06
Nice! Yeah, absolutely.
1:01:08
Okay, cool. So,
1:01:08
let’s move on to the next paper,
1:01:12
if we can. This is by Shailee
1:01:12
Jain and yourself in 2018.
1:01:18
Called ‘Incorporating context
1:01:18
into language encoding models
1:01:20
for fMRI’. It's, I think it's an
1:01:20
underappreciated paper. I mean,
1:01:25
it's got like a bunch of
1:01:25
citations, but I hadn't seen it
1:01:27
before. It's published in, you
1:01:27
know, kind of a CS…
1:01:33
Conference, yeah, NeurIPS.
1:01:35
But very
1:01:35
interesting. Because I think
1:01:39
it's like the first fMRI paper,
1:01:39
fMRI encoding paper that like
1:01:46
actually takes context into
1:01:46
account, goes beyond the word
1:01:48
level. Right? Is that right? I
1:01:48
don't think there is anything
1:01:51
previous.
1:01:51
I think so, there had been one CS paper.
1:01:53
A MEG study, by
1:01:53
Leila Wehbe. A few, but that's MEG.
1:01:58
Yeah. Leila had
1:01:58
done this in MEG in like 2014.
1:02:01
Leila is a close collaborator of
1:02:01
our lab, we do a lot of work
1:02:05
together on this kind of stuff.
1:02:05
There was one other like CS
1:02:10
conference paper, I think, from
1:02:10
like the year before, that, it
1:02:13
was a little bit messy. The
1:02:13
results were kind of mixed. But
1:02:17
yeah, I think we were at least
1:02:17
one of the first to really use
1:02:22
these neural network language
1:02:22
models, which were new and
1:02:24
exciting at the time. So, this
1:02:24
is actually the first paper out
1:02:28
of my lab, like Shailee was my
1:02:28
first grad student. This is her
1:02:31
first project in the lab.
1:02:31
(Laughter) It was to try out
1:02:35
these new things, which are neural network language models. Like, let's see what they do.
1:02:37
We've been doing everything with these like word embedding
1:02:39
vectors before that, which are
1:02:41
great, they're beautiful, you
1:02:41
can reason about them in really
1:02:45
nice ways. You can interpret
1:02:45
them in nice ways. But of
1:02:49
course, they, you know, they're
1:02:49
just words, right? You're
1:02:51
predicting the brain response to
1:02:51
somebody telling a story. But
1:02:55
you're just actually
1:02:55
individually predicting the
1:02:58
response to each word then
1:02:58
saying that, like the response,
1:03:00
the story is the sum of the
1:03:00
responses to the words, which is
1:03:03
just obviously blatantly false,
1:03:03
right? So those models couldn't
1:03:07
capture the kind of richness of
1:03:07
language. So, this was right at
1:03:11
the time, when language models
1:03:11
were starting to become exciting
1:03:16
in the computer science like
1:03:16
natural language processing
1:03:19
worlds. Right around when these
1:03:19
first language models, ELMo and
1:03:25
BERT came out. And so this is
1:03:25
the days of like Sesame Street
1:03:28
language models, which, people
1:03:28
have found were really
1:03:33
interestingly useful. But if you
1:03:33
just train this neural network
1:03:36
to like, predict what the next
1:03:36
word is in a piece of text, or
1:03:39
predict like a random masked out
1:03:39
word from a piece of text, then
1:03:43
it's kind of forced to learn a
1:03:43
lot of interesting stuff about
1:03:46
how language works.
1:03:47
Yeah, can we
1:03:47
just, yeah, can we just flesh
1:03:50
that out a little bit, because
1:03:50
this is so fundamental to all
1:03:52
this work. And I think
1:03:52
everybody's heard of ChatGPT,
1:03:56
and so on, but I don't know how
1:03:56
many of us, sort of understand
1:04:01
how much the fundamental
1:04:01
technology is built on this
1:04:03
concept of predicting the next
1:04:03
word. So, can you like just kind
1:04:06
of try to explain that in a little
1:04:06
more detail like, what these
1:04:09
models, what's their input?
1:04:09
What's their? I mean, okay, not
1:04:13
that much detail. Obviously,
1:04:13
it's a podcast, not a CS paper.
1:04:17
But, but so what's the input?
1:04:17
What's the output? And then I
1:04:21
guess, why? And what's the
1:04:21
architecture?
1:04:25
Yeah. So, it's
1:04:25
pretty simple. You have a big
1:04:31
corpus of text, right? So like
1:04:31
documents that are 1000s of
1:04:34
words long potentially, say all
1:04:34
of Wikipedia, for example. That's what
1:04:37
we used to train some of these
1:04:37
models. You feed the model the
1:04:42
words, one by one. It reads
1:04:42
the words and at every step,
1:04:46
every time you feed it a word,
1:04:46
you ask it to guess what the
1:04:49
next word is. And it guesses
1:04:49
like a probability distribution
1:04:52
across all the words of like
1:04:52
what it thinks the next word
1:04:54
might be. In these early models,
1:04:54
we used recurrent neural networks,
1:05:00
specifically LSTMs, long short
1:05:00
term memory networks, which were
1:05:04
pretty popular as language
1:05:04
models at the time. So, this
1:05:08
network, you kind of feed it
1:05:08
words one at a time, and it
1:05:13
keeps its state, right? So, what
1:05:13
it will sort of produce, what it
1:05:19
computes at each time step is a
1:05:19
function of like, what the word
1:05:22
was that came in, and what its
1:05:22
state was at the previous time
1:05:25
step. So, it combines those two
1:05:25
things to try to guess what the
1:05:28
next word is going to be. So,
1:05:28
this seems kind of elementary,
1:05:33
right? It's pretty simple. You
1:05:33
just guess what the next word
1:05:35
is. But, you know, what turned
1:05:35
out to be really cool about this
1:05:40
and why people were so excited
1:05:40
about this in the natural language
1:05:43
processing world was, you know,
1:05:43
that in order to do this
1:05:47
effectively, in order to guess
1:05:47
the next word, accurately, you
1:05:52
need to do a lot of stuff, you
1:05:52
need to know how syntax works,
1:05:55
you need to know about parts of
1:05:55
speech, you need to know a lot
1:05:57
about how semantics works,
1:05:57
right? You need to know like,
1:05:59
which words go together? You
1:05:59
know, what are…
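(A minimal PyTorch sketch of the kind of LSTM language model being described: an embedding layer, a recurrent layer that carries its state forward word by word, and a guess, at every step, of a probability distribution over the whole vocabulary for the next word. The sizes and names are illustrative, not the actual model from the paper.)

    import torch
    import torch.nn as nn

    class LSTMLanguageModel(nn.Module):
        def __init__(self, vocab_size=10000, embed_dim=400, hidden_dim=1000):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)  # learned with the rest
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)      # a score for every word

        def forward(self, tokens, state=None):
            # tokens: (batch, time) word indices, fed one step at a time or in a chunk
            vecs = self.embed(tokens)
            hidden, state = self.lstm(vecs, state)  # state carries the memory forward
            logits = self.out(hidden)               # at each step: guess the next word
            return logits, state

    model = LSTMLanguageModel()
    words = torch.randint(0, 10000, (1, 20))         # a 20-word context
    logits, _ = model(words)
    next_word_probs = logits[0, -1].softmax(dim=-1)  # distribution over the vocabulary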
1:06:02
What are the representations of the individual words here? Because
1:06:04
they, you still have to vectorize the individual words,
1:06:05
right?
1:06:08
Yeah. Yeah. So,
1:06:08
so, in this model, there's
1:06:14
like a word embedding layer.
1:06:14
That's like the initial thing.
1:06:16
So we go from a vocabulary of, I
1:06:16
don't know, 10,000 words, down
1:06:20
to like a 400 dimensional
1:06:20
vector. And the embedding is
1:06:22
also learned as a part of this
1:06:22
model.
1:06:25
How does it learn that?
1:06:27
Back
1:06:27
propagation. It's the key to all
1:06:30
this. So, you start off with the
1:06:30
words being one-hot vectors.
1:06:38
That's actually a lie. Let me,
1:06:38
let me back up. You start off
1:06:41
with assigning a random
1:06:41
embedding vector for each word.
1:06:44
Oh, okay.
1:06:45
And then those
1:06:45
embedding parameters, just like
1:06:48
the values in that embedding,
1:06:48
you can compute a gradient, you
1:06:52
compute this derivative that's
1:06:52
like, you take the loss at the
1:06:55
end of the model, which is, how
1:06:55
wrong was it in predicting the
1:06:58
next word, and then just take
1:06:58
the derivative of that loss with
1:07:01
respect to these embedding
1:07:01
parameters, and then use that
1:07:05
to, you know, change the embedding parameters a little bit, and you keep doing this for
1:07:06
1000s and 1000s of steps, and it
1:07:09
learns these word embeddings. It
1:07:09
learns like, very effective word
1:07:11
embeddings.
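(A sketch of that training loop, reusing the toy LSTMLanguageModel sketched a moment ago: the loss is how wrong the next-word guess was, and backpropagation nudges every parameter, the randomly initialized embedding vectors included, a little bit at each of thousands of steps. Random indices stand in for real text here.)

    import torch
    import torch.nn as nn

    model = LSTMLanguageModel()   # the toy model from the earlier sketch
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(10000):                       # thousands and thousands of steps
        tokens = torch.randint(0, 10000, (32, 21))  # stand-in for batches of real text
        inputs, targets = tokens[:, :-1], tokens[:, 1:]
        logits, _ = model(inputs)
        # The loss: how wrong was the model about each next word?
        loss = loss_fn(logits.reshape(-1, 10000), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()    # gradients flow all the way back into the embeddings
        optimizer.step()   # every parameter, embeddings included, moves a little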
1:07:12
Okay, I didn't
1:07:12
understand that. I had thought
1:07:14
that you still had to kind
1:07:14
of put in like, kind of an old,
1:07:19
old school like word embedding
1:07:19
as the first step. But you're
1:07:22
saying that's actually part of
1:07:22
the whole architecture, it kind
1:07:25
of just comes out of the same
1:07:25
algorithm that gives you the
1:07:29
sensitivity to past words, is
1:07:29
also developing the
1:07:32
representation of each
1:07:32
individual word. Okay, I didn’t
1:07:35
understand that as well.
1:07:36
In the early days, a lot of people did this, where they initialized with,
1:07:38
with preset word embeddings.
1:07:41
But, it was found pretty quickly
1:07:41
that as long as you have
1:07:44
sufficient data to train the
1:07:44
model, things work much better,
1:07:46
if you like, train the word
1:07:46
embeddings at the same time as
1:07:49
the rest of the model.
1:07:50
Okay. Yeah. So,
1:07:50
just I'm sure you've looked at
1:07:55
this. I don't know if it's in any of your papers, but it probably is. But like, if you
1:07:57
use, kind of just getting
1:08:00
away from the contextual aspect
1:08:00
of this paper, if you use word
1:08:03
embeddings that are derived in
1:08:03
this way, does that work much
1:08:06
better than the sort of ones
1:08:06
that you used in your 2016
1:08:09
Nature paper? Or does it work
1:08:09
much the same?
1:08:12
It's about the same, they end up being actually very similar in a lot of ways.
1:08:16
Okay.
1:08:16
Yeah,
1:08:16
it's interesting. There are,
1:08:20
like, a bunch of
1:08:20
different ways of generating
1:08:22
word embeddings. I could geek
1:08:22
out about this for a long time.
1:08:25
But the very old
1:08:25
school word embeddings, like
1:08:30
latent semantic analysis, are
1:08:30
generated by looking at how
1:08:34
words co-occur across documents.
1:08:34
The newer things like Word2Vec
1:08:38
and GloVe are also looking at
1:08:38
word co-occurrence. Word2Vec is
1:08:41
actually like a neural network model that was trained to do this. GloVe is just a statistical
1:08:43
thing. The word embeddings that
1:08:46
we used in my 2016 paper were
1:08:46
a bespoke thing that I came up
1:08:49
with, but they capture really
1:08:49
the same kind of thing, right?
1:08:54
It was just using kind of
1:08:54
predefined dimensions instead of
1:08:58
these, like, learnt dimensions.
1:08:58
And the word embeddings that you
1:09:01
get out of neural network models
1:09:01
are super similar. Like, they
1:09:03
they act very similar because
1:09:03
they're, they're just capturing
1:09:06
the same thing, which is…
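(For the old-school flavor, a toy Python sketch of the co-occurrence idea behind something like latent semantic analysis: count how often each word appears in each document, then compress those counts into a short vector per word, so words used in similar documents end up with similar vectors. Purely illustrative, not the pipeline from any of these papers.)

    import numpy as np

    # Toy corpus: rows of the count matrix are words, columns are documents.
    docs = ["the dog chased the cat", "the cat slept", "stocks fell sharply today"]
    vocab = sorted({w for d in docs for w in d.split()})
    counts = np.array([[d.split().count(w) for d in docs] for w in vocab], float)

    # SVD compresses the co-occurrence pattern; each word's row becomes its
    # embedding. Words that co-occur in similar documents get similar vectors.
    U, S, Vt = np.linalg.svd(counts, full_matrices=False)
    embeddings = U[:, :2] * S[:2]  # 2-dimensional word vectors for the toy example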
1:09:07
So, co-occurrence is a big part of it, right?
1:09:09
Yeah, definitely.
1:09:10
Okay. Okay, so
1:09:10
now I understand that. So, you
1:09:15
start, you have random representations of the words that are learned to become
1:09:17
differentiated, kind of based on
1:09:20
context, to represent the
1:09:20
semantics of each of your
1:09:23
words, and then you've got
1:09:23
hidden layers that are
1:09:27
representing previous words that
1:09:27
have been seen. Like, what
1:09:29
kind of range do you look
1:09:29
at in this paper? Like, how far
1:09:32
back can it look?
1:09:33
Yeah, um, I
1:09:33
think, we look back only like 20
1:09:37
words, here. So, we're
1:09:37
manipulating like, how many
1:09:41
words go into the model? Like
1:09:41
how many words of context is it
1:09:44
see before, before the current
1:09:44
word? And what we found is it
1:09:47
basically like the more words we
1:09:47
feed in, the better it gets,
1:09:51
right? As we, as it sort of sees
1:09:51
more context, the
1:09:54
representations are better
1:09:54
matched to whatever's happening
1:09:57
in the brain. Our model
1:09:57
predictions get better
1:10:00
there. So we can use this to
1:10:00
kind of like, look at, in a
1:10:03
coarse way, context sensitivity
1:10:03
across cortex, like which parts
1:10:06
of cortex, you know, really, you
1:10:06
know, are affected by
1:10:10
information, 20 words back and
1:10:10
which ones maybe only care
1:10:13
about, like what the most recent
1:10:13
word is. So, you know, we see
1:10:17
kind of things that we'd expect
1:10:17
to see, like auditory cortex
1:10:19
only cares about the current
1:10:19
word, for the most part, right?
1:10:22
It's mostly caring about like
1:10:22
the sound of the word. So that's
1:10:24
not unexpected. Whereas areas
1:10:24
like precuneus, TPJ really care
1:10:30
more about, like, what happened
1:10:30
a while back, or maybe some
1:10:34
integration of information
1:10:34
across a couple dozen words.
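(A schematic of that manipulation, with a stand-in language model so the sketch is self-contained: get a representation of each word given only its previous n words of context, fit an encoding model for each context length, and see where in cortex more context helps. Everything here is illustrative.)

    import numpy as np

    rng = np.random.default_rng(0)

    def hidden_state(window):
        # Stand-in for the language model: a vector for the last word given
        # its preceding context (random here, purely for illustration).
        return rng.standard_normal(400)

    def context_features(words, n_context):
        # The representation of each word, seeing only the previous n_context words.
        return np.array([hidden_state(words[max(0, i - n_context): i + 1])
                         for i in range(len(words))])

    story_words = ["i", "begin", "to", "tell", "a", "story"] * 100  # toy stimulus

    for n in [0, 1, 3, 7, 15, 19]:
        X = context_features(story_words, n)
        # ...then fit a ridge encoding model from X to the voxel responses and
        # score it on a held-out story, comparing per-voxel scores across n.
        # Voxels whose score keeps climbing with n integrate over long contexts;
        # voxels that are flat after n = 0 mostly care about the current word.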
1:10:38
Right. Yeah. So you can kind of map out the language, well the semantic
1:10:39
network, I guess, I'd say, in
1:10:43
terms of how deep it is, and
1:10:43
like how, how contextually
1:10:48
dependent it is, like going from
1:10:48
single words to longer strings.
1:10:53
And most of these models are
1:10:53
outperforming the single word
1:10:56
models pretty much throughout the brain, right?
1:10:59
Yeah, very
1:10:59
handily, which was very, like
1:11:01
that was just a central exciting
1:11:01
result to us, right, that we'd
1:11:04
had these word embeddings.
1:11:04
Honestly, our word embedding
1:11:06
models had been fixed since
1:11:06
2013, at that point, so it was
1:11:09
like five years of just messing
1:11:09
with word embeddings and finding
1:11:12
nothing that worked better, right?
1:11:12
We tried 1000 different
1:11:16
variations of word embeddings.
1:11:16
And like, nothing actually…
1:11:18
I guess. Yeah,
1:11:18
my previous question, you just
1:11:20
basically told me like,
1:11:20
yeah, these state of the art
1:11:22
single word representations
1:11:22
don't really do better than your
1:11:25
bespoke ones from the 2016
1:11:25
paper. So yeah, so that
1:11:29
wasn't really a, an avenue for
1:11:29
improvement. But this is.
1:11:32
Yeah, yeah. But
1:11:32
this, like, instantly was just
1:11:36
better. Was like head and
1:11:36
shoulders better than the word
1:11:38
embeddings. That was really
1:11:38
exciting, right? This was
1:11:42
the first time that we were
1:11:42
getting, kind of getting at,
1:11:45
representations of longer term
1:11:45
meaning, which is something that
1:11:50
of course, we want to look at…
1:11:52
Yeah. And
1:11:52
combinatorial meaning
1:11:55
presumably, yeah, it's getting
1:11:55
more and more like, like, as you
1:11:58
go further and further, you're
1:11:58
getting more and more language.
1:12:01
You know? (Laughter)
1:12:02
Exactly. And it
1:12:02
keeps working better and better at
1:12:04
predicting the brain. Right?
1:12:04
That number go up….
1:12:06
Yeah. Not
1:12:06
surprising. Yeah. So this is a
1:12:08
very cool paper. Showing, you
1:12:08
know, how much you gain by
1:12:14
adding context?
1:12:15
Yeah. Yeah. It's
1:12:15
kind of like I think it's like,
1:12:18
kind of known to people in our
1:12:18
like, little fields. But um,
1:12:21
most neuroscientists don't read
1:12:21
these CS Conference papers. So I
1:12:24
think a lot of people just like
1:12:24
haven't, haven't seen it.
1:12:27
No, it has
1:12:27
plenty. Like I said, it has
1:12:29
plenty of citations like well
1:12:29
over 100. But like, I don't
1:12:34
think that, I mean, I hadn't
1:12:34
seen it until I started, like,
1:12:37
you know, looking in more depth,
1:12:37
so that I could talk to you.
1:12:41
That's a shame, because it's really good.
1:12:42
Good. Thank you.
1:12:44
So let's move
1:12:44
now to a paper that is not
1:12:49
currently published, but will be
1:12:49
published by the time people
1:12:51
hear this podcast. This is Tang
1:12:51
et al. And by the time you hear
1:12:56
this, it will be just out in
1:12:56
Nature Neuroscience.
1:13:00
Yep.
1:13:01
Super cool
1:13:01
paper. Can you tell me about
1:13:04
what you've done in this one?
1:13:06
Yeah, yeah. So,
1:13:06
this is, this is our decoding
1:13:09
paper. So, this is, we're no
1:13:09
longer focused on encoding, on
1:13:13
just trying to predict how the
1:13:13
brain responds to language.
1:13:16
We're now trying to reverse that
1:13:16
right, to take the brain
1:13:18
responses and turn that into
1:13:18
like, what were the words that
1:13:21
the person was hearing?
1:13:23
Okay, you're
1:13:23
reading minds? Basically.
1:13:26
We try to avoid
1:13:26
that term. But, but yeah, same
1:13:29
idea. So, our approach here is
1:13:29
really, it's driven by things
1:13:36
that were done back in Jack
1:13:36
Gallant's lab. So, Shinji
1:13:40
Nishimoto, in particular, who'd
1:13:40
done this like video decoding
1:13:42
work, he developed this whole
1:13:42
framework for, he and some other
1:13:47
folks there, Thomas Naselaris
1:13:47
and Kendrick Kay in particular,
1:13:50
they developed this framework for
1:13:50
how do you turn an encoding
1:13:53
model into a decoding model?
1:13:53
Right? We know how to build
1:13:55
these very good encoding models.
1:13:55
But if you want to do decoding,
1:13:58
if you want to figure out like, what was the stimulus from brain activity? How do you do that?
1:14:00
And, just, if you fit a direct
1:14:04
decoding model, where you like,
1:14:04
just try to do regression in the
1:14:07
opposite direction. So, take the
1:14:07
brain data as your input and
1:14:10
your stimulus as the output.
1:14:10
That ends up not working, or being
1:14:14
like very difficult in a number
1:14:14
of ways, mostly to do with sort
1:14:19
of statistical dependence
1:14:19
between things in the stimulus,
1:14:22
like if you're predicting multiple stimulus features, you're not predicting them in a
1:14:25
way that actually respects like
1:14:28
the covariance between those
1:14:28
features. And that ends up being
1:14:30
pretty important for getting
1:14:30
this stuff to work. So, in this
1:14:35
paper, we use this kind of
1:14:35
Bayesian decoding framework that
1:14:38
they developed. The basic idea
1:14:38
is, you just kind of guess. So,
1:14:43
we guess like what might the
1:14:43
stimulus be? What words might
1:14:45
the person have heard? And then
1:14:45
we can check how good that
1:14:47
guess is by using our encoding
1:14:47
model. So, in this paper, we've had,
1:14:51
you know, a
1:14:51
couple of years of advancement
1:14:54
in language models. Just an
1:14:54
insanely rapidly developing
1:14:59
field right now. So, when we
1:14:59
started working on this decoding
1:15:04
stuff, we were using GPT, like
1:15:04
the original OG GPT, from 2018,
1:15:10
2019. And that's what we ended
1:15:10
up like, that’s in the published
1:15:13
paper. Of course, there's, you
1:15:13
know, things have changed a lot
1:15:17
in the intervening years, but it
1:15:17
still is good enough for, for
1:15:20
this to work. So, these GPT
1:15:20
based encoding models work
1:15:24
terribly well. It's doing more
1:15:24
or less the same thing as the
1:15:27
language models in Shailee’s
1:15:27
paper, in fact, she developed
1:15:30
these GPT encoding models.
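(The core of that guess-and-check idea, as a hedged Python sketch; the names here are hypothetical stand-ins, not the actual code: turn a guessed word sequence into language-model features, predict the brain response it would evoke using the encoding model, and score the guess by how well that prediction matches the recorded data.)

    import numpy as np

    def score_candidate(words, recorded_bold, encoding_model, lm_features):
        # lm_features: hypothetical function giving GPT-style features for a
        # word sequence; encoding_model: a fitted feature-to-brain regression.
        X = lm_features(words)
        predicted_bold = encoding_model.predict(X)
        # How well does the predicted response match what we actually recorded?
        err = recorded_bold - predicted_bold
        return -np.sum(err ** 2)  # higher = better match, likelihood-style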
1:15:33
Yeah, can we
1:15:33
just pause and get a little
1:15:35
detail on that? So, you know,
1:15:35
everybody's heard of ChatGPT,
1:15:40
but can you sort of explain like
1:15:40
(a) what does it stand for and
1:15:45
(b) how does it differ from
1:15:45
like, what's the crucial
1:15:49
difference between that and the
1:15:49
long short term memory models
1:15:54
that you used in the 2018 paper?
1:15:56
Yeah, so GPT is
1:15:56
a generative pre-trained
1:16:00
transformer. That was its
1:16:00
original moniker. And the basic
1:16:05
idea is, it's using this
1:16:05
different architecture. So, it’s
1:16:08
no longer using a recurrent
1:16:08
neural network. It's using a
1:16:10
network called a transformer
1:16:10
that was invented in 2017.
1:16:15
By Google, right?
1:16:16
Yeah , yeah.
1:16:16
Ashish Vaswani, who is actually
1:16:19
an author on Leila’s RNN
1:16:19
predicting MEG paper back in
1:16:23
2014, which is an interesting
1:16:23
detail. And he was like one of
1:16:29
the authors on the
1:16:29
original transformer paper too.
1:16:31
So, transformers, they use very
1:16:31
different mechanisms than
1:16:38
recurrent neural networks. They
1:16:38
use what we call an attention
1:16:42
mechanism, the self attention
1:16:42
mechanism, where essentially, to
1:16:45
try to build up some
1:16:45
representation of each word or
1:16:48
each token in an input. The
1:16:48
model can like attend across its
1:16:53
inputs, attend across its
1:16:53
context. So it can pick out you
1:16:55
know, information from many
1:16:55
words that have come before and
1:16:59
use that to inform what it's
1:16:59
doing right now. And, you know,
1:17:04
what's, what's really different
1:17:04
about this, compared to
1:17:08
recurrent neural networks, is
1:17:08
that transformers don't have the
1:17:12
kind of limited memory capacity
1:17:12
that RNNs have. And I think
1:17:15
that's really one of the fundamental things that makes them work so darn well, for so
1:17:16
many things. You know, the
1:17:21
recurrent neural network, you
1:17:21
feed it one word at a time, and
1:17:24
then it has to pack information
1:17:24
into its internal state. So, it
1:17:28
maybe has like 1000 dimensions
1:17:28
of like internal states, right?
1:17:30
That are like, that's its entire
1:17:30
memory that's like everything it
1:17:33
knows is in those 1000
1:17:33
dimensions. And you know, if you
1:17:36
feed it hundreds of words, it
1:17:36
has to, you know, if it wants to
1:17:39
remember something about those
1:17:39
hundreds of words, it has to
1:17:41
pack it all into that 1000
1:17:41
dimensional vector.
1:17:44
Right.
1:17:44
So, it's hard to do, it's hard to do. And especially because the kind of
1:17:46
supervisory signals at long
1:17:48
ranges end up being very sparse
1:17:48
in language. Like, it's rare for
1:17:52
something 200 words ago to
1:17:52
really influence like what the
1:17:55
next word is in a piece of text.
1:17:55
It's very important when it
1:17:58
does, but it's pretty rare.
1:17:58
So, that ends up being kind of
1:18:02
too weak a signal for RNNs to
1:18:02
really pick up on it. But these
1:18:05
transformer models, they can
1:18:05
just kind of arbitrarily look
1:18:07
back, they can say like, you
1:18:07
know, what from anything in my
1:18:11
past was relevant to this thing
1:18:11
that I'm looking at right now.
1:18:13
And then just pick that thing
1:18:13
out and use it. And that means
1:18:17
that it doesn't have this
1:18:17
limited capacity, in the same
1:18:19
way. It has, like, a much greater
1:18:19
memory capacity, working memory
1:18:23
capacity, effectively. Which
1:18:23
just makes it like incredibly
1:18:29
powerful at doing these things.
1:18:29
There's also other reasons why
1:18:32
transformers have kind of taken
1:18:32
over this world now. They end up
1:18:36
being extremely efficient to
1:18:36
train and run on our current GPU
1:18:42
hardware, which is kind of like
1:18:42
a weird reason why this model
1:18:45
would be very good. But it's a
1:18:45
you know, a technical reason why
1:18:48
people could train much bigger
1:18:48
transformer models much more
1:18:51
effectively than like big RNN
1:18:51
models. They've really like
1:18:55
taken over this world now.
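(A minimal numpy sketch of the self-attention step being described: every position scores its similarity to every earlier position, then pulls in a weighted mix of the whole context, so word 200 can use word 1 as directly as word 199. Real transformers add learned query/key/value projections and many stacked layers; the dimensions here are toy.)

    import numpy as np

    def self_attention(x):
        # x: (time, dim) token representations. Each position attends over
        # everything up to and including itself; there is no fixed-size
        # memory bottleneck the way there is with an RNN's internal state.
        t, d = x.shape
        scores = x @ x.T / np.sqrt(d)          # similarity between positions
        mask = np.triu(np.ones((t, t)), k=1)   # hide the future
        scores = np.where(mask == 1, -1e9, scores)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ x                     # weighted pull from the whole context

    tokens = np.random.default_rng(0).standard_normal((200, 64))  # a 200-word context
    out = self_attention(tokens)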
1:18:57
Okay, now that was really useful.
1:18:59
I teach a class on neural networks in the computer science department
1:19:01
here. I just finished that like
1:19:03
last week. And our last module
1:19:03
was on transformers, which was a
1:19:08
lot of fun. So we talked about how transformers work.
1:19:10
Okay, I would
1:19:10
like to, I'd like to audit that.
1:19:14
It's fun. It's a fun class.
1:19:15
Yeah. I bet.
1:19:15
Okay. So let's start. Yeah,
1:19:20
let's get back to your study.
1:19:20
So, you know, how are you
1:19:25
using GPT models instead of the
1:19:25
RNNs. And yeah, you were just
1:19:30
telling me, you're about to tell
1:19:30
me about, well, you were telling
1:19:34
me about the challenges of
1:19:34
decoding rather than encoding,
1:19:36
right?
1:19:37
Yeah, yeah. So
1:19:37
um, you know, we replaced the
1:19:40
RNNs with these GPT models, the
1:19:40
transformer based models, again,
1:19:43
we get a big boost in encoding
1:19:43
performance, we can predict the
1:19:46
brain much better. But now what
1:19:46
we're going to do is try to
1:19:50
reverse this, do the decoding.
1:19:50
So, we are using this Bayesian
1:19:55
decoding approach where
1:19:55
essentially we're just like
1:19:57
guessing sequences of words and
1:19:57
then for any guessed sequence,
1:20:01
we can use our encoding model to
1:20:01
say like, how, how well does
1:20:05
this match the actual brain data
1:20:05
that we see? Right? So, we get a
1:20:08
sequence of words, we predict
1:20:08
how the brain would respond to
1:20:10
that sequence of words and then
1:20:10
we compare that prediction to
1:20:13
the actual brain response that we observe.
1:20:15
Yep.
1:20:16
Like, this is
1:20:16
kind of the core loop in this
1:20:18
method. And then….
1:20:20
It's called a
1:20:20
beam search algorithm, right?
1:20:23
Yeah. So beam
1:20:23
search is really like, we keep
1:20:25
multiple guesses sort of active
1:20:25
at a time. We guess like, what's
1:20:29
the next word in each of these
1:20:29
multiple guesses, and then we
1:20:31
throw out like the worst ones,
1:20:31
but we keep this sort of active
1:20:34
set of 20 to 100 different like
1:20:34
current hypotheses for what the
1:20:40
text was that we're
1:20:40
trying to decode. This ends up
1:20:43
being kind of important because
1:20:43
it helps us correct for the
1:20:47
sluggishness of the hemodynamic
1:20:47
response function, which is one
1:20:49
of the real challenges in doing
1:20:49
this kind of decoding, right? So
1:20:53
we're trying to pick out, you
1:20:53
know, the language that somebody
1:20:56
is hearing. Lots of words happen
1:20:56
in language, right? Like words
1:20:59
can happen pretty quickly. And
1:20:59
with fMRI, like one snapshot
1:21:03
brain image is summing across,
1:21:03
like 20, 30 words, maybe if
1:21:07
somebody's speaking at a pretty
1:21:07
rapid clip.
1:21:10
So doing this beam search, where we have multiple hypotheses…

Yeah.

So that
1:21:16
means the
1:21:19
model can make a mistake, and
1:21:19
then kind of go back and correct
1:21:21
it, right? Because it's not
1:21:21
locked into like one best guess.
1:21:26
That ends up being really
1:21:26
important for like, being able
1:21:30
to correct for the fact that it
1:21:30
has this slow sort of
1:21:35
information that it can get
1:21:35
something at first and then see
1:21:38
later information that can make
1:21:38
it sort of update what happened
1:21:40
before.

Right.
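(A sketch of that beam-search loop, with the language model and the encoding-model comparison passed in as hypothetical helpers: keep a set of running hypotheses, extend each with plausible next words, rescore every whole hypothesis against the recorded fMRI, and prune back down to the beam. Because whole sequences get rescored, a later word can rescue or sink an earlier guess.)

    def beam_search_decode(recorded_bold, propose_next_words, score_candidate,
                           n_steps, beam_width=50):
        # propose_next_words(words): the LM's plausible continuations (hypothetical).
        # score_candidate(words, bold): encoding-model match to the data (hypothetical).
        beams = [([], 0.0)]  # (word sequence so far, score)
        for _ in range(n_steps):
            candidates = []
            for words, _ in beams:
                for next_word in propose_next_words(words):
                    extended = words + [next_word]
                    candidates.append(
                        (extended, score_candidate(extended, recorded_bold)))
            # Keep only the best 20 to 100 hypotheses; the rest are thrown out.
            beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        return beams[0][0]  # the single best hypothesis at the end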
1:21:41
Can we, can we talk about the, sorry,
1:21:47
I'm just trying to like, think
1:21:47
about how to, to make all this
1:21:51
clear? Can we talk about the
1:21:51
sort of structure of the
1:21:56
experiment, because I think it
1:21:56
will really help to understand
1:21:58
like, what the participants do,
1:21:58
and then you know, what the task
1:22:03
that you set yourself in terms
1:22:03
of decoding their brains like,
1:22:06
because then I think the
1:22:06
mechanisms will then make more
1:22:08
sense. Do you know what I mean?
1:22:10
Yeah,
1:22:10
absolutely. So the basic
1:22:14
experiment that we do is just
1:22:14
the same that we've been doing
1:22:16
before: we have people lying in
1:22:16
the scanner and listening to
1:22:19
podcasts, mostly the Moth,
1:22:19
still.
1:22:21
You really should have them listen to the Language Neuroscience Podcast.
1:22:25
Oh, yeah, that'd be good.
1:22:25
I think it would
1:22:25
work a lot better. But in this
1:22:29
one, you've got them doing it
1:22:29
for 16 hours, right?
1:22:32
Yeah. Yeah. So…
1:22:33
That’s a lot of data. Okay. So just to
1:22:34
make it clear, you train them on…
1:22:34
We’re making
1:22:34
bigger datasets, which ends up
1:22:37
being really important when we're looking at something high level, like semantics, right? So
1:22:39
if we were just trying to build
1:22:42
models of like phonetic
1:22:42
representation, there's only
1:22:46
like 40 phonemes in English, you
1:22:46
know. You can hear them in many
1:22:49
combinations, but like, you only
1:22:49
need sort of so much variability
1:22:49
like, not 16 hours of data, to build
1:22:49
these models, and then you take
1:22:52
in your data to kind of map out
1:22:52
a phonetic representation. TIMIT
1:22:55
is great for this, right, the
1:22:55
TIMIT corpus, it's like every
1:22:58
phoneme in many different
1:22:58
combinations. So you can get
1:23:00
good stuff from TIMIT. This is
1:23:00
what like Eddie Chang's lab does
1:23:03
a lot of. But there's a lot more
1:23:03
different kinds of semantics,
1:23:07
right? There's a lot more different kinds of ideas that can be expressed there, you can
1:23:09
think a lot of different kinds
1:23:11
of thoughts, right? So, to kind
1:23:11
of map this out in detail, you
1:23:18
need to go much deeper, you need
1:23:18
to go like much broader in terms
1:23:21
of what you look at. So that's
1:23:21
why we just keep adding more
1:23:25
data. In this case, yeah, we had
1:23:25
people come back more than a
1:23:29
dozen times, over the course of
1:23:29
months, we just keep scanning
1:23:32
them over and over again, each
1:23:32
time listening to like new
1:23:35
stories, different stories,
1:23:35
right? So we see like, different
1:23:38
aspects of these ideas and how
1:23:38
they're represented in these
1:23:40
people's brains. And what we
1:23:40
find is that, like, encoding,
1:23:45
performance really relies on the
1:23:45
amount of data, especially for
1:23:47
these like semantic encoding
1:23:47
models, like as you increase the
1:23:49
amount of data, it just keeps
1:23:49
getting better.

And then you test on, like, let's say, 10 minutes of data? I don't know exactly how much it is, but something small.

Exactly, yeah. The same for this decoding performance. It just keeps getting better as we
1:23:55
add more data. But so, you know,
1:23:55
we have this ton of data of
1:23:59
people listening to stories, we
1:23:59
can build our encoding models,
1:23:59
that's well and good. You know,
1:24:01
our initial tests of the decoder
1:24:06
were basically just like, we had
1:24:06
somebody, you know, listen to a
1:24:09
story. That's our test story
1:24:09
that we use for predicting brain
1:24:13
responses. And we just tried to
1:24:13
decode the words in that story
1:24:16
instead of, you know, predict
1:24:16
the brain responses to that
1:24:19
story. And eventually, through
1:24:19
like quite a bit of trial and
1:24:23
error and figuring out what were
1:24:23
the important aspects here, we got that to work pretty well. We
1:24:28
were pretty excited by this. It
1:24:30
was, uh, you know, it started
1:24:30
spitting out you know, not like
1:24:36
exactly the words in the story,
1:24:36
but, like a pretty decent
1:24:39
paraphrase of what the words in
1:24:39
the story were.
1:24:55
And you then basically feed the model the brain data from the person
1:24:55
listening to this unseen story
1:24:59
and you try and get the model to
1:24:59
generate what the story was that
1:25:03
the person heard. Right?
1:25:05
Yes.
1:25:06
So in other words, the only way that's going to work is if you're, I know you
1:25:08
hate to use the phrase, but
1:25:11
like, if you're, you have to
1:25:11
read their mind, because the
1:25:13
model has no access to what
1:25:13
story they were played. So, the
1:25:15
only way the model is gonna know
1:25:15
the story is by reading their
1:25:17
mind.
1:25:19
Is it reading their brain? Like, we don't know where the mind is? It's
1:25:21
somewhere near to the brain. Definitely reading what's
1:25:22
happening in the brain.
1:25:26
It's the brain.
1:25:26
As you know. It's the same
1:25:30
thing. Okay. So yeah, so it
1:25:30
starts, you have some success
1:25:35
with that.
1:25:36
Yeah, yeah. So
1:25:36
um, there was kind of a
1:25:38
startling moment. I think this was during the pandemic. We're all like working at home. And
1:25:40
Jerry showed me some results that were like, Oh, my God,
1:25:41
this, this works. This is like giving us things that sound
1:25:43
like the story. It's actually
1:25:52
pretty accurate at this point.
1:25:52
This is very exciting to us.
1:25:56
Right? So you know, we can now
1:25:56
decode a story that somebody is
1:26:00
hearing, right, which is kind of
1:26:00
step one. Like that's, that's
1:26:03
interesting by itself. But
1:26:03
that's not even, really
1:26:05
potentially that useful. So at
1:26:05
that point, we went back and did
1:26:09
some follow up experiments. So
1:26:09
we took the same subjects that
1:26:11
we've been scanning and…
1:26:13
Oh, hang on. Can
1:26:13
we, can we like, talk about that
1:26:16
result from your paper? Can we kind of just share it with our listeners?
1:26:19
Yeah. Absolutely.
1:26:20
We're talking figure one here, right?
1:26:22
Yes.
1:26:22
Okay. So, you
1:26:22
say it's not that interesting. I
1:26:25
think it's very interesting.
1:26:25
(Laughter) Okay, I'm gonna say
1:26:29
the actual stimulus that the
1:26:29
subject heard, and you're going
1:26:33
to tell me the decoded stimulus
1:26:33
that your model produced based
1:26:37
on reading their mind, or whatever you want to call it. Okay. I got up from the air
1:26:39
mattress and press my face
1:26:43
against the glass of the bedroom
1:26:43
window, expecting to see eyes
1:26:46
staring back at me. But instead
1:26:46
of finding only darkness,
1:26:50
I just continued
1:26:50
to walk up to the window and
1:26:52
open the glass. I stood on my
1:26:52
toes and peered out. I didn't
1:26:56
see anything and looked up
1:26:56
again. I saw nothing.
1:26:59
Wow. Okay, let's
1:26:59
do some more. This is good. I
1:27:03
didn't know whether to scream or
1:27:03
cry or run away. Instead, I
1:27:07
said, leave me alone. I don't
1:27:07
need your help. Adam
1:27:09
disappeared, and I cleaned up
1:27:09
alone crying.
1:27:13
started to
1:27:13
scream and cry. And then she
1:27:15
just said, I told you to leave
1:27:15
me alone. You can't hurt me. I'm
1:27:19
sorry. And then he stormed off.
1:27:19
I thought he had left. I started
1:27:22
to cry.
1:27:24
Let's do one
1:27:24
more. That night, I went
1:27:27
upstairs to what had been our
1:27:27
bedroom and not knowing what
1:27:29
else to do. I turned out the
1:27:29
lights and lay down on the
1:27:32
floor.
1:27:34
We got back to
1:27:34
my dorm room. I had no idea
1:27:36
where my bed was. I just assumed
1:27:36
I would sleep on it. But instead
1:27:39
I lay down on the floor.
1:27:42
That's pretty
1:27:42
amazing. You know…
1:27:44
Can we do the last one? The last one always gets used as the demo.
1:27:47
Okay, last one.
1:27:47
I don't have my driver's license
1:27:51
yet. And I just jumped out right
1:27:51
when I needed to. And she says,
1:27:54
well, why don't you come back
1:27:54
to my house and I'll give you a
1:27:56
ride. I say okay.
1:28:00
She's not ready. She has not even started to learn to drive yet. I had to
1:28:02
push her out of the car. I said,
1:28:05
we will take her home now. And
1:28:05
she agreed.
1:28:07
It's incredible.
1:28:08
Right? It
1:28:08
actually works. We're getting
1:28:10
this out of fMRI data. fMRI,
1:28:10
which is like the worst of all
1:28:13
neuroimaging methods, except
1:28:13
for all the other ones.
1:28:16
Except it is the best. Yeah. Okay.
1:28:17
It is awful in
1:28:17
so many ways, and yet, we're
1:28:21
getting out like, it's not word
1:28:21
for word. In fact, the word
1:28:24
error rate is god awful, right?
1:28:24
Sorry, my dog is excited about
1:28:28
something. The word error
1:28:28
rate is like 94% here. It's not
1:28:32
getting the exact words for the
1:28:32
most part.
1:28:35
It's getting the
1:28:35
gist, right? It’s getting the
1:28:35
paraphrase of what’s happening.

Sure.
1:28:40
And you have
1:28:40
some, you know, some kind of
1:28:42
intuitive ways of quantifying
1:28:42
how well it's doing that don't
1:28:45
rely on it being a word-for-word
1:28:45
match. And it's, it's all quite
1:28:48
intuitive and explained well in
1:28:48
the paper.
1:28:52
Yeah, yeah. So
1:28:52
this was I mean, very exciting.
1:28:55
When we saw this, you know, we
1:28:55
could read out the story that
1:29:00
somebody was hearing, and kind
1:29:00
of the fact that it was a
1:29:03
paraphrase was also interesting
1:29:03
to us that it's like, you know,
1:29:05
we're not getting, it really
1:29:05
seems like some low level
1:29:09
representation, we're getting
1:29:09
something high level, right?
1:29:11
And you wouldn't
1:29:11
with fMRI, right? Like, I mean,
1:29:13
like, maybe with Eddie Chang's
1:29:13
data, you could read the
1:29:16
phonemes and get it that way.
1:29:18
Right. Which they do beautifully.
1:29:20
You're never, you're never gonna be able to do that with fMRI.
1:29:22
Yeah. Yeah. But
1:29:22
the ideas, right, like,
1:29:27
what's the thought
1:29:27
behind the sentence? That probably
1:29:32
like changes slowly enough that
1:29:32
we can see it, it can kind of be
1:29:35
isolated with fMRI. Right? The
1:29:35
individual words, they're all
1:29:37
mashed up. That's a
1:29:37
mess. But, uh, you
1:29:40
know, each idea kind of evolves
1:29:40
over a few seconds and that's
1:29:44
something that we have a hope of
1:29:44
pulling out with fMRI.
1:29:47
Yeah. Okay, so
1:29:47
very cool. And then you take it
1:29:52
in a lot of other directions
1:29:52
from there. Which one should we
1:29:57
talk about?

Let's talk about…
1:29:57
Okay. Do you need the whole
1:30:03
brain to do this? Or can you do
1:30:03
this with just parts of the
1:30:06
brain?