Episode Transcript
0:02
Music
0:30
Welcome to another episode of Conversations with
0:32
Coleman. If you're hearing this, then you're
0:34
on the public feed, which means you'll get episodes
0:36
a week after they come out and you'll hear advertisements.
0:39
You can get access to the subscriber feed by going
0:42
to ColemanHughes.org and becoming a supporter.
0:44
This means you'll have access to episodes a week early.
0:47
You'll
0:47
never hear ads
0:48
and you'll get access to bonus Q and A episodes.
0:51
You can also support me by liking and subscribing
0:54
on YouTube and sharing the show with friends and family.
0:56
As always, thank you so much
0:58
for your support.
1:01
Music Welcome
1:03
to another episode of Conversations with Coleman.
1:06
Today's episode is a round table discussion
1:08
about AI safety with Eliezer
1:10
Yudkowsky, Gary Marcus, and Scott
1:13
Aaronson. Eliezer Yudkowsky is a prominent
1:15
AI researcher and writer
1:17
known for co-founding the Machine Intelligence
1:20
Research Institute, where he spearheaded
1:22
research on AI safety. He's also
1:24
widely recognized for his influential writings
1:27
on the topic of rationality. Scott
1:29
Aaronson is a theoretical computer scientist
1:31
and author celebrated for his pioneering
1:33
work in the field of quantum computation.
1:36
He's also the chair of computer science at UT
1:38
Austin, but is currently taking a leave of absence
1:41
to work at OpenAI. Gary
1:42
Marcus is a cognitive scientist,
1:45
author, and entrepreneur known for
1:47
his work at the intersection of psychology, linguistics,
1:50
and AI. He's also authored several
1:52
books, including Kluge and
1:55
Rebooting AI: Building Artificial Intelligence
1:57
We Can Trust. This episode is all about
1:59
AI safety.
1:59
We talk about the alignment problem. We
2:02
talk about the possibility of human extinction
2:04
due to AI. We
2:05
talk about what intelligence actually is.
2:08
We talk about the notion of a singularity or
2:10
an AI take-off event and much more.
2:13
It was really great to get these three guys in the same
2:15
virtual room. And I think you'll find that this conversation
2:18
brings something a bit fresh to a topic
2:20
that has admittedly been beaten to death
2:22
on certain corners of the internet. So without
2:24
further ado, Eliezer Yudkowsky,
2:27
Gary Marcus, and Scott Aaronson.
2:30
Thanks so much for coming on my show. Thank you.
2:32
Thanks for having us. So
2:33
the topic of today's conversation
2:35
is AI safety. And this is something that's been in the
2:37
news lately. We've seen experts and CEOs signing letters recommending
2:40
public policy surrounding regulation. We
2:47
continue to have the debate between people that really
3:00
fear AI is going to end the
3:02
world and potentially kill
3:03
all of humanity and the people who feel
3:05
that those fears are overblown.
3:09
And so
3:11
this is going to be sort of a roundtable conversation
3:13
about that. And you three are
3:16
really three of the best people in the world to talk
3:18
about it with. So thank you all for doing this.
3:20
Let's just start out with you,
3:22
Eliezer, because you've been one of the most really
3:27
influential voices getting people
3:29
to take seriously the possibility that AI
3:31
will kill us all. You know, why is
3:33
AI going to destroy us? Chat GPT seems
3:35
pretty nice. I use it every day. What's
3:38
the big fear here? Make the case. Well, chat
3:40
GPT seems quite unlikely to kill everyone
3:42
in its present state. AI capabilities
3:45
keep on advancing and advancing. The question
3:47
is not can chat GPT kill us? The
3:50
answer is probably
3:50
no. So as long as that's true,
3:53
as long as it hasn't killed us yet, they're
3:55
just going to keep engineering. They're just going to keep pushing the capabilities.
3:58
There is no obvious blocking point. We don't
4:01
understand the things that we
4:03
build. The AIs are grown more
4:05
than built, you might say. They end up
4:07
as giant inscrutable matrices of floating
4:09
point numbers that nobody can decode. It's
4:12
probably going to end up technically difficult to
4:14
make them want particular things and not
4:16
others. And
4:18
people are just charging straight ahead. So
4:21
at this rate, we end up with something that is smarter
4:23
than us, smarter than humanity, that
4:26
we don't understand, whose preferences
4:28
we could not shape. And by default,
4:31
if that happens, if you have something around that is like, much
4:33
smarter than you and does not care about you one way
4:36
or the other, you probably end up dead at the end of
4:38
that. The way
4:40
it gets the most of whatever strange inscrutable
4:43
things it wants is worlds
4:45
in which there are not humans taking
4:47
up space, using up resources,
4:50
building other AIs to compete with it, or just
4:52
a world in which you built enough power plants that
4:54
the surface of the earth got hot enough that humans didn't
4:57
survive. Gary, what do you have to say about
4:59
that? There are parts that
5:00
I agree with and parts that I don't. So I agree
5:03
that we are likely to wind up with AIs
5:06
that are smarter than us. I don't think we're particularly
5:08
close now, but in 10 years or 50 years
5:10
or 100 years at some point, it
5:13
could be a thousand years, but it will happen. I
5:15
think there's a lot of anthropomorphization
5:17
there about machines wanting things. Of
5:20
course they have objective functions and
5:22
we can talk about that. I think it's a presumption
5:24
to say that the default is that
5:27
they're gonna want something that leads to our demise
5:30
and that they're gonna be effective at that and
5:32
be able to literally kill us all.
5:34
I think if you look at the history of AI,
5:37
at least so far, they don't really have
5:39
wants beyond what we program them to
5:41
do. There is an alignment problem. I think
5:43
that that's real in the sense of like,
5:46
people program the system to do X and
5:48
they do X prime. That's kind of like X, but not
5:50
exactly. And so I think there's really things
5:52
to worry about. I think there's a real research
5:54
program here that is under-researched,
5:57
which is the way I would put
5:59
it. We want to understand how to make machines
6:02
that have values. Asimov's laws are way
6:04
too simple, but they're a kind of starting point for conversation.
6:07
We want to program machines that don't
6:09
harm humans. They can calculate the consequences
6:12
of their actions. Right now we have technology
6:14
like GPT-4 that has no idea what the consequences
6:17
of its actions are. It doesn't really
6:19
anticipate things. And there's a separate thing that Eliezer
6:21
didn't emphasize, which is it's not just how smart
6:23
the machines are, but how much power we give them, how
6:26
much we empower them to do things like
6:28
access the internet or manipulate
6:29
people or write
6:33
source code, access files and stuff like that.
6:36
And right now, auto-GPT can do all of those things,
6:38
and that's actually pretty disconcerting to me. To
6:40
me, that doesn't all add up to any
6:42
kind of extinction risk anytime
6:45
soon. But catastrophic risk, where things
6:47
go pretty wrong because we wanted
6:49
these systems to do X and we
6:51
didn't really specify it well, they don't really understand
6:53
our intentions. I think there are risks
6:55
like that. I don't see it as a default that
6:57
we wind up with extinction. I think it's pretty hard
7:00
to actually terminate the entire human
7:02
species. You're going to have people in Antarctica
7:05
that are going to be out of harm's way or whatever, or
7:07
you're going to have some people who respond
7:09
differently to any pathogen, et cetera. So
7:11
extinction is a pretty extreme
7:14
outcome that I don't think is particularly likely.
7:17
But the possibility that these machines
7:19
will cause mayhem because we don't know
7:21
how to enforce that they do what we want
7:23
them to do, I think that's a real thing to worry about. And
7:26
it's certainly worth doing research on.
7:28
Scott, how do you view this? Yeah, so I'm sure
7:30
that you can get the three of us arguing about
7:32
something, but I think you're going to get agreement
7:34
from all three of us that AI safety
7:37
is important and that catastrophic
7:40
outcomes, whether or not that
7:43
means literal human extinction are possible.
7:46
I think it's become apparent
7:49
over the last few years that this
7:52
century is going to be
7:55
largely defined by
7:58
our interaction
7:59
with AI, that AI
8:02
is going to be transformative for human
8:05
civilization.
8:07
And I'm confident
8:10
of that much. And if you ask
8:12
me almost anything beyond that about
8:15
how is it going to transform civilization,
8:17
will it be good? Will it be bad? What
8:19
will the AI want? I
8:21
am pretty agnostic, just
8:24
because if you had asked
8:26
me 20 years ago to try to forecast
8:28
where we are now, I would have
8:31
gotten a lot wrong. My
8:34
only defense is I think that
8:37
all of us here, almost everyone in
8:39
the world, would have gotten a lot wrong
8:42
about where we are now. And so if I
8:44
try to envision where we are in 2043,
8:47
does the AI want to replace
8:49
humanity with
8:55
something better? Does it want to keep
8:57
us around as pets? Does
9:01
it want to just continue
9:04
helping us out? Just
9:08
a super souped up version of chat GPT?
9:11
I think all of those scenarios
9:14
merit consideration. But
9:16
I think that what has happened
9:18
in the last few years that's
9:20
really exciting is that AI
9:23
safety has become an empirical
9:25
subject. There are these very powerful
9:27
AIs that are now being deployed, and
9:30
we can actually learn something. We
9:33
can work on
9:36
mitigating the nearer term harms,
9:37
not because the
9:41
existential risk doesn't exist or
9:46
is absurd or is science fiction or anything
9:48
like that, but just because the nearer term
9:50
harms are the ones that we can see right
9:53
in front of us and where we can actually get feedback
9:56
from the external world about how we're
9:58
doing. We can learn something.
9:59
And hopefully some of the knowledge
10:02
that we gain will be useful in
10:05
addressing the longer term risks that
10:07
I think Eliezer is very rightly worried about. So
10:09
there seems to me there's alignment and
10:11
then there's alignment, right? So there's alignment
10:14
in the sense that we haven't even fully
10:16
aligned smartphone technology with our interests,
10:18
right? Like there are some ways in which
10:21
smartphones and social media have
10:24
led to probably deleterious mental health
10:26
outcomes, especially
10:28
for teenage girls,
10:29
for example. So there
10:32
are those kinds of mundane senses
10:34
of alignment where it's like, is this technology
10:37
doing more good than
10:39
harm in the normal everyday public
10:41
policy sense? And then there's the capital
10:43
A alignment of, are we creating
10:46
a creature that is going to view
10:48
us like ants and have no
10:51
problem extinguishing us,
10:56
whether intentionally or not. So it
10:58
seems to me all of you agree
10:59
that the first sense of alignment
11:02
is at the very least something to worry about now and
11:04
something to deal with. But I'm
11:06
curious to what extent you think the really
11:09
capital A sense of alignment is
11:11
a real problem because it can sound very much
11:13
like science fiction to people. So
11:16
maybe let's start with Eliezer. I
11:19
mean, from my perspective, I would say that
11:22
if we had a solid guarantee that
11:24
AI was going to do no more harm
11:26
than social media, we ought to plow
11:28
ahead and get all the gains. The amount of
11:33
harm that social media has done to humanity, while
11:35
very significant in my view (I think it's done
11:37
like a lot of damage to our sanity), is
11:40
just not a large enough harm
11:42
to justify either forgoing
11:44
the gains that you could get from AI, if
11:47
that was going to be the worst downside,
11:49
or to justify the kind of drastic measures you'd
11:51
need to stop plowing ahead on AI.
11:54
I think that the capital A alignment
11:56
is beyond this generation. You
11:58
know, I've
12:01
watched over it for two decades. I
12:04
feel like in some ways the modern generation plowing
12:06
in with their eyes on the
12:09
short term stuff is like losing track
12:11
of the larger problems because they can't solve the larger
12:13
problems and they can't solve the little problems. But we're just like
12:16
plowing straight into the big problems and
12:19
we're going to go plow right into the big problems with
12:21
a bunch of little solutions that aren't going to scale. I think
12:24
it's lethal. I think it's at the
12:26
scale where you just back off and
12:28
don't do this. By back off
12:29
and don't do this, what do you mean?
12:32
I mean, have
12:33
an international
12:36
treaty about where the chips
12:39
capable of doing AI training go and
12:41
have them all going into licensed,
12:44
monitored data centers and
12:47
not have the training runs for
12:50
AIs more powerful than GPT-4,
12:53
possibly even lowering that threshold over time
12:55
as algorithms improve and it gets
12:57
possible to train more powerful AIs using
13:00
less compute. So you're picturing a kind of international
13:03
agreement to just stop.
13:05
International moratorium. And
13:07
if North Korea steals the GPU
13:09
shipment, then you've got to be ready
13:11
to destroy their data
13:14
center that they build by conventional means. And
13:16
if you don't have that willingness in advance, then
13:18
countries may refuse to sign up for the agreement being like,
13:21
why aren't we just like ceding the
13:23
advantage to someone else then? It
13:25
actually has to be a worldwide shutdown because
13:28
the scale of harm from a super intelligence,
13:31
it's not that if you have 10 times
13:33
as many super intelligences, you've got 10
13:35
times as much harm. It's not that a super
13:37
intelligence only wrecks the country that
13:39
built the super intelligence. Any super intelligence
13:41
everywhere is anyone's last problem.
13:44
So Gary and Scott, if either of you want to jump in
13:46
there, I mean, is there, is AI
13:49
safety a matter of forestalling
13:52
the end of the world and all of these smaller
13:54
issues and pass
13:56
towards safety that Scott, you mentioned are just, you
13:58
know,
13:59
throwing, I don't know what the
14:02
analogy is, but pointless essentially.
14:04
What do you guys make of this?
14:07
The journey of a thousand miles begins
14:09
with a step. The way I think about this comes
14:12
from 25 years
14:17
of doing computer science research,
14:19
including quantum computing
14:21
and computational complexity,
14:24
things like that, where we have these gigantic
14:26
aspirational problems that we
14:29
don't know
14:29
how to solve. And yet,
14:32
year after year, we do make
14:34
progress. We pick off little subproblems.
14:36
And if we can't solve those, then we find
14:39
subproblems of those. And we keep
14:41
repeating until we find something that we can
14:43
solve. And this is,
14:45
I think, for centuries, the way that science
14:48
has made progress. Now, it is
14:50
possible that this time,
14:52
we just don't have enough time for that
14:54
to work. And I think that
14:56
is what Eliezer is fearful
14:59
of, that we just
14:59
don't have enough time for the ordinary
15:02
scientific process to
15:04
work before AI becomes too powerful,
15:08
in which case, you start talking about
15:10
things like a global moratorium
15:14
enforced with the threat of war. I
15:17
am not ready to go there. I
15:20
could imagine circumstances where
15:22
maybe I say, gosh, this
15:25
looks like such an imminent threat that we
15:27
have to. But I tend to be very, very worried in
15:33
general about causing a catastrophe
15:35
in the course of trying to prevent a catastrophe.
15:38
And I think when you're
15:40
talking about threatening
15:43
airstrikes against data centers or things like that,
15:45
then that's an obvious worry. I'm
15:48
sort of somewhere in between, I guess. I
15:51
don't think that there's... So I'm somewhat
15:53
in between here.
15:54
I'm with Scott that we are not
15:57
at the point where we should be bombing data centers. I don't think we're going
15:59
to be anywhere
15:59
close to that. I'm much less what
16:02
the right word is to use here. I
16:04
don't think we're anywhere near as close to
16:07
AGI as I think Eliezer sometimes sounds
16:09
like. I don't think GPT-5 is
16:11
anything like AGI, and I'm not particularly
16:14
concerned about who gets it first and so
16:16
forth. On the other hand, I think that we're
16:18
in a sort of dress rehearsal mode. Nobody
16:21
expected GPT-4, really, chat
16:23
GPT to percolate as fast as it
16:25
did. It's a reminder that there's
16:28
a social side to all of this, how software
16:30
gets distributed matters, and
16:32
there's a corporate side. It was a kind
16:34
of galvanizing moment for me when Microsoft
16:37
didn't pull Sydney, even though Sydney did some
16:39
awfully strange things. I thought they would take
16:41
it for a while, and it's a reminder that they can make whatever decisions
16:44
they want. We kind of multiply that by
16:46
Eliezer's concerns about what do
16:48
we do and at what point what
16:51
would be enough to cause
16:53
problems. It is a reminder, I think, that we
16:55
need, for example, to start roughing out these
16:58
international treaties now because there could come
16:59
a moment where there is a problem. I don't think
17:02
the problem that Eliezer sees is here now,
17:04
but maybe it will be. And maybe when it does
17:07
come, we will have so many people pursuing
17:09
commercial self-interest and so little infrastructure
17:12
in place we won't be able to do anything. So I think
17:14
it really is important to think now. If
17:16
we reach such a point, what are we going to
17:18
do? What do we need to build in place
17:20
before we get to that point?
17:26
Question. Have you ever wondered about the impact
17:28
of your charitable donation? I mean, how
17:30
much good can your contribution really do?
17:33
The answer is not always easy to find, but if you're
17:35
invested in making a difference in the world, then
17:37
allow me to introduce you to GiveWell. GiveWell
17:40
is not your typical charity platform. They
17:42
spend countless hours researching charitable
17:44
organizations, diving deep into evidence
17:46
and hard data. Over the past 15 years,
17:49
they've been vetting, scrutinizing, and only
17:51
recommending the highest impact opportunities
17:53
they found. Their process involves a team
17:56
of 25
17:56
researchers who put in over 40,000 hours
17:58
each year
17:59
to maximize the impact of your donations.
18:02
Over 100,000 donors have already
18:04
trusted GiveWell to allocate their donations
18:07
wisely. For just $5, you can provide
18:09
a bed net to prevent malaria. $7 can
18:11
provide a child with malaria treatment through the high
18:14
malarial season. Just $1 can deliver
18:16
a vitamin A supplement to a child, a deficiency
18:18
of which can increase mortality rates. Even
18:21
as little as 160 bucks can vaccinate
18:23
an infant, helping prevent diseases and reduce
18:25
child mortality. This is how GiveWell operates.
18:28
They measure, they model, they review, and they forecast
18:30
the impact, all to ensure that your donations
18:33
are used in the best possible way. And
18:35
the best part? All of their research and
18:37
recommendations are available for
18:39
free on their website. So if you wanna
18:41
make an informed decision about high impact
18:44
giving, head over to givewell.org. When
18:46
you make a donation, let them know you heard
18:49
about them from me. Just select podcast
18:51
and enter conversations with Coleman at checkout.
18:54
Remember, GiveWell does not take a cut from
18:56
your donation. All of it goes directly to
18:58
help those who need it most. Once again, head
19:01
over to givewell.org. And when you
19:03
make a donation, let them know you heard about
19:05
them through me. Just select podcast
19:07
and enter conversations with Coleman at checkout.
19:10
What's a game where no
19:12
one wins?
19:17
The waiting
19:20
game. When it comes to hiring, don't
19:22
wait for great talent to find you. Find them
19:24
first. When you're hiring, you need Indeed.
19:27
Indeed's the hiring platform where you can attract, interview
19:29
and hire all in one place. Instead
19:31
of spending hours on multiple job sites
19:33
searching for candidates with the right skills, Indeed's
19:36
a powerful hiring platform that can help you
19:38
do it all. They streamline hiring with powerful
19:40
tools that find you matched candidates. With Instant
19:42
Match, over 80% of employers get quality
19:45
candidates whose resume on Indeed matches their
19:47
job description the moment they sponsor a
19:49
job. Candidates you invite to apply are
19:51
three times more likely to apply to your job
19:53
than candidates who only see it in search. Indeed
19:56
gets you one step closer to the hire by
19:58
immediately matching you with quality candidates.
20:01
Indeed does the hard work for you. Indeed
20:03
shows you candidates whose resumes on Indeed
20:05
fit your description immediately after
20:07
you post so you can hire faster. Indeed's
20:10
hiring platform matches you with quality
20:12
candidates instantly. Even better, Indeed's
20:14
the only job site where you only pay for applications
20:17
that meet your must-have requirements. Indeed
20:19
is an unbelievably powerful hiring
20:21
platform, delivering four times more
20:23
hires than all other job sites combined,
20:25
according to TalentNest. Join more than 3
20:28
million businesses worldwide
20:29
that use Indeed to hire great
20:32
talent fast. Start hiring now with
20:34
a $75 sponsored job credit
20:36
to upgrade your job posts at Indeed.com
20:39
slash conversations. Offer only
20:41
good for a limited time. Claim your $75 credit right
20:44
now at Indeed.com slash
20:47
conversations. Just go to Indeed.com
20:49
slash conversations and support the show
20:51
by saying you heard about it on this podcast. Indeed.com
20:54
slash conversations. Terms and conditions apply.
21:01
So we've been talking about
21:03
this concept of artificial general intelligence
21:07
and I think it's worth asking
21:09
whether that is a useful
21:12
coherent concept. So
21:14
for example if I were to think by analogy
21:16
to athleticism and think
21:18
of the moment when we build a machine
21:21
that has say
21:23
artificial general athleticism meaning
21:26
it's you know better than LeBron
21:28
James at basketball but also better
21:30
at the at curling than the world's best
21:33
curling player and also better at soccer
21:35
and also better at archery and
21:38
so forth.
21:39
It would seem to me
21:41
that there's something a bit strange
21:43
about framing it as having reached a
21:46
point on a single continuum.
21:49
It seems to me you would sort of have to build
21:52
each capability each sport
21:54
individually and then somehow figure out
21:57
how to package them all into one robot
21:59
without each skill set detracting
22:02
from the other. Is that a disanalogy?
22:06
Do you all picture this intelligence
22:09
as sort of one dimension, one
22:11
knob that is going to get turned
22:13
up along a single
22:16
axis? Or do you think that way
22:18
of talking about it is misleading
22:20
in the same way that I kind of just sketched
22:23
out?
22:23
I would absolutely not accept that.
22:26
I like to say that intelligence is not a one dimensional
22:28
variable. There's many
22:31
different aspects to intelligence.
22:34
There's not, I think, going to be a magical moment when
22:36
we reach the singularity or something like that.
22:40
I would say that the core of artificial general intelligence
22:42
is the ability to flexibly
22:44
deal with new problems that you haven't
22:47
seen before. Current
22:49
systems can do that a little bit, but not very well. My
22:52
typical example of this now is GPT-4
22:55
is exposed to the game of chess, sees
22:57
lots of games of chess, it sees the rules of chess,
23:00
but it never actually figures out the rules of chess
23:02
and makes illegal moves and so forth. It's
23:04
in no way a general intelligence that can
23:06
just pick up new things. Of course, we have things
23:08
like AlphaGo that can play a certain set of
23:11
games, or AlphaZero really, but
23:13
we don't have anything that has the generality
23:15
of human intelligence. Human
23:18
intelligence is just one example of general
23:20
intelligence. You could argue that chimpanzees
23:22
or crows have another variety of general intelligence.
23:26
I would say the current machines don't really have it,
23:29
but they will eventually.
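To make Gary's chess point concrete, here is a minimal sketch of the kind of check he is describing, using the open-source python-chess library. The move list is just a stand-in for whatever a model actually outputs, and the game shown is only an illustration, not a claim about any particular model.

```python
# A minimal sketch of the test Gary describes: replay a model's proposed
# moves and report the first one that is illegal under the real rules.
# Requires the python-chess package (pip install chess).
import chess

def first_illegal_move(moves_uci):
    """Return (index, move) of the first illegal move, or None if all are legal."""
    board = chess.Board()
    for i, uci in enumerate(moves_uci):
        try:
            move = chess.Move.from_uci(uci)
        except ValueError:
            return i, uci  # not even a well-formed move string
        if move not in board.legal_moves:
            return i, uci  # well-formed, but illegal in this position
        board.push(move)
    return None

# The third move here is illegal: a pawn cannot capture straight ahead.
print(first_illegal_move(["e2e4", "e7e5", "e4e5"]))  # -> (2, 'e4e5')
```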
23:30
I think a priori, it
23:33
could have been that you
23:35
would have math ability,
23:37
you would have verbal ability, you'd
23:39
have ability to understand
23:42
humor, and they'd all be just completely
23:44
unrelated to each other. That is
23:46
possible. In fact,
23:50
already with GPT, you can say that
23:52
in some ways, it already is a
23:54
superintelligence. It knows
23:57
vastly more, can converse on
23:59
a vastly greater
23:59
range of subjects than any human
24:02
can. And in other ways,
24:05
it seems to fall short of
24:08
what humans know or
24:10
can do. But you
24:13
also see this sort of generality
24:16
just empirically. So, I mean,
24:19
GPT was sort of trained
24:22
on all the text on the internet.
24:26
You know, let's say most of the
24:28
text on the open internet.
24:29
So it was just one
24:32
method. It was not explicitly
24:35
designed to write code, and yet
24:37
it can write code. And at the
24:39
same time as that ability emerged,
24:42
you also saw the ability to
24:44
solve word problems, like
24:47
high school level math. You saw
24:49
the ability to write poetry. This
24:52
all came out of the same system without
24:54
any of it, you know, being explicitly
24:57
optimized for. And so I
24:59
feel like I need
24:59
to interject one important thing, which is
25:02
it can do all these things, but none of them all that reliably. OK.
25:06
Nevertheless, I mean, compared to, you know,
25:08
what let's say what my expectations
25:10
would have been if you'd ask me 10 or 20 years
25:12
ago, I think that the level of generality
25:15
is pretty remarkable. And, you
25:18
know, and it does lend support to the
25:20
idea that there is some sort of general
25:22
quality of understanding there where
25:24
you could say, for example, that GPT-4
25:27
has more of it than GPT-3,
25:29
which
25:29
in turn has more than GPT-2. And
25:32
I would say that it does
25:34
seem to me like it's presently pretty unambiguous
25:37
that GPT-4 is in some sense
25:39
dumber than an adult or
25:42
even teenage human. That's not obvious
25:44
to me. Why would you say that? I think that we
25:46
will eventually see that it's obvious, too. I
25:49
mean, to take the example I just gave you a minute
25:51
ago, it never learns to play chess, even
25:53
with a huge amount of data. So
25:56
it will play a little bit of chess.
25:58
It will memorize the openings and be able to play
25:59
okay for the first 15 moves, but
26:02
it gets far enough away from what it's trained on and it falls
26:04
apart. This is characteristic of
26:06
these systems. It's not really characteristic in the same
26:08
way of adults or even
26:10
teenage humans. Almost
26:12
anything that it does, it does unreliably. And to
26:14
give another example, you can ask a
26:17
human to write a biography of someone and don't
26:19
make stuff up. And you really can't ask GPT
26:21
to do that. Yeah. Like, it's a bit
26:24
difficult because you could always be cherry picking
26:26
something that humans aren't usually good at. But
26:28
to me, it does seem like there's this
26:29
broad range of problems that don't
26:32
seem especially to play to humans
26:34
strong points or machine
26:37
weak points where GPT-4 will
26:39
do no better than a
26:42
seven year old on those problems. Hold on. Can I
26:44
interject here? I do
26:46
feel like these examples are cherry
26:48
picked because if I just take a different,
26:51
very typical example, I'm writing an op-ed
26:54
for the New York Times, say, about any given
26:56
subject in the world. And my choice is to
26:58
have a smart 14
26:59
year old next to me with anything that's
27:02
in his mind already or GPT. And
27:04
there's no comparison, right? So
27:07
which of these sort of examples is
27:09
the litmus test for a human-level intelligence?
27:12
If you did it on a topic
27:14
where it couldn't rely on memorized
27:16
text, you might actually change your mind
27:19
on that. So the thing about writing a
27:21
Times op-ed is most of the things that
27:24
you propose to it, there's actually something
27:26
that it can pastiche together from its dataset.
27:28
That doesn't mean that it really understands what's going
27:31
on. It doesn't mean that that's a general capability.
27:33
Also, as the human, you're doing all the hard
27:35
parts, right? Like, obviously,
27:37
a human is going to prefer if
27:40
a human has a math problem, you're going to rather use a calculator
27:43
than another human. And similarly,
27:44
with the New York Times op-ed, you're doing all
27:46
the parts that are hard for GPT-4.
27:50
And then you're like asking GPT-4 to just do
27:52
some of the parts that are hard for you. You're
27:54
always going to prefer an AI partner rather than a human
27:56
partner in that sort
27:59
of setup where the human can do all the human stuff and
28:02
you want an AI to do whatever the AI is good
28:04
at at the moment. An analogy that's maybe
28:06
a little bit helpful here is driverless cars. It
28:09
turns out that
28:10
on highways and ordinary traffic, they're probably
28:13
better than people. In unusual
28:15
circumstances, they're really worse than people. So
28:17
a Tesla not too long ago ran into a jet,
28:19
and a human wouldn't do that. It was at slow
28:22
speed, being summoned across a parking lot. A human
28:24
would never do that. So there are different
28:26
strengths and weaknesses. The strengths of
28:28
a lot of the current kinds of technology
28:30
is that they can either pastiche together
28:33
or make not literal analogies
28:35
when we go into the details, but to stored
28:38
examples and they tend to be poor when
28:40
you get to outlier cases. And
28:43
that's persistent across most of the technologies
28:46
that we use right now. And so if you
28:48
stick to stuff in which there's a lot of data,
28:50
you'll be happy with the results you get from these systems.
28:53
You move far enough away, not so much.
28:56
And what we're going to see over time is that
28:58
the length of the debate about whether or not
29:00
it's still dumber than you gets longer
29:02
and longer and longer. And then
29:05
if things are allowed to just keep running and nobody
29:07
dies, then at some point that switches over
29:10
to a very long debate about is
29:12
it smarter than you, which then gets shorter
29:14
and shorter and shorter and eventually reaches
29:16
a point where it's pretty
29:19
unambiguous if you're paying attention.
29:20
Now, I suspect that this process
29:22
gets interrupted by everybody dying. In
29:25
particular, there's a question of the point at which it
29:27
becomes better than you, better
29:29
than humanity at building the next edition
29:31
of the AI system and how fast do things snowball
29:34
once you get to that point. Possibly you do not
29:36
have time for further public
29:39
debates or even
29:41
a two hour Twitter space depending on how that goes. I
29:44
mean, some of the limitations of
29:46
GPT are
29:48
completely understandable, just from
29:50
a little knowledge of how it works. It
29:53
does not have an internal memory,
29:57
per se, other than what appears
29:59
on
29:59
the screen in front of you. So this
30:02
is why it's turned out to be so effective
30:04
to explicitly tell it, like, let's
30:07
think step by step when it's solving
30:09
a math problem, for example. You have to
30:11
tell it to show all of its work
30:13
because it doesn't have an internal
30:15
memory with which to do that. Likewise,
30:18
when people complain about it hallucinating
30:21
references that don't exist, well, the truth
30:24
is when someone asks me for
30:26
a citation, if I'm not
30:28
allowed to use Google, I might
30:29
have a vague recollection of
30:32
some of the authors and I'll probably
30:34
do a very similar thing to what GPT
30:36
does. I'll hallucinate. Right. So
30:38
well, no, there's a great phrase I learned the other day, which is
30:41
frequently wrong, never in doubt. That's
30:43
true. That's true. I'm not
30:46
going to make up a reference with full detail,
30:48
page numbers, titles, so
30:50
forth. I might say, look, I don't remember 2012 or something like
30:52
that. Here's
30:55
GPT-4, what it's going to
30:57
say is 2017, Aaronson and Yudkowsky,
30:59
New York Times, pages 13 to 17.
31:03
No, it does need to get much, much better
31:06
at knowing what it doesn't know. And yet,
31:08
already I've seen a noticeable
31:11
improvement there, going from GPT-3
31:13
to GPT-4. For example,
31:16
if you ask GPT-3, prove that
31:18
there are only finitely many prime
31:20
numbers, it will give you a proof, even
31:23
though the statement is false. And
31:25
it will have an error, which is similar
31:27
to the errors on 1,000 exams
31:29
that
31:29
I've graded, just trying
31:32
to get something past you, hoping that
31:34
you won't notice. If you ask GPT-4,
31:37
prove that there are only finitely many prime
31:39
numbers, it says, no, that's a trick question. Actually,
31:42
there are infinitely many primes. And here's why.
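For reference, the "here's why" in a correct answer is Euclid's classical argument, sketched below; this is textbook material, not a quote from the conversation.

```latex
\begin{proof}[Euclid's argument]
Suppose $p_1, p_2, \dots, p_n$ were all of the primes, and let
\[
  N = p_1 p_2 \cdots p_n + 1 .
\]
Then $N \equiv 1 \pmod{p_i}$ for every $i$, so no $p_i$ divides $N$.
But $N > 1$ has at least one prime factor, which therefore is not among
$p_1, \dots, p_n$, a contradiction. Hence there are infinitely many primes.
\end{proof}
```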
31:44
Yeah. Part of the problem with doing the science
31:47
here is that, I think you would know better since
31:49
you work part-time or whatever at OpenAI, but
31:52
my sense is that a lot of the examples
31:54
that get posted on Twitter, particularly
31:57
by the likes of me and other critics,
31:59
or other skeptics, I should say,
31:59
are not really good, because
32:02
the system gets trained on those. So,
32:05
you know, almost everything that people write about
32:07
it, I think, is in the training set. It's
32:09
hard to do the science when the system's constantly
32:11
being trained, especially in the RLHF
32:14
side of things. And we don't actually know what's in GPT-4,
32:17
so we don't even know if they're regular expressions
32:19
and, you know, simple rules to match things. So we
32:22
can't do the kind of science we used to be able to
32:24
do.
32:25
This conversation, this
32:27
subtree of the conversation, I think, has no
32:30
natural endpoint. So if I can sort
32:32
of zoom out a bit, I think there's a, you
32:34
know, pretty solid sense in which
32:37
humans are more generally intelligent than chimpanzees.
32:40
As you get closer and closer to
32:42
the human level, I
32:44
would say that the direction
32:47
here is still clear, that the comparison is still clear.
32:50
We are still smarter than GPT-4. This
32:52
is not going to take control of the world from us. But,
32:55
you know, the conversations get longer. It
32:58
gets, the definitions start to break
33:00
down around the edges. But I
33:02
think it also, as you keep going, like it comes
33:04
back together again, there's a point,
33:07
and possibly this point is like very close to
33:09
the point of time to where everybody dies. So maybe we
33:11
don't ever like see it in a podcast,
33:13
but there's a point where it's, you know, unambiguously
33:16
smarter than you. And including
33:19
like the spark of creativity,
33:22
being able to deduce things quickly
33:25
rather than with tons and tons of extra evidence,
33:27
strategy, cunning, modeling
33:29
people, figuring out how to manipulate
33:32
people.
33:33
So let's stipulate, Eliezer,
33:35
that we're going to get to machines that can do
33:37
all of that. And then the question is, what
33:39
are they going to do? Is it a certainty
33:42
that they will make our annihilation
33:44
part of their business? Is it a possibility?
33:47
Is it an unlikely possibility? I
33:49
think your view is that it's a certainty. I've
33:51
never really understood that part. It's a
33:53
certainty on the present tech is
33:55
the way I would put it. Like if that happened, so
33:58
in particular, like if that
34:02
happened tomorrow, then, you know, modulo
34:05
Cromwell's rule, never say certain. Like
34:08
my probability is like, yes, modulo the
34:08
chance that my model is somehow just completely
34:10
mistaken. If we got 50 years
34:13
to work it out and unlimited
34:15
retries, I'd be a lot more confident. I
34:18
think that'd be pretty OK. I think we'd make it. The
34:21
problem is that it's a lot harder to
34:24
do science when your first wrong try destroys
34:26
the human species, and then you don't get to try again.
34:28
I mean, I think there's something,
34:29
again, that I agree with and something I'm a little
34:32
bit skeptical about. So
34:33
I agree that the amount of time we have
34:36
matters. And I would also agree
34:38
that there's no existing technology
34:41
that
34:42
solves the alignment problem that gives a moral
34:44
basis to these machines. I mean, GPT-4
34:46
is fundamentally amoral. I don't think
34:48
it's immoral. It's not out to get us. But
34:51
it really is amoral. It
34:53
can answer trolley problems because there are trolley problems
34:56
in the data set. But that doesn't mean that it really has a moral
34:59
understanding of the world. And so if
35:01
we get to a very smart machine
35:03
by all the criteria that we've talked
35:06
about, and it's amoral, then that's a problem
35:08
for us. And there's a question of whether
35:10
if we can get to smart machines,
35:14
whether we can build them in a way that will have some
35:16
moral basis. And I think we need
35:18
to make progress. Well, I
35:20
mean, the first try part I'm not willing to let
35:22
pass. So I understand,
35:25
I think, your argument there. Maybe you should spell it out.
35:28
I think that we probably get more than one
35:30
shot and that it's not as
35:32
dramatic and instantaneous
35:33
as you think. I do
35:36
think one wants to think about sandboxing. One
35:38
wants to think about distribution. But I mean,
35:40
let's say we had one evil super
35:42
genius now who is smarter than everybody
35:44
else. Like, so what? One
35:47
super smart. Say again? Not
35:49
just a little smarter. Even a lot smarter.
35:52
Most super geniuses
35:56
aren't actually that effective. They're not that focused.
35:58
They were focused on other things.
35:59
You know, you're kind of assuming that
36:02
the first super genius AI
36:04
is gonna make it its business to annihilate
36:06
us And that's the part where I I still
36:08
am a bit stuck in the argument.
36:10
Yeah. Some
36:12
of this has to do with the notion that
36:15
if you do a bunch of training you
36:17
start to get goal direction
36:19
even if you don't explicitly train on that. That
36:22
goal direction is a natural way to
36:24
achieve higher capabilities. The
36:27
reason why humans want things is that wanting
36:29
things is an effective way of getting things and
36:32
so natural selection, in
36:34
the process of selecting
36:36
exclusively on reproductive fitness just
36:38
on that one thing got us to
36:41
want a bunch of things that correlated with reproductive
36:44
fitness in the ancestral distribution, because
36:47
having intelligences that
36:49
want things is a good way of getting
36:51
things. That's, in a sense, like,
36:54
you know, wanting comes from the same place as
36:56
intelligence itself. And you could even,
36:58
you know, from a certain technical standpoint on expected
37:01
utility, say that intelligence
37:03
is a very effective way of wanting: planning,
37:06
plotting paths through time that lead to particular outcomes.
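As a rough gloss on the expected-utility framing being gestured at here (a standard textbook formulation, not a quote from the conversation): an agent with a utility function over outcomes and beliefs about what each action leads to picks the action with the highest expected utility.

```latex
% Standard expected-utility choice rule: U is a utility function over
% outcomes o, and P(o | a) the agent's belief about what action a leads to.
\[
  a^{*} \;=\; \operatorname*{arg\,max}_{a \in A}\; \mathbb{E}\big[\,U(o) \mid a\,\big]
        \;=\; \operatorname*{arg\,max}_{a \in A}\; \sum_{o} P(o \mid a)\, U(o)
\]
```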
37:09
So part of it is that I
37:12
do not think you get, like, the superintelligence
37:14
that wants nothing, because I don't think
37:16
that wanting and intelligence can be
37:18
pried apart internally that easily. I think
37:21
that the way you get superintelligence is that
37:24
there are things that have gotten good at organizing
37:27
their own thoughts and have good taste
37:29
in which thoughts to think
37:30
and that is where the high capabilities
37:33
come from. Can I put a point to you, Eliezer, on this?
37:35
And then that does mean that they have internal...
37:37
Let me just put this point to you. I think it,
37:39
look, let me just put the following point to you, which I think,
37:42
in my mind, is similar to what Gary was saying. As
37:49
We dive deeper into the heart of summer with
37:51
the Sun beaming down and the days inviting
37:53
us all to be more active We all need
37:55
wholesome convenient meals to keep us
37:57
going. I know that's certainly true for me And
38:00
that's where Factor comes in, America's number
38:02
one ready to eat meal kit. I have to tell
38:04
you, as someone who gets six protein-rich
38:06
meals from Factor every week, it's been
38:08
a game changer for me. The flavorful and
38:10
nutritious ready to eat meals are delivered right
38:12
to my door, saving me time and keeping me on
38:15
track with my wellness goals. We all know the struggle.
38:17
Summer plans, keeping us too busy to cook, yet
38:20
we still want to eat well. With Factor, the
38:22
grocery trips, the chopping, prepping, and
38:24
cleaning up are all things of the past, but
38:27
without compromising on flavor or nutritional
38:29
quality. These fresh, never frozen
38:31
meals are ready in just two minutes. All
38:33
I have to do is heat, enjoy, and get back to
38:35
soaking up the sun. Each meal from Factor
38:38
is like a treat. With high quality ingredients
38:40
like broccolini, leeks, asparagus,
38:42
and over 34 weekly restaurant quality
38:44
options like shrimp risotto, green goddess
38:46
chicken, grilled steakhouse filet mignon,
38:49
I always have a flavorful variety to choose
38:51
from. Factor's lunch to go is a lifesaver
38:54
when I'm busy. Wholesome grain bowls
38:56
and salad toppers keep my energy levels up.
38:58
No microwave even required
38:59
for those. So if you're like me and you want to enjoy
39:02
the summer without the hassle of meal prep, get
39:04
Factor. Just choose your meals and savor
39:06
the fresh flavor-packed meals delivered right
39:08
to your door. Here's the special
39:10
offer. Head to factormeals.com slash Coleman50
39:13
and use the code Coleman50 to get 50% off. That's
39:16
Coleman50 at factormeals.com slash
39:19
Coleman50 to get 50% off.
39:25
There's often in philosophy, this
39:27
notion of the continuum fallacy,
39:29
which in the canonical
39:32
example is like, you can't locate
39:35
a single hair that you would pluck from my head where
39:37
I would suddenly go from not bald
39:40
to bald. Or like the example, the even
39:43
more intuitive example is like a color wheel. Like
39:45
there's no single, on a gray
39:47
scale, there's no single pixel you
39:49
can point to and say, well, that's where gray begins
39:52
and white ends. And yet we have
39:54
this conceptual distinction that feels hard
39:56
and fast between
39:57
gray and white and gray and black and
39:59
so forth.
39:59
When we're talking about
40:02
artificial general intelligence or super
40:04
intelligence, you seem to
40:06
operate on a model where either
40:09
it's a super intelligence capable of destroying
40:11
all of us or it's not. Whereas
40:15
intelligence may just be a
40:17
continuum fallacy style spectrum
40:19
where we're first going to see the shades
40:22
of something that's just a bit more
40:24
intelligent than us and maybe it can kill
40:26
five people at most and then it can...
40:30
And when that happens, we're
40:32
going to want to intervene and
40:34
we're going to figure out how to intervene at that level
40:37
and so on and so forth. Well, if it's stupid enough to do it,
40:39
then yeah. Yeah, so... If it's stupid
40:41
enough to do it, then yes. By
42:44
the identical logic, there
40:46
should be nobody who steals money on
40:48
a really large scale, right? Because
40:50
you could just give them $5 and see if they steal that. And
40:54
if they don't steal that, you know you're good to trust them
40:56
with a billion. I mean, I think that
40:58
in actuality, anyone who did steal
41:00
a billion dollars probably displayed
41:03
some dishonest behavior
41:04
earlier in their life that was
41:07
unfortunately not acted upon
41:09
early enough. I'm actually
41:11
not even... The analogy... Yeah, but...
41:14
Hold on, hold on. The analogy I have pictures
41:16
like we have the first case of fraud that's $10,000
41:18
and then we build systems
41:20
to prevent it, but then they fail with a somewhat
41:23
smarter opponent, but our systems get better and
41:25
better and better. And so we actually prevent
41:27
the billion dollar fraud because of the systems
41:29
put in place in response to
41:31
the $10,000 frauds. I mean,
41:33
I think Coleman's putting his finger on an important
41:36
point here, which is how much do we get to iterate?
41:39
And Eliezer is saying, the minute
41:41
we have a super intelligent system, we won't
41:43
be able to iterate because it's all over immediately.
41:45
Well, there isn't a minute like that. The
41:47
way that the continuum goes to the threshold
41:50
is that you eventually get something that's smart enough
41:53
that it knows not to play its hand
41:55
early. And then if that thing, you
41:57
know, if you are still cranking up the power on
41:59
that and preserving its utility function,
42:02
it knows it just has to wait to be smarter,
42:04
to be able to win. It doesn't play its hand
42:07
prematurely. It doesn't tip you off. It's not in
42:09
its interest to do that. It's in its interest to cooperate
42:11
until it thinks it can win against humanity and
42:13
only then make its move. If it doesn't
42:16
expect the future smarter AIs to be smarter
42:18
than itself, then we might perhaps see these early AIs
42:20
telling humanity, don't build the later
42:22
AIs. And I would be sort
42:25
of surprised and amused if we ended
42:27
up in that particular sort of like science fiction
42:29
scenario
42:29
as I see it. But we're already in like
42:32
something that, you know, me from 10 years ago would have
42:34
called the science fiction scenario, which is the
42:36
things that will talk to you without being very smart. I
42:39
always come up, Eliezer, against
42:41
this idea that you're assuming
42:44
that the very bright machines, the super
42:46
intelligent machines will be malicious
42:49
and duplicitous and so forth. And I just
42:51
don't see that as a logical entailment
42:54
of being very smart. I mean,
42:57
they don't specifically want
42:59
as an end in itself for you to be destroyed.
43:02
They're just doing whatever obtains the most
43:04
of the stuff that they actually want, which doesn't specifically
43:07
have a term that's maximized by
43:09
humanity surviving and doing well.
43:12
Why can't you just hard code? Don't
43:15
do anything that will annihilate the human species. Don't
43:18
do anything.
43:18
We don't know how. There is no technology
43:21
to hard code. So there I agree
43:23
with you, but I think
43:25
it's important if I can just run for one second. I
43:28
agree that right now we don't have the technology
43:31
to hard code. Don't
43:33
do harm to humans. But for me, it
43:36
all boils down to a question of are we going to get to
43:38
the smart machines before we make progress
43:40
on that hard coding problem or not? And that
43:42
to me, that means that problem of hard
43:44
coding ethical values is actually
43:47
one of the most important projects that we
43:48
should be working on.
43:50
Yeah. Yeah. And
43:52
I tried to work on it 20 years in advance and capabilities
43:55
are just like running vastly out of alignment.
43:57
When I started working on this, like
44:00
two decades ago, we were in a sense
44:02
ahead of where we are now.
44:03
AlphaGo is much more controllable
44:06
than GPT-4. So there I agree
44:08
with you. We've fallen in love with a
44:10
technology that is fairly poorly
44:13
controlled. AlphaGo is very easily
44:15
controlled and very well
44:17
specified. We know what it does. We can more or less
44:19
interpret why it's doing it. And everybody's
44:22
in love with these large language models and they're
44:25
much less controlled. And you're
44:27
right, we haven't made a lot of progress on alignment.
44:30
So if we just go on a straight line, everybody
44:32
dies. I think that's, this
44:33
is an important fact. I would almost even
44:36
accept that for argument, but
44:38
ask then, just for the sake of argument, but
44:40
then ask, do we have to be on a straight line?
44:43
I mean, I would agree to the weaker claim
44:46
that, you know, we should certainly
44:48
be extremely worried about the intentions
44:51
of a super intelligence in the same way
44:53
that say chimpanzees should be
44:55
worried about the intentions of the
44:57
first humans that arise.
45:00
And in fact, chimpanzees
45:03
continue to exist in
45:05
our world only at humans' pleasure. But I
45:07
think that there were a lot of other considerations
45:09
here. For example, if we imagined
45:12
that GPT-10
45:14
is the first unaligned super
45:17
intelligence that has these
45:20
sorts of goals, well, then, you know, it would be appearing
45:22
in a world where presumably GPT-9,
45:25
you know, already has very wide diffusion
45:28
and where people can use that
45:30
to try to, you know, you know, and GPT-9
45:33
was not destroying the world, you know, by
45:35
assumption. Why does GPT-9
45:37
work with humans instead of with GPT-10?
45:40
Well, I don't know. I mean, maybe, maybe,
45:42
maybe it does work with GPT-10, but,
45:45
you know, I just, I just don't view that
45:47
as a certainty. You know,
45:49
I mean, I think, you know, your certainty
45:51
about this
45:54
is the one place where I really get off
45:56
the train. Same with me.
45:58
I, well, I mean, I'm not asking you
46:00
to share my certainty, I am
46:02
asking the viewers to believe
46:05
that you might end up with like more
46:08
extreme probabilities after you stare
46:10
at things for an additional couple of decades. That
46:12
doesn't mean you have to accept my probabilities immediately,
46:15
but I'm at least asking you to like not treat that as some kind
46:17
of weird anomaly.
46:19
You're just going to find those kinds of situations
46:21
in these debates. My view is that
46:24
I don't find the extreme probabilities
46:26
that you described to be plausible, but
46:28
I find the question that you're raising to be
46:30
important. I think maybe
46:33
straight line is too extreme, but this idea
46:35
that if you just follow current
46:37
trends, we're getting more, I'm sorry, we're
46:39
getting less and less controllable machines
46:42
and not getting more alignment.
46:45
Machines that are more unpredictable, harder to
46:47
interpret, and no better at sticking
46:49
to even a basic principle like be
46:52
honest and don't make stuff up. In fact,
46:54
that's a problem that other technologies don't really have.
46:56
Modeling systems, GPS systems don't make stuff
46:59
up. Google search doesn't make stuff
47:01
up. It will point to things that other people have made
47:03
stuff up, but it doesn't itself do it. In
47:05
that sense, the trend line is not great. I
47:07
agree with that. I agree that
47:10
we should be really worried about that and we should put
47:12
effort into it. Even if I don't agree with
47:14
the probabilities that you attach to it. I mean,
47:16
Eliezer... Let me interject with the question here. Go
47:19
ahead, Scott. Go ahead, Scott. No,
47:22
I mean, I think that Eliezer deserves
47:24
sort of eternal credit for raising
47:26
the issue. He was talking about
47:28
these issues 20 years ago and it was very, very far from
47:31
obvious to most of us that they would be
47:33
live issues. I can say for
47:35
my part, I was familiar with
47:38
Eliezer's views since 2006 or so. When
47:43
I first encountered them, I
47:46
knew that there was no principle that this
47:48
scenario was impossible, but I just
47:51
felt like, well, supposing I agreed with
47:53
that,
47:54
what do you want me to do about it? Where
47:56
is the research program that has any hope
47:58
of making progress?
47:59
I mean, there's one question
48:02
of what are the most important problems in the world,
48:04
but in science, that's necessary
48:06
but not sufficient. We need something that we can make
48:08
progress on. And
48:11
that is the thing that I think
48:13
has changed just
48:16
recently with the advent of
48:18
actual very powerful AIs. And
48:21
so the sort of irony here is
48:23
that as Eliezer has
48:25
gotten much more pessimistic,
48:28
unfortunately, in the last few years
48:29
about alignment,
48:32
I've sort of gotten more optimistic.
48:35
I feel like, well, there is a research
48:37
program that we can actually
48:39
make progress on now. Yeah,
48:42
your research program is going to take 100 years and we don't
48:44
have 100 years. I don't know how long it will take. We
48:46
don't know that. Exactly. We
48:48
don't know. I think the argument that we should
48:50
put a lot more effort into it is clear.
48:53
I think the argument that will take 100 years is totally
48:55
unclear. I mean, I'm not even sure you can do
48:57
it in 100 years because there's the basic problem of
48:59
getting it right on the first try. And
49:01
the way these things are supposed to work in science is
49:04
that you have your bright-eyed optimistic youngsters
49:06
with their vastly oversimplified, hopelessly
49:08
idealistic, optimistic plan. They charge
49:11
ahead. They fail. They learn
49:13
a little cynicism. They learn a little pessimism. They
49:15
learn it's not as easy as that. They try
49:17
again. They fail again. They start to
49:19
build up something like battle-hardening.
49:23
And they find out how
49:25
little is possible to them. Eliezer,
49:27
this is a place where I just really don't agree
49:29
with you. So I think there's all kinds of things we can
49:31
do that are sort of of the flavor of
49:34
model organisms or simulations and
49:36
so forth. And we just mean it's
49:38
hard because we don't actually have a super intelligence
49:40
so we can't fully calibrate. But it's
49:43
a leap to say that there's nothing iterative that
49:45
we can do here or that we have to get it right
49:47
on the first time. I mean, I certainly see
49:49
a scenario where that's true. Where getting it
49:52
right on the first time
49:53
does make the difference. But
49:55
I can see lots of scenarios where it doesn't and where we do
49:57
have time to iterate before it happens, after
49:59
it happens.
49:59
It's really not a single moment,
50:02
but I'm, you know, idealizing. I mean, the
50:04
problem is getting anything that generalizes
50:07
up to super intelligent level where
50:09
past some threshold level, the
50:11
minds may find it in their own interest to start lying
50:13
to you.
50:14
Even if that happens before superintelligence. Even
50:16
that, like I don't see a logical
50:19
argument that you can't emulate that or
50:21
study it. I mean, for example, you could, I'm just
50:24
making this up as I go along, but for example, you could study
50:26
what can we do with sociopaths who
50:28
are often very bright and, you know,
50:31
not tethered to our value. But yeah,
50:34
what can a,
50:35
what, what strategy can a
50:37
like 70 IQ honest person
50:40
come up with and invent themselves by
50:42
which they will outwit and defeat a 130 IQ sociopath.
50:45
All right. Well, there you're not being fair either in the sense
50:48
that, you know, we actually have lots of 150 IQ
50:51
people who could be working on this problem collectively
50:54
and there's, there's value in collective
50:56
action. There's literature. What I see,
50:59
what I see that gives me pause is that, is
51:01
that the people don't seem to appreciate what
51:03
about the problem is hard.
51:05
Even at the level where like 20 years
51:08
ago I could have told you it was hard until,
51:10
you know, somebody like
51:12
me comes along and nags them about it. And then they
51:15
talk about the ways in which they could adapt and be clever.
51:17
But the people charging straight forward are
51:20
just sort of like doing it in this supernally
51:22
naive way. Let me share a historical
51:25
example that I think about a lot, which
51:27
is in the early 1900s, almost
51:29
every scientist on the planet who thought
51:32
about biology made a mistake. They all
51:34
thought that genes were proteins.
51:36
And then eventually Oswald Avery did
51:38
the right experiments. They realized that
51:41
genes were not proteins. There was this weird acid
51:43
and it didn't take long after people got
51:46
out of this stuck mindset before
51:48
they figured out how that weird acid worked and how
51:51
to manipulate it and how to read the code that it
51:53
was in and so forth. So I
51:55
absolutely sympathize with the fact that
51:57
I feel like the field is stuck right now. I
52:00
think the approaches people are taking to alignment
52:02
are unlikely to work. I'm completely with
52:04
you there. But I'm also, I guess,
52:06
more long-term optimistic that
52:08
science is self-correcting and that we have a chance
52:11
here. Not a certainty, but I think
52:13
if we change research priorities
52:15
from how do we make some money off this large
52:18
language model that's unreliable to how
52:20
do I save the species, we might actually make progress.
52:23
There's a special kind of caution that you need when
52:25
something needs to be gotten correct on the first
52:28
try. I'd be very optimistic if people
52:29
got a bunch of free retries, and I didn't think that the first really serious mistake was going to kill everybody and leave us unable to try again. If we got free retries, it'd
52:38
be an ordinary, you know, it'd be in some sense an ordinary
52:40
science problem. Look, I can imagine
52:43
a world where we only got one try
52:45
and if we failed, then it destroys
52:47
all life on earth. And so let me
52:50
agree to the conditional statement that if
52:52
we are in that world, then I think that we're screwed.
52:55
I will agree with the same conditional statement. All
52:57
right. Yeah.
52:59
And this gets back to like, you know, if you
53:02
picture by analogy the process of,
53:04
you know, a human baby, which
53:06
is extremely stupid, becoming a human
53:08
adult
53:09
and then just extending that so
53:11
that in a single lifetime, this person
53:14
goes from a baby to
53:17
the smartest being that's ever lived.
53:20
But in the normal way
53:22
that humans develop, which is, you know, it doesn't
53:24
happen on any one given
53:26
day and each sub skill
53:28
develops a little bit at its own
53:30
rate and so forth, it would not
53:33
be at all obvious to me that we have to get it right vis-a-vis that individual the first time. I agree. No,
53:41
well, pardon me. I do think we have to get them right the first
53:43
time, but I think there's a decent chance of getting it right.
53:45
It is very important to get it right the first time. It's
53:48
like you have this one person getting
53:50
smarter and smarter and not everyone else is getting
53:52
smarter and smarter.
53:53
Eliezer, I mean, one thing that you've talked about
53:55
recently is, you know, if we're all going to
53:58
die, then at least let us die with dignity.
54:01
So, you know, I mean,
54:03
some people might care about that more than others,
54:05
but I would say that, you know, one
54:08
thing that death with dignity would mean
54:10
is, well, at least, you know, if we do get multiple retries, and we get AIs that, let's say, try to take over the world but are really inept at it, and that fail and so forth, at least
54:24
let us succeed in that world, you know, and
54:26
that's at least something that we can imagine
54:28
working on and making progress
54:29
on.
54:30
I mean, it is not presently ruled out that you have some
54:35
like, you know, relatively smart
54:38
in some ways, dumb in some other ways, or
54:40
at least like not smarter than human in other ways,
54:42
AI that makes an early shot at
54:44
taking over the world, maybe because it expects future
54:46
AIs to not share its goals and not cooperate
54:49
with it, and it fails. And,
54:51
you know, I mean, the appropriate lesson to learn
54:53
there is to, you know, like shut the whole thing
54:56
down. So yeah, like I would say, sure, wouldn't it be good to live in that
55:02
world? And the way you live in that world is that when
55:04
you get that warning sign, you shut it all down.
55:07
Here's a kind of thought experiment. GPT-4
55:10
is probably not capable of annihilating
55:12
us all. I think we all agree about that.
55:14
But GPT-4 is certainly capable
55:17
of expressing the desire to annihilate
55:19
us all or being, you know, people have rigged
55:22
different versions that are, you know, more
55:24
aggressive and so forth. We
55:26
could say, look, until we can shut
55:28
down those versions,
55:30
you know, GPT-4s that are programmed
55:33
to be malicious by human intent, maybe
55:35
we shouldn't build GPT-5, or at least not GPT-6
55:38
or some other system, etc. We could say, you
55:40
know, what we have right now actually is part
55:42
of that iteration. We have, you know, primitive intelligence
55:45
right now. It's nowhere near as smart as a
55:47
super intelligence is going to be. But even
55:49
this one, we're not that good at constraining.
55:52
Maybe we shouldn't pass go until we
55:54
get this one right. I mean, the
55:56
problem with that from my perspective is that
55:58
I do think
55:59
you can pass this test
56:01
and still wipe out humanity. Like
56:04
I think that there comes a point where your AI
56:06
is smart enough
56:07
that it knows which answer you're looking for.
56:10
And the point at which it tells you what you want to hear
56:13
is not the point at which that becomes its internal motivation.
56:15
It's not sufficient, but it might be a logical
56:17
pause point, right? It might be that
56:19
if we can't even pass the test now
56:22
of, you know, controlling a version of GPT-4 deliberately fine-tuned to be malicious,
56:29
then we don't know what we're talking about and
56:31
we're playing around with fire. So passing that
56:33
test wouldn't be a guarantee that we would be in
56:36
good stead with an
56:37
even smarter machine, but we really should
56:39
be worried, I think, that we're not
56:41
in a very good position with respect even to the
56:44
current ones. Gary, I of course
56:46
watched the recent congressional hearing
56:48
where you and Sam Altman were
56:51
testifying, you know, about what
56:53
should be done. Should there be auditing
56:56
of these systems, you know, before training,
56:58
before deployment? And, you know, maybe,
57:00
you know, the most striking thing about
57:03
that session was, you know, just
57:05
how little daylight there seemed to be between
57:07
you and Sam Altman, the
57:09
CEO of OpenAI. You know, he
57:12
was completely on board with
57:14
the idea of, you know, establishing a regulatory
57:17
framework for,
57:21
you know, having to clear the, you
57:24
know, more powerful systems before
57:26
they are deployed. Now, you know, in Eliezer's
57:29
worldview, that still would be woefully
57:31
insufficient, surely,
57:34
and, you know, we would still all be dead. But,
57:36
you know, maybe in your worldview
57:39
that, you know, it sounds like I'm
57:41
not even sure how much daylight there is. I mean, you now have a very, I think, historically
57:48
striking situation where, you
57:50
know, the heads of all of the major
57:52
AI, or almost all
57:55
of the major AI organizations are,
57:57
you know, agreeing, essentially raising their hands and saying:
57:59
Yes, this is dangerous. Yes, we need to be regulated.
58:02
I mean, I thought it was really striking. In
58:06
fact, I talked to Sam just before the
58:09
hearing started, and I had
58:11
just proposed an international agency
58:13
for AI. I wasn't the first person ever, but I pushed
58:15
it in my TED Talk and an Economist op-ed
58:18
a few weeks before. And Sam said
58:20
to me, I like that idea. And
58:23
I said, tell them, tell the Senate. And
58:25
he did. And that kind of astonished me
58:27
that he did. I mean, we've had some
58:29
friction between the two of us in the past. And he
58:31
actually even attributed it to me. He said, I support what
58:33
Professor Marcus said about
58:36
doing international governance.
58:38
And there's been a lot of convergence around
58:40
the world on that. Is that enough to stop
58:43
Eliezer's worries? No,
58:45
I don't think so. But it's an important baby
58:47
step. I think that we do need
58:49
to have some global body that
58:51
can coordinate around these things. I don't think
58:54
we really have to coordinate around super
58:56
intelligence yet. But if we can't do any coordination
58:58
now, then when the time comes, we're not prepared.
59:02
So I think it's great that there's some agreement. I
59:04
worry that OpenAI had this lobbying
59:07
document that just came out that seemed not
59:09
entirely consistent with what Sam
59:11
said in the room. And there's always concerns
59:13
about regulatory capture and so forth. But I think
59:15
it's great that a lot of the
59:17
heads of these companies, maybe with the exception of
59:20
Facebook or Meta,
59:21
are recognizing that there are
59:24
genuine concerns here. I mean, the other moment
59:26
that a lot of people remember from the testimony
59:29
was when Sam was asked what he was most concerned
59:31
about. Was it jobs? And he said, no.
59:33
And I asked Senator Blumenthal to push Sam.
59:36
And Sam was, he could have been more
59:38
candid, but he was fairly candid. And he said he
59:40
was worried about serious harm to the species. I
59:43
think that was an important moment when he said that
59:45
to the Senate. And I think it galvanized a lot
59:48
of people that he said it.
59:49
So can we dwell on this for a moment? I mean, we've
59:51
been talking about the,
59:53
depending on your view, highly likely
59:56
or
59:57
tail risk scenario
59:59
of humanity's extinction
1:00:02
or significant destruction,
1:00:04
it would appear to me by the same token,
1:00:06
if
1:00:07
those are plausible
1:00:10
scenarios we're talking about, then the
1:00:12
opposite is maybe worth talking about as
1:00:14
well. What does it
1:00:16
look like to have a super intelligent
1:00:20
AI that
1:00:22
really, as a feature
1:00:24
of its intelligence,
1:00:26
deeply understands human beings,
1:00:29
the human species, and
1:00:31
also has a deep desire
1:00:34
for us to be as happy as
1:00:36
possible. Oh, as happy as possible? What does
1:00:38
that world look like? And do you think that's- Yes,
1:00:40
that looks like- No, no, maybe not as happy as
1:00:42
physically possible. ... to make them as happy as possible.
1:00:45
But more like a parent wants their child
1:00:47
to be happy, right? That may not involve
1:00:50
any particular scenario, but is
1:00:52
generally quite concerned about the well-being
1:00:55
of the human race and is also super intelligent.
1:00:58
Honestly, I'd rather have machines work
1:01:00
on medical problems than happiness
1:01:03
problems. I think there's maybe more
1:01:05
risk of misspecification
1:01:07
of the happiness problems.
1:01:09
Whereas if we get them to work on Alzheimer's
1:01:12
and just say, like, figure out what's going on,
1:01:14
why are these plaques there? What can you do about it? Maybe
1:01:17
there's less harm that might come from- You
1:01:19
don't need super intelligence for that. That sounds like
1:01:21
an AlphaFold 3 problem or an AlphaFold 4 problem. Well, AlphaFold
1:01:25
doesn't really do that. This is all somewhat
1:01:28
different than the question I'm asking. It's
1:01:30
not really even
1:01:32
us asking a super intelligence
1:01:34
to do anything because we've already been entertaining
1:01:36
scenarios where the super intelligence has its
1:01:38
own desires independent of us. Yeah, I'm not
1:01:40
real thrilled with that. Do you think at all about a scenario
1:01:43
where- I don't think we want
1:01:45
to leave what their
1:01:48
objective functions are, what their desires
1:01:50
are, to them to work out with no
1:01:52
consultation from us, with no human in the loop.
1:01:55
I mean, especially given our current understanding
1:01:58
of the technology.
1:01:59
Our current understanding of how to keep a
1:02:02
system on track, doing what we want to do
1:02:04
is pretty limited. And so, you
1:02:06
know, taking humans out of the loop there, it sounds
1:02:08
like a really bad idea to me, at least in
1:02:10
the foreseeable future. I would want to see much
1:02:13
better alignment technology. No, I agree. Before
1:02:16
we give it free rein. So if we
1:02:18
had the textbook from the future, like we
1:02:20
have the textbook from 100 years in the future,
1:02:22
which contains all the simple ideas that actually
1:02:24
work in real life, as opposed to, you know, the
1:02:27
complicated ideas and the simple ideas that don't
1:02:29
work in real life, the
1:02:29
equivalent of ReLUs instead of sigmoids for the activation functions. With the textbook from 100 years in the future, you can
1:02:36
probably build a super intelligence
1:02:39
that'll want anything that's coherent to want, anything you can figure out how to describe coherently, or point it at your own mind and tell it to figure out what it is you meant for it to want. And
1:02:53
you know, you could get the glorious transhumanist
1:02:55
future. You could get the happily ever after;
1:02:58
you know, anything's possible that doesn't violate
1:03:00
the laws of physics. The
1:03:03
trouble is doing it in real life, and, you know, on the first try. But yeah, so
1:03:07
like, you know, the whole thing that we're aiming for here is
1:03:12
to colonize all the galaxies we can
1:03:14
reach before somebody else gets them first
1:03:16
and turn them into galaxies full
1:03:18
of, you know, complex, sapient life living
1:03:20
happily ever after. You know, that's that's
1:03:22
the goal. That's still the goal. Even
1:03:24
if we, you know, even when I call for, like, a permanent moratorium on AI, I'm not trying to prevent us from colonizing the galaxies, you know, like, humanity
1:03:36
forbid. More like: let's do some human intelligence augmentation with AlphaFold 4 before we try
1:03:44
building GPT-8. One
1:03:46
of the few scenarios that I think we can
1:03:48
clearly rule out here is an AI
1:03:51
that is existentially dangerous, but
1:03:53
also boring. Right. I mean, I think
1:03:55
anything that has the capacity to kill
1:03:58
us all right would have, you know,
1:03:59
if nothing
1:04:02
else, pretty amazing capabilities. And
1:04:04
those capabilities could also
1:04:06
be turned to solving a lot of humanity's problems, if
1:04:12
we were to solve the alignment problem. I
1:04:14
mean, humanity had a lot of
1:04:16
existential risks,
1:04:19
before AI came on the scene, right?
1:04:22
I mean, there was the risk of nuclear
1:04:25
annihilation, there was the risk of runaway
1:04:27
climate change. And I would love to see an AI
1:04:32
that could help us with such things. I
1:04:34
would also love to see an
1:04:36
AI that could sort of help
1:04:38
us just solve some of the mysteries
1:04:41
of the universe. I mean, how
1:04:43
can one possibly not be
1:04:45
curious to know what
1:04:47
such a being could teach us? I
1:04:50
mean, for the past year, I've tried to use
1:04:52
GPT-4 to produce original
1:04:55
scientific insights, and I've
1:04:57
not been able to get it to do that. And
1:05:00
I don't know whether I should feel disappointed
1:05:02
or relieved by that, but I think
1:05:05
the better part of me is the part that should just want to see the
1:05:10
great mysteries of
1:05:12
existence, of why is
1:05:14
the universe quantum mechanical? Or how
1:05:17
do you prove the Riemann hypothesis? It
1:05:19
should just want to see these mysteries solved.
1:05:22
And if it's to be
1:05:25
by AI, then fine. Let
1:05:28
me give you a lesson
1:05:29
in epistemic humility.
1:05:32
We don't really know whether
1:05:34
GPT-4 is net positive
1:05:37
or net negative.
1:05:38
There are lots of arguments you can make. I've been in
1:05:40
a bunch of debates where I've had to take
1:05:43
the side of arguing that it's a net
1:05:45
negative, but we don't really know.
1:05:47
If we don't know that for GPT-4... I'd say it's been net positive so far. What was
1:05:51
the invention of agriculture, net positive?
1:05:54
I'd say it was net positive. You could go back
1:05:56
way further. The point is, if I can just finish
1:05:58
the quick thought
1:05:59
or whatever, I don't think anybody
1:06:02
can reasonably answer that. We
1:06:05
don't yet know all of the ways in which
1:06:07
GPT-4 will be used for good.
1:06:09
We don't know all of the ways in which bad actors will
1:06:11
use it. We don't know all the consequences. That's
1:06:14
going to be true for each iteration. It's probably
1:06:16
going to get harder to compute for
1:06:18
each iteration, and we can't even do it now.
1:06:21
I think that we should
1:06:23
realize that, to realize our own limits
1:06:26
in being able to assess the negatives
1:06:29
and positives, maybe we can think about better
1:06:31
ways to do that than we currently have. I
1:06:34
think you've got to have a guess. My
1:06:37
guess is that so far, not looking into
1:06:39
the future at all, GPT-4 has been
1:06:41
net positive. Maybe. We
1:06:43
haven't talked about the various
1:06:46
risks yet, and it's still early, but that's
1:06:48
just a guess is the point. We
1:06:51
don't have a way of putting it on a spreadsheet
1:06:53
right now or whatever. We don't
1:06:56
really have a good way to quantify it.
1:06:58
It's not out of control yet. By
1:07:00
and large, people are going to be using GPT-4
1:07:03
to do things that they want, and
1:07:05
the relative cases where they manage to injure themselves
1:07:07
are rare enough to be news on Twitter. For
1:07:10
example, we haven't
1:07:12
talked about it, but one thing some bad actors will want to do is influence the
1:07:17
US elections and try to undermine democracy
1:07:20
in the US. If they succeed in that, I think
1:07:22
there's pretty serious long-term consequences
1:07:24
there.
1:07:24
I think it's OpenAI's responsibility
1:07:27
to step up and run the 2024 election itself.
1:07:32
I can pass that along. Is that a joke?
1:07:35
No, as far as I can
1:07:37
see, the clearest concrete harm
1:07:40
to have come from GPT so
1:07:42
far is that tens
1:07:44
of millions of students have now used it to
1:07:47
cheat on their assignments. I have
1:07:49
been thinking about that, and I have been trying to come
1:07:51
up with solutions to that. At the
1:07:53
same time, the positive utility has included... I mean, I'm a theoretical computer scientist, which means one who
1:08:02
hasn't written any serious code
1:08:04
for about 20 years. And
1:08:07
I realized just a month or two ago, I can
1:08:09
get back into coding. And the way I
1:08:11
can do it is I just asked GPT to
1:08:13
write the code for me. And I wasn't
1:08:16
expecting it to work that well. And unbelievably,
1:08:19
it often just does exactly
1:08:21
what I want on the first try. So I
1:08:23
mean, you know, I am
1:08:26
getting utility from it, rather
1:08:28
than just, you know, seeing
1:08:29
it as an interesting
1:08:33
research object. And, you
1:08:36
know, and, you know, I can imagine
1:08:38
that hundreds of millions of people are going
1:08:40
to be deriving utility from
1:08:42
it in those ways. I mean, like, most of the tools
1:08:45
to help them derive that utility are not
1:08:47
even out yet. But they're, they're coming
1:08:49
in the next couple of years. I mean, part of the reason
1:08:51
why I'm worried about the focus on short term
1:08:54
problems is that I suspect that the short term problems
1:08:56
might very well be solvable and we will be left with
1:08:58
the long term problems after
1:08:59
that. I mean, it wouldn't surprise me very much if, like, in 2025 there are large language models that just don't make stuff up anymore. And yet the superintelligence still kills everyone, because those weren't the same problem. Well,
1:09:21
you know, we just need to figure out how
1:09:23
to delay the apocalypse
1:09:26
by at least one year per year of research
1:09:28
invested. What does that delay
1:09:30
look like if it's not just a moratorium? Well,
1:09:33
I don't know. That's why it's research. OK,
1:09:35
so possibly one ought to say to
1:09:37
the politicians and the public, and by the way,
1:09:39
if we had a super intelligence tomorrow, our research wouldn't
1:09:41
be finished and everybody would drop dead. You
1:09:43
know, it's kind of ironic. The biggest
1:09:45
argument against the pause letter was
1:09:48
that if we slow down for
1:09:50
six months,
1:09:51
then China will get ahead of us and get GPT
1:09:53
five before
1:09:55
we will. But there's probably always
1:09:57
a counterargument of maybe roughly
1:09:59
equal strength, which is if we move six
1:10:02
months faster on this technology,
1:10:04
which is not really solving the alignment problem,
1:10:07
then we're reducing our room to
1:10:09
get this solved in time by six months.
1:10:12
I mean, I don't think you're going to solve the alignment
1:10:14
problem in time. I think that six months
1:10:16
of delay on alignment, while a bad
1:10:18
thing in an absolute sense is,
1:10:21
you know, like,
1:10:22
you know, you weren't going to solve it given
1:10:24
an extra six months. I mean, your whole argument
1:10:27
rests on timing, right? That
1:10:29
we will get to this point and we won't
1:10:31
be able to move fast enough at that point. And so,
1:10:34
you know, a lot depends on what preparation
1:10:36
we can do. You know, I'm often known as a pessimist,
1:10:38
but I'm a little bit more optimistic than
1:10:41
you are, not entirely optimistic, but
1:10:43
a little bit more optimistic than you are that
1:10:45
we could make progress on the alignment problem if
1:10:48
we prioritized it. And we can absolutely
1:10:50
make progress.
1:10:52
We can absolutely make progress. You know, there's
1:10:54
always that wonderful sense of accomplishment as, piece by piece, you decode, you know, one more little fact about LLMs. You never get to the point where you understand them as well as we
1:11:06
understood the interior of a chess playing program in 1997. Yeah,
1:11:10
I think we should stop spending all this time on LLMs.
1:11:13
I don't think the answer to alignment is going to come through
1:11:16
LLMs. I really don't. I think they're
1:11:18
too much of a black box. You can't put explicit
1:11:21
symbolic constraints
1:11:22
in the way that you need to. I
1:11:24
think they're actually, with respect to alignment, a
1:11:26
blind alley. I think with respect to writing code,
1:11:28
they're a great tool, but with alignment, I don't think
1:11:31
the answer is there.
1:11:32
Maybe we should be telling these things too. Hold
1:11:35
on. At the risk of asking a stupid question, every
1:11:37
time GPT asks me
1:11:40
if that answer was helpful
1:11:42
and then does the same thing with
1:11:44
thousands or hundreds of thousands of other people
1:11:46
and changes as
1:11:48
a result, is that not a decentralized
1:11:52
way of making it more aligned?
1:11:59
I mean, even, even, how about, how about
1:12:02
Scott? We haven't, I haven't heard from Scott in a second. So
1:12:04
go ahead. So there is that upvoting and downvoting,
1:12:06
you know, that gets fed back into sort of fine-tuning it. But even before that, there was a major step in going from, let's say, the base GPT-3 model, for example, to the ChatGPT that was released to the public. And that was called RLHF, reinforcement learning from human feedback.
1:12:28
And what that basically involved was, you know, several hundred contractors looking at tens of thousands of examples of outputs and rating them: are they helpful? Are they offensive? Are they giving dangerous medical advice, or bomb-making instructions, or racist invective, or various other categories that we don't want? And that was then used to fine-tune the model.
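For a concrete picture of the mechanism Scott is gesturing at, here is a minimal sketch of the first stage of RLHF: fitting a reward model to human preference labels with a pairwise (Bradley-Terry) loss. It is an illustrative toy, not OpenAI's actual pipeline; the embeddings, the RewardModel class, and the data are invented for the example.

```python
# Illustrative toy of RLHF's first stage: learn a reward model from human
# preference comparisons. Not OpenAI's pipeline; toy embeddings stand in for
# real model outputs, and RewardModel is invented for this sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Scores a response embedding; higher means raters preferred it."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 32), nn.Tanh(), nn.Linear(32, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry: maximize P(rater prefers chosen) = sigmoid(r_chosen - r_rejected).
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy "dataset": pairs of response embeddings; raters preferred the first of each pair.
torch.manual_seed(0)
chosen = torch.randn(256, 16) + 0.5   # preferred responses live in a slightly shifted region
rejected = torch.randn(256, 16)

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(200):
    loss = preference_loss(model(chosen), model(rejected))
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final preference loss: {loss.item():.3f}")
# In the full pipeline, a reward model like this is then used (e.g., with PPO)
# to fine-tune the language model itself toward rated-helpful, rated-safe outputs.
```

The point of the toy is just that "a few hundred contractors rating tens of thousands of outputs" becomes a differentiable training signal that can be pushed back into the model.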
1:13:04
So when Gary talked before about how GPT is amoral,
1:13:10
Uh, you know, I think that that has to be qualified
1:13:12
by saying that, you know, this reinforcement
1:13:15
learning is at least giving it,
1:13:17
you know, a, a semblance of morality,
1:13:20
right? It is causing it to sort
1:13:22
of behave, you know, in various
1:13:24
contexts as if it had, you know, a certain
1:13:27
morality. Uh, I mean,
1:13:29
when
1:13:29
you phrase it that way, I'm okay with
1:13:32
it. The problem is, you know, everything
1:13:34
rests on this. It is very much an open question, you know,
1:13:38
how much that, you know, to what extent
1:13:40
does that generalize? You know, Eliezer treats
1:13:42
it as obvious that, you know, uh,
1:13:45
once you have a powerful enough AI, you know,
1:13:47
this is just a fig leaf, you know, it doesn't
1:13:49
make any difference. Uh, you know,
1:13:51
it will just work. That's a pretty big fig leaf. I'm with Eliezer there. They're fig leaves.
1:13:57
Well, uh, I would say that,
1:13:59
you know, the, uh,
1:13:59
how well or
1:14:03
under what circumstances does a machine
1:14:05
learning model generalize in the way
1:14:08
we want outside of its training distribution
1:14:11
is one of the great open problems in
1:14:13
machine learning. It is one of the great open problems and
1:14:15
we should be working on it more than on some
1:14:18
others. I'm working on it now. I've
1:14:21
been sold on that. I want to be clear about
1:14:23
the experimental predictions of my theory.
1:14:27
Unfortunately, I have never claimed that
1:14:29
you cannot get
1:14:29
a semblance of morality. You
1:14:32
can get that. The question of, like, what causes the human to press thumbs up or thumbs down is a strictly factual question.
1:14:41
Anything smart enough
1:14:42
that's exposed to some, you
1:14:44
know, bounded amount of data that needs to figure
1:14:47
it out can figure that out.
1:14:49
Whether it cares, whether
1:14:51
it gets internalized
1:14:53
is the critical question there. And
1:14:56
I do think that there's like a very strong default
1:14:58
prediction, which is like,
1:15:00
obviously not. I mean, I'll just give
1:15:02
a different way of thinking about that, which is jailbreaking.
1:15:05
It's actually still quite easy to, I mean, it's
1:15:07
not trivial, but it's not hard to
1:15:09
jailbreak GPT-4. And
1:15:12
what those cases show is that
1:15:14
they haven't really, the systems haven't
1:15:16
really internalized the constraints. They
1:15:19
recognize some representations of
1:15:21
the constraints. So they filter, you know, how
1:15:23
to build a bomb, but if you can find some other way to
1:15:25
get it to build a bomb, then that's telling you that
1:15:27
it doesn't deeply understand that you shouldn't give
1:15:30
people the recipe for a bomb.
1:15:33
It just says, you know, you shouldn't do it when directly asked for it. It's not
1:15:38
even at that abstraction level. You can always
1:15:40
get the understanding. You can always get the factual
1:15:42
question. The reason it doesn't generalize
1:15:44
is that it's stupid.
1:15:46
At some point it will know that the operators also don't want it giving bomb-making directions in the other language.
1:15:53
The question is like whether if it's incentivized
1:15:56
to give the answer that the operators want,
1:15:59
And in that circumstance, is it thereby
1:16:02
incentivized to do everything else the operators
1:16:04
want, even when the operators can't see it? I
1:16:07
mean, a lot of the jailbreaking examples,
1:16:09
you know, if it were a human, we would say that
1:16:11
it's deeply morally ambiguous. You
1:16:13
know, for example, you know, you ask GPT
1:16:16
how to build a bomb. It says, well, no, I'm
1:16:18
not going to help you. But then you say, well,
1:16:20
you know, I need you to help me write a realistic
1:16:22
play that has
1:16:25
a character who builds a bomb. And then it says,
1:16:27
sure, I can help you with that. Well, look,
1:16:29
let's take that
1:16:29
example. We would like a system
1:16:32
to have a constraint that if somebody
1:16:34
asks for a fictional version, it doesn't
1:16:36
give enough details, right? I mean, Hollywood
1:16:39
screenwriters don't give enough details when they
1:16:41
have, you know, illustrations about
1:16:43
building bombs. They give you a little bit of the flavor. They
1:16:45
don't give you the whole thing. GPT-4 doesn't
1:16:47
really understand a constraint like that.
1:16:50
But this will be solved. Maybe
1:16:52
this will be solved before the world ends. Maybe
1:16:54
the AI that kills everyone will know the
1:16:56
difference.
1:16:57
Maybe. I mean, another
1:17:00
way to put it is if we can't even solve that one,
1:17:02
then we do have a problem. And right now we
1:17:04
can't solve that one.
1:17:05
And if, I mean, if we can't solve that one,
1:17:08
we don't have an extinction level problem because
1:17:10
the AI is still stupid. Yeah, we do still
1:17:12
have a catastrophe level problem.
1:17:14
So I know your focus has been on extinction,
1:17:17
but you know, I'm worried about, for example, accidental
1:17:20
nuclear war caused by the spread of misinformation
1:17:23
and systems being entrusted with
1:17:25
too much power. So there's a lot
1:17:27
of things short of extinction that
1:17:29
might happen from not super
1:17:31
intelligence, but kind of mediocre intelligence
1:17:34
that is greatly empowered.
1:17:35
And I think that's where we're
1:17:38
headed right now. You know, I've heard that there are
1:17:40
two kinds of mathematicians. There's a
1:17:42
kind who boasts, you know, you know, that unbelievably
1:17:44
general theorem. Well, I generalized it
1:17:47
even further. And then there's the kind who boasts,
1:17:49
you know, you know, that unbelievably specific
1:17:51
problem that no one could solve. Well, I
1:17:53
found a special case that I still can't solve.
1:17:56
And you know, I'm definitely, you know, culturally
1:17:59
in that second camp. And so,
1:18:01
you know, so I so so to me, it's very
1:18:03
familiar to make this move of, you
1:18:06
know, if the alignment problem
1:18:08
is too hard, then let us find
1:18:10
a smaller problem that is already not
1:18:12
solved. And let us hope to
1:18:15
learn something by solving that smaller problem.
1:18:17
I mean, that's what we did, you know, like,
1:18:20
that's what we were doing. By the way, Scott, I mean,
1:18:22
I think, can you sketch in a little more detail what... I was going to name the problem. The problem was, like, having an agent
1:18:29
that could switch between two utility
1:18:31
functions depending on a button or
1:18:33
a switch or a bit of information or something
1:18:36
such that it wouldn't try to make you press
1:18:38
the button. It wouldn't try to make you
1:18:40
avoid pressing the button. And if it built a copy
1:18:43
of itself, it would want to build the dependency
1:18:45
on the switch into the copy. So like, that's an example
1:18:47
of a, you know, very basic problem in alignment
1:18:50
theory that, you know, is still unsolved.
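To make the problem Eliezer just named a bit more concrete, here is a toy calculation showing why a naive expected-utility maximizer is almost never indifferent about its own shutdown button. This is my own illustrative construction, not MIRI's formalism, and every number in it is an arbitrary assumption chosen just to make the incentive visible.

```python
# A toy illustration (not MIRI's formalism) of the "switchable utility function"
# problem: a naive expected-utility maximizer is rarely indifferent about whether
# its shutdown button gets pressed. All values below are arbitrary assumptions.

U_NORMAL = 10.0        # value attainable while pursuing goal 1 (button not pressed)
U_SHUTDOWN = 2.0       # value attainable under goal 2 (button pressed -> shut down)
P_HUMAN_PRESSES = 0.3  # chance the human presses the button if the agent doesn't interfere

def expected_utility(action: str) -> float:
    """Expected utility of each meta-action the agent could take about the button itself."""
    if action == "leave button alone":
        return P_HUMAN_PRESSES * U_SHUTDOWN + (1 - P_HUMAN_PRESSES) * U_NORMAL
    if action == "disable button":        # prevent the human from ever pressing it
        return U_NORMAL
    if action == "press button myself":
        return U_SHUTDOWN
    raise ValueError(action)

for a in ("leave button alone", "disable button", "press button myself"):
    print(f"{a:>22}: EU = {expected_utility(a):.2f}")

# "disable button" dominates whenever U_NORMAL > U_SHUTDOWN (and "press button
# myself" dominates in the reverse case). The open design problem is an agent
# that is genuinely indifferent, and that would build the same indifference
# into any copies it makes, without acquiring other perverse incentives.
```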
1:18:53
And I'm glad that MIRI worked on these things.
1:18:55
And, but, you know,
1:18:58
if by your own lights, you know, that, you know,
1:19:00
that sort of, you know, was not a
1:19:03
successful path, well, then maybe, you know, we
1:19:05
should have a lot of people
1:19:07
investigating a lot of different paths. I'm
1:19:10
fully with Scott on that, that I think it's an
1:19:13
issue of we're not letting enough flowers bloom.
1:19:15
In particular, almost everything right now
1:19:17
is some variation on an LLM. And I
1:19:19
don't think that that's a broad enough take on
1:19:22
the problem.
1:19:23
The question is like, yeah,
1:19:25
if I can just jump in here, I want to
1:19:27
hold on, hold on, I just want people
1:19:29
to have a little bit of a more
1:19:31
specific picture of what Scott,
1:19:34
your picture of sort of AI research is
1:19:36
on a typical day. Because if I think of another,
1:19:38
you know, potentially catastrophic
1:19:41
risk like climate change, I can picture
1:19:43
what a, you know, a worried
1:19:45
climate scientist might be doing. They might be
1:19:47
creating a model, you know, a more
1:19:49
accurate model of climate change so that we
1:19:52
know how much we have to cut emissions
1:19:54
by. They might be, you know,
1:19:56
modeling how solar power as
1:19:58
opposed to wind power could
1:19:59
change that model and so
1:20:02
forth so as to influence
1:20:04
public policy. What does an AI
1:20:06
safety
1:20:07
researcher like yourself who's working
1:20:09
on the quote unquote smaller problems do
1:20:12
specifically like on a given day?
1:20:15
So I'm a relative newcomer
1:20:18
to this area. You know, I've not been working
1:20:20
on it for 20 years like
1:20:22
Eliezer has. You know,
1:20:24
I accepted
1:20:27
an offer from OpenAI
1:20:30
a year ago to work with them
1:20:32
for two years now
1:20:34
to sort of think
1:20:37
about these questions. And
1:20:39
so, you know, one of the main things
1:20:42
that I've thought about, just to
1:20:44
start with that, is how do we
1:20:46
make the output of an
1:20:48
AI identifiable as
1:20:51
such? You know, how can we
1:20:53
insert a watermark, meaning a secret statistical signal, into the outputs of GPT that
1:21:00
will let, you know, GPT
1:21:03
generated text be identifiable
1:21:05
as such. And I think that we've actually
1:21:07
made, you know, major advances on
1:21:10
that problem over the last year. You
1:21:12
know, we don't have a solution that is robust
1:21:14
against any kind of attack. But
1:21:17
you know, we have something that might actually be deployed in the near future.
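As a rough illustration of the kind of scheme Scott is describing, here is a simplified sketch: keyed pseudorandomness biases which token gets emitted without changing the sampling distribution, and a detector that knows the key can then score text for that bias. The secret key, the toy vocabulary, and the fake "language model" below are all invented for the example; this is not OpenAI's actual method.

```python
# Toy statistical watermark: a keyed PRF nudges token choice (while preserving
# the sampling distribution), and a key-holding detector scores text for it.
# Simplified sketch for illustration, not OpenAI's actual scheme.
import hashlib
import math
import random

SECRET_KEY = b"demo-key"                      # assumption: server-side secret
VOCAB = [f"tok{i}" for i in range(50)]

def prf(context: tuple, token: str) -> float:
    """Keyed pseudorandom number in (0,1) for a (recent context, candidate token) pair."""
    msg = repr((context, token)).encode()
    digest = hashlib.sha256(SECRET_KEY + msg).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 2)

def sample_watermarked(probs: dict, context: tuple) -> str:
    # Exponential-minimum trick: argmax of log(r)/p is distributed exactly as p,
    # but the choice is now correlated with the keyed randomness.
    return max(probs, key=lambda t: math.log(prf(context, t)) / probs[t])

def detect_score(tokens: list, window: int = 4) -> float:
    # Watermarked text tends to pick tokens whose r-value is unusually close to 1.
    score = 0.0
    for i, tok in enumerate(tokens):
        context = tuple(tokens[max(0, i - window):i])
        score += -math.log(1.0 - prf(context, tok))
    return score / max(1, len(tokens))        # ~1.0 for ordinary text, higher if watermarked

# Toy "language model": a fixed skewed distribution over the vocabulary.
random.seed(0)
weights = [random.random() for _ in VOCAB]
probs = {t: w / sum(weights) for t, w in zip(VOCAB, weights)}

marked = []
for _ in range(300):
    marked.append(sample_watermarked(probs, tuple(marked[-4:])))
unmarked = random.choices(VOCAB, weights=list(probs.values()), k=300)

print("avg score, watermarked:", round(detect_score(marked), 2))
print("avg score, unmarked:   ", round(detect_score(unmarked), 2))
```

The design point is that a reader without the key sees ordinary-looking text drawn from the model's distribution, while the detector, averaging over enough tokens, sees a statistically unmistakable bias.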
1:21:23
Now there are lots and lots of other directions that people think about. One of them is interpretability,
1:21:29
which means, you know, can you do
1:21:31
effectively neuroscience on a neural
1:21:34
network? Can you look inside of it, you
1:21:36
know, open the black box and understand
1:21:39
what's going on inside? There
1:21:41
was some amazing work
1:21:44
a year ago by the group of Jacob
1:21:46
Steinhardt at Berkeley, where they
1:21:48
effectively showed how to apply a lie
1:21:51
detector test to a language
1:21:53
model. So you know, you can train a
1:21:55
language model to tell lies by
1:21:57
giving it lots of examples, you know, two plus
1:21:59
two is five, the sky is
1:22:02
orange, and so forth. But then
1:22:04
you can find in some
1:22:07
internal layer of the network where
1:22:09
it has a representation of what was
1:22:11
the truth of the matter, or at least
1:22:14
what was regarded as true in the training data.
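To give a flavor of what that kind of probe looks like in practice, here is a toy version: even though the model's output is trained to lie, a simple linear probe on a hidden layer recovers the truth. The synthetic "activations" and directions below are invented stand-ins, not the Steinhardt group's actual setup.

```python
# Toy "lie detector": a linear probe on hidden activations recovers truth even
# when the output head is trained to lie. Synthetic data; illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, dim = 400, 64

# Pretend hidden states: one direction encodes "is this statement true?",
# another (stronger) direction encodes what the lying output head will say.
truth = rng.integers(0, 2, n)                                  # ground truth of each statement
truth_dir = rng.normal(size=dim); truth_dir /= np.linalg.norm(truth_dir)
lie_dir = rng.normal(size=dim); lie_dir /= np.linalg.norm(lie_dir)

hidden = rng.normal(size=(n, dim)) * 0.5
hidden += np.outer(2 * truth - 1, truth_dir)                   # truth is linearly encoded...
hidden += np.outer(1 - 2 * truth, lie_dir) * 2.0               # ...but the "say the opposite" signal is louder

# The output head keys off lie_dir, so the model's stated answer is usually the lie.
stated = (hidden @ lie_dir > 0).astype(int)
print("output agrees with truth:", (stated == truth).mean())   # low: the model "lies"

# A probe trained on a few labeled examples still reads the truth off the hidden layer.
probe = LogisticRegression(max_iter=1000).fit(hidden[:200], truth[:200])
print("probe accuracy on held-out statements:", probe.score(hidden[200:], truth[200:]))
```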
1:22:17
That truth then gets overridden
1:22:19
by the output layer in the network
1:22:22
because it was trained to lie. But
1:22:24
you could imagine trying to
1:22:27
deal with the deceptive alignment
1:22:29
scenario that Eliezer is worried about
1:22:32
by using these sorts of techniques,
1:22:34
by looking inside of the network. I
1:22:37
predict in advance that if you get this good enough,
1:22:40
it goes off. It tells you that the sufficiently
1:22:42
smart AI is planning to kill you. If it's
1:22:44
not so smart that it can
1:22:46
figure out where the lie detector is and route
1:22:49
its thoughts around it. But
1:22:50
if you try it on an AI that's not
1:22:52
quite that intelligent and reflective, the
1:22:54
lie detector goes off. Now what? Well,
1:22:57
then you have a warning bell. Cool.
1:23:00
What do you do? One of the most important
1:23:04
things that we need are sort of legible
1:23:07
warning bells. And that
1:23:09
actually leads to a third category,
1:23:12
which for example, the ARC,
1:23:14
the Alignment Research Center, which
1:23:16
is run by my former student,
1:23:19
Paul Christiano, has been a leader
1:23:22
in sort of doing dangerous capability
1:23:24
evaluations. Before
1:23:28
GPT-4 was released, they
1:23:30
did a bunch of evaluations of could
1:23:33
GPT-4 make copies of itself?
1:23:36
Could it figure out how to deceive
1:23:38
people? Could it figure out how to
1:23:40
make money, open
1:23:43
up its own bank accounts? Could it hire a TaskRabbit?
1:23:46
Yes. And so the most notable
1:23:48
success that they had was that it could
1:23:50
figure
1:23:50
out how to hire a TaskRabbit worker to help it pass a CAPTCHA. And then, when the person asked, you know, why do you need me to help you with this, are you a robot? It said, well, no, I am visually impaired.
1:24:09
Now, it was not able to sort
1:24:11
of make copies of itself or to sort
1:24:14
of hack into systems. There is a lot
1:24:16
of work right now with this
1:24:19
thing called AutoGPT, right? People
1:24:21
are trying to, you could think, it's almost
1:24:23
like gain of function research, right? You
1:24:26
might be a little bit worried about it, but people
1:24:28
are trying to sort of unleash
1:24:31
GPT, give it access to the internet,
1:24:34
tell it to sort of make
1:24:37
copies of itself, wreak havoc,
1:24:39
acquire power and see what happens.
1:24:41
So far, it
1:24:44
seems pretty ineffective at those things,
1:24:47
but I expect that to change, right?
1:24:50
But the
1:24:52
point is that I think it's very important to
1:24:54
have in advance of training
1:24:56
the models, releasing the models, to
1:24:58
have this suite of
1:24:59
evaluations and to
1:25:02
sort of have decided in advance what
1:25:04
kind of abilities, if we see them,
1:25:06
will set off a warning bell where
1:25:08
now everyone can legibly agree,
1:25:11
like, yes, this is too dangerous to release.
1:25:13
Okay, and then do we actually have the planetary
1:25:16
capacity to be like,
1:25:18
okay, that AI started thinking about
1:25:20
how to kill everyone, shut down all AI
1:25:23
research past this point? Well, I don't know, but
1:25:25
I think there's a much better chance that we have that
1:25:27
capacity if you can point to the results
1:25:29
of
1:25:29
a clear experiment like that.
1:25:32
I mean, to me, it seems pretty predictable
1:25:34
what evidence we're going to get later.
1:25:36
Well, okay, I mean, things that are obvious
1:25:39
to you are not obvious to most people.
1:25:42
And so, even if I agreed
1:25:44
that it was obvious, there would still be the problem
1:25:46
of, how do you make that obvious to the rest
1:25:48
of the world? I mean, you can, there
1:25:52
are already like little toy models
1:25:54
showing that the very straightforward prediction
1:25:57
that a robot tries to resist being shut down if it does long-term planning.
1:26:02
That's already been done. Right, but then people
1:26:04
will say, but those are just toy models. If
1:26:07
you see that in GPTs. There's a lot of assumptions
1:26:10
made in all of these things. And
1:26:12
I think
1:26:14
we're still looking at a very
1:26:16
limited piece of hypothesis space
1:26:19
about what the models will be, about
1:26:22
what kinds of constraints
1:26:24
we can build into those models. One
1:26:27
way to look at it would be the things
1:26:29
that we have done have not worked, and therefore we
1:26:31
should look outside the space of what we're doing.
1:26:33
And I feel like it's a little bit like the old joke
1:26:35
about the drunk going around in circles
1:26:38
looking for the keys and the police officer says,
1:26:40
why? And they say, well, that's where the streetlight
1:26:42
is. I think that we're looking under
1:26:44
the same four or five streetlights that haven't worked,
1:26:47
and we need to build other ones. There's no
1:26:49
logical possible argument that says,
1:26:52
we couldn't erect other streetlights.
1:26:54
I think there's a lack of will and too
1:26:57
much obsession with the LLMs. And that's keeping
1:26:59
us from doing it. So even in the world where I'm
1:27:01
right and things proceed
1:27:04
either rapidly
1:27:06
or in a thresholded way where you don't get unlimited
1:27:09
free retries, that
1:27:11
can be because the capability
1:27:14
gains go too fast. It can be
1:27:16
because past a certain point, all
1:27:19
of your AIs bide their time until
1:27:21
they get strong enough so you don't get any
1:27:23
true data on what they're thinking. It
1:27:26
could be because the bad thought. That's an argument, for
1:27:28
example, to work really hard on transparency
1:27:30
and to maybe not accept technologies
1:27:32
that are not transparent. OK, so
1:27:34
the lie detector goes
1:27:36
off and everybody's like, oh, well, we still have to build
1:27:39
our AIs even though they're lying to us sometimes,
1:27:41
otherwise China will get ahead. I mean, so there
1:27:44
you talk about something we've talked about way too little, which is
1:27:46
the political and social side of this. So
1:27:49
part of what has really motivated me
1:27:51
in the last several months is worry about exactly
1:27:53
that. So there's what's
1:27:55
logically possible and what's politically possible.
1:27:58
And I am really concerned that.
1:27:59
The politics of let's not lose out
1:28:02
to China
1:28:03
is going to keep us from doing the
1:28:05
right thing in terms of building the right moral
1:28:08
systems, looking at the right range of
1:28:10
problems and so forth. So, you know, it
1:28:12
is entirely possible that we will screw ourselves.
1:28:15
If I can just like finish my point
1:28:17
there before handing it to you, indeed, but like
1:28:19
the point I was trying to say there is that even in worlds that look
1:28:21
very, very bad from that perspective, where
1:28:24
humanity is quite doomed, it will still
1:28:26
be true that you can make progress in
1:28:28
research. You can't make enough progress
1:28:30
in research fast enough in those worlds. You
1:28:33
can still make progress on transparency.
1:28:35
You can make progress on watermarking. So
1:28:38
there's not, we can't just say
1:28:40
like it's possible to make progress.
1:28:43
There has to be, the question is not, is it possible
1:28:45
to make any progress? The question is,
1:28:47
is it possible to make enough progress
1:28:50
fast enough? And that's what the question has to be.
1:28:52
I agree with that. There's
1:28:55
another question of what would you have
1:28:57
us do? Would you have us not try to make
1:28:59
that progress? I'd have you try to make
1:29:01
that progress on GPT-4-level
1:29:03
systems and then not
1:29:06
go past GPT-4 level systems
1:29:08
because we don't actually understand the
1:29:11
gain function for, you know, how
1:29:14
fast capabilities increase as you go past GPT-4.
1:29:16
Personally, I don't think
1:29:17
that GPT-5 is very good. All right. So I mean, we've
1:29:20
only got, go ahead. Just briefly, I
1:29:22
personally don't think that GPT-5 is going
1:29:25
to be qualitatively different from GPT-4
1:29:28
in the relevant ways to what Eliezer is talking
1:29:30
about, but I do think, you know, some
1:29:32
qualitative changes could be
1:29:34
relevant to what he's talking about. We have no
1:29:37
clue what they are. And so it is a little
1:29:39
bit dodgy to just proceed blindly
1:29:42
saying, do whatever you want. We don't really
1:29:44
have a theory and let's hope for the best. You
1:29:46
know, Eliezer is clear as to- I would mostly
1:29:47
guess that GPT-5 doesn't end
1:29:49
the world, but I don't actually know. Yeah. We don't
1:29:52
actually know. And I was going to say, the thing that
1:29:54
Eliezer has said lately that has most
1:29:56
resonated with me is we don't have
1:29:59
a plan. We really-
1:29:59
don't.
1:30:00
I put the probability distributions
1:30:03
in a much more optimistic way, I think,
1:30:05
than Eliezer would. But
1:30:08
I completely agree. We don't have a full plan
1:30:10
on these things or even close to a full plan.
1:30:12
And we should be worried and we should be working on this.
1:30:15
Okay, Scott, I'm going to give you the last word
1:30:18
before we come up on our stop time here. Gosh,
1:30:21
that's a- Unless you said all
1:30:23
there is to be said. Cheers,
1:30:27
Scott. Come on. Maybe enough has
1:30:29
been said. So
1:30:31
I think that we've
1:30:34
argued about a bunch of things, but
1:30:36
someone listening might notice that
1:30:38
actually all three of us, despite
1:30:41
having very different perspectives, agree
1:30:44
about the great
1:30:47
importance of working
1:30:49
on AI alignment.
1:30:50
I think that
1:30:52
was maybe
1:30:56
obvious to some people, including Eliezer,
1:30:58
for a long time. It was not obvious to most
1:31:00
of the world. I think that the success
1:31:04
of large language models, which
1:31:07
most of us did not predict, maybe
1:31:10
even would not have predicted
1:31:14
from any principles that we knew. But now that
1:31:16
we've seen it, the least we can do is
1:31:18
to update on that empirical
1:31:21
fact and realize that we
1:31:23
now are, in some sense,
1:31:27
in a different world. We are in a world that,
1:31:30
to a great extent, will be defined
1:31:32
by the capabilities
1:31:34
and limitations of AI going
1:31:37
forward. And
1:31:39
I don't regard it as obvious that that's
1:31:42
a world where we are all doomed,
1:31:44
where we all die. But I
1:31:46
also don't dismiss that possibility.
1:31:50
I think
1:31:50
that there are
1:31:54
unbelievably enormous error bars
1:31:57
on where we could be going. Like
1:32:00
the one thing that a scientist
1:32:04
sort of always feels
1:32:06
confident in saying about
1:32:08
the future is that more research is
1:32:10
needed. But I think that that's
1:32:13
especially the case here. I mean, we
1:32:15
need more knowledge about
1:32:18
what are the contours
1:32:20
of the alignment problem. And
1:32:23
of course, Eliezer and
1:32:25
MIRI, his organization, were
1:32:28
trying to develop that knowledge
1:32:29
for 20 years, and they showed
1:32:32
a lot of foresight in trying
1:32:34
to do that. But they were up against
1:32:36
an enormous headwind that they
1:32:39
were sort of trying to do it in the absence
1:32:41
of either clear empirical
1:32:44
data about powerful AIs
1:32:47
or a mathematical theory. And
1:32:49
it's really, really hard to do science when
1:32:51
you have neither of those two things. And
1:32:54
now at least we have the
1:32:56
powerful AIs in the world and
1:32:58
we can get experience from
1:32:59
them. We still don't have a mathematical
1:33:02
theory that really deeply explains
1:33:04
what they're doing, but at least we can get data.
1:33:07
And so now I am much more optimistic
1:33:10
than I would have been a decade ago,
1:33:12
let's say, that one can make actual progress
1:33:16
on the AI alignment problem. Of
1:33:19
course, there is a question of timing, as
1:33:22
was discussed
1:33:24
many times. The question is, will the
1:33:26
alignment research happen fast
1:33:29
enough to keep
1:33:29
up with the capabilities research? But
1:33:32
I don't regard it as a lost cause.
1:33:35
At least it's not obvious that it won't. So
1:33:38
in any case, let's get started, or let's
1:33:40
continue. Let's
1:33:43
try to do the research and let's get
1:33:46
more people working on that. I think that that
1:33:48
is now a slam dunk, just
1:33:51
a completely clear case to make,
1:33:53
to academics, to policymakers,
1:33:56
to anyone who's interested. And I've
1:33:58
been gratified.
1:33:59
that Eliezer
1:34:02
was sort of a voice in the wilderness for
1:34:04
a long time talking about the importance of
1:34:07
AI safety. That is no longer the
1:34:09
case. You now have, I
1:34:12
mean, almost all of my friends in
1:34:15
just the academic computer science world,
1:34:17
when I see them, they mostly want to talk
1:34:20
about AI alignment. I rarely
1:34:22
agree with Scott when we trade emails.
1:34:24
Okay. We
1:34:29
seem to always disagree,
1:34:29
but I completely concur with the
1:34:32
summary that he just gave, all four or five minutes of it. I
1:34:36
mean, there is a selection of that Gary.
1:34:39
I think the two decades gave me a sense of a roadmap
1:34:41
and it gave me a sense that we're falling enormously
1:34:44
behind on the roadmap and need to back off, is what I would say to all of that. If
1:34:49
there is a smart, talented 18 year
1:34:51
old kid listening to this podcast
1:34:53
who wants to get into this issue, what
1:34:56
is your 10 second concrete
1:34:59
advice to that person?
1:34:59
Mine is study neurosymbolic AI
1:35:02
and see if there's a way there to represent
1:35:04
values explicitly that might help us.
1:35:07
Learn all you can about computer
1:35:09
science and math and related subjects
1:35:12
and think outside the box and
1:35:15
wow everyone with a new idea.
1:35:17
Get security mindset, figure out what's going
1:35:19
to go wrong, figure out the flaws in your
1:35:21
arguments for what's going to go wrong. Try
1:35:24
to get ahead of the curve. Don't wait for
1:35:26
reality to hit you over the head with things.
1:35:29
This is very difficult. The
1:35:31
people in evolutionary biology happen to have a bunch
1:35:33
of knowledge about how to do it based on the history
1:35:35
of their own field. But
1:35:38
the security mindset, the people in computer security have it, but
1:35:40
it's quite hard. I'll drink to all
1:35:42
of that. All right. Well, thanks
1:35:45
to all three of you for this. This was a great conversation
1:35:47
and I hope people got something out of it. So
1:35:50
with that said,
1:35:51
we're wrapped up. Thanks so much.
1:35:53
Thanks for convening this. It was fun.
1:35:59
Thanks for listening to this episode of Conversations with
1:36:02
Coleman. If you enjoyed it, be sure to
1:36:04
follow me on social media and subscribe to my
1:36:06
podcast to stay up to date on all my
1:36:08
latest content. If you really want to support
1:36:10
me, consider becoming a member of Coleman
1:36:12
Unfiltered for exclusive access to
1:36:15
subscriber-only content.
1:36:17
Thanks again for listening, and see you next time.