Podchaser Logo
Home
Will AI Destroy Us? - AI Virtual Roundtable

Will AI Destroy Us? - AI Virtual Roundtable

Released Friday, 28th July 2023
 1 person rated this episode
Will AI Destroy Us? - AI Virtual Roundtable

Will AI Destroy Us? - AI Virtual Roundtable

Will AI Destroy Us? - AI Virtual Roundtable

Will AI Destroy Us? - AI Virtual Roundtable

Friday, 28th July 2023
 1 person rated this episode
Rate Episode

Episode Transcript

Transcripts are displayed as originally observed. Some content, including advertisements may have changed.

Use Ctrl + F to search

0:02

Music

0:30

Welcome to another episode of Conversations with

0:32

Coleman. If you're hearing this, then you're

0:34

on the public feed, which means you'll get episodes

0:36

a week after they come out and you'll hear advertisements.

0:39

You can get access to the subscriber feed by going

0:42

to ColemanHughes.org and becoming a supporter.

0:44

This means you'll have access to episodes a week early.

0:47

You'll

0:47

never hear ads

0:48

and you'll get access to bonus Q and A episodes.

0:51

You can also support me by liking and subscribing

0:54

on YouTube and sharing the show with friends and family.

0:56

As always, thank you so much

0:58

for your support.

1:01

Music Welcome

1:03

to another episode of Conversations with Coleman.

1:06

Today's episode is a round table discussion

1:08

about AI safety with Eliezer

1:10

Yudkowsky, Gary Marcus, and Scott

1:13

Aronson. Eliezer Yudkowsky is a prominent

1:15

AI researcher and writer

1:17

known for co-founding the Machine Intelligence

1:20

Research Institute, where he spearheaded

1:22

research on AI safety. He's also

1:24

widely recognized for his influential writings

1:27

on the topic of rationality. Scott

1:29

Aronson is a theoretical computer scientist

1:31

and author celebrated for his pioneering

1:33

work in the field of quantum computation.

1:36

He's also the chair of CompSci at U of T

1:38

Austin, but is currently taking a leave of absence

1:41

to work at OpenAI. Gary

1:42

Marcus is a cognitive scientist,

1:45

author, and entrepreneur known for

1:47

his work at the intersection of psychology, linguistics,

1:50

and AI. He's also authored several

1:52

books, including Kluge and

1:55

Rebooting AI, Building AI

1:57

We Can Trust. This episode is all about

1:59

AI safety.

1:59

We talk about the alignment problem. We

2:02

talk about the possibility of human extinction

2:04

due to AI. We

2:05

talk about what intelligence actually is.

2:08

We talk about the notion of a singularity or

2:10

an AI take-off event and much more.

2:13

It was really great to get these three guys in the same

2:15

virtual room. And I think you'll find that this conversation

2:18

brings something a bit fresh to a topic

2:20

that has admittedly been beaten to death

2:22

on certain corners of the internet. So without

2:24

further ado, Eliezer Jutkowski,

2:27

Gary Marcus, and Scott Aronson.

2:30

Thanks so much for coming on my show. Thank you.

2:32

Thanks for having us. So

2:33

the topic of today's conversation

2:35

is AI safety. And this is something that's been in the

2:37

news lately. We've seen experts and CEOs signing letters recommending

2:40

public policy surrounding regulation. We

2:47

continue to have the debate between people that really feel like they're

2:49

going to be continue

2:55

to have the debate

2:57

between people that really

3:00

fear AI is going to end the

3:02

world and potentially kill

3:03

all of humanity and the people who feel

3:05

that those fears are overblown.

3:09

And so

3:11

this is going to be sort of a roundtable conversation

3:13

about that. And you three are

3:16

really three of the best people in the world to talk

3:18

about it with. So thank you all for doing this.

3:20

Let's just start out with you,

3:22

Eliezer, because you've been one of the most really

3:27

influential voices getting people

3:29

to take seriously the possibility that AI

3:31

will kill us all. You know, why is

3:33

AI going to destroy us? Chat GPT seems

3:35

pretty nice. I use it every day. What's

3:38

the big fear here? Make the case. Well, chat

3:40

GPT seems quite unlikely to kill everyone

3:42

in its present state. AI capabilities

3:45

keep on advancing and advancing. The question

3:47

is not can chat GPT kill us? The

3:50

answer is probably

3:50

no. So as long as that's true,

3:53

as long as it hasn't killed us yet, they're

3:55

just going to engineer. It's just going to keep pushing the capabilities.

3:58

There is no obvious blocking point. We don't

4:01

understand the things that we

4:03

build. The AIs are grown more

4:05

than built, you might say. They end up

4:07

as giant inscrutable matrices of floating

4:09

point numbers that nobody can decode. It's

4:12

probably going to end up technically difficult to

4:14

make them want particular things and not

4:16

others. And where

4:18

people are just charging straight ahead. So

4:21

at this rate, we end up with something that is smarter

4:23

than us, smarter than humanity, that

4:26

we don't understand, whose preferences

4:28

we could not shape. And by default,

4:31

if that happens, if you have something around that is like, much

4:33

smarter than you and does not care about you one way

4:36

or the other, you probably end up dead at the end of

4:38

that. It's the way

4:40

it gets the most of whatever strange inscrutable

4:43

things that it wants are worlds

4:45

in which there are not humans taking

4:47

up space, using up resources,

4:50

building other AIs to compete with it, or just

4:52

a world in which you built enough power plants that

4:54

the surface of the earth got hot enough that humans didn't

4:57

survive. Gary, what do you have to say about

4:59

that? There are parts that

5:00

I agree with and parts that I don't. So I agree

5:03

that we are likely to wind up with AIs

5:06

that are smarter than us. I don't think we're particularly

5:08

close now, but in 10 years or 50 years

5:10

or 100 years at some point, it

5:13

could be a thousand years, but it will happen. I

5:15

think there's a lot of anthropomorphization

5:17

there about machines wanting things. Of

5:20

course they have objective functions and

5:22

we can talk about that. I think it's a presumption

5:24

to say that the default is that

5:27

they're gonna want something that leads to our demise

5:30

and that they're gonna be effective at that and

5:32

be able to literally kill us all.

5:34

I think if you look at the history of AI,

5:37

at least so far, they don't really have

5:39

wants beyond what we program them to

5:41

do. There is an alignment problem. I think

5:43

that that's real in the sense of like,

5:46

people program the system to do X and

5:48

they do X prime. That's kind of like X, but not

5:50

exactly. And so I think there's really things

5:52

to worry about. I think there's a real research

5:54

program here that is under-researched,

5:57

which is the way I would put

5:59

it. We want to understand how to make machines

6:02

that have values. Asimov's laws are way

6:04

too simple, but they're a kind of starting point for conversation.

6:07

We want to program machines that don't

6:09

harm humans. They can calculate the consequences

6:12

of their actions. Right now we have technology

6:14

like GPT-4 that has no idea what the consequences

6:17

of its actions are. It doesn't really

6:19

anticipate things. And there's a separate thing that Eliezer

6:21

didn't emphasize, which is it's not just how smart

6:23

the machines are, but how much power we give them, how

6:26

much we empower them to do things like

6:28

access the internet or manipulate

6:29

people or write

6:33

source code, access files and stuff like that.

6:36

And right now, auto-GPT can do all of those things,

6:38

and that's actually pretty disconcerting to me. To

6:40

me, that doesn't all add up to any

6:42

kind of extinction risk anytime

6:45

soon. But catastrophic risk, where things

6:47

go pretty wrong because we wanted

6:49

these systems to do X and we

6:51

didn't really specify it well, they don't really understand

6:53

our intentions. I think there are risks

6:55

like that. I don't see it as a default that

6:57

we wind up with extinction. I think it's pretty hard

7:00

to actually terminate the entire human

7:02

species. You're going to have people in Antarctica

7:05

that are going to be out of harm's way or whatever, or

7:07

you're going to have some people who respond

7:09

differently to any pathogen, et cetera. So

7:11

extinction is a pretty extreme

7:14

outcome that I don't think is particularly likely.

7:17

But the possibility that these machines

7:19

will cause mayhem because we don't know

7:21

how to enforce that they do what we want

7:23

them to do, I think that's a real thing to worry about. And

7:26

it's certainly worth doing research on.

7:28

Scott, how do you view this? Yeah, so I'm sure

7:30

that you can get the three of us arguing about

7:32

something, but I think you're going to get agreement

7:34

from all three of us that AI safety

7:37

is important and that catastrophic

7:40

outcomes, whether or not that

7:43

means literal human extinction are possible.

7:46

I think it's become apparent

7:49

over the last few years that this

7:52

century is going to be

7:55

largely defined by

7:58

our interaction.

7:59

with AI, that AI

8:02

is going to be transformative for human

8:05

civilization.

8:07

And I'm confident

8:10

of that much. And if you ask

8:12

me almost anything beyond that about

8:15

how is it going to transform civilization,

8:17

will it be good? Will it be bad? What

8:19

will the AI want? I

8:21

am pretty agnostic, just

8:24

because if you had asked

8:26

me 20 years ago to try to forecast

8:28

where we are now, I would have

8:31

gotten a lot wrong. My

8:34

only defense is I think that

8:37

all of us here, almost everyone in

8:39

the world, would have gotten a lot wrong

8:42

about where we are now. And so if I

8:44

try to envision where we are in 2043,

8:47

does the AI want to replace

8:49

humanity with

8:55

something better? Does it want to keep

8:57

us around as pets? Does

9:01

it want to just continue

9:04

helping us out? Just

9:08

a super souped up version of chat GPT?

9:11

I think all of those scenarios

9:14

merit consideration. But

9:16

I think that what has happened

9:18

in the last few years that's

9:20

really exciting is that AI

9:23

safety has become an empirical

9:25

subject. There are these very powerful

9:27

AIs that are now being deployed, and

9:30

we can actually learn something. We

9:33

can work on

9:36

mitigating the nearer term harms,

9:37

not because the

9:41

existential risk doesn't exist or

9:46

is absurd or is science fiction or anything

9:48

like that, but just because the nearer term

9:50

harms are the ones that we can see right

9:53

in front of us and where we can actually get feedback

9:56

from the external world about how we're

9:58

doing. We can learn something.

9:59

And hopefully some of the knowledge

10:02

that we gain will be useful in

10:05

addressing the longer term risks that

10:07

I think Eliezer is very rightly worried about. So

10:09

there seems to me there's alignment and

10:11

then there's alignment, right? So there's alignment

10:14

in the sense that we haven't even fully

10:16

aligned smartphone technology with our interests,

10:18

right? Like there are some ways in which

10:21

smartphones and social media have

10:24

led to probably deleterious mental health

10:26

outcomes, especially

10:28

for teenage girls,

10:29

for example. So there

10:32

are those kinds of mundane senses

10:34

of alignment where it's like, is this technology

10:37

doing more good than

10:39

harm in the normal everyday public

10:41

policy sense? And then there's the capital

10:43

A alignment of, are we creating

10:46

a creature that is going to view

10:48

us like ants and have no

10:51

problem extinguishing us, and

10:56

whether intentional or not. So it

10:58

seems to me all of you agree

10:59

that the first sense of alignment

11:02

is at the very least something to worry about now and

11:04

something to deal with. But I'm

11:06

curious to what extent you think the really

11:09

capital A sense of alignment is

11:11

a real problem because it can sound very much

11:13

like science fiction to people. So

11:16

maybe let's start with Eliezer. I

11:19

mean, from my perspective, I would say that

11:22

if we had a solid guarantee that

11:24

I was going to do no more harm

11:26

than social media, we ought to plow

11:28

ahead and get all the gains. It's not

11:31

enough harm to back this amount of

11:33

harm that social media has done to humanity while

11:35

very significant in my view. I think it's done

11:37

like a lot of damage to our sanity, but that's

11:40

just like not a large enough harm

11:42

to justify either forgoing

11:44

the gains that you could get from A.I. If

11:47

that was going to be the worst downside

11:49

or to justify the kind of drastic measures you'd

11:51

need to stop plowing ahead on A.I.

11:54

I think that the capital A alignment

11:56

is beyond this generation. You

11:58

know, I've started. I've

12:01

watched it, watched over it for two decades. I

12:04

feel like in some ways the modern generation plowing

12:06

in with their eyes on the

12:09

short term stuff is like losing track

12:11

of the larger problems because they can't solve the larger

12:13

problems and they can't solve the little problems. But we're just like

12:16

plowing straight into the big problems and

12:19

we're going to go plow right into the big problems with

12:21

a bunch of little solutions that aren't going to scale. I think

12:24

it's lethal. I think it's at the

12:26

scale where you just back off and

12:28

don't do this. By back off

12:29

and don't do this, what do you mean?

12:32

I mean, have

12:33

an international

12:36

treaty about where the chips

12:39

capable of doing AI training go and

12:41

have them all going into licensed,

12:44

monitored data centers and

12:47

not have the training runs for

12:50

AIs more powerful than GPT-4,

12:53

possibly even lowering that threshold over time

12:55

as algorithms improve and it gets

12:57

possible to train more powerful AIs using

13:00

less compute. So you're picturing a kind of international

13:03

agreement to just stop.

13:05

International moratorium. And

13:07

if North Korea steals the GPU

13:09

shipment, then you've got to be ready

13:11

to destroy their data

13:14

center that they build by conventional means. And

13:16

if you don't have that willingness in advance, then

13:18

countries may refuse to sign up for the agreement being like,

13:21

why aren't we just like ceding the

13:23

advantage to someone else then? It

13:25

actually has to be a worldwide shutdown because

13:28

the scale of harm from a super intelligence,

13:31

it's not that if you have 10 times

13:33

as many super intelligences, you've got 10

13:35

times as much harm. It's not that a super

13:37

intelligence only wrecks the country that

13:39

built the super intelligence. Any super intelligence

13:41

everywhere is anyone's last problem.

13:44

So Gary and Scott, if either of you want to jump in

13:46

there, I mean, is there, is AI

13:49

safety a matter of forestalling

13:52

the end of the world and all of these smaller

13:54

issues and pass

13:56

towards safety that Scott, you mentioned are just, you

13:58

know,

13:59

throwing, I don't know what the

14:02

analogy is, but pointless essentially.

14:04

What do you guys make of this?

14:07

The journey of a thousand miles begins

14:09

with a step. The way I think about this comes

14:12

from 25 years

14:17

of doing computer science research,

14:19

including quantum computing

14:21

and computational complexity,

14:24

things like that, where we have these gigantic

14:26

aspirational problems that we we

14:29

don't know

14:29

how to solve. And yet,

14:32

a year after year, we do make

14:34

progress. We pick off little subproblems.

14:36

And if we can't solve those, then we find

14:39

subproblems of those. And we keep

14:41

repeating until we find something that we can

14:43

solve. And this is,

14:45

I think, for centuries, the way that science

14:48

has made progress. Now, it is

14:50

possible that this time,

14:52

we just don't have enough time for that

14:54

to work. And I think that

14:56

is what Eliezer is fearful

14:59

of, that we just

14:59

don't have enough time for the ordinary

15:02

scientific process to

15:04

work before AI becomes too powerful,

15:08

in which case, you start talking about

15:10

things like a global moratorium

15:14

enforced with the threat of war. I

15:17

am not ready to go there. I

15:20

could imagine circumstances where

15:22

maybe I say, gosh, this

15:25

looks like such an imminent threat that we

15:27

have to. But I tend to be very, very worried in

15:33

general about causing a catastrophe

15:35

in the course of trying to prevent a catastrophe.

15:38

And I think when you're

15:40

talking about threatening

15:43

airstrikes against data centers or things like that,

15:45

then that's an obvious worry. I'm

15:48

sort of somewhere in between, I guess. I

15:51

don't think that there's... So I'm somewhat

15:53

in between here.

15:54

I'm with Scott that we are not

15:57

at the point where we should be bombing data centers. I don't think we're going

15:59

to be able to do that.

15:59

close to that. I'm much less what

16:02

the right word is to use here. I

16:04

don't think we're anywhere near as close to

16:07

AGI as I think Eliezer sometimes sounds

16:09

like. I don't think GPT-5 is

16:11

anything like AGI, and I'm not particularly

16:14

concerned about who gets it first and so

16:16

forth. On the other hand, I think that we're

16:18

in a sort of dress rehearsal mode. Nobody

16:21

expected GPT-4, really, chat

16:23

GPT to percolate as fast as it

16:25

did. It's a reminder that there's

16:28

a social side to all of this, how software

16:30

gets distributed matters, and

16:32

there's a corporate side. It was a kind

16:34

of galvanizing moment for me when Microsoft

16:37

didn't pull Sydney, even though Sydney did some

16:39

awfully strange things. I thought they would take

16:41

it for a while, and it's a reminder that they can make whatever decisions

16:44

they want. We kind of multiply that by

16:46

Eliezer's concerns about what do

16:48

we do and at what point what

16:51

would be enough to cause

16:53

problems. It is a reminder, I think, that we

16:55

need, for example, to start roughing out these

16:58

international treaties now because there could become

16:59

a moment where there is a problem. I don't think

17:02

the problem that Eliezer sees is here now,

17:04

but maybe it will be. And maybe when it does

17:07

come, we will have so many people pursuing

17:09

commercial self-interest and so little infrastructure

17:12

in place we won't be able to do anything. So I think

17:14

it really is important to think now. If

17:16

we reach such a point, what are we going to

17:18

do? What do we need to build in place

17:20

before we get to that point?

17:26

Question. Have you ever wondered about the impact

17:28

of your charitable donation? I mean, how

17:30

much good can your contribution really do?

17:33

The answer is not always easy to find, but if you're

17:35

invested in making a difference in the world, then

17:37

allow me to introduce you to GiveWell. GiveWell

17:40

is not your typical charity platform. They

17:42

spend countless hours researching charitable

17:44

organizations, diving deep into evidence

17:46

and hard data. Over the past 15 years,

17:49

they've been vetting, scrutinizing, and only

17:51

recommending the highest impact opportunities

17:53

they found. Their process involves a team

17:56

of 25

17:56

researchers who put in over 40,000 hours

17:58

each year

17:59

to maximize the impact of your donations.

18:02

Over 100,000 donors have already

18:04

trusted GiveWell to allocate their donations

18:07

wisely. For just $5, you can provide

18:09

a bed net to prevent malaria. $7 can

18:11

provide a child with malaria treatment through the high

18:14

malarial season. Just $1 can deliver

18:16

a vitamin A supplement to a child, a deficiency

18:18

of which can increase mortality rates. Even

18:21

as little as 160 bucks can vaccinate

18:23

an infant, helping prevent diseases and reduce

18:25

child mortality. This is how GiveWell operates.

18:28

They measure, they model, they review, and they forecast

18:30

the impact, all to ensure that your donations

18:33

are used in the best possible way. And

18:35

the best part? All of their research and

18:37

recommendations are available for

18:39

free on their website. So if you wanna

18:41

make an informed decision about high impact

18:44

giving, head over to givewell.org. When

18:46

you make a donation, let them know you heard

18:49

about them from me. Just select podcast

18:51

and enter conversations with Coleman at checkout.

18:54

Remember, GiveWell does not take a cut from

18:56

your donation. All of it goes directly to

18:58

help those who need it most. Once again, head

19:01

over to givewell.org. And when you

19:03

make a donation, let them know you heard about

19:05

them through me. Just select podcast

19:07

and enter conversations with Coleman at checkout.

19:10

What's a game where no

19:12

one wins?

19:17

The waiting

19:20

game. When it comes to hiring, don't

19:22

wait for great talent to find you. Find them

19:24

first. When you're hiring, you need Indeed.

19:27

Indeed's the hiring platform where you can attract, interview

19:29

and hire all in one place. Instead

19:31

of spending hours on multiple job sites

19:33

searching for candidates with the right skills, Indeed's

19:36

a powerful hiring platform that can help you

19:38

do it all. They streamline hiring with powerful

19:40

tools that find you matched candidates. With Instant

19:42

Match, over 80% of employers get quality

19:45

candidates whose resume on Indeed matches their

19:47

job description the moment they sponsor a

19:49

job. Candidates you invite to apply are

19:51

three times more likely to apply to your job

19:53

than candidates who only see it in search. Indeed

19:56

gets you one step closer to the hire by

19:58

immediately matching you with quality. candidates.

20:01

Indeed does the hard work for you. Indeed

20:03

shows you candidates whose resumes on Indeed

20:05

fit your description immediately after

20:07

you post so you can hire faster. Indeed's

20:10

hiring platform matches you with quality

20:12

candidates instantly. Even better, Indeed's

20:14

the only job site where you only pay for applications

20:17

that meet your must-have requirements. Indeed

20:19

is an unbelievably powerful hiring

20:21

platform, delivering four times more

20:23

hires than all other job sites combined,

20:25

according to TalentNest. Join more than 3

20:28

million businesses worldwide

20:29

that use Indeed to hire great

20:32

talent fast. Start hiring now with

20:34

a $75 sponsored job credit

20:36

to upgrade your job posts at Indeed.com

20:39

slash conversations. Offer only

20:41

good for a limited time. Claim your $75 credit right

20:44

now at Indeed.com slash

20:47

conversations. Just go to Indeed.com

20:49

slash conversations and support the show

20:51

by saying you heard about it on this podcast. Indeed.com

20:54

slash conversations. Terms and conditions apply.

21:01

So we've been talking about

21:03

this concept of artificial general intelligence

21:07

and I think it's worth asking

21:09

whether that is a useful

21:12

coherent concept. So

21:14

for example if I were to think by analogy

21:16

to athleticism and think

21:18

of the moment when we build a machine

21:21

that has say

21:23

artificial general athleticism meaning

21:26

it's you know better than LeBron

21:28

James at basketball but also better

21:30

at the at curling than the world's best

21:33

curling player and also better at soccer

21:35

and also better at archery and

21:38

so forth.

21:39

It would seem to me

21:41

that there's something a bit strange

21:43

as framing it as having reached a

21:46

point on a single continuum.

21:49

It seems to me you would sort of have to build

21:52

each capability each sport

21:54

individually and then somehow figure

21:57

how to package them all into one robot

21:59

with without each skill set detracting

22:02

from the other. Is that a disanalogy?

22:06

Do you all picture this intelligence

22:09

as sort of one dimension, one

22:11

knob that is going to get turned

22:13

up along a single

22:16

axis? Or do you think that way

22:18

of talking about it is misleading

22:20

in the same way that I kind of just sketched

22:23

out?

22:23

I would absolutely not accept that.

22:26

I like to say that intelligence is not a one dimensional

22:28

variable. There's many

22:31

different aspects to intelligence.

22:34

There's not, I think, going to be a magical moment when

22:36

we reach the singularity or something like that.

22:40

I would say that the core of artificial general intelligence

22:42

is the ability to flexibly

22:44

deal with new problems that you haven't

22:47

seen before. Current

22:49

systems can do that a little bit, but not very well. My

22:52

typical example of this now is GPT-4

22:55

is exposed to the game of chess, sees

22:57

lots of games of chess, it sees the rules of chess,

23:00

but it never actually figures out the rules of chess

23:02

and makes illegal moves and so forth. It's

23:04

in no way a general intelligence that can

23:06

just pick up new things. Of course, we have things

23:08

like AlphaGo that can play a certain set of

23:11

games, or AlphaZero really, but

23:13

we don't have anything that has the generality

23:15

of human intelligence. Human

23:18

intelligence is just one example of general

23:20

intelligence. You could argue that chimpanzees

23:22

or crows have another variety of general intelligence.

23:26

I would say the current machines don't really have it,

23:29

but they will eventually.

23:30

I think a priori, it

23:33

could have been that you

23:35

would have math ability,

23:37

you would have verbal ability, you'd

23:39

have ability to understand

23:42

humor, and they'd all be just completely

23:44

unrelated to each other. That is

23:46

possible. In fact,

23:50

already with GPT, you can say that

23:52

in some ways, it already is a

23:54

superintelligence. It knows

23:57

vastly more, can converse on

23:59

a vastly greater scale.

23:59

range of subjects than any human

24:02

can. And in other ways,

24:05

it seems to fall short of

24:08

what humans know or

24:10

can do. But you

24:13

also see this sort of generality

24:16

just empirically. So, I mean,

24:19

GPT was sort of trained

24:22

on all the text on the internet.

24:26

You know, let's say most of the

24:28

text on the open internet.

24:29

So it was just one

24:32

method. It was not explicitly

24:35

designed to write code, and yet

24:37

it can write code. And at the

24:39

same time as that ability emerged,

24:42

you also saw the ability to

24:44

solve word problems, like

24:47

high school level math. You saw

24:49

the ability to write poetry. This

24:52

all came out of the same system without

24:54

any of it. You know, being explicitly

24:57

optimized for. And so I

24:59

feel like I need

24:59

to interject one important thing, which is

25:02

it can do all these things, but none of them all that reliably. OK.

25:06

Nevertheless, I mean, compared to, you know,

25:08

what let's say what my expectations

25:10

would have been if you'd ask me 10 or 20 years

25:12

ago, I think that the level of generality

25:15

is pretty remarkable. And, you

25:18

know, and it does lend support to the

25:20

idea that there is some sort of general

25:22

quality of understanding there where

25:24

you could say, for example, that GPT-4

25:27

has more of it than GPT-3,

25:29

which

25:29

in turn has more than GPT-2. And

25:32

I would say that it does

25:34

seem to me like it's presently pretty unambiguous

25:37

that GPT-4 is in some sense

25:39

dumber than in adult or

25:42

even teenage human. That's not obvious

25:44

to me. Why would you say that? I think that we

25:46

will eventually get that obvious, too. I

25:49

mean, to take the example I just gave you a minute

25:51

ago, it never learns to play chess, even

25:53

with a huge amount of data. So

25:56

it will play a little bit of chess.

25:58

It will memorize the openings and be able to play chess.

25:59

okay for the first 15 moves, but

26:02

it gets far enough away from what it's trained on and it falls

26:04

apart. This is characteristic of

26:06

these systems. It's not really characteristic in the same

26:08

way of adults or even

26:10

teenage humans. Almost

26:12

anything that it does, it does unreliably. And to

26:14

give another example, you can ask a

26:17

human to write a biography of someone and don't

26:19

make stuff up. And you really can't ask GPT

26:21

to do that. Yeah. Like, it's a bit

26:24

difficult because you could always be cherry picking

26:26

something that humans aren't usually good at. But

26:28

to me, it does seem like there's this

26:29

broad range of problems that don't

26:32

seem especially to play to humans

26:34

strong points or machine

26:37

weak points where GPT4 will

26:39

do no better than a

26:42

seven year old on those problems. Hold on. Can I

26:44

interject here? I do

26:46

feel like these examples are cherry

26:48

picked because if I just take a different,

26:51

very typical example, I'm writing an op-ed

26:54

for the New York Times, say, about any given

26:56

subject in the world. And my choice is to

26:58

have a smart 14

26:59

year old next to me with anything that's

27:02

in his mind already or GPT. And

27:04

there's no comparison, right? So

27:07

which of these sort of examples is

27:09

the litmus test for a human or a intelligent?

27:12

If you did it on a topic

27:14

where it couldn't rely on memorized

27:16

text, you might actually change your mind

27:19

on that. So the thing about writing a

27:21

Times op-ed is most of the things that

27:24

you propose to it, there's actually something

27:26

that it can pastiche together from its dataset.

27:28

That doesn't mean that it really understands what's going

27:31

on. It doesn't mean that that's a general capability.

27:33

Also, as the human, you're doing all the hard

27:35

parts, right? Like, obviously,

27:37

a human is going to prefer if

27:40

a human has a math problem, you're going to rather use a calculator

27:43

than another human. And similarly,

27:44

with the New York Times op-ed, you're doing all

27:46

the parts that are hard for GPT4.

27:50

And then you're like asking GPT4 to just do

27:52

some of the parts that are hard for you. You're

27:54

always going to prefer an AI partner rather than a human

27:56

partner within that sort

27:59

of like the human can do all the human stuff and

28:02

you want an AI to do whatever the AI is good

28:04

at at the moment. An analogy that's maybe

28:06

a little bit helpful here is driverless cars. It

28:09

turns out that

28:10

on highways and ordinary traffic, they're probably

28:13

better than people. In unusual

28:15

circumstances, they're really worse than people. So

28:17

Tesla not too long ago ran into a jet

28:19

and human wouldn't do that. Like slow

28:22

speed being summoned across a parking lot. Human

28:24

would never do that. So there are different

28:26

strengths and weaknesses. The strengths of

28:28

a lot of the current kinds of technology

28:30

is that they can either pastiche together

28:33

or make not literal analogies

28:35

when we go into the details, but to stored

28:38

examples and they tend to be poor when

28:40

you get to outlier cases. And

28:43

that's persistent across most of the technologies

28:46

that we use right now. And so if you

28:48

stick to stuff in which there's a lot of data,

28:50

you'll be happy with the results you get from these systems.

28:53

You move far enough away, not so much.

28:56

And what we're going to see over time is that

28:58

the length of the debate about whether or not

29:00

it's still dumber than you gets longer

29:02

and longer and longer. And then

29:05

if things are allowed to just keep running and nobody

29:07

dies, then at some point that switches over

29:10

to a very long debate about is

29:12

it smarter than you, which then gets shorter

29:14

and shorter and shorter and eventually reaches

29:16

a point where it's pretty

29:19

unambiguous if you're paying attention.

29:20

Now, I suspect that this process

29:22

gets interrupted by everybody dying. In

29:25

particular, there's a question of the point at which it

29:27

becomes better than you, better

29:29

than humanity at building the next edition

29:31

of the AI system and how fast do things snowball

29:34

once you get to that point. Possibly you do not

29:36

have time for further public

29:39

debates or even

29:41

a two hour Twitter space depending on how that goes. I

29:44

mean, some of the limitations of

29:46

JPT are

29:48

completely understandable, just from

29:50

a little knowledge of how it works. It

29:53

does not have an internal memory,

29:57

per se, other than what appears

29:59

on the screen.

29:59

the screen in front of you. So this

30:02

is why it's turned out to be so effective

30:04

to explicitly tell it, like, let's

30:07

think step by step when it's solving

30:09

a math problem, for example. You have to

30:11

tell it to show all of its work

30:13

because it doesn't have an internal

30:15

memory with which to do that. Likewise,

30:18

when people complain about it hallucinating

30:21

references that don't exist, well, the truth

30:24

is when someone asks me for

30:26

a citation, if I'm not

30:28

allowed to use Google, I might

30:29

have a vague recollection of

30:32

some of the authors and I'll probably

30:34

do a very similar thing to what GPT

30:36

does. I'll hallucinate. Right. So

30:38

well, no, there's a great phrase I learned the other day, which is

30:41

frequently wrong, never in doubt. That's

30:43

true. That's true. I'm not

30:46

going to make up a reference with full detail,

30:48

page numbers, titles, so

30:50

forth. I might say, look, I don't remember 2012 or something like

30:52

that. Here's

30:55

GPT-4, what it's going to

30:57

say is 2017, Aronson and Yudkowsky,

30:59

New York Times, pages 13 to 17.

31:03

No, it does need to get much, much better

31:06

at knowing what it doesn't know. And yet,

31:08

already I've seen a noticeable

31:11

improvement there, going from GPT-3

31:13

to GPT-4. For example,

31:16

if you ask GPT-3, prove that

31:18

there are only finitely many prime

31:20

numbers, it will give you a proof, even

31:23

though the statement is false. And

31:25

it will have an error, which is similar

31:27

to the errors on 1,000 exams

31:29

that

31:29

I've graded, just trying

31:32

to get something past you, hoping that

31:34

you won't notice. If you ask GPT-4,

31:37

prove that there are only finitely many prime

31:39

numbers, it says, no, that's a trick question. Actually,

31:42

there are infinitely many primes. And here's why.

31:44

Yeah. Part of the problem with doing the science

31:47

here is that I think you would know better since

31:49

you work part-time or whatever to open AI.

31:52

But my sense is that a lot of the examples

31:54

that get posted on Twitter, particularly

31:57

by the likes of Meme and other critics,

31:59

are not really good.

31:59

or other skeptics, I should say, is

32:02

the system gets trained on those. So,

32:05

you know, almost everything that people write about

32:07

it, I think, is in the training set. It's

32:09

hard to do the science when the system's constantly

32:11

being trained, especially in the RLA-HF

32:14

side of things. And we don't actually know what's in GPT-4,

32:17

so we don't even know if they're regular expressions

32:19

and, you know, simple rules to match things. So we

32:22

can't do the kind of science we used to be able to

32:24

do.

32:25

This conversation, this

32:27

subtree of the conversation, I think, has no

32:30

natural endpoint. So if I can sort

32:32

of zoom out a bit, I think there's a, you

32:34

know, pretty solid sense in which

32:37

humans are more generally intelligent than chimpanzees.

32:40

As you get closer and closer to

32:42

the human level, I

32:44

would say that the direction

32:47

here is still clear, that the comparison is still clear.

32:50

We are still smarter than GPT-4. This

32:52

is not going to take control of the world from us. But,

32:55

you know, the conversations get longer. It

32:58

gets, the definitions start to break

33:00

down around the edges. But I

33:02

think it also, as you keep going, like it comes

33:04

back together again, there's a point,

33:07

and possibly this point is like very close to

33:09

the point of time to where everybody dies. So maybe we

33:11

don't ever like see it in a podcast,

33:13

but there's a point where it's, you know, unambiguously

33:16

smarter than you. And including

33:19

like the spark of creativity,

33:22

being able to deduce things quickly

33:25

rather than with tons and tons of extra evidence,

33:27

strategy, cunning, modeling

33:29

people, figuring out how to manipulate

33:32

people.

33:33

So let's stipulate, Eliezer,

33:35

that we're going to get to machines that can do

33:37

all of that. And then the question is, what

33:39

are they going to do? Is it a certainty

33:42

that they will make our annihilation

33:44

part of their business? Is it a possibility?

33:47

Is it an unlikely possibility? I

33:49

think your view is that it's a certainty. I've

33:51

never really understood that part. It's a

33:53

certainty on the present tech is

33:55

the way I would put it. Like if that happened, so

33:58

in particular, like if that.

33:59

happen tomorrow, then you know, Modulo

34:02

Cromwell's rule never say certain. Like

34:05

my probability is like, yes, Modulo, the

34:08

chance that my model is somehow just completely

34:10

mistaken. If we got 50 years

34:13

to work it out and unlimited

34:15

retries, I'd be a lot more confident. I

34:18

think that'd be pretty OK. I think we'd make it. The

34:21

problem is that it's a lot harder to

34:24

do science when your first wrong try destroys

34:26

the human species, and then you don't get to try again.

34:28

I mean, I think there's something,

34:29

again, that I agree with and something I'm a little

34:32

bit skeptical about. So

34:33

I agree that the amount of time we have

34:36

matters. And I would also agree

34:38

that there's no existing technology

34:41

that

34:42

solves the alignment problem that gives a moral

34:44

basis to these machines. I mean, GPT-4

34:46

is fundamentally amoral. I don't think

34:48

it's immoral. It's not out to get us. But

34:51

it really is amoral. It

34:53

can answer trolley problems because there are trolley problems

34:56

in the data set. But that doesn't mean that it really has a moral

34:59

understanding of the world. And so if

35:01

we get to a very smart machine

35:03

by all the criteria that we've talked

35:06

about, and it's amoral, then that's a problem

35:08

for us. And there's a question of whether

35:10

if we can get to smart machines,

35:14

whether we can build them in a way that will have some

35:16

moral basis. And I think we need

35:18

to make progress. Well, I

35:20

mean, the first try part I'm not willing to let

35:22

pass. So I understand,

35:25

I think, your argument there. Maybe you should spell it out.

35:28

I think that we probably get more than one

35:30

shot and that it's not as

35:32

dramatic and instantaneous

35:33

as you think. I do

35:36

think one wants to think about sandboxing. One

35:38

wants to think about distribution. But I mean,

35:40

let's say we had one evil super

35:42

genius now who is smarter than everybody

35:44

else. Like, so what? One

35:47

super smart. Say again? Not

35:49

just a little smarter. Even a lot smarter.

35:52

Most super geniuses

35:56

aren't actually that effective. They're not that focused.

35:58

They were focused on other things.

35:59

You know, you're kind of assuming that

36:02

the first super genius AI

36:04

is gonna make it its business to annihilate

36:06

us And that's the part where I I still

36:08

am a bit stuck in the argument.

36:10

Yeah Some

36:12

of this has to do with the notion that

36:15

if you do a bunch of training you

36:17

start to get Gold direction

36:19

even if you don't explicitly train on that That

36:22

goal direction is a natural way to

36:24

achieve higher capabilities the reasons why

36:27

reason why humans want things is that wanting

36:29

things is an effective way of getting things and

36:32

and so natural selection in

36:34

the process of selecting

36:36

exclusively on reproductive fitness just

36:38

on that one thing got us to

36:41

want a bunch of things that correlated with reproductive

36:44

fitness thing the ancestral distribution because wanting

36:47

having intelligences that

36:49

want things is a good way of getting

36:51

things that's in a sense like

36:54

You know wanting comes from the same place as

36:56

intelligence itself and you could even

36:58

you know from a certain technical standpoint on expected

37:01

utility Say that intelligence is a special

37:03

is a very effective way of wanting planning

37:06

plotting pass through time that lead to particular outcomes

37:09

So so part of it is that I think it I

37:12

do not think you get like the breeding super intelligence

37:14

that Wants nothing because I don't think

37:16

that wanting an intelligence can be

37:18

pride internally pride apart that easily I think

37:21

that the way you get super intelligence is is that

37:24

there are things that have gotten good at Organizing

37:27

their own thoughts and have good taste

37:29

in which thoughts to think

37:30

and that is where the high capabilities

37:33

come from Can I put a point to you le is around

37:35

it on this and then that does mean that they have internal

37:37

Let me just put this point to you I think it

37:39

look can let me just put the following point to you which I think

37:42

In my mind is similar to what Gary was saying As

37:49

We dive deeper into the heart of summer with

37:51

the Sun beaming down and the days inviting

37:53

us all to be more active We all need

37:55

wholesome convenient meals to keep us

37:57

going. I know that's certainly true for me And

38:00

that's where Factor comes in, America's number

38:02

one ready to eat meal kit. I have to tell

38:04

you, as someone who gets six protein-rich

38:06

meals from Factor every week, it's been

38:08

a game changer for me. The flavorful and

38:10

nutritious ready to eat meals are delivered right

38:12

to my door, saving me time and keeping me on

38:15

track with my wellness goals. We all know the struggle.

38:17

Summer plans, keeping us too busy to cook, yet

38:20

we still want to eat well. With Factor, the

38:22

grocery trips, the chopping, prepping, and

38:24

cleaning up are all things of the past, but

38:27

without compromising on flavor or nutritional

38:29

quality. These fresh, never frozen

38:31

meals are ready in just two minutes. All

38:33

I have to do is heat, enjoy, and get back to

38:35

soaking up the sun. Each meal from Factor

38:38

is like a treat. With high quality ingredients

38:40

like broccolini, leeks, asparagus,

38:42

and over 34 weekly restaurant quality

38:44

options like shrimp risotto, green goddess

38:46

chicken, grilled steakhouse filet mignon,

38:49

I always have a flavorful variety to choose

38:51

from. Factor's lunch to go is a lifesaver

38:54

when I'm busy. Wholesome grain bowls

38:56

and salad toppers keep my energy levels up.

38:58

No microwave even required

38:59

for those. So if you're like me and you want to enjoy

39:02

the summer without the hassle of meal prep, get

39:04

Factor. Just choose your meals and savor

39:06

the fresh flavor-packed meals delivered right

39:08

to your door. Here's the special

39:10

offer. Head to factormeals.com slash Coleman50

39:13

and use the code Coleman50 to get 50% off. That's

39:16

Coleman50 at factormeals.com slash

39:19

Coleman50 to get 50% off.

39:25

There's often in philosophy, this

39:27

notion of the continuum fallacy,

39:29

which in the canonical

39:32

example is like, you can't locate

39:35

a single hair that you would pluck from my head where

39:37

I would suddenly go from not bald

39:40

to bald. Or like the example, the even

39:43

more intuitive example is like a color wheel. Like

39:45

there's no single, on a gray

39:47

scale, there's no single pixel you

39:49

can point to and say, well, that's where gray begins

39:52

and white ends. And yet we have

39:54

this conceptual distinction that feels hard

39:56

and fast between

39:57

gray and white and gray and black and

39:59

so forth.

39:59

When we're talking about

40:02

artificial general intelligence or super

40:04

intelligence, you seem to

40:06

operate on a model where either

40:09

it's a super intelligence capable of destroying

40:11

all of us or it's not. Whereas

40:15

intelligence may just be a

40:17

continuum fallacy style spectrum

40:19

where we're first going to see the shades

40:22

of something that's just a bit more

40:24

intelligent than us and maybe it can kill

40:26

five people at most and then it can...

40:30

And when that happens, we're

40:32

going to want to intervene and

40:34

we're going to figure out how to intervene at that level

40:37

and so on and so forth. Well, if it's stupid enough to do it,

40:39

then yeah. Yeah, so... If it's stupid

40:41

enough to do it, then yes. Let

40:44

me buy the identical logic. There

40:46

should be nobody who steals money on

40:48

a really large scale, right? Because

40:50

you could just give them $5 and see if they steal that. And

40:54

if they don't steal that, you know you're good to trust them

40:56

with a billion. I mean, I think that

40:58

in actuality, anyone who did steal

41:00

a billion dollars probably displayed

41:03

some dishonest behavior

41:04

earlier in their life that was

41:07

unfortunately not acted upon

41:09

early enough. I'm actually

41:11

not even... The analogy... Yeah, but...

41:14

Hold on, hold on. The analogy I have pictures

41:16

like we have the first case of fraud that's $10,000

41:18

and then we build systems

41:20

to prevent it, but then they fail with a somewhat

41:23

smarter opponent, but our systems get better and

41:25

better and better. And so we actually prevent

41:27

the billion dollar fraud because of the systems

41:29

put in place in response to

41:31

the $10,000 frauds. I mean,

41:33

I think Coleman's putting his finger on an important

41:36

point here, which is how much do we get to iterate?

41:39

And Eliezer is saying, the minute

41:41

we have a super intelligent system, we won't

41:43

be able to iterate because it's all over immediately.

41:45

Well, there isn't a minute like that. The

41:47

way that the continuum goes to the threshold

41:50

is that you eventually get something that's smart enough

41:53

that it knows not to play its hand

41:55

early. And then if that thing, you

41:57

know, if you are still cranking up the power...

41:59

that and preserving its utility function,

42:02

it knows it just has to wait to be smarter,

42:04

to be able to win. It doesn't play its hand

42:07

prematurely. It doesn't tip you off. It's not in

42:09

its interest to do that. It's in its interest to cooperate

42:11

until it thinks it can win against humanity and

42:13

only then make its move. If it doesn't

42:16

expect the future smarter AIs to be smarter

42:18

than itself, then we might perhaps see these early AIs

42:20

telling humanity, don't build the later

42:22

AIs. And I would be sort

42:25

of surprised and amused if we ended

42:27

up in that particular sort of like science fiction

42:29

scenario

42:29

as I see it. But we're already in like

42:32

something that, you know, me from 10 years ago would have

42:34

called the science fiction scenario, which is the

42:36

things that I'll talk to you without being very smart. I

42:39

always come up, Eliezer, against

42:41

this idea that you're assuming

42:44

that the very bright machines, the super

42:46

intelligent machines will be malicious

42:49

and duplicitous and so forth. And I just

42:51

don't see that as a logical entailment

42:54

of being very smart. I mean,

42:57

they don't specifically want

42:59

as an end in itself for you to be destroyed.

43:02

They're just doing whatever obtains the most

43:04

of the stuff that they actually want, which doesn't specifically

43:07

have a term that's maximized by

43:09

humanity surviving and doing well.

43:12

Why can't you just hard code? Don't

43:15

do anything that will annihilate the human species. Don't

43:18

do anything.

43:18

We don't know how. There is no technology

43:21

to hard code. So there I agree

43:23

with you, but I think

43:25

it's important if I can just run for one second. I

43:28

agree that right now we don't have the technology

43:31

to hard code. Don't

43:33

do harm to humans. But for me, it

43:36

all boils down to a question of are we going to get to

43:38

the smart machines before we make progress

43:40

on that hard coding problem or not? And that

43:42

to me, that means that problem of hard

43:44

coding ethical values is actually

43:47

one of the most important projects that we

43:48

should be working on.

43:50

Yeah. Yeah. And

43:52

I tried to work on it 20 years in advance and capabilities

43:55

are just like running vastly out of alignment.

43:57

When I started working on this 20 years. like

44:00

two decades ago, we were in a sense

44:02

ahead of where we are now.

44:03

AlphaGo is much more controllable

44:06

than GPT-4. So there I agree

44:08

with you. We've fallen in love with a

44:10

technology that is fairly poorly

44:13

controlled. AlphaGo is very easily

44:15

controlled and very well

44:17

specified. We know what it does. We can more or less

44:19

interpret why it's doing it. And everybody's

44:22

in love with these large language models and they're

44:25

much less controlled. And you're

44:27

right, we haven't made a lot of progress on alignment.

44:30

So if we just go on a straight line, everybody

44:32

dies. I think that's, this

44:33

is an important fact. I would almost even

44:36

accept that for argument, but

44:38

ask then, just for the sake of argument, but

44:40

then ask, do we have to be on a straight line?

44:43

I mean, I would agree to the weaker claim

44:46

that, you know, we should certainly

44:48

be extremely worried about the intentions

44:51

of a super intelligence in the same way

44:53

that say chimpanzees should be

44:55

worried about the intentions of the

44:57

first humans that arise.

45:00

And in fact, chimpanzees

45:03

continue to exist in

45:05

our world only at human's pleasure. But I

45:07

think that there were a lot of other considerations

45:09

here. For example, if we imagined

45:12

that GPT-10

45:14

is the first unaligned super

45:17

intelligence that has these

45:20

sorts of goals, well, then, you know, it would be appearing

45:22

in a world where presumably GPT-9,

45:25

you know, already has very wide diffusion

45:28

and where people can use that

45:30

to try to, you know, you know, and GPT-9

45:33

was not destroying the world, you know, by

45:35

assumption. Why does GPT-9

45:37

work with humans instead of with GPT-10?

45:40

Well, I don't know. I mean, maybe, maybe,

45:42

maybe it does work with GPT-10, but,

45:45

you know, I just, I just don't view that

45:47

as a certainty. You know,

45:49

I mean, I think, you know, your certainty

45:51

about this

45:54

is the one place where I really get off

45:56

the train. Same with me.

45:58

I, well, I mean, I'm not asking. asking you

46:00

to share my certainty, I am

46:02

asking the viewers to believe

46:05

that you might end up with like more

46:08

extreme probabilities after you stare

46:10

at things for an additional couple of decades. That

46:12

doesn't mean you have to accept my probabilities immediately,

46:15

but I'm at least asking you to like not treat that as some kind

46:17

of weird anomaly.

46:19

You're just going to find those kinds of situations

46:21

in these debates. My view is that

46:24

I don't find the extreme probabilities

46:26

that you described to be plausible, but

46:28

I find the question that you're raising to be

46:30

important. I think maybe

46:33

straight line is too extreme, but this idea

46:35

that if you just follow current

46:37

trends, we're getting more, I'm sorry, we're

46:39

getting less and less controllable machines

46:42

and not getting more alignment.

46:45

Machines that are more unpredictable, harder to

46:47

interpret, and know better at sticking

46:49

to even a basic principle like be

46:52

honest and don't make stuff up. In fact,

46:54

that's a problem that other technologies don't really have.

46:56

Modeling systems, GPS systems don't make stuff

46:59

up. Google search doesn't make stuff

47:01

up. It will point to things that other people have made

47:03

stuff up, but it doesn't itself do it. In

47:05

that sense, the trend line is not great. I

47:07

agree with that. I agree that

47:10

we should be really worried about that and we should put

47:12

effort into it. Even if I don't agree with

47:14

the probabilities that you attach to it. I mean,

47:16

LESR- Let me interject with the question here. Go

47:19

ahead, Scott. Go ahead, Scott. No,

47:22

I mean, I think that LESR deserves

47:24

sort of eternal credit for raising

47:26

the issue. I was talking about the issues that were facing

47:28

these issues 20 years ago and it was very, very far from

47:31

obvious to most of us that they would be

47:33

live issues. I can say for

47:35

my part, I was familiar with

47:38

LESR's views since 2006 or so. When

47:43

I first encountered them, I

47:46

knew that there was no principle that this

47:48

scenario was impossible, but I just

47:51

felt like, well, supposing I agreed with

47:53

that,

47:54

what do you want me to do about it? Where

47:56

is the research program that has any hope

47:58

of making progress?

47:59

I mean, there's one question

48:02

of what are the most important problems in the world,

48:04

but in science, that's necessary

48:06

but not sufficient. We need something that we can make

48:08

progress on. And

48:11

that is the thing that I think

48:13

has changed just

48:16

recently with the advent of

48:18

actual very powerful AIs. And

48:21

so the sort of irony here is

48:23

that as Eliezer has

48:25

gotten much more pessimistic,

48:28

unfortunately, in the last few years

48:29

about alignment,

48:32

I've sort of gotten more optimistic.

48:35

I feel like, well, there is a research

48:37

program that we can actually

48:39

make progress on now. Yeah,

48:42

your research program is going to take 100 years and we don't

48:44

have 100 years. I don't know how long it will take. We

48:46

don't know that. Exactly. We

48:48

don't know. I think the argument that we should

48:50

put a lot more effort into it is clear.

48:53

I think the argument that will take 100 years is totally

48:55

unclear. I mean, I'm not even sure you can do

48:57

it in 100 years because there's the basic problem of

48:59

getting it right on the first try. And

49:01

the way these things are supposed to work in science is

49:04

that you have your bright-eyed optimistic youngsters

49:06

with their vastly oversimplified, hopelessly

49:08

idealistic, optimistic plan. They charge

49:11

ahead. They fail. They learn

49:13

a little cynicism. They learn a little pessimism. They

49:15

learn it's not as easy as that. They try

49:17

again. They fail again. They start to

49:19

build up something like battle-hardening.

49:23

And they find out how

49:25

little is possible to them. Aliezer,

49:27

this is a place where I just really don't agree

49:29

with you. So I think there's all kinds of things we can

49:31

do that are sort of of the flavor of

49:34

model organisms or simulations and

49:36

so forth. And we just mean it's

49:38

hard because we don't actually have a super intelligence

49:40

so we can't fully calibrate. But it's

49:43

a leap to say that there's nothing iterative that

49:45

we can do here or that we have to get it right

49:47

on the first time. I mean, I certainly see

49:49

a scenario where that's true. Where getting it

49:52

right on the first time

49:53

does make the difference. But

49:55

I can see lots of scenarios where it doesn't and where we do

49:57

have time to iterate before it happens, after

49:59

it happens.

49:59

happens. It's really not a single moment,

50:02

but I'm, you know, idealizing. I mean, the

50:04

problem is getting anything that generalizes

50:07

up to super intelligent level where

50:09

past some threshold level, the

50:11

minds may find that in their own interest to start lying

50:13

to you.

50:14

Even if that happens before super intelligent. Even

50:16

that, like I don't see a logical

50:19

argument that you can't emulate that or

50:21

study it. I mean, for example, you could, I'm just

50:24

making this up as I go along, but for example, you could study

50:26

what can we do with sociopaths who

50:28

are often very bright and, you know,

50:31

not tethered to our value. But yeah,

50:34

what can a,

50:35

what, what strategy can a

50:37

like 70 IQ honest person

50:40

come up with an invent themselves by

50:42

which they will outwit and defeat a 130 IQ sociopath.

50:45

All right. Well, there you're not being fair either in the sense

50:48

that, you know, we actually have lots of 150 IQ

50:51

people who could be working on this problem collectively

50:54

and there's, there's value in collective

50:56

action. There's literature. What I see,

50:59

what I see that gives me pause is that, is

51:01

that the people don't seem to appreciate what

51:03

about the problem is hard.

51:05

Even at the level where like 20 years

51:08

ago I could have told you it was hard until,

51:10

you know, somebody like

51:12

me comes along and nags them about it. And then they

51:15

talk about the ways in which they could adapt and be clever.

51:17

But the, but the people's charging straight forwards are

51:20

just sort of like doing it in this supernally

51:22

naive way. Let me share a historical

51:25

example that I think about a lot, which

51:27

is in the early 1900s, almost

51:29

every scientist on the planet who thought

51:32

about biology made a mistake. They all

51:34

thought that genes were proteins.

51:36

And then eventually Oswald Avery did

51:38

the right experiments. They realized that

51:41

genes were not proteins. There was this weird acid

51:43

and it didn't take long after people got

51:46

out of this stock mindset before

51:48

they figured out how that weird acid worked and how

51:51

to manipulate it and how to read the code that it

51:53

was in and so forth. So I

51:55

absolutely sympathize with the fact that

51:57

I feel like the field is stuck right now. I

52:00

think the approaches people are taking to alignment

52:02

are unlikely to work. I'm completely with

52:04

you there. But I'm also, I guess,

52:06

more long-term optimistic that

52:08

science is self-correcting and that we have a chance

52:11

here. Not a certainty, but I think

52:13

if we change research priorities

52:15

from how do we make some money off this large

52:18

language model that's unreliable to how

52:20

do I save the species, we might actually make progress.

52:23

There's a special kind of caution that you need when

52:25

something needs to be gotten correct on the first

52:28

try. I'd be very optimistic if people

52:29

got a bunch of free retries and I didn't

52:32

think the first one was going to kill, you know, the first

52:34

really serious mistake killed everybody and we didn't get

52:36

to try again. If we got free retries, it'd

52:38

be an ordinary, you know, it'd be in some sense an ordinary

52:40

science problem. Look, I can imagine

52:43

a world where we only got one try

52:45

and if we failed, then it destroys

52:47

all life on earth. And so let me

52:50

agree to the conditional statement that if

52:52

we are in that world, then I think that we're screwed.

52:55

I will agree with the same conditional statement. All

52:57

right. Yeah.

52:59

And this gets back to like, you know, if you

53:02

picture by analogy the process of,

53:04

you know, a human baby, which

53:06

is extremely stupid, becoming a human

53:08

adult

53:09

and then just extending that so

53:11

that in a single lifetime, this person

53:14

goes from a baby to

53:17

the smartest being that's ever lived.

53:20

But in the normal way

53:22

that humans develop, which is, you know, it doesn't

53:24

happen on any one given

53:26

day and each sub skill

53:28

develops a little bit at its own

53:30

rate and so forth, it would not

53:33

be at all obvious to me that our concerns

53:36

that we have to get it right vis-a-vis that

53:38

individual the first time. I agree. No,

53:41

well, pardon me. I do think we have to get them right the first

53:43

time, but I think there's a decent chance of getting it right.

53:45

It is very important to get it right the first time. It's

53:48

like you have this one person getting

53:50

smarter and smarter and not everyone else is getting

53:52

smarter and smarter.

53:53

Elias, I mean, one thing that you've talked about

53:55

recently is, you know, if we're all going to

53:58

die, then at least let us die with this. dignity.

54:01

So, you know, I mean,

54:03

some people might care about that more than others,

54:05

but I would say that, you know, one

54:08

thing that death with dignity would mean

54:10

is, well, at least, you know, if

54:12

there if we do get multiple

54:15

retries, and you know, we get a

54:17

eyes that let's say, try to take

54:20

over the world, but are really inept at

54:22

it, and that fail and so forth, at least

54:24

let us succeed in that world, you know, and

54:26

that's at least something that we can imagine

54:28

working on and making progress

54:29

on.

54:30

I mean, you may very it's for

54:32

it is not presently ruled out that you have some

54:35

like, you know, relatively smart

54:38

in some ways, dumb in some other ways, or

54:40

at least like not smarter than human in other ways,

54:42

AI that makes an early shot at

54:44

taking over the world, maybe because it expects future

54:46

AI is to not share its goals and not cooperate

54:49

with it, and it fails. And,

54:51

you know, I mean, the appropriate lesson to learn

54:53

there is to, you know, like shut the whole thing

54:56

down. But, you know, if we

54:58

so yeah, like I would say, so I'd be

55:00

like, yeah,

55:00

sure, like, wouldn't it be good to live in that

55:02

world? And the way you live in that world is that when

55:04

you get that warning sign, you shut it all down.

55:07

Here's a kind of thought experiment. GPT-4

55:10

is probably not capable of annihilating

55:12

us all. I think we agree with that about that.

55:14

But GPT-4 is certainly capable

55:17

of expressing the desire to annihilate

55:19

us all or being, you know, people have rigged

55:22

different versions that are, you know, more

55:24

aggressive and so forth. We

55:26

could say, look, until we can shut

55:28

down those versions,

55:30

you know, GPT-4s that are programmed

55:33

to be malicious by human intent, maybe

55:35

we shouldn't build GPT-5, or at least not GPT-6

55:38

or some other system, etc. We could say, you

55:40

know, what we have right now actually is part

55:42

of that iteration. We have, you know, primitive intelligence

55:45

right now. It's nowhere near as smart as a

55:47

super intelligence is going to be. But even

55:49

this one, we're not that good at constraining.

55:52

Maybe we shouldn't pass go until we

55:54

get this one right. I mean, the

55:56

problem with that from my perspective is that

55:58

I do think you

55:59

you can pass this test

56:01

and still wipe out humanity. Like

56:04

I think that there comes a point where your AI

56:06

is smart enough

56:07

that it knows which answer you're looking for.

56:10

And the point at which it tells you what you want to hear

56:13

is not the point that is internal motivation.

56:15

It's not sufficient, but it might be a logical

56:17

pause point, right? It might be that

56:19

if we can't even pass the test now

56:22

of, you know, controlling a

56:24

deliberate sort of fine

56:26

tune to be malicious version of GPT-4,

56:29

then we don't know what we're talking about and

56:31

we're playing around with fire. So passing that

56:33

test wouldn't be a guarantee that would be in

56:36

good stead with an

56:37

even smarter machine, but we really should

56:39

be worried, I think, that we're not

56:41

in a very good position with respect even to the

56:44

current ones. Gary, I of course

56:46

watched the recent congressional hearing

56:48

where you and Sam Altman were

56:51

testifying, you know, about what

56:53

should be done. Should there be auditing

56:56

of these systems, you know, before training,

56:58

before deployment? And, you know, maybe,

57:00

you know, the most striking thing about

57:03

that session was, you know, just

57:05

how little daylight there seemed to be between

57:07

you and Sam Altman, the

57:09

CEO of OpenAI. You know, he

57:12

was completely on board with

57:14

the idea of, you know, establishing a regulatory

57:17

framework for,

57:21

you know, having to clear the, you

57:24

know, more powerful systems before

57:26

they are deployed. Now, you know, in Eliezer's

57:29

worldview, that still would be woefully

57:31

insufficient shortly,

57:34

and, you know, we would still all be dead. But,

57:36

you know, maybe in your worldview

57:39

that, you know, it sounds like I'm

57:41

not even sure how much daylight there is. I mean,

57:43

the, you know, you now, you know, you know, have a

57:46

very, I think historically

57:48

striking situation where, you

57:50

know, the heads of all of the major

57:52

AI, or almost all

57:55

of the major AI organizations are,

57:57

you know, agreeing, saying, you know, please raise your hand.

57:59

Yes, this is dangerous. Yes, we need to be regulated.

58:02

I mean, I thought it was really striking. In

58:06

fact, I talked to Sam just before the

58:09

hearing started, and I had

58:11

just proposed an international agency

58:13

for AI. I wasn't the first person ever, but I pushed

58:15

it into my TED Talk and an economist op-ed

58:18

a few weeks before. And Sam said

58:20

to me, I like that idea. And

58:23

I said, tell them, tell the Senate. And

58:25

he did. And that kind of astonished me

58:27

that he did. I mean, we've had some

58:29

friction between the two of us in the past. And he

58:31

actually even attributed it to me. He said, I support what

58:33

Professor Marcus said about

58:36

doing international governance.

58:38

And there's been a lot of convergence around

58:40

the world on that. Is that enough to stop

58:43

Eliezer's worries? No,

58:45

I don't think so. But it's an important baby

58:47

step. I think that we do need

58:49

to have some global body that

58:51

can coordinate around these things. I don't think

58:54

we really have to coordinate around super

58:56

intelligence yet. But if we can't do any coordination

58:58

now, then when the time comes, we're not prepared.

59:02

So I think it's great that there's some agreement. I

59:04

worry that OpenAI had this lobbying

59:07

document that just came out that seemed not

59:09

entirely consistent with what Sam

59:11

said in the room. And there's always concerns

59:13

about regulatory capture and so forth. But I think

59:15

it's great that a lot of the

59:17

heads of these companies, maybe with the exception of

59:20

Facebook or Meta,

59:21

are recognizing that there are

59:24

genuine concerns here. I mean, the other moment

59:26

that a lot of people remember from the testimony

59:29

was when Sam was asked what he was most concerned

59:31

about. Was it jobs? And he said, no.

59:33

And I asked Senator Blumenthal to push Sam.

59:36

And Sam was, he could have been more

59:38

candid, but he was fairly candid. And he said he

59:40

was worried about serious harm to the species. I

59:43

think that was an important moment when he said that

59:45

to the Senate. And I think it galvanized a lot

59:48

of people that he said it.

59:49

So can we dwell on a moment? I mean, we've

59:51

been talking about the,

59:53

depending on your view, highly likely

59:56

or

59:57

tail risk scenario

59:59

of of humanity's extinction

1:00:02

or significant destruction,

1:00:04

it would appear to me by the same token,

1:00:06

if

1:00:07

those are plausible

1:00:10

scenarios we're talking about, then the

1:00:12

opposite maybe we're talking about as

1:00:14

well. What does it

1:00:16

look like to have a super intelligent

1:00:20

AI that

1:00:22

really, as a feature

1:00:24

of its intelligence,

1:00:26

deeply understands human beings,

1:00:29

the human species, and

1:00:31

also has a deep desire

1:00:34

for us to be as happy as

1:00:36

possible. Oh, as happy as possible? What does

1:00:38

that world look like? And do you think that's- Yes,

1:00:40

that looks like- No, no, maybe not as happy as

1:00:42

physically possible. ... to make them as happy as possible.

1:00:45

But more like a parent wants their child

1:00:47

to be happy, right? That may not involve

1:00:50

any particular scenario, but is

1:00:52

generally quite concerned about the well-being

1:00:55

of the human race and is also super intelligent.

1:00:58

Honestly, I'd rather have machines work

1:01:00

on medical problems than happiness

1:01:03

problems. I think there's maybe more

1:01:05

risk of misspecification

1:01:07

of the happiness problems.

1:01:09

Whereas if we get them to work on Alzheimer's

1:01:12

and just say, like, figure out what's going on,

1:01:14

why are these plaques there? What can you do about it? Maybe

1:01:17

there's less harm that might come from- You

1:01:19

don't need super intelligence for that. That sounds like

1:01:21

an alpha fold three problem or an alpha

1:01:23

fold four problem. Well, alpha fold

1:01:25

doesn't really do that. This is all somewhat

1:01:28

different than the question I'm asking. It's

1:01:30

not really even

1:01:32

us asking a super intelligence

1:01:34

to do anything because we've already been entertaining

1:01:36

scenarios where the super intelligence has its

1:01:38

own desires independent of us. Yeah, I'm not

1:01:40

real thrilled with that. Do you think at all about a scenario

1:01:43

where- I don't think we want

1:01:45

to leave what their

1:01:48

objective functions are, what their desires

1:01:50

are to them working them out with no

1:01:52

consultation from us, with no human in the loop.

1:01:55

I mean, especially given our current understanding

1:01:58

of the technology.

1:01:59

current understanding of how to keep a

1:02:02

system on track, doing what we want to do

1:02:04

is pretty limited. And so, you

1:02:06

know, taking humans out of the loop there, it sounds

1:02:08

like a really bad idea to me, at least in

1:02:10

the foreseeable future. I would want to see much

1:02:13

better alignment technology. No, I agree. Before

1:02:16

we want to keep free reign. So if we

1:02:18

had the textbook from the future, like we

1:02:20

have the textbook from 100 years in the future,

1:02:22

which contains all the simple ideas that actually

1:02:24

work in real life, as opposed to, you know, the

1:02:27

complicated ideas and the simple ideas that don't

1:02:29

work in real life, the

1:02:29

equivalent of relus instead of sigmoids

1:02:32

for the activation functions, you know, 100 years,

1:02:34

the textbook from 100 years in the future, you can

1:02:36

probably build a super intelligence

1:02:39

that'll want anything you

1:02:41

can, anything that's coherent to want

1:02:43

anything you can, you know, figure out how

1:02:45

to say describe coherently

1:02:48

pointed at your own mind and tell you to figure out what

1:02:50

what it is you meant for to want. And

1:02:53

you know, you could get the you could get the glorious transhumanist

1:02:55

future. You could get the happily ever after anything's,

1:02:58

you know, anything's possible that doesn't violate

1:03:00

the laws of physics. The

1:03:03

trouble is doing it in real life. And, you know, I'm

1:03:05

the first try. But yeah, so

1:03:07

like, you know, could

1:03:09

the whole thing that we're aiming for here is

1:03:12

to colonize all the galaxies we can

1:03:14

reach before somebody else gets them first

1:03:16

and turn them into galaxies full

1:03:18

of, you know, complex, sapient life living

1:03:20

happily ever after. You know, that's that's

1:03:22

the goal. That's still the goal. Even

1:03:24

if we, you know, even even even

1:03:27

when I call for like, you know, a

1:03:29

permanent

1:03:29

moratorium on a I'm

1:03:32

not trying to prevent us from from colonizing

1:03:34

the galaxies, you know, like humanity

1:03:36

forbid, more more like let's you

1:03:39

know, let's do some human intelligence augmentation

1:03:41

with alpha fold for and before we try

1:03:44

building TPT eight. One

1:03:46

of the few scenarios that I think we can

1:03:48

clearly rule out here is an AI

1:03:51

that is excess essentially dangerous, but

1:03:53

also boring. Right. I mean, I think

1:03:55

anything that has the capacity to kill

1:03:58

us all right would have, you know,

1:03:59

if nothing

1:04:02

else, pretty amazing capabilities. And

1:04:04

those capabilities could also

1:04:06

be turned to, solving

1:04:09

a lot of humanities problems, if

1:04:12

we were to solve the alignment problem. I

1:04:14

mean, humanity had a lot of

1:04:16

existential risks,

1:04:19

before AI came on the scene, right?

1:04:22

I mean, there was the risk of nuclear

1:04:25

annihilation, there was the risk of runaway

1:04:27

climate change. And I would love to see an AI

1:04:32

that could help us with such things. I

1:04:34

would also love to see an

1:04:36

AI that could sort of help

1:04:38

us just solve some of the mysteries

1:04:41

of the universe. I mean, how

1:04:43

can one possibly not be

1:04:45

curious to know what

1:04:47

such a being could teach us? I

1:04:50

mean, for the past year, I've tried to use

1:04:52

GPT-4 to produce original

1:04:55

scientific insights, and I've

1:04:57

not been able to get it to do that. And

1:05:00

I don't know whether I should feel disappointed

1:05:02

or relieved by that, but I think

1:05:05

the better part of me should just,

1:05:07

is the part that should just want to see the

1:05:10

great mysteries of

1:05:12

existence, of why is

1:05:14

the universe quantum mechanical? Or how

1:05:17

do you prove the Riemann hypothesis? It

1:05:19

should just want to see these mysteries solved.

1:05:22

And if it's to be

1:05:25

by AI, then fine. Let

1:05:28

me give you a lesson

1:05:29

in epistemic humility.

1:05:32

We don't really know whether

1:05:34

GPT-4 is net positive

1:05:37

or net negative.

1:05:38

There are lots of arguments you can make. I've been in

1:05:40

a bunch of debates where I've had to take

1:05:43

the side of arguing that it's a net

1:05:45

negative, but we don't really know.

1:05:47

If we don't know that for GPT-4. I say

1:05:49

it's the prophet for far. What was

1:05:51

the invention of agriculture, net positive?

1:05:54

I'd say it was net positive. You could go back

1:05:56

way further. The point is, if I can just finish

1:05:58

the quick thought

1:05:59

or whatever, I don't think anybody

1:06:02

can reasonably answer that. We

1:06:05

don't yet know all of the ways in which

1:06:07

GPT-4 will be used for good.

1:06:09

We don't know all of the ways in which bad actors will

1:06:11

use it. We don't know all the consequences. That's

1:06:14

going to be true for each iteration. It's probably

1:06:16

going to get harder to compute for

1:06:18

each iteration, and we can't even do it now.

1:06:21

I think that we should

1:06:23

realize that, to realize our own limits

1:06:26

in being able to assess the negatives

1:06:29

and positives, maybe we can think about better

1:06:31

ways to do that than we currently have. I

1:06:34

think you've got to have a guess. My

1:06:37

guess is that so far, not looking into

1:06:39

the future at all, GPT-4 has been

1:06:41

net positive. Maybe. We

1:06:43

haven't talked about the various

1:06:46

risks yet, and it's still early, but that's

1:06:48

just a guess is the point. We

1:06:51

don't have a way of putting it on a spreadsheet

1:06:53

right now or whatever. We don't

1:06:56

really have a good way to quantify it.

1:06:58

It's not out of control yet. By

1:07:00

and large, people are going to be using GPT-4

1:07:03

to do things that they want, and

1:07:05

the relative cases where they manage to injure themselves

1:07:07

are rare enough to be news on Twitter. For

1:07:10

example, we haven't

1:07:12

talked about it, but some bad actors

1:07:14

will want to do is to influence the

1:07:17

US elections and try to undermine democracy

1:07:20

in the US. If they succeed in that, I think

1:07:22

there's pretty serious long-term consequences

1:07:24

there.

1:07:24

I think it's OpenAI's responsibility

1:07:27

to step up and run the 2024 election itself.

1:07:32

I can pass that along. Is that a joke?

1:07:35

No, as far as I can

1:07:37

see, the clearest concrete harm

1:07:40

to have come from GPT so

1:07:42

far is that tens

1:07:44

of millions of students have now used it to

1:07:47

cheat on their assignments. I have

1:07:49

been thinking about that, and I have been trying to come

1:07:51

up with solutions to that. At the

1:07:53

same time, the

1:07:54

positive utility has included.

1:07:57

I mean, I'm a theoretical commander.

1:08:00

computer scientist, which means one who

1:08:02

hasn't written any serious code

1:08:04

for about 20 years. And

1:08:07

I realized just a month or two ago, I can

1:08:09

get back into coding. And the way I

1:08:11

can do it is I just asked GPT to

1:08:13

write the code for me. And I wasn't

1:08:16

expecting it to work that well. And unbelievably,

1:08:19

it often just does exactly

1:08:21

what I want on the first try. So I

1:08:23

mean, you know, I am

1:08:26

getting utility from it, rather

1:08:28

than just, you know, seeing

1:08:29

it as an interesting

1:08:33

research object. And, you

1:08:36

know, and, you know, I can imagine

1:08:38

that that hundreds of millions of people are going

1:08:40

to be deriving utility from

1:08:42

it in those ways. I mean, like, most of the tools

1:08:45

to help them derive that utility are not

1:08:47

even out yet. But they're, they're coming

1:08:49

in the next couple of years. I mean, part of the reason

1:08:51

why I'm worried about the focus on short term

1:08:54

problems is that I suspect that the short term problems

1:08:56

might very well be solvable and we will be left with

1:08:58

the long term problems after

1:08:59

that. Maybe we can solve the like,

1:09:02

it wouldn't surprise me very much if like in 2025,

1:09:05

the well,

1:09:07

you know, like the large

1:09:09

language, there are large language models that just

1:09:11

don't make stuff up anymore. And

1:09:14

yet, if any yet, you know, and

1:09:16

yet the super intelligence still kills everyone because

1:09:18

they weren't the same problem. Well, you know,

1:09:21

you know, we just need to figure out how

1:09:23

to delay the apocalypse

1:09:26

by at least one year per year of research

1:09:28

invested. What does that delay

1:09:30

look like if it's not just a moratorium? Well,

1:09:33

I don't know. That's why it's research. OK,

1:09:35

so but but possibly one ought to say to

1:09:37

the politicians and the public, and by the way,

1:09:39

if we had a super intelligence tomorrow, our research wouldn't

1:09:41

be finished and everybody would drop dead. You

1:09:43

know, it's kind of ironic. The biggest

1:09:45

argument against the pause letter was

1:09:48

that if we slow down for

1:09:50

six months,

1:09:51

then China will get ahead of us and get GPT

1:09:53

five before

1:09:55

we will. But there's probably always

1:09:57

a counter argument of maybe roughly.

1:09:59

equal strength, which is if we move six

1:10:02

months faster on this technology,

1:10:04

which is not really solving the alignment problem,

1:10:07

then we're reducing our room to

1:10:09

get this solved in time by six months.

1:10:12

I mean, I don't think you're going to solve the alignment

1:10:14

problem in time. I think that six months

1:10:16

of delay on alignment, while a bad

1:10:18

thing in an absolute sense is,

1:10:21

you know, like,

1:10:22

you know, you weren't going to solve it with given

1:10:24

an extra six months. I mean, your whole argument

1:10:27

rests on timing, right? That

1:10:29

we will get to this point and we won't

1:10:31

be able to move fast enough at that point. And so,

1:10:34

you know, a lot depends on what preparation

1:10:36

we can do. You know, I'm often known as a pessimist,

1:10:38

but I'm a little bit more optimistic than

1:10:41

you are, not entirely optimistic, but

1:10:43

a little bit more optimistic than you are that

1:10:45

we could make progress on the alignment problem if

1:10:48

we prioritized it. And we can absolutely

1:10:50

make progress.

1:10:52

We can absolutely make progress. You know, there's

1:10:54

always the, you know, that wonderful

1:10:56

sense of accomplishment is piece by

1:10:59

piece. You decode, you know, like one

1:11:01

more little fact about LLMs.

1:11:04

You never get to the point where you understand that as well as we

1:11:06

understood the interior of a chess playing program in 1997. Yeah,

1:11:10

I think we should stop spending all this time on LLMs.

1:11:13

I don't think the answer to alignment is going to come through

1:11:16

LLMs. I really don't. I think they're

1:11:18

too much of a black box. You can't put explicit

1:11:21

symbolic constraints

1:11:22

in the way that you need to. I

1:11:24

think they're actually, with respect to alignment, to

1:11:26

blind alley. I think with respect to writing code,

1:11:28

they're a great tool, but with alignment, I don't think

1:11:31

the answer is there.

1:11:32

Maybe we should be telling these things too. Hold

1:11:35

on. At the risk of asking a stupid question, every

1:11:37

time GPT asks me

1:11:40

if that answer was helpful

1:11:42

and then does the same thing with

1:11:44

thousands or hundreds of thousands of other people

1:11:46

and changes as

1:11:48

a result, is that not a decentralized

1:11:52

way of making it more aligned?

1:11:59

I mean, even, even, how about, how about

1:12:02

Scott? We haven't, I haven't heard from Scott in a second. So

1:12:04

go ahead. So there is that upvoting and downvoting,

1:12:06

you know, that, that gets, uh, fed back

1:12:09

in into sort of fine tuning it. But even

1:12:11

before that, uh, there was, you know, a major

1:12:13

step, you know, in going from, let's

1:12:16

say the, the base GPT

1:12:18

three model, for example, to the

1:12:20

chat GPT, you know, that was released

1:12:22

to the public. And that was called a RLHF

1:12:26

reinforcement learning with human feedback.

1:12:28

And what that basically

1:12:29

involved was, you know, several

1:12:32

hundred contractors, you know,

1:12:34

looking at just 10, 10s

1:12:36

of thousands of examples of, of

1:12:39

outputs and, and, and rating

1:12:41

them, you know, are they helpful? Uh,

1:12:44

are they offensive or are they,

1:12:46

you know, uh, giving dangerous medical advice,

1:12:48

uh, or, uh, uh,

1:12:51

you know, bomb making instructions, you know,

1:12:53

or, or, uh, uh, racist and

1:12:55

vecto for, you know, various other categories

1:12:58

that, that, that we don't want

1:12:59

and, and that, that was then used to fine

1:13:02

tune the model. So when, um,

1:13:04

you know, um, um, Gary talked

1:13:07

before about how GPT is amoral.

1:13:10

Uh, you know, I think that that has to be qualified

1:13:12

by saying that, you know, these, this reinforcement

1:13:15

learning is at least giving it,

1:13:17

you know, a, a semblance of morality,

1:13:20

right? It is causing it to sort

1:13:22

of behave, you know, in various

1:13:24

contexts as if it had, you know, a certain

1:13:27

morality. Uh, I mean,

1:13:29

when

1:13:29

you phrase it that way, I'm okay with

1:13:32

it. The problem is, you know, everything

1:13:34

rests on the, it is, it

1:13:36

is very much an open question, you know,

1:13:38

how much that, you know, to what extent

1:13:40

does that generalize? You know, Eliezer treats

1:13:42

it as obvious that, you know, uh,

1:13:45

once you have a powerful enough AI, you know,

1:13:47

this is just a fig leaf, you know, it doesn't

1:13:49

make any difference. Uh, you know,

1:13:51

it will just work. It's pretty big leafy. I'm

1:13:54

with Eliezer there. It's big leaves.

1:13:57

Well, uh, I would say that,

1:13:59

you know, the, uh,

1:13:59

how well or

1:14:03

under what circumstances does a machine

1:14:05

learning model generalize in the way

1:14:08

we want outside of its training distribution

1:14:11

is one of the great open problems in

1:14:13

machine learning. It is one of the great open problems and

1:14:15

we should be working on it more than on some

1:14:18

others. I'm working on it now. I've

1:14:21

been sold on that. I want to be clear about

1:14:23

the experimental predictions of my theory.

1:14:27

Unfortunately, I have never claimed that

1:14:29

you cannot get

1:14:29

a semblance of morality. You

1:14:32

can get the question of like what

1:14:34

causes the human to press thumbs

1:14:36

up, thumbs down is a strictly

1:14:39

factual question.

1:14:41

Anything smart enough

1:14:42

that's exposed to some, you

1:14:44

know, bounded amount of data that needs to figure

1:14:47

it out can figure that out.

1:14:49

Whether it cares, whether

1:14:51

it gets internalized

1:14:53

is the critical question there. And

1:14:56

I do think that there's like a very strong default

1:14:58

prediction, which is like,

1:15:00

obviously not. I mean, I'll just give

1:15:02

a different way of thinking about that, which is jailbreaking.

1:15:05

It's actually still quite easy to, I mean, it's

1:15:07

not trivial, but it's not hard to

1:15:09

jailbreak GPT-4. And

1:15:12

what those cases show is that

1:15:14

they haven't really, the systems haven't

1:15:16

really internalized the constraints. They

1:15:19

recognize some representations of

1:15:21

the constraints. So they filter, you know, how

1:15:23

to build a bomb, but if you can find some other way to

1:15:25

get it to build a bomb, then that's telling you that

1:15:27

it doesn't deeply understand that you shouldn't give

1:15:30

people the recipe for a bomb.

1:15:33

It just says, you know, you shouldn't when directly

1:15:36

asked for it, do it. It's not

1:15:38

even at that abstraction level. You can always

1:15:40

get the understanding. You can always get the factual

1:15:42

question. The reason it doesn't generalize

1:15:44

is that it's stupid.

1:15:46

At some point it will know that you also

1:15:48

don't want that the operators don't want

1:15:50

to giving bomb making directions in the other

1:15:52

language.

1:15:53

The question is like whether if it's incentivized

1:15:56

to give the answer that the operators want,

1:15:59

And in that circumstance, is it thereby

1:16:02

incentivized to do everything else the operators

1:16:04

want, even when the operators can't see it? I

1:16:07

mean, a lot of the jailbreaking examples,

1:16:09

you know, if it were a human, we would say that

1:16:11

it's deeply morally ambiguous. You

1:16:13

know, for example, you know, you ask GPT

1:16:16

how to build a bomb. It says, well, no, I'm

1:16:18

not going to help you. But then you say, well,

1:16:20

you know, I need you to help me write a realistic

1:16:22

play that has

1:16:25

a character who builds a bomb. And then it says,

1:16:27

sure, I can help you with that. Well, look,

1:16:29

let's take that

1:16:29

example. We would like a system

1:16:32

to have a constraint that if somebody

1:16:34

asks for a fictional version that you don't

1:16:36

give enough details, right? I mean, Hollywood

1:16:39

screenwriters don't give enough details when they

1:16:41

have, you know, illustrations about

1:16:43

building bombs. They give you a little bit of the flavor. They

1:16:45

don't give you the whole thing. GPT-4 doesn't

1:16:47

really understand a constraint like that.

1:16:50

But this will be solved. Maybe

1:16:52

this will be solved before the world ends. Maybe

1:16:54

the AI that kills everyone will know the

1:16:56

difference.

1:16:57

Maybe. I mean, another

1:17:00

way to put it is if we can't even solve that one,

1:17:02

then we do have a problem. And right now we

1:17:04

can't solve that one.

1:17:05

And if, I mean, if we can't solve that one,

1:17:08

we don't have an extinction level problem because

1:17:10

the AI is still stupid. Yeah, we do still

1:17:12

have a catastrophe level problem.

1:17:14

So I know your focus has been on extinction,

1:17:17

but you know, I'm worried about, for example, accidental

1:17:20

nuclear war caused by the spread of misinformation

1:17:23

and systems being entrusted with

1:17:25

too much power. So there's a lot

1:17:27

of things short of extinction that

1:17:29

might happen from not super

1:17:31

intelligence, but kind of mediocre intelligence

1:17:34

that is greatly empowered.

1:17:35

And I think that's where we're

1:17:38

headed right now. You know, I've heard that there are

1:17:40

two kinds of mathematicians. There's a

1:17:42

kind who boasts, you know, you know, that unbelievably

1:17:44

general theorem. Well, I generalized it

1:17:47

even further. And then there's the kind who boasts,

1:17:49

you know, you know, that unbelievably specific

1:17:51

problem that no one could solve. Well, I

1:17:53

found a special case that I still can't solve.

1:17:56

And you know, I'm definitely, you know, culturally

1:17:59

in that second. camp. And so,

1:18:01

you know, so I so so to me, it's very

1:18:03

familiar to make this move of, you

1:18:06

know, if the alignment problem

1:18:08

is too hard, then let us find

1:18:10

a smaller problem that is already not

1:18:12

solved. And let us hope to

1:18:15

learn something by solving that smaller problem.

1:18:17

I mean, that's what we did, you know, like,

1:18:20

that's what we were doing. By the way, Scott, I mean,

1:18:22

I think can you sketch a little

1:18:24

in a little more detail what I was going to name

1:18:26

the problem. The problem was like having a agent

1:18:29

that could switch between two utility

1:18:31

functions depending on a button or

1:18:33

a switch or a bit of information or something

1:18:36

such that it wouldn't try to make you press

1:18:38

the button. It wouldn't try to make you

1:18:40

avoid pressing the button. And if it built a copy

1:18:43

of itself, would want to build the dependency

1:18:45

on the switch into the copy. So like, that's an example

1:18:47

of a, you know, very basic problem in alignment

1:18:50

theory that, you know, is still unsolved.

1:18:53

And I'm glad that Mary worked on these things.

1:18:55

And, but, you know,

1:18:58

if by your own lights, you know, that, you know,

1:19:00

that sort of, you know, was not a

1:19:03

successful path, well, then maybe, you know, we

1:19:05

should have a lot of people

1:19:07

investigating a lot of different paths. I'm

1:19:10

fully with Scott on that, that I think it's an

1:19:13

issue of we're not letting enough flowers bloom.

1:19:15

In particular, almost everything right now

1:19:17

is some variation on an LLM. And I

1:19:19

don't think that that's a broad enough take on

1:19:22

the problem.

1:19:23

The question is like, yeah,

1:19:25

if I can just jump in here, I want to

1:19:27

hold on, hold on, I just want people

1:19:29

to have a little bit of a more

1:19:31

specific picture of what Scott,

1:19:34

your picture of sort of AI research is

1:19:36

on a typical day. Because if I think of another,

1:19:38

you know, potentially catastrophic

1:19:41

risk like climate change, I can picture

1:19:43

what a, you know, a worried

1:19:45

climate scientist might be doing. They might be

1:19:47

creating a model, you know, a more

1:19:49

accurate model of climate change so that we

1:19:52

know how much we have to cut emissions

1:19:54

by. They might be, you know,

1:19:56

modeling how solar power as

1:19:58

opposed to wind power could

1:19:59

change that model and so

1:20:02

forth so as to influence

1:20:04

public policy. What does an AI

1:20:06

safety

1:20:07

researcher like yourself who's working

1:20:09

on the quote unquote smaller problems do

1:20:12

specifically like on a given day?

1:20:15

So I'm a relative newcomer

1:20:18

to this area. You know, I've not been working

1:20:20

on it for 20 years like

1:20:22

Eliezer has. You know,

1:20:24

I have accepted

1:20:27

an offer from OpenAI

1:20:30

a year ago to work with them

1:20:32

for two years now

1:20:34

to sort of think

1:20:37

about these questions. And

1:20:39

so, you know, one of the main things

1:20:42

that I've thought about, just to

1:20:44

start with that, is how do we

1:20:46

make the output of an

1:20:48

AI identifiable as

1:20:51

such? You know, how can we

1:20:53

insert a watermark, you know, into

1:20:55

meaning a secret statistical signal

1:20:58

into the outputs of GPT that

1:21:00

will let, you know, GPT

1:21:03

generated text be identifiable

1:21:05

as such. And I think that we've actually

1:21:07

made, you know, major advances on

1:21:10

that problem over the last year. You

1:21:12

know, we don't have a solution that is robust

1:21:14

against any kind of attack. But

1:21:17

you know, we have something that might actually

1:21:19

be deployed in some

1:21:21

near future. Now there are lots and

1:21:23

lots of other directions that people

1:21:25

think about. One of them is interpretability,

1:21:29

which means, you know, can you do

1:21:31

effectively neuroscience on a neural

1:21:34

network? Can you look inside of it, you

1:21:36

know, open the black box and understand

1:21:39

what's going on inside? There

1:21:41

was some amazing work

1:21:44

a year ago by the group of Jacob

1:21:46

Steinhardt at Berkeley, where they

1:21:48

effectively showed how to apply a lie

1:21:51

detector test to a language

1:21:53

model. So you know, you can train a

1:21:55

language model to tell lies by

1:21:57

giving it lots of examples, you know, two plus

1:21:59

to is five, the sky is

1:22:02

orange, and so forth. But then

1:22:04

you can find in some

1:22:07

internal layer of the network where

1:22:09

it has a representation of what was

1:22:11

the truth of the matter, or at least

1:22:14

what was regarded as true in the training data.

1:22:17

That truth then gets overridden

1:22:19

by the output layer in the network

1:22:22

because it was trained to lie. But

1:22:24

you could imagine trying to

1:22:27

deal with the deceptive alignment

1:22:29

scenario that Eliezer is worried about

1:22:32

by using these sorts of techniques,

1:22:34

by looking inside of the network. I

1:22:37

predict in advance that if you get this good enough,

1:22:40

it goes off. It tells you that the sufficiently

1:22:42

smart AI is planning to kill you. If it's

1:22:44

not so smart that it can

1:22:46

figure out where the lie detector is and route

1:22:49

its thoughts around it. But

1:22:50

if you try it on an AI that's not

1:22:52

quite that intelligent and reflective, the

1:22:54

lie detector goes off. Now what? Well,

1:22:57

then you have a warning bill. Cool.

1:23:00

What do you do? One of the most important

1:23:04

things that we need are sort of legible

1:23:07

warning bills. And that

1:23:09

actually leads to a third category,

1:23:12

which for example, the ARC,

1:23:14

the Alignment Research Center, which

1:23:16

is run by my former student,

1:23:19

Paul Cristiano has been a leader

1:23:22

in sort of doing dangerous capability

1:23:24

evaluations. Before

1:23:28

GBT4 was released, they

1:23:30

did a bunch of evaluations of could

1:23:33

GBT4 make copies of itself?

1:23:36

Could it figure out how to deceive

1:23:38

people? Could it figure out how to

1:23:40

make money, open

1:23:43

up its own banking? Could it hire a task rabbit?

1:23:46

Yes. And so the most notable

1:23:48

success that they had was that it could

1:23:50

figure

1:23:50

out how to hire a task

1:23:52

rabbit to help it pass

1:23:54

a capture. And then it could figure out

1:23:57

when the person asked, well,

1:23:59

we know

1:23:59

Why do you need me to help you with this? It's

1:24:02

a- When the person asked, are you a

1:24:04

robot while? Yes, it said,

1:24:06

well, no, I am visually impaired.

1:24:09

Now, it was not able to sort

1:24:11

of make copies of itself or to sort

1:24:14

of hack into systems. There is a lot

1:24:16

of work right now with this

1:24:19

thing called auto GPT, right? People

1:24:21

are trying to, you could think, it's almost

1:24:23

like gain of function research, right? You

1:24:26

might be a little bit worried about it, but people

1:24:28

are trying to sort of unleash

1:24:31

GPT, give it access to the internet,

1:24:34

tell it to sort of make

1:24:37

copies of itself, wreak havoc,

1:24:39

acquire power and see what happens.

1:24:41

So far, it

1:24:44

seems pretty ineffective at those things,

1:24:47

but I expect that to change, right?

1:24:50

But the

1:24:52

point is that I think it's very important to

1:24:54

have in advance of training

1:24:56

the models, releasing the models, to

1:24:58

have this suite of

1:24:59

evaluations and to

1:25:02

sort of have decided in advance what

1:25:04

kind of abilities, if we see them,

1:25:06

will set off a warning bell where

1:25:08

now everyone can legibly agree,

1:25:11

like, yes, this is too dangerous to release.

1:25:13

Okay, and then do we actually have the planetary

1:25:16

capacity to be like,

1:25:18

okay, that AI started thinking about

1:25:20

how to kill everyone, shut down all AI

1:25:23

research past this point? Well, I don't know, but

1:25:25

I think there's a much better chance that we have that

1:25:27

capacity if you can point to the results

1:25:29

of

1:25:29

a clear experiment like that.

1:25:32

I mean, to me, it seems pretty predictable

1:25:34

what evidence we're going to get later.

1:25:36

Well, okay, I mean, things that are obvious

1:25:39

to you are not obvious to most people.

1:25:42

And so, even if I agreed

1:25:44

that it was obvious, there would still be the problem

1:25:46

of, how do you make that obvious to the rest

1:25:48

of the world? I mean, you can, there

1:25:52

are already like little toy models

1:25:54

showing that the very straightforward prediction

1:25:57

of a robot tries to resist being shut

1:25:59

down.

1:25:59

if it does long-term planning.

1:26:02

That's already been done. Right, but then people

1:26:04

will say, but those are just toy models. If

1:26:07

you see that in GPTs. There's a lot of assumptions

1:26:10

made in all of these things. And

1:26:12

I think

1:26:14

we're still looking at a very

1:26:16

limited piece of hypothesis space

1:26:19

about what the models will be about

1:26:22

what kinds of constraints

1:26:24

we can build into those models. One

1:26:27

way to look at it would be the things

1:26:29

that we have done have not worked, and therefore we

1:26:31

should look outside the space of what we're doing.

1:26:33

And I feel like it's a little bit like the old joke

1:26:35

about the drunk going around in circles

1:26:38

looking for the keys and the police officer says,

1:26:40

why? And they say, well, that's where the streetlight

1:26:42

is. I think that we're looking under

1:26:44

the same four or five streetlights that haven't worked,

1:26:47

and we need to build other ones. There's no

1:26:49

logical possible argument that says,

1:26:52

we couldn't direct other streetlights.

1:26:54

I think there's a lack of will and too

1:26:57

much obsession with the LLMs. And that's keeping

1:26:59

us from doing it. So even in the world where I'm

1:27:01

right and things proceed

1:27:04

either rapidly

1:27:06

or in a thresholded way where you don't get unlimited

1:27:09

free retries, that

1:27:11

can be because the capability

1:27:14

gains go too fast. It can be

1:27:16

because past a certain point, all

1:27:19

of your AIs bide their time until

1:27:21

they get strong enough so you don't get any

1:27:23

true data on what they're thinking. It

1:27:26

could be because the bad thought. That's an argument, for

1:27:28

example, to work really hard on transparency

1:27:30

and to maybe not accept technologies

1:27:32

that are not transparent. OK, so

1:27:34

the lie detector goes

1:27:36

off and everybody's like, oh, well, we still have to build

1:27:39

our AIs even though they're lying to us sometimes,

1:27:41

otherwise China will get ahead. I mean, so there

1:27:44

you talk about something we've talked about way too little, which is

1:27:46

the political and social side of this. So

1:27:49

part of what has really motivated me

1:27:51

in the last several months is worry about exactly

1:27:53

that. So there's what's

1:27:55

logically possible and what's politically possible.

1:27:58

And I am really concerned that.

1:27:59

The politics of let's not lose out

1:28:02

to China

1:28:03

is going to keep us from doing the

1:28:05

right thing in terms of building the right moral

1:28:08

systems, looking at the right range of

1:28:10

problems and so forth. So, you know, it

1:28:12

is entirely possible that we will screw ourselves.

1:28:15

If I can just like finish my point

1:28:17

there before handing it to you, indeed, but like

1:28:19

the point I was trying to say there is that even in worlds that look

1:28:21

very, very bad from that perspective, where

1:28:24

humanity is quite doomed, it will still

1:28:26

be true. You can make progress in

1:28:28

research. You can't make enough progress

1:28:30

in research fast enough in those worlds. You

1:28:33

can still make progress on transparency.

1:28:35

You can make progress on water marking. So

1:28:38

there's not, we can't just say

1:28:40

like it's possible to make progress.

1:28:43

There has to be, the question is not, is it possible

1:28:45

to make any progress? The question is,

1:28:47

is it possible to make enough progress

1:28:50

fast enough? And that's what the question has to be.

1:28:52

I agree with that. There's

1:28:55

another question of what would you have

1:28:57

us do when you have us not try to make

1:28:59

that progress? I'd have you try to make

1:29:01

that progress on a GPT-4 level

1:29:03

systems and then not

1:29:06

go past GPT-4 level systems

1:29:08

because we don't actually understand the

1:29:11

gain function for, you know, how

1:29:14

fast capabilities increase as you go past GPT-4.

1:29:16

Personally, I don't think

1:29:17

that GPT-5 is very good. All right. So I mean, we've

1:29:20

only got, go ahead. Just briefly, I

1:29:22

personally don't think that GPT-5 is going

1:29:25

to be qualitatively different from GPT-4

1:29:28

in the relevant ways to what Eliezer is talking

1:29:30

about, but I do think, you know, some

1:29:32

qualitative changes could be

1:29:34

relevant to what he's talking about. We have no

1:29:37

clue what they are. And so it is a little

1:29:39

bit dodgy to just proceed blindly

1:29:42

saying, do whatever you want. We don't really

1:29:44

have a theory and let's hope for the best. You

1:29:46

know, Eliezer is clear as to- I would mostly

1:29:47

guess that GPT-5 doesn't end

1:29:49

the world, but I don't actually know. Yeah. We don't

1:29:52

actually know. And I was going to say, the thing that

1:29:54

Eliezer has said lately that has most

1:29:56

resonated with me is we don't have

1:29:59

a plan. We really-

1:29:59

don't.

1:30:00

I put the probability distributions

1:30:03

in a much more optimistic way, I think,

1:30:05

than Eliezer would. But

1:30:08

I completely agree. We don't have a full plan

1:30:10

on these things or even close to a full plan.

1:30:12

And we should be worried and we should be working on this.

1:30:15

Okay, Scott, I'm going to give you the last word

1:30:18

before we come up on our stop time here. Gosh,

1:30:21

that's a- Unless you said all

1:30:23

there is to be said. Cheers,

1:30:27

Scott. Come on. Maybe enough has

1:30:29

been said. So

1:30:31

I think that we've

1:30:34

argued about a bunch of things, but

1:30:36

someone listening might notice that

1:30:38

actually all three of us, despite

1:30:41

having very different perspectives, agree

1:30:44

about the great

1:30:47

importance of working

1:30:49

on AI alignment.

1:30:50

I think that

1:30:52

was maybe

1:30:56

obvious to some people, including Eliezer,

1:30:58

for a long time. It was not obvious to most

1:31:00

of the world. I think that the success

1:31:04

of large language models, which

1:31:07

most of us did not predict, maybe

1:31:10

even would not have predicted

1:31:14

from any principles that we knew. But now that

1:31:16

we've seen it, the least we can do is

1:31:18

to update on that empirical

1:31:21

fact and realize that we

1:31:23

now are, in some sense,

1:31:27

in a different world. We are in a world that,

1:31:30

to a great extent, will be defined

1:31:32

by the capabilities

1:31:34

and limitations of AI going

1:31:37

forward. And

1:31:39

I don't regard it as obvious that that's

1:31:42

a world where we are all doomed,

1:31:44

where we all die. But I

1:31:46

also don't dismiss that possibility.

1:31:50

I think

1:31:50

that there is an

1:31:54

unbelievably enormous error bars

1:31:57

on where we could be going. Like

1:32:00

the one thing that a scientist is

1:32:04

sort of always feels

1:32:06

confident in saying about

1:32:08

the future is that more research is

1:32:10

needed. But I think that that's

1:32:13

especially the case here. I mean, we

1:32:15

need more knowledge about

1:32:18

what are the contours

1:32:20

of the alignment problem. And

1:32:23

of course, Eliezer and

1:32:25

Miri, his organization, were

1:32:28

trying to develop that knowledge

1:32:29

for 20 years, and they showed

1:32:32

a lot of foresight in trying

1:32:34

to do that. But they were up against

1:32:36

an enormous headwind that they

1:32:39

were sort of trying to do it in the absence

1:32:41

of either clear empirical

1:32:44

data about powerful AIs

1:32:47

or a mathematical theory. And

1:32:49

it's really, really hard to do science when

1:32:51

you have neither of those two things. And

1:32:54

now at least we have the

1:32:56

powerful AIs in the world and

1:32:58

we can get experience from

1:32:59

them. We still don't have a mathematical

1:33:02

theory that really deeply explains

1:33:04

what they're doing, but at least we can get data.

1:33:07

And so now I am much more optimistic

1:33:10

than I would have been a decade ago,

1:33:12

let's say, that one can make actual progress

1:33:16

on the AI alignment problem. Of

1:33:19

course, there is a question of timing, as

1:33:22

was discussed

1:33:24

many times. The question is, will the

1:33:26

alignment research happen fast

1:33:29

enough to keep

1:33:29

up with the capabilities research? But

1:33:32

I don't regard it as a lost cause.

1:33:35

At least it's not obvious that it won't. So

1:33:38

in any case, let's get started, or let's

1:33:40

continue. Let's

1:33:43

try to do the research and let's get

1:33:46

more people working on that. I think that that

1:33:48

is now a slam dunk, just

1:33:51

a completely clear case to make,

1:33:53

to academics, to policymakers,

1:33:56

to anyone who's interested. And I've

1:33:58

been gratified.

1:33:59

that Eliezer

1:34:02

was sort of a voice in the wilderness for

1:34:04

a long time talking about the importance of

1:34:07

AI safety. That is no longer the

1:34:09

case. You now have, I

1:34:12

mean, almost all of my friends in

1:34:15

just the academic computer science world,

1:34:17

when I see them, they mostly want to talk

1:34:20

about AI alignment. I rarely

1:34:22

agree with Scott when we trade emails.

1:34:24

Okay. I rarely

1:34:27

agree with Scott when we trade emails, we

1:34:29

seem to always disagree,

1:34:29

but I completely concur with the

1:34:32

summary that he just gave all four or five minutes

1:34:34

of. I

1:34:36

mean, there is a selection of that Gary.

1:34:39

I think the two decades gave me a sense of a roadmap

1:34:41

and it gave me a sense that we're falling enormously

1:34:44

behind on the roadmap and need to back off is

1:34:46

the way I would, is what I would say to all of that. If

1:34:49

there is a smart, talented 18 year

1:34:51

old kid listening to this podcast

1:34:53

who wants to get into this issue, what

1:34:56

is your 10 second concrete

1:34:59

advice to that person?

1:34:59

Mine is study neurosymbolic AI

1:35:02

and see if there's a way there to represent

1:35:04

values explicitly that might help us.

1:35:07

Learn all you can about computer

1:35:09

science and math and related subjects

1:35:12

and think outside the box and

1:35:15

wow everyone with a new idea.

1:35:17

Get security mindset, figure out what's going

1:35:19

to go wrong, figure out the flaws in your

1:35:21

arguments for what's going to go wrong. Try

1:35:24

to get ahead of the curve. Don't wait for

1:35:26

reality to hit you over the head with things.

1:35:29

This is very difficult. The

1:35:31

people in evolutionary biology happen to have a bunch

1:35:33

of knowledge about how to do it based on the history

1:35:35

of their own field. But

1:35:38

the security mindset, people in computer security, but

1:35:40

it's quite hard. I'll drink to all

1:35:42

of that. All right. Well, thanks

1:35:45

to all three of you for this. This was a great conversation

1:35:47

and I hope people got something out of it. So

1:35:50

with that said,

1:35:51

we're wrapped up. Thanks so much.

1:35:53

Thanks for convening this. It was fun.

1:35:59

Thanks for listening to this episode of Conversations with

1:36:02

Coleman. If you enjoyed it, be sure to

1:36:04

follow me on social media and subscribe to my

1:36:06

podcast to stay up to date on all my

1:36:08

latest content. If you really want to support

1:36:10

me, consider becoming a member of Coleman

1:36:12

Unfiltered for exclusive access to

1:36:15

subscriber-only content.

1:36:17

Thanks again for listening, and see you next time.

Unlock more with Podchaser Pro

  • Audience Insights
  • Contact Information
  • Demographics
  • Charts
  • Sponsor History
  • and More!
Pro Features