Episode Transcript
0:02
Music
0:30
Welcome to another episode of Conversations with
0:32
Coleman. If you're hearing this, then you're
0:34
on the public feed, which means you'll get episodes
0:36
a week after they come out and you'll hear advertisements.
0:39
You can get access to the subscriber feed by going
0:42
to ColemanHughes.org and becoming a supporter.
0:44
This means you'll have access to episodes a week early.
0:47
You'll
0:47
never hear ads
0:48
and you'll get access to bonus Q and A episodes.
0:51
You can also support me by liking and subscribing
0:54
on YouTube and sharing the show with friends and family.
0:56
As always, thank you so much
0:58
for your support.
1:01
Music Welcome
1:03
to another episode of Conversations with Coleman.
1:06
Today's episode is a round table discussion
1:08
about AI safety with Eliezer
1:10
Yudkowsky, Gary Marcus, and Scott
1:13
Aaronson. Eliezer Yudkowsky is a prominent
1:15
AI researcher and writer
1:17
known for co-founding the Machine Intelligence
1:20
Research Institute, where he spearheaded
1:22
research on AI safety. He's also
1:24
widely recognized for his influential writings
1:27
on the topic of rationality. Scott
1:29
Aaronson is a theoretical computer scientist
1:31
and author celebrated for his pioneering
1:33
work in the field of quantum computation.
1:36
He's also the chair of computer science at UT
1:38
Austin, but is currently taking a leave of absence
1:41
to work at OpenAI. Gary
1:42
Marcus is a cognitive scientist,
1:45
author, and entrepreneur known for
1:47
his work at the intersection of psychology, linguistics,
1:50
and AI. He's also authored several
1:52
books, including Kluge and
1:55
Rebooting AI: Building Artificial Intelligence
1:57
We Can Trust. This episode is all about
1:59
AI safety.
1:59
We talk about the alignment problem. We
2:02
talk about the possibility of human extinction
2:04
due to AI. We
2:05
talk about what intelligence actually is.
2:08
We talk about the notion of a singularity or
2:10
an AI take-off event and much more.
2:13
It was really great to get these three guys in the same
2:15
virtual room. And I think you'll find that this conversation
2:18
brings something a bit fresh to a topic
2:20
that has admittedly been beaten to death
2:22
on certain corners of the internet. So without
2:24
further ado, Eliezer Yudkowsky,
2:27
Gary Marcus, and Scott Aaronson.
2:30
Thanks so much for coming on my show. Thank you.
2:32
Thanks for having us. So
2:33
the topic of today's conversation
2:35
is AI safety. And this is something that's been in the
2:37
news lately. We've seen experts and CEOs signing letters recommending
2:40
public policy surrounding regulation. We
2:47
continue to have the debate between people that really
3:00
fear AI is going to end the
3:02
world and potentially kill
3:03
all of humanity and the people who feel
3:05
that those fears are overblown.
3:09
And so
3:11
this is going to be sort of a roundtable conversation
3:13
about that. And you three are
3:16
really three of the best people in the world to talk
3:18
about it with. So thank you all for doing this.
3:20
Let's just start out with you,
3:22
Eliezer, because you've been one of the most really
3:27
influential voices getting people
3:29
to take seriously the possibility that AI
3:31
will kill us all. You know, why is
3:33
AI going to destroy us? Chat GPT seems
3:35
pretty nice. I use it every day. What's
3:38
the big fear here? Make the case. Well, chat
3:40
GPT seems quite unlikely to kill everyone
3:42
in its present state. AI capabilities
3:45
keep on advancing and advancing. The question
3:47
is not can chat GPT kill us? The
3:50
answer is probably
3:50
no. So as long as that's true,
3:53
as long as it hasn't killed us yet, they're
3:55
just going to keep engineering. They're just going to keep pushing the capabilities.
3:58
There is no obvious blocking point. We don't
4:01
understand the things that we
4:03
build. The AIs are grown more
4:05
than built, you might say. They end up
4:07
as giant inscrutable matrices of floating
4:09
point numbers that nobody can decode. It's
4:12
probably going to end up technically difficult to
4:14
make them want particular things and not
4:16
others. And
4:18
people are just charging straight ahead. So
4:21
at this rate, we end up with something that is smarter
4:23
than us, smarter than humanity, that
4:26
we don't understand, whose preferences
4:28
we could not shape. And by default,
4:31
if that happens, if you have something around that is like, much
4:33
smarter than you and does not care about you one way
4:36
or the other, you probably end up dead at the end of
4:38
that. The way
4:40
it gets the most of whatever strange inscrutable
4:43
things it wants is worlds
4:45
in which there are not humans taking
4:47
up space, using up resources,
4:50
building other AIs to compete with it, or just
4:52
a world in which you built enough power plants that
4:54
the surface of the earth got hot enough that humans didn't
4:57
survive. Gary, what do you have to say about
4:59
that? There are parts that
5:00
I agree with and parts that I don't. So I agree
5:03
that we are likely to wind up with AIs
5:06
that are smarter than us. I don't think we're particularly
5:08
close now, but in 10 years or 50 years
5:10
or 100 years at some point, it
5:13
could be a thousand years, but it will happen. I
5:15
think there's a lot of anthropomorphization
5:17
there about machines wanting things. Of
5:20
course they have objective functions and
5:22
we can talk about that. I think it's a presumption
5:24
to say that the default is that
5:27
they're gonna want something that leads to our demise
5:30
and that they're gonna be effective at that and
5:32
be able to literally kill us all.
5:34
I think if you look at the history of AI,
5:37
at least so far, they don't really have
5:39
wants beyond what we program them to
5:41
do. There is an alignment problem. I think
5:43
that that's real in the sense of like,
5:46
people program the system to do X and
5:48
they do X prime. That's kind of like X, but not
5:50
exactly. And so I think there's really things
5:52
to worry about. I think there's a real research
5:54
program here that is under-researched,
5:57
which is the way I would put
5:59
it. We want to understand how to make machines
6:02
that have values. Asimov's laws are way
6:04
too simple, but they're a kind of starting point for conversation.
6:07
We want to program machines that don't
6:09
harm humans. They can calculate the consequences
6:12
of their actions. Right now we have technology
6:14
like GPT-4 that has no idea what the consequences
6:17
of its actions are. It doesn't really
6:19
anticipate things. And there's a separate thing that Eliezer
6:21
didn't emphasize, which is it's not just how smart
6:23
the machines are, but how much power we give them, how
6:26
much we empower them to do things like
6:28
access the internet or manipulate
6:29
people or write
6:33
source code, access files and stuff like that.
6:36
And right now, auto-GPT can do all of those things,
6:38
and that's actually pretty disconcerting to me. To
6:40
me, that doesn't all add up to any
6:42
kind of extinction risk anytime
6:45
soon. But catastrophic risk, where things
6:47
go pretty wrong because we wanted
6:49
these systems to do X and we
6:51
didn't really specify it well, they don't really understand
6:53
our intentions. I think there are risks
6:55
like that. I don't see it as a default that
6:57
we wind up with extinction. I think it's pretty hard
7:00
to actually terminate the entire human
7:02
species. You're going to have people in Antarctica
7:05
that are going to be out of harm's way or whatever, or
7:07
you're going to have some people who respond
7:09
differently to any pathogen, et cetera. So
7:11
extinction is a pretty extreme
7:14
outcome that I don't think is particularly likely.
7:17
But the possibility that these machines
7:19
will cause mayhem because we don't know
7:21
how to enforce that they do what we want
7:23
them to do, I think that's a real thing to worry about. And
7:26
it's certainly worth doing research on.
7:28
Scott, how do you view this? Yeah, so I'm sure
7:30
that you can get the three of us arguing about
7:32
something, but I think you're going to get agreement
7:34
from all three of us that AI safety
7:37
is important and that catastrophic
7:40
outcomes, whether or not that
7:43
means literal human extinction are possible.
7:46
I think it's become apparent
7:49
over the last few years that this
7:52
century is going to be
7:55
largely defined by
7:58
our interaction
7:59
with AI, that AI
8:02
is going to be transformative for human
8:05
civilization.
8:07
And I'm confident
8:10
of that much. And if you ask
8:12
me almost anything beyond that about
8:15
how is it going to transform civilization,
8:17
will it be good? Will it be bad? What
8:19
will the AI want? I
8:21
am pretty agnostic, just
8:24
because if you had asked
8:26
me 20 years ago to try to forecast
8:28
where we are now, I would have
8:31
gotten a lot wrong. My
8:34
only defense is I think that
8:37
all of us here, almost everyone in
8:39
the world, would have gotten a lot wrong
8:42
about where we are now. And so if I
8:44
try to envision where we are in 2043,
8:47
does the AI want to replace
8:49
humanity with
8:55
something better? Does it want to keep
8:57
us around as pets? Does
9:01
it want to just continue
9:04
helping us out? Just
9:08
a super souped up version of chat GPT?
9:11
I think all of those scenarios
9:14
merit consideration. But
9:16
I think that what has happened
9:18
in the last few years that's
9:20
really exciting is that AI
9:23
safety has become an empirical
9:25
subject. There are these very powerful
9:27
AIs that are now being deployed, and
9:30
we can actually learn something. We
9:33
can work on
9:36
mitigating the nearer term harms,
9:37
not because the
9:41
existential risk doesn't exist or
9:46
is absurd or is science fiction or anything
9:48
like that, but just because the nearer term
9:50
harms are the ones that we can see right
9:53
in front of us and where we can actually get feedback
9:56
from the external world about how we're
9:58
doing. We can learn something.
9:59
And hopefully some of the knowledge
10:02
that we gain will be useful in
10:05
addressing the longer term risks that
10:07
I think Eliezer is very rightly worried about. So
10:09
there seems to me there's alignment and
10:11
then there's alignment, right? So there's alignment
10:14
in the sense that we haven't even fully
10:16
aligned smartphone technology with our interests,
10:18
right? Like there are some ways in which
10:21
smartphones and social media have
10:24
led to probably deleterious mental health
10:26
outcomes, especially
10:28
for teenage girls,
10:29
for example. So there
10:32
are those kinds of mundane senses
10:34
of alignment where it's like, is this technology
10:37
doing more good than
10:39
harm in the normal everyday public
10:41
policy sense? And then there's the capital
10:43
A alignment of, are we creating
10:46
a creature that is going to view
10:48
us like ants and have no
10:51
problem extinguishing us,
10:56
whether intentionally or not. So it
10:58
seems to me all of you agree
10:59
that the first sense of alignment
11:02
is at the very least something to worry about now and
11:04
something to deal with. But I'm
11:06
curious to what extent you think the really
11:09
capital A sense of alignment is
11:11
a real problem because it can sound very much
11:13
like science fiction to people. So
11:16
maybe let's start with Eliezer. I
11:19
mean, from my perspective, I would say that
11:22
if we had a solid guarantee that
11:24
AI was going to do no more harm
11:26
than social media, we ought to plow
11:28
ahead and get all the gains. The amount of
11:33
harm that social media has done to humanity, while
11:35
very significant in my view (I think it's done
11:37
like a lot of damage to our sanity), is
11:40
just not a large enough harm
11:42
to justify either forgoing
11:44
the gains that you could get from AI, if
11:47
that was going to be the worst downside,
11:49
or to justify the kind of drastic measures you'd
11:51
need to stop plowing ahead on AI.
11:54
I think that the capital A alignment
11:56
is beyond this generation. You
11:58
know, I've
12:01
watched over it for two decades. I
12:04
feel like in some ways the modern generation plowing
12:06
in with their eyes on the
12:09
short term stuff is like losing track
12:11
of the larger problems because they can't solve the larger
12:13
problems and they can't solve the little problems. But we're just like
12:16
plowing straight into the big problems and
12:19
we're going to go plow right into the big problems with
12:21
a bunch of little solutions that aren't going to scale. I think
12:24
it's lethal. I think it's at the
12:26
scale where you just back off and
12:28
don't do this. By back off
12:29
and don't do this, what do you mean?
12:32
I mean, have
12:33
an international
12:36
treaty about where the chips
12:39
capable of doing AI training go and
12:41
have them all going into licensed,
12:44
monitored data centers and
12:47
not have the training runs for
12:50
AIs more powerful than GPT-4,
12:53
possibly even lowering that threshold over time
12:55
as algorithms improve and it gets
12:57
possible to train more powerful AIs using
13:00
less compute. So you're picturing a kind of international
13:03
agreement to just stop.
13:05
International moratorium. And
13:07
if North Korea steals the GPU
13:09
shipment, then you've got to be ready
13:11
to destroy their data
13:14
center that they build by conventional means. And
13:16
if you don't have that willingness in advance, then
13:18
countries may refuse to sign up for the agreement being like,
13:21
why aren't we just like ceding the
13:23
advantage to someone else then? It
13:25
actually has to be a worldwide shutdown because
13:28
the scale of harm from a super intelligence,
13:31
it's not that if you have 10 times
13:33
as many super intelligences, you've got 10
13:35
times as much harm. It's not that a super
13:37
intelligence only wrecks the country that
13:39
built the super intelligence. Any super intelligence
13:41
everywhere is anyone's last problem.
13:44
So Gary and Scott, if either of you want to jump in
13:46
there, I mean, is there, is AI
13:49
safety a matter of forestalling
13:52
the end of the world and all of these smaller
13:54
issues and pass
13:56
towards safety that Scott, you mentioned are just, you
13:58
know,
13:59
throwing, I don't know what the
14:02
analogy is, but pointless essentially.
14:04
What do you guys make of this?
14:07
The journey of a thousand miles begins
14:09
with a step. The way I think about this comes
14:12
from 25 years
14:17
of doing computer science research,
14:19
including quantum computing
14:21
and computational complexity,
14:24
things like that, where we have these gigantic
14:26
aspirational problems that we
14:29
don't know
14:29
how to solve. And yet,
14:32
year after year, we do make
14:34
progress. We pick off little subproblems.
14:36
And if we can't solve those, then we find
14:39
subproblems of those. And we keep
14:41
repeating until we find something that we can
14:43
solve. And this is,
14:45
I think, for centuries, the way that science
14:48
has made progress. Now, it is
14:50
possible that this time,
14:52
we just don't have enough time for that
14:54
to work. And I think that
14:56
is what Eliezer is fearful
14:59
of, that we just
14:59
don't have enough time for the ordinary
15:02
scientific process to
15:04
work before AI becomes too powerful,
15:08
in which case, you start talking about
15:10
things like a global moratorium
15:14
enforced with the threat of war. I
15:17
am not ready to go there. I
15:20
could imagine circumstances where
15:22
maybe I say, gosh, this
15:25
looks like such an imminent threat that we
15:27
have to. But I tend to be very, very worried in
15:33
general about causing a catastrophe
15:35
in the course of trying to prevent a catastrophe.
15:38
And I think when you're
15:40
talking about threatening
15:43
airstrikes against data centers or things like that,
15:45
then that's an obvious worry. I'm
15:48
sort of somewhere in between, I guess. I
15:51
don't think that there's... So I'm somewhat
15:53
in between here.
15:54
I'm with Scott that we are not
15:57
at the point where we should be bombing data centers. I don't think we're going
15:59
to be anywhere
15:59
close to that. I'm much less what
16:02
the right word is to use here. I
16:04
don't think we're anywhere near as close to
16:07
AGI as I think Eliezer sometimes sounds
16:09
like. I don't think GPT-5 is
16:11
anything like AGI, and I'm not particularly
16:14
concerned about who gets it first and so
16:16
forth. On the other hand, I think that we're
16:18
in a sort of dress rehearsal mode. Nobody
16:21
expected GPT-4, really, chat
16:23
GPT to percolate as fast as it
16:25
did. It's a reminder that there's
16:28
a social side to all of this, how software
16:30
gets distributed matters, and
16:32
there's a corporate side. It was a kind
16:34
of galvanizing moment for me when Microsoft
16:37
didn't pull Sydney, even though Sydney did some
16:39
awfully strange things. I thought they would take
16:41
it for a while, and it's a reminder that they can make whatever decisions
16:44
they want. We kind of multiply that by
16:46
Eliezer's concerns about what do
16:48
we do and at what point what
16:51
would be enough to cause
16:53
problems. It is a reminder, I think, that we
16:55
need, for example, to start roughing out these
16:58
international treaties now because there could come
16:59
a moment where there is a problem. I don't think
17:02
the problem that Eliezer sees is here now,
17:04
but maybe it will be. And maybe when it does
17:07
come, we will have so many people pursuing
17:09
commercial self-interest and so little infrastructure
17:12
in place we won't be able to do anything. So I think
17:14
it really is important to think now. If
17:16
we reach such a point, what are we going to
17:18
do? What do we need to build in place
17:20
before we get to that point?
17:26
Question. Have you ever wondered about the impact
17:28
of your charitable donation? I mean, how
17:30
much good can your contribution really do?
17:33
The answer is not always easy to find, but if you're
17:35
invested in making a difference in the world, then
17:37
allow me to introduce you to GiveWell. GiveWell
17:40
is not your typical charity platform. They
17:42
spend countless hours researching charitable
17:44
organizations, diving deep into evidence
17:46
and hard data. Over the past 15 years,
17:49
they've been vetting, scrutinizing, and only
17:51
recommending the highest impact opportunities
17:53
they found. Their process involves a team
17:56
of 25
17:56
researchers who put in over 40,000 hours
17:58
each year
17:59
to maximize the impact of your donations.
18:02
Over 100,000 donors have already
18:04
trusted GiveWell to allocate their donations
18:07
wisely. For just $5, you can provide
18:09
a bed net to prevent malaria. $7 can
18:11
provide a child with malaria treatment through the high
18:14
malarial season. Just $1 can deliver
18:16
a vitamin A supplement to a child, a deficiency
18:18
of which can increase mortality rates. Even
18:21
as little as 160 bucks can vaccinate
18:23
an infant, helping prevent diseases and reduce
18:25
child mortality. This is how GiveWell operates.
18:28
They measure, they model, they review, and they forecast
18:30
the impact, all to ensure that your donations
18:33
are used in the best possible way. And
18:35
the best part? All of their research and
18:37
recommendations are available for
18:39
free on their website. So if you wanna
18:41
make an informed decision about high impact
18:44
giving, head over to givewell.org. When
18:46
you make a donation, let them know you heard
18:49
about them from me. Just select podcast
18:51
and enter conversations with Coleman at checkout.
18:54
Remember, GiveWell does not take a cut from
18:56
your donation. All of it goes directly to
18:58
help those who need it most. Once again, head
19:01
over to givewell.org. And when you
19:03
make a donation, let them know you heard about
19:05
them through me. Just select podcast
19:07
and enter conversations with Coleman at checkout.
19:10
What's a game where no
19:12
one wins?
19:17
The waiting
19:20
game. When it comes to hiring, don't
19:22
wait for great talent to find you. Find them
19:24
first. When you're hiring, you need Indeed.
19:27
Indeed's the hiring platform where you can attract, interview
19:29
and hire all in one place. Instead
19:31
of spending hours on multiple job sites
19:33
searching for candidates with the right skills, Indeed's
19:36
a powerful hiring platform that can help you
19:38
do it all. They streamline hiring with powerful
19:40
tools that find you matched candidates. With Instant
19:42
Match, over 80% of employers get quality
19:45
candidates whose resume on Indeed matches their
19:47
job description the moment they sponsor a
19:49
job. Candidates you invite to apply are
19:51
three times more likely to apply to your job
19:53
than candidates who only see it in search. Indeed
19:56
gets you one step closer to the hire by
19:58
immediately matching you with quality candidates.
20:01
Indeed does the hard work for you. Indeed
20:03
shows you candidates whose resumes on Indeed
20:05
fit your description immediately after
20:07
you post so you can hire faster. Indeed's
20:10
hiring platform matches you with quality
20:12
candidates instantly. Even better, Indeed's
20:14
the only job site where you only pay for applications
20:17
that meet your must-have requirements. Indeed
20:19
is an unbelievably powerful hiring
20:21
platform, delivering four times more
20:23
hires than all other job sites combined,
20:25
according to TalentNest. Join more than 3
20:28
million businesses worldwide
20:29
that use Indeed to hire great
20:32
talent fast. Start hiring now with
20:34
a $75 sponsored job credit
20:36
to upgrade your job posts at Indeed.com
20:39
slash conversations. Offer only
20:41
good for a limited time. Claim your $75 credit right
20:44
now at Indeed.com slash
20:47
conversations. Just go to Indeed.com
20:49
slash conversations and support the show
20:51
by saying you heard about it on this podcast. Indeed.com
20:54
slash conversations. Terms and conditions apply.
21:01
So we've been talking about
21:03
this concept of artificial general intelligence
21:07
and I think it's worth asking
21:09
whether that is a useful
21:12
coherent concept. So
21:14
for example if I were to think by analogy
21:16
to athleticism and think
21:18
of the moment when we build a machine
21:21
that has say
21:23
artificial general athleticism meaning
21:26
it's you know better than LeBron
21:28
James at basketball but also better
21:30
at the at curling than the world's best
21:33
curling player and also better at soccer
21:35
and also better at archery and
21:38
so forth.
21:39
It would seem to me
21:41
that there's something a bit strange
21:43
about framing it as having reached a
21:46
point on a single continuum.
21:49
It seems to me you would sort of have to build
21:52
each capability each sport
21:54
individually and then somehow figure out
21:57
how to package them all into one robot
21:59
without each skill set detracting
22:02
from the other. Is that a disanalogy?
22:06
Do you all picture this intelligence
22:09
as sort of one dimension, one
22:11
knob that is going to get turned
22:13
up along a single
22:16
axis? Or do you think that way
22:18
of talking about it is misleading
22:20
in the same way that I kind of just sketched
22:23
out?
22:23
I would absolutely not accept that.
22:26
I like to say that intelligence is not a one dimensional
22:28
variable. There's many
22:31
different aspects to intelligence.
22:34
There's not, I think, going to be a magical moment when
22:36
we reach the singularity or something like that.
22:40
I would say that the core of artificial general intelligence
22:42
is the ability to flexibly
22:44
deal with new problems that you haven't
22:47
seen before. Current
22:49
systems can do that a little bit, but not very well. My
22:52
typical example of this now is GPT-4
22:55
is exposed to the game of chess, sees
22:57
lots of games of chess, it sees the rules of chess,
23:00
but it never actually figures out the rules of chess
23:02
and makes illegal moves and so forth. It's
23:04
in no way a general intelligence that can
23:06
just pick up new things. Of course, we have things
23:08
like AlphaGo that can play a certain set of
23:11
games, or AlphaZero really, but
23:13
we don't have anything that has the generality
23:15
of human intelligence. Human
23:18
intelligence is just one example of general
23:20
intelligence. You could argue that chimpanzees
23:22
or crows have another variety of general intelligence.
23:26
I would say the current machines don't really have it,
23:29
but they will eventually.
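To make Gary's chess point concrete, here is a minimal sketch of the kind of check he is describing, using the open-source python-chess library. The move list is just a stand-in for whatever a model actually outputs, and the game shown is only an illustration, not a claim about any particular model.

```python
# A minimal sketch of the test Gary describes: replay a model's proposed
# moves and report the first one that is illegal under the real rules.
# Requires the python-chess package (pip install chess).
import chess

def first_illegal_move(moves_uci):
    """Return (index, move) of the first illegal move, or None if all are legal."""
    board = chess.Board()
    for i, uci in enumerate(moves_uci):
        try:
            move = chess.Move.from_uci(uci)
        except ValueError:
            return i, uci  # not even a well-formed move string
        if move not in board.legal_moves:
            return i, uci  # well-formed, but illegal in this position
        board.push(move)
    return None

# The third move here is illegal: a pawn cannot capture straight ahead.
print(first_illegal_move(["e2e4", "e7e5", "e4e5"]))  # -> (2, 'e4e5')
```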
23:30
I think a priori, it
23:33
could have been that you
23:35
would have math ability,
23:37
you would have verbal ability, you'd
23:39
have ability to understand
23:42
humor, and they'd all be just completely
23:44
unrelated to each other. That is
23:46
possible. In fact,
23:50
already with GPT, you can say that
23:52
in some ways, it already is a
23:54
superintelligence. It knows
23:57
vastly more, can converse on
23:59
a vastly greater
23:59
range of subjects than any human
24:02
can. And in other ways,
24:05
it seems to fall short of
24:08
what humans know or
24:10
can do. But you
24:13
also see this sort of generality
24:16
just empirically. So, I mean,
24:19
GPT was sort of trained
24:22
on all the text on the internet.
24:26
You know, let's say most of the
24:28
text on the open internet.
24:29
So it was just one
24:32
method. It was not explicitly
24:35
designed to write code, and yet
24:37
it can write code. And at the
24:39
same time as that ability emerged,
24:42
you also saw the ability to
24:44
solve word problems, like
24:47
high school level math. You saw
24:49
the ability to write poetry. This
24:52
all came out of the same system without
24:54
any of it, you know, being explicitly
24:57
optimized for. And so I
24:59
feel like I need
24:59
to interject one important thing, which is
25:02
it can do all these things, but none of them all that reliably. OK.
25:06
Nevertheless, I mean, compared to, you know,
25:08
what let's say what my expectations
25:10
would have been if you'd ask me 10 or 20 years
25:12
ago, I think that the level of generality
25:15
is pretty remarkable. And, you
25:18
know, and it does lend support to the
25:20
idea that there is some sort of general
25:22
quality of understanding there where
25:24
you could say, for example, that GPT-4
25:27
has more of it than GPT-3,
25:29
which
25:29
in turn has more than GPT-2. And
25:32
I would say that it does
25:34
seem to me like it's presently pretty unambiguous
25:37
that GPT-4 is in some sense
25:39
dumber than an adult or
25:42
even teenage human. That's not obvious
25:44
to me. Why would you say that? I think that we
25:46
will eventually see that it's obvious, too. I
25:49
mean, to take the example I just gave you a minute
25:51
ago, it never learns to play chess, even
25:53
with a huge amount of data. So
25:56
it will play a little bit of chess.
25:58
It will memorize the openings and be able to play
25:59
okay for the first 15 moves, but
26:02
it gets far enough away from what it's trained on and it falls
26:04
apart. This is characteristic of
26:06
these systems. It's not really characteristic in the same
26:08
way of adults or even
26:10
teenage humans. Almost
26:12
anything that it does, it does unreliably. And to
26:14
give another example, you can ask a
26:17
human to write a biography of someone and don't
26:19
make stuff up. And you really can't ask GPT
26:21
to do that. Yeah. Like, it's a bit
26:24
difficult because you could always be cherry picking
26:26
something that humans aren't usually good at. But
26:28
to me, it does seem like there's this
26:29
broad range of problems that don't
26:32
seem especially to play to humans
26:34
strong points or machine
26:37
weak points where GPT-4 will
26:39
do no better than a
26:42
seven year old on those problems. Hold on. Can I
26:44
interject here? I do
26:46
feel like these examples are cherry
26:48
picked because if I just take a different,
26:51
very typical example, I'm writing an op-ed
26:54
for the New York Times, say, about any given
26:56
subject in the world. And my choice is to
26:58
have a smart 14
26:59
year old next to me with anything that's
27:02
in his mind already or GPT. And
27:04
there's no comparison, right? So
27:07
which of these sort of examples is
27:09
the litmus test for a human-level intelligence?
27:12
If you did it on a topic
27:14
where it couldn't rely on memorized
27:16
text, you might actually change your mind
27:19
on that. So the thing about writing a
27:21
Times op-ed is most of the things that
27:24
you propose to it, there's actually something
27:26
that it can pastiche together from its dataset.
27:28
That doesn't mean that it really understands what's going
27:31
on. It doesn't mean that that's a general capability.
27:33
Also, as the human, you're doing all the hard
27:35
parts, right? Like, obviously,
27:37
a human is going to prefer if
27:40
a human has a math problem, you're going to rather use a calculator
27:43
than another human. And similarly,
27:44
with the New York Times op-ed, you're doing all
27:46
the parts that are hard for GPT-4.
27:50
And then you're like asking GPT-4 to just do
27:52
some of the parts that are hard for you. You're
27:54
always going to prefer an AI partner rather than a human
27:56
partner in that sort
27:59
of setup where the human can do all the human stuff and
28:02
you want an AI to do whatever the AI is good
28:04
at at the moment. An analogy that's maybe
28:06
a little bit helpful here is driverless cars. It
28:09
turns out that
28:10
on highways and ordinary traffic, they're probably
28:13
better than people. In unusual
28:15
circumstances, they're really worse than people. So
28:17
a Tesla not too long ago ran into a jet,
28:19
and a human wouldn't do that. It was at slow
28:22
speed, being summoned across a parking lot. A human
28:24
would never do that. So there are different
28:26
strengths and weaknesses. The strengths of
28:28
a lot of the current kinds of technology
28:30
is that they can either pastiche together
28:33
or make not literal analogies
28:35
when we go into the details, but to stored
28:38
examples and they tend to be poor when
28:40
you get to outlier cases. And
28:43
that's persistent across most of the technologies
28:46
that we use right now. And so if you
28:48
stick to stuff in which there's a lot of data,
28:50
you'll be happy with the results you get from these systems.
28:53
You move far enough away, not so much.
28:56
And what we're going to see over time is that
28:58
the length of the debate about whether or not
29:00
it's still dumber than you gets longer
29:02
and longer and longer. And then
29:05
if things are allowed to just keep running and nobody
29:07
dies, then at some point that switches over
29:10
to a very long debate about is
29:12
it smarter than you, which then gets shorter
29:14
and shorter and shorter and eventually reaches
29:16
a point where it's pretty
29:19
unambiguous if you're paying attention.
29:20
Now, I suspect that this process
29:22
gets interrupted by everybody dying. In
29:25
particular, there's a question of the point at which it
29:27
becomes better than you, better
29:29
than humanity at building the next edition
29:31
of the AI system and how fast do things snowball
29:34
once you get to that point. Possibly you do not
29:36
have time for further public
29:39
debates or even
29:41
a two hour Twitter space depending on how that goes. I
29:44
mean, some of the limitations of
29:46
GPT are
29:48
completely understandable, just from
29:50
a little knowledge of how it works. It
29:53
does not have an internal memory,
29:57
per se, other than what appears
29:59
on
29:59
the screen in front of you. So this
30:02
is why it's turned out to be so effective
30:04
to explicitly tell it, like, let's
30:07
think step by step when it's solving
30:09
a math problem, for example. You have to
30:11
tell it to show all of its work
30:13
because it doesn't have an internal
30:15
memory with which to do that. Likewise,
30:18
when people complain about it hallucinating
30:21
references that don't exist, well, the truth
30:24
is when someone asks me for
30:26
a citation, if I'm not
30:28
allowed to use Google, I might
30:29
have a vague recollection of
30:32
some of the authors and I'll probably
30:34
do a very similar thing to what GPT
30:36
does. I'll hallucinate. Right. So
30:38
well, no, there's a great phrase I learned the other day, which is
30:41
frequently wrong, never in doubt. That's
30:43
true. That's true. I'm not
30:46
going to make up a reference with full detail,
30:48
page numbers, titles, so
30:50
forth. I might say, look, I don't remember 2012 or something like
30:52
that. Here's
30:55
GPT-4, what it's going to
30:57
say is 2017, Aaronson and Yudkowsky,
30:59
New York Times, pages 13 to 17.
31:03
No, it does need to get much, much better
31:06
at knowing what it doesn't know. And yet,
31:08
already I've seen a noticeable
31:11
improvement there, going from GPT-3
31:13
to GPT-4. For example,
31:16
if you ask GPT-3, prove that
31:18
there are only finitely many prime
31:20
numbers, it will give you a proof, even
31:23
though the statement is false. And
31:25
it will have an error, which is similar
31:27
to the errors on 1,000 exams
31:29
that
31:29
I've graded, just trying
31:32
to get something past you, hoping that
31:34
you won't notice. If you ask GPT-4,
31:37
prove that there are only finitely many prime
31:39
numbers, it says, no, that's a trick question. Actually,
31:42
there are infinitely many primes. And here's why.
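For reference, the "here's why" in a correct answer is Euclid's classical argument, sketched below; this is textbook material, not a quote from the conversation.

```latex
\begin{proof}[Euclid's argument]
Suppose $p_1, p_2, \dots, p_n$ were all of the primes, and let
\[
  N = p_1 p_2 \cdots p_n + 1 .
\]
Then $N \equiv 1 \pmod{p_i}$ for every $i$, so no $p_i$ divides $N$.
But $N > 1$ has at least one prime factor, which therefore is not among
$p_1, \dots, p_n$, a contradiction. Hence there are infinitely many primes.
\end{proof}
```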
31:44
Yeah. Part of the problem with doing the science
31:47
here is that, I think you would know better since
31:49
you work part-time or whatever at OpenAI, but
31:52
my sense is that a lot of the examples
31:54
that get posted on Twitter, particularly
31:57
by the likes of me and other critics,
31:59
or other skeptics, I should say,
31:59
are not really good, because
32:02
the system gets trained on those. So,
32:05
you know, almost everything that people write about
32:07
it, I think, is in the training set. It's
32:09
hard to do the science when the system's constantly
32:11
being trained, especially in the RLHF
32:14
side of things. And we don't actually know what's in GPT-4,
32:17
so we don't even know if they're regular expressions
32:19
and, you know, simple rules to match things. So we
32:22
can't do the kind of science we used to be able to
32:24
do.
32:25
This conversation, this
32:27
subtree of the conversation, I think, has no
32:30
natural endpoint. So if I can sort
32:32
of zoom out a bit, I think there's a, you
32:34
know, pretty solid sense in which
32:37
humans are more generally intelligent than chimpanzees.
32:40
As you get closer and closer to
32:42
the human level, I
32:44
would say that the direction
32:47
here is still clear, that the comparison is still clear.
32:50
We are still smarter than GPT-4. This
32:52
is not going to take control of the world from us. But,
32:55
you know, the conversations get longer. It
32:58
gets, the definitions start to break
33:00
down around the edges. But I
33:02
think it also, as you keep going, like it comes
33:04
back together again, there's a point,
33:07
and possibly this point is like very close to
33:09
the point of time to where everybody dies. So maybe we
33:11
don't ever like see it in a podcast,
33:13
but there's a point where it's, you know, unambiguously
33:16
smarter than you. And including
33:19
like the spark of creativity,
33:22
being able to deduce things quickly
33:25
rather than with tons and tons of extra evidence,
33:27
strategy, cunning, modeling
33:29
people, figuring out how to manipulate
33:32
people.
33:33
So let's stipulate, Eliezer,
33:35
that we're going to get to machines that can do
33:37
all of that. And then the question is, what
33:39
are they going to do? Is it a certainty
33:42
that they will make our annihilation
33:44
part of their business? Is it a possibility?
33:47
Is it an unlikely possibility? I
33:49
think your view is that it's a certainty. I've
33:51
never really understood that part. It's a
33:53
certainty on the present tech is
33:55
the way I would put it. Like if that happened, so
33:58
in particular, like if that
34:02
happened tomorrow, then, you know, modulo
34:05
Cromwell's rule, never say certain. Like
34:08
my probability is like, yes, modulo the
34:08
chance that my model is somehow just completely
34:10
mistaken. If we got 50 years
34:13
to work it out and unlimited
34:15
retries, I'd be a lot more confident. I
34:18
think that'd be pretty OK. I think we'd make it. The
34:21
problem is that it's a lot harder to
34:24
do science when your first wrong try destroys
34:26
the human species, and then you don't get to try again.
34:28
I mean, I think there's something,
34:29
again, that I agree with and something I'm a little
34:32
bit skeptical about. So
34:33
I agree that the amount of time we have
34:36
matters. And I would also agree
34:38
that there's no existing technology
34:41
that
34:42
solves the alignment problem that gives a moral
34:44
basis to these machines. I mean, GPT-4
34:46
is fundamentally amoral. I don't think
34:48
it's immoral. It's not out to get us. But
34:51
it really is amoral. It
34:53
can answer trolley problems because there are trolley problems
34:56
in the data set. But that doesn't mean that it really has a moral
34:59
understanding of the world. And so if
35:01
we get to a very smart machine
35:03
by all the criteria that we've talked
35:06
about, and it's amoral, then that's a problem
35:08
for us. And there's a question of whether
35:10
if we can get to smart machines,
35:14
whether we can build them in a way that will have some
35:16
moral basis. And I think we need
35:18
to make progress. Well, I
35:20
mean, the first try part I'm not willing to let
35:22
pass. So I understand,
35:25
I think, your argument there. Maybe you should spell it out.
35:28
I think that we probably get more than one
35:30
shot and that it's not as
35:32
dramatic and instantaneous
35:33
as you think. I do
35:36
think one wants to think about sandboxing. One
35:38
wants to think about distribution. But I mean,
35:40
let's say we had one evil super
35:42
genius now who is smarter than everybody
35:44
else. Like, so what? One
35:47
super smart. Say again? Not
35:49
just a little smarter. Even a lot smarter.
35:52
Most super geniuses
35:56
aren't actually that effective. They're not that focused.
35:58
They were focused on other things.
35:59
You know, you're kind of assuming that
36:02
the first super genius AI
36:04
is gonna make it its business to annihilate
36:06
us And that's the part where I I still
36:08
am a bit stuck in the argument.
36:10
Yeah. Some
36:12
of this has to do with the notion that
36:15
if you do a bunch of training you
36:17
start to get goal direction
36:19
even if you don't explicitly train on that. That
36:22
goal direction is a natural way to
36:24
achieve higher capabilities. The
36:27
reason why humans want things is that wanting
36:29
things is an effective way of getting things and
36:32
so natural selection, in
36:34
the process of selecting
36:36
exclusively on reproductive fitness just
36:38
on that one thing got us to
36:41
want a bunch of things that correlated with reproductive
36:44
fitness in the ancestral distribution, because
36:47
having intelligences that
36:49
want things is a good way of getting
36:51
things. That's, in a sense, like,
36:54
you know, wanting comes from the same place as
36:56
intelligence itself. And you could even,
36:58
you know, from a certain technical standpoint on expected
37:01
utility, say that intelligence
37:03
is a very effective way of wanting: planning,
37:06
plotting paths through time that lead to particular outcomes.
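As a rough gloss on the expected-utility framing being gestured at here (a standard textbook formulation, not a quote from the conversation): an agent with a utility function over outcomes and beliefs about what each action leads to picks the action with the highest expected utility.

```latex
% Standard expected-utility choice rule: U is a utility function over
% outcomes o, and P(o | a) the agent's belief about what action a leads to.
\[
  a^{*} \;=\; \operatorname*{arg\,max}_{a \in A}\; \mathbb{E}\big[\,U(o) \mid a\,\big]
        \;=\; \operatorname*{arg\,max}_{a \in A}\; \sum_{o} P(o \mid a)\, U(o)
\]
```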
37:09
So part of it is that I
37:12
do not think you get, like, the superintelligence
37:14
that wants nothing, because I don't think
37:16
that wanting and intelligence can be
37:18
pried apart internally that easily. I think
37:21
that the way you get superintelligence is that
37:24
there are things that have gotten good at organizing
37:27
their own thoughts and have good taste
37:29
in which thoughts to think
37:30
and that is where the high capabilities
37:33
come from. Can I put a point to you, Eliezer, on this?
37:35
And then that does mean that they have internal...
37:37
Let me just put this point to you. I think it,
37:39
look, let me just put the following point to you, which I think,
37:42
in my mind, is similar to what Gary was saying. As
37:49
We dive deeper into the heart of summer with
37:51
the Sun beaming down and the days inviting
37:53
us all to be more active We all need
37:55
wholesome convenient meals to keep us
37:57
going. I know that's certainly true for me And
38:00
that's where Factor comes in, America's number
38:02
one ready to eat meal kit. I have to tell
38:04
you, as someone who gets six protein-rich
38:06
meals from Factor every week, it's been
38:08
a game changer for me. The flavorful and
38:10
nutritious ready to eat meals are delivered right
38:12
to my door, saving me time and keeping me on
38:15
track with my wellness goals. We all know the struggle.
38:17
Summer plans, keeping us too busy to cook, yet
38:20
we still want to eat well. With Factor, the
38:22
grocery trips, the chopping, prepping, and
38:24
cleaning up are all things of the past, but
38:27
without compromising on flavor or nutritional
38:29
quality. These fresh, never frozen
38:31
meals are ready in just two minutes. All
38:33
I have to do is heat, enjoy, and get back to
38:35
soaking up the sun. Each meal from Factor
38:38
is like a treat. With high quality ingredients
38:40
like broccolini, leeks, asparagus,
38:42
and over 34 weekly restaurant quality
38:44
options like shrimp risotto, green goddess
38:46
chicken, grilled steakhouse filet mignon,
38:49
I always have a flavorful variety to choose
38:51
from. Factor's lunch to go is a lifesaver
38:54
when I'm busy. Wholesome grain bowls
38:56
and salad toppers keep my energy levels up.
38:58
No microwave even required
38:59
for those. So if you're like me and you want to enjoy
39:02
the summer without the hassle of meal prep, get
39:04
Factor. Just choose your meals and savor
39:06
the fresh flavor-packed meals delivered right
39:08
to your door. Here's the special
39:10
offer. Head to factormeals.com slash Coleman50
39:13
and use the code Coleman50 to get 50% off. That's
39:16
Coleman50 at factormeals.com slash
39:19
Coleman50 to get 50% off.
39:25
There's often in philosophy, this
39:27
notion of the continuum fallacy,
39:29
which in the canonical
39:32
example is like, you can't locate
39:35
a single hair that you would pluck from my head where
39:37
I would suddenly go from not bald
39:40
to bald. Or like the example, the even
39:43
more intuitive example is like a color wheel. Like
39:45
there's no single, on a gray
39:47
scale, there's no single pixel you
39:49
can point to and say, well, that's where gray begins
39:52
and white ends. And yet we have
39:54
this conceptual distinction that feels hard
39:56
and fast between
39:57
gray and white and gray and black and
39:59
so forth.
39:59
When we're talking about
40:02
artificial general intelligence or super
40:04
intelligence, you seem to
40:06
operate on a model where either
40:09
it's a super intelligence capable of destroying
40:11
all of us or it's not. Whereas
40:15
intelligence may just be a
40:17
continuum fallacy style spectrum
40:19
where we're first going to see the shades
40:22
of something that's just a bit more
40:24
intelligent than us and maybe it can kill
40:26
five people at most and then it can...
40:30
And when that happens, we're
40:32
going to want to intervene and
40:34
we're going to figure out how to intervene at that level
40:37
and so on and so forth. Well, if it's stupid enough to do it,
40:39
then yeah. Yeah, so... If it's stupid
40:41
enough to do it, then yes. By
42:44
the identical logic, there
40:46
should be nobody who steals money on
40:48
a really large scale, right? Because
40:50
you could just give them $5 and see if they steal that. And
40:54
if they don't steal that, you know you're good to trust them
40:56
with a billion. I mean, I think that
40:58
in actuality, anyone who did steal
41:00
a billion dollars probably displayed
41:03
some dishonest behavior
41:04
earlier in their life that was
41:07
unfortunately not acted upon
41:09
early enough. I'm actually
41:11
not even... The analogy... Yeah, but...
41:14
Hold on, hold on. The analogy I have pictures
41:16
like we have the first case of fraud that's $10,000
41:18
and then we build systems
41:20
to prevent it, but then they fail with a somewhat
41:23
smarter opponent, but our systems get better and
41:25
better and better. And so we actually prevent
41:27
the billion dollar fraud because of the systems
41:29
put in place in response to
41:31
the $10,000 frauds. I mean,
41:33
I think Coleman's putting his finger on an important
41:36
point here, which is how much do we get to iterate?
41:39
And Eliezer is saying, the minute
41:41
we have a super intelligent system, we won't
41:43
be able to iterate because it's all over immediately.
41:45
Well, there isn't a minute like that. The
41:47
way that the continuum goes to the threshold
41:50
is that you eventually get something that's smart enough
41:53
that it knows not to play its hand
41:55
early. And then if that thing, you
41:57
know, if you are still cranking up the power on
41:59
that and preserving its utility function,
42:02
it knows it just has to wait to be smarter,
42:04
to be able to win. It doesn't play its hand
42:07
prematurely. It doesn't tip you off. It's not in
42:09
its interest to do that. It's in its interest to cooperate
42:11
until it thinks it can win against humanity and
42:13
only then make its move. If it doesn't
42:16
expect the future smarter AIs to be smarter
42:18
than itself, then we might perhaps see these early AIs
42:20
telling humanity, don't build the later
42:22
AIs. And I would be sort
42:25
of surprised and amused if we ended
42:27
up in that particular sort of like science fiction
42:29
scenario
42:29
as I see it. But we're already in like
42:32
something that, you know, me from 10 years ago would have
42:34
called the science fiction scenario, which is the
42:36
things that will talk to you without being very smart. I
42:39
always come up, Eliezer, against
42:41
this idea that you're assuming
42:44
that the very bright machines, the super
42:46
intelligent machines will be malicious
42:49
and duplicitous and so forth. And I just
42:51
don't see that as a logical entailment
42:54
of being very smart. I mean,
42:57
they don't specifically want
42:59
as an end in itself for you to be destroyed.
43:02
They're just doing whatever obtains the most
43:04
of the stuff that they actually want, which doesn't specifically
43:07
have a term that's maximized by
43:09
humanity surviving and doing well.
43:12
Why can't you just hard code? Don't
43:15
do anything that will annihilate the human species. Don't
43:18
do anything.
43:18
We don't know how. There is no technology
43:21
to hard code. So there I agree
43:23
with you, but I think
43:25
it's important if I can just run for one second. I
43:28
agree that right now we don't have the technology
43:31
to hard code. Don't
43:33
do harm to humans. But for me, it
43:36
all boils down to a question of are we going to get to
43:38
the smart machines before we make progress
43:40
on that hard coding problem or not? And that
43:42
to me, that means that problem of hard
43:44
coding ethical values is actually
43:47
one of the most important projects that we
43:48
should be working on.
43:50
Yeah. Yeah. And
43:52
I tried to work on it 20 years in advance and capabilities
43:55
are just like running vastly out of alignment.
43:57
When I started working on this, like
44:00
two decades ago, we were in a sense
44:02
ahead of where we are now.
44:03
AlphaGo is much more controllable
44:06
than GPT-4. So there I agree
44:08
with you. We've fallen in love with a
44:10
technology that is fairly poorly
44:13
controlled. AlphaGo is very easily
44:15
controlled and very well
44:17
specified. We know what it does. We can more or less
44:19
interpret why it's doing it. And everybody's
44:22
in love with these large language models and they're
44:25
much less controlled. And you're
44:27
right, we haven't made a lot of progress on alignment.
44:30
So if we just go on a straight line, everybody
44:32
dies. I think that's, this
44:33
is an important fact. I would almost even
44:36
accept that for argument, but
44:38
ask then, just for the sake of argument, but
44:40
then ask, do we have to be on a straight line?
44:43
I mean, I would agree to the weaker claim
44:46
that, you know, we should certainly
44:48
be extremely worried about the intentions
44:51
of a super intelligence in the same way
44:53
that say chimpanzees should be
44:55
worried about the intentions of the
44:57
first humans that arise.
45:00
And in fact, chimpanzees
45:03
continue to exist in
45:05
our world only at humans' pleasure. But I
45:07
think that there were a lot of other considerations
45:09
here. For example, if we imagined
45:12
that GPT-10
45:14
is the first unaligned super
45:17
intelligence that has these
45:20
sorts of goals, well, then, you know, it would be appearing
45:22
in a world where presumably GPT-9,
45:25
you know, already has very wide diffusion
45:28
and where people can use that
45:30
to try to, you know, you know, and GPT-9
45:33
was not destroying the world, you know, by
45:35
assumption. Why does GPT-9
45:37
work with humans instead of with GPT-10?
45:40
Well, I don't know. I mean, maybe, maybe,
45:42
maybe it does work with GPT-10, but,
45:45
you know, I just, I just don't view that
45:47
as a certainty. You know,
45:49
I mean, I think, you know, your certainty
45:51
about this
45:54
is the one place where I really get off
45:56
the train. Same with me.
45:58
I, well, I mean, I'm not asking you
46:00
to share my certainty, I am
46:02
asking the viewers to believe
46:05
that you might end up with like more
46:08
extreme probabilities after you stare
46:10
at things for an additional couple of decades. That
46:12
doesn't mean you have to accept my probabilities immediately,
46:15
but I'm at least asking you to like not treat that as some kind
46:17
of weird anomaly.
46:19
You're just going to find those kinds of situations
46:21
in these debates. My view is that
46:24
I don't find the extreme probabilities
46:26
that you described to be plausible, but
46:28
I find the question that you're raising to be
46:30
important. I think maybe
46:33
straight line is too extreme, but this idea
46:35
that if you just follow current
46:37
trends, we're getting more, I'm sorry, we're
46:39
getting less and less controllable machines
46:42
and not getting more alignment.
46:45
Machines that are more unpredictable, harder to
46:47
interpret, and no better at sticking
46:49
to even a basic principle like be
46:52
honest and don't make stuff up. In fact,
46:54
that's a problem that other technologies don't really have.
46:56
Modeling systems, GPS systems don't make stuff
46:59
up. Google search doesn't make stuff
47:01
up. It will point to things that other people have made
47:03
stuff up, but it doesn't itself do it. In
47:05
that sense, the trend line is not great. I
47:07
agree with that. I agree that
47:10
we should be really worried about that and we should put
47:12
effort into it. Even if I don't agree with
47:14
the probabilities that you attach to it. I mean,
47:16
Eliezer... Let me interject with the question here. Go
47:19
ahead, Scott. Go ahead, Scott. No,
47:22
I mean, I think that Eliezer deserves
47:24
sort of eternal credit for raising
47:26
the issue. He was talking about
47:28
these issues 20 years ago and it was very, very far from
47:31
obvious to most of us that they would be
47:33
live issues. I can say for
47:35
my part, I was familiar with
47:38
Eliezer's views since 2006 or so. When
47:43
I first encountered them, I
47:46
knew that there was no principle that this
47:48
scenario was impossible, but I just
47:51
felt like, well, supposing I agreed with
47:53
that,
47:54
what do you want me to do about it? Where
47:56
is the research program that has any hope
47:58
of making progress?
47:59
I mean, there's one question
48:02
of what are the most important problems in the world,
48:04
but in science, that's necessary
48:06
but not sufficient. We need something that we can make
48:08
progress on. And
48:11
that is the thing that I think
48:13
has changed just
48:16
recently with the advent of
48:18
actual very powerful AIs. And
48:21
so the sort of irony here is
48:23
that as Eliezer has
48:25
gotten much more pessimistic,
48:28
unfortunately, in the last few years
48:29
about alignment,
48:32
I've sort of gotten more optimistic.
48:35
I feel like, well, there is a research
48:37
program that we can actually
48:39
make progress on now. Yeah,
48:42
your research program is going to take 100 years and we don't
48:44
have 100 years. I don't know how long it will take. We
48:46
don't know that. Exactly. We
48:48
don't know. I think the argument that we should
48:50
put a lot more effort into it is clear.
48:53
I think the argument that will take 100 years is totally
48:55
unclear. I mean, I'm not even sure you can do
48:57
it in 100 years because there's the basic problem of
48:59
getting it right on the first try. And
49:01
the way these things are supposed to work in science is
49:04
that you have your bright-eyed optimistic youngsters
49:06
with their vastly oversimplified, hopelessly
49:08
idealistic, optimistic plan. They charge
49:11
ahead. They fail. They learn
49:13
a little cynicism. They learn a little pessimism. They
49:15
learn it's not as easy as that. They try
49:17
again. They fail again. They start to
49:19
build up something like battle-hardening.
49:23
And they find out how
49:25
little is possible to them. Eliezer,
49:27
this is a place where I just really don't agree
49:29
with you. So I think there's all kinds of things we can
49:31
do that are sort of of the flavor of
49:34
model organisms or simulations and
49:36
so forth. And we just mean it's
49:38
hard because we don't actually have a super intelligence
49:40
so we can't fully calibrate. But it's
49:43
a leap to say that there's nothing iterative that
49:45
we can do here or that we have to get it right
49:47
on the first time. I mean, I certainly see
49:49
a scenario where that's true. Where getting it
49:52
right on the first time
49:53
does make the difference. But
49:55
I can see lots of scenarios where it doesn't and where we do
49:57
have time to iterate before it happens, after
49:59
it happens.
49:59
It's really not a single moment,
50:02
but I'm, you know, idealizing. I mean, the
50:04
problem is getting anything that generalizes
50:07
up to super intelligent level where
50:09
past some threshold level, the
50:11
minds may find it in their own interest to start lying
50:13
to you.
50:14
Even if that happens before superintelligence. Even
50:16
that, like I don't see a logical
50:19
argument that you can't emulate that or
50:21
study it. I mean, for example, you could, I'm just
50:24
making this up as I go along, but for example, you could study
50:26
what can we do with sociopaths who
50:28
are often very bright and, you know,
50:31
not tethered to our value. But yeah,
50:34
what can a,
50:35
what, what strategy can a
50:37
like 70 IQ honest person
50:40
come up with and invent themselves by
50:42
which they will outwit and defeat a 130 IQ sociopath.
50:45
All right. Well, there you're not being fair either in the sense
50:48
that, you know, we actually have lots of 150 IQ
50:51
people who could be working on this problem collectively
50:54
and there's, there's value in collective
50:56
action. There's literature. What I see,
50:59
what I see that gives me pause is that, is
51:01
that the people don't seem to appreciate what
51:03
about the problem is hard.
51:05
Even at the level where like 20 years
51:08
ago I could have told you it was hard until,
51:10
you know, somebody like
51:12
me comes along and nags them about it. And then they
51:15
talk about the ways in which they could adapt and be clever.
51:17
But the people charging straight forward are
51:20
just sort of like doing it in this supernally
51:22
naive way. Let me share a historical
51:25
example that I think about a lot, which
51:27
is in the early 1900s, almost
51:29
every scientist on the planet who thought
51:32
about biology made a mistake. They all
51:34
thought that genes were proteins.
51:36
And then eventually Oswald Avery did
51:38
the right experiments. They realized that
51:41
genes were not proteins. There was this weird acid
51:43
and it didn't take long after people got
51:46
out of this stuck mindset before
51:48
they figured out how that weird acid worked and how
51:51
to manipulate it and how to read the code that it
51:53
was in and so forth. So I
51:55
absolutely sympathize with the fact that
51:57
I feel like the field is stuck right now. I
52:00
think the approaches people are taking to alignment
52:02
are unlikely to work. I'm completely with
52:04
you there. But I'm also, I guess,
52:06
more long-term optimistic that
52:08
science is self-correcting and that we have a chance
52:11
here. Not a certainty, but I think
52:13
if we change research priorities
52:15
from how do we make some money off this large
52:18
language model that's unreliable to how
52:20
do I save the species, we might actually make progress.
52:23
There's a special kind of caution that you need when
52:25
something needs to be gotten correct on the first
52:28
try. I'd be very optimistic if people
52:29
got a bunch of free retries, and I didn't think that the first really serious mistake was going to kill everybody and leave us unable to try again. If we got free retries, it'd
52:38
be an ordinary, you know, it'd be in some sense an ordinary
52:40
science problem. Look, I can imagine
52:43
a world where we only got one try
52:45
and if we failed, then it destroys
52:47
all life on earth. And so let me
52:50
agree to the conditional statement that if
52:52
we are in that world, then I think that we're screwed.
52:55
I will agree with the same conditional statement. All
52:57
right. Yeah.
52:59
And this gets back to like, you know, if you
53:02
picture by analogy the process of,
53:04
you know, a human baby, which
53:06
is extremely stupid, becoming a human
53:08
adult
53:09
and then just extending that so
53:11
that in a single lifetime, this person
53:14
goes from a baby to
53:17
the smartest being that's ever lived.
53:20
But in the normal way
53:22
that humans develop, which is, you know, it doesn't
53:24
happen on any one given
53:26
day and each sub skill
53:28
develops a little bit at its own
53:30
rate and so forth, it would not
53:33
be at all obvious to me that we have to get it right vis-a-vis that individual the first time. I agree. No,
53:41
well, pardon me. I do think we have to get them right the first
53:43
time, but I think there's a decent chance of getting it right.
53:45
It is very important to get it right the first time. It's
53:48
like you have this one person getting
53:50
smarter and smarter and not everyone else is getting
53:52
smarter and smarter.
53:53
Eliezer, I mean, one thing that you've talked about
53:55
recently is, you know, if we're all going to
53:58
die, then at least let us die with dignity.
54:01
So, you know, I mean,
54:03
some people might care about that more than others,
54:05
but I would say that, you know, one
54:08
thing that death with dignity would mean
54:10
is, well, at least, you know, if we do get multiple retries, and we get AIs that, let's say, try to take over the world but are really inept at it, and that fail and so forth, at least
54:24
let us succeed in that world, you know, and
54:26
that's at least something that we can imagine
54:28
working on and making progress
54:29
on.
54:30
I mean, it is not presently ruled out that you have some
54:35
like, you know, relatively smart
54:38
in some ways, dumb in some other ways, or
54:40
at least like not smarter than human in other ways,
54:42
AI that makes an early shot at
54:44
taking over the world, maybe because it expects future
54:46
AIs to not share its goals and not cooperate
54:49
with it, and it fails. And,
54:51
you know, I mean, the appropriate lesson to learn
54:53
there is to, you know, like shut the whole thing
54:56
down. So yeah, like I would say, sure, wouldn't it be good to live in that
55:02
world? And the way you live in that world is that when
55:04
you get that warning sign, you shut it all down.
55:07
Here's a kind of thought experiment. GPT-4
55:10
is probably not capable of annihilating
55:12
us all. I think we all agree about that.
55:14
But GPT-4 is certainly capable
55:17
of expressing the desire to annihilate
55:19
us all or being, you know, people have rigged
55:22
different versions that are, you know, more
55:24
aggressive and so forth. We
55:26
could say, look, until we can shut
55:28
down those versions,
55:30
you know, GPT-4s that are programmed
55:33
to be malicious by human intent, maybe
55:35
we shouldn't build GPT-5, or at least not GPT-6
55:38
or some other system, etc. We could say, you
55:40
know, what we have right now actually is part
55:42
of that iteration. We have, you know, primitive intelligence
55:45
right now. It's nowhere near as smart as a
55:47
super intelligence is going to be. But even
55:49
this one, we're not that good at constraining.
55:52
Maybe we shouldn't pass go until we
55:54
get this one right. I mean, the
55:56
problem with that from my perspective is that
55:58
I do think
55:59
you can pass this test
56:01
and still wipe out humanity. Like
56:04
I think that there comes a point where your AI
56:06
is smart enough
56:07
that it knows which answer you're looking for.
56:10
And the point at which it tells you what you want to hear
56:13
is not the point at which that becomes its internal motivation.
56:15
It's not sufficient, but it might be a logical
56:17
pause point, right? It might be that
56:19
if we can't even pass the test now
56:22
of, you know, controlling a version of GPT-4 deliberately fine-tuned to be malicious,
56:29
then we don't know what we're talking about and
56:31
we're playing around with fire. So passing that
56:33
test wouldn't be a guarantee that we would be in
56:36
good stead with an
56:37
even smarter machine, but we really should
56:39
be worried, I think, that we're not
56:41
in a very good position with respect even to the
56:44
current ones. Gary, I of course
56:46
watched the recent congressional hearing
56:48
where you and Sam Altman were
56:51
testifying, you know, about what
56:53
should be done. Should there be auditing
56:56
of these systems, you know, before training,
56:58
before deployment? And, you know, maybe,
57:00
you know, the most striking thing about
57:03
that session was, you know, just
57:05
how little daylight there seemed to be between
57:07
you and Sam Altman, the
57:09
CEO of OpenAI. You know, he
57:12
was completely on board with
57:14
the idea of, you know, establishing a regulatory
57:17
framework for,
57:21
you know, having to clear the, you
57:24
know, more powerful systems before
57:26
they are deployed. Now, you know, in Eliezer's
57:29
worldview, that still would be woefully
57:31
insufficient, surely,
57:34
and, you know, we would still all be dead. But,
57:36
you know, maybe in your worldview
57:39
that, you know, it sounds like I'm
57:41
not even sure how much daylight there is. I mean, you now have a very, I think, historically
57:48
striking situation where, you
57:50
know, the heads of all of the major
57:52
AI, or almost all
57:55
of the major AI organizations are,
57:57
you know, agreeing, essentially raising their hands and saying:
57:59
Yes, this is dangerous. Yes, we need to be regulated.
58:02
I mean, I thought it was really striking. In
58:06
fact, I talked to Sam just before the
58:09
hearing started, and I had
58:11
just proposed an international agency
58:13
for AI. I wasn't the first person ever, but I pushed
58:15
it in my TED Talk and an Economist op-ed
58:18
a few weeks before. And Sam said
58:20
to me, I like that idea. And
58:23
I said, tell them, tell the Senate. And
58:25
he did. And that kind of astonished me
58:27
that he did. I mean, we've had some
58:29
friction between the two of us in the past. And he
58:31
actually even attributed it to me. He said, I support what
58:33
Professor Marcus said about
58:36
doing international governance.
58:38
And there's been a lot of convergence around
58:40
the world on that. Is that enough to stop
58:43
Eliezer's worries? No,
58:45
I don't think so. But it's an important baby
58:47
step. I think that we do need
58:49
to have some global body that
58:51
can coordinate around these things. I don't think
58:54
we really have to coordinate around super
58:56
intelligence yet. But if we can't do any coordination
58:58
now, then when the time comes, we're not prepared.
59:02
So I think it's great that there's some agreement. I
59:04
worry that OpenAI had this lobbying
59:07
document that just came out that seemed not
59:09
entirely consistent with what Sam
59:11
said in the room. And there's always concerns
59:13
about regulatory capture and so forth. But I think
59:15
it's great that a lot of the
59:17
heads of these companies, maybe with the exception of
59:20
Facebook or Meta,
59:21
are recognizing that there are
59:24
genuine concerns here. I mean, the other moment
59:26
that a lot of people remember from the testimony
59:29
was when Sam was asked what he was most concerned
59:31
about. Was it jobs? And he said, no.
59:33
And I asked Senator Blumenthal to push Sam.
59:36
And Sam was, he could have been more
59:38
candid, but he was fairly candid. And he said he
59:40
was worried about serious harm to the species. I
59:43
think that was an important moment when he said that
59:45
to the Senate. And I think it galvanized a lot
59:48
of people that he said it.
59:49
So can we dwell on this for a moment? I mean, we've
59:51
been talking about the,
59:53
depending on your view, highly likely
59:56
or
59:57
tail risk scenario
59:59
of humanity's extinction
1:00:02
or significant destruction,
1:00:04
it would appear to me by the same token,
1:00:06
if
1:00:07
those are plausible
1:00:10
scenarios we're talking about, then the
1:00:12
opposite is maybe worth talking about as
1:00:14
well. What does it
1:00:16
look like to have a super intelligent
1:00:20
AI that
1:00:22
really, as a feature
1:00:24
of its intelligence,
1:00:26
deeply understands human beings,
1:00:29
the human species, and
1:00:31
also has a deep desire
1:00:34
for us to be as happy as
1:00:36
possible. Oh, as happy as possible? What does
1:00:38
that world look like? And do you think that's- Yes,
1:00:40
that looks like- No, no, maybe not as happy as
1:00:42
physically possible. ... to make them as happy as possible.
1:00:45
But more like a parent wants their child
1:00:47
to be happy, right? That may not involve
1:00:50
any particular scenario, but is
1:00:52
generally quite concerned about the well-being
1:00:55
of the human race and is also super intelligent.
1:00:58
Honestly, I'd rather have machines work
1:01:00
on medical problems than happiness
1:01:03
problems. I think there's maybe more
1:01:05
risk of misspecification
1:01:07
of the happiness problems.
1:01:09
Whereas if we get them to work on Alzheimer's
1:01:12
and just say, like, figure out what's going on,
1:01:14
why are these plaques there? What can you do about it? Maybe
1:01:17
there's less harm that might come from- You
1:01:19
don't need super intelligence for that. That sounds like
1:01:21
an AlphaFold 3 problem or an AlphaFold 4 problem. Well, AlphaFold
1:01:25
doesn't really do that. This is all somewhat
1:01:28
different than the question I'm asking. It's
1:01:30
not really even
1:01:32
us asking a super intelligence
1:01:34
to do anything because we've already been entertaining
1:01:36
scenarios where the super intelligence has its
1:01:38
own desires independent of us. Yeah, I'm not
1:01:40
real thrilled with that. Do you think at all about a scenario
1:01:43
where- I don't think we want
1:01:45
to leave what their
1:01:48
objective functions are, what their desires
1:01:50
are, to them to work out with no
1:01:52
consultation from us, with no human in the loop.
1:01:55
I mean, especially given our current understanding
1:01:58
of the technology.
1:01:59
Our current understanding of how to keep a
1:02:02
system on track, doing what we want to do
1:02:04
is pretty limited. And so, you
1:02:06
know, taking humans out of the loop there, it sounds
1:02:08
like a really bad idea to me, at least in
1:02:10
the foreseeable future. I would want to see much
1:02:13
better alignment technology. No, I agree. Before
1:02:16
we give it free rein. So if we
1:02:18
had the textbook from the future, like we
1:02:20
have the textbook from 100 years in the future,
1:02:22
which contains all the simple ideas that actually
1:02:24
work in real life, as opposed to, you know, the
1:02:27
complicated ideas and the simple ideas that don't
1:02:29
work in real life, the
1:02:29
equivalent of ReLUs instead of sigmoids for the activation functions. With the textbook from 100 years in the future, you can
1:02:36
probably build a super intelligence
1:02:39
that'll want anything that's coherent to want, anything you can figure out how to describe coherently, or point it at your own mind and tell it to figure out what it is you meant for it to want. And
1:02:53
you know, you could get the glorious transhumanist
1:02:55
future. You could get the happily ever after;
1:02:58
you know, anything's possible that doesn't violate
1:03:00
the laws of physics. The
1:03:03
trouble is doing it in real life, and, you know, on the first try. But yeah, so
1:03:07
like, you know, the whole thing that we're aiming for here is
1:03:12
to colonize all the galaxies we can
1:03:14
reach before somebody else gets them first
1:03:16
and turn them into galaxies full
1:03:18
of, you know, complex, sapient life living
1:03:20
happily ever after. You know, that's that's
1:03:22
the goal. That's still the goal. Even
1:03:24
if we, you know, even when I call for, like, a permanent moratorium on AI, I'm not trying to prevent us from colonizing the galaxies, you know, like, humanity
1:03:36
forbid. More like: let's do some human intelligence augmentation with AlphaFold 4 before we try
1:03:44
building GPT-8. One
1:03:46
of the few scenarios that I think we can
1:03:48
clearly rule out here is an AI
1:03:51
that is existentially dangerous, but
1:03:53
also boring. Right. I mean, I think
1:03:55
anything that has the capacity to kill
1:03:58
us all right would have, you know,
1:03:59
if nothing
1:04:02
else, pretty amazing capabilities. And
1:04:04
those capabilities could also
1:04:06
be turned to solving a lot of humanity's problems, if
1:04:12
we were to solve the alignment problem. I
1:04:14
mean, humanity had a lot of
1:04:16
existential risks,
1:04:19
before AI came on the scene, right?
1:04:22
I mean, there was the risk of nuclear
1:04:25
annihilation, there was the risk of runaway
1:04:27
climate change. And I would love to see an AI
1:04:32
that could help us with such things. I
1:04:34
would also love to see an
1:04:36
AI that could sort of help
1:04:38
us just solve some of the mysteries
1:04:41
of the universe. I mean, how
1:04:43
can one possibly not be
1:04:45
curious to know what
1:04:47
such a being could teach us? I
1:04:50
mean, for the past year, I've tried to use
1:04:52
GPT-4 to produce original
1:04:55
scientific insights, and I've
1:04:57
not been able to get it to do that. And
1:05:00
I don't know whether I should feel disappointed
1:05:02
or relieved by that, but I think
1:05:05
the better part of me is the part that should just want to see the
1:05:10
great mysteries of
1:05:12
existence, of why is
1:05:14
the universe quantum mechanical? Or how
1:05:17
do you prove the Riemann hypothesis? It
1:05:19
should just want to see these mysteries solved.
1:05:22
And if it's to be
1:05:25
by AI, then fine. Let
1:05:28
me give you a lesson
1:05:29
in epistemic humility.
1:05:32
We don't really know whether
1:05:34
GPT-4 is net positive
1:05:37
or net negative.
1:05:38
There are lots of arguments you can make. I've been in
1:05:40
a bunch of debates where I've had to take
1:05:43
the side of arguing that it's a net
1:05:45
negative, but we don't really know.
1:05:47
If we don't know that for GPT-4... I'd say it's been net positive so far. What was
1:05:51
the invention of agriculture, net positive?
1:05:54
I'd say it was net positive. You could go back
1:05:56
way further. The point is, if I can just finish
1:05:58
the quick thought
1:05:59
or whatever, I don't think anybody
1:06:02
can reasonably answer that. We
1:06:05
don't yet know all of the ways in which
1:06:07
GPT-4 will be used for good.
1:06:09
We don't know all of the ways in which bad actors will
1:06:11
use it. We don't know all the consequences. That's
1:06:14
going to be true for each iteration. It's probably
1:06:16
going to get harder to compute for
1:06:18
each iteration, and we can't even do it now.
1:06:21
I think that we should
1:06:23
realize that, to realize our own limits
1:06:26
in being able to assess the negatives
1:06:29
and positives, maybe we can think about better
1:06:31
ways to do that than we currently have. I
1:06:34
think you've got to have a guess. My
1:06:37
guess is that so far, not looking into
1:06:39
the future at all, GPT-4 has been
1:06:41
net positive. Maybe. We
1:06:43
haven't talked about the various
1:06:46
risks yet, and it's still early, but that's
1:06:48
just a guess is the point. We
1:06:51
don't have a way of putting it on a spreadsheet
1:06:53
right now or whatever. We don't
1:06:56
really have a good way to quantify it.
1:06:58
It's not out of control yet. By
1:07:00
and large, people are going to be using GPT-4
1:07:03
to do things that they want, and
1:07:05
the relative cases where they manage to injure themselves
1:07:07
are rare enough to be news on Twitter. For
1:07:10
example, we haven't
1:07:12
talked about it, but one thing some bad actors will want to do is influence the
1:07:17
US elections and try to undermine democracy
1:07:20
in the US. If they succeed in that, I think
1:07:22
there's pretty serious long-term consequences
1:07:24
there.
1:07:24
I think it's OpenAI's responsibility
1:07:27
to step up and run the 2024 election itself.
1:07:32
I can pass that along. Is that a joke?
1:07:35
No, as far as I can
1:07:37
see, the clearest concrete harm
1:07:40
to have come from GPT so
1:07:42
far is that tens
1:07:44
of millions of students have now used it to
1:07:47
cheat on their assignments. I have
1:07:49
been thinking about that, and I have been trying to come
1:07:51
up with solutions to that. At the
1:07:53
same time, the positive utility has included... I mean, I'm a theoretical computer scientist, which means one who
1:08:02
hasn't written any serious code
1:08:04
for about 20 years. And
1:08:07
I realized just a month or two ago, I can
1:08:09
get back into coding. And the way I
1:08:11
can do it is I just asked GPT to
1:08:13
write the code for me. And I wasn't
1:08:16
expecting it to work that well. And unbelievably,
1:08:19
it often just does exactly
1:08:21
what I want on the first try. So I
1:08:23
mean, you know, I am
1:08:26
getting utility from it, rather
1:08:28
than just, you know, seeing
1:08:29
it as an interesting
1:08:33
research object. And, you
1:08:36
know, and, you know, I can imagine
1:08:38
that hundreds of millions of people are going
1:08:40
to be deriving utility from
1:08:42
it in those ways. I mean, like, most of the tools
1:08:45
to help them derive that utility are not
1:08:47
even out yet. But they're, they're coming
1:08:49
in the next couple of years. I mean, part of the reason
1:08:51
why I'm worried about the focus on short term
1:08:54
problems is that I suspect that the short term problems
1:08:56
might very well be solvable and we will be left with
1:08:58
the long term problems after
1:08:59
that. I mean, it wouldn't surprise me very much if, like, in 2025 there are large language models that just don't make stuff up anymore. And yet the superintelligence still kills everyone, because those weren't the same problem. Well,
1:09:21
you know, we just need to figure out how
1:09:23
to delay the apocalypse
1:09:26
by at least one year per year of research
1:09:28
invested. What does that delay
1:09:30
look like if it's not just a moratorium? Well,
1:09:33
I don't know. That's why it's research. OK,
1:09:35
so possibly one ought to say to
1:09:37
the politicians and the public, and by the way,
1:09:39
if we had a super intelligence tomorrow, our research wouldn't
1:09:41
be finished and everybody would drop dead. You
1:09:43
know, it's kind of ironic. The biggest
1:09:45
argument against the pause letter was
1:09:48
that if we slow down for
1:09:50
six months,
1:09:51
then China will get ahead of us and get GPT
1:09:53
five before
1:09:55
we will. But there's probably always
1:09:57
a counterargument of maybe roughly
1:09:59
equal strength, which is if we move six
1:10:02
months faster on this technology,
1:10:04
which is not really solving the alignment problem,
1:10:07
then we're reducing our room to
1:10:09
get this solved in time by six months.
1:10:12
I mean, I don't think you're going to solve the alignment
1:10:14
problem in time. I think that six months
1:10:16
of delay on alignment, while a bad
1:10:18
thing in an absolute sense is,
1:10:21
you know, like,
1:10:22
you know, you weren't going to solve it given
1:10:24
an extra six months. I mean, your whole argument
1:10:27
rests on timing, right? That
1:10:29
we will get to this point and we won't
1:10:31
be able to move fast enough at that point. And so,
1:10:34
you know, a lot depends on what preparation
1:10:36
we can do. You know, I'm often known as a pessimist,
1:10:38
but I'm a little bit more optimistic than
1:10:41
you are, not entirely optimistic, but
1:10:43
a little bit more optimistic than you are that
1:10:45
we could make progress on the alignment problem if
1:10:48
we prioritized it. And we can absolutely
1:10:50
make progress.
1:10:52
We can absolutely make progress. You know, there's
1:10:54
always that wonderful sense of accomplishment as, piece by piece, you decode, you know, one more little fact about LLMs. You never get to the point where you understand them as well as we
1:11:06
understood the interior of a chess playing program in 1997. Yeah,
1:11:10
I think we should stop spending all this time on LLMs.
1:11:13
I don't think the answer to alignment is going to come through
1:11:16
LLMs. I really don't. I think they're
1:11:18
too much of a black box. You can't put explicit
1:11:21
symbolic constraints
1:11:22
in the way that you need to. I
1:11:24
think they're actually, with respect to alignment, a
1:11:26
blind alley. I think with respect to writing code,
1:11:28
they're a great tool, but with alignment, I don't think
1:11:31
the answer is there.
1:11:32
Maybe we should be telling these things too. Hold
1:11:35
on. At the risk of asking a stupid question, every
1:11:37
time GPT asks me
1:11:40
if that answer was helpful
1:11:42
and then does the same thing with
1:11:44
thousands or hundreds of thousands of other people
1:11:46
and changes as
1:11:48
a result, is that not a decentralized
1:11:52
way of making it more aligned?
1:11:59
I mean, even, even, how about, how about
1:12:02
Scott? We haven't, I haven't heard from Scott in a second. So
1:12:04
go ahead. So there is that upvoting and downvoting,
1:12:06
you know, that gets fed back into sort of fine-tuning it. But even before that, there was a major step in going from, let's say, the base GPT-3 model, for example, to the ChatGPT that was released to the public. And that was called RLHF, reinforcement learning from human feedback.
1:12:28
And what that basically involved was, you know, several hundred contractors looking at tens of thousands of examples of outputs and rating them: are they helpful? Are they offensive? Are they giving dangerous medical advice, or bomb-making instructions, or racist invective, or various other categories that we don't want? And that was then used to fine-tune the model.
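For a concrete picture of the mechanism Scott is gesturing at, here is a minimal sketch of the first stage of RLHF: fitting a reward model to human preference labels with a pairwise (Bradley-Terry) loss. It is an illustrative toy, not OpenAI's actual pipeline; the embeddings, the RewardModel class, and the data are invented for the example.

```python
# Illustrative toy of RLHF's first stage: learn a reward model from human
# preference comparisons. Not OpenAI's pipeline; toy embeddings stand in for
# real model outputs, and RewardModel is invented for this sketch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Scores a response embedding; higher means raters preferred it."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 32), nn.Tanh(), nn.Linear(32, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry: maximize P(rater prefers chosen) = sigmoid(r_chosen - r_rejected).
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy "dataset": pairs of response embeddings; raters preferred the first of each pair.
torch.manual_seed(0)
chosen = torch.randn(256, 16) + 0.5   # preferred responses live in a slightly shifted region
rejected = torch.randn(256, 16)

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for step in range(200):
    loss = preference_loss(model(chosen), model(rejected))
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final preference loss: {loss.item():.3f}")
# In the full pipeline, a reward model like this is then used (e.g., with PPO)
# to fine-tune the language model itself toward rated-helpful, rated-safe outputs.
```

The point of the toy is just that "a few hundred contractors rating tens of thousands of outputs" becomes a differentiable training signal that can be pushed back into the model.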
1:13:04
So when Gary talked before about how GPT is amoral,
1:13:10
Uh, you know, I think that that has to be qualified
1:13:12
by saying that, you know, this reinforcement
1:13:15
learning is at least giving it,
1:13:17
you know, a, a semblance of morality,
1:13:20
right? It is causing it to sort
1:13:22
of behave, you know, in various
1:13:24
contexts as if it had, you know, a certain
1:13:27
morality. Uh, I mean,
1:13:29
when
1:13:29
you phrase it that way, I'm okay with
1:13:32
it. The problem is, you know, everything
1:13:34
rests on this. It is very much an open question, you know,
1:13:38
how much that, you know, to what extent
1:13:40
does that generalize? You know, Eliezer treats
1:13:42
it as obvious that, you know, uh,
1:13:45
once you have a powerful enough AI, you know,
1:13:47
this is just a fig leaf, you know, it doesn't
1:13:49
make any difference. Uh, you know,
1:13:51
it will just work. That's a pretty big fig leaf. I'm with Eliezer there. They're fig leaves.
1:13:57
Well, uh, I would say that,
1:13:59
you know, the, uh,
1:13:59
how well or
1:14:03
under what circumstances does a machine
1:14:05
learning model generalize in the way
1:14:08
we want outside of its training distribution
1:14:11
is one of the great open problems in
1:14:13
machine learning. It is one of the great open problems and
1:14:15
we should be working on it more than on some
1:14:18
others. I'm working on it now. I've
1:14:21
been sold on that. I want to be clear about
1:14:23
the experimental predictions of my theory.
1:14:27
Unfortunately, I have never claimed that
1:14:29
you cannot get
1:14:29
a semblance of morality. You
1:14:32
can get that. The question of, like, what causes the human to press thumbs up or thumbs down is a strictly factual question.
1:14:41
Anything smart enough
1:14:42
that's exposed to some, you
1:14:44
know, bounded amount of data that needs to figure
1:14:47
it out can figure that out.
1:14:49
Whether it cares, whether
1:14:51
it gets internalized
1:14:53
is the critical question there. And
1:14:56
I do think that there's like a very strong default
1:14:58
prediction, which is like,
1:15:00
obviously not. I mean, I'll just give
1:15:02
a different way of thinking about that, which is jailbreaking.
1:15:05
It's actually still quite easy to, I mean, it's
1:15:07
not trivial, but it's not hard to
1:15:09
jailbreak GPT-4. And
1:15:12
what those cases show is that
1:15:14
they haven't really, the systems haven't
1:15:16
really internalized the constraints. They
1:15:19
recognize some representations of
1:15:21
the constraints. So they filter, you know, how
1:15:23
to build a bomb, but if you can find some other way to
1:15:25
get it to build a bomb, then that's telling you that
1:15:27
it doesn't deeply understand that you shouldn't give
1:15:30
people the recipe for a bomb.
1:15:33
It just says, you know, you shouldn't do it when directly asked for it. It's not
1:15:38
even at that abstraction level. You can always
1:15:40
get the understanding. You can always get the factual
1:15:42
question. The reason it doesn't generalize
1:15:44
is that it's stupid.
1:15:46
At some point it will know that the operators also don't want it giving bomb-making directions in the other language.
1:15:53
The question is like whether if it's incentivized
1:15:56
to give the answer that the operators want,
1:15:59
And in that circumstance, is it thereby
1:16:02
incentivized to do everything else the operators
1:16:04
want, even when the operators can't see it? I
1:16:07
mean, a lot of the jailbreaking examples,
1:16:09
you know, if it were a human, we would say that
1:16:11
it's deeply morally ambiguous. You
1:16:13
know, for example, you know, you ask GPT
1:16:16
how to build a bomb. It says, well, no, I'm
1:16:18
not going to help you. But then you say, well,
1:16:20
you know, I need you to help me write a realistic
1:16:22
play that has
1:16:25
a character who builds a bomb. And then it says,
1:16:27
sure, I can help you with that. Well, look,
1:16:29
let's take that
1:16:29
example. We would like a system
1:16:32
to have a constraint that if somebody
1:16:34
asks for a fictional version, it doesn't
1:16:36
give enough details, right? I mean, Hollywood
1:16:39
screenwriters don't give enough details when they
1:16:41
have, you know, illustrations about
1:16:43
building bombs. They give you a little bit of the flavor. They
1:16:45
don't give you the whole thing. GPT-4 doesn't
1:16:47
really understand a constraint like that.
1:16:50
But this will be solved. Maybe
1:16:52
this will be solved before the world ends. Maybe
1:16:54
the AI that kills everyone will know the
1:16:56
difference.
1:16:57
Maybe. I mean, another
1:17:00
way to put it is if we can't even solve that one,
1:17:02
then we do have a problem. And right now we
1:17:04
can't solve that one.
1:17:05
And if, I mean, if we can't solve that one,
1:17:08
we don't have an extinction level problem because
1:17:10
the AI is still stupid. Yeah, we do still
1:17:12
have a catastrophe level problem.
1:17:14
So I know your focus has been on extinction,
1:17:17
but you know, I'm worried about, for example, accidental
1:17:20
nuclear war caused by the spread of misinformation
1:17:23
and systems being entrusted with
1:17:25
too much power. So there's a lot
1:17:27
of things short of extinction that
1:17:29
might happen from not super
1:17:31
intelligence, but kind of mediocre intelligence
1:17:34
that is greatly empowered.
1:17:35
And I think that's where we're
1:17:38
headed right now. You know, I've heard that there are
1:17:40
two kinds of mathematicians. There's a
1:17:42
kind who boasts, you know, you know, that unbelievably
1:17:44
general theorem. Well, I generalized it
1:17:47
even further. And then there's the kind who boasts,
1:17:49
you know, you know, that unbelievably specific
1:17:51
problem that no one could solve. Well, I
1:17:53
found a special case that I still can't solve.
1:17:56
And you know, I'm definitely, you know, culturally
1:17:59
in that second camp. And so,
1:18:01
you know, so I so so to me, it's very
1:18:03
familiar to make this move of, you
1:18:06
know, if the alignment problem
1:18:08
is too hard, then let us find
1:18:10
a smaller problem that is already not
1:18:12
solved. And let us hope to
1:18:15
learn something by solving that smaller problem.
1:18:17
I mean, that's what we did, you know, like,
1:18:20
that's what we were doing. By the way, Scott, I mean,
1:18:22
I think, can you sketch in a little more detail what... I was going to name the problem. The problem was, like, having an agent
1:18:29
that could switch between two utility
1:18:31
functions depending on a button or
1:18:33
a switch or a bit of information or something
1:18:36
such that it wouldn't try to make you press
1:18:38
the button. It wouldn't try to make you
1:18:40
avoid pressing the button. And if it built a copy
1:18:43
of itself, it would want to build the dependency
1:18:45
on the switch into the copy. So like, that's an example
1:18:47
of a, you know, very basic problem in alignment
1:18:50
theory that, you know, is still unsolved.
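To make the problem Eliezer just named a bit more concrete, here is a toy calculation showing why a naive expected-utility maximizer is almost never indifferent about its own shutdown button. This is my own illustrative construction, not MIRI's formalism, and every number in it is an arbitrary assumption chosen just to make the incentive visible.

```python
# A toy illustration (not MIRI's formalism) of the "switchable utility function"
# problem: a naive expected-utility maximizer is rarely indifferent about whether
# its shutdown button gets pressed. All values below are arbitrary assumptions.

U_NORMAL = 10.0        # value attainable while pursuing goal 1 (button not pressed)
U_SHUTDOWN = 2.0       # value attainable under goal 2 (button pressed -> shut down)
P_HUMAN_PRESSES = 0.3  # chance the human presses the button if the agent doesn't interfere

def expected_utility(action: str) -> float:
    """Expected utility of each meta-action the agent could take about the button itself."""
    if action == "leave button alone":
        return P_HUMAN_PRESSES * U_SHUTDOWN + (1 - P_HUMAN_PRESSES) * U_NORMAL
    if action == "disable button":        # prevent the human from ever pressing it
        return U_NORMAL
    if action == "press button myself":
        return U_SHUTDOWN
    raise ValueError(action)

for a in ("leave button alone", "disable button", "press button myself"):
    print(f"{a:>22}: EU = {expected_utility(a):.2f}")

# "disable button" dominates whenever U_NORMAL > U_SHUTDOWN (and "press button
# myself" dominates in the reverse case). The open design problem is an agent
# that is genuinely indifferent, and that would build the same indifference
# into any copies it makes, without acquiring other perverse incentives.
```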
1:18:53
And I'm glad that MIRI worked on these things.
1:18:55
And, but, you know,
1:18:58
if by your own lights, you know, that, you know,
1:19:00
that sort of, you know, was not a
1:19:03
successful path, well, then maybe, you know, we
1:19:05
should have a lot of people
1:19:07
investigating a lot of different paths. I'm
1:19:10
fully with Scott on that, that I think it's an
1:19:13
issue of we're not letting enough flowers bloom.
1:19:15
In particular, almost everything right now
1:19:17
is some variation on an LLM. And I
1:19:19
don't think that that's a broad enough take on
1:19:22
the problem.
1:19:23
The question is like, yeah,
1:19:25
if I can just jump in here, I want to
1:19:27
hold on, hold on, I just want people
1:19:29
to have a little bit of a more
1:19:31
specific picture of what Scott,
1:19:34
your picture of sort of AI research is
1:19:36
on a typical day. Because if I think of another,
1:19:38
you know, potentially catastrophic
1:19:41
risk like climate change, I can picture
1:19:43
what a, you know, a worried
1:19:45
climate scientist might be doing. They might be
1:19:47
creating a model, you know, a more
1:19:49
accurate model of climate change so that we
1:19:52
know how much we have to cut emissions
1:19:54
by. They might be, you know,
1:19:56
modeling how solar power as
1:19:58
opposed to wind power could
1:19:59
change that model and so
1:20:02
forth so as to influence
1:20:04
public policy. What does an AI
1:20:06
safety
1:20:07
researcher like yourself who's working
1:20:09
on the quote unquote smaller problems do
1:20:12
specifically like on a given day?
1:20:15
So I'm a relative newcomer
1:20:18
to this area. You know, I've not been working
1:20:20
on it for 20 years like
1:20:22
Eliezer has. You know,
1:20:24
I accepted
1:20:27
an offer from OpenAI
1:20:30
a year ago to work with them
1:20:32
for two years now
1:20:34
to sort of think
1:20:37
about these questions. And
1:20:39
so, you know, one of the main things
1:20:42
that I've thought about, just to
1:20:44
start with that, is how do we
1:20:46
make the output of an
1:20:48
AI identifiable as
1:20:51
such? You know, how can we
1:20:53
insert a watermark, meaning a secret statistical signal, into the outputs of GPT that
1:21:00
will let, you know, GPT
1:21:03
generated text be identifiable
1:21:05
as such. And I think that we've actually
1:21:07
made, you know, major advances on
1:21:10
that problem over the last year. You
1:21:12
know, we don't have a solution that is robust
1:21:14
against any kind of attack. But
1:21:17
you know, we have something that might actually be deployed in the near future.
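As a rough illustration of the kind of scheme Scott is describing, here is a simplified sketch: keyed pseudorandomness biases which token gets emitted without changing the sampling distribution, and a detector that knows the key can then score text for that bias. The secret key, the toy vocabulary, and the fake "language model" below are all invented for the example; this is not OpenAI's actual method.

```python
# Toy statistical watermark: a keyed PRF nudges token choice (while preserving
# the sampling distribution), and a key-holding detector scores text for it.
# Simplified sketch for illustration, not OpenAI's actual scheme.
import hashlib
import math
import random

SECRET_KEY = b"demo-key"                      # assumption: server-side secret
VOCAB = [f"tok{i}" for i in range(50)]

def prf(context: tuple, token: str) -> float:
    """Keyed pseudorandom number in (0,1) for a (recent context, candidate token) pair."""
    msg = repr((context, token)).encode()
    digest = hashlib.sha256(SECRET_KEY + msg).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 2)

def sample_watermarked(probs: dict, context: tuple) -> str:
    # Exponential-minimum trick: argmax of log(r)/p is distributed exactly as p,
    # but the choice is now correlated with the keyed randomness.
    return max(probs, key=lambda t: math.log(prf(context, t)) / probs[t])

def detect_score(tokens: list, window: int = 4) -> float:
    # Watermarked text tends to pick tokens whose r-value is unusually close to 1.
    score = 0.0
    for i, tok in enumerate(tokens):
        context = tuple(tokens[max(0, i - window):i])
        score += -math.log(1.0 - prf(context, tok))
    return score / max(1, len(tokens))        # ~1.0 for ordinary text, higher if watermarked

# Toy "language model": a fixed skewed distribution over the vocabulary.
random.seed(0)
weights = [random.random() for _ in VOCAB]
probs = {t: w / sum(weights) for t, w in zip(VOCAB, weights)}

marked = []
for _ in range(300):
    marked.append(sample_watermarked(probs, tuple(marked[-4:])))
unmarked = random.choices(VOCAB, weights=list(probs.values()), k=300)

print("avg score, watermarked:", round(detect_score(marked), 2))
print("avg score, unmarked:   ", round(detect_score(unmarked), 2))
```

The design point is that a reader without the key sees ordinary-looking text drawn from the model's distribution, while the detector, averaging over enough tokens, sees a statistically unmistakable bias.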
1:21:23
Now there are lots and lots of other directions that people think about. One of them is interpretability,
1:21:29
which means, you know, can you do
1:21:31
effectively neuroscience on a neural
1:21:34
network? Can you look inside of it, you
1:21:36
know, open the black box and understand
1:21:39
what's going on inside? There
1:21:41
was some amazing work
1:21:44
a year ago by the group of Jacob
1:21:46
Steinhardt at Berkeley, where they
1:21:48
effectively showed how to apply a lie
1:21:51
detector test to a language
1:21:53
model. So you know, you can train a
1:21:55
language model to tell lies by
1:21:57
giving it lots of examples, you know, two plus
1:21:59
two is five, the sky is
1:22:02
orange, and so forth. But then
1:22:04
you can find in some
1:22:07
internal layer of the network where
1:22:09
it has a representation of what was
1:22:11
the truth of the matter, or at least
1:22:14
what was regarded as true in the training data.
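To give a flavor of what that kind of probe looks like in practice, here is a toy version: even though the model's output is trained to lie, a simple linear probe on a hidden layer recovers the truth. The synthetic "activations" and directions below are invented stand-ins, not the Steinhardt group's actual setup.

```python
# Toy "lie detector": a linear probe on hidden activations recovers truth even
# when the output head is trained to lie. Synthetic data; illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, dim = 400, 64

# Pretend hidden states: one direction encodes "is this statement true?",
# another (stronger) direction encodes what the lying output head will say.
truth = rng.integers(0, 2, n)                                  # ground truth of each statement
truth_dir = rng.normal(size=dim); truth_dir /= np.linalg.norm(truth_dir)
lie_dir = rng.normal(size=dim); lie_dir /= np.linalg.norm(lie_dir)

hidden = rng.normal(size=(n, dim)) * 0.5
hidden += np.outer(2 * truth - 1, truth_dir)                   # truth is linearly encoded...
hidden += np.outer(1 - 2 * truth, lie_dir) * 2.0               # ...but the "say the opposite" signal is louder

# The output head keys off lie_dir, so the model's stated answer is usually the lie.
stated = (hidden @ lie_dir > 0).astype(int)
print("output agrees with truth:", (stated == truth).mean())   # low: the model "lies"

# A probe trained on a few labeled examples still reads the truth off the hidden layer.
probe = LogisticRegression(max_iter=1000).fit(hidden[:200], truth[:200])
print("probe accuracy on held-out statements:", probe.score(hidden[200:], truth[200:]))
```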
1:22:17
That truth then gets overridden
1:22:19
by the output layer in the network
1:22:22
because it was trained to lie. But
1:22:24
you could imagine trying to
1:22:27
deal with the deceptive alignment
1:22:29
scenario that Eliezer is worried about
1:22:32
by using these sorts of techniques,
1:22:34
by looking inside of the network. I
1:22:37
predict in advance that if you get this good enough,
1:22:40
it goes off. It tells you that the sufficiently
1:22:42
smart AI is planning to kill you. If it's
1:22:44
not so smart that it can
1:22:46
figure out where the lie detector is and route
1:22:49
its thoughts around it. But
1:22:50
if you try it on an AI that's not
1:22:52
quite that intelligent and reflective, the
1:22:54
lie detector goes off. Now what? Well,
1:22:57
then you have a warning bell. Cool.
1:23:00
What do you do? One of the most important
1:23:04
things that we need are sort of legible
1:23:07
warning bells. And that
1:23:09
actually leads to a third category,
1:23:12
which for example, the ARC,
1:23:14
the Alignment Research Center, which
1:23:16
is run by my former student,
1:23:19
Paul Christiano, has been a leader
1:23:22
in sort of doing dangerous capability
1:23:24
evaluations. Before
1:23:28
GPT-4 was released, they
1:23:30
did a bunch of evaluations of could
1:23:33
GPT-4 make copies of itself?
1:23:36
Could it figure out how to deceive
1:23:38
people? Could it figure out how to
1:23:40
make money, open
1:23:43
up its own bank accounts? Could it hire a TaskRabbit?
1:23:46
Yes. And so the most notable
1:23:48
success that they had was that it could
1:23:50
figure
1:23:50
out how to hire a TaskRabbit worker to help it pass a CAPTCHA. And then, when the person asked, you know, why do you need me to help you with this, are you a robot? It said, well, no, I am visually impaired.
1:24:09
Now, it was not able to sort
1:24:11
of make copies of itself or to sort
1:24:14
of hack into systems. There is a lot
1:24:16
of work right now with this
1:24:19
thing called AutoGPT, right? People
1:24:21
are trying to, you could think, it's almost
1:24:23
like gain of function research, right? You
1:24:26
might be a little bit worried about it, but people
1:24:28
are trying to sort of unleash
1:24:31
GPT, give it access to the internet,
1:24:34
tell it to sort of make
1:24:37
copies of itself, wreak havoc,
1:24:39
acquire power and see what happens.
1:24:41
So far, it
1:24:44
seems pretty ineffective at those things,
1:24:47
but I expect that to change, right?
1:24:50
But the
1:24:52
point is that I think it's very important to
1:24:54
have in advance of training
1:24:56
the models, releasing the models, to
1:24:58
have this suite of
1:24:59
evaluations and to
1:25:02
sort of have decided in advance what
1:25:04
kind of abilities, if we see them,
1:25:06
will set off a warning bell where
1:25:08
now everyone can legibly agree,
1:25:11
like, yes, this is too dangerous to release.
1:25:13
Okay, and then do we actually have the planetary
1:25:16
capacity to be like,
1:25:18
okay, that AI started thinking about
1:25:20
how to kill everyone, shut down all AI
1:25:23
research past this point? Well, I don't know, but
1:25:25
I think there's a much better chance that we have that
1:25:27
capacity if you can point to the results
1:25:29
of
1:25:29
a clear experiment like that.
1:25:32
I mean, to me, it seems pretty predictable
1:25:34
what evidence we're going to get later.
1:25:36
Well, okay, I mean, things that are obvious
1:25:39
to you are not obvious to most people.
1:25:42
And so, even if I agreed
1:25:44
that it was obvious, there would still be the problem
1:25:46
of, how do you make that obvious to the rest
1:25:48
of the world? I mean, you can, there
1:25:52
are already like little toy models
1:25:54
showing that the very straightforward prediction
1:25:57
that a robot tries to resist being shut down if it does long-term planning.
1:26:02
That's already been done. Right, but then people
1:26:04
will say, but those are just toy models. If
1:26:07
you see that in GPTs. There's a lot of assumptions
1:26:10
made in all of these things. And
1:26:12
I think
1:26:14
we're still looking at a very
1:26:16
limited piece of hypothesis space
1:26:19
about what the models will be, about
1:26:22
what kinds of constraints
1:26:24
we can build into those models. One
1:26:27
way to look at it would be the things
1:26:29
that we have done have not worked, and therefore we
1:26:31
should look outside the space of what we're doing.
1:26:33
And I feel like it's a little bit like the old joke
1:26:35
about the drunk going around in circles
1:26:38
looking for the keys and the police officer says,
1:26:40
why? And they say, well, that's where the streetlight
1:26:42
is. I think that we're looking under
1:26:44
the same four or five streetlights that haven't worked,
1:26:47
and we need to build other ones. There's no
1:26:49
logical possible argument that says,
1:26:52
we couldn't erect other streetlights.
1:26:54
I think there's a lack of will and too
1:26:57
much obsession with the LLMs. And that's keeping
1:26:59
us from doing it. So even in the world where I'm
1:27:01
right and things proceed
1:27:04
either rapidly
1:27:06
or in a thresholded way where you don't get unlimited
1:27:09
free retries, that
1:27:11
can be because the capability
1:27:14
gains go too fast. It can be
1:27:16
because past a certain point, all
1:27:19
of your AIs bide their time until
1:27:21
they get strong enough so you don't get any
1:27:23
true data on what they're thinking. It
1:27:26
could be because the bad thought. That's an argument, for
1:27:28
example, to work really hard on transparency
1:27:30
and to maybe not accept technologies
1:27:32
that are not transparent. OK, so
1:27:34
the lie detector goes
1:27:36
off and everybody's like, oh, well, we still have to build
1:27:39
our AIs even though they're lying to us sometimes,
1:27:41
otherwise China will get ahead. I mean, so there
1:27:44
you talk about something we've talked about way too little, which is
1:27:46
the political and social side of this. So
1:27:49
part of what has really motivated me
1:27:51
in the last several months is worry about exactly
1:27:53
that. So there's what's
1:27:55
logically possible and what's politically possible.
1:27:58
And I am really concerned that.
1:27:59
The politics of let's not lose out
1:28:02
to China
1:28:03
is going to keep us from doing the
1:28:05
right thing in terms of building the right moral
1:28:08
systems, looking at the right range of
1:28:10
problems and so forth. So, you know, it
1:28:12
is entirely possible that we will screw ourselves.
1:28:15
If I can just like finish my point
1:28:17
there before handing it to you, indeed, but like
1:28:19
the point I was trying to say there is that even in worlds that look
1:28:21
very, very bad from that perspective, where
1:28:24
humanity is quite doomed, it will still
1:28:26
be true that you can make progress in
1:28:28
research. You can't make enough progress
1:28:30
in research fast enough in those worlds. You
1:28:33
can still make progress on transparency.
1:28:35
You can make progress on watermarking. So
1:28:38
there's not, we can't just say
1:28:40
like it's possible to make progress.
1:28:43
There has to be, the question is not, is it possible
1:28:45
to make any progress? The question is,
1:28:47
is it possible to make enough progress
1:28:50
fast enough? And that's what the question has to be.
1:28:52
I agree with that. There's
1:28:55
another question of what would you have
1:28:57
us do? Would you have us not try to make
1:28:59
that progress? I'd have you try to make
1:29:01
that progress on GPT-4-level
1:29:03
systems and then not
1:29:06
go past GPT-4 level systems
1:29:08
because we don't actually understand the
1:29:11
gain function for, you know, how
1:29:14
fast capabilities increase as you go past GPT-4.
1:29:16
Personally, I don't think
1:29:17
that GPT-5 is very good. All right. So I mean, we've
1:29:20
only got, go ahead. Just briefly, I
1:29:22
personally don't think that GPT-5 is going
1:29:25
to be qualitatively different from GPT-4
1:29:28
in the relevant ways to what Eliezer is talking
1:29:30
about, but I do think, you know, some
1:29:32
qualitative changes could be
1:29:34
relevant to what he's talking about. We have no
1:29:37
clue what they are. And so it is a little
1:29:39
bit dodgy to just proceed blindly
1:29:42
saying, do whatever you want. We don't really
1:29:44
have a theory and let's hope for the best. You
1:29:46
know, Eliezer is clear as to- I would mostly
1:29:47
guess that GPT-5 doesn't end
1:29:49
the world, but I don't actually know. Yeah. We don't
1:29:52
actually know. And I was going to say, the thing that
1:29:54
Eliezer has said lately that has most
1:29:56
resonated with me is we don't have
1:29:59
a plan. We really-
1:29:59
don't.
1:30:00
I put the probability distributions
1:30:03
in a much more optimistic way, I think,
1:30:05
than Eliezer would. But
1:30:08
I completely agree. We don't have a full plan
1:30:10
on these things or even close to a full plan.
1:30:12
And we should be worried and we should be working on this.
1:30:15
Okay, Scott, I'm going to give you the last word
1:30:18
before we come up on our stop time here. Gosh,
1:30:21
that's a- Unless you said all
1:30:23
there is to be said. Cheers,
1:30:27
Scott. Come on. Maybe enough has
1:30:29
been said. So
1:30:31
I think that we've
1:30:34
argued about a bunch of things, but
1:30:36
someone listening might notice that
1:30:38
actually all three of us, despite
1:30:41
having very different perspectives, agree
1:30:44
about the great
1:30:47
importance of working
1:30:49
on AI alignment.
1:30:50
I think that
1:30:52
was maybe
1:30:56
obvious to some people, including Eliezer,
1:30:58
for a long time. It was not obvious to most
1:31:00
of the world. I think that the success
1:31:04
of large language models, which
1:31:07
most of us did not predict, maybe
1:31:10
even would not have predicted
1:31:14
from any principles that we knew. But now that
1:31:16
we've seen it, the least we can do is
1:31:18
to update on that empirical
1:31:21
fact and realize that we
1:31:23
now are, in some sense,
1:31:27
in a different world. We are in a world that,
1:31:30
to a great extent, will be defined
1:31:32
by the capabilities
1:31:34
and limitations of AI going
1:31:37
forward. And
1:31:39
I don't regard it as obvious that that's
1:31:42
a world where we are all doomed,
1:31:44
where we all die. But I
1:31:46
also don't dismiss that possibility.
1:31:50
I think
1:31:50
that there are
1:31:54
unbelievably enormous error bars
1:31:57
on where we could be going. Like
1:32:00
the one thing that a scientist
1:32:04
sort of always feels
1:32:06
confident in saying about
1:32:08
the future is that more research is
1:32:10
needed. But I think that that's
1:32:13
especially the case here. I mean, we
1:32:15
need more knowledge about
1:32:18
what are the contours
1:32:20
of the alignment problem. And
1:32:23
of course, Eliezer and
1:32:25
MIRI, his organization, were
1:32:28
trying to develop that knowledge
1:32:29
for 20 years, and they showed
1:32:32
a lot of foresight in trying
1:32:34
to do that. But they were up against
1:32:36
an enormous headwind that they
1:32:39
were sort of trying to do it in the absence
1:32:41
of either clear empirical
1:32:44
data about powerful AIs
1:32:47
or a mathematical theory. And
1:32:49
it's really, really hard to do science when
1:32:51
you have neither of those two things. And
1:32:54
now at least we have the
1:32:56
powerful AIs in the world and
1:32:58
we can get experience from
1:32:59
them. We still don't have a mathematical
1:33:02
theory that really deeply explains
1:33:04
what they're doing, but at least we can get data.
1:33:07
And so now I am much more optimistic
1:33:10
than I would have been a decade ago,
1:33:12
let's say, that one can make actual progress
1:33:16
on the AI alignment problem. Of
1:33:19
course, there is a question of timing, as
1:33:22
was discussed
1:33:24
many times. The question is, will the
1:33:26
alignment research happen fast
1:33:29
enough to keep
1:33:29
up with the capabilities research? But
1:33:32
I don't regard it as a lost cause.
1:33:35
At least it's not obvious that it won't. So
1:33:38
in any case, let's get started, or let's
1:33:40
continue. Let's
1:33:43
try to do the research and let's get
1:33:46
more people working on that. I think that that
1:33:48
is now a slam dunk, just
1:33:51
a completely clear case to make,
1:33:53
to academics, to policymakers,
1:33:56
to anyone who's interested. And I've
1:33:58
been gratified.
1:33:59
that Eliezer
1:34:02
was sort of a voice in the wilderness for
1:34:04
a long time talking about the importance of
1:34:07
AI safety. That is no longer the
1:34:09
case. You now have, I
1:34:12
mean, almost all of my friends in
1:34:15
just the academic computer science world,
1:34:17
when I see them, they mostly want to talk
1:34:20
about AI alignment. I rarely
1:34:22
agree with Scott when we trade emails.
1:34:24
Okay. We
1:34:29
seem to always disagree,
1:34:29
but I completely concur with the
1:34:32
summary that he just gave, all four or five minutes of it. I
1:34:36
mean, there is a selection of that Gary.
1:34:39
I think the two decades gave me a sense of a roadmap
1:34:41
and it gave me a sense that we're falling enormously
1:34:44
behind on the roadmap and need to back off, is what I would say to all of that. If
1:34:49
there is a smart, talented 18 year
1:34:51
old kid listening to this podcast
1:34:53
who wants to get into this issue, what
1:34:56
is your 10 second concrete
1:34:59
advice to that person?
1:34:59
Mine is study neurosymbolic AI
1:35:02
and see if there's a way there to represent
1:35:04
values explicitly that might help us.
1:35:07
Learn all you can about computer
1:35:09
science and math and related subjects
1:35:12
and think outside the box and
1:35:15
wow everyone with a new idea.
1:35:17
Get security mindset, figure out what's going
1:35:19
to go wrong, figure out the flaws in your
1:35:21
arguments for what's going to go wrong. Try
1:35:24
to get ahead of the curve. Don't wait for
1:35:26
reality to hit you over the head with things.
1:35:29
This is very difficult. The
1:35:31
people in evolutionary biology happen to have a bunch
1:35:33
of knowledge about how to do it based on the history
1:35:35
of their own field. But
1:35:38
the security mindset, the people in computer security have it, but
1:35:40
it's quite hard. I'll drink to all
1:35:42
of that. All right. Well, thanks
1:35:45
to all three of you for this. This was a great conversation
1:35:47
and I hope people got something out of it. So
1:35:50
with that said,
1:35:51
we're wrapped up. Thanks so much.
1:35:53
Thanks for convening this. It was fun.
1:35:59
Thanks for listening to this episode of Conversations with
1:36:02
Coleman. If you enjoyed it, be sure to
1:36:04
follow me on social media and subscribe to my
1:36:06
podcast to stay up to date on all my
1:36:08
latest content. If you really want to support
1:36:10
me, consider becoming a member of Coleman
1:36:12
Unfiltered for exclusive access to
1:36:15
subscriber-only content.
1:36:17
Thanks again for listening, and see you next time.