LW - On Dwarkesh's Podcast with OpenAI's John Schulman by Zvi

Released Tuesday, 21st May 2024
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On Dwarkesh's Podcast with OpenAI's John Schulman, published by Zvi on May 21, 2024 on LessWrong.

Dwarkesh Patel recorded a podcast with John Schulman, cofounder of OpenAI and at the time their head of current model post-training. Transcript here. John's job at the time was to make the current AIs do what OpenAI wanted them to do. That is an important task, but one that employs techniques that their at-the-time head of alignment, Jan Leike, made clear we should not expect to work on future more capable systems. I strongly agree with Leike on that.

Then Sutskever left and Leike resigned, and John Schulman was made the new head of alignment, now charged with what superalignment efforts remain at OpenAI to give us the ability to control future AGIs and ASIs.

This gives us a golden opportunity to assess where his head is at, without him knowing he was about to step into that role.

There is no question that John Schulman is a heavyweight. He executes and ships. He knows machine learning. He knows post-training and mundane alignment.

The question is, does he think well about this new job that has been thrust upon him?

The Big Take

Overall I was pleasantly surprised and impressed.

In particular, I was impressed by John's willingness to accept uncertainty and not knowing things.

He does not have a good plan for alignment, but he is far less confused about this fact than most others in similar positions.

He does not know how to best navigate the situation if AGI suddenly happened ahead of schedule in multiple places within a short time frame, but I have not ever heard a good plan for that scenario, and his speculations seem about as directionally correct and helpful as one could hope for there.

Are there answers that are cause for concern, and places where he needs to fix misconceptions as quickly as possible? Oh, hell yes.

His reactions to potential scenarios involved radically insufficient amounts of slowing down, halting and catching fire, freaking out and general understanding of the stakes.

Some of that I think was about John and others at OpenAI using a very weak definition of AGI (perhaps partly because of the Microsoft deal?) but also partly he does not seem to appreciate what it would mean to have an AI doing his job, which he says he expects in a median of five years.

His answer on instrumental convergence is worrisome, as others have pointed out. He dismisses concerns that an AI given a bounded task would start doing things outside the intuitive task scope, or the dangers of an AI 'doing a bunch of wacky things' a human would not have expected.
On the plus side, it shows understanding of the key concepts on a basic (but not yet deep) level, and he readily admits it is an issue with commands that are likely to be given in practice, such as 'make money.'

In general, he seems willing to react to advanced capabilities by essentially scaling up various messy solutions in ways that I predict would stop working at that scale or with something that outsmarts you and that has unanticipated affordances and reason to route around typical in-distribution behaviors.

He does not seem to have given sufficient thought to what happens when a lot of his assumptions start breaking all at once, exactly because the AI is now capable enough to be properly dangerous.

As with the rest of OpenAI, another load-bearing assumption is presuming gradual changes throughout all this, including assuming past techniques will not break. I worry that will not hold.

He has some common confusions about regulatory options and where we have viable intervention points within competitive dynamics and game theory, but that's understandable, and also was at the time very much not his department.

As with many others, there seems to be a disconnect. A lot of the thinking here seems like excellent practical thi...
