AF - The Problem With the Word 'Alignment' by peligrietzer

Released Tuesday, 21st May 2024

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Problem With the Word 'Alignment', published by peligrietzer on May 21, 2024 on The AI Alignment Forum.

This post was written by Peli Grietzer, inspired by internal writings by TJ (tushant jha), for AOI[1]. The original post, published on Feb 5, 2024, can be found here: https://ai.objectives.institute/blog/the-problem-with-alignment

The purpose of our work at the AI Objectives Institute (AOI) is to direct the impact of AI towards human autonomy and human flourishing. In the course of articulating our mission and positioning ourselves -- a young organization -- in the landscape of AI risk orgs, we've come to notice what we think are serious conceptual problems with the prevalent vocabulary of 'AI alignment.' This essay will discuss some of the major ways in which we think the concept of 'alignment' creates bias and confusion, as well as our own search for clarifying concepts.

At AOI, we try to think about AI within the context of humanity's contemporary institutional structures: how do contemporary market and non-market (e.g. bureaucratic, political, ideological, reputational) forces shape AI R&D and deployment, and how will the rise of AI-empowered corporate, state, and NGO actors reshape those forces? We increasingly feel that 'alignment' talk tends to obscure or distort these questions.

The trouble, we believe, is the idea that there is a single so-called Alignment Problem. Talk about an 'Alignment Problem' tends to conflate a family of related but distinct technical and social problems, including:

P1: Avoiding takeover from emergent optimization in AI agents
P2: Ensuring that AI's information processing (and/or reasoning) is intelligible to us
P3: Ensuring AIs are good at solving problems as specified (by user or designer)
P4: Ensuring AI systems enhance, and don't erode, human agency
P5: Ensuring that advanced AI agents learn a human utility function
P6: Ensuring that AI systems lead to desirable systemic and long-term outcomes

Each of P1-P6 is known as 'the Alignment Problem' (or as the core research problem in 'Alignment Research') to at least some people in the greater AI Risk sphere, in at least some contexts. And yet these problems are clearly not simply interchangeable: placing any one of P1-P6 at the center of AI safety implies a complicated background theory about their relationship, their relative difficulty, and their relative significance.

We believe that when different individuals and organizations speak of the 'Alignment Problem,' they assume different controversial reductions of the P1-P6 problems network to one of its elements. Furthermore, the very idea of an 'Alignment Problem' precommits us to finding a reduction for P1-P6, obscuring the possibility that this network of problems calls for a multi-pronged treatment.

One surface-level consequence of the semantic compression around 'alignment' is widespread miscommunication, as well as fights over linguistic real estate. The deeper problem, though, is that this compression serves to obscure some of a researcher's or org's foundational ideas about AI by 'burying' them under the concept of alignment.

Take a familiar example of a culture clash within the greater AI Risk sphere: many mainstream AI researchers identify 'alignment work' with incremental progress on P3 (task-reliability), which researchers in the core AI Risk community reject as just safety-washed capabilities research. We believe working through this culture clash requires that both parties state their theories about the relationship between progress on P3 and progress on P1 (takeover avoidance).

In our own work at AOI, we've had occasion to closely examine a viewpoint we call the Berkeley Model of Alignment -- a popular reduction of P1-P6 to P5 (agent value-learning) based on a paradigm consolidated at UC Berkeley's CHAI research group...

