LW - New voluntary commitments (AI Seoul Summit) by Zach Stein-Perlman

Released Tuesday, 21st May 2024

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: New voluntary commitments (AI Seoul Summit), published by Zach Stein-Perlman on May 21, 2024 on LessWrong.

Basically the companies commit to make responsible scaling policies.

Part of me says this is amazing, the best possible commitment short of all committing to a specific RSP. It's certainly more real than almost all other possible kinds of commitments. But as far as I can tell, people pay almost no attention to what RSP-ish documents (Anthropic, OpenAI, Google) actually say and whether the companies are following them. The discourse is more like "Anthropic, OpenAI, and Google have safety plans and other companies don't." Hopefully that will change.

Maybe "These commitments represent a crucial and historic step forward for international AI governance." It does seem nice from an international-governance perspective that Mistral AI, TII, and a Chinese company joined.

The UK and Republic of Korea governments announced that the following organisations have agreed to the Frontier AI Safety Commitments:

Amazon
Anthropic
Cohere
Google
G42
IBM
Inflection AI
Meta
Microsoft
Mistral AI
Naver
OpenAI
Samsung Electronics
Technology Innovation Institute
xAI
Zhipu.ai

The above organisations, in furtherance of safe and trustworthy AI, undertake to develop and deploy their frontier AI models and systems[1] responsibly, in accordance with the following voluntary commitments, and to demonstrate how they have achieved this by publishing a safety framework focused on severe risks by the upcoming AI Summit in France.

Given the evolving state of the science in this area, the undersigned organisations' approaches (as detailed in paragraphs I-VIII) to meeting Outcomes 1, 2 and 3 may evolve in the future. In such instances, organisations will provide transparency on this, including their reasons, through public updates.

The above organisations also affirm their commitment to implement current best practices related to frontier AI safety, including: internal and external red-teaming of frontier AI models and systems for severe and novel threats; to work toward information sharing; to invest in cybersecurity and insider threat safeguards to protect proprietary and unreleased model weights; to incentivize third-party discovery and reporting of issues and vulnerabilities; to develop and deploy mechanisms that enable users to understand if audio or visual content is AI-generated; to publicly report model or system capabilities, limitations, and domains of appropriate and inappropriate use; to prioritize research on societal risks posed by frontier AI models and systems; and to develop and deploy frontier AI models and systems to help address the world's greatest challenges.

Outcome 1. Organisations effectively identify, assess and manage risks when developing and deploying their frontier AI models and systems. They will:

I. Assess the risks posed by their frontier models or systems across the AI lifecycle, including before deploying that model or system, and, as appropriate, before and during training. Risk assessments should consider model capabilities and the context in which they are developed and deployed, as well as the efficacy of implemented mitigations to reduce the risks associated with their foreseeable use and misuse. They should also consider results from internal and external evaluations as appropriate, such as by independent third-party evaluators, their home governments[2], and other bodies their governments deem appropriate.

II. Set out thresholds[3] at which severe risks posed by a model or system, unless adequately mitigated, would be deemed intolerable. Assess whether these thresholds have been breached, including monitoring how close a model or system is to such a breach. These thresholds should be defined with input from trusted actors, including organisations' respective ho...
