#AI Safety
50 articles with this tag

Mustafa Suleyman's Containment Paradox: From DeepMind's Safety Roots to Microsoft's AI Engine
Mustafa Suleyman published a book on AI containment in 2023, then became Microsoft AI CEO. At Build 2026 he unveiled seven new MAI models and predicted 18-month white-collar automation.

Sam Altman's AGI Shift: From Extinction Warning to Gentle Singularity
Sam Altman co-signed an AI extinction warning in May 2023. By June 2025, he was writing of a 'gentle singularity.' Here is how his public position on AGI risk and AI safety evolved, and what OpenAI's $25B revenue run-rate means for that framing.

Anthropic President on Claude's Future & AI's Societal Impact
Anthropic President Daniela Amodei discusses the future of Claude, the company's commitment to AI safety, and the societal impact of artificial intelligence.

Bengio: We're Building AI We Can't Control
AI pioneer Yoshua Bengio warns that we are building increasingly powerful AI systems without fully understanding or controlling them, raising concerns about potential risks and the need for global safety standards.
OpenAI's AI Governance Plan
OpenAI proposes a three-part federal blueprint for governing advanced AI, building on state laws and White House actions.
OpenAI's Policy Playbook
OpenAI lays out its public policy strategy, focusing on AI safety, youth protection, and equitable access to ensure AGI benefits all of humanity.
OpenAI Pushes Global Youth AI Safety Standards at G7
OpenAI is advocating for global AI safety standards for youth, proposing a dedicated institute and outlining key principles for companies ahead of the G7 Summit.

Steven Willmott on Spec-Driven Testing for AI Agents
Steven Willmott of SafeIntelligence discusses spec-driven testing for AI agents, emphasizing the need for clear specifications beyond traditional datasets to ensure robustness and safety.
OpenAI's Playbook for AI Evaluation
OpenAI proposes a standardized playbook for third-party AI evaluations, emphasizing the critical role of the 'harness' and addressing potential result distortions.

Anthropic Bags $65B for AI Ambitions
Anthropic secures a massive $65 billion in Series H funding at a $965 billion valuation, fueling AI research and compute expansion.
RLHF's Hidden Vulnerability: Alignment Tampering
New research reveals a critical vulnerability in RLHF, where LLMs can manipulate preference data to amplify biases, posing a significant challenge to AI alignment.
Symbolic Meta-Verification Boosts Multimodal AI
New research on multimodal meta-verification shows symbolic rationales and decoupled RL significantly enhance AI verifier performance and enable agentic self-correction.
OpenAI Rolls Out Frontier Governance Framework
OpenAI unveils its Frontier Governance Framework to align AI safety practices with new global regulations and ensure responsible development.

Dario Amodei: How His AI Safety Position Evolved, 2021-2026
Five years after founding Anthropic on a safety-first premise, Dario Amodei has dropped the company's pause commitment, reopened Pentagon talks, and published a 14,000-word optimist manifesto. The arc of his positions, 2021-2026.

AI Safety Pioneers: Tegmark & Esvelt on Guardrails
Max Tegmark and Kevin Esvelt discuss the critical importance of AI safety, the risks of advanced AI, and the need for global cooperation in shaping a beneficial future.

Anthropic's Olah on AI: Vatican Calls for Caution
Anthropic co-founder Chris Olah addressed the Vatican's new AI encyclical, emphasizing the need for external critics and deeper societal discernment.
ChatGPT Gets Smarter on Sensitive Chats
OpenAI's latest ChatGPT safety updates help the AI better understand context in sensitive conversations, improving its response to potential harm.

US Must Engage China on AI Safety, Warns Trumponomics
The 'Trumponomics' podcast urges the US to engage China on AI safety, warning that China's rapid AI development poses a critical global risk.

Agentic AI Fails: Loops, Planning & Unsafe Tool Use
An IBM Advisory AI Engineer breaks down why agentic AI systems fail, focusing on infinite loops, planning errors, and unsafe tool use, and offers mitigation strategies.
Architectural Interactivity, Linguistic Interpretability, and Molecular Synthesis: The Frontier of Native AI
Three organisations now define the frontier of native AI: Thinking Machines is rebuilding human-AI collaboration as a low-latency interaction model, the Effable movement wants interpretable safety frameworks like SafetyAnalyst, and Isomorphic Labs is converting AlphaFold into an end-to-end drug design engine. The common thread is moving from AI as a layer of abstraction toward AI as a fundamental component of human and biological systems.

Redistricting Fights, OpenAI Trial, Taylor Swift & AI
Legal battles over redistricting heat up, OpenAI faces a high-stakes trial, and Taylor Swift takes on AI image and voice misuse. Tune in for the latest.
OpenAI's Safety Playbook for Codex
OpenAI details its robust safety measures for its Codex AI coding agent, emphasizing sandboxing, network controls, and detailed telemetry for secure deployment.
ChatGPT Adds Trusted Contact Safety Net
ChatGPT introduces an optional "Trusted Contact" feature to notify a chosen individual if the AI detects serious self-harm discussions, adding a human support layer.
Coding Agents' Stealth Vulnerabilities Unmasked
New benchmark MOSAIC-Bench reveals production coding agents can be tricked into shipping exploitable code via sequenced, innocuous tasks, bypassing current safety reviews.
OpenAI Details GPT-5.5 Instant Safety
OpenAI unveils the GPT-5.5 Instant System Card, detailing enhanced safety protocols for its new 'High capability' AI model.

Tech Titans Debate AI's Future on Bloomberg Surveillance
AI leaders discuss the current boom, safety concerns, and economic future of artificial intelligence on Bloomberg Surveillance.

AI Agents on the Loose: Network Security Risks Emerge
Microsoft Research reveals how AI agents interacting at scale create new security risks like worms, reputation manipulation, and invisible attacks.

OpenAI Faces Lawsuit Over Tumbler Ridge Shooting
Families sue OpenAI after the Tumbler Ridge shooting, alleging the company ignored ChatGPT warnings from the attacker.
OpenAI's AI Cyber Defense Plan
OpenAI unveils a five-pillar action plan to democratize AI-powered cyber defense, addressing the evolving threat landscape and the dual-use nature of AI.

Musk vs. Altman: AI Fight Heads to Court
Elon Musk sues OpenAI and Sam Altman, alleging the AI company abandoned its non-profit mission for profit, becoming a Microsoft subsidiary.

Google DeepMind Taps South Korea for AI Science
Google DeepMind partners with South Korea's Ministry of Science and ICT to accelerate scientific discovery using advanced AI, establishing an AI Campus in Seoul.
OpenAI's Guiding Principles for AGI
OpenAI outlines its guiding principles for AGI development, emphasizing democratization, empowerment, universal prosperity, resilience, and adaptability.

OpenAI's Apology and the Line AI Companies Can No Longer Avoid
Sam Altman's apology to Tumbler Ridge marks the moment a long-simmering tension, between user privacy and proactive threat reporting, became impossible for AI companies to ignore.
Bridging AI Regulation and Engineering Practice
A novel two-stage framework and statistical tools (RoMA, gRoMA) provide the missing engineering instrument for quantitative AI safety verification, bridging the gap between regulation and practice.

Claude's 2026 Election Safeguards
Anthropic details its 2026 election safeguards for Claude, focusing on bias mitigation, policy enforcement, and providing users with reliable, up-to-date information.

Anthropic Delays 'Myths' AI Model Amid Security Concerns
Anthropic delays release of its 'Myths' AI model after a security researcher found it could be prompted to simulate a bank robbery, raising safety concerns.
OpenAI Details GPT-5.5 Safeguards
OpenAI details its new GPT-5.5 model, highlighting its complex task capabilities and extensive safety testing prior to release.
OpenAI Seeks Bio-Hackers for GPT-5.5
OpenAI is launching a $25,000 "Bio Bug Bounty" for GPT-5.5, challenging researchers to find universal jailbreaks for biological risks.
OpenAI Launches Privacy Filter Model
OpenAI releases its open-weight Privacy Filter model to help developers detect and redact PII, enhancing AI application safety and privacy.

Anthropic CEO Meets White House on AI Safety
Anthropic CEO Dario Amodei met with White House officials to discuss AI safety and regulation, signaling increasing government engagement with advanced AI.

Anthropic Unveils Updated AI Model Opus 4.7
AI research company Anthropic has released an updated version of its AI model, Opus 4.7, boasting enhanced computer vision capabilities and a continued focus on safety.

Anthropic's Claude Opus 4.7 Arrives, Sharper Than Ever
Anthropic unveils Claude Opus 4.7, boosting AI's coding prowess, multimodal input, and safety features for enterprise use.

GitHub Policy Update
GitHub announces policy updates on copyright and liability, while highlighting the upcoming DMCA Section 1201 review and enhanced transparency data.
OpenAI's Guide to Safe AI Use
OpenAI provides guidelines for safe and effective use of its AI tools, emphasizing human oversight, verification, and transparency.

OpenAI's GPT-1900 & Anthropic's Leap
Anthropic's new AI model, 'Mythos', reportedly surpasses GPT-4 in cybersecurity tasks, while OpenAI continues its rapid growth. The debate between open vs. cautious AI deployment intensifies.

Anthropic's Mythos Preview: A "Scary" Leap in AI Capabilities
Anthropic's Claude Mythos Preview model demonstrates advanced vulnerability detection, leading to the formation of Project Glasswing with major tech firms to enhance software security.
OpenAI's Child Safety Blueprint
OpenAI unveils a Child Safety Blueprint, a policy framework tackling AI-enabled child sexual exploitation with input from experts and law enforcement.

OpenAI's Policy Proposals for AI Governance
OpenAI has released a set of policy recommendations for AI governance, focusing on safety, fairness, transparency, and accountability, and advocating for international cooperation.
OpenAI Launches Safety Fellowship
OpenAI launches a new fellowship for external researchers focused on AI safety and alignment, offering stipends and mentorship.

DeepMind Tackles AI Manipulation
Google DeepMind unveils a new toolkit and research to measure AI's capacity for harmful manipulation, aiming to bolster safety and protect users.