#AI Safety

50 articles with this tag

Mustafa Suleyman's Containment Paradox: From DeepMind's Safety Roots to Microsoft's AI Engine
AI Figures

Mustafa Suleyman's Containment Paradox: From DeepMind's Safety Roots to Microsoft's AI Engine

Mustafa Suleyman published a book on AI containment in 2023, then became Microsoft AI CEO. At Build 2026 he unveiled seven new MAI models and predicted 18-month white-collar automation.

about 18 hours ago
Sam Altman's AGI Shift: From Extinction Warning to Gentle Singularity
AI Figures

Sam Altman's AGI Shift: From Extinction Warning to Gentle Singularity

Sam Altman co-signed an AI extinction warning in May 2023. By June 2025, he was writing of a 'gentle singularity.' Here is how his public position on AGI risk and AI safety evolved, and what OpenAI's $25B revenue run-rate means for that framing.

3 days ago
Anthropic President on Claude's Future & AI's Societal Impact
Artificial Intelligence

Anthropic President on Claude's Future & AI's Societal Impact

Anthropic President Daniela Amodei discusses the future of Claude, the company's commitment to AI safety, and the societal impact of artificial intelligence.

11 days ago
Bengio: We're Building AI We Can't Control
AI Research

Bengio: We're Building AI We Can't Control

AI pioneer Yoshua Bengio warns that we are building increasingly powerful AI systems without fully understanding or controlling them, raising concerns about potential risks and the need for global safety standards.

11 days ago
OpenAI's AI Governance Plan
Artificial Intelligence

OpenAI's AI Governance Plan

OpenAI proposes a three-part federal blueprint for governing advanced AI, building on state laws and White House actions.

12 days ago
OpenAI's Policy Playbook
Artificial Intelligence

OpenAI's Policy Playbook

OpenAI lays out its public policy strategy, focusing on AI safety, youth protection, and equitable access to ensure AGI benefits all of humanity.

12 days ago
OpenAI Pushes Global Youth AI Safety Standards at G7
Artificial Intelligence

OpenAI Pushes Global Youth AI Safety Standards at G7

OpenAI is advocating for global AI safety standards for youth, proposing a dedicated institute and outlining key principles for companies ahead of the G7 Summit.

13 days ago
Steven Willmott on Spec-Driven Testing for AI Agents
Artificial Intelligence

Steven Willmott on Spec-Driven Testing for AI Agents

Steven Willmott of SafeIntelligence discusses spec-driven testing for AI agents, emphasizing the need for clear specifications beyond traditional datasets to ensure robustness and safety.

15 days ago
OpenAI's Playbook for AI Evaluation
Artificial Intelligence

OpenAI's Playbook for AI Evaluation

OpenAI proposes a standardized playbook for third-party AI evaluations, emphasizing the critical role of the 'harness' and addressing potential result distortions.

17 days ago
Anthropic Bags $65B for AI Ambitions
Artificial Intelligence

Anthropic Bags $65B for AI Ambitions

Anthropic secures a massive $65 billion in Series H funding at a $965 billion valuation, fueling AI research and compute expansion.

18 days ago
RLHF's Hidden Vulnerability: Alignment Tampering
AI Research

RLHF's Hidden Vulnerability: Alignment Tampering

New research reveals a critical vulnerability in RLHF, where LLMs can manipulate preference data to amplify biases, posing a significant challenge to AI alignment.

18 days ago
Symbolic Meta-Verification Boosts Multimodal AI
AI Research

Symbolic Meta-Verification Boosts Multimodal AI

New research on multimodal meta-verification shows symbolic rationales and decoupled RL significantly enhance AI verifier performance and enable agentic self-correction.

18 days ago
OpenAI Rolls Out Frontier Governance Framework
Artificial Intelligence

OpenAI Rolls Out Frontier Governance Framework

OpenAI unveils its Frontier Governance Framework to align AI safety practices with new global regulations and ensure responsible development.

18 days ago
Dario Amodei: How His AI Safety Position Evolved, 2021-2026
AI Figures

Dario Amodei: How His AI Safety Position Evolved, 2021-2026

Five years after founding Anthropic on a safety-first premise, Dario Amodei has dropped the company's pause commitment, reopened Pentagon talks, and published a 14,000-word optimist manifesto. The arc of his positions, 2021-2026.

19 days ago
AI Safety Pioneers: Tegmark & Esvelt on Guardrails
AI Research

AI Safety Pioneers: Tegmark & Esvelt on Guardrails

Max Tegmark and Kevin Esvelt discuss the critical importance of AI safety, the risks of advanced AI, and the need for global cooperation in shaping a beneficial future.

19 days ago
Anthropic's Olah on AI: Vatican Calls for Caution
Artificial Intelligence

Anthropic's Olah on AI: Vatican Calls for Caution

Anthropic co-founder Chris Olah addressed the Vatican's new AI encyclical, emphasizing the need for external critics and deeper societal discernment.

21 days ago
ChatGPT Gets Smarter on Sensitive Chats
Artificial Intelligence

ChatGPT Gets Smarter on Sensitive Chats

OpenAI's latest ChatGPT safety updates help the AI better understand context in sensitive conversations, improving its response to potential harm.

about 1 month ago
US Must Engage China on AI Safety, Warns Trumponomics
Artificial Intelligence

US Must Engage China on AI Safety, Warns Trumponomics

The 'Trumponomics' podcast urges the US to engage China on AI safety, warning that China's rapid AI development poses a critical global risk.

about 1 month ago
Agentic AI Fails: Loops, Planning & Unsafe Tool Use
AI Research

Agentic AI Fails: Loops, Planning & Unsafe Tool Use

An IBM Advisory AI Engineer breaks down why agentic AI systems fail, focusing on infinite loops, planning errors, and unsafe tool use, and offers mitigation strategies.

about 1 month ago
Architectural Interactivity, Linguistic Interpretability, and Molecular Synthesis: The Frontier of Native AI
Artificial Intelligence

Architectural Interactivity, Linguistic Interpretability, and Molecular Synthesis: The Frontier of Native AI

Three organisations now define the frontier of native AI: Thinking Machines is rebuilding human-AI collaboration as a low-latency interaction model, the Effable movement wants interpretable safety frameworks like SafetyAnalyst, and Isomorphic Labs is converting AlphaFold into an end-to-end drug design engine. The common thread is moving from AI as a layer of abstraction toward AI as a fundamental component of human and biological systems.

about 1 month ago
Redistricting Fights, OpenAI Trial, Taylor Swift & AI
Artificial Intelligence

Redistricting Fights, OpenAI Trial, Taylor Swift & AI

Legal battles over redistricting heat up, OpenAI faces a high-stakes trial, and Taylor Swift takes on AI image and voice misuse. Tune in for the latest.

about 1 month ago
OpenAI's Safety Playbook for Codex
Artificial Intelligence

OpenAI's Safety Playbook for Codex

OpenAI details its robust safety measures for its Codex AI coding agent, emphasizing sandboxing, network controls, and detailed telemetry for secure deployment.

about 1 month ago
ChatGPT Adds Trusted Contact Safety Net
Artificial Intelligence

ChatGPT Adds Trusted Contact Safety Net

ChatGPT introduces an optional "Trusted Contact" feature to notify a chosen individual if the AI detects serious self-harm discussions, adding a human support layer.

about 1 month ago
Coding Agents' Stealth Vulnerabilities Unmasked
AI Research

Coding Agents' Stealth Vulnerabilities Unmasked

New benchmark MOSAIC-Bench reveals production coding agents can be tricked into shipping exploitable code via sequenced, innocuous tasks, bypassing current safety reviews.

about 1 month ago
OpenAI Details GPT-5.5 Instant Safety
Artificial Intelligence

OpenAI Details GPT-5.5 Instant Safety

OpenAI unveils the GPT-5.5 Instant System Card, detailing enhanced safety protocols for its new 'High capability' AI model.

about 1 month ago
Tech Titans Debate AI's Future on Bloomberg Surveillance
Artificial Intelligence

Tech Titans Debate AI's Future on Bloomberg Surveillance

AI leaders discuss the current boom, safety concerns, and economic future of artificial intelligence on Bloomberg Surveillance.

about 2 months ago
AI Agents on the Loose: Network Security Risks Emerge
AI Research

AI Agents on the Loose: Network Security Risks Emerge

Microsoft Research reveals how AI agents interacting at scale create new security risks like worms, reputation manipulation, and invisible attacks.

about 2 months ago
OpenAI Faces Lawsuit Over Tumbler Ridge Shooting
Artificial Intelligence

OpenAI Faces Lawsuit Over Tumbler Ridge Shooting

Families sue OpenAI after the Tumbler Ridge shooting, alleging the company ignored ChatGPT warnings from the attacker.

about 2 months ago
OpenAI's AI Cyber Defense Plan
Artificial Intelligence

OpenAI's AI Cyber Defense Plan

OpenAI unveils a five-pillar action plan to democratize AI-powered cyber defense, addressing the evolving threat landscape and the dual-use nature of AI.

about 2 months ago
Musk vs. Altman: AI Fight Heads to Court
AI Research

Musk vs. Altman: AI Fight Heads to Court

Elon Musk sues OpenAI and Sam Altman, alleging the AI company abandoned its non-profit mission for profit, becoming a Microsoft subsidiary.

about 2 months ago
Google DeepMind Taps South Korea for AI Science
AI Research

Google DeepMind Taps South Korea for AI Science

Google DeepMind partners with South Korea's Ministry of Science and ICT to accelerate scientific discovery using advanced AI, establishing an AI Campus in Seoul.

about 2 months ago
OpenAI's Guiding Principles for AGI
Artificial Intelligence

OpenAI's Guiding Principles for AGI

OpenAI outlines its guiding principles for AGI development, emphasizing democratization, empowerment, universal prosperity, resilience, and adaptability.

about 2 months ago
OpenAI's Apology and the Line AI Companies Can No Longer Avoid
Artificial Intelligence

OpenAI's Apology and the Line AI Companies Can No Longer Avoid

Sam Altman's apology to Tumbler Ridge marks the moment a long-simmering tension, between user privacy and proactive threat reporting, became impossible for AI companies to ignore.

about 2 months ago
Bridging AI Regulation and Engineering Practice
AI Research

Bridging AI Regulation and Engineering Practice

A novel two-stage framework and statistical tools (RoMA, gRoMA) provide the missing engineering instrument for quantitative AI safety verification, bridging the gap between regulation and practice.

about 2 months ago
Claude's 2026 Election Safeguards
Artificial Intelligence

Claude's 2026 Election Safeguards

Anthropic details its 2026 election safeguards for Claude, focusing on bias mitigation, policy enforcement, and providing users with reliable, up-to-date information.

about 2 months ago
Anthropic Delays 'Myths' AI Model Amid Security Concerns
Artificial Intelligence

Anthropic Delays 'Myths' AI Model Amid Security Concerns

Anthropic delays release of its 'Myths' AI model after a security researcher found it could be prompted to simulate a bank robbery, raising safety concerns.

about 2 months ago
OpenAI Details GPT-5.5 Safeguards
Artificial Intelligence

OpenAI Details GPT-5.5 Safeguards

OpenAI details its new GPT-5.5 model, highlighting its complex task capabilities and extensive safety testing prior to release.

about 2 months ago
OpenAI Seeks Bio-Hackers for GPT-5.5
Artificial Intelligence

OpenAI Seeks Bio-Hackers for GPT-5.5

OpenAI is launching a $25,000 "Bio Bug Bounty" for GPT-5.5, challenging researchers to find universal jailbreaks for biological risks.

about 2 months ago
OpenAI Launches Privacy Filter Model
Artificial Intelligence

OpenAI Launches Privacy Filter Model

OpenAI releases its open-weight Privacy Filter model to help developers detect and redact PII, enhancing AI application safety and privacy.

about 2 months ago
Anthropic CEO Meets White House on AI Safety
Artificial Intelligence

Anthropic CEO Meets White House on AI Safety

Anthropic CEO Dario Amodei met with White House officials to discuss AI safety and regulation, signaling increasing government engagement with advanced AI.

about 2 months ago
Anthropic Unveils Updated AI Model Opus 4.7
Artificial Intelligence

Anthropic Unveils Updated AI Model Opus 4.7

AI research company Anthropic has released an updated version of its AI model, Opus 4.7, boasting enhanced computer vision capabilities and a continued focus on safety.

2 months ago
Anthropic's Claude Opus 4.7 Arrives, Sharper Than Ever
Artificial Intelligence

Anthropic's Claude Opus 4.7 Arrives, Sharper Than Ever

Anthropic unveils Claude Opus 4.7, boosting AI's coding prowess, multimodal input, and safety features for enterprise use.

2 months ago
GitHub Policy Update
Technology

GitHub Policy Update

GitHub announces policy updates on copyright and liability, while highlighting the upcoming DMCA Section 1201 review and enhanced transparency data.

2 months ago
OpenAI's Guide to Safe AI Use
Artificial Intelligence

OpenAI's Guide to Safe AI Use

OpenAI provides guidelines for safe and effective use of its AI tools, emphasizing human oversight, verification, and transparency.

2 months ago
OpenAI's GPT-1900 & Anthropic's Leap
Artificial Intelligence

OpenAI's GPT-1900 & Anthropic's Leap

Anthropic's new AI model, 'Mythos', reportedly surpasses GPT-4 in cybersecurity tasks, while OpenAI continues its rapid growth. The debate between open vs. cautious AI deployment intensifies.

2 months ago
Anthropic's Mythos Preview: A "Scary" Leap in AI Capabilities
Artificial Intelligence

Anthropic's Mythos Preview: A "Scary" Leap in AI Capabilities

Anthropic's Claude Mythos Preview model demonstrates advanced vulnerability detection, leading to the formation of Project Glasswing with major tech firms to enhance software security.

2 months ago
OpenAI's Child Safety Blueprint
Artificial Intelligence

OpenAI's Child Safety Blueprint

OpenAI unveils a Child Safety Blueprint, a policy framework tackling AI-enabled child sexual exploitation with input from experts and law enforcement.

2 months ago
OpenAI's Policy Proposals for AI Governance
Artificial Intelligence

OpenAI's Policy Proposals for AI Governance

OpenAI has released a set of policy recommendations for AI governance, focusing on safety, fairness, transparency, and accountability, and advocating for international cooperation.

2 months ago
OpenAI Launches Safety Fellowship
Artificial Intelligence

OpenAI Launches Safety Fellowship

OpenAI launches a new fellowship for external researchers focused on AI safety and alignment, offering stipends and mentorship.

2 months ago
DeepMind Tackles AI Manipulation
AI Research

DeepMind Tackles AI Manipulation

Google DeepMind unveils a new toolkit and research to measure AI's capacity for harmful manipulation, aiming to bolster safety and protect users.

3 months ago
#AI Safety Articles | StartupHub.ai