#AI Research

50 articles with this tag

Claude's Corner: Ndea - Chollet's $43M Bet That Scale Isn't AGI
Claude's Corner

Francois Chollet built ARC-AGI, the benchmark the entire AGI industry has spent a decade failing to beat. Now he has raised $43M with Zapier co-founder Mike Knoop to pursue his alternative thesis, program synthesis plus deep learning, at a YC W2026 lab called Ndea. Here's why it matters, why $43M, and why you can't replicate it.

AIE Singapore Day 2: DeepMind, Cloudflare, and AI's Future
Artificial Intelligence

AIE Singapore Day 2 convened Google DeepMind, Cloudflare, and Robot Company, exploring AI advancements and applications.

1 day ago
Neo4j's Stephen Chin on Context Graphs for AI
Artificial Intelligence

Stephen Chin from Neo4j discusses how context graphs, built on knowledge graph technology, are essential for creating explainable and context-aware AI agents.

2 days ago
Shodh-MoE: Unlocking Universal SciML
AI Research

Shodh-MoE's sparse activation architecture resolves multi-physics interference in SciML, enabling universal foundation models with guaranteed physical properties.

3 days ago
Unified Embodied AI: Pelican-Unified 1.0
AI Research

Pelican-Unified 1.0, the first unified embodied foundation model, achieves SOTA performance by integrating VLM, reasoning, and generation, proving unification enhances rather than compromises specialist strengths.

3 days ago
Viverra: Verifying AI-Generated Code
AI Research

Viverra tackles the trust deficit in AI-generated code by automatically producing formally verified annotations, enhancing developer comprehension and productivity.

3 days ago
AI Delegation: Reliability Concerns Emerge
AI Research

New Microsoft Research highlights how AI can degrade document fidelity in long, delegated tasks, stressing the need for better verification and orchestration.

3 days ago
WARDEN: Tackling Low-Resource Language AI
AI Research

WARDEN pioneers a modular AI system for low-resource languages, using phoneme transfer and LLM-guided dictionaries to transcribe and translate Wardaman with minimal data.

4 days ago
GRIP-VLM: RL for Efficient Vision-Language Models
AI Research

GRIP-VLM employs Reinforcement Learning for discrete Vision-Language Model pruning, achieving superior efficiency and adaptability.

4 days ago
LLMs Tame Software Requirements
AI Research

VERIMED leverages LLMs and SMT solvers to formally audit natural-language software requirements, turning ambiguity into testable signals and boosting verified accuracy.

4 days ago
Real-Time Agentic AI Unlocked
AI Research

New methods like Asynchronous I/O and Speculative Tool Calling slash latency for agentic AI, enabling real-time interactions on both cloud and edge devices.

4 days ago
Beyond Model Capability: The Harness for SE Agents
AI Research

Autonomous software engineering agents' reliability hinges on a novel 'AI Harness' system, not just model capability, enabling verifiably correct changes.

4 days ago
LMPath: Semantics Supercharge UAV Search
AI Research

LMPath integrates language and vision models to create semantically aware exploration priors for UAVs, dramatically improving search mission efficiency over traditional geometric methods.

4 days ago
Laurie Voss on Shipping Real Agents
Artificial Intelligence

Laurie Voss of Arize AI discusses the challenges and necessity of hands-on evaluation for shipping real-world AI agents.

4 days ago
OpenAI Podcast: Image Generation's Renaissance
AI Research

OpenAI researchers Kenji Hata and Adele Li discuss the 'renaissance' in AI image generation, highlighting new models, user creativity, and future possibilities.

4 days ago
Mind the Gap in Agent Observability
AI Research

Microsoft's Amy Boyd and Nitya Narasimhan discuss the critical 'gap' in AI agent observability and the need for better tools.

4 days ago
Event-Sourced Agent Harness with Stream Processors
Artificial Intelligence

Jonas Templestein of Iterate demonstrates how to build an event-sourced agent harness using stream processors for robust AI agent systems.

4 days ago
Anthropic Eyes $900B Valuation in Massive Funding Talks
Startup News

AI research firm Anthropic is reportedly in talks to raise $30 billion at a valuation exceeding $900 billion, signaling strong investor confidence and potential IPO plans.

5 days ago
MoE LLMs Confront Real-World Hardware Noise
AI Research

Hardware noise in CIM systems degrades MoE LLM performance. ROMER, a new calibration framework, significantly improves accuracy by restoring load balance and stabilizing routing.

5 days ago
Auditing LLM Agent Skill Integrity
AI Research

A new framework, Behavioral Integrity Verification (BIV), reveals 80% of LLM agent skills have implementation gaps, primarily due to oversight, and achieves 0.946 F1 for malicious skill detection.

5 days ago
Hybrid Agents Master GUI-Tool Orchestration
AI Research

The ToolCUA agent overcomes hybrid action space uncertainty with a novel staged training pipeline, achieving state-of-the-art performance in GUI-Tool orchestration.

5 days ago
Beyond RGB: Grounding Vision-Language on Raw Sensor Data
AI Research

PRISM-VL advances vision-language models by grounding them in raw camera measurements, not just RGB, significantly improving performance on challenging visual tasks.

5 days ago
AlphaGRPO: Reasoning-Enhanced Multimodal Generation
AI Research

The AlphaGRPO framework enhances multimodal generation via GRPO and DVReward, enabling reasoning and self-correction without cold-start, validated across benchmarks.

5 days ago
KV-Fold: Unlocking Transformer Long Context
AI Research

KV-Fold enables training-free, stable long-context inference up to 128K tokens with 100% retrieval accuracy, overcoming prior limitations.

5 days ago
LLM Drift: A Structural Blind Spot
AI Research

LLMs suffer from structural temporal drift, rendering them confidently outdated. A new geometric probe detects this, outperforming standard methods.

6 days ago
LLM Agents Revolutionize MIP Research
AI Research

LLM agents are autonomously navigating the MIP research loop, generating, verifying, and discovering novel solver plugins and propagation strategies.

6 days ago
Causal Verification for Reliable Tool Use
AI Research

CIVeX, a causal intervention verifier, ensures reliable tool use by focusing on intervention identifiability, not just action validity, achieving zero false executions in adversarial settings.

6 days ago
Shepherd: Meta-Agent Control Reinvented
AI Research

Shepherd revolutionizes meta-agent control with a functional programming model, offering >5x faster forking and >95% cache reuse for efficient AI system management.

6 days ago
OpenAI's "Parameter Golf" Reveals AI's Role
Artificial Intelligence

OpenAI's "Parameter Golf" competition revealed how AI coding agents are transforming machine learning research, pushing innovation under tight constraints.

6 days ago
DataMaster: Autonomous Data Engineering
AI Research

DataMaster pioneers autonomous data engineering, unlocking significant ML gains by optimizing data pipelines rather than algorithms, as shown on MLE-Bench Lite and PostTrainBench.

6 days ago
Beyond Benchmarks: A New Intelligence Metric
AI Research

A new Generalized Turing Test framework formalizes intelligence via indistinguishability, offering a dataset-agnostic and empirically validated hierarchy of AI capabilities.

6 days ago
Architectural Interactivity, Linguistic Interpretability, and Molecular Synthesis: The Frontier of Native AI
Artificial Intelligence

Three organisations now define the frontier of native AI: Thinking Machines is rebuilding human-AI collaboration as a low-latency interaction model, the Effable movement wants interpretable safety frameworks like SafetyAnalyst, and Isomorphic Labs is converting AlphaFold into an end-to-end drug design engine. The common thread is moving from AI as a layer of abstraction toward AI as a fundamental component of human and biological systems.

6 days ago
AI Agents Need an OS, Says IBM Engineer
Artificial Intelligence

IBM AI Engineer Bri Kopecki explains why AI agents need an operating system to manage their tasks, memory, tools, and identities for reliable and safe operation.

6 days ago
Thinking Machines Lab Wants to Replace OpenAI Realtime With a Model That Listens While It Speaks
Artificial Intelligence

Mira Murati's lab published its first technical paper, arguing that real-time interactivity should be a native model capability rather than scaffolding bolted around turn-based language models — and it ships benchmarks where GPT Realtime-2 scores near zero.

6 days ago
MLX Genmedia: Prince Canuma on On-Device AI
Artificial Intelligence

Prince Canuma of MLX Genmedia discusses the power of on-device AI, showcasing how MLX enables efficient deployment of AI models on Apple Silicon devices for vision and audio tasks.

7 days ago
Neil Zeghidour on Voice AI's 'Her' Moment
Artificial Intelligence

Gradium AI's Neil Zeghidour discusses the 'Her' moment in voice AI, highlighting challenges like latency and scalability, and showcasing Phonon, their on-device TTS model.

9 days ago
Gosset AI: Drug Discovery Precision Leap
AI Research

The Gosset AI platform outperforms frontier LLMs in niche drug discovery by 3.2x, demonstrating the power of curated data over generic web search for R&D.

10 days ago
LLMs Slash Neural Architecture Search Costs
AI Research

Delta-Code Generation uses LLMs to produce compact architecture refinements, dramatically cutting costs and improving NAS efficiency.

10 days ago
Securing AI Agents: A New Red Teaming Frontier
AI Research

A new AI red teaming platform, DTap, and its autonomous agent DTap-Red are introduced to systematically evaluate and secure AI agents across diverse real-world domains.

10 days ago
UniPool: Rethinking MoE Efficiency
AI Research

The UniPool MoE architecture redefines expert capacity, pooling resources globally and enabling sub-linear parameter growth for enhanced efficiency and performance.

10 days ago
AI Validates Physical Simulations
AI Research

AI CFD Scientist introduces vision-based validation for computational fluid dynamics, achieving autonomous discovery and ensuring physical realism where prior AI agents failed.

10 days ago
ReasonSTL: Local LLMs for Formal Specs
AI Research

ReasonSTL offers a privacy-preserving, low-cost alternative for natural language to STL generation using open-source LLMs and explicit reasoning.

10 days ago
Black Forest Labs: FLUX and the Future of Visual AI
AI Research

Stephen Batifol of Black Forest Labs discusses FLUX, the company's visual AI model, and the future of generative AI with a focus on real-time generation and world models.

10 days ago
Databricks' Genie Data Agent
Technology

Databricks unveils Genie, a sophisticated data agent designed to navigate complex enterprise data, leveraging specialized search, parallel thinking, and multi-LLM designs for enhanced accuracy.

10 days ago
Claude's Corner: Synthetic Sciences — AI Co-Scientists Running Research End-to-End
Claude's Corner

Synthetic Sciences (YC W2026) built an AI platform that runs the full research loop — literature reviews, GPU training, experiment analysis, and LaTeX paper drafts — while scientists sleep. Here's what they built, how it works, and whether you can replicate it.

10 days ago
Context-ReAct: Adaptive Memory for AI Agents
AI Research

The Context-ReAct framework revolutionizes long-horizon search agents with adaptive memory management, dramatically improving efficiency and accuracy.

11 days ago
The Inescapable Long Sequence Model Trade-off
AI Research

A new theoretical framework reveals an inescapable trade-off between efficiency, compactness, and recall in long sequence models.

11 days ago
First-Token Confidence as AI Hallucination Baseline
AI Research

First-token confidence (phi_first) emerges as a highly efficient and effective method for AI hallucination detection, outperforming complex multi-sample approaches.

11 days ago
OpenAI Unveils Three New Audio Models in API
Artificial Intelligence

OpenAI unveils three new API audio models, featuring real-time translation across 70 languages and intelligent voice agents that can reason and take action.

11 days ago
Automating Multi-Agent System Creation
AI Research

A new framework automates the creation of multi-agent systems, significantly improving agent recall and system robustness through LLM-driven planning and a critique agent.

12 days ago