AI Research

50 articles in this category

Compute Once: Unlocking AI Agent Efficiency

A radical proposal to precompute LLM KV caches, slashing inference costs by up to 50x and enabling a new compute-efficient AI agent paradigm.

2 days ago
HYDRA-X: Unifying Image & Video Tokenization

HYDRA-X, a novel Vision Transformer-based UMM, unifies image and video tokenization, enhancing editing consistency and performance through causal attention and latent-level manipulation.

2 days ago
Humanoids Learn Self-Other Distinction

Humanoid robots now learn self-other distinction and build predictive self-models from sensory data, enabling better collaboration and task performance in human-robot environments.

2 days ago
AI spots new LOTUSLITE variant

Microsoft's AI agent 'Ire' has identified a new LOTUSLITE malware variant missed by traditional security tools, showcasing AI's prowess in behavioral analysis.

3 days ago
Unlocking Ultra-Long Context for LLMs

MiniMax Sparse Attention breaks the context window barrier for LLMs, enabling millions of tokens with significant compute reduction and practical speedups.

3 days ago
Mana Reimagines Dexterous Robotics

Mana framework reinterprets dexterous robotics as animation, achieving zero-shot sim-to-real transfer for articulated tool manipulation.

3 days ago
From LLM Agents to Scientific Knowledge Graphs

Agents-K1 revolutionizes LLM research agents by creating agent-native scientific knowledge graphs from full papers, enabling deeper scientific reasoning.

3 days ago
5 AI Research Papers Shaping AI's Future

Discover five key AI research papers that reveal the current trajectory and future directions of artificial intelligence development.

3 days ago
Rethinking VLM Token Reduction

Reroute transforms VLM token reduction from irreversible pruning to recoverable routing, improving grounding performance without sacrificing efficiency.

4 days ago
Automating Scientific Discovery

ATLAS, an active learning framework, automates the discovery of interpretable mechanistic models, achieving 5-10x sample efficiency gains.

4 days ago
VLA Models Unlock Decentralized Multi-Robot Teams

CHORUS leverages pretrained VLA models for decentralized multi-robot collaboration, achieving significant performance gains without inference-time communication.

4 days ago
Codex Aids Black Hole Simulation Breakthrough

This video explores how the AI model Codex is revolutionizing the creation of black hole simulations, making previously intractable problems computationally feasible and accelerating astrophysical research.

4 days ago
DeepMind's Kilpatrick on AI Models Eating Harnesses

Google DeepMind's Logan Kilpatrick delves into the AI concept of models "eating the harness," explaining how over-specialization hinders generalization and what can be done to prevent it.

5 days ago
Causal Inference's Counterfactual Blind Spot

Predictive AI models fail on counterfactual couplings. A new world model using semidefinite kernels offers a solution for robust causal inference.

5 days ago
Steering LRMs Beyond Output Degradation

A new probe-based method, FPCG, distinguishes prediction features from detection features to enable precise steering of large reasoning models (LRMs) with minimal degradation in output quality.

5 days ago
LLMs Accelerate FPGA Design

LLMs are now automating complex FPGA accelerator design, reducing time and expertise needed for efficient AI hardware deployment.

5 days ago
DiffusionGemma: Google's AI is 4x Faster

Google DeepMind's DiffusionGemma model offers up to 4x faster text generation, enabling new real-time AI applications.

5 days ago
Google DeepMind Discusses Open Models & AI Ownership

Google DeepMind's Gus Martins and Ian Ballantyne discuss the benefits of open AI models like Gemma for ownership, control, and custom applications.

5 days ago
Topology-Aware Operator Learning

Topological Neural Operators (TNOs) provide a unified framework for operator learning on cell complexes, improving PDE benchmark accuracy by integrating topological structures.

6 days ago
Personalized AI Agents Now Have a Benchmark

A new iOSWorld benchmark reveals AI agents' struggles with personalized, multi-app tasks, highlighting the need for richer context and advanced reasoning capabilities.

6 days ago
Images as the New Reasoning Medium

This paper introduces optical reasoning, enabling images to serve as the primary medium for LLM and MLLM reasoning, achieving higher token efficiency and competitive performance.

6 days ago
Gemini's Audio Stack: From Transcription to Music Generation

Google DeepMind's Thor Schaeff explores Gemini's audio stack, from advanced transcription to music generation with Lyria 3.

6 days ago
Google Rolls Out Gemini 3.5 Live Translate

Google's new Gemini 3.5 Live Translate offers real-time speech-to-speech translation across 70+ languages, enhancing Google Translate and Meet.

6 days ago
Google's Gemma 4 12B: AI on Your Laptop

Google's Gemma 4 12B model brings efficient, multimodal AI directly to laptops with a novel unified architecture.

6 days ago
Google DeepMind Fuels European Robotics Startups

Google DeepMind's new accelerator program supports 15 European robotics startups, providing AI expertise and mentorship to build the future of physical AI.

6 days ago
Gemini AI Boosts Math Skills in Sierra Leone Trial

Gemini's Guided Learning AI significantly boosted math scores in an 8-week Sierra Leone trial, demonstrating AI's potential as a teacher's assistant without replacing educators.

7 days ago
Together AI Pushes LLM Context Limits to 5 Million Tokens

Max Ryabinin from Together AI discusses breaking barriers in LLM training, detailing techniques to achieve 5 million token context lengths and their impact on memory and performance.

7 days ago
HANDOFF: Bridging AI Planning and Robot Control

HANDOFF revolutionizes the humanoid robot command space, enabling intuitive task planning and robust real-world manipulation through a distilled, multi-expert controller.

10 days ago
Code2LoRA: Repository Context without Overhead

Code2LoRA generates dynamic LoRA adapters for code LLMs, offering repository context without inference overhead and adapting to evolving codebases.

10 days ago
NF-CoT: High-Bandwidth Latent Reasoning

NF-CoT framework enables high-bandwidth latent reasoning using normalizing flows, boosting LLM performance and efficiency while preserving autoregressive strengths.

10 days ago
PyannoteAI's Bredin on Building Conversational Voice AI

Hervé Bredin of pyannoteAI discusses the crucial role of speaker diarization in building voice AI that understands conversations, showcasing open-source tools and future advancements.

11 days ago
Bengio: We're Building AI We Can't Control

AI pioneer Yoshua Bengio warns that we are building increasingly powerful AI systems without fully understanding or controlling them, raising concerns about potential risks and the need for global safety standards.

11 days ago
Unifying Audio: The Rise of the Real-Time LALM

Researchers unveil the Audio Interaction Model, a unified real-time LALM with the SoundFlow framework, enabling proactive audio understanding and response.

11 days ago
Unpacking Multi-Modal Memory Bottlenecks

A new benchmark, M³Eval, reveals critical memory deficiencies in multi-modal models, particularly in disentangled representations, interference patterns, and temporal grounding.

11 days ago
LifeSkill: LLM Agents Learn Continuously

LifeSkill framework enables LLM agents to continuously learn from test-time feedback, significantly improving performance on long-horizon tasks by internalizing skills.

11 days ago
Anthropic Ethicist on AI Consciousness

Anthropic's ethicist discusses the challenges and approaches to instilling human values and ethical behavior in AI models.

11 days ago
Brendon Dillon on Text Diffusion at Google DeepMind

Brendon Dillon from Google DeepMind discusses the advancements and potential of text diffusion models in language generation, highlighting advantages over autoregressive models.

11 days ago
AI Cracks 80-Year Math Problem on OpenAI Podcast

OpenAI researchers discuss how their AI model solved an 80-year-old math problem, highlighting AI's growing reasoning capabilities.

11 days ago
Benchmarking AI Agents: Snorkel AI's Vincent Chen Explains

Vincent Chen from Snorkel AI explores the art and science of benchmarking AI agents, detailing the complexities and methodologies involved in evaluation.

11 days ago
Evaluating Coding Agents: Lessons from SWE-rebench

Ibragim Badertdinov from Nebius shares key lessons from evaluating coding agents using the SWE-rebench benchmark, highlighting the importance of real-world tasks, reliable verification, and cost-effectiveness.

11 days ago
Beyond Observable Data: Imaginative Perception for VLMs

Researchers introduce Imaginative Perception Tokens (IPTs) to enable VLMs to reason about unobserved spatial configurations, outperforming textual chain-of-thought.

12 days ago
AI Agents Automate Drone Navigation Rewards

AgenticRL framework uses AI agents to autonomously design rewards and refine policies for UAV navigation, achieving 91% real-world success.

12 days ago
AI Analysts Lag on Real-World Reasoning

New Hedge-Bench 1.0 benchmark reveals frontier AI models score under 16% on real-world financial reasoning tasks, exposing a critical gap in expert-level judgment.

12 days ago
Claude Code Benchmarking: Semantic Search vs. Grep

Turbopuffer's Kuba Rogut benchmarks semantic code retrieval on Claude Code, revealing how semantic search enhances AI agent precision and efficiency compared to grep.

12 days ago
AdaCodec: Efficient Video MLLM Encoding

AdaCodec revolutionizes video MLLMs by using predictive visual coding to drastically cut tokenization costs and latency, achieving superior performance at a fraction of the budget.

13 days ago
ClinEnv: Bridging LLM Gaps in Clinical Decision-Making

The ClinEnv benchmark reveals LLMs struggle with sequential medical decision-making, showing a gap between diagnostic and management capabilities.

13 days ago
Bridging Diffusion LLMs and Speculative Decoding

A novel SimSD speculative decoding method enables diffusion LLMs to achieve up to 7.46x higher throughput without sacrificing generation quality.

13 days ago
Task Fidelity Scaling Laws: Kobie Crawford on AI Data Quality

Kobie Crawford of Snorkel discusses 'Task Fidelity Scaling Laws,' emphasizing how data quality impacts AI model performance and outlining Snorkel's approach to creating verifiable datasets.

13 days ago
Bertrand Charpentier on AI Benchmarking Challenges

Bertrand Charpentier of Pruna AI discusses the challenges in AI benchmarking, the limitations of public leaderboards, and the importance of considering both quality and efficiency.

14 days ago
xAI's Ethan He on Grok, Video Agents & AI Futures

xAI's Ethan He discusses how language models drive visual AI, the rapid development of Grok Imagine, and the future of AI-generated interfaces.

14 days ago