AI Research

50 articles in this category

AI Tackles Clinical Research Autonomy

The Medical AI Scientist framework enables autonomous, clinically grounded research, outperforming commercial LLMs in ideation and manuscript quality.

about 2 hours ago
Drifting Models Revolutionize MRI-to-CT Synthesis

Drifting models outperform diffusion and traditional methods in MRI-to-CT synthesis, offering millisecond inference for efficient, high-quality pelvic imaging.

about 2 hours ago
EdgeDiT: Transformers on the Edge

EdgeDiT brings high-fidelity generative AI to mobile devices by optimizing Diffusion Transformers for NPUs, achieving significant efficiency gains.

about 2 hours ago
Meta-Harness: AI Optimizes AI Development

Researchers unveil Meta-Harness, a novel AI system that automates harness optimization, leading to faster and more capable LLMs.

about 19 hours ago
Personalized Driving with Vega

The Vega vision-language-action model enhances autonomous driving by enabling personalized, instruction-based navigation through a novel dataset and hybrid AI architecture.

4 days ago
WriteBack-RAG: Trainable Knowledge for RAG

WriteBack-RAG enables trainable RAG knowledge bases by distilling relevant facts into the corpus, boosting performance universally across RAG systems.

4 days ago
Externalizing Agent Harnesses with Language

Researchers introduce Natural-Language Agent Harnesses (NLAHs) and an Intelligent Harness Runtime (IHR) to externalize agent control logic, enabling greater transferability and scientific study.

4 days ago
François Chollet on ARC-AGI-3: The Future of AI Reasoning

François Chollet discusses ARC-AGI-3, a new benchmark for AI reasoning, highlighting current AI's limitations and the path toward general intelligence.

4 days ago
AI in Science: Faster Discovery, New Insights

AI is revolutionizing scientific research, from data analysis to hypothesis generation. Experts discuss how AI tools like LLMs are accelerating discovery while highlighting the continued importance of human expertise.

4 days ago
Microsoft's AsgardBench Tests AI's Planning Skills

Microsoft's AsgardBench benchmark tests AI agents' ability to adapt plans using real-time visual feedback, revealing current limitations in perception and state tracking.

5 days ago
Google Researchers Explore AI Storage Efficiency

Google researchers are developing AI compression techniques to reduce model storage needs sixfold, aiming to lower costs and boost efficiency in AI development.

5 days ago
Robots Get Better at Long-Term Planning

Microsoft's GroundedPlanBench and V2GP framework improve robot planning by jointly considering actions and locations, overcoming limitations of decoupled approaches.

5 days ago
Google's Gemini 3.1 Flash Live Ups Audio AI

Google's Gemini 3.1 Flash Live audio AI model enhances naturalness, reliability, and speed for voice interactions.

5 days ago
DeepMind Tackles AI Manipulation

Google DeepMind unveils a new toolkit and research to measure AI's capacity for harmful manipulation, aiming to bolster safety and protect users.

5 days ago
Medical VLMs Fail Critical Input Sanity Checks

Medical VLMs fail critical input validation tests, as revealed by the new MedObvious benchmark, highlighting a significant safety risk.

6 days ago
Bridging the AI Code Quality Gap

A new benchmark, c-CRAB, reveals current AI code review agents only solve ~40% of tasks, highlighting gaps and potential for human-AI collaboration in code quality assurance.

6 days ago
Mecha-Nudges: Optimizing AI Decision Environments

Introducing 'mecha-nudges': a novel framework to systematically influence AI agents by optimizing choice presentation, with evidence found in Etsy's marketplace.

6 days ago
Anthropic Sues Pentagon Over AI Ban

AI safety firm Anthropic sues the Pentagon over a national security ban, seeking to overturn the decision and protect its AI technology.

6 days ago
Google's Lyria 3 Pro Powers Longer AI Music

Google's Lyria 3 Pro AI model can now generate longer, structurally aware music tracks, integrated across Google products.

6 days ago
Jason Wolfe on OpenAI Model Specs & Behavior

Jason Wolfe from OpenAI discusses the concept of 'model specs' and their importance in guiding AI behavior, transparency, and the ongoing pursuit of safe and beneficial AI.

6 days ago
UniMotion: Unifying Motion, Vision, and Language

UniMotion establishes a unified framework for continuous motion, vision, and text, overcoming discrete tokenization limits and achieving SOTA cross-modal performance.

7 days ago
Bridging Dense Dynamics and Semantic Reasoning

A new VLM-guided JEPA latent world modeling framework fuses dense motion dynamics with semantic reasoning for robust long-horizon forecasting.

7 days ago
DoRA Efficiency Breakthrough

New factored norm and fused kernels unlock DoRA's potential, delivering 1.5-2x speedups and significant VRAM reduction.

7 days ago
Andrej Karpathy's Auto-Research: AI Self-Improvement

Andrej Karpathy unveils 'auto-research,' an open-source AI project demonstrating autonomous model self-improvement and accelerated AI development.

7 days ago
AI Brains vs. Human Minds

Exploring the fundamental differences between transformer AI models and the human brain's continuous learning and sensory grounding.

8 days ago
Asynchronous MAPF Solved: Completeness Guaranteed

A new CBS-AA algorithm guarantees completeness and optimality for asynchronous multi-agent pathfinding, overcoming prior theoretical limitations and enhancing scalability.

10 days ago
MAPF-AA Solved: Completeness Guaranteed

New CBS-AA algorithm achieves guaranteed completeness and optimality for asynchronous multi-agent pathfinding (MAPF-AA), overcoming prior theoretical hurdles and improving scalability.

10 days ago
Perceptio: Spatial Grounding for LVLMs

Perceptio LVLM introduces explicit 2D/3D spatial tokens into autoregressive sequences, achieving SOTA in spatial grounding and understanding tasks.

10 days ago
Perceptio: Spatial Grounding for LVLMs

Perceptio LVLM integrates explicit spatial tokens (segmentation, depth) to overcome LVLM limitations in fine-grained visual grounding, achieving SOTA across benchmarks.

10 days ago
Agent-Designing Agents Emerge

Memento-Skills introduces an agent-designing agent that autonomously creates and refines specialized LLM agents through skill evolution, bypassing core LLM retraining.

10 days ago
F2LLM-v2: Multilingual Embeddings Unleashed

F2LLM-v2 offers a new family of highly efficient, multilingual embedding models supporting over 200 languages, setting SOTA on 11 MTEB benchmarks.

11 days ago
F2LLM-v2: Multilingual Embeddings at Scale

F2LLM-v2 launches a family of efficient, multilingual embedding models, setting new SOTA on MTEB benchmarks and championing low-resource languages.

11 days ago
OS-Themis: Scalable Rewards for GUI Agents

OS-Themis revolutionizes GUI agent training with a scalable, milestone-based critic framework and OGRBench, achieving significant performance uplifts.

11 days ago
OS-Themis: Scalable Rewards for Robust RL

OS-Themis, a new multi-agent critic framework, revolutionizes GUI agent training by providing scalable, accurate rewards through milestone decomposition and evidence auditing.

11 days ago
3D Grounding for Vision-Language Models

Loc3R-VLM enhances 2D VLMs with 3D spatial reasoning from monocular video, achieving SOTA in language-based localization and 3D QA.

12 days ago
3D Spatial Reasoning for VLM

Loc3R-VLM injects 3D spatial reasoning into 2D VLMs using monocular video, achieving SOTA in localization and 3D QA.

12 days ago
AgentFactory: Executable Code for LLM Agents

AgentFactory revolutionizes LLM agent self-evolution by creating executable Python subagents, fostering continuous learning and reducing task execution effort.

12 days ago
VideoAtlas: Unlocking Long-Context Video AI

VideoAtlas AI offers a lossless, hierarchical grid representation and Video-RLM for scalable, robust long-context video understanding with logarithmic compute growth.

12 days ago
Unlocking Parallelism in Sequential AI

Breakthrough parallel Newton methods and theoretical insights enable stable, scalable acceleration of sequential AI computations.

13 days ago
Parallelizing Sequential Dynamics

New parallel Newton methods overcome sequential bottlenecks in dynamical systems, offering stability and provable acceleration guarantees.

13 days ago
InCoder-32B: Bridging AI to Industrial Code

InCoder-32B, a 32B-parameter model, advances AI in industrial code by mastering hardware semantics and resource constraints through novel training.

13 days ago
InCoder-32B: Bridging General and Industrial Code AI

InCoder-32B, a 32B-parameter model, bridges the gap in code LLMs for industrial applications by unifying chip design, embedded systems, and more with a novel 128K context training.

13 days ago
Gaussian Masking Unlocks Deeper EEG Insights

Novel Gaussian masking and SpecMoE architecture enhance foundation models for robust, generalized EEG decoding across diverse tasks and species.

13 days ago
DeepMind's AGI Roadmap

Google DeepMind unveils a cognitive framework and Kaggle hackathon to standardize AGI progress measurement, offering $200K in prizes.

14 days ago
MoDA: Unlocking LLM Depth Scaling

Mixture-of-Depths Attention (MoDA) tackles LLM signal degradation by enabling cross-layer attention, boosting performance with minimal overhead.

14 days ago
PRIMO R1: Active Critics for Robotic Manipulation

PRIMO R1 transforms video MLLMs into active critics for robotic manipulation via outcome-based RL, achieving SOTA on RoboFail and outperforming larger models.

14 days ago
OpenSeeker Democratizes Frontier LLM Search

OpenSeeker, a fully open-source search agent, breaks LLM search data scarcity with novel synthesis techniques, achieving state-of-the-art performance.

14 days ago
Pretraining's Hidden Experts Revealed

Large pretrained models contain dense task-specific experts, unlockable via simple random sampling and ensembling, rivaling complex post-training AI model optimization.

18 days ago
Pretraining's Hidden Experts: A New Post-Training Paradigm

Large pretrained models are dense with task-experts, enabling simple random sampling and ensembling to rival complex post-training AI optimization methods.

18 days ago
MLIPs: From Blind Screening to Certified Discovery

New framework Proof-Carrying Materials (PCM) ensures reliability for machine-learned interatomic potentials (MLIPs), dramatically improving materials discovery.

18 days ago