AI Research
50 articles in this category
AI Tackles Clinical Research Autonomy
The Medical AI Scientist framework enables autonomous, clinically grounded research, outperforming commercial LLMs in ideation and manuscript quality.
Drifting Models Revolutionize MRI-to-CT Synthesis
Drifting models outperform diffusion and traditional methods in MRI-to-CT synthesis, offering millisecond inference for efficient, high-quality pelvic imaging.
EdgeDiT: Transformers on the Edge
EdgeDiT brings high-fidelity generative AI to mobile devices by optimizing Diffusion Transformers for NPUs, achieving significant efficiency gains.

Meta-Harness: AI Optimizes AI Development
Researchers unveil Meta-Harness, a novel AI system that automates harness optimization, leading to faster and more capable LLMs.
Personalized Driving with Vega
The Vega vision-language-action model enhances autonomous driving by enabling personalized, instruction-based navigation through a novel dataset and hybrid AI architecture.
WriteBack-RAG: Trainable Knowledge for RAG
WriteBack-RAG makes RAG knowledge bases trainable by distilling relevant facts back into the corpus, boosting performance across diverse RAG systems.
Externalizing Agent Harnesses with Language
Researchers introduce Natural-Language Agent Harnesses (NLAHs) and an Intelligent Harness Runtime (IHR) to externalize agent control logic, enabling greater transferability and scientific study.

François Chollet on ARC-AGI-3: The Future of AI Reasoning
François Chollet discusses ARC-AGI-3, a new benchmark for AI reasoning, highlighting current AI's limitations and the path toward general intelligence.

AI in Science: Faster Discovery, New Insights
AI is revolutionizing scientific research, from data analysis to hypothesis generation. Experts discuss how AI tools like LLMs are accelerating discovery while highlighting the continued importance of human expertise.

Microsoft's AsgardBench Tests AI's Planning Skills
Microsoft's AsgardBench benchmark tests AI agents' ability to adapt plans using real-time visual feedback, revealing current limitations in perception and state tracking.

Google Researchers Explore AI Storage Efficiency
Google researchers are developing AI compression techniques to reduce model storage needs by sixfold, aiming to lower costs and boost efficiency in AI development.

Robots Get Better at Long-Term Planning
Microsoft's GroundedPlanBench and V2GP framework improve robot planning by jointly considering actions and locations, overcoming limitations of decoupled approaches.

Google's Gemini 3.1 Flash Live Ups Audio AI
Google's Gemini 3.1 Flash Live audio AI model enhances naturalness, reliability, and speed for voice interactions.

DeepMind Tackles AI Manipulation
Google DeepMind unveils a new toolkit and research to measure AI's capacity for harmful manipulation, aiming to bolster safety and protect users.
Medical VLMs Fail Critical Input Sanity Checks
Medical VLMs fail critical input validation tests, as revealed by the new MedObvious benchmark, highlighting a significant safety risk.
Bridging the AI Code Quality Gap
A new benchmark, c-CRAB, reveals current AI code review agents only solve ~40% of tasks, highlighting gaps and potential for human-AI collaboration in code quality assurance.
Mecha-Nudges: Optimizing AI Decision Environments
Introducing 'mecha-nudges': a framework for systematically influencing AI agents by optimizing how choices are presented, with supporting evidence from Etsy's marketplace.

Anthropic Sues Pentagon Over AI Ban
AI safety firm Anthropic sues the Pentagon over a national security ban, seeking to overturn the decision and protect its AI technology.

Google's Lyria 3 Pro Powers Longer AI Music
Google's Lyria 3 Pro AI model can now generate longer, structurally aware music tracks, integrated across Google products.

Jason Wolfe on OpenAI Model Specs & Behavior
Jason Wolfe from OpenAI discusses the concept of 'model specs' and their importance in guiding AI behavior, transparency, and the ongoing pursuit of safe and beneficial AI.
UniMotion: Unifying Motion, Vision, and Language
UniMotion establishes a unified framework for continuous motion, vision, and text, overcoming discrete tokenization limits and achieving SOTA cross-modal performance.
Bridging Dense Dynamics and Semantic Reasoning
A new VLM-guided JEPA latent world modeling framework fuses dense motion dynamics with semantic reasoning for robust long-horizon forecasting.
DoRA Efficiency Breakthrough
New factored norm and fused kernels unlock DoRA's potential, delivering 1.5-2x speedups and significant VRAM reduction.

Andrej Karpathy's Auto-Research: AI Self-Improvement
Andrej Karpathy unveils 'auto-research,' an open-source AI project demonstrating autonomous model self-improvement and accelerated AI development.

AI Brains vs. Human Minds
Exploring the fundamental differences between transformer AI models and the human brain's continuous learning and sensory grounding.
Asynchronous MAPF Solved: Completeness Guaranteed
A new CBS-AA algorithm guarantees completeness and optimality for asynchronous multi-agent pathfinding (MAPF-AA), overcoming prior theoretical limitations and enhancing scalability.
Perceptio: Spatial Grounding for LVLMs
Perceptio integrates explicit 2D/3D spatial tokens (segmentation, depth) into autoregressive sequences, overcoming LVLM limitations in fine-grained visual grounding and achieving SOTA across spatial understanding benchmarks.
Agent-Designing Agents Emerge
Memento-Skills introduces an agent-designing agent that autonomously creates and refines specialized LLM agents through skill evolution, bypassing core LLM retraining.
F2LLM-v2: Multilingual Embeddings at Scale
F2LLM-v2 introduces a family of efficient multilingual embedding models supporting over 200 languages, setting new SOTA on 11 MTEB benchmarks with strong coverage of low-resource languages.
OS-Themis: Scalable Rewards for GUI Agents
OS-Themis, a multi-agent critic framework, improves GUI agent training with scalable, accurate rewards via milestone decomposition and evidence auditing, achieving significant performance gains on the new OGRBench.
3D Grounding for Vision-Language Models
Loc3R-VLM injects 3D spatial reasoning into 2D VLMs from monocular video, achieving SOTA in language-based localization and 3D QA.
AgentFactory: Executable Code for LLM Agents
AgentFactory revolutionizes LLM agent self-evolution by creating executable Python subagents, fostering continuous learning and reducing task execution effort.
VideoAtlas: Unlocking Long-Context Video AI
VideoAtlas AI offers a lossless, hierarchical grid representation and Video-RLM for scalable, robust long-context video understanding with logarithmic compute growth.
Unlocking Parallelism in Sequential AI
New parallel Newton methods overcome sequential bottlenecks in dynamical-systems computation, offering stability and provable acceleration guarantees.
InCoder-32B: Bridging General and Industrial Code AI
InCoder-32B, a 32B-parameter model trained with a novel 128K-context regime, unifies chip design, embedded systems, and other industrial code domains by mastering hardware semantics and resource constraints.
Gaussian Masking Unlocks Deeper EEG Insights
Novel Gaussian masking and SpecMoE architecture enhance foundation models for robust, generalized EEG decoding across diverse tasks and species.

DeepMind's AGI Roadmap
Google DeepMind unveils a cognitive framework and Kaggle hackathon to standardize AGI progress measurement, offering $200K in prizes.
MoDA: Unlocking LLM Depth Scaling
Mixture-of-Depths Attention (MoDA) tackles LLM signal degradation by enabling cross-layer attention, boosting performance with minimal overhead.
PRIMO R1: Active Critics for Robotic Manipulation
PRIMO R1 transforms video MLLMs into active critics for robotic manipulation via outcome-based RL, achieving SOTA on RoboFail and outperforming larger models.
OpenSeeker Democratizes Frontier LLM Search
OpenSeeker, a fully open-source search agent, breaks LLM search data scarcity with novel synthesis techniques, achieving state-of-the-art performance.
Pretraining's Hidden Experts Revealed
Large pretrained models contain dense task-specific experts that simple random sampling and ensembling can unlock, rivaling complex post-training optimization methods.
MLIPs: From Blind Screening to Certified Discovery
A new framework, Proof-Carrying Materials (PCM), certifies the reliability of machine-learned interatomic potentials (MLIPs), dramatically improving materials discovery.