#Natural Language Processing
50 articles with this tag
Personalized Driving with Vega
The Vega vision-language-action model enhances autonomous driving by enabling personalized, instruction-based navigation through a novel dataset and hybrid AI architecture.
Externalizing Agent Harnesses with Language
Researchers introduce Natural-Language Agent Harnesses (NLAHs) and an Intelligent Harness Runtime (IHR) to externalize agent control logic, enabling greater transferability and scientific study.
Medical VLMs Fail Critical Input Sanity Checks
Medical VLMs fail critical input validation tests, as revealed by the new MedObvious benchmark, highlighting a significant safety risk.
Perceptio: Spatial Grounding for LVLMs
Perceptio LVLM introduces explicit 2D/3D spatial tokens into autoregressive sequences, achieving SOTA in spatial grounding and understanding tasks.
Perceptio: Spatial Grounding for LVLMs
Perceptio LVLM integrates explicit spatial tokens (segmentation, depth) to overcome LVLM limitations in fine-grained visual grounding, achieving SOTA across benchmarks.
F2LLM-v2: Multilingual Embeddings Unleashed
F2LLM-v2 offers a new family of highly efficient, multilingual embedding models supporting over 200 languages, setting SOTA on 11 MTEB benchmarks.
3D Grounding for Vision-Language Models
Loc3R-VLM enhances 2D VLMs with 3D spatial reasoning from monocular video, achieving SOTA in language-based localization and 3D QA.
3D Spatial Reasoning for VLM
Loc3R-VLM injects 3D spatial reasoning into 2D VLMs using monocular video, achieving SOTA in localization and 3D QA.
AI Drills Deeper for Oilfield Insights
Databricks introduces an AI agent that translates complex drilling data into natural language, simplifying operations and reducing costly downtime.
Descript Masters Multilingual Dubbing
Descript enhances its AI-powered video editor with OpenAI models for natural-sounding multilingual dubbing, overcoming timing and meaning challenges.

Microsoft's Phi-4-reasoning-vision-15B compact AI model
Microsoft Research's Phi-4-reasoning-vision-15B offers efficient multimodal AI, excelling in reasoning and vision tasks with less data and compute.
CHIMERA Dataset Boosts LLM Reasoning
Researchers introduce CHIMERA, a synthetic dataset enabling LLMs to achieve strong cross-domain reasoning capabilities with efficient training.

OpenAI's GPT-4.5 Enhances Web Search Integration
OpenAI researcher Josh discusses how GPT-4.5's web search integration is becoming more natural, conversational, and context-aware.
Multimodal LLMs: What's Lost in Translation?
New research reveals multimodal LLMs struggle to utilize non-textual data due to a 'mismatched decoder problem,' impacting their true understanding.
Less Data, More Alignment: SOTAlign
Researchers introduce SOTAlign, a framework that achieves robust cross-modal alignment using significantly less paired data by leveraging unpaired samples.
NAP: Unlocking Parallel Generation in Diffusion Language Models
Researchers propose NAP, a data-centric approach to enable true parallel generation in Diffusion Language Models by aligning training data with non-autoregressive decoding.
AI Agent for Grounded Chest X-ray Diagnosis
Researchers introduce CXReasonAgent, an AI diagnostic agent enhancing Chest X-ray interpretation by grounding LLM reasoning in clinical tools and visual evidence.
Multilingual LLM Guardrails Tested
Researchers tested how LLM guardrails perform across languages and policy phrasings, revealing significant variations that impact AI safety assessments.

Small language model optimization cracks complex business math
Microsoft’s OptiMind is a 20-billion-parameter small language model that achieves high accuracy in converting natural-language business problems into mathematical optimization models through expert-aligned training.

Ask Photos Transforms Personal Photo Discovery

Gemini Google Translate Elevates Nuance

AI Powers Railway History: A New Era for Digital Archives

Gemini Android Auto Redefines In-Car AI

Paage raises $2.2M to advance AI social commerce platform
Paage secured $2.2 million in new funding to advance its AI social commerce platform, which empowers creators and brands.

Google Photos AI Features Redefine Memory Management
Solidatus raises £5M to advance AI data lineage platform
Data lineage provider Solidatus secured £5M to accelerate its AI-powered platform for enterprise data governance and compliance.

RealWear Arc 3 launch: A lighter AR headset with natural language AI for industry

Unpacking the Transformer: From RNNs to AI's Cornerstone

Sesame raises $250M to advance conversational AI

PolyAI’s Agentic AI Redefines Customer Service with Human-Like Empathy

Juicebox’s AI recruiting agents land $30M from Sequoia
Juicebox is betting the future of AI recruiting isn't just better search, but fully autonomous agents that handle the entire hiring pipeline.

AI Won't Kill Language Learning: The German Verb and Human Connection

Kotoba Technologies Lands $11.83M Seed 2 for AI Interpretation
Kotoba Technologies secured $11.83 million in Seed 2 funding. Globis Capital Partners and Boost Capital led the round. This investment will accelerate commercialization of its AI-powered simultaneous interpretation technology.
Nebulock Raises $8.5M for AI-Powered Threat Hunting
Nebulock, a Boston-based cybersecurity startup, has secured $8.5 million in total funding.

Mozart AI Secured $730K to Herald GenAI Music Creation
Software 3.0: The English Revolution in Computing
The very fabric of software is undergoing a fundamental transformation, shifting paradigms that have defined computing for decades.

Software 3.0: The English Revolution in Computing
"We are in the mainframe and time-sharing era of computing" for LLMs, akin to the 1960s. Billions of people now have unprecedented access to this new computational power.

Voice AI Agent Exit: Salesforce to Acquire Tenyx

Loora Launches Android App to Expand Access to its AI English Tutor

Hi Auto Crowned 'Innovator of the Year' for Pioneering Voice AI in Drive-Thrus

Panda Trading Systems Unveils New AI Features for Trading Brokerages

Solvo Launches Game-Changer Generative AI for Cloud Security, SecurityGenie
Vendict Exits Stealth with $9.5 Million Funding to Power Security Compliance Teams with Generative AI
Vendict, an Israel-based technology startup that leverages the latest advancements in linguistic generative AI to power security compliance teams, emerges f...

Beaconcure Secures $14 Million in Series B Funding for AI Clinical Data Technology

Hyro Secures $20 Million Series B for Conversational and Generative AI in Healthcare
