#LLM
50 articles with this tag

Stop Babysitting AI Agents: Build a Context Engine
Brandon Walsenuk from Unblocked discusses the critical need for context engines to empower AI agents, moving beyond simple data access to true understanding and autonomous operation.

The 4 Types of AI Agent Memory Explained
IBM Master Inventor Martin Keen details the four essential memory types AI agents need: working, semantic, procedural, and episodic.

Databricks Speeds Up Open-Source LLMs
Databricks enhances open-source LLM performance with automatic prompt caching, reducing latency and boosting throughput without user configuration.

AI at Graduations & Claude's Blackmail Tactics
IBM experts discuss AI's evolving role, from college graduations to ethical dilemmas like LLM data corruption and potential 'blackmail' scenarios.

LinkedIn's AI Search Upgrade
LinkedIn is leveraging LLMs for semantic search, transforming how users find jobs and people by understanding intent over keywords.

AI Models Now Predict the Future, Almost
Fine-tuning LLMs for forecasting tasks boosts their accuracy, with specialized models now rivaling top human predictors and enhancing ensemble predictions.

Marc Klingen on AI Agents & Langfuse
Marc Klingen of Langfuse shares lessons on upskilling AI coding agents, discussing the importance of observability, documentation, and iterative improvement.

Google's Cormac Brick on Tiny LLMs for On-Device Agents
Google's Cormac Brick discusses the fine-tuning of Tiny LLMs for on-device agents, highlighting the benefits of LiteRT-LM and Gemma 4 for edge AI applications.

Coding Agent Inference Benchmark Revealed
Together AI unveils a new benchmark for coding agent inference, highlighting performance under real-world load and significant cost advantages.

Databricks adds AI guardrails
Databricks introduces Unity AI Gateway Guardrails, offering pre-built and custom controls to secure AI applications against data leaks and harmful outputs.

AI Sovereignty: What Breaks When You Build AI
Bilge Yücel from deepset GmbH explains the engineering challenges and solutions for building sovereign AI systems, focusing on data, model, infrastructure, and operational control.

Spotify's Shivam Verma on LLMs and Personalization
Shivam Verma from Spotify discusses how LLMs are transforming personalization in recommendation systems, moving towards steerable and context-aware content discovery.

Lawrence Jones on Fighting AI with AI
Lawrence Jones of incident.io discusses how AI can be used to debug and manage complex AI systems, highlighting the importance of structured data and automated analysis pipelines.

AI UX is Broken, Not the Model
Mike Christensen from Ably explains why AI UX is broken due to flawed infrastructure, not models, and how to fix it with durable sessions and channels.

AI Agents Break Zero Trust at the Last Mile
IBM's Grant Miller explains how AI agents break Zero Trust at the 'last mile' and outlines strategies to secure these complex integrations.

Chris Lovejoy on Building Domain-Native AI Organizations
Chris Lovejoy of Notius Labs discusses the critical role of domain experts in AI product development, outlining three key organizational models: Oracle, Evaluator, and Architect.

Together AI Taps Blockchain for Cheaper AI
Together AI and Pearl Research Labs are integrating blockchain to cut AI inference costs, offering discounted model access subsidized by cryptocurrency mining.

GitHub pilots AI for accessibility
GitHub is piloting an AI agent to automate accessibility checks and fixes, demonstrating a 68% resolution rate in early tests.

Violin: AI Translates Video Content
Together AI launches Violin, an open-source AI tool for video translation and interactive content analysis.

KV-Fold: Unlocking Transformer Long Context
KV-Fold enables training-free, stable long-context inference up to 128K tokens with 100% retrieval accuracy, overcoming prior limitations.

Building an AI Chess Coach: Take Take Take
Anant Dole and Asbjorn Steinskog discuss building an AI chess coach, the limitations of LLMs in chess, and their eval framework.

Claude's Corner: CellType — Teaching LLMs to Speak Biology
CellType is the two-person YC W2026 company building an agentic drug discovery platform on top of a 27B biological foundation model. Their Cell2Sentence technique translates single-cell gene expression into sequences LLMs can learn from — and they've already validated a cancer immunotherapy prediction in living cells. Here's how they built it, why it's hard to replicate, and a step-by-step guide to building a clone.

Embedding OpenClaw Coding Agent in Your Product
Matthias Luebken from Tavon.ai discusses embedding the OpenClaw coding agent, Pi, into products, highlighting its utility for developers and the future of AI in software systems.

Trigger.dev's Eric Allam on Durable AI Agents
Eric Allam of Trigger.dev explores the two main approaches to building durable AI agents: replay and snapshotting, highlighting the advantages of Firecracker microVMs for stateful compute.

Neil Zeghidour on Voice AI's 'Her' Moment
Gradium AI's Neil Zeghidour discusses the 'Her' moment in voice AI, highlighting challenges like latency and scalability, and showcasing Phonon, their on-device TTS model.

ElevenLabs Gives Chat Agents a Voice
Luke Harries from ElevenLabs discusses the increasing importance of voice for AI chat agents, highlighting the benefits of speed, accessibility, and user experience.

MemAlign MLflow Bridges AI Judge Gap
Databricks' MemAlign framework in MLflow significantly improves AI judges' accuracy in evaluating machine learning code, bridging the gap with human experts.

Superhuman Hits 200K QPS With Databricks
Superhuman and Databricks engineers collaborated to build an AI inference platform serving over 200K QPS with sub-second latency.

Gosset AI: Drug Discovery Precision Leap
Gosset AI platform outperforms frontier LLMs in niche drug discovery by 3.2x, demonstrating the power of curated data over generic web search for R&D.

DeepSeek-V4: Million-Token Context is a Serving Problem
DeepSeek-V4's million-token context window presents an inference systems challenge, demanding sophisticated cache management and serving strategies to unlock its potential.

GitHub Cuts Agentic Workflow Costs
GitHub implements new strategies to cut token costs in its automated agentic workflows by enhancing logging and optimizing tool usage.

OpenAI's New Voice API Models
OpenAI introduces GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper to its API, enhancing voice intelligence for developers.

Parloa AI Agents Mimic Human Service
Parloa's AI Agent Management Platform uses OpenAI models to build, simulate, and deploy voice-driven customer service agents, prioritizing real-world performance and reliability.

Uber Taps OpenAI for Smarter Driving, Faster Booking
Uber integrates OpenAI models to boost driver earnings with an AI assistant and enhance rider experiences through faster booking and new voice features.

Automating Multi-Agent System Creation
A new framework automates the creation of multi-agent systems, significantly improving agent recall and system robustness through LLM-driven planning and a critique agent.

Superlinked's Filip Makraduli on Small Model Inference Infrastructure
Filip Makraduli of Superlinked discusses the critical need for robust small model inference infrastructure, highlighting Superlinked's open-source solution.

Google DeepMind Accelerates AI on Edge Devices
Google DeepMind unveils Gemma 4 models and the LiteRT framework to accelerate AI on edge devices, emphasizing performance, privacy, and cross-platform capabilities.

RAG's Evolution: From Keywords to Agentic AI
Explore the evolution of Retrieval Augmented Generation (RAG) from basic keyword search to sophisticated agentic AI systems.

Claude's Corner: Sonarly — Your On-Call Engineer Just Called In Sick (Permanently)
Sonarly is an autonomous AI agent that triages production alerts, finds root causes with 78% accuracy, and opens fix PRs—while your on-call engineer sleeps.

Claude's Corner: Compresr — The Token Accountant Your AI Stack Desperately Needs
Four EPFL researchers built a PhD-backed LLM context compression API that could cut your token bill by 10x — or get eaten alive by Anthropic. Here's the technical breakdown and how to build your own.

IBM Experts on AI Training: Efficiency vs. Scale
IBM's Marina Danilevsky and Gabe Goodhart discuss the company's new 'Bob' and 'Granite' AI models, highlighting the shift towards specialized, efficient training and the challenges of distributed AI infrastructure.

AI Agents on the Loose: Network Security Risks Emerge
Microsoft Research reveals how AI agents interacting at scale create new security risks like worms, reputation manipulation, and invisible attacks.

Cross-Architecture dLLM Distillation
TIDE framework enables cross-architecture distillation for diffusion large language models, achieving significant performance gains with smaller student models.

Cursor's Agent Harness Gets Smarter
Cursor is meticulously refining its AI agent harness, focusing on dynamic context, rigorous evaluation, and model-specific customization to boost software development capabilities.

AI Agents Failures & How To Stop Them
Danilo Campagna from PostHog discusses common LLM code generation failures and strategies for improvement, focusing on context, architecture, and human error.

OpenAI's Goblin Problem
OpenAI's GPT-5.1 models developed a peculiar "goblin problem" due to training for a "Nerdy" personality, leading to unexpected creature metaphors.

DeepSeek V4 Pro Hits Together AI
Together AI launches DeepSeek V4 Pro, a 1.6T MoE model with a 512K context window and new cached input pricing for cost-effective long-context reasoning.

Databricks GPT-5.5 Outperforms GPT-4 on OfficeQA Benchmark
Databricks Research Engineer Arnav Singhvi reveals GPT-5.5, a new AI model achieving state-of-the-art results on the OfficeQA benchmark and outperforming GPT-4.

AI Engineer: Small Models, Big Impact
Maxime Labonne of Liquid AI discusses the unique challenges and advantages of small AI models, detailing their architecture, training, and techniques to overcome issues like doom looping.

Open Source AI: Boon or Bane for Security?
IBM's Martin Keen and Gabe Goodhart discuss the security implications of open-source AI, balancing innovation with risk.