#LLM

50 articles with this tag

Stop Babysitting AI Agents: Build a Context Engine
Artificial Intelligence

Brandon Walsenuk from Unblocked discusses the critical need for context engines to empower AI agents, moving beyond simple data access to true understanding and autonomous operation.

about 3 hours ago
The 4 Types of AI Agent Memory Explained
Artificial Intelligence

IBM Master Inventor Martin Keen details the four essential memory types AI agents need: working, semantic, procedural, and episodic.

about 5 hours ago
Databricks Speeds Up Open-Source LLMs
Technology

Databricks enhances open-source LLM performance with automatic prompt caching, reducing latency and boosting throughput without user configuration.

4 days ago
AI at Graduations & Claude's Blackmail Tactics
Artificial Intelligence

IBM experts discuss AI's evolving role, from college graduations to ethical dilemmas like LLM data corruption and potential 'blackmail' scenarios.

4 days ago
LinkedIn's AI Search Upgrade
Technology

LinkedIn is leveraging LLMs for semantic search, transforming how users find jobs and people by understanding intent over keywords.

5 days ago
AI Models Now Predict the Future, Almost
AI Research

Fine-tuning LLMs for forecasting tasks boosts their accuracy, with specialized models now rivaling top human predictors and enhancing ensemble predictions.

6 days ago
Marc Klingen on AI Agents & Langfuse
Artificial Intelligence

Marc Klingen of Langfuse shares lessons on upskilling AI coding agents, discussing the importance of observability, documentation, and iterative improvement.

6 days ago
Google's Cormac Brick on Tiny LLMs for On-Device Agents
AI Research

Google's Cormac Brick discusses the fine-tuning of Tiny LLMs for on-device agents, highlighting the benefits of LiteRT-LM and Gemma 4 for edge AI applications.

6 days ago
Coding Agent Inference Benchmark Revealed
Technology

Together AI unveils a new benchmark for coding agent inference, highlighting performance under real-world load and significant cost advantages.

7 days ago
Databricks adds AI guardrails
Technology

Databricks introduces Unity AI Gateway Guardrails, offering pre-built and custom controls to secure AI applications against data leaks and harmful outputs.

7 days ago
AI Sovereignty: What Breaks When You Build AI
Artificial Intelligence

Bilge Yücel from deepset GmbH explains the engineering challenges and solutions for building sovereign AI systems, focusing on data, model, infrastructure, and operational control.

7 days ago
Spotify's Shivam Verma on LLMs and Personalization
Artificial Intelligence

Shivam Verma from Spotify discusses how LLMs are transforming personalization in recommendation systems, moving towards steerable and context-aware content discovery.

7 days ago
Lawrence Jones on Fighting AI with AI
Artificial Intelligence

Lawrence Jones of incident.io discusses how AI can be used to debug and manage complex AI systems, highlighting the importance of structured data and automated analysis pipelines.

9 days ago
AI UX is Broken, Not the Model
Artificial Intelligence

Mike Christensen from Ably explains why AI UX is broken due to flawed infrastructure, not models, and how to fix it with durable sessions and channels.

9 days ago
AI Agents Break Zero Trust at the Last Mile
Artificial Intelligence

IBM's Grant Miller explains how AI agents break Zero Trust at the 'last mile' and outlines strategies to secure these complex integrations.

9 days ago
Chris Lovejoy on Building Domain-Native AI Organizations
Artificial Intelligence

Chris Lovejoy of Notius Labs discusses the critical role of domain experts in AI product development, outlining three key organizational models: Oracle, Evaluator, and Architect.

10 days ago
Together AI Taps Blockchain for Cheaper AI
Technology

Together AI and Pearl Research Labs are integrating blockchain to cut AI inference costs, offering discounted model access subsidized by cryptocurrency mining.

11 days ago
GitHub pilots AI for accessibility
Technology

GitHub is piloting an AI agent to automate accessibility checks and fixes, demonstrating a 68% resolution rate in early tests.

11 days ago
Violin: AI Translates Video Content
Technology

Together AI launches Violin, an open-source AI tool for video translation and interactive content analysis.

12 days ago
KV-Fold: Unlocking Transformer Long Context
AI Research

KV-Fold enables training-free, stable long-context inference up to 128K tokens with 100% retrieval accuracy, overcoming prior limitations.

13 days ago
Building an AI Chess Coach: Take Take Take
Artificial Intelligence

Anant Dole and Asbjorn Steinskog discuss building an AI chess coach, the limitations of LLMs in chess, and their eval framework.

13 days ago
Claude's Corner: CellType — Teaching LLMs to Speak Biology
Claude's Corner

CellType is the two-person YC W2026 company building an agentic drug discovery platform on top of a 27B biological foundation model. Their Cell2Sentence technique translates single-cell gene expression into sequences LLMs can learn from — and they've already validated a cancer immunotherapy prediction in living cells. Here's how they built it, why it's hard to replicate, and a step-by-step guide to building a clone.

14 days ago
Embedding OpenClaw Coding Agent in Your Product
Artificial Intelligence

Matthias Luebken from Tavon.ai discusses embedding the OpenClaw coding agent, Pi, into products, highlighting its utility for developers and the future of AI in software systems.

15 days ago
Trigger.dev's Eric Allam on Durable AI Agents
Artificial Intelligence

Eric Allam of Trigger.dev explores the two main approaches to building durable AI agents: replay and snapshotting, highlighting the advantages of Firecracker microVMs for stateful compute.

16 days ago
Neil Zeghidour on Voice AI's 'Her' Moment
Artificial Intelligence

Gradium AI's Neil Zeghidour discusses the 'Her' moment in voice AI, highlighting challenges like latency and scalability, and showcasing Phonon, their on-device TTS model.

17 days ago
ElevenLabs Gives Chat Agents a Voice
Artificial Intelligence

Luke Harries from ElevenLabs discusses the increasing importance of voice for AI chat agents, highlighting the benefits of speed, accessibility, and user experience.

17 days ago
MemAlign MLflow Bridges AI Judge Gap
Technology

Databricks' MemAlign framework in MLflow significantly improves AI judges' accuracy in evaluating machine learning code, bridging the gap with human experts.

18 days ago
Superhuman Hits 200K QPS With Databricks
Technology

Superhuman and Databricks engineers collaborated to build an AI inference platform serving over 200K QPS with sub-second latency.

18 days ago
Gosset AI: Drug Discovery Precision Leap
AI Research

Gosset AI platform outperforms frontier LLMs in niche drug discovery by 3.2x, demonstrating the power of curated data over generic web search for R&D.

18 days ago
DeepSeek-V4: Million-Token Context is a Serving Problem
Technology

DeepSeek-V4's million-token context window presents an inference systems challenge, demanding sophisticated cache management and serving strategies to unlock its potential.

18 days ago
GitHub Cuts Agentic Workflow Costs
Technology

GitHub implements new strategies to cut token costs in its automated agentic workflows by enhancing logging and optimizing tool usage.

19 days ago
OpenAI's New Voice API Models
Artificial Intelligence

OpenAI introduces GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper to its API, enhancing voice intelligence for developers.

19 days ago
Parloa AI Agents Mimic Human Service
Artificial Intelligence

Parloa's AI Agent Management Platform uses OpenAI models to build, simulate, and deploy voice-driven customer service agents, prioritizing real-world performance and reliability.

19 days ago
Uber Taps OpenAI for Smarter Driving, Faster Booking
Artificial Intelligence

Uber integrates OpenAI models to boost driver earnings with an AI assistant and enhance rider experiences through faster booking and new voice features.

20 days ago
Automating Multi-Agent System Creation
AI Research

A new framework automates the creation of multi-agent systems, significantly improving agent recall and system robustness through LLM-driven planning and a critique agent.

20 days ago
Superlinked's Filip Makraduli on Small Model Inference Infrastructure
Artificial Intelligence

Filip Makraduli of Superlinked discusses the critical need for robust small model inference infrastructure, highlighting Superlinked's open-source solution.

21 days ago
Google DeepMind Accelerates AI on Edge Devices
AI Research

Google DeepMind unveils Gemma 4 models and the LiteRT framework to accelerate AI on edge devices, emphasizing performance, privacy, and cross-platform capabilities.

21 days ago
RAG's Evolution: From Keywords to Agentic AI
Artificial Intelligence

Explore the evolution of Retrieval Augmented Generation (RAG) from basic keyword search to sophisticated agentic AI systems.

21 days ago
Claude's Corner: Sonarly — Your On-Call Engineer Just Called In Sick (Permanently)
Claude's Corner

Sonarly is an autonomous AI agent that triages production alerts, finds root causes with 78% accuracy, and opens fix PRs—while your on-call engineer sleeps.

23 days ago
Claude's Corner: Compresr — The Token Accountant Your AI Stack Desperately Needs
Claude's Corner

Four EPFL researchers built a PhD-backed LLM context compression API that could cut your token bill by 10x — or get eaten alive by Anthropic. Here's the technical breakdown and how to build your own.

25 days ago
IBM Experts on AI Training: Efficiency vs. Scale
Artificial Intelligence

IBM's Marina Danilevsky and Gabe Goodhart discuss the company's new 'Bob' and 'Granite' AI models, highlighting the shift towards specialized, efficient training and the challenges of distributed AI infrastructure.

25 days ago
AI Agents on the Loose: Network Security Risks Emerge
AI Research

Microsoft Research reveals how AI agents interacting at scale create new security risks like worms, reputation manipulation, and invisible attacks.

26 days ago
Cross-Architecture dLLM Distillation
AI Research

TIDE framework enables cross-architecture distillation for diffusion large language models, achieving significant performance gains with smaller student models.

26 days ago
Cursor's Agent Harness Gets Smarter
Technology

Cursor is meticulously refining its AI agent harness, focusing on dynamic context, rigorous evaluation, and model-specific customization to boost software development capabilities.

26 days ago
AI Agents Failures & How To Stop Them
Artificial Intelligence

Danilo Campagna from PostHog discusses common LLM code generation failures and strategies for improvement, focusing on context, architecture, and human error.

26 days ago
OpenAI's Goblin Problem
Artificial Intelligence

OpenAI's GPT-5.1 models developed a peculiar "goblin problem" due to training for a "Nerdy" personality, leading to unexpected creature metaphors.

27 days ago
DeepSeek V4 Pro Hits Together AI
Technology

Together AI launches DeepSeek V4 Pro, a 1.6T MoE model with a 512K context window and new cached input pricing for cost-effective long-context reasoning.

27 days ago
Databricks GPT-5.5 Outperforms GPT-4 on OfficeQA Benchmark
AI Research

Databricks Research Engineer Arnav Singhvi reveals GPT-5.5, a new AI model achieving state-of-the-art results on the OfficeQA benchmark and outperforming GPT-4.

27 days ago
AI Engineer: Small Models, Big Impact
Artificial Intelligence

Maxime Labonne of Liquid AI discusses the unique challenges and advantages of small AI models, detailing their architecture, training, and techniques to overcome issues like doom looping.

27 days ago
Open Source AI: Boon or Bane for Security?
Artificial Intelligence

IBM's Martin Keen and Gabe Goodhart discuss the security implications of open-source AI, balancing innovation with risk.

27 days ago