Databricks Reffy: From Tribal Data to AI Answers

Databricks' Reffy uses AI and RAG to turn scattered customer stories into an instantly searchable knowledge base for sales and marketing.

Databricks Reffy application interface showing search results for customer stories.
Image credit: StartupHub.ai

Databricks has tackled the internal challenge of scattered customer success stories by building 'Reffy'. The new application turns a wealth of tribal knowledge into instantly accessible, AI-powered insights, making more than 2,400 customer references searchable and analyzable.

The core problem was simple: finding the right customer proof point at the right time was a persistent hurdle for sales and marketing teams. Databricks hosts thousands of case studies, YouTube talks, and internal documents, but accessing this information efficiently was nearly impossible. This led to overuse of popular references, missed opportunities, and a reliance on ad-hoc knowledge sharing.

From Chaos to Clarity: The Reffy Solution

Reffy acts as a full-stack agentic application, built entirely on the Databricks platform. It consolidates disparate customer stories, categorizes them, and employs a Retrieval Augmented Generation (RAG) agent to power its search capabilities. The architecture integrates Databricks' Lakeflow Jobs for ETL, Unity Catalog for governance, Vector Search for retrieval, Model Serving for the AI agent, Lakebase for real-time data handling, and Databricks Apps for the user interface.

Data Pipeline and AI Enrichment

The process begins with collecting text from various sources, including YouTube transcripts, LinkedIn articles, and public case studies; internal documents are consolidated alongside them. This raw data lands in a 'Bronze' Delta Lake table. The critical step uses AI Functions backed by Gemini 2.5 to score each story against a rigorous 31-point rubric. This AI enrichment extracts metadata, assesses the credibility of claimed outcomes, identifies the business challenge and Databricks' unique value proposition, and filters out lower-quality content.
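Databricks has not published the 31-point rubric, so the enrich-and-filter step can only be sketched. In the sketch below, the criteria names, the `judge` callable, and the quality threshold are all hypothetical stand-ins for the Gemini-backed AI Functions call:

```python
from dataclasses import dataclass

# Hypothetical criteria: the real rubric has 31 points and is not public.
CRITERIA = ["quantified_outcome", "named_customer", "clear_challenge",
            "databricks_differentiator"]

@dataclass
class Story:
    title: str
    text: str
    score: int = 0

def score_story(story: Story, judge) -> Story:
    """Ask an LLM judge (stubbed in tests) whether each criterion holds,
    awarding one point per satisfied criterion."""
    story.score = sum(1 for c in CRITERIA if judge(story.text, c))
    return story

def promote(stories, judge, threshold=3):
    """Keep only stories that clear the quality bar for the next layer."""
    return [s for s in (score_story(s, judge) for s in stories)
            if s.score >= threshold]
```

In production the `judge` would be an `ai_query`-style call to a served model rather than a local function, but the enrich-then-filter shape is the same.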

Agentic Search and User Experience

An agent built using the DSPy framework facilitates lightning-fast keyword searches and more nuanced, longer-form LLM responses with reasoning. Hybrid keyword and semantic search, coupled with Databricks' Vector Search re-ranker, ensures relevant results. The agent is logged to MLflow and deployed via Databricks Model Serving, optimizing for cost-effectiveness on CPU instances.
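The article does not say how Reffy's re-ranker merges the keyword and semantic result lists. Reciprocal rank fusion is a common, simple way to combine two rankings and serves as an illustrative sketch here; the fusion method and the `k` constant are assumptions, not Databricks' actual implementation:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one.

    Each list contributes 1 / (k + rank) per document, so items that
    rank well in both keyword and semantic search rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

In Reffy's case, the two input rankings would come from a keyword index and from Vector Search over story embeddings.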

The user-facing application, built with React and FastAPI, provides a chat-like interface. It leverages Lakebase to persist conversation history, logs, and user identities, enabling efficient data retrieval and quality assurance and making Reffy an effective internal knowledge-management tool.
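Lakebase is a managed Postgres offering, so the persistence layer plausibly resembles the sketch below. SQLite stands in for Lakebase here, and the table shape and method names are assumptions for illustration:

```python
import sqlite3

class ConversationStore:
    """Minimal chat-history store, with SQLite standing in for Lakebase."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute("""CREATE TABLE IF NOT EXISTS messages (
            conversation_id TEXT, user_id TEXT, role TEXT, content TEXT)""")

    def append(self, conversation_id, user_id, role, content):
        self.db.execute("INSERT INTO messages VALUES (?, ?, ?, ?)",
                        (conversation_id, user_id, role, content))
        self.db.commit()

    def history(self, conversation_id):
        # rowid preserves insertion order, giving the chat transcript back
        rows = self.db.execute(
            "SELECT role, content FROM messages "
            "WHERE conversation_id = ? ORDER BY rowid",
            (conversation_id,)).fetchall()
        return [{"role": r, "content": c} for r, c in rows]
```

Persisting every exchange this way is also what enables the quality-assurance and usage-metrics work described below: the transcript table doubles as a query log.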

Impact and Future Scalability

Since its December 2025 launch, over 1,800 Databricks employees have used Reffy, running more than 7,500 queries. This has led to more relevant storytelling, faster campaign execution, and increased confidence in leveraging customer proof points at scale. The system also provides ongoing monitoring and metrics, surfacing popular topics and identifying content gaps, such as user interest in newer products like Agent Bricks and Lakebase.
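The article does not describe how topic interest or content gaps are computed. One minimal approach, sketched here with a hypothetical coverage map and thresholds, is to count topic mentions in the query log and flag topics users ask about often but the index covers thinly:

```python
from collections import Counter

def content_gaps(query_log, stories_per_topic, min_queries=5):
    """Flag topics with high query interest but few indexed stories."""
    interest = Counter()
    for query in query_log:
        q = query.lower()
        for topic in stories_per_topic:
            if topic.lower() in q:
                interest[topic] += 1
    # A gap: frequently asked about, but fewer than 3 stories on file.
    return [t for t, n in interest.most_common()
            if n >= min_queries and stories_per_topic[t] < 3]
```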

Looking ahead, Databricks plans to connect Reffy to other solutions via an API and MCP server, integrating customer intelligence directly into existing workflows. This continuous refinement, powered by user interaction data, aims to further enhance the value of this internal knowledge resource.