Databricks A/B Testing Framework Powers Game Analytics

HARDlight leverages Databricks to automate A/B testing analysis, doubling experimentation capacity and building trust through standardized, LLM-enhanced insights.

Image credit: StartupHub.ai

Mobile gaming giant HARDlight has overhauled its A/B testing analysis with a custom framework built on Databricks. This move addresses a common bottleneck in live-service games: the slow, manual process of analyzing experimental results, which often leads to delayed decisions and eroded confidence in data-driven iteration. The new system aims to standardize analysis, accelerate insight delivery, and democratize access to experiment outcomes across the organization.

The challenge for HARDlight was not just speed but also trust. Inconsistent analytical approaches led to differing interpretations, hindering alignment and weakening A/B testing's role as a scientific decision-making tool. Different stakeholders also required varying levels of detail, from daily status updates to deep dives into player behavior, a spectrum the studio's existing dashboards struggled to serve.

To scale experimentation, HARDlight needed a unified approach to inference and accessible results. They developed a Databricks-native A/B testing analysis framework that automates the entire process from data ingestion to decision support. Statistical modeling is now applied consistently and transparently upstream, with results published to a daily-refreshing dashboard. This dashboard begins with an LLM-generated summary and allows for deeper exploration of metrics, diagnostics, and recommended actions.
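The article does not detail HARDlight's statistical models, but the kind of upstream inference such a framework standardizes can be sketched with a simple two-proportion z-test on a retention metric. The function, variant counts, and metric below are illustrative assumptions, not the studio's actual model or data.

```python
import math

def retention_z_test(conv_a, n_a, conv_b, n_b):
    """Return (lift, z, two-sided p-value) for variant B vs. A retention rates.

    Illustrative only: a pooled two-proportion z-test under a normal
    approximation, the simplest model a standardized pipeline might apply.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF via math.erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_b - p_a, z, p_value

# Hypothetical experiment: 50,000 players per arm, day-7 retention counts.
lift, z, p = retention_z_test(conv_a=4_100, n_a=50_000, conv_b=4_350, n_b=50_000)
print(f"lift={lift:.4f}, z={z:.2f}, p={p:.4f}")
```

Running the same test code for every experiment, rather than ad hoc notebook analysis, is what makes results comparable across teams.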

Automated Insight Delivery on Databricks

The core of HARDlight's solution lies in its automated workflow within Databricks. Experiment definitions and telemetry are standardized, ensuring consistent statistical modeling and reproducible analysis. The Databricks Unity Catalog provides a central control plane for permissions and lineage of experiment assets, while Spark Declarative Pipelines manage reliable data ingestion and transformations. MLflow supports experiment tracking and model packaging, crucial for repeatable analysis. This integration of MLOps tools ensures that the Databricks A/B testing framework operates with governance and consistency.

Layered Insights for Every Audience

A key innovation is the dashboard's progressive disclosure, starting with an LLM-generated summary. This provides a high-level overview for non-technical stakeholders, translating validated statistical outputs into natural language. Users can choose to stop at the summary or delve into deeper layers of metrics, diagnostics, and segment analysis as needed. This layered approach ensures rapid scanning while retaining analytical depth for expert validation.
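The summary layer's contract (validated statistical outputs in, plain language out) can be illustrated with a deterministic template standing in for the LLM. The field names and thresholds here are hypothetical, chosen only to show the shape of that translation step.

```python
def summarize(result: dict) -> str:
    """Render a validated experiment result as a one-line plain-language summary.

    A template stand-in for the LLM step described in the article; the
    production system would prompt a model with these same structured fields.
    """
    direction = "improved" if result["effect"] > 0 else "reduced"
    verdict = ("a statistically significant change"
               if result["p_value"] < result["alpha"]
               else "no statistically significant change")
    return (f"Variant {result['variant']} {direction} {result['metric']} by "
            f"{abs(result['effect']):.1%} ({verdict} at alpha={result['alpha']}).")

print(summarize({"variant": "B", "metric": "day-7 retention",
                 "effect": 0.005, "p_value": 0.004, "alpha": 0.05}))
```

Keeping the statistics upstream and feeding only validated outputs into the summary step is what lets non-technical readers trust the headline without re-deriving it.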

Confirmed outcomes and statistical impact are presented with key metrics like player lifetime value (LTV) and retention, alongside effect sizes and confidence levels. The dashboard also forecasts LTV impact, explicitly showing uncertainty margins, and breaks down revenue by source. Player engagement, behavior, and monetization mechanics are detailed, with core gameplay loop data available at the deepest level for expert users.
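Reporting an effect with explicit uncertainty margins, as the dashboard does for LTV, might look like the following normal-approximation sketch. The figures, the z-based interval, and the function name are assumptions for illustration, not HARDlight's forecasting method.

```python
import math

def ltv_uplift_interval(mean_a, sd_a, n_a, mean_b, sd_b, n_b, z=1.96):
    """Return (uplift, lower, upper) for the difference in mean LTV.

    Uses a z-based 95% interval on the difference of two independent means,
    a deliberately simple stand-in for a real LTV forecast model.
    """
    uplift = mean_b - mean_a
    se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return uplift, uplift - z * se, uplift + z * se

# Hypothetical per-player LTV summaries for control (A) and treatment (B).
uplift, lo, hi = ltv_uplift_interval(2.40, 6.0, 40_000, 2.55, 6.2, 40_000)
print(f"LTV uplift ${uplift:.2f} per player (95% CI ${lo:.2f} to ${hi:.2f})")
```

Surfacing the interval alongside the point estimate is what "explicitly showing uncertainty margins" amounts to in practice: a decision-maker sees at a glance whether the plausible range includes zero.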

Frozen Dashboards Ensure Persistent Learnings

At the conclusion of an experiment, the dashboard 'freezes.' This preserves the final results, decisions made, and recommended actions, creating an auditable record. This institutionalizes learnings from past experiments, allowing stakeholders to revisit outcomes without ambiguity and reducing duplicated analysis across future projects.
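One way to implement such a freeze is an immutable snapshot with a content hash, making the archived record tamper-evident. The record fields and hashing choice below are assumptions for illustration, not the studio's implementation.

```python
import hashlib
import json

def freeze_experiment(results: dict, decision: str) -> dict:
    """Snapshot final results and the decision as an auditable record.

    The checksum is computed over a canonical JSON serialization, so any
    later change to results or decision produces a different hash.
    """
    payload = json.dumps({"results": results, "decision": decision},
                         sort_keys=True)
    return {
        "results": results,
        "decision": decision,
        "checksum": hashlib.sha256(payload.encode()).hexdigest(),
    }

record = freeze_experiment(
    {"metric": "day-7 retention", "lift": 0.005, "p_value": 0.004},
    decision="ship variant B",
)
print(record["checksum"][:12])
```

The same hash recomputed later verifies that the archived outcome is exactly what was frozen, which is the property that makes the record a durable source of truth.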

Tangible Impact: Efficiency and Trust

The framework has significantly reduced manual effort, saving HARDlight's data team over eight hours per week. Standardizing experiment runs has also eliminated substantial manual setup, freeing approximately one day per experiment. Together, these gains support a targeted doubling of monthly A/B testing capacity without adding headcount.

Beyond efficiency, the system has boosted consistency and confidence in results. The frozen dashboard archive serves as a durable source of truth, streamlining knowledge transfer and reducing repeated analysis. This shift from multi-day manual reports to daily, AI/BI-enabled updates has fundamentally changed how insights are consumed across the studio.