Bayer Consumer Health Unifies Data with Databricks

Bayer Consumer Health leverages Databricks and Unity Catalog to create a unified, governed data platform, enabling global self-service analytics and eliminating data silos.

Bayer Consumer Health, a division of the global life sciences giant, has overhauled its data infrastructure to enable widespread self-service analytics. By adopting Databricks and its Unity Catalog, the company has moved away from fragmented systems and costly data duplication, often referred to as 'data tourism'.

Previously, Bayer's global operations led to isolated data environments across various markets, hindering efficient decision-making. The company struggled with inconsistent data, high management overhead, and difficulties in leveraging advanced analytics like machine learning. This fragmentation meant data often had to be copied multiple times, increasing costs and slowing down innovation.

Consolidating Data Operations

To address these challenges, Bayer Consumer Health sought a unified, scalable, cloud-based data platform. The new architecture, built on Databricks and Azure services, centralizes data transformation and cleansing, ensuring raw data is converted into reliable, reusable assets. This unified approach supports various data roles, from business intelligence reporting to complex machine learning applications.

The implementation of Unity Catalog proved pivotal, providing a centralized governance and metadata layer. Core data assets are managed once and can then be consumed and reused securely across projects and regions. This streamlined approach to governance is crucial for managing sensitive information and maintaining data integrity at scale.
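
As a rough illustration of "manage once, consume many times", the sketch below builds the Unity Catalog SQL grants a group would typically need in order to read a single table. The catalog, schema, table, and group names are hypothetical and are not taken from Bayer's setup. The statements are only generated as strings here; running them would require a Unity Catalog-enabled workspace.

```python
def grant_read_statements(catalog: str, schema: str, table: str, principal: str) -> list:
    """Build the grants a principal needs to read one table.

    Unity Catalog uses a three-level namespace (catalog.schema.table), and a
    reader needs usage rights on the catalog and schema plus SELECT on the table.
    All names passed in are illustrative assumptions.
    """
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{principal}`",
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO `{principal}`",
    ]


# Hypothetical core asset and consumer group.
statements = grant_read_statements("core", "sales", "orders", "emea_analysts")
for statement in statements:
    print(statement)
```

Granting to a group rather than to individual users keeps the number of grants small as more regions and projects start consuming the same asset.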

Accelerating Analytics and Innovation

With Unity Catalog replacing its previous Hive Metastore, Bayer transitioned to a pull-based data sharing model. Data consumers now access governed core assets directly, which simplifies access and reduces the need to replicate data. This shift has significantly sped up the development and testing of new analytics solutions, because engineers can work with production-grade data in development environments.
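
The idea behind a pull-based model can be sketched with a toy in-memory catalog: a producer publishes an asset once, and each permitted consumer reads that single stored copy instead of receiving its own duplicate. The class and all names below are hypothetical stand-ins for illustration only; they are not a Databricks API and do not describe Bayer's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class GovernedCatalog:
    """Toy stand-in for a governed catalog (not a Databricks API)."""

    assets: dict = field(default_factory=dict)  # asset name -> rows (single stored copy)
    grants: dict = field(default_factory=dict)  # asset name -> set of allowed consumers

    def publish(self, name, rows):
        """Producer registers the asset once."""
        self.assets[name] = rows

    def grant(self, name, consumer):
        """Allow a consumer to read the asset in place."""
        self.grants.setdefault(name, set()).add(consumer)

    def read(self, name, consumer):
        """Consumer pulls the asset; no copy is made."""
        if consumer not in self.grants.get(name, set()):
            raise PermissionError(f"{consumer} may not read {name}")
        return self.assets[name]


catalog = GovernedCatalog()
catalog.publish("core.sales.orders", [{"order_id": 1, "market": "DE"}])
catalog.grant("core.sales.orders", "emea_reporting")
catalog.grant("core.sales.orders", "ml_forecasting")

rows_a = catalog.read("core.sales.orders", "emea_reporting")
rows_b = catalog.read("core.sales.orders", "ml_forecasting")
print(rows_a is rows_b)  # both consumers see the same stored object
```

In a push-based setup, each consumer would instead hold its own copy of the rows, so every new project adds another duplicate to store, refresh, and govern.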

A central reporting endpoint now links to all data catalogs, providing a single, governed entry point for employees. Users can discover and combine data across domains conveniently and securely, which lets self-service analytics scale without reintroducing silos. Serverless capabilities further improve performance and cost efficiency, with the aim of returning results quickly even as data volumes grow.

This move to a unified, governed data platform built on Databricks and Unity Catalog positions Bayer Consumer Health to become a more data-driven organization, one in which insights are broadly available without the burden of data silos.