#Data Engineering
49 articles with this tag

Snowflake Simplifies Data Pipelines
Snowflake introduces DCM Projects and Cortex Code for declarative data pipelines, simplifying workflow management and reducing manual coding.
Databricks Unifies Clinical Data
Databricks' new open-source Site Feasibility Workbench brings clinical trial intelligence onto its Lakehouse, tackling data silos and improving auditability.
DataMaster: Autonomous Data Engineering
DataMaster pioneers autonomous data engineering, unlocking significant ML gains by optimizing data pipelines rather than algorithms, as shown on MLE-Bench Lite and PostTrainBench.
Databricks Syncs Postgres to Lakehouse Natively
Databricks unveils Native Lakehouse Sync, directly replicating operational Postgres data into Unity Catalog without pipelines, simplifying AI and analytics integration.
Databricks Driver Gets Faster, Feature-Rich
Databricks rolls out its open-source JDBC driver with significant speed boosts and new features for enhanced data connectivity.
Databricks Taps Students for AI Leadership
Databricks launches Student Fellows program to equip university students with data and AI skills, fostering campus leadership and career opportunities.
Databricks Boosts Postgres Writes 5x
Databricks' Lakebase architecture boosts Postgres write performance by up to 5x by offloading durability tasks to distributed storage.
Databricks Reimagines Serverless Compute
Databricks is overhauling distributed systems for serverless compute, enhancing stability and performance through Spark Connect, intelligent routing, and adaptive autoscaling.

Snowflake's AI Boost for Data Integration
Snowflake fuses its Openflow data integration service with its Cortex Code AI agent to streamline AI-driven data pipelines.
Databricks Crushes Monitoring Scale
Databricks scales monitoring to 10 trillion samples/day using customized open-source tools and a new Lakehouse platform for cost-effective high-cardinality data.
Backstage Ditches Postgres for Databricks Lakebase
Databricks integrates Spotify's Backstage with Lakebase, enabling instant database branching and transforming developer workflows by eliminating operational vs. analytical database silos.
Databricks Speeds Up Analytics with Sketch Functions
Databricks enhances analytics with new sketch functions, delivering orders-of-magnitude speedups for percentile, distinct count, and top-K queries.
Healthcare Data: From Months to Minutes
Databricks and Redox cut clinical data integration times from months to minutes with natural language prompts and sub-second data streaming.

Snowflake Kafka Connector V4 Arrives
Snowflake's Kafka Connector V4 is here, moving ingestion logic server-side for major performance and cost gains.
Databricks Touts AI-Powered Data Pipelines
Databricks unveils Genie Code, an AI agent for Lakeflow, to automate data pipeline creation, orchestration, and debugging via natural language.
AI Needs Faster Databases
AI demands real-time data. Traditional operational databases lag, but new 'lakebase' architectures are bridging the gap for faster, smarter AI.
LLM Agents Tackle Database Joins
Databricks tests LLM agents for SQL join order optimization, achieving significant performance gains over traditional methods.

Snowflake Adds Declarative Infra Management
Snowflake's new DCM Projects feature allows users to manage data infrastructure declaratively, bringing software engineering practices to Snowflake object management.

Snowflake's AI Coder Goes Full Stack
Snowflake's Cortex Code AI assistant now understands and integrates across a broader data stack, offering specialized skills and embeddable platform capabilities.
Databricks Summit: AI Agents, Vibe Coding Take Center Stage
Databricks' Data + AI Summit 2026 in San Francisco will highlight AI agents and 'vibe coding', with new training and a Context Engineering certification.
Mercedes-Benz Cuts Data Costs 66% with Databricks
Mercedes-Benz slashed data egress costs by 66% using Databricks Delta Sharing and intelligent replication, building a secure cross-cloud data mesh.
Snowflake's Cortex Code Tames dbt
Snowflake's Cortex Code streamlines dbt project development by automating model creation, testing, and optimization with an AI agent.
dbt on Databricks: Open Platform Advantage
Databricks is enhancing dbt workflows with its unified lakehouse, offering open foundations, integrated governance, and optimized performance to accelerate data transformation.
Databricks Unlocks Unstructured Data
Databricks enhances its platform with Document Intelligence and Lakeflow, enabling businesses to unlock and process vast amounts of unstructured enterprise data.

Snowflake Simplifies Iceberg Storage
Snowflake's new managed storage for Apache Iceberg tables offers the open format's interoperability with Snowflake's resilient, zero-management infrastructure.
Databricks Powers Real-Time Search
Databricks unveils its platform for building real-time product search, integrating Vector Search, Lakeflow, and Lakebase for ingestion, retrieval, and operational data.
Lovable Taps Databricks for Data Apps
Lovable integrates with Databricks, allowing non-technical users to build data-driven applications using natural language, eliminating backend bottlenecks.
Databricks Embraces Iceberg v3
Databricks previews Apache Iceberg v3 support, integrating Row Lineage, Deletion Vectors, and VARIANT for enhanced performance and interoperability in the open lakehouse.
Databricks Postgres Branches Like Git
Databricks Lakebase brings Git-style branching to Postgres, offering near-instant, isolated database environments via copy-on-write technology.
Databricks Unifies Data Workflows
Databricks introduces a unified platform to eliminate data silos and enhance collaboration across financial institutions using AI and governed analytics.

Snowflake Taps AI for Retail Scale
Snowflake Intelligence is empowering retailers like the Mark Anthony Group to scale AI, democratize data access, and drive business outcomes through generative BI.

Snowflake's AI Coder Goes Broad
Snowflake's Cortex Code AI agent is now widely available, featuring enhanced usability and new capabilities for data engineering.
Databricks A/B Testing Framework Powers Game Analytics
HARDlight leverages Databricks to automate A/B testing analysis, doubling experimentation capacity and building trust through standardized, LLM-enhanced insights.

Snowflake's AI Coding Agent Streamlines Data Engineering
Snowflake's new AI coding agent, Cortex Code, aims to simplify data pipeline creation and accelerate development for engineers and analysts.
Snowflake Boosts AI Data Sharing
Snowflake enhances its AI data sharing platform with new features for reliability, usability, and transparency, crucial for production AI.
Databricks AutoCDC Ends Hand-Coding Pain
Databricks AutoCDC automates change data capture and SCD pipelines, slashing manual coding, improving performance, and cutting costs with declarative simplicity.
Spark Streaming Hits Millisecond Latency
Databricks' Apache Spark Structured Streaming real-time mode is now GA, offering sub-second latency and consolidating streaming needs onto a single engine.
Databricks Lakeflow Jobs vs Airflow
Databricks Lakeflow Jobs offers native lakehouse orchestration, mapping Airflow patterns like XComs, sensors, and branching to a more integrated, data-centric model.
Databricks Adds Free Data Ingestion Tier
Databricks launches a free tier for its Lakeflow Connect data ingestion tool and enhances its AI capabilities with Lakebase and Genie updates.
Data Science Careers: Skills & Paths
Explore the essential skills, diverse career paths, and educational routes shaping the data science landscape in 2024.
Databricks Serverless JARs Launch
Databricks Serverless JARs enable instant deployment of Scala/Java Spark jobs, eliminating cluster management and offering usage-based billing.
Spark Drops Microbatch for Real-Time
Apache Spark's Real-Time Mode (RTM) breaks microbatch barriers, enabling millisecond latency for streaming workloads with a new hybrid execution model.
Databricks Serverless Simplifies Data Ops
Databricks serverless compute automates infrastructure management, boosting performance and cutting costs for data engineering workflows.
Databricks Unleashes Genie Code AI
Databricks launches Genie Code, an AI agent designed to automate data tasks and significantly improve success rates in data science.
Databricks Unlocks Billion-Scale Vector Search
Databricks unveils a redesigned vector search capable of handling billions of vectors, drastically cutting costs and improving scalability.
Databricks Streamlines Real-Time Data Apps
Databricks' Zerobus Ingest and Lakebase combine for streamlined IoT data ingestion and low-latency operational applications directly on the Lakehouse.
Databricks Lakehouse Gets Postgres Boost on Azure
Databricks launches Azure Databricks Lakebase, a serverless PostgreSQL service integrating operational data into the lakehouse for unified app development and analytics.
Databricks Reffy: From Tribal Data to AI Answers
Databricks' Reffy uses AI and RAG to turn scattered customer stories into an instantly searchable knowledge base for sales and marketing.
Spark Ditches Dual Engines for Real-Time Mode
Databricks' new Real-Time Mode for Spark aims to deliver sub-second streaming speeds, eliminating the need for separate processing engines.