v1.0.0

@pedro93 released this 17 Mar 14:37

DataHub v1.0.0

Release Highlights

DataHub v1.0.0 is packed with exciting updates, including:

  • A completely redesigned user experience focused on simplified navigation and a visually stunning interface.
  • Unified support for Data & AI, including AI Model Group Versions, AI Model Lineage, Model Stats, and Experiment/Run ingestion.
  • DataHub Iceberg Catalog, allowing users to manage Iceberg tables directly from DataHub.

Read the blog post here!

Changelog

New User Interface: Putting Usability First

With a completely redesigned user interface, DataHub v1.0 represents a fundamental rethinking of how users interact with their metadata and data assets. The new experience includes:

  • Intuitive Platform-Based Navigation - Hierarchically browse data by database and schema in Snowflake, BigQuery, Redshift, Databricks, and more. Combine hierarchical navigation with filtering by data owners, domain, tags, and glossary terms to find the right data fast.
  • Seamless Lineage Exploration - Our reimagined lineage view features multi-level expansion, name-based search, and column-level visibility, making it easier than ever to understand data relationships and impact.
  • Integrated Data Quality - Make confident decisions with deeply integrated quality signals throughout the platform, helping you quickly identify and trust reliable data assets.

DataHub admins can enable the new UI for all users by setting the THEME_V2_DEFAULT environment variable to true. Until then, users can opt into the new experience by navigating to Settings > Appearance > Try New User Experience.

Comprehensive AI Asset Support: Unifying Data and AI

DataHub v1.0 treats AI assets as first-class citizens within the data ecosystem, allowing users to track their entire data-to-AI pipeline in one place.

  • Unified Search and Discovery: Seamlessly search across models, model groups, and traditional data assets in one unified interface.
  • Advanced Versioning System: Track multiple versions of datasets and ML models with detailed performance metrics and clear linkages between versions.
  • Rich Model Statistics: Monitor key metrics across versions, understand performance trends, and make data-driven decisions about model deployment.
  • End-to-End Lineage: Trace data flows from raw inputs through models to final outputs, with complete versioning support.

DataHub Iceberg REST Catalog Beta: Simplifying Data Lake Management

This release introduces an integration with Apache Iceberg that lets users manage Iceberg tables directly through DataHub. With it, users can:

  • Create and manage Iceberg tables through DataHub
  • Maintain consistent metadata across DataHub and Iceberg
  • Facilitate data discovery by exposing Iceberg table metadata in DataHub
  • Enable secure access to Iceberg tables through DataHub's permissions model
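
As a rough sketch of what client access could look like, the snippet below assembles connection properties for a generic Iceberg REST catalog client. The keys `type`, `uri`, `warehouse`, and `token` are standard PyIceberg REST catalog settings; the endpoint URL, warehouse name, and token value are placeholders assumed for illustration and are not taken from this release. Consult the linked docs for the actual endpoint and authentication setup.

```python
from typing import Dict


def rest_catalog_properties(uri: str, warehouse: str, token: str) -> Dict[str, str]:
    """Build the properties a PyIceberg REST catalog client expects."""
    return {
        "type": "rest",          # use PyIceberg's REST catalog implementation
        "uri": uri,              # base URL of the REST catalog endpoint
        "warehouse": warehouse,  # warehouse name configured on the server
        "token": token,          # bearer token used for authentication
    }


def connect(name: str, props: Dict[str, str]):
    """Open the catalog. Needs `pip install pyiceberg` and a reachable server."""
    from pyiceberg.catalog import load_catalog  # imported lazily on purpose

    return load_catalog(name, **props)


# Placeholder values (assumptions): replace with your deployment's settings.
props = rest_catalog_properties(
    uri="http://localhost:8080/iceberg/",
    warehouse="my_warehouse",
    token="<personal-access-token>",
)
```

Calling `connect("datahub", props)` against a running deployment would return a catalog object whose table operations are then subject to DataHub's permissions model, as described above.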

Read the docs here!

DataHub CLI

This release introduces the following improvements to our CLI:

  • Added container command to apply tags, terms, and owners on all assets within the container. [#12418, #12436]
  • Improved delete command to optionally reference a file with a list of URNs to be deleted. [#12247]
  • Expanded ingest command to support ingesting MCPs from S3. [#12649]
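
To illustrate the file-based delete workflow, here is a minimal sketch that prepares a list of dataset URNs on disk. The URN layout follows DataHub's standard dataset URN format; the one-URN-per-line file layout, the file name, and the table names are assumptions made for this example. The notes above do not state the CLI option that consumes the file, so check `datahub delete --help` for it.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Format a DataHub dataset URN for the given platform, name, and environment."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def write_urn_file(urns: Iterable[str], path: Path) -> int:
    """Write URNs one per line (assumed layout), dropping blanks and duplicates."""
    unique: List[str] = list(dict.fromkeys(u.strip() for u in urns if u.strip()))
    path.write_text("\n".join(unique) + "\n", encoding="utf-8")
    return len(unique)


# Hypothetical tables used only for illustration.
urns = [
    dataset_urn("snowflake", "analytics.public.orders"),
    dataset_urn("snowflake", "analytics.public.customers"),
    dataset_urn("snowflake", "analytics.public.orders"),  # duplicate, dropped
]
urn_file = Path(tempfile.gettempdir()) / "urns_to_delete.txt"
count = write_urn_file(urns, urn_file)
```

Deduplicating before writing keeps the file reviewable, which is worth doing because deletes are destructive.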

Metadata Ingestion

We’re continuously improving our integrations to add new capabilities and squash bugs.

  • dbt: Added the parameter include_database_name to support including the database name in URN generation. [#12411]
  • Iceberg: Alongside our new Iceberg Catalog API, we’ve made various improvements to our Iceberg integration. [#12744]
  • MLflow: Significantly revamped our MLflow connector, adding support for tracking Model Group Versions and Model Stats; tracking Model lineage to underlying datasets; and capturing Experiments and Runs.
  • MSSQL: Improved support for extracting stored procedures from MS SQL. [#12244, #12563]
  • Oracle: Improved the accuracy of column-level lineage resolution.
  • PowerBI: Improved lineage mapping so PowerBI Reports can now contain PowerBI Dashboards. [#12451]
  • Redshift: Added support for data shares and external schemas, including automatic lineage resolution across Redshift namespaces.
  • S3: Ingestion now ignores paths that do not match the specified depth, which resolves warning messages triggered by mismatched paths. [#12326]
  • Snowflake: Added support for Snowflake Streams and Hybrid Tables, and fixed a bug with lineage resolution across table renames. [#12318]
  • Superset (community contribution!): Added support for Superset virtual datasets and lineage. [#12679]
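
As one example of how a new option might be used, the sketch below builds a dbt ingestion recipe as a Python dictionary mirroring the usual YAML recipe structure. The `include_database_name` parameter comes from the dbt item above; treating it as a boolean is an assumption, and the file paths, target platform, and server URL are placeholders.

```python
import json
from typing import Any, Dict


def dbt_recipe(
    manifest_path: str,
    catalog_path: str,
    target_platform: str,
    include_database_name: bool,
) -> Dict[str, Any]:
    """Assemble a dbt ingestion recipe with a REST sink."""
    return {
        "source": {
            "type": "dbt",
            "config": {
                "manifest_path": manifest_path,
                "catalog_path": catalog_path,
                "target_platform": target_platform,
                # Assumed boolean: whether the database name is part of the URN.
                "include_database_name": include_database_name,
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},  # placeholder URL
        },
    }


# Placeholder paths and platform, chosen only for illustration.
recipe = dbt_recipe(
    manifest_path="target/manifest.json",
    catalog_path="target/catalog.json",
    target_platform="snowflake",
    include_database_name=True,
)
print(json.dumps(recipe, indent=2))
```

Because the option affects URN generation, changing it on an existing deployment would likely produce different URNs for the same models, so it is worth testing against a non-production instance first.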

Additionally, we’re working on a new integration with Vertex AI. Please reach out if you’re interested in joining the beta.

Of course, this only scratches the surface of changes. This release contains 100+ improvements across 25 different integrations.

Thank You to our Contributors!

First-Time Contributors

@Bhadhri03 @brock-acryl @cccs-cat001 @davidebriscese @Deepalijain13 @dougbot01 @Haebuk @haon85 @josges @mihai103 @rajatgl17 @Rasnar @rharisi @samanthafigueredo5 @ttekampe

Repeat Contributors

@bda618 @deepgarg-visa @eagle-25 @jayasimhankv @ksrinath @llance @Masterchen09 @mayurinehate @mkamalas @PeteMango @pinakipb2 @remisalmon @sagar-salvi-apptware @svdimchenko @v-tarasevich-blitz-brain

Project Maintainers

@anshbansal @asikowitz @chakru-r @chriscollins3456 @david-leifker @gabe-lyons @hsheth2 @jayacryl @jjoyce0510 @kevinkarchacryl @pedro93 @RyanHolstien @ryota-cloud @sakethvarma397 @sgomezvillamor @shirshanka @skrydal @treff7es @yoonhyejin

View the full changelog: v0.15.0.1...v1.0.0