Column Statistics Not Generated When All Users Are Denied #15075

@rospe

Describe the bug

When the user_email_pattern deny rule is set to the wildcard pattern ".*" in Snowflake ingestion (to hide all user emails for privacy reasons), no column statistics are generated or displayed. The statistics should still be calculated and shown even when user information is hidden.

To Reproduce

Steps to reproduce the behavior:

  1. Configure the Snowflake ingestion recipe with the following setting:
    user_email_pattern:
      deny:
        - ".*"
  2. Run the Snowflake ingestion process
  3. Navigate to the metadata view where column statistics should be displayed
  4. Observe that no column statistics are generated or shown
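For reference, a minimal sketch of why this setting denies every user, assuming the usual allow/deny semantics (a value passes only if it matches an allow pattern and no deny pattern). The helper `is_allowed` is hypothetical and written only for illustration; it is not DataHub's own implementation.

```python
import re


def is_allowed(value, allow=(".*",), deny=()):
    """Hypothetical allow/deny check: deny patterns win over allow patterns."""
    if any(re.match(pattern, value) for pattern in deny):
        return False
    return any(re.match(pattern, value) for pattern in allow)


# Example emails are made up for illustration.
emails = ["alice@example.com", "bob@example.org"]

# With deny: [".*"], every email matches the deny rule, so no user passes.
remaining_users = [e for e in emails if is_allowed(e, deny=(".*",))]
```

Under these assumptions `remaining_users` is empty, which is the intended privacy outcome; the problem reported here is that the statistics disappear as well.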

Expected behavior

  • User email information should be hidden/anonymized (privacy protection working as intended)
  • Column statistics should still be calculated and displayed
  • Query metadata should remain visible without exposing user identities
  • The deny pattern should only affect user visibility, not the underlying statistical data generation

Environment Information

  • DataHub Version: 1.2.0
  • Source: Snowflake
  • Ingestion Type: Snowflake connector

Use Case Context

We need to hide all user identities (who conducted queries) due to privacy requirements, while still maintaining visibility of:

  • Query execution statistics
  • Column usage statistics
  • Other metadata metrics

Additional context

  • We are aware of a bugfix in DataHub 1.3.0 related to user email handling, but that fix makes ingestion respect user email patterns in general and does not resolve this specific use case.
  • The current implementation appears to couple user visibility to statistics generation: denying users inadvertently prevents statistics from being calculated or displayed. These should be independent processes.
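To make the suspected coupling concrete, here is a rough sketch of the two behaviors. It is an assumption about what might be happening, not a reading of the actual connector code; the event shape, the function `aggregate`, and the flag `drop_denied_events` are all hypothetical.

```python
import re
from collections import Counter


def user_allowed(email, deny_patterns):
    """Hypothetical check: a user is allowed if no deny pattern matches."""
    return not any(re.match(pattern, email) for pattern in deny_patterns)


def aggregate(events, deny_patterns, drop_denied_events):
    """Count column usage and per-user usage over hypothetical query events."""
    column_counts = Counter()
    user_counts = Counter()
    for event in events:
        allowed = user_allowed(event["user"], deny_patterns)
        if not allowed and drop_denied_events:
            # Coupled behavior: the whole event is discarded with the user.
            continue
        # Decoupled behavior: column usage is counted regardless of the user.
        column_counts.update(event["columns"])
        if allowed:
            user_counts[event["user"]] += 1
    return column_counts, user_counts


# Made-up events for illustration only.
events = [
    {"user": "alice@example.com", "columns": ["id", "amount"]},
    {"user": "bob@example.org", "columns": ["id"]},
]

coupled_columns, coupled_users = aggregate(events, [".*"], drop_denied_events=True)
decoupled_columns, decoupled_users = aggregate(events, [".*"], drop_denied_events=False)
```

In the coupled variant both counters end up empty, which matches what we observe. In the decoupled variant the column counts survive while the per-user counts stay empty, which is the behavior we are asking for.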

Labels: bug
