Labels: bug (Bug report)
Description
Describe the bug
When using the user_email_pattern deny rule with a wildcard pattern (".*") in Snowflake ingestion to hide all user emails for privacy reasons, the system fails to generate and display any column statistics. The statistics should still be calculated and shown even when user information is hidden.
To Reproduce
Steps to reproduce the behavior:
- Configure the Snowflake ingestion recipe with the following setting:
  user_email_pattern:
    deny:
      - ".*"
- Run the Snowflake ingestion process
- Navigate to the metadata view where column statistics should be displayed
- Observe that no column statistics are generated or shown
Expected behavior
- User email information should be hidden/anonymized (privacy protection working as intended)
- Column statistics should still be calculated and displayed
- Query metadata should remain visible without exposing user identities
- The deny pattern should only affect user visibility, not the underlying statistical data generation
Environment Information
- DataHub Version: 1.2.0
- Source: Snowflake
- Ingestion Type: Snowflake connector
Use Case Context
We need to hide all user identities (who conducted queries) due to privacy requirements, while still maintaining visibility of:
- Query execution statistics
- Column usage statistics
- Other metadata metrics
Additional context
- We are aware of a bugfix in DataHub 1.3.0 related to user email handling. That fix concerns respecting user email patterns in general and does not resolve this specific use case.
- The current implementation appears to couple user visibility to statistics generation: denying users inadvertently prevents statistics from being calculated or displayed. The two should be independent.
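To make the suspected coupling concrete, here is a minimal Python sketch. It is not DataHub's code: `QueryEvent`, `coupled_column_usage`, and `decoupled_column_usage` are hypothetical names, and the idea that events from denied users are dropped before aggregation is only our guess at the cause, not a confirmed diagnosis.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class QueryEvent:
    """Hypothetical, simplified query-log record (not a DataHub type)."""
    user_email: str
    columns: List[str]


def is_denied(email: str, deny_patterns: List[str]) -> bool:
    # An email is denied if any deny regex matches from the start of the string.
    return any(re.match(pattern, email) for pattern in deny_patterns)


def coupled_column_usage(
    events: Iterable[QueryEvent], deny_patterns: List[str]
) -> Counter:
    """Assumed current behaviour: whole events are skipped for denied users,
    so a deny-all pattern leaves nothing to aggregate."""
    column_counts: Counter = Counter()
    for event in events:
        if is_denied(event.user_email, deny_patterns):
            continue  # the event is dropped, and its column usage with it
        column_counts.update(event.columns)
    return column_counts


def decoupled_column_usage(
    events: Iterable[QueryEvent], deny_patterns: List[str]
) -> Tuple[Counter, Counter]:
    """Expected behaviour: column usage is always counted; the deny pattern
    only controls whether the user is attributed."""
    column_counts: Counter = Counter()
    user_counts: Counter = Counter()
    for event in events:
        column_counts.update(event.columns)
        if not is_denied(event.user_email, deny_patterns):
            user_counts[event.user_email] += 1
    return column_counts, user_counts


if __name__ == "__main__":
    sample = [
        QueryEvent("alice@example.com", ["id", "name"]),
        QueryEvent("bob@example.com", ["id"]),
    ]
    print(coupled_column_usage(sample, [".*"]))
    print(decoupled_column_usage(sample, [".*"]))
```

With a deny-all pattern, the first function returns no column counts at all, which matches what we observe. The second keeps the column counts and only leaves the per-user counts empty, which is the behaviour we expect.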