Add columns to hold WARC-Record-ID and WARC-IP-Address #55

Draft
sebastian-nagel wants to merge 4 commits into main from 30-ip-address-42-record-id

@sebastian-nagel
Contributor

Add two new columns, `warc_record_id` and `warc_ip_address`, holding the WARC-Record-ID and the WARC-IP-Address, respectively. Both columns use the STRING logical type.

This PR addresses:

Add WARC-Record-ID column (#42)

Add two new columns, `warc_record_id` and `warc_ip_address`,
holding the WARC-Record-ID and the WARC-IP-Address, respectively,
as STRING logical type.
Exclude column `warc_record_id` from Parquet dictionary encoding,
because all of its values are unique.
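
The reasoning behind skipping dictionary encoding for unique values can be illustrated with a rough size estimate. The sketch below is not code from this PR: the helper names `plain_size` and `dictionary_size` are hypothetical, the sample values are made up, and the model is deliberately simplified (it ignores page headers, the RLE part of the RLE/bit-packed hybrid, and compression). It only assumes that PLAIN-encoded BYTE_ARRAY values carry a 4-byte length prefix and that dictionary indices are bit-packed.

```python
import uuid


def plain_size(values):
    """Approximate PLAIN size: 4-byte length prefix plus UTF-8 bytes per value."""
    return sum(4 + len(v.encode("utf-8")) for v in values)


def dictionary_size(values):
    """Approximate dictionary-encoded size (simplified model, an assumption).

    Each distinct value is stored once in the dictionary page,
    and every row stores a bit-packed index into that dictionary.
    """
    distinct = set(values)
    dict_page = sum(4 + len(v.encode("utf-8")) for v in distinct)
    bit_width = max(1, (len(distinct) - 1).bit_length())
    index_bytes = (len(values) * bit_width + 7) // 8
    return dict_page + index_bytes


# Made-up sample data: record IDs are unique per row, IP addresses repeat.
record_ids = [f"<urn:uuid:{uuid.uuid4()}>" for _ in range(1000)]
ip_addresses = ["192.0.2.1", "192.0.2.2", "198.51.100.7", "203.0.113.9"] * 250

print("record ids  :", plain_size(record_ids), "plain vs", dictionary_size(record_ids), "dictionary")
print("ip addresses:", plain_size(ip_addresses), "plain vs", dictionary_size(ip_addresses), "dictionary")
```

Under this model, a column of unique values pays for the full dictionary plus the indices, so dictionary encoding only adds overhead, whereas a column with few distinct values shrinks considerably.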
Add WARC-Record-ID column (#42)

Make the columns `warc_record_id` and `warc_ip_address` hold
binary data (primitive type BYTE_ARRAY).