Pull requests: huggingface/tokenizers
chore(deps): bump shell-quote from 1.8.1 to 1.8.4 in /bindings/node
Labels: dependencies, javascript
#2093 opened Jun 9, 2026 by dependabot (Bot)

Reduce BPE merge update allocations and add file progress logs
#2092 opened Jun 8, 2026 by voidful

fix(typing): correct encode() input typing (PreTokenizedInputSequence tuple + stub Any)
#2089 opened Jun 7, 2026 by Anai-Guo

fix(bpe): widen pair_counts to i64 + add overflow regression test (#2058)
#2087 opened Jun 6, 2026 by pjdurden

ByteLevel: single-pass byte-level transform (apply_byte_map)
#2086 opened Jun 5, 2026 by dmatth1

Avoid walking sparse vocab holes during serialization
#2085 opened Jun 4, 2026 by dfgvaetyj3456356-hash

Validate BPE prefix merges without unchecked UTF-8
#2082 opened Jun 1, 2026 by dfgvaetyj3456356-hash

Complete incomplete Viterbi lattice tests
#2081 opened Jun 1, 2026 by eunseo9311 (Contributor)

Add group capture support for Replace normalizer in Rust and Python
#2080 opened May 30, 2026 by ander-db

Batch encode: coarsen rayon tasks with with_min_len
#2077 opened May 28, 2026 by sebpop (Contributor)

bindings & bench: use mimalloc as global allocator on tested targets
#2073 opened May 26, 2026 by sebpop (Contributor)

chore: enable Dependabot weekly GitHub Actions bumps
Label: dependabot
#2071 opened May 26, 2026 by hf-dependantbot-rollout (Bot)

Fix Unigram trainer prune loss to use per-piece alternative count
#2070 opened May 24, 2026 by hunter-heidenreich

chore(deps): bump qs and express in /tokenizers/examples/unstable_wasm/www
Labels: dependencies, javascript
#2067 opened May 22, 2026 by dependabot (Bot)

chore(deps): bump qs and body-parser in /tokenizers/examples/unstable_wasm/www
Labels: dependencies, javascript
#2063 opened May 20, 2026 by dependabot (Bot)

chore(deps-dev): bump webpack-dev-server from 5.2.1 to 5.2.4 in /tokenizers/examples/unstable_wasm/www
Labels: dependencies, javascript
#2062 opened May 20, 2026 by dependabot (Bot)

security: reject nested-quantifier regex in Split/Replace to prevent ReDoS (CWE-1333)
#2060 opened May 17, 2026 by Allen930311 (4 tasks)

fix(bpe): widen pair_counts from i32 to i64 to prevent overflow on large corpora
#2059 opened May 17, 2026 by xodn348

serialize tokenizer vocab and added_tokens compactly
#2056 opened May 13, 2026 by ArthurZucker (Collaborator)

Apply type_ids and sequence_id to overflow encodings in post-processors
#2055 opened May 12, 2026 by 1fanwang

Fix invalid escape sequence in Whitespace docstring
#2054 opened May 10, 2026 by eyupcanakman

Add scaling_bench: encode_batch vs worker-pool comparison (#1900)
#2048 opened May 1, 2026 by stargazerZJ (5 of 6 tasks)