chore: bump the training library to 0.8.0 and refactor tokenizer usage #3250

Merged
jaideepr97 merged 2 commits into instructlab:main from RobotSail:update-training on Apr 8, 2025

@RobotSail
Member

The latest release of instructlab-training makes tokenizing datasets easier by no longer requiring that a chat template always be supplied. This breaks the API of setup_tokenizer, which previously had to be passed special objects that exist only in the training library. It now accepts only the model path and an optional template path, and it configures the tokenizer automatically as best it can for the given scenario.
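
To make the new calling convention concrete, here is a rough sketch of the decision described above. It does not call instructlab-training. The names plan_tokenizer_setup, TokenizerPlan, and chat_tmpl_path are hypothetical stand-ins invented for illustration, and the fallback to the model's own template is an assumption about what "configure automatically" might mean, not the library's confirmed behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenizerPlan:
    """Hypothetical record of how a tokenizer would be configured."""

    model_path: str
    template_source: str  # "explicit" or "model-default" (illustrative labels)
    template_path: Optional[str]


def plan_tokenizer_setup(
    model_path: str, chat_tmpl_path: Optional[str] = None
) -> TokenizerPlan:
    """Illustrative stand-in, NOT the real setup_tokenizer.

    Mirrors the shape described in the PR: a required model path plus an
    optional template path, with no training-library objects passed in.
    """
    if chat_tmpl_path is not None:
        # The caller supplied a template, so that choice wins.
        return TokenizerPlan(model_path, "explicit", chat_tmpl_path)
    # Assumption: with no template given, rely on whatever the model ships with.
    return TokenizerPlan(model_path, "model-default", None)


# Only the model path is required; the template is optional.
default_plan = plan_tokenizer_setup("path/to/model")
custom_plan = plan_tokenizer_setup("path/to/model", "path/to/template.py")
print(default_plan.template_source, custom_plan.template_source)
```

The point of the sketch is only that callers no longer need to build anything from the training library before setting up a tokenizer; both paths here are placeholder strings.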

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the
    conventional commits.
  • Changelog updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Functional tests have been added, if necessary.
  • E2E Workflow tests have been added, if necessary.

@mergify mergify bot added the ci-failure (PR has at least one CI failure) and dependencies (Relates to dependencies) labels and removed the ci-failure label Mar 31, 2025
@jaideepr97
Member

Also noticed that there are new linting failures: docling recently released a new version, which is getting pulled in, and it includes a PR (docling-project/docling#1010) that adds new fields to some of the APIs we use.

This probably needs to be fixed in a separate PR.

@booxter booxter mentioned this pull request Apr 2, 2025
Signed-off-by: Oleg Silkin <97077423+RobotSail@users.noreply.github.com>
@mergify mergify bot added and then removed the ci-failure (PR has at least one CI failure) label Apr 8, 2025
Signed-off-by: Jaideep Rao <jrao@redhat.com>
@mergify mergify bot removed the ci-failure (PR has at least one CI failure) label Apr 8, 2025
@mergify mergify bot added the one-approval (PR has one approval from a maintainer) label Apr 8, 2025
@mergify mergify bot removed the one-approval (PR has one approval from a maintainer) label Apr 8, 2025
Member

@nathan-weinberg nathan-weinberg left a comment

image

@jaideepr97 jaideepr97 merged commit 4448a57 into instructlab:main Apr 8, 2025
23 checks passed

Labels

dependencies Relates to dependencies


5 participants