Use instructlab-schema package to parse qna.yaml files by bjhargrave · Pull Request #1962 · instructlab/instructlab

bjhargrave · 2024-08-02T16:34:05Z

We replace parsing/validation code to use the shared code from instructlab-schema package.

Checklist:

Commit Message Formatting: Commit titles and messages follow the Conventional Commits guidelines.
Changelog updated with breaking and/or notable changes for the next minor release.
Documentation has been updated, if necessary.
Unit tests have been added, if necessary.
Integration tests have been added, if necessary.

mergify · 2024-08-08T17:59:49Z

This pull request has merge conflicts that must be resolved before it can be
merged. @bjhargrave please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify · 2024-08-09T21:03:51Z

This pull request has merge conflicts that must be resolved before it can be
merged. @bjhargrave please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mairin · 2024-08-12T15:19:28Z

Right now taxonomy diff does 5 layers of checking. Those 5 layers will be handled by this now.

@nathan-weinberg can you describe the 5 layers here?

This came up in our team's weekly meeting today, where it was suggested that taxonomy diff should check the YAML schema version and tell the user whether it is supported. Right now we only flag this at sdg generate, which is a bit late in the process.

bjhargrave · 2024-08-12T15:34:58Z

The parsing/validation code will check the knowledge version for v3 and report an error for v1/v2. There is a test case in this PR for this.

https://github.com/instructlab/instructlab/pull/1962/files#diff-f2fb20ddc8bf7f3b80b1a99bfe9fd3e563c5675f9faf7f914ac7f74e276d15f1R348-R368

The parsing/validation code will handle the linting and JSON schema validation, reporting any errors found. I don't know what the 5 layers are.
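As a rough illustration of the version check described above (not the actual instructlab-schema code), a minimal sketch might look like the following. The function name, the constant, and the handling of a missing `version` key are assumptions made for this example only.

```python
from typing import Any, Mapping, Optional

# Assumption for illustration: knowledge qna.yaml files must declare version 3.
REQUIRED_KNOWLEDGE_VERSION = 3


def knowledge_version_error(contents: Mapping[str, Any]) -> Optional[str]:
    """Return an error message for an unsupported knowledge version, else None.

    Hypothetical helper; the real check lives in the shared schema package.
    """
    if "version" not in contents:
        return "missing required 'version' key"
    version = contents["version"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(version, int) or isinstance(version, bool):
        return f"version must be an integer, got {version!r}"
    if version < REQUIRED_KNOWLEDGE_VERSION:
        return (
            f"knowledge version {version} is not supported; "
            f"version {REQUIRED_KNOWLEDGE_VERSION} is required"
        )
    return None
```

With this sketch, a parsed file declaring `version: 1` or `version: 2` yields an error string, while `version: 3` yields `None`.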

We replace parsing/validation code to use the shared code from instructlab-schema package. Signed-off-by: BJ Hargrave <hargrave@us.ibm.com>

nathan-weinberg · 2024-08-19T02:51:22Z

The 5 layers are:

1. File extension check (is it .yaml)
2. Content check (is the file empty)
3. YAML lint check (is the file valid YAML)
4. Schema check (does the file follow the taxonomy schema)
5. Knowledge check (does the file have everything needed for knowledge, if applicable)

This PR should definitely be a 0.19.0 priority IMO - we want to align the schema checking as much as possible while minimizing code duplication (there's some code for this in the SDG library as well)
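The five layers listed above can be sketched as a single function that runs the checks in order and stops at the first layer that fails. This is only an illustration, not the code in `taxonomy diff` or in instructlab-schema: the function name, the error wording, and the simplified schema and knowledge rules (required keys `version` and `seed_examples`, a `document` section for files under a `knowledge` folder) are assumptions. The YAML layer assumes PyYAML is installed.

```python
from pathlib import Path
from typing import Any, List


def check_qna_file(path: Path) -> List[str]:
    """Run five layered checks on a qna file; return the first layer's errors."""
    # 1. File extension check
    if path.suffix != ".yaml":
        return [f"{path}: expected a .yaml extension"]

    # 2. Content check
    text = path.read_text(encoding="utf-8")
    if not text.strip():
        return [f"{path}: file is empty"]

    # 3. YAML check (PyYAML imported lazily so layers 1-2 work without it)
    import yaml

    try:
        contents: Any = yaml.safe_load(text)
    except yaml.YAMLError as exc:
        return [f"{path}: invalid YAML: {exc}"]

    # 4. Schema check (simplified stand-in for real JSON schema validation)
    if not isinstance(contents, dict):
        return [f"{path}: top level must be a mapping"]
    errors = [
        f"{path}: missing required key '{key}'"
        for key in ("version", "seed_examples")
        if key not in contents
    ]
    if errors:
        return errors

    # 5. Knowledge check (hypothetical rule: knowledge files need 'document')
    if "knowledge" in path.parts and "document" not in contents:
        return [f"{path}: knowledge files need a 'document' section"]

    return []
```

An empty list means every layer passed. Keeping the layers in one shared function is one way to avoid the duplication mentioned above, since each caller would get identical checks and messages.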

src/instructlab/utils.py

RobotSail

LGTM

mergify bot added the CI/CD, testing, and dependencies labels Aug 2, 2024

bjhargrave requested a review from hickeyma August 2, 2024 16:35

nathan-weinberg requested review from nathan-weinberg and russellb August 5, 2024 20:20

nathan-weinberg linked an issue Aug 5, 2024 that may be closed by this pull request

Remove yamllint from requirements.txt #879

Closed

mergify bot added the needs-rebase label Aug 8, 2024

russellb removed their request for review August 8, 2024 18:21

bjhargrave force-pushed the schema-taxonomy-parsing branch from fc4518c to 2c90fb8 Compare August 8, 2024 19:17

mergify bot removed the needs-rebase label Aug 8, 2024

nathan-weinberg added this to the 0.19.0 milestone Aug 9, 2024

mergify bot added the needs-rebase label Aug 9, 2024

bjhargrave force-pushed the schema-taxonomy-parsing branch from 2c90fb8 to af90b58 Compare August 9, 2024 21:24

mergify bot removed the needs-rebase label Aug 9, 2024

Use instructlab-schema package to parse qna.yaml files

0c3a5b2

We replace parsing/validation code to use the shared code from instructlab-schema package. Signed-off-by: BJ Hargrave <hargrave@us.ibm.com>

bjhargrave force-pushed the schema-taxonomy-parsing branch from af90b58 to 0c3a5b2 Compare August 16, 2024 15:34

nathan-weinberg requested a review from a team August 19, 2024 02:51

RobotSail reviewed Aug 19, 2024

src/instructlab/utils.py (resolved)

RobotSail approved these changes Aug 19, 2024

mergify bot added the one-approval label Aug 19, 2024

bjhargrave mentioned this pull request Aug 19, 2024

Yaml formatting allow more than 120 character lines #2100

Closed

nathan-weinberg approved these changes Aug 19, 2024

mergify bot removed the one-approval label Aug 19, 2024

bjhargrave removed the request for review from hickeyma August 19, 2024 19:44

mergify bot merged commit 1bb13cc into instructlab:main Aug 19, 2024

bjhargrave deleted the schema-taxonomy-parsing branch August 19, 2024 19:45

hickeyma mentioned this pull request Aug 22, 2024

Use instructlab-schema package to parse qna.yaml files instructlab/sdg#62

Merged

mergify[bot] merged 1 commit into instructlab:main from bjhargrave:schema-taxonomy-parsing
