
Add dataset loading coverage for Hugging Face configs #1318

Draft

kajalj22 wants to merge 1 commit into main from cursor/dataset-loading-tests-8058


@kajalj22
Contributor

Summary

  • Add a unit test that discovers configs with huggingface_identifier entries.
  • Exercise TrainDataProcessor.load_datasets() for train/validation Hugging Face datasets with mocked downloads.
  • Assert download routing, output paths, artifact-vs-split handling, token propagation, and that GitLab is never contacted when data_source=huggingface.
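The test file itself is not reproduced in this description. As a rough illustration only, here is a minimal sketch of two ideas the bullets describe: walking a parsed config for huggingface_identifier entries, and using unittest.mock to check download routing, output paths, token propagation, and that the GitLab path is never taken. It assumes configs are already parsed into nested dicts and lists. `find_hf_identifiers`, `route_downloads`, the `gitlab_identifier` key, and the injected `hf_download`/`gitlab_download` callables are hypothetical stand-ins, not the real TrainDataProcessor.load_datasets() API. Artifact-vs-split handling is not modeled.

```python
from typing import Any, Callable, Dict, Iterator, Optional, Tuple
from unittest import mock


def find_hf_identifiers(config: Any, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (key_path, identifier) for every huggingface_identifier at any nesting depth."""
    if isinstance(config, dict):
        for key, value in config.items():
            child = f"{path}.{key}" if path else str(key)
            if key == "huggingface_identifier":
                yield child, value
            else:
                yield from find_hf_identifiers(value, child)
    elif isinstance(config, list):
        for i, item in enumerate(config):
            yield from find_hf_identifiers(item, f"{path}[{i}]")


def route_downloads(
    config: Dict[str, Any],
    hf_download: Callable[..., str],
    gitlab_download: Callable[..., str],
    token: Optional[str] = None,
) -> Dict[str, str]:
    """HYPOTHETICAL stand-in for load_datasets(): pick a downloader per split."""
    paths = {}
    for split in ("train", "validation"):
        entry = config.get(split)
        if entry is None:
            continue
        if config.get("data_source") == "huggingface":
            # The token is forwarded to every Hugging Face download.
            paths[split] = hf_download(entry["huggingface_identifier"], token=token)
        else:
            # "gitlab_identifier" is an invented key used only for this sketch.
            paths[split] = gitlab_download(entry["gitlab_identifier"])
    return paths


def test_huggingface_source_skips_gitlab() -> None:
    config = {
        "data_source": "huggingface",
        "train": {"huggingface_identifier": "org/train-set"},
        "validation": {"huggingface_identifier": "org/val-set"},
    }
    # Mocked downloader: no network access, returns a predictable local path.
    hf = mock.Mock(side_effect=lambda ident, token=None: f"/tmp/data/{ident}")
    gitlab = mock.Mock()

    paths = route_downloads(config, hf, gitlab, token="hf_test_token")

    # Output paths, token propagation, and GitLab avoidance.
    assert paths == {
        "train": "/tmp/data/org/train-set",
        "validation": "/tmp/data/org/val-set",
    }
    hf.assert_any_call("org/train-set", token="hf_test_token")
    hf.assert_any_call("org/val-set", token="hf_test_token")
    gitlab.assert_not_called()
```

pytest collects `test_huggingface_source_skips_gitlab` automatically. Injecting the downloaders as arguments is a simplification made here; a test against the real code would more likely patch the project's own download helpers with `mock.patch`.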

Tests

  • ruff check tests/unit_tests/test_dataset_loading.py
  • pytest tests/unit_tests/test_train_data_utils.py tests/unit_tests/test_dataset_loading.py -q

Linear Issue: AUT-241


Signed-off-by: Cursor Agent <cursoragent@cursor.com>

Co-authored-by: kajalj22 <kajalj22@users.noreply.github.com>

copy-pr-bot Bot commented May 13, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.
