Migrating to typed and validated configurations
This Issue will act as a placeholder to summarize the work in progress for migrating from plain dictionaries to typed and validated configuration classes. This idea was originally formulated in #3172, but to facilitate easier review and a smooth migration, the work will be split into intermediate steps with separate PRs. The old configuration system will stay in place during the intermediate steps, and a versioning system will be put in place to ensure long-term backwards compatibility across config versions.
Procedure
The migration will consist of small PRs with incremental changes, which can be reviewed separately before being merged into a feature branch.
Work in progress / roadmap
(items are ticked after review and merging into the feature branch)
- Centralize project configuration logic
- Improve testing to capture existing behavior
- New `config_mixin` for easy conversion between types (YAML file <> dict <> dataclass <> DictConfig); see the conversion sketch after this list
- Add typed configs for project config, pytorch config, etc. with pydantic validation
- Add typed 3D Project Configs
- Add typed configs for tensorflow
- Replace dictionary configs with identical DictConfig version (smooth transition)
- Add versioning system for migration between old and new configs (see the migration sketch after this list)
- Improve configurations -> reduce duplicated fields, correct casing
- Address in-place configuration edits throughout the pipeline
- Add aliasing system for accessing new fields using old fieldnames (e.g. corer2move2 -> corner2move2)
- Replace loaders in core config (e.g. deprecate `read_config` in favor of typed ProjectConfig)
- Fix circular imports in core/config
- Verify test coverage and improve if necessary
- Improve loaders in e.g. `deeplabcut/pose_estimation_pytorch/data/base.py` (mismatching init, mixed project/pose logic)
- Consider removing the OmegaConf dependency and migrating to fully typed configs
- Add LazyConfig?
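To make the conversion idea concrete, here is a minimal sketch, not the actual `config_mixin` from the PRs: `ConfigMixin`, `ProjectConfig`, and the example fields below are illustrative assumptions. It shows a pydantic-validated config that round-trips between YAML files, plain dicts, and OmegaConf `DictConfig`s, including an alias so a historical misspelled fieldname still resolves:

```python
# Minimal sketch only -- illustrative names, not the API added in the PRs.
from pathlib import Path

import yaml
from omegaconf import DictConfig, OmegaConf
from pydantic import BaseModel, ConfigDict, Field


class ConfigMixin:
    """Hypothetical mixin: YAML file <> dict <> pydantic model <> DictConfig."""

    @classmethod
    def from_yaml(cls, path: str | Path):
        # Parse the YAML file and validate it against the pydantic schema.
        return cls.model_validate(yaml.safe_load(Path(path).read_text()))

    def to_dict(self) -> dict:
        # Plain-dict view for code that still expects the old dict configs.
        return self.model_dump()

    def to_dictconfig(self) -> DictConfig:
        # OmegaConf view, keeping dotted access and merging available.
        return OmegaConf.create(self.model_dump())


class ProjectConfig(ConfigMixin, BaseModel):
    # populate_by_name accepts both the corrected field name and its alias.
    model_config = ConfigDict(populate_by_name=True)

    task: str
    # Old configs wrote "corer2move2"; the alias maps it to the fixed name.
    corner2move2: list[int] = Field(default=[50, 50], alias="corer2move2")


# An old-style dict with the misspelled key still validates cleanly:
cfg = ProjectConfig.model_validate({"task": "reaching", "corer2move2": [80, 80]})
assert cfg.corner2move2 == [80, 80]
print(OmegaConf.to_yaml(cfg.to_dictconfig()))
```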
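And a hedged sketch of how a version-based migration chain might look; the `version` key and the concrete migration step are assumptions for illustration, not the system from #3197:

```python
# Sketch of a version-based migration chain; key names are illustrative.
from collections.abc import Callable

MIGRATIONS: dict[int, Callable[[dict], dict]] = {}


def migration(from_version: int):
    """Register a function that upgrades a config dict by one version."""
    def register(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        MIGRATIONS[from_version] = fn
        return fn
    return register


@migration(from_version=0)
def rename_misspelled_fields(cfg: dict) -> dict:
    # Version 0 -> 1: correct historical field names.
    if "corer2move2" in cfg:
        cfg["corner2move2"] = cfg.pop("corer2move2")
    cfg["version"] = 1
    return cfg


def migrate(cfg: dict, target_version: int = 1) -> dict:
    # Apply single-step migrations until the target version is reached.
    while cfg.get("version", 0) < target_version:
        cfg = MIGRATIONS[cfg.get("version", 0)](cfg)
    return cfg


assert migrate({"corer2move2": [80, 80]})["corner2move2"] == [80, 80]
```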
Related PRs
- [dev] C1 - centralize project config I/O and add testing #3190
- [dev] C2 - Add typed configs as pydantic dataclasses & omegaconf dictconfigs #3191
- [dev] C3 - Replace configuration dictionaries with DictConfigs. #3194
- [dev] C4 - add config migration system #3197
...
[WIP] Final migration to configuration version 1: structured and validated configs #3198
Motivation
As formulated by @arashsm79 in #3172
Summary
- Introduce new configuration classes for inference, logging, model, pose, project, runner, and training settings.
- Refactor data loading mechanisms to use the new configuration structures.
- Move the multithreading and compilation options in inference configuration to the config module.
- Add typed configuration for logging.
- Update dataset loaders to accept model configurations directly or via file paths (see the sketch below).
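A minimal sketch of what "directly or via file paths" could look like; the function name and accepted types are illustrative assumptions, not the actual loader signatures from the PR:

```python
from pathlib import Path

from omegaconf import DictConfig, OmegaConf


def load_model_config(config: DictConfig | dict | str | Path) -> DictConfig:
    # Accept an already-built config directly, or read it from a YAML path.
    if isinstance(config, (str, Path)):
        return OmegaConf.load(config)
    return OmegaConf.create(config)  # normalizes plain dicts to DictConfig
```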
Why Typed & Structured Configuration (OmegaConf + Pydantic)
- Strong guarantees for correctness
  - Runtime type safety ensures invalid configs fail fast with clear errors instead of silently producing incorrect training runs.
  - Schema-validated configs dramatically reduce debugging time for users and maintainers.
- Static typing improves developer velocity
  - IDE autocomplete and inline documentation make configs discoverable and self-documenting.
  - Refactors become safer: config changes are more likely to be caught at development time.
- Hierarchical, composable configuration
  - Natural representation of DeepLabCut’s nested project/model/training settings.
  - Easy composition and merging from multiple sources (base config, model presets, experiment overrides); see the merge example after this list.
- Cleaner overrides and defaults.
- Structured configs make it easier to define parameter ranges for tuning and automation.
- Config schemas can be versioned and evolve safely over time while preserving backward compatibility.
- Full, validated configuration can be saved alongside results, which improves reproducibility and transparency.
- Builds on well-maintained, widely adopted libraries (OmegaConf, Pydantic).
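As a concrete illustration of the composition point above, OmegaConf's merge turns "base config + model preset + experiment overrides" into a one-liner; the file names here are hypothetical:

```python
from omegaconf import OmegaConf

base = OmegaConf.load("base_config.yaml")      # project-wide defaults
preset = OmegaConf.load("model_preset.yaml")   # e.g. a backbone preset
overrides = OmegaConf.from_cli()               # e.g. python train.py train.lr=1e-4

# Later arguments take precedence over earlier ones.
cfg = OmegaConf.merge(base, preset, overrides)
```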
Resources for learning more about structured configs:
- Hydra
- MIT Responsible AI's hydra-zen
- Pydantic
- OmegaConf
- Soklaski, R., Goodwin, J., Brown, O., Yee, M. and Matterer, J., 2022. Tools and practices for responsible AI engineering. arXiv preprint arXiv:2201.05647.
Future Work
- Currently, default model definitions are still stored as YAML files in the package. Moving to LazyConfig, as in Detectron2, would improve things significantly.
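For context, a sketch of the Detectron2 LazyConfig pattern: model definitions become plain Python files whose objects are declared lazily and only instantiated on demand. This uses the real `detectron2.config` API, but the file name and config contents are illustrative:

```python
# Contents of a hypothetical my_model.py -- the "config" is plain Python:
from detectron2.config import LazyCall as L
from torch import nn

model = L(nn.Conv2d)(in_channels=3, out_channels=64, kernel_size=3)
```

```python
# Elsewhere, the config file is loaded lazily and instantiated when needed:
from detectron2.config import LazyConfig, instantiate

cfg = LazyConfig.load("my_model.py")  # returns an OmegaConf container
cfg.model.out_channels = 128          # override before anything is built
model = instantiate(cfg.model)
```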
More things that could be done (@deruyter92):
- I think we need to make sure that every time a model is used, all the changes to the project's `config.yaml` are reflected in the model's configuration under `metadata` as well.
- There might be a better way to handle things in `deeplabcut/pose_estimation_pytorch/data/base.py`.