Parallel execution of individual hooks? #510

@chriskuehl

I have one hook that runs puppet parser validate, which is extremely slow. Running it on just 185 files takes 105 seconds, which makes my tests (which call pre-commit run --all-files, among other things) annoying enough that most people don't run them locally.

What if pre-commit's xargs implementation could do things in parallel? Here's a sketch of a patch that does that (only works on Python 3): https://i.fluffy.cc/t43V5vqd3VH9lTQfl8djnfZWBV2zDDTZ.html

This takes my test time from 105 seconds to 15 seconds.
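
To make the idea concrete, here is a minimal sketch of a parallel xargs. It is not the linked patch and not pre-commit's actual code: the names partition, run_chunk and parallel_xargs, and the chunk_size and jobs defaults, are all made up for illustration. It splits the file list into chunks, runs the command once per chunk in worker threads, and only combines the captured output back in the calling thread.

```python
import concurrent.futures
import subprocess


def partition(files, chunk_size):
    """Split the file list into consecutive chunks of at most chunk_size."""
    return [files[i:i + chunk_size] for i in range(0, len(files), chunk_size)]


def run_chunk(cmd, chunk):
    """Run cmd once with the chunk's filenames appended; capture its output."""
    proc = subprocess.run(
        list(cmd) + list(chunk),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    return proc.returncode, proc.stdout


def parallel_xargs(cmd, files, chunk_size=4, jobs=4):
    """Hypothetical helper: returns (0 or 1, combined captured output)."""
    chunks = partition(files, chunk_size)
    with concurrent.futures.ThreadPoolExecutor(max_workers=jobs) as executor:
        # map() yields results in input order, so the combined output is
        # deterministic even though the chunks run concurrently.
        results = list(executor.map(lambda c: run_chunk(cmd, c), chunks))
    # Any failing chunk fails the whole run.
    retcode = 1 if any(rc != 0 for rc, _ in results) else 0
    output = b''.join(out for _, out in results)
    return retcode, output
```

Threads are enough here because each worker just waits on a subprocess, so the heavy lifting happens outside the interpreter. Something like parallel_xargs(('puppet', 'parser', 'validate'), files) would be the call for my case.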

Some thoughts:

  • If any hooks write to files besides the ones they are linting, this could break. This is a problem, though pre-commit is heavily designed around operations on individual files, so the vast majority of hooks should be okay. We could offer an opt-in or opt-out at the individual hook level?
  • Parallelizing different hooks (running puppet-lint at the same time as puppet-validate) would be neat, but I think it's way more problematic: it introduces crazy locking problems, since you can't have two hooks running on the same file at once.
  • Because pre-commit captures output and displays it at the end, I don't think we have any of the usual problems of interleaved/confusing output. The printing happens in the main thread when collecting statuses and shouldn't have races.
  • concurrent.futures works great here, but is Python 3 only. I'm not sure how hard this is to do in a Python 2-compatible way.
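
On the compatibility point, one route that might work (an assumption on my part, not something the patch does) is multiprocessing.pool.ThreadPool, which ships in the standard library on both Python 2 and 3. A rough sketch, with run_one and run_all as made-up names:

```python
import subprocess
from multiprocessing.pool import ThreadPool


def run_one(args):
    """Run one command line and return (return code, captured output)."""
    proc = subprocess.Popen(
        args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
    )
    out, _ = proc.communicate()
    return proc.returncode, out


def run_all(cmd, chunks, jobs=4):
    """Hypothetical helper: run cmd over each chunk of filenames in threads."""
    pool = ThreadPool(jobs)
    try:
        # imap() preserves input order; the caller can then print the
        # collected output from the main thread, as in the third bullet.
        return list(pool.imap(run_one, [list(cmd) + list(c) for c in chunks]))
    finally:
        pool.close()
        pool.join()
```

This avoids concurrent.futures entirely, at the cost of a less pleasant API and manual pool cleanup.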

@asottile what do you think? Is this too complicated to be worth pursuing?
