Whatever check runs during pre-commit must also run in the normal CI/CD pipeline. In the case of Python: the black formatter.

It must run during normal CI/CD because pre-commit hooks can be skipped.

So now I have two different black invocations: one in the pre-commit hook and one in CI/CD. And they must be the same version.
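To make the "double accounting" concrete, here's a hypothetical setup (version numbers illustrative) where the same pin ends up living in two files:

```yaml
# .pre-commit-config.yaml -- pin number one
repos:
  - repo: https://github.com/psf/black
    rev: 23.9.1            # hypothetical version
    hooks:
      - id: black

# .github/workflows/ci.yaml -- pin number two, which must match the rev above
# steps:
#   - run: pip install black==23.9.1
#   - run: black --check .
```

Bump one and forget the other, and local commits pass while CI fails (or vice versa).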

Ad infinitum for all other tests.

This is the reason I don't use the pre-commit framework. It leads to "double accounting": I have to keep the pre-commit checks and the CI/CD checks in sync. Or am I missing something? Can the framework run off the venv dir?

Here's our setup, which is the result of several iterations and ergonomics refinements. Note: our stack is 90% Python, with TS for the frontend. Also, 95% of devs use Macs (there's one data scientist on Windows; he uses WSL).

We install enough utilities with `brew` to get pyenv working, and use that to build all Python versions. Then iirc `brew install pipx`, or maybe it's `pip3 install --user pipx`. Either way, that's the only Python tool installed outside a venv.

Pipx installs isort, black, dvc, and pre-commit.

Every repo has a Makefile, which drives all the common operations. pyproject.toml (/eslint.json?) sets the config for isort and black (or eslint). `make format` runs isort and black on Python and eslint on JS; `make lint` just verifies.
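A trimmed-down sketch of what those targets look like (paths and flags illustrative, not our exact file):

```make
# Makefile -- illustrative sketch
.PHONY: format lint

format:
	isort src tests
	black src tests
	npx eslint --fix frontend/

lint:
	isort --check-only src tests
	black --check src tests
	npx eslint frontend/
```

The `--check`/`--check-only` flags make `lint` a pure verify step, so the same tools back both targets.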

Pre-commit only runs the lint; it doesn't format. It also runs some scripts to ensure you aren't accidentally committing large files, plus several DVC actions (the default dvc hooks) on commit, push, and checkout. These run in a venv managed by pre-commit; we just pin the version.
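Roughly, the config looks like this. Treat it as a sketch: the revs are illustrative, and the lint wiring here (a `repo: local` hook shelling out to `make lint`) is one way to do it, not necessarily ours verbatim. The DVC hook ids are the ones DVC's docs provide.

```yaml
# .pre-commit-config.yaml -- sketch; revs are illustrative
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: check-added-large-files
        args: [--maxkb=5000]      # illustrative size limit
  - repo: local
    hooks:
      - id: make-lint
        name: make lint           # lint only; formatting stays a manual `make format`
        entry: make lint
        language: system
        pass_filenames: false
  - repo: https://github.com/iterative/dvc
    rev: 3.30.0                   # pinned; illustrative version
    hooks:
      - id: dvc-pre-commit
        stages: [commit]
      - id: dvc-pre-push
        stages: [push]
      - id: dvc-post-checkout
        stages: [post-checkout]
        always_run: true
```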

GitHub Actions has a dedicated lint.yaml which runs a python linter action. The black version pinned there defines which black pipx installs locally. We use `act` when we want to see how an action runs without pushing a commit just to trigger jobs.
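The simplified shape of lint.yaml (shown here with a plain pip install rather than the marketplace action, since that's easier to read; the version pins are hypothetical and are what pipx mirrors locally):

```yaml
# .github/workflows/lint.yaml -- simplified sketch
name: lint
on: [push, pull_request]
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install black==23.9.1 isort==5.12.0   # pins mirrored by pipx locally
      - run: make lint
```

Running `make lint` in CI means CI and the pre-commit hook execute the exact same entry point.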

As an aside, I'm still fiddling with the dvc `pre-commit` post-checkout hooks. They don't always pull the files when they ought to.

Most of the actual unit/integration tests run in containers, but they can also run in a venv with the same logic, thanks to the Makefile. We use a DVC action to sync files in CI.
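The container/venv switch can be as simple as a variable in the Makefile. A hypothetical fragment (`RUNNER` and the compose service name are made up for illustration):

```make
# Makefile fragment -- hypothetical
# RUNNER picks the execution context; override with `make test RUNNER=`
# to run pytest directly in the active venv instead of a container.
RUNNER ?= docker compose run --rm app

test:
	$(RUNNER) pytest tests/
```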

So yeah, there are technically two copies of black and dvc, but we just use pinning. In practice we've only had one discrepancy in behavior between local and CI: local black wasn't catching a rule to avoid ''' for docstrings; using """ fixed it. On the whole, pre-commit saves against a lot of annoying goofs, but the CI system is law, so we largely harmonize against that.

IMHO, this is the least egregious "double accounting" we have across local vs staging CI vs production CI (I lost that battle; the manager would rather keep staging.yaml and production.yaml than parameterize. Shrug.gif).

Other knowledge nuggets:

- pre-commit manages its own dependencies. This leads to surprising behavior if you aren't expecting it, e.g. you need a special line to specify dvc[s3].

- black has yet to release a non-beta version, which messes with dependency solvers. This is super annoying; they might as well use 0ver if they don't want to commit to stability. Don't expect any kind of formatting stability between versions. Hope they settle down soon.

- git-lfs is a nightmare. Two projects at $lastco used it, and it was more trouble than it's worth. Just use DVC for yucky files. (I have no affiliation with DVC other than a few bug reports.)

- makefiles are great and IMHO underrated. But they have their limits. More complex logic should be broken out into scripts.

- python dependency management is still a kafkaesque nightmare. I say this with over a decade of python experience and it's my favorite language despite this.

- Suggestions welcome!
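Re: the dvc[s3] point above, the "special line" is pre-commit's `additional_dependencies` hook setting; something like this (rev illustrative):

```yaml
# pre-commit hook entry -- sketch
  - repo: https://github.com/iterative/dvc
    rev: 3.30.0
    hooks:
      - id: dvc-pre-push
        # without this, the hook's isolated venv installs plain dvc
        # and lacks the s3 extra needed to talk to the remote
        additional_dependencies: ["dvc[s3]"]
```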

Technologies referenced:

https://dvc.org/

https://github.com/iterative/setup-dvc

https://github.com/marketplace/actions/python-linter

https://github.com/nektos/act