What does HackerNews think of pip-tools?

A set of tools to keep your pinned Python dependencies fresh.

Language: Python

#6 in Python
It's a topic close to my heart, since it's one of the key things we solve in the OSS project Windmill [1]: letting users define simple single-file scripts in Python/Deno/Go and handling the dependencies for them.

For Deno, as the article points out, it's easy: Deno has this feature baked in, so we just have to `deno run` the scripts. It's similar for Go, where the imports are pretty much fully qualified pointers to their source and version.

For Python, we have to do a bit of machinery: we parse the AST, look at the imports, and apply the heuristic that most import names correspond to their PyPI package (we also maintain a list of exception mappings). We then pip-compile [2] all the imports to get a requirements file that we attach to the script, and pip-install it before running the script (then we do a LOT of magic to cache the dependencies so one doesn't actually have to install them 99.9% of the time).
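As a rough illustration of that kind of machinery (this is not Windmill's actual code; the exception mapping and file names are assumptions, the stdlib check needs Python 3.10+, and pip-tools must be installed):

    # Sketch: infer PyPI requirements from a single-file script's imports,
    # then let pip-compile resolve and pin them.
    import ast
    import subprocess
    import sys

    # Hypothetical import-name -> PyPI-name exceptions (Windmill maintains a real list)
    EXCEPTIONS = {"yaml": "pyyaml", "PIL": "pillow", "sklearn": "scikit-learn"}

    def top_level_imports(source: str) -> set[str]:
        """Collect top-level module names from import statements."""
        mods = set()
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.Import):
                mods.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                mods.add(node.module.split(".")[0])
        return mods - set(sys.stdlib_module_names)  # drop the standard library

    script = open("script.py").read()
    requirements = sorted(EXCEPTIONS.get(m, m) for m in top_level_imports(script))
    open("requirements.in", "w").write("\n".join(requirements) + "\n")

    # pip-compile (from pip-tools) resolves and pins the full dependency tree
    subprocess.run(["pip-compile", "requirements.in"], check=True)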

Note: if you're interested in a collection of single-file scripts for many APIs that you can just copy-paste and call the main function of, that's what Windmill's hub is for [3].

[1]: https://github.com/windmill-labs/windmill [2]: https://github.com/jazzband/pip-tools [3]: https://hub.windmill.dev/

Check out pip-tools [1] which does exactly that, albeit in a slightly more polished way.

[1]: https://github.com/jazzband/pip-tools

I've been using pip-compile from https://github.com/jazzband/pip-tools for this use case; a standard project Makefile defines "make update", which pip-compiles the current requirements, and "make install", which installs the frozen requirements list.

This way I can install the same bill of materials every time.
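A plausible sketch of what those two targets wrap (pip-sync is pip-tools' installer; a plain `pip install -r requirements.txt` also works for the second step):

    # make update: re-resolve requirements.in and rewrite the pinned requirements.txt
    pip-compile --upgrade requirements.in

    # make install: install exactly the pinned set (pip-sync also removes stray packages)
    pip-sync requirements.txt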

Never use pip freeze.

Instead, install pip-tools[0] then use the pip-compile command.

Why?

pip freeze will also pin dependencies of your dependencies, which makes your requirements.txt hard to read and extend.

Never manually create requirements.txt either because a programmer's job is to automate boring tasks like dependency pinning.

[0] https://github.com/jazzband/pip-tools
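To make the difference concrete: a hand-maintained requirements.in can stay as short as your direct dependencies, while the compiled requirements.txt records every transitive pin along with why it is there (versions below are illustrative, and the exact comment layout varies between pip-tools releases):

    # requirements.in
    requests

    # requirements.txt (generated by pip-compile)
    certifi==2023.7.22          # via requests
    charset-normalizer==3.3.0   # via requests
    idna==3.4                   # via requests
    requests==2.31.0            # via -r requirements.in
    urllib3==2.0.7              # via requests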

For a disciplined approach to these requirements files, look at what the well-regarded pip-tools (https://github.com/jazzband/pip-tools) does. You don't necessarily need to adopt the tool, but you can take inspiration from it.

pip-tools [1] resolves this by having a requirements.in file where you specify your top-level dependencies, doing the lookup and merging of dependency versions, and then generating a requirements.txt lock file. Honestly, it's the easiest and least complex of Python's dependency tools; it just gets out of your way instead of mandating a totally separate workflow a la Poetry.

[1] https://github.com/jazzband/pip-tools

pip-compile [0] is the best tool I've found for streamlining dependency resolution, but it may not provide any benefit for one-shot installs.

[0] https://github.com/jazzband/pip-tools

Use pip-compile[1] to generate a lock file and pip to install it. Update said lock file periodically to pull in package updates.

Simple and effective in my experience.

How is this falling short? Why such a push for different tooling? Why the continued complaints about Python packaging being substandard? Honest questions, because I've been using this for web and CLI applications for years without issue.

I also use pipx[2] to install Python-based tools onto my system so that each gets its own virtualenv.

1: https://github.com/jazzband/pip-tools

2: https://github.com/pypa/pipx
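For instance, installing pip-tools itself this way keeps its commands in their own isolated environment:

    pipx install pip-tools   # exposes pip-compile and pip-sync on PATH, isolated in a pipx-managed venv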

You can use a tool like `pip-compile`[1] to freeze a requirements-style input down to precise version numbers, but I'm not sure if you can tell it to select only the "minimum viable" version for each dependency. It might be easier to assume (perhaps incorrectly) that all dependencies are semver, freeze them, and then re-write them down to their minimal versions and re-install.

[1]: https://github.com/jazzband/pip-tools

With pip-tools.

https://github.com/jazzband/pip-tools

And you can still use a standard setup.cfg and pip install -e, unlike Poetry. Also, it's much faster.

I personally still find Poetry and Pipenv to be a little heavy-handed for my preferred workflows. To that end, pip-tools [1] is what I recommend to those who feel similarly. It balances pinning and dependency management with common Python practices (e.g., requirements.txt).

1: https://github.com/jazzband/pip-tools

The lock file shortcoming is better remedied by pip-tools [1] as recommended by a neighboring comment.

[1]: https://github.com/jazzband/pip-tools

I don't use venv and other tools (I use Docker for this), but here are some points I found interesting when comparing vanilla pip to npm (the tools listed in the article fix them):

1. You have to manually freeze packages (instead of getting an automatic package-lock.json).

2. Each time you install/remove a package, its dependencies are not removed from the freeze; you have to do that manually. (Interesting link: https://github.com/jazzband/pip-tools)

3. The freeze is a flat list (npm can restore the tree structure).

Seen a few mentions of Poetry. Not many for pip-tools, which has been around longer, is less opinionated, and has many of the same benefits: https://github.com/jazzband/pip-tools

FWIW I had a lot of success using https://github.com/jazzband/pip-tools to have dependencies automatically managed in a virtualenv.

* Basically I would have a single bash script that every `.py` entrypoint links to.

* Beside that symlink is a `requirements.in` file that just lists the top-level dependencies I know about.

* There's a `requirements.txt` file generated via pip-tools that lists all the dependencies with explicit version numbers.

* The bash script then makes sure there's a virtual environment in that folder & the installed package list matches exactly the `requirements.txt` file (i.e. any extra packages are uninstalled, any missing/mismatched version packages are installed correctly).

This was great because, during development, if you wanted to add a new dependency or change an installed version (i.e. pip-compile -U to update the dependency set), it didn't matter what the build server had, and you could test any diff independently and inexpensively. When developers pulled a new revision, they didn't have to muck about with the virtualenv - they could just launch the script without thinking about Python dependencies. Finally, unrelated pieces of code had their own dependency chains, so there wasn't even a global project-wide set of dependencies (e.g. if one tool depends on component A, the other tools don't need to).
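A rough Python rendering of the wrapper described above (the original was a bash script; the `.venv` location, the pip-tools bootstrap step, the `main.py` entrypoint name, and the POSIX paths are all assumptions):

    # wrapper.py - make the local virtualenv match requirements.txt, then run the entrypoint.
    import os
    import subprocess
    import sys
    from pathlib import Path

    HERE = Path(__file__).resolve().parent
    VENV = HERE / ".venv"

    if not VENV.exists():
        subprocess.run([sys.executable, "-m", "venv", str(VENV)], check=True)
        subprocess.run([str(VENV / "bin" / "pip"), "install", "pip-tools"], check=True)

    # pip-sync installs exactly what requirements.txt pins and uninstalls anything extra
    subprocess.run([str(VENV / "bin" / "pip-sync"), str(HERE / "requirements.txt")], check=True)

    # Hand off to the real entrypoint inside the virtualenv
    python = str(VENV / "bin" / "python")
    os.execv(python, [python, str(HERE / "main.py"), *sys.argv[1:]])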

I viewed the lack of `setup.py` as a good thing - deploying new versions of tools was a git push away rather than relying on chef or having users install new versions manually.

This was the smoothest setup I've ever used for running Python from source without adopting something like Bazel/BUCK (which add a lot of complexity for ingesting new dependencies, since you can't leverage pip, and they don't support running the Python scripts in place).

I strongly recommend https://github.com/jazzband/pip-tools to solve this. It provides a simple command that takes a requirements file and "compiles" a full specification of all transitive dependencies. You check both files into the repo, point pip at the generated file, and manually modify the other one. It means you often don't need to pin requirements manually at all, and the versions will be explicitly updated whenever you choose to recompile your requirements.

How does this solve the problem? As far as I can tell, it replaces centralized dependencies (a requirements.txt referencing a known and trusted registry, pypi.org) with URLs scattered across all the files in your project (and in third-party libraries' files), making maintenance harder and security updates even more so. It also makes things harder to debug, since you now have to keep in mind that multiple versions of the same package can be in use in your source. Also, requirements.txt already supports URLs if you want your sources to come from other places.

For me, most of the pain with Python packaging went away after I started using pip-tools[0]. It's just a simple utility that adds lockfile capabilities to pip. Nothing new to learn, no new philosophies or paradigms. No PEP waiting to be adopted by everyone. Just good old requirements.txt + pip.

[0] https://github.com/jazzband/pip-tools

pip-tools is almost never mentioned because it's boring but great. I always default to it.

https://github.com/jazzband/pip-tools

When I was at Google I had a similar problem (the team wasn't using Blaze). What I did was put a wrapper entrypoint around every Python entrypoint that would just run that entrypoint (e.g. foo would execute foo.py). The advantage was that the shell script would first set up a virtual environment for every entrypoint and install all the packages in the requirements.txt that sat beside the entrypoint (removing any extraneous ones). Each requirements.txt was compiled from a requirements.in file via pip-compile [1], which meant that devs only had to worry about declaring the packages they actually depended on directly. Any change to requirements.in required you to rerun pip-compile, which wouldn't (by default) upgrade any packages and would only lock whatever the current versions were (automated unit tests validated that every requirements.txt matched its requirements.in file).

This didn't solve having multiple versions of Python on the host. That was managed by a bootstrap script, written in Python 2, that set up the development environment to a consistent state (i.e. installed Homebrew and the required packages). Anyone wanting to run the tools would run it (no "getting started" guides); it also versioned itself and was idempotent (generally robust against being run multiple times). We also shipped this to our external partners in the factory. It generally worked well, since once you had run the necessary scripts no further internet access was required.

It wasn't easy but eventually it worked super reliably.

[1] https://github.com/jazzband/pip-tools

Oh, just realized that pip-tools (being discussed favorably downthread as a pipenv alternative) is a Jazzband member project: https://github.com/jazzband/pip-tools.

For whatever reason, I've never understood the point or actual benefit of virtualenvs.

Having lived more in the JS ecosystem for the last several years, my ideal workflow would be a copy of how the Yarn package manager works:

- Top-level dependencies defined in an editable metadata file

- Transitive dependencies with hashes generated based on the calculated dependency tree

- All dependencies installed locally to the project, in the equivalent of a `node_modules` folder

- All package tarballs / wheels / etc cached locally and committed in an "offline mirror" folder for easy and consistent installation

- Attempting to reinstall packages when they already are installed in that folder should be an almost instantaneous no-op

PEP-582 (adding standard handling for a "__pypackages__" folder) appears to be the `node_modules` equivalent I've wanted, but tooling support doesn't seem to be there yet. I'd looked through several Python packaging tools over the last year, and none of them supported it (including Poetry [0]).

The only tool I can find that really supports PEP-582 atm is `pythonloc` [1], which is really just a wrapper around `python` and `pip` that adds that folder to the path. Using that and `pip-tools` [2], I was able to mostly cobble together a workflow that mimics the one I want. I wrote a `requirements.in` file with my main deps, generated a `requirements.txt` with pinned versions and hashes via `pip-compile`, downloaded and cached the packages using `pip`, and installed them locally with `piploc`.

Admittedly, I've only tried this out once a few weeks ago on an experimental task, but it seemed to work out sufficiently, and I intend to implement that workflow on several of our Python services in the near future.

If anyone's got suggestions on better / alternate approaches, I'd be interested.

[0] https://github.com/python-poetry/poetry/issues/872

[1] https://github.com/cs01/pythonloc

[2] https://github.com/jazzband/pip-tools
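For reference, that workflow might look roughly like the following (the pip-compile and pip download flags are standard; the piploc invocation is from memory of the pythonloc README, so treat it as an assumption):

    pip-compile --generate-hashes requirements.in    # writes requirements.txt with pins and --hash entries
    pip download -r requirements.txt -d .offline-mirror/
    piploc install -r requirements.txt               # pythonloc's pip wrapper; installs into __pypackages__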

I've had a good experience with pip-tools (https://github.com/jazzband/pip-tools/), which takes a requirements.in with loosely pinned dependencies and writes your requirements.txt with the exact versions, including transitive dependencies.

They're typically broken out of the box because they don't pin their dependencies. pip-tools[1] or pipenv[2], plus tox[3] if it's a library, should be considered bare-minimum necessities - if a project isn't using them, consider abandoning it ASAP, since apparently the maintainers don't know what they're doing and haven't paid attention to the ecosystem for years.

[1] https://github.com/jazzband/pip-tools [2] https://docs.pipenv.org/en/latest/ [3] https://tox.readthedocs.io/en/latest/

I tried pipenv, then poetry, then pip-tools [1]. pip-tools worked best for me. I control my own virtualenvs, and can compose the requirements files pip-tools compiles. It's vendored in pipenv, so it's basically the dependency engine for pipenv.

[1] https://github.com/jazzband/pip-tools

Another good one is pip-tools: https://github.com/jazzband/pip-tools (which pipenv initially set out to replace, but does a terrible job of it IMO)

Don't forget that you'll usually start to sort your requirements into dev requirements and production requirements, which makes these packaging scripts much more complicated.

https://github.com/jazzband/pip-tools is what I used before pipenv came to be.
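One common pip-tools layout for that dev/production split, as a sketch (the `-c` constraint pattern is described in the pip-tools docs; the file names and packages are just placeholders, and requirements.in gets compiled first so the constraint file exists):

    # requirements.in            -> pip-compile requirements.in
    django

    # dev-requirements.in        -> pip-compile dev-requirements.in
    -c requirements.txt          # keep dev pins consistent with the production pins
    pytest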

I'm asking as someone with mid-level Python & DS experience, and out of real curiosity (not a "mine is better than yours" thought): could you elaborate on what, in your opinion, makes Anaconda superior? I frequently do data munging & low-level ML things and - so far - I am happier with the pip side of things: a combination of virtualenv, autoenv [1], pip-tools [2], pyup [3] and the rest of the ecosystem.

Standard procedure: two minutes and I have a Jupyter notebook up, smartly automated base processes for dependency management, full control of environment variables, and I can deploy/integrate this in any other Python setup (if I were to port IPython code to pure Python).

Is this more a to-each-their-own thing, or am I missing a crucial advantage of Anaconda?

[1] https://github.com/kennethreitz/autoenv [2] https://github.com/jazzband/pip-tools [3] https://github.com/pyupio/pyup

A hand-written requirements.txt doesn't track transitive dependencies. Your project could break if a dependency decided to swap out its own dependencies. This isn't theoretical either - it happened to us in production.

We are using `pip-tools` to manage that: https://github.com/jazzband/pip-tools

That's great! The PyPA works hard to make pip+PyPI behave intuitively. That said, a lot of applications need more machinery, yielding utilities like pip-tools[0] and conda.

[0]: https://github.com/jazzband/pip-tools