What does HackerNews think of pex?

A tool for generating .pex (Python EXecutable) files, lock files and venvs.

Language: Python

We get (very) close to cross-environment reproducible builds for Python with https://github.com/pantsbuild/pex (via Pants). For instance, we build Linux x86-64 artifacts that run on AWS Lambda, and can build them natively on ARM macOS. (As pointed out elsewhere, wheels are an important part of this.)

This is not raw requirements.txt, but isn’t too far off: Pants/PEX can consume one to produce a hash-pinned lock file.
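
With a recent Pex, producing one looks roughly like this (a sketch; the `pex3 lock` subcommand and its flags vary by Pex version):

    pex3 lock create -r requirements.txt -o lock.json   # hash-pinned lock file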

Don't know if I agree about the go-to thing, but there are actually a number of options now for delivering varying degrees of self-contained Python executables.

When I evaluated the landscape a few years ago, I settled on PEX [1] as the solution that happened to fit my use case best: it uses a system-provided Python + stdlib, but otherwise brings everything (including compiled modules) with it in a self-extracting executable. Other popular options include pyinstaller and cx_freeze, which have different tradeoffs as far as size, speed, convenience, etc.

[1]: https://github.com/pantsbuild/pex
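
To make that concrete, a minimal sketch (assumes `pip install pex`; names are illustrative):

    pex requests -o my_env.pex   # bundle requests and its deps into one file
    ./my_env.pex                 # runs a Python REPL with requests importable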

Somewhat related, I had a devil of a time a little bit ago trying to ship a small Python app as a fully standalone environment runnable on "any Linux" (but for practical purposes, Ubuntu 16.04, 18.04, and 20.04). It turns out that if you don't want to use pip, and you don't want to build separate bundles for different OSes and Python versions, it can be surprisingly tricky to get this right. Just bundling the whole interpreter doesn't work either because it's tied to a particular stdlib which is then linked to specific versions of a bunch of system dependencies, so if you go that route, you basically end up taking an entire rootfs/container with you.

After evaluating a number of different solutions, I ended up being quite happy with pex: https://github.com/pantsbuild/pex

It basically bundles up the wheels for whatever your workspace needs, and then ships them in an archive with a bootstrap script that can recreate that environment on your target. But critically, it natively supports targeting multiple OS and Python versions; you just explicitly tell it which ones to include, e.g.:

    --platform=manylinux2014_x86_64-cp-38-cp38   # 20.04
    --platform=manylinux2014_x86_64-cp-36-cp36m  # 18.04
    --platform=manylinux2014_x86_64-cp-35-cp35m  # 16.04
Docs on this: https://pex.readthedocs.io/en/latest/buildingpex.html#platfo...
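
Put together, a multi-platform build might look like this (a sketch; `myutil` is a hypothetical console script, and prebuilt wheels must exist for each target platform):

    pex . -r requirements.txt \
        --platform=manylinux2014_x86_64-cp-35-cp35m \
        --platform=manylinux2014_x86_64-cp-36-cp36m \
        --platform=manylinux2014_x86_64-cp-38-cp38 \
        -c myutil -o myutil.pex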

And you can see the tags in use for any package on PyPI which ships compiled parts, e.g.: https://pypi.org/project/numpy/#files

I don't know that this would be suitable for something like a game, but in my case for a small utility supporting a commercial product, it was perfect.

Nuitka compiles Python to C, so it can make your code run faster, but you're no longer using the tried-and-tested CPython interpreter, so you may run into Nuitka-specific issues. (I haven't used it and don't know how common those are; I assume it's usable, though.)

PyOxidizer and pyinstaller are bundlers, rather: they build a CPython interpreter and all your dependencies into a single binary. There is also https://github.com/pantsbuild/pex, which is a bit like jar files for Java: it bundles your code and dependencies into a single executable file (the target machine still needs a compatible Python interpreter).

I wish it had a better async story. We've had production outages that were difficult to debug because some third-party library made a sync call deep in the call stack and starved the event loop. APM showed performance degradation in unrelated endpoints. Eventually health checks began failing and containers were killed, putting the load on other containers, which inevitably fell over, and so on. We've seen similar issues with CPU starvation due to CPU-intensive tasks as well, though those were more straightforward to debug. We also continue to see runtime type errors because someone forgot to await an async function: `rsp = aiohttp.get() # oops, rsp is the coroutine, not the response!`.
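
A minimal sketch of that forgotten-await failure mode (illustrative names, not the actual production code):

    import asyncio

    async def get_data() -> str:
        await asyncio.sleep(0.1)  # stand-in for a real network call
        return "payload"

    async def main() -> None:
        rsp = get_data()    # oops: missing `await`; rsp is a coroutine object
        print(rsp.upper())  # AttributeError at runtime, far from the mistake

    asyncio.run(main())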

As far as deployment goes, we've had good luck with pex files (executable zip files containing everything but the interpreter). https://github.com/pantsbuild/pex. Your deployment target still needs the right version of the Python interpreter and .so files that your dependencies might link against.

Personally, I've found that Go solves most/all Python issues without introducing too many of its own--and anyone writing Python (sans mypy) doesn't get to chastise Go for lacking generics! :)

It's easy to make self-contained scripts with dependencies using Pex: https://github.com/pantsbuild/pex/

Here we go :)

Packaging, the easy way

Because I'm not on a blog, I can't go into too much detail, and I'm sorry about that. It would be better to take more time on each point, but use these as starting points. I'll assume you know what virtualenv and pip are. If you don't, check a tutorial on them first; it's important.

But I'm going to go beyond packaging, because it will make your life much easier. If you want to skip the context, just go to the short setup.cfg section.

1 - Calling Python

Lots of tutorials tell you to use the "python" command. But in reality, there are often several versions of Python installed, or worse, the "python" command is not available.

WINDOWS:

If the python command is not available, uninstall Python and install it again (using the official installer), this time making sure that the "Add Python to PATH" box is ticked. Or manually add the directory containing "python.exe", and its sibling "Scripts" directory, to the system PATH (check a tutorial on that). Then restart the console.

Also, unrelated, but use a better console. cmd.exe sucks. cmder (https://cmder.net/) is a nice alternative.

Then, don't use the "python" command on Windows. Use the "py -x.y" command instead: it lets you choose which version of Python you call. So "py -2.7" calls Python 2.7 (if installed) and "py -3.6" calls Python 3.6. Every time you see a tutorial telling you to do "python this", replace it mentally with "py -x.y this".

UNIXES (macOS, Linux, etc.):

Python is suffixed. Don't just call "python"; call "pythonX.Y". E.g.: "python2.7" to run Python 2.7 and "python3.6" to run Python 3.6. Every time you see a tutorial telling you to do "python this", replace it mentally with "pythonX.Y this". Not "pythonX". Not "python2" or "python3". Insist on being precise: "python2.7" or "python3.6".

LINUX:

pip and virtualenv are often NOT installed with Python, because of distro packaging policies. Install them with your package manager for each version of Python. E.g.: "yum install python3.6-pip" or "apt install python3.6-venv".

FINALLY, FOR ANY OS:

Use "-m". Don't call "pip", but "python -m pip". Don't call "venv", but "python -m venv". Don't call poetry but "python -m poetry." Which, if you follow the previous advices, will lead to things like "python3.6 -m pip" or "py -3.6 -m pip". Replace it mentally in tutorials, including this one.

This will solve all PATH problems (no .bashrc or Windows PATH fiddling :)) and will force you to say which Python version you're using. It's a good thing.
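
For instance (with requests as a stand-in package):

    python3.6 -m pip install requests    # unix
    py -3.6 -m pip install requests      # windows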

In any case, __use a virtualenv as soon as you can__. Use virtualenv for everything. One per project. One for testing. One for fun. They are cheap. Abuse them.
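
Creating and activating one with the stdlib venv module looks like this:

    python3.6 -m venv my_env        # create it
    source my_env/bin/activate      # activate it (unix)
    my_env\Scripts\activate         # activate it (windows)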

In the virtualenv you can discard all the above advice: you can call "python" without any "py -x.y" or suffixes, and you can call "pip" or "poetry" without "-m", because the PATH is set correctly and the default version of Python is the one you want.

But there are some tools you will first install outside of a venv, such as pew, poetry, etc. For those, use "-m" AND "--user". E.g.:

    "python -m pip install poetry --user"
    "python -m poetry init"
This solves PATH problems and Python version problems, doesn't require admin rights, and avoids messing with system packages. Do NOT use "sudo pip" or "sudo easy_install".

2 - Using requirements.txt

You know the "pip install stuff", "pip freeze > requirements.txt", "pip install -r requirements.txt" ?

It's fine. Don't be ashamed of it. It works, it's easy.

I still use it when I want to make a quick prototype, or just a script.
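
The whole cycle is three commands:

    pip install requests                # or whatever you need
    pip freeze > requirements.txt       # snapshot the exact versions
    pip install -r requirements.txt     # recreate the env elsewhere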

As a bonus, you can bundle a script and all its dependencies with a tool named "pex" (https://github.com/pantsbuild/pex):

    pex . -r requirements.txt -o resulting_bundle.pex --python pythonX.Y -c your_script.py -f dist --disable-cache
 
It's awesome, and allows you to use as many 3rd-party dependencies as you want in a quick script. Pex it, send it, "python resulting_bundle.pex", and it runs :)

3 - Using Setup.cfg

At some point you may want to package your script, and distribute it to the world. Or maybe just make it pip installable from your git repo.

Let's say you have this layout for your project:

    root_project_dir/
    ├── your_package
    ├── README.md
Turn it into:

    root_project_dir/
    ├── your_package
    ├── README.md
    ├── setup.cfg
    ├── setup.py

And you are done. Setup.py needs only one line; it's basically just a way to call setuptools to do the job (it replaces the poetry or pipenv command, in a way):

    from setuptools import setup; setup()
Setup.cfg will contain the metadata of your package (like a package.json or a pyproject.toml file):

    [metadata]
    name = your_package
    version = attr: your_package.__version__
    description = What does it do?
    long_description = file: README.md
    long_description_content_type = text/markdown
    author = You
    author_email = you@stuff.com
    url = https://stuff.com
    classifiers = # not mandatory but the full list is here: https://pypi.org/pypi?%3Aaction=list_classifiers
        Intended Audience :: Developers
        License :: OSI Approved :: MIT License
        Programming Language :: Python :: 3.5
        Programming Language :: Python :: 3.6
        Programming Language :: Python :: 3.7
        Topic :: Software Development :: Libraries

    [options]
    packages = your_package
    install_requires =
        requests>=0.13 # or whatever

    [options.package_data] # non-python files you want to include
    * = *.txt, *.rst, *.jpg
    hello = *.msg

    [options.extras_require]
    dev = pytest; jupyter # stuff you use for dev
You can find all the fields available in the setup.cfg here: https://setuptools.readthedocs.io/en/latest/setuptools.html#...

Setup.cfg has been supported for 2 years now. It's supported by pip, tox, and all the legacy infrastructure.

Now, during dev you can do "pip install -e root_project_dir". This will install your package, but the "-e" option makes it work in "dev mode", which allows you to import it and see the modifications you make to the code without reinstalling it every time. "setup.py develop" works too.

If you publish it on github, you can now install your package doing:

    pip install git+https://github.com/path/to/git/repo.git
You can also create a wheel out of it doing:

    python setup.py bdist_wheel
The wheel will be in the "dist" dir.

Anybody can then "pip install your_package.whl" to install it. Mail it, upload it to an FTP server, Slack it...

If you want to upload it to PyPI, create an account on the site, then "pip install twine" so you can do:

    twine upload dist/*
Read the twine doc though, it's worth it: https://pypi.org/project/twine/

You could use "python setup.py bdist_wheel upload" instead of twine. It will work, but it's deprecated.

4 - Using pew

Pew (https://github.com/berdario/pew#usage) is an alternative to venv, poetry, virtualenvwrapper and pipenv.

It does very little.

    "pew new env_name --python python3.X"
Creates the virtualenv.

    "pew workon env_name"
Activates it, and optionally moves you to a directory of your choice.

That's all. It's just a way to make managing virtualenvs easier. Use pip as usual.

You can find out where your virtualenv was created by looking at the $VIRTUAL_ENV var.

This is especially useful for configuring your IDE, although I tend to just type "which python" on unix, and "where python" on Windows.
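
For example, inside an activated virtualenv:

    echo $VIRTUAL_ENV    # unix: path of the active virtualenv
    which python         # unix
    where python         # windows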

5 - Using poetry

Now, if you need more reliability, poetry enters the game. Poetry will manage the virtualenv for you, install packages into it automatically, check all dependencies in a fast and reliable way (better than pip), create a lock file, AND update your package metadata file.

I'm not going to go into details on how this works; it's a great tool with a good doc: https://github.com/sdispater/poetry
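
A typical first session, following the "-m" convention from earlier (a sketch; check the docs for the current commands):

    python3.6 -m pip install --user poetry
    python3.6 -m poetry new my_project    # scaffolds a project with pyproject.toml
    cd my_project
    python3.6 -m poetry add requests      # resolves, installs, updates the lock file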

You don't need to start with poetry. You can always migrate to it later. Most of my projects do the "requirements.txt" => "setup.cfg" migration at some point. Some of them move to poetry if I need the extra professionalism it provides.

The problem with poetry is that it's only compatible with poetry. It uses the pyproject.toml format, which is supposedly standard now but is unfinished. Because of this, any tool using it, including poetry, actually stores most data in custom proprietary fields in the file :( Also, it's not compatible with setuptools, which many infrastructures and tutorials assume, so you'll have to adapt to it.

That being said, it's a serious and robust tool.

Nuitka (nuitka.net) does that for Python, and is very robust and compatible. It works way better than py2exe, cx_freeze, etc., and is cross-platform.

However, the result, even for small scripts, is not what I would qualify as "small" :)

I also like pex (https://github.com/pantsbuild/pex), which lets you bundle a whole venv as a single Python script. It's less heavyweight than Nuitka (the result isn't standalone, though), and very handy for quick-and-dirty scripts I want to one-shot on my servers.

That doesn't take any merit away from Nim. Such a cool project.

I don't know if it'd work the same way, but I've had a lot of success with Twitter's Pex files. They package an entire Python project into an archive with autorun functionality. You distribute a Pex file, users run it just like a Python file, and it builds/installs dependencies etc. before running the main script in the package.

I used it to distribute dependencies to YARN workers for PySpark applications, and it worked flawlessly, even with crazy dependencies like tensorflow. I'm a really big fan of the project; it's well done.

https://github.com/pantsbuild/pex
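
For reference, the usual trick is that a PEX file can stand in for the Python interpreter itself, so you can point PySpark at it. A sketch of a commonly described setup (not the poster's exact config; `deps.pex` and `my_job.py` are illustrative names):

    spark-submit \
        --master yarn \
        --files deps.pex \
        --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./deps.pex \
        --conf spark.executorEnv.PYSPARK_PYTHON=./deps.pex \
        my_job.py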