What does HackerNews think of msgspec?

A fast serialization and validation library, with builtin support for JSON, MessagePack, YAML, and TOML

Language: Python

#23 in JSON
#164 in Python
+1 for litestar[1]. The higher bus-factor is nice, and I like that they're working to embrace a wider set of technologies than just pydantic. The framework currently lets you model objects using msgspec[2] (they actually use msgspec for all serialization), pydantic, or attrs[3], and the upcoming release adds some new mechanisms for handling additional types. I really appreciate the flexibility in modeling APIs; not everything fits well into a pydantic shaped box.

[1]: https://litestar.dev/

[2]: https://github.com/jcrist/msgspec

[3]: https://www.attrs.org/en/stable/

If you like cattrs, you _might_ be interested in trying out my msgspec library [1].

It works out-of-the-box with attrs objects (as well as its own faster `Struct` types), while being ~10-15x faster than cattrs for encoding/decoding/validating JSON. The hope is it's easy to integrate msgspec with other tools (like attrs!) rather than forcing the user to rewrite code to fit the new validation/serialization framework. It may not fit every use case, but if msgspec works for you it should be generally an order-of-magnitude faster than other Python options.

[1]: https://github.com/jcrist/msgspec

> Maybe it was very slow before

That is at least partly the case. I maintain msgspec[1], another Python JSON validation library. Pydantic V1 was ~100x slower at encoding/decoding/validating JSON than msgspec, which was more a testament to Pydantic's performance issues than msgspec's speed. Pydantic V2 is definitely faster than V1, but it's still ~10x slower than msgspec, and up to 2x slower than other pure-python implementations like mashumaro.

Recent benchmark here: https://gist.github.com/jcrist/d62f450594164d284fbea957fd48b...

[1]: https://github.com/jcrist/msgspec

While it's definitely much faster than pydantic V1 (which is a huge accomplishment!), it's still not exactly what I'd call "fast".

I maintain msgspec (https://github.com/jcrist/msgspec), a serialization/validation library which provides similar functionality to pydantic. Recent benchmarks of pydantic V2 against msgspec show msgspec is still 15-30x faster at JSON encoding, and 6-15x faster at JSON decoding/validating.

Benchmark (and conversation with Samuel) here: https://gist.github.com/jcrist/d62f450594164d284fbea957fd48b...

This is not to diminish the work of the pydantic team! For many users pydantic will be more than fast enough, and is definitely a more feature-filled tool. It's a good library, and people will be happy using it! But pydantic is not the only tool in this space, and rubbing some rust on it doesn't necessarily make it "fast".

While I agree that there are ways to write a faster validation library in python, there are also benefits to moving the logic to native code.

msgspec[1] is another parsing/validation library, written in C. It's on average 50-80x faster than pydantic for parsing and validating JSON [2]. This speedup is only possible because we make use of native code, letting us parse JSON directly and efficiently into the proper python types, removing any unnecessary allocations.

It's my understanding that pydantic V2 currently doesn't do this (they still have some unnecessary intermediate allocations during parsing), but having the validation logic already in compiled code makes integrating this with the parser theoretically possible later on. With the logic in python this efficiency gain wouldn't be possible.

[1]: https://github.com/jcrist/msgspec

[2]: https://jcristharif.com/msgspec/benchmarks.html#benchmark-sc...

Someone recommended here msgspec as a Pydantic alternative for serialization/validation and wow. It is fantastic. I really recommend it https://github.com/jcrist/msgspec
Congratulations to the team, Pydantic is an amazing library.

If you find JSON serialization/deserialization a bottleneck, another interesting library (with much less features) for Python is msgspec: https://github.com/jcrist/msgspec

If you're primarily targeting Python as an application layer, you may also want to check out my msgspec library[1]. All the perf benefits of e.g. yyjson, but with schema validation like pydantic. It regularly benchmarks[2] as the fastest JSON library for Python. Much of the overhead of decoding JSON -> Python comes from the python layer, and msgspec employs every trick I know to minimize that overhead.

[1]: https://github.com/jcrist/msgspec

[2]: https://github.com/TkTech/json_benchmark

Integrating `dataclass` more into the language builtins might be nice, if only that it may allow/encourage a more native and performant implementation. Using a dataclass right now results in slightly slower class operations than handwritten types, and much slower import times.

I maintain another dataclass-like library[1] that's written fully as a C extension. Moving this code to C means these types are typically 5-10x faster for common operations[2]. It'd be nice if the builtin dataclasses were equally performant.

[1]: https://github.com/jcrist/msgspec

[2]: https://jcristharif.com/msgspec/benchmarks.html#benchmark-st...