What does HackerNews think of indexing?

The Git source code audit, viewed as a Rust programmer | Jan 2023

Defining a newtype avoids confusing indexes of, say, a graph and a Vec, but doesn't avoid confusing indexes of two graphs

Rust has a design pattern called lifetime branding (also called generativity), which uses phony lifetimes to prevent, at compile time, confusing indexes of two separate collections of the same type. This can also enable disabling out of bounds checks without triggering UB (because, with branding, we can be sure that the index is on bounds at the moment of creating it; essentially we move bounds check from the indexing time to the index creation time)

Here's an earlier mention of that [0] (7 years ago), and here's a crate from 3 years ago, indexing [1] but I'm not sure about recent developments on that.

Now, petgraph doesn't use branding for its index types, so if you have two graphs you can confuse their indexes. On the other hand, petgraph was specifically designed so that you can reuse the node and edge numbering across many graphs (so that if you have a subgraph for example, the nodes and edges share the same ids), in this situation it's kind of hard to use branding

There's another pattern for not confusing index types which is to make different index types for each different collection and make the collection work only with that type; this is done eg. in typed-indexed-collections [2] - but it doesn't use branding so two collections with same index type have interchangeable indexes

Anyway right now this stuff is mostly folklore but I wish it were more used.

[0] https://www.reddit.com/r/rust/comments/3oo0oe/sound_unchecke...

[1] https://github.com/bluss/indexing https://docs.rs/indexing/0.4.1/indexing/

https://crates.io/crates/typed-index-collections https://www.reddit.com/r/rust/comments/hr6xcu/announcing_typ...

Monads and GATs in Nightly Rust | Dec 2020

Does anybody know if GATs would enable to issue indexes/handles tied to a container without the use of closures? This would allow compile-time checks that indexes are only used with their container.

The most advanced crate for such use-case is `indexing` [0] but it is pretty complex inside, requires closures and seems more like an experiment.

[0]: https://github.com/bluss/indexing

Making Wrong Code Look Wrong (2005) | Nov 2019

Expand Context ↕

I think in this particular case they suffice, because they are clearly different subtypes as I've mentioned.

It is much harder, if not impossible, if you have to describe a relation between two or more values of the same subtypes. For example, imagine that you already "know" certain indices never go off the boundary of given array but want to encode that information to types. Sure, there exists a Rust crate [1] that tracks a relation between indices and originating array via lifetime, but it is complicated.

[1] https://github.com/bluss/indexing

Why is Rust slightly slower than C? | Sep 2019

Expand Context ↕

I don't entirely agree. As an analogy, suppose that you want to pass a pointer to an object as a function argument. As the programmer, you expect the object will necessarily remain allocated as long as the function uses it, since it's kept alive elsewhere, but you might have made a mistake. Before Rust, your options would be:

1. Use a C-style raw pointer. This has no overhead but is unsafe.

2. Use a reference counted or garbage collected pointer. This provides safety but is a non-zero-overhead abstraction. In other situations, reference counting can be necessary to manage unpredictable object lifetimes, but in this case we're assuming it would only be used to guard against mistakes, so the cost represents overhead. Even if some language implements reference counting just as fast as you can implement reference counting in C, that doesn't make it zero-overhead.

But Rust adds a new option:

3. Use a borrow-checked reference. This is safe, yet zero-overhead compared to the unsafe option, as long as your usage pattern can be expressed in terms of Rust lifetimes.

Going back to array indexing, the analogous situation is that you want to access an index that you, the programmer, expect to always be in bounds. Option 1 corresponds to unchecked indexing, and option 2 corresponds to bounds-checked indexing. But Rust brings nothing new in this area: there is no option 3.

Yet an option 3 is theoretically possible: safe unchecked indexing if you can prove to the compiler that the index can never be out of bounds. This can be done in languages with dependent types or built-in proof systems. (It can even be done in Rust with a certain neat hack [1], but with a lot of limitations.)

I'm not saying Rust is bad for not having dependent types. As far as I know, it's an open research problem how to make those features feel ergonomic and easy to use, even to the extent that Rust lifetimes are (i.e. not totally). And on the flipside, bounds checks usually don't cause very much overhead in practice, thanks to CPU branch prediction, plus the optimizer can sometimes remove them.

But I'd say that Rust's choice not to go there (...yet...) justifies calling its current approach a non-zero-overhead abstraction.

[1] https://github.com/bluss/indexing

Two Weeks of Rust | Jan 2016

Expand Context ↕

You can pass methods as functions or closures by Type::method. The first argument will be the self argument.

I don't see much of an alternative to panic on bounds checked indexing error, short of not providing indexing at all or developing a system of proven indices that would be extremely unfriendly to newcomers, see this project: https://github.com/bluss/indexing