The Caffeine cache (https://github.com/ben-manes/caffeine) does something similar by caching the future that is returned on look-ups while the value is still being computed.
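For reference, this is roughly how that looks with Caffeine's async API: buildAsync caches the CompletableFuture itself, so concurrent callers for the same key share one in-flight load instead of racing. The expensiveLookup loader below is a hypothetical stand-in for a slow computation.

    import com.github.benmanes.caffeine.cache.AsyncLoadingCache;
    import com.github.benmanes.caffeine.cache.Caffeine;
    import java.util.concurrent.CompletableFuture;

    public class AsyncCacheExample {
      public static void main(String[] args) {
        // The future is stored in the cache while the value is being computed,
        // so a second get("user:42") during that window returns the same future.
        AsyncLoadingCache<String, String> cache = Caffeine.newBuilder()
            .maximumSize(10_000)
            .buildAsync(key -> expensiveLookup(key));

        CompletableFuture<String> first = cache.get("user:42");
        CompletableFuture<String> second = cache.get("user:42");
        first.thenAccept(value -> System.out.println("loaded: " + value));
      }

      // Hypothetical slow loader standing in for a database or network call.
      private static String expensiveLookup(String key) {
        return "value-for-" + key;
      }
    }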
Some clever cache implementations internally use special kinds of Bloom filters, e.g. the Caffeine cache (Java) uses TinyLFU, which "builds upon Bloom filter theory" (from the abstract of the paper): https://github.com/ben-manes/caffeine and https://arxiv.org/pdf/1512.00727.pdf
This is worth a read: https://blog.dgraph.io/post/introducing-ristretto-high-perf-...
CompletableFuture is now supported by a number of other libraries: https://github.com/AsyncHttpClient/async-http-client/, https://github.com/ben-manes/caffeine, and https://github.com/mp911de/lettuce.
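Because these libraries all expose the same CompletableFuture type, their results compose directly with the JDK's combinators. A minimal sketch, where fetchUser and fetchOrders are hypothetical async calls standing in for any of the libraries above:

    import java.util.List;
    import java.util.concurrent.CompletableFuture;

    public class ComposeExample {
      public static void main(String[] args) {
        CompletableFuture<String> user = fetchUser(42);            // e.g. an HTTP call
        CompletableFuture<List<String>> orders = fetchOrders(42);  // e.g. a Redis call

        // thenCombine joins the two independent futures once both complete.
        CompletableFuture<String> summary =
            user.thenCombine(orders, (u, o) -> u + " has " + o.size() + " orders");

        System.out.println(summary.join());
      }

      // Hypothetical async sources; real code would return the libraries' futures.
      private static CompletableFuture<String> fetchUser(int id) {
        return CompletableFuture.supplyAsync(() -> "user-" + id);
      }

      private static CompletableFuture<List<String>> fetchOrders(int id) {
        return CompletableFuture.supplyAsync(() -> List.of("order-1", "order-2"));
      }
    }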
Most policies try to minimize the impact of low-frequency items. For example, SLRU uses a small probation segment that promotes an entry to a protected segment if it is re-accessed, which lets low-frequency items be discarded quickly. ARC uses an adaptive version of this idea.
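A minimal SLRU sketch to illustrate the probation/protected split (illustrative only, not how any particular cache implements it):

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Illustrative SLRU: new entries enter a small probation segment and are only
    // promoted to the protected segment on a second access, so one-hit wonders
    // are evicted quickly.
    class SlruCache<K, V> {
      private final int probationCap;
      private final int protectedCap;
      // accessOrder = true makes iteration order least-recently-used first
      private final LinkedHashMap<K, V> probation = new LinkedHashMap<>(16, 0.75f, true);
      private final LinkedHashMap<K, V> protect = new LinkedHashMap<>(16, 0.75f, true);

      SlruCache(int probationCap, int protectedCap) {
        this.probationCap = probationCap;
        this.protectedCap = protectedCap;
      }

      V get(K key) {
        V value = protect.get(key);        // hit in protected: just refresh recency
        if (value != null) {
          return value;
        }
        value = probation.remove(key);
        if (value != null) {               // second access: promote to protected
          promote(key, value);
        }
        return value;
      }

      void put(K key, V value) {
        if (protect.containsKey(key)) {
          protect.put(key, value);
          return;
        }
        probation.put(key, value);         // new arrivals start on probation
        trimProbation();
      }

      private void promote(K key, V value) {
        protect.put(key, value);
        if (protect.size() > protectedCap) {
          // Demote the protected LRU entry back to probation rather than evicting it.
          Map.Entry<K, V> lru = protect.entrySet().iterator().next();
          protect.remove(lru.getKey());
          probation.put(lru.getKey(), lru.getValue());
          trimProbation();
        }
      }

      private void trimProbation() {
        while (probation.size() > probationCap) {
          K victim = probation.keySet().iterator().next();
          probation.remove(victim);        // evict the probation LRU entry
        }
      }
    }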
LIRS uses a very small region (HIR) that promotes entries to a large region (LIR) based on their inter-reference recency. This allows it to discard many new arrivals while using a large number of ghost entries to retain usage history.
TinyLFU is a probabilistic LFU filter that decides whether to admit an entry by comparing the frequency of the arriving candidate to that of the eviction victim. This is often near optimal, but it can suffer when sparse bursts cause consecutive misses. An admission window (W-TinyLFU) resolves this problem, usually with a window of about 1% of the capacity.
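A rough sketch of the admission idea, using a small count-min style sketch for frequencies. The names and sizes here are made up for illustration; real implementations use compact 4-bit counters and periodic aging as described in the paper.

    import java.util.Random;

    // Illustrative TinyLFU-style admission filter (not Caffeine's actual code).
    class TinyLfuAdmittor {
      private final int[][] counters;   // tiny count-min sketch of access frequencies
      private final int[] seeds;
      private final int sampleLimit;
      private int samples;

      TinyLfuAdmittor(int width, int depth, int sampleLimit) {
        counters = new int[depth][width];
        seeds = new int[depth];
        Random random = new Random(12345);
        for (int i = 0; i < depth; i++) {
          seeds[i] = random.nextInt();
        }
        this.sampleLimit = sampleLimit;
      }

      // Records one access to the key.
      void increment(Object key) {
        for (int i = 0; i < counters.length; i++) {
          counters[i][index(key, i)]++;
        }
        if (++samples >= sampleLimit) {
          reset();                      // periodic halving ages out stale history
        }
      }

      // Estimated frequency is the minimum over the rows (count-min).
      int frequency(Object key) {
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < counters.length; i++) {
          min = Math.min(min, counters[i][index(key, i)]);
        }
        return min;
      }

      // Admit the candidate only if it is accessed at least as often as the victim.
      boolean admit(Object candidate, Object victim) {
        return frequency(candidate) >= frequency(victim);
      }

      private int index(Object key, int row) {
        int h = key.hashCode() ^ seeds[row];
        h ^= (h >>> 16);
        return Math.abs(h % counters[row].length);
      }

      private void reset() {            // halve all counters to track recent popularity
        samples = 0;
        for (int[] row : counters) {
          for (int i = 0; i < row.length; i++) {
            row[i] >>>= 1;
          }
        }
      }
    }

In W-TinyLFU, new arrivals first live in the small LRU window (the ~1% mentioned above) and only face this frequency comparison when evicted from that window.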
I have a simulator and traces that you could easily extend to experiment with. [1]
I recommend reading the TinyLFU paper, as I think it is the best match for your ideas. [2]
[1] https://github.com/ben-manes/caffeine
[2] https://arxiv.org/pdf/1512.00727.pdf