Don't do image hashing please. Do this:

1. Get CLIP embeddings for text & images
2. Put them in a vector database (Pinecone.io or something similar)
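
To make the second step concrete, here's a rough sketch of what a vector database does at query time: store unit-normalized embeddings and rank them by cosine similarity to the query. Everything in it is an assumption for illustration. `TinyVectorIndex` is a made-up brute-force stand-in, not Pinecone's API, and the 3-dimensional vectors are toy values in place of real CLIP embeddings (which have hundreds of dimensions and come from running the model).

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


class TinyVectorIndex:
    """Hypothetical brute-force, in-memory stand-in for a vector database."""

    def __init__(self):
        self._items = []  # list of (item_id, unit_vector)

    def add(self, item_id, embedding):
        # Store the normalized embedding alongside its identifier.
        self._items.append((item_id, normalize(embedding)))

    def search(self, query_embedding, k=3):
        # Score every stored vector against the query, highest similarity first.
        q = normalize(query_embedding)
        scored = [
            (sum(a * b for a, b in zip(q, v)), item_id)
            for item_id, v in self._items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(item_id, score) for score, item_id in scored[:k]]


# Toy vectors standing in for CLIP image embeddings (assumed values).
index = TinyVectorIndex()
index.add("cat.jpg", [0.9, 0.1, 0.0])
index.add("dog.jpg", [0.1, 0.9, 0.0])
index.add("car.jpg", [0.0, 0.1, 0.9])

# Toy vector standing in for the CLIP embedding of a text or image query.
results = index.search([1.0, 0.2, 0.0], k=2)
for item_id, score in results:
    print(item_id, round(score, 3))
```

Because CLIP maps text and images into the same space, the query vector can come from either one. A real vector database replaces the linear scan with an approximate nearest-neighbor index so it scales past a few thousand items.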

It's unreasonably effective. Check out this search engine: https://same.energy/

I'd take a look at the open source project https://github.com/marqo-ai/marqo instead. It does steps 1 and 2 out of the box. You can use CLIP or really any model you want.