Don't do image hashing please. Do this:
1. Get CLIP embeddings for text & images
2. Put them in a vector database (Pinecone.io or something similar)
It's unreasonably effective. Check out this search engine: https://same.energy/
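For anyone who wants to see the shape of it, here's a rough sketch of the two steps, not a production setup. Assumptions on my part: `clip_embed` uses the sentence-transformers package with its `clip-ViT-B-32` checkpoint (any CLIP wrapper would do), and `TinyVectorIndex` is a made-up in-memory stand-in for a real vector database like Pinecone, doing brute-force cosine similarity.

```python
import math


def clip_embed(items):
    """Step 1: embed a list of texts (str) or PIL images with CLIP.

    Assumption: sentence-transformers is installed. It is imported lazily
    so the index below can be tried without it.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("clip-ViT-B-32")
    return [[float(x) for x in row] for row in model.encode(items)]


def _unit(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


class TinyVectorIndex:
    """Step 2: hypothetical stand-in for a vector database (brute-force search)."""

    def __init__(self):
        self._vectors = {}  # item id -> unit-length embedding

    def upsert(self, item_id, vector):
        """Insert a vector, or overwrite the one already stored under item_id."""
        self._vectors[item_id] = _unit(vector)

    def query(self, vector, top_k=3):
        """Return up to top_k (item_id, cosine_similarity) pairs, best first."""
        q = _unit(vector)
        scored = [
            (item_id, sum(a * b for a, b in zip(q, v)))
            for item_id, v in self._vectors.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


# Toy 2-d vectors so this runs without downloading a model.
index = TinyVectorIndex()
index.upsert("cat.jpg", [1.0, 0.0])
index.upsert("dog.jpg", [0.0, 1.0])
index.upsert("kitten.jpg", [0.9, 0.1])
print(index.query([1.0, 0.0], top_k=2))
```

With real data you'd call `clip_embed` on your images, upsert each result under its file name, then embed a text query (or another image) the same way and pass it to `query`. Text and images land in the same embedding space, which is why both text-to-image and image-to-image search work. A real vector database replaces the brute-force loop with an approximate nearest-neighbor index once the collection gets large.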
I'd take a look at the open source project https://github.com/marqo-ai/marqo instead. It does steps 1 and 2 out of the box, and you can use CLIP or really any model you want.