I just tried some pre-trained models to classify images on a CPU, and it takes about 10 to 15 seconds per image.
I did not know it was so inefficient.
Would you share some details? This indeed sounds very slow, so I suppose there should be some easy ways to speed things up.
Just to give you an idea of what's possible: a couple of years ago I worked on live object recognition and classification (using Python and TensorFlow) and got to roughly 30 FPS on an Nvidia Jetson Nano (i.e. using the GPU) and still about 12 FPS on an average laptop (using only the CPU).
I'm already using cfg.apply_low_vram_defaults() and interrogate_fast().
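In case it helps, this is roughly how I'm calling it (a minimal sketch assuming the clip-interrogator package; the exact model tag, device setting, and image path are just placeholders on my side):

```python
from PIL import Image
from clip_interrogator import Config, Interrogator

# Assumed setup: clip-interrogator with a smaller OpenCLIP model, forced to CPU.
cfg = Config(clip_model_name="ViT-B-32/laion400m_e32", device="cpu")  # placeholder model tag
cfg.apply_low_vram_defaults()  # reduce memory use at some cost in quality

ci = Interrogator(cfg)

image = Image.open("example.jpg").convert("RGB")  # placeholder image path
print(ci.interrogate_fast(image))  # faster, lower-quality variant of interrogate()
```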
I also tried lighter models like ViT-B-32/laion400m and others, but they are all very slow to load and to run (model list: https://github.com/mlfoundations/open_clip).
I'm desperately looking for something more modest and light.
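For what it's worth, this is the kind of minimal zero-shot classification test I've been timing on CPU, to separate model loading from the per-image cost (a sketch assuming the open_clip_torch package; the model/pretrained tags, labels, and image path are only examples):

```python
import time

import torch
import open_clip
from PIL import Image

# Assumption: ViT-B-32 with the laion400m_e32 weights is one of the smaller
# pretrained combos listed in the open_clip README.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion400m_e32"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

labels = ["a photo of a cat", "a photo of a dog"]  # example labels
text = tokenizer(labels)
image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # placeholder path

with torch.no_grad():
    start = time.perf_counter()
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)
    print(f"inference: {time.perf_counter() - start:.2f}s", probs)
```

Even with this stripped-down loop, most of the wall-clock time on my machine goes to loading the weights rather than the per-image encode, which is why I'm hoping there's a lighter option.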