Is it possible to do inference with Falcon 40B on this type of hardware or similar?

Yes, but it would be massive overkill. Falcon 40B takes ~35 GB of VRAM to load now, and will probably need less in the future as quantization in llama.cpp and the like improves.

https://huggingface.co/TheBloke/falcon-40b-instruct-GPTQ
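For anyone who wants to try it, here's a rough sketch of loading that checkpoint with AutoGPTQ. The exact arguments are my assumptions; the model card has the authoritative instructions.

```python
# Sketch: load TheBloke/falcon-40b-instruct-GPTQ with AutoGPTQ (untested here).
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "TheBloke/falcon-40b-instruct-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device="cuda:0",          # GPTQ kernels for this checkpoint are CUDA-only
    use_safetensors=True,
    trust_remote_code=True,   # Falcon shipped custom modeling code at the time
)

prompt = "Explain GPTQ quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```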

Large context size is becoming less of an issue now too.

But maybe it would be good for batched inference?
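Roughly what I have in mind, reusing the model and tokenizer from the loading sketch above (the pad-token and left-padding handling are assumptions on my part, since Falcon's tokenizer doesn't define a pad token):

```python
# Sketch: batched generation with the GPTQ Falcon model loaded above.
prompts = [
    "Summarize the Falcon 40B model in one line.",
    "What is GPTQ quantization?",
]
tokenizer.pad_token = tokenizer.eos_token   # assumption: reuse EOS as padding
tokenizer.padding_side = "left"             # left-pad so generation continues from each prompt's end
batch = tokenizer(prompts, return_tensors="pt", padding=True).to("cuda:0")
outputs = model.generate(**batch, max_new_tokens=64)
for out in outputs:
    print(tokenizer.decode(out, skip_special_tokens=True))
```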

I mean, the 35 GB version of Falcon is maybe not something you'd want to use in production.

Also ironically, this version of Falcon will require CUDA.

It might work on ROCm? I am not sure about the status of GPTQ on ROCm.
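A quick sanity check of which backend your PyTorch build targets (this says nothing about whether the GPTQ kernels themselves build on ROCm):

```python
import torch

# ROCm builds of PyTorch report themselves through the CUDA API,
# so cuda.is_available() can be True on AMD hardware too.
print(torch.cuda.is_available())
print(torch.version.cuda)   # CUDA toolkit version, or None on a ROCm build
print(torch.version.hip)    # HIP/ROCm version, or None on a CUDA build
```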

GPTQ for LLaMA models with ROCm works via https://github.com/turboderp/exllama/, but Falcon inference is a different beast.