Does anyone know of any good test suites we can use to benchmark these local models? It would be really interesting to compare all the ones capable of running on consumer hardware so that users can easily choose the best ones to use. Currently, I'm a bit unsure how this compares to the Alpaca model released a few weeks ago.