Sort of off-topic, but does anyone know if folks are working on combining vision and natural language in one model? I think that could yield some interesting results.