Is the main performance difference between JS and the others (Perl, Ruby, PHP) the fact that JS is JIT-compiled and the others are not? I mean, JS used to be slowish until V8 came along. What is V8 doing? Why can't Python / Ruby do the same thing?
Not sure - Google pouring money into faster JavaScript execution is probably a big reason, but maybe JavaScript's comparative simplicity and fewer built-ins make it easier to get more out of it?
My understanding is that the main obstacle to higher-performance Python is the huge value of preexisting extensions written in C. Maintaining compatibility with existing C extensions, or at least minimizing the porting effort for such extensions, puts a lot of constraints on the solution space.
I think one could radically change the way Python objects work internally, and have the C foreign function interface (FFI) wrap every object passed to a C extension in an API/ABI-preserving facade (which itself would wrap any objects returned from its methods). However, this would probably greatly slow down C extensions, which are often performance-critical sections of Python applications. It's also possible that there are portions of the C extension API that expose enough details of object internals to even make such facades herculean to implement. (I've only written some small simple C extensions and am not very familiar with the API.)
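To make the facade idea concrete, here is a rough pure-Python analogy. It is only a sketch: the real thing would live at the C API level, and Facade, wrap and Node are hypothetical names made up for illustration. The point it shows is that every attribute access and every return value crossing the boundary goes through an extra layer, which is where the slowdown would come from.

```python
class Facade:
    """Hypothetical wrapper presenting the 'old' interface of a target object."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Only reached for names not found on the facade itself.
        value = getattr(self._target, name)
        if callable(value):
            # Wrap whatever the method returns, so results stay behind a facade.
            def wrapped_call(*args, **kwargs):
                return wrap(value(*args, **kwargs))
            return wrapped_call
        return wrap(value)


def wrap(value):
    # Simple values pass through; everything else gets its own facade.
    if isinstance(value, (int, float, str, bytes, bool, type(None), Facade)):
        return value
    return Facade(value)


class Node:
    """Stand-in for an object whose internals have changed."""

    def __init__(self, label, child=None):
        self.label = label
        self.child = child

    def get_child(self):
        return self.child


leaf = Node("leaf")
root = wrap(Node("root", leaf))
child = root.get_child()   # comes back wrapped, not as the raw Node
```

Each call through root costs an extra lookup, an extra function call and possibly an extra allocation, which is the kind of overhead that would hurt performance-critical extensions.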
V8 didn't have to deal with API/ABI compatibility with any preexisting C extensions that may have made too many abstraction-violating assumptions about how objects and the VM worked.
Breaking too many important C extensions would almost certainly send Python the way of Perl 6.
Edit: as an aside, a big difficulty with JS is that objects can have their prototype changed arbitrarily at runtime. In Python, even with metaclass programming, the class of an object is rarely changed after creation (assigning to __class__ is possible, but restricted), which makes it easier to cache/memoize dynamic method dispatch. On the other hand, a high-performance implementation of Python's bound methods requires a bit more flow analysis than you need in JS. In Python, if you write f = x.y and y is a method, f is a "bound method" (a closure that ensures x is passed as "self" to y). It's expensive to create a closure for each and every method invocation, so a high-performance implementation would need to do a bit of static analysis to identify which method lookups are used purely for invocation, and which lookups need to create the closure because the method itself is passed around or stored in a variable.
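A small illustration of the bound-method point, as it behaves in CPython 3 (Greeter is just a made-up example class). The lookup x.greet alone, without any call, produces a bound method object that carries x along with it:

```python
class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return "hello " + self.name


x = Greeter("world")

# The lookup by itself creates a bound method that remembers x.
f = x.greet
bound_to_x = f.__self__ is x                      # the receiver is stored
wraps_plain_function = f.__func__ is Greeter.greet  # the underlying function

# A second lookup, while f is still alive, yields a distinct object
# that nevertheless compares equal to the first.
g = x.greet
distinct_objects = f is not g
compare_equal = f == g

# Calling through the stored variable behaves like a direct call.
result_via_variable = f()
result_direct = x.greet()
```

In x.greet() the bound method is created and immediately thrown away, which is the case an optimizing implementation would want to detect; in f = x.greet the object escapes into a variable, so it has to be materialized.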
HPy is building an API abstraction layer designed to work with both the CPython API and JITs. However, IIUC they are not proposing any changes to CPython itself, but rather aiming to provide a smaller API surface and fewer JIT impedance mismatches when extensions are built against something other than CPython. The lead developer is a longtime PyPy developer.