What does HackerNews think of vroom?

VRoom! RISC-V CPU

Language: Verilog

I had VROOM! in mind (https://github.com/MoonbaseOtago/vroom) because I remembered it aims for 4 IPC avg with a width of 8. Though looking again it's 8 compressed 16-bit instructions or 4 uncompressed 32-bit instructions.

So you could argue that a realistic instruction mix isn't going to be all 16-bit but some 16 and some 32, so 8 is rarely achieved in practice, and also that the block diagram only shows 4 decode blocks. But it can in fact peak at 8 instructions decoded per clock, so I'll call that 8-wide decode.

(You could even argue it's especially impressive, since RISC-V technically qualifies as a variable-length encoding like x86, it's just that only the 16- and 32-bit encodings are really in use at the moment)
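The 4-vs-8 arithmetic falls directly out of how RISC-V encodes instruction length: per the ISA spec, an instruction whose two lowest bits are both 1 is 32 bits wide (for the encodings in use today), and anything else is a 16-bit compressed instruction. A minimal sketch of counting instructions in a 16-byte fetch window (illustrative only, not VRoom's actual decoder):

```python
def insn_length(first_byte: int) -> int:
    """Instruction length in bytes, from the RISC-V length encoding:
    low two bits == 0b11 means 32-bit, anything else means a 16-bit
    compressed instruction (ignoring the reserved longer formats)."""
    return 4 if (first_byte & 0b11) == 0b11 else 2

def count_insns(window: bytes) -> int:
    """Count how many instructions fit in a fetch window."""
    count, offset = 0, 0
    while offset < len(window):
        offset += insn_length(window[offset])
        count += 1
    return count

# A 16-byte window holds 8 all-compressed instructions,
# 4 all-32-bit instructions, or somewhere in between for a mix.
print(count_insns(b"\x01\x00" * 8))         # all 16-bit
print(count_insns(b"\x13\x05\x00\x00" * 4)) # all 32-bit
```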

> True performance RISCV designs are going to be people's money maker and never open sourced.

That turns out not to be the case.

Alibaba's C910 core -- roughly comparable to the ARM A72 cores (at the same MHz) in the Pi 4 -- is open sourced. It is being used, at 2.5 GHz, in the upcoming "Roma" laptop. That is rather expensive (for now), but I suspect the same TH1520 SoC will quickly find its way onto cheaper SBCs.

There is a very wide OoO GPL'd RISC-V core that is under development. It is aiming for eventual Apple M1 level performance. The current iteration is falling short of that at the moment, but it's already comparable to the ARM A76 in the latest RK3588 SBCs: https://github.com/MoonbaseOtago/vroom

https://github.com/MoonbaseOtago/vroom

One-man project. It works, right now, at A76 levels. The aim is M1 level, and the bottlenecks to that are known and being addressed.

There is NO WAY one person could do this with a complex ISA such as x86, ARM, or POWER.

You're just making stuff up.

ARM doesn't have any cores that do 8 wide decode. Neither do Intel or AMD. Apple has, but Apple is not ARM and doesn't share their designs with ARM or ARM customers.

Cortex X1 and X2 have 5 wide decode. Cortex A78 and Neoverse N1 have 4 wide decode.

ARM uses compressed encoding in their 32 bit A-series CPUs, for example the Cortex A7, A15 and so on. The A15 is pretty fast, running at up to 2.5 GHz. It was used in phones such as the Galaxy S4 and Note 3 back before 64 bit became a selling point.

Several organisations are making wide RISC-V implementations. Most of them aren't disclosing what they are doing, but one has actually published details of how its 4-8 wide RISC-V decoder works -- they decode 16 bytes of code at a time, which is 4 instructions if they are all 32-bit, 8 if they are all 16-bit, and somewhere in between for a mix.

https://github.com/MoonbaseOtago/vroom

Everything is there, in the open, including the GPL licensed SystemVerilog source code. It's not complex. The decode scheme is modular and extensible to as wide as you want, with no increase in complexity, just slightly longer latency.
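The "modular, extensible to any width" claim can be pictured as a chain: each decode slot's start offset depends only on where the previous instruction ended, so widening the decoder just lengthens the chain rather than adding per-slot complexity. A toy sketch of that idea (an assumption for illustration, not the actual SystemVerilog):

```python
def decode_starts(window: bytes, width: int = 8) -> list[int]:
    """Compute the byte offset of each instruction slot in a fetch
    window. Each slot's offset is the previous offset plus the
    previous instruction's length, so the structure extends to any
    width at the cost of a longer chain (latency), not more
    complexity per slot."""
    starts, offset = [], 0
    for _ in range(width):
        if offset >= len(window):
            break
        starts.append(offset)
        # RISC-V length encoding: low bits 0b11 -> 32-bit, else 16-bit.
        offset += 4 if (window[offset] & 0b11) == 0b11 else 2
    return starts

print(decode_starts(b"\x01\x00" * 8))         # 8 compressed slots
print(decode_starts(b"\x13\x05\x00\x00" * 4)) # 4 full-width slots
```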

There are practical limits to how wide is useful not because you can't build it, but because most code has a branch every 5 or 6 instructions on average. You can build a 20-wide machine if you want -- it just won't be any faster because it doesn't fit most of the code you'll be executing.
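That diminishing return is easy to see with a back-of-the-envelope model (my own toy assumption, not a real pipeline simulation): if each instruction is a taken branch with probability ~1/6 and fetch stops after a branch, the expected useful instructions per cycle flattens out well before width 20.

```python
import random

def avg_useful_fetch(width: int, p_branch: float = 1 / 6,
                     trials: int = 100_000, seed: int = 0) -> float:
    """Monte-Carlo toy model: fetch up to `width` instructions per
    cycle, but stop after a branch (probability p_branch per
    instruction). Returns the average number actually fetched."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        fetched = 0
        for _ in range(width):
            fetched += 1
            if rng.random() < p_branch:
                break
        total += fetched
    return total / trials

for width in (4, 8, 20):
    print(width, round(avg_useful_fetch(width), 2))
```

With a branch every ~6 instructions, going from 8-wide to 20-wide gains barely one useful instruction per cycle, which is the point being made above.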