iTerm2 author here.
I'll spend some time looking into iTerm2's latency. I'm sure there is some low-hanging fruit here. But there have also been a handful of complaints that latency was too low: when you hit return at the shell prompt, the next frame drawn should include the next shell prompt, not the cursor on the next line before the new shell prompt has been read. So it's tricky to get right, especially considering how slow macOS's text drawing is.
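To make that trade-off concrete, here is a rough sketch of one possible redraw policy: wait for output to go quiet for a few milliseconds before drawing, but cap the total delay. This is not iTerm2's actual code. The class name, method names, and both thresholds are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RedrawScheduler:
    """Hypothetical policy: draw once output has settled, or once we've waited too long."""

    quiet_ms: int = 5       # assumed: silence that suggests the program finished writing
    max_delay_ms: int = 30  # assumed: upper bound on latency added by waiting
    first_dirty_ms: Optional[int] = None  # arrival time of the oldest undrawn output
    last_output_ms: Optional[int] = None  # arrival time of the newest undrawn output

    def on_output(self, now_ms: int) -> None:
        """Record that new bytes arrived from the pty at now_ms."""
        if self.first_dirty_ms is None:
            self.first_dirty_ms = now_ms
        self.last_output_ms = now_ms

    def should_draw(self, now_ms: int) -> bool:
        """Return True if a frame should be drawn at now_ms."""
        if self.first_dirty_ms is None or self.last_output_ms is None:
            return False  # nothing new to show
        settled = now_ms - self.last_output_ms >= self.quiet_ms
        overdue = now_ms - self.first_dirty_ms >= self.max_delay_ms
        return settled or overdue

    def did_draw(self) -> None:
        """Reset state after a frame has been drawn."""
        self.first_dirty_ms = None
        self.last_output_ms = None
```

With these numbers, a newline echoed at t=0 and a prompt arriving at t=3 ms end up in the same frame, because nothing is drawn until output has been quiet for 5 ms. A program that writes continuously still gets a frame within about 30 ms. Pick the quiet period too small and you draw the bare cursor first; pick it too large and typing feels laggy.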
If I could draw a whole frame in a reasonable amount of time, this problem would be much easier! But I can't. Using Core Text, it can easily take over 150ms to draw a single frame for a 4K display on a 2015 MacBook Pro. The deprecated Core Graphics API is significantly faster, but it does a not-so-great job at anything but ASCII text, doesn't support ligatures, etc.
Using layers helps on some machines and hurts on others. You also lose the ability to blur the contents behind the window, which is very popular. It also introduces a lot of bugs—layers on macOS are not as fully baked as they are on iOS. So this doesn't seem like a productive avenue.
How is Terminal.app as fast as it is? I don't know for sure. I do know that they ditched NSScrollView. They glued some NSScrollers onto a custom NSView subclass and (presumably) copy-pasted a bunch of scrolling inertia logic into their own code. AFAICT that's the main difference between Terminal and iTerm2, but it's just not feasible for a third-party developer to do.
Holy cow! I wonder if iTerm2 would benefit from using something like pathfinder[1] for text rendering. I mean, web browsers are able to render huge quantities of (complex, non-ASCII, with weird fonts) text in much less than 150ms on OS X somehow; how do they manage it? Pathfinder is part of the answer for how Servo does it, apparently.