I think the 3 absolute demons that will eventually seek to destroy you in any homebrew GUI framework are:

- Rendering and dealing with text in all of its extreme permutations. Styling. Localization. Character sets. Etc.

- Modern accessibility standards and inputs/forms/etc. There are countless annoying yet de facto UX patterns that are just nightmarish to check off the list from scratch. Inputs are even worse than accessibility, because you can decide to forgo accessibility completely if you really wanted to. The sheer volume of UX use cases and edge cases involved in even seemingly basic input components is also an exercise in rote dread. You've got input validation, error states, just so many ergonomics to deal with.

- Fluid layouts and scaffolding. This is where modern HTML/CSS finally starts to shine because making flexible layouts is actually quite painless. But it's a motherfucker to roll your own approach.

For me it was the text issues that dulled my desire to keep working on it. I was updating a Nim-based immediate-mode GUI called Fidget (https://github.com/treeform/fidget) and fixed a number of issues. It was fun getting 9-patch rectangles with corners working, etc.

I even implemented a fair subset of CSS grid:

    parseGridTemplateRows gt, ["row1-start"] 25'pp \
        ["row1-end"] 100'ux \
        ["third-line"] auto ["last-line"]

( https://github.com/elcritch/cssgrid )
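
For anyone wondering what resolving a row template like that involves, here's a rough Python sketch. It is not how cssgrid does it, just the general idea under some loud assumptions: I'm treating 'pp as a percentage of the container and 'ux as a fixed size in UI units, and "auto" here simply soaks up whatever space is left, whereas real CSS grid sizes auto tracks from their content first. `Track`, `resolve_tracks`, `line_positions` and the 400-unit container are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    kind: str           # "fixed", "percent", or "auto" (hypothetical names)
    value: float = 0.0  # units for "fixed", 0-100 for "percent"; unused for "auto"


def resolve_tracks(tracks, container):
    """Size each track along one axis of a container of the given length.

    Simplification: "auto" tracks just split the leftover space evenly.
    Real CSS grid sizes them from their content first.
    """
    sizes = []
    for track in tracks:
        if track.kind == "fixed":
            sizes.append(float(track.value))
        elif track.kind == "percent":
            sizes.append(container * track.value / 100.0)
        elif track.kind == "auto":
            sizes.append(None)  # filled in once the leftover space is known
        else:
            raise ValueError(f"unknown track kind: {track.kind!r}")

    used = sum(s for s in sizes if s is not None)
    auto_count = sizes.count(None)
    # Leftover space is clamped at zero so overflowing templates don't go negative.
    share = max(container - used, 0.0) / auto_count if auto_count else 0.0
    return [share if s is None else s for s in sizes]


def line_positions(names, sizes):
    """Map each named grid line to its offset; N tracks have N + 1 lines."""
    if len(names) != len(sizes) + 1:
        raise ValueError("need exactly one more line name than tracks")
    offsets = [0.0]
    for size in sizes:
        offsets.append(offsets[-1] + size)
    return dict(zip(names, offsets))


# The row template from above, laid out in a 400-unit-tall container.
rows = [Track("percent", 25), Track("fixed", 100), Track("auto")]
names = ["row1-start", "row1-end", "third-line", "last-line"]
sizes = resolve_tracks(rows, container=400.0)
lines = line_positions(names, sizes)
print(sizes)
print(lines)
```

With a 400-unit container the three rows come out at 100, 100 and 200 units, and "last-line" lands at 400. The hard parts of a real implementation (content-based sizing, fr units, gaps, spanning items) are exactly what this leaves out.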

However, text input is hard and tedious! Ideally you would then also need to handle the different keybindings for each OS. You also lose any plugins the OSes provide. Not to mention the lack of accessibility.
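
To give a flavor of the keybinding part, here's a minimal Python sketch of a per-OS lookup table. The modifier, key and action names are made up, and it only covers a handful of the common conventions (Option+arrow for word movement and Cmd+arrow for line start/end on macOS, Ctrl+arrow and Home/End elsewhere). A real text field needs far more than this: Shift for extending selections, IME composition, user-remapped keys, and so on.

```python
import sys

# Each table maps (modifier set, key name) to an editing action.
# Modifier, key, and action names are hypothetical, chosen for illustration.
MACOS_BINDINGS = {
    (frozenset({"alt"}), "left"): "move_word_left",
    (frozenset({"alt"}), "right"): "move_word_right",
    (frozenset({"cmd"}), "left"): "move_line_start",
    (frozenset({"cmd"}), "right"): "move_line_end",
    (frozenset({"cmd"}), "a"): "select_all",
    (frozenset({"cmd"}), "c"): "copy",
    (frozenset({"cmd"}), "v"): "paste",
}

WINDOWS_LINUX_BINDINGS = {
    (frozenset({"ctrl"}), "left"): "move_word_left",
    (frozenset({"ctrl"}), "right"): "move_word_right",
    (frozenset(), "home"): "move_line_start",
    (frozenset(), "end"): "move_line_end",
    (frozenset({"ctrl"}), "a"): "select_all",
    (frozenset({"ctrl"}), "c"): "copy",
    (frozenset({"ctrl"}), "v"): "paste",
}


def bindings_for(platform=sys.platform):
    """Pick a binding table; sys.platform is "darwin" on macOS."""
    return MACOS_BINDINGS if platform == "darwin" else WINDOWS_LINUX_BINDINGS


def action_for(mods, key, platform=sys.platform):
    """Return the editing action for a key press, or None if unbound."""
    return bindings_for(platform).get((frozenset(mods), key))


# The same physical key press means different things on different OSes.
print(action_for({"alt"}, "left", platform="darwin"))
print(action_for({"alt"}, "left", platform="linux"))
```

Even this toy table shows the problem: Alt+Left is word movement on macOS and unbound in the other table, so the text widget can't hardcode any of it.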

I recall reading that browsers shim out to native OS text fields and wonder how that's done. It really seems like the best approach for small GUI libraries to enable first-class text input.