It's sad to see the relationship between WHATWG and W3C has deteriorated to this point. Trying to wrangle a standard from a "living" (i.e. constantly changing) specification was always going to be tough but I'd have hoped both WHATWG and W3C would be able to maintain a working relationship.
Is there an article with the background on this? Why do we have both the W3C and the WHATWG, and why do the W3C just copy and paste work from WHATWG, if that is indeed what happens?
I don't know of an article, sorry. A brief history from memory would be that during XHTML days the W3C essentially let the HTML spec languish and people weren't moving to XHTML (at best they were moving to XHTML-like HTML).
So the WHATWG came along (mainly organised by the major browser vendors) and started the HTML spec moving again. This became part of what's known as HTML5.
However WHATWG doesn't exactly make a "standard" it makes a "living standard", which is a constantly shifting document which aims to describe where browsers currently are and what they hope to implement. The W3C decided to keep publishing its own HTML specifications and, as the WHATWG does describe what browsers are trying to do, the W3C's spec has to build at least partly on that work. There are differences though. For example, the W3C requires at least two implementations of a feature for it to be included in their spec.
The WHATWG has always opposed the W3C's spec. They see it as confusing to have two "official" specifications.
To put a slightly different spin on the same story as perspective always colours the telling:
W3C decided to deprecate HTML in favour of XHTML. Most of the web quickly moved to XHTML. One individual (an employee at Opera, then Mozilla, finally and currently Google) wrote an oddly influential opinion piece saying the the move to XHTML had been somehow harmful and pushed for the major browser vendors to form a rival non-democratic standards body (WHATWG) to the W3C, which forked and completely redefined HTML.
The W3C, which unlike the WHATWG has many voting members from many backgrounds, not all related to browser making, quite understandably was never fully on board with the new WHATWG HTML spec efforts. However, with the level of adoption and support it received (mainly from being the creation of the powerful browser vendors) W3C were eventually pressured into conceding to advocate for HTML. Which they've done by maintaining a copy, rather than blindly directing people to the work by what for all intents and purposes effectively amounts to a rival organisation, and an extremely undemocratic one at that.
As web developers, we should follow the WHATWG and ignore the W3C, because the W3C have lost the political battle for HTML and we need to get our stuff working on browsers, all of whom follow WHATWG. But that's an unfortunately pragmatic approach that shouldn't amount to acceptance.
> Most of the web quickly moved to XHTML.
Most of the web didn't move to XHTML.
A lot of people who were interested in being standards compliant moved to XHTML 1.0 Transitional, which was the HTML compatibility subset, but they only ever served it and validated it as HTML, not XHTML, because if you served it as XHTML, one single stray < that someone had forgotten to quote somewhere would break the parsing of the whole page.
The piece written by Hixie was influential because it was a wake up call that the direction the standards bodies were going in was pretty much fruitless, and that there could be a much better way to do it which wouldn't involve breaking compatibility with all of the existing content and would give web developers and users features that they actually wanted.
> As web developers, we should follow the WHATWG and ignore the W3C, because the W3C have lost the political battle for HTML and we need to get our stuff working on browsers, all of whom follow WHATWG. But that's an unfortunately pragmatic approach that shouldn't amount to acceptance.
I fail to see how there is anything unfortunate about this. What about rewriting everything in XHTML 2.0 (https://www.w3.org/TR/2010/NOTE-xhtml2-20101216/), and having to be extremely conscious of any possible stray < that could sneak in to a page without being quoted, would have been preferable to:
1. Consistent parsing support for existing content, and content that might have slight problems like stray <, in all browsers
2. Standardization of things that people actually use to build web apps, like XMLHttpRequest and Canvas
3. Consistent handling of encodings between browsers, including encoding sniffing
4. Consistent handling of quirks mode vs. standards mode between browsers
5. Actually having browsers support compatibility with vendor-prefixed versions of features, because some browsers widely used introduced prefixed features that web developers actually started relying upon
And also, have you ever tried getting involved with the WHATWG process? I have, and I find that they are very receptive to intelligent discussion of issues.
What doesn't work well is to insist that you have a problem and that this particular solution must be used to address the problem; because a lot of times, it's easy to come up with some proposed solution but it then turns out that it's either a lot more complex in practice, your proposal does't fit in with the rest of the ecosystem well, or the problem can actually be solve just in tooling on top of HTML without having to change the spec at all and then wait for multiple browser vendors to all independently implement it.
> and having to be extremely conscious of any possible stray < that could sneak in to a page without being quoted, would have been preferable to:
Any system that publishes content that would let this kind of thing pass is incredibly insecure, and shouldn't be on the internet. Today it's a stray <. Tomorrow it's a stray
It's no wonder software is where it is today with attitudes like these.
Not if that < had slipped in because it was in a piece of static text in a string somewhere in the source code.
You can apply mandatory quoting to untrusted input all you want, but there are going to be times when you have trusted strings that can still contain stray characters that will make the resulting markup invalid. And in many cases you don't want to have mandatory quoting for all of that, because these strings may have markup you want to include.
And yeah, you can argue that instead of generating content by appending strings, you should be building up a proper type-safe DOM structure that can be serialized. I'll wait while you go boil the ocean of converting every single web application framework that exists now outside of a couple of obscure type-safe functional programming frameworks, and in the meantime I'll be able to browse the real web without every other page giving me validation errors.
To be fair, I only use obscure type-safe functional programming frameworks. That's what I'm employed to do, and this obviously impacts my feelings on the matter. Personally, I think it's irresponsible to use anything that could be this unsafe. This doesn't mean everyone needs to use FP, just that frameworks and libraries should be chosen so as to guarantee safety. There are easy-to-use libraries for all these things in every language.
In no other world of engineering is this attitude okay. If you were a civil engineer and had to hold a license to practice due to the danger your designs could present to society, this attitude would eventually cause you to lose your ability to practice. It's becoming more and more clear that software can have similar levels of impact, and software engineers should practice as scuh.
But what you're saying is as if you suggest that since the metric system is more consistent and more widely used than the English, I as a bolt distributor should start selling my bolts in metric sizes, despite the fact that the nuts that everyone has are in English sizes.
The browser vendors, at least, are working on implement their browsers in more type-safe languages (https://github.com/servo/servo), but even still they have to work with the content that is produced by thousands of different languages, frameworks, and tools, and millions of hand written HTML files, templates, and the like. Just turning on strict XML parsing doesn't make that go away, it just makes your browser fail on most websites.