From W3Techs:

> PHP is used by 77% of all the websites whose server-side programming language we know.

I had a quick look at the methodology section, but it’s not clear to me how accurate this data is. Determining whether a site uses PHP can be relatively straightforward (especially with default extensions, if WordPress is used, etc.), but if a site (potentially using a different language) sits behind a reverse proxy, serves its content through an API, etc., then it is less clear. Does anyone know whether PHP is over-represented in the results because it’s easy to identify?

No doubt PHP is still huge, but 77% seems almost too huge. There is also a very good chance that PHP is actually that big and I’m just in a different crowd.

I agree. I've always doubted these figures (I'm a PHP dev myself, so I wouldn't mind if they were true). I think the methodology is shady. I wonder if they rely on what the server reports. I think some setups, like Apache with mod_php, send this information to the client in a header, but most servers don't. So maybe the figure really means "of all the servers that disclose a backend language, PHP represents 77%", which wouldn't be surprising. The question is: how many websites in their data don't give any information about the language used under the hood?

I think we should stop using these numbers. GitHub uses Ruby on Rails, but we know that from the developer team, not from what the server tells us. How many websites talk publicly about their backend infrastructure?

I don't doubt WordPress powers many websites out there, but I'm tired of figures that don't mean anything to me. Especially since, if you look at job ads, PHP isn't that big (except in some PHP-centric countries like France).

You can't just make up numbers. If you give me statistics, give me the methodology you used and all the details. Otherwise I suggest we all start saying Haskell powers 87% of the web. After all, if you can invent what suits you, I can do the same.

Identifying the technologies behind a web site involves a lot more than looking at Apache's mod_php headers (which you can and should turn off for security reasons). The tools for figuring out what runs a site actually do a really good job by looking for multiple identifying features. Marketers and SEO people use tools like BuiltWith and Wappalyzer (and many others, most of them not free). You may not know about those tools or how well they work, but a quick browse of that space will disabuse you of the idea that these surveys just crawl looking at server headers.
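To make the "multiple identifying features" point concrete, here is a minimal sketch of multi-signal fingerprinting. The rule set, the `Rule` class and the `detect` function are hypothetical and written for illustration; they are NOT Wappalyzer's or BuiltWith's actual rules or data format. The signals themselves are real defaults (PHP's `X-Powered-By` header, the `PHPSESSID` session cookie, `.php` URLs, WordPress's `/wp-content/` paths, Django's `csrftoken` cookie, ASP.NET's `ASP.NET_SessionId` cookie), which is why turning off one header does not hide the stack.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """One hypothetical identifying feature and the technology it suggests."""
    technology: str
    kind: str     # "header", "cookie", "url" or "html"
    key: str      # header or cookie name; empty for "url" and "html"
    pattern: str  # regex searched for in the relevant value


# Hypothetical, deliberately tiny rule set (illustration only).
RULES = [
    Rule("PHP", "header", "x-powered-by", r"PHP"),
    Rule("PHP", "cookie", "PHPSESSID", r".*"),
    Rule("PHP", "url", "", r"\.php(\?|$)"),
    Rule("WordPress", "html", "", r"/wp-content/"),
    Rule("Django", "cookie", "csrftoken", r".*"),
    Rule("ASP.NET", "cookie", "ASP.NET_SessionId", r".*"),
]

# Detecting one technology can imply another (WordPress runs on PHP).
IMPLIES = {"WordPress": ["PHP"]}


def detect(headers: dict, cookies: dict, url: str, html: str) -> set:
    """Return every technology for which at least one rule matches."""
    headers = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    found = set()
    for rule in RULES:
        if rule.kind == "header":
            value = headers.get(rule.key)
        elif rule.kind == "cookie":
            value = cookies.get(rule.key)
        elif rule.kind == "url":
            value = url
        else:
            value = html
        if value is not None and re.search(rule.pattern, value):
            found.add(rule.technology)
    for tech in list(found):
        found.update(IMPLIES.get(tech, ()))
    return found


# No X-Powered-By header at all, yet the markup still gives the stack away.
print(detect({}, {}, "https://example.com/blog/",
             '<link rel="stylesheet" href="/wp-content/themes/x/style.css">'))
```

Real tools combine far more signals (script paths, meta generator tags, JavaScript globals, response quirks) and weight them, but the principle is the same: any one matching tell is enough.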

Multiple independent surveys of web back-end technologies by different outlets, across many years, have reached the same conclusion: PHP powers approximately three-quarters of public web sites/applications. I do a lot of PHP work and I see PHP used heavily in restricted/private web applications as well: internal sites that won't show up in these kinds of surveys. One school I work for has one public WordPress-powered site, several internal-only WordPress sites, and multiple internal PHP-powered sites not based on WordPress, including Moodle (a learning management system) and their student management system.

The large ecosystem, relatively large population of experienced developers, and ease of deployment play into the decision process. Sometimes it comes down to hosting costs or other non-technical factors.

Deducing the number of jobs for PHP developers from job ads will mislead you. Most jobs get filled internally, informally, or by recruiters before they get posted online (because posting costs money). If you don't see many ads for PHP developers, that might mean few jobs exist (which wouldn't match the experience of anyone who works with PHP). It may also mean the jobs got filled before the employer had to pay to advertise them. A position for an Elixir dev with 5+ years of experience may sit open for months, but I can fill (and have filled) PHP dev openings in a few days, from a large list of applicants gathered through a free posting in a local PHP users group forum, without having to post on public job boards or do LinkedIn email blasts.

We should also consider that web developers with more than a few years of experience have likely worked with multiple tech stacks, and those of us with 10+ years very likely cut our teeth on PHP. I started in the '90s with ASP and ColdFusion, plus some Perl, and then saw employers move to PHP (and a few to Rails a few years later), mainly because ASP (which predates .NET) and ColdFusion required increasingly expensive licenses whereas PHP did not. Among experienced web developers you will find that many, if not most, have worked with PHP and could work with it again, though they may prefer something else. Likewise, I know COBOL and could fall back on it if more interesting work dried up, but I don't call myself a COBOL developer or look for jobs in that space.

Thank you for your comment. I was definitely wrong about the different methods used. This is why I love Hacker News. Always nice to learn something.

I still think the stats they provide are a bit weird since there is no "unknown" category. If they can't find the backend technology used for 5% of websites, it changes the whole result, and from what I have seen they don't provide this information.

But your really nice and detailed answer tells me I might be wrong once more.

I don’t see an “unknown” category. A large number of unknowns could skew the results if we had reason to believe those mostly represent non-PHP sites. Do we have any reason to think those unknown sites show a different distribution than the known sites? Do enough unknown sites exist to meaningfully affect the results?

Taking your 5% example and supposing that the unknown 5% includes no PHP sites, the PHP percentage only comes down a little. It doesn’t change the main point that PHP dominates by a wide margin.
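The arithmetic behind "only comes down a little" can be written out. This sketch assumes, as the W3Techs quote says, that 77% is PHP's share of sites whose language is known; `php_share_bounds` is a hypothetical helper name. If a fraction of sites is unknown, PHP's share of all sites lies between the case where no unknown site uses PHP and the case where every one does.

```python
def php_share_bounds(share_of_known: float, unknown_fraction: float) -> tuple:
    """Bounds on PHP's share of ALL sites when some sites are unidentified.

    Lower bound: none of the unknown sites use PHP.
    Upper bound: all of the unknown sites use PHP.
    """
    known_fraction = 1.0 - unknown_fraction
    lower = share_of_known * known_fraction
    upper = lower + unknown_fraction
    return lower, upper


low, high = php_share_bounds(0.77, 0.05)
print(f"{low:.2%} to {high:.2%}")  # 73.15% to 78.15%
```

The same function works for larger unknown fractions; the lower bound shrinks in proportion to how many sites are unknown, which is why the size of that fraction matters.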

Well, it seems like a good part of the analyzer relies on "leaks" or language-specific behaviors that give away which technology is used. I checked Wappalyzer's code (at least the last commit before it went private: https://github.com/dochne/wappalyzer), and PHP gives more tells (https://github.com/dochne/wappalyzer/blob/main/src/technolog...) than Python, for example (https://github.com/dochne/wappalyzer/blob/main/src/technolog...).

Some technologies seem to give more tells than others, which means some technologies could be far less visible than others. I am not sure we can assume the known and unknown sites have the same ratio.
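A simple model shows why unequal tells matter. Assume (this is a simplification, not how any real survey is known to work) that each PHP site is detected with probability `detect_php` and each non-PHP site with probability `detect_other`. The share reported among detected sites is then biased toward the easier-to-detect technology, and the relation can be inverted. The function names and the detection rates in the example are hypothetical, chosen only to illustrate the effect.

```python
def observed_share(true_share: float, detect_php: float, detect_other: float) -> float:
    """PHP's share among *detected* sites, given per-technology detection rates."""
    php_seen = true_share * detect_php
    other_seen = (1.0 - true_share) * detect_other
    return php_seen / (php_seen + other_seen)


def true_share_from_observed(observed: float, detect_php: float, detect_other: float) -> float:
    """Invert observed_share: recover the true PHP share from the reported one."""
    return (observed * detect_other) / (
        (1.0 - observed) * detect_php + observed * detect_other
    )


# Made-up rates: PHP found 95% of the time, everything else 60% of the time.
p = true_share_from_observed(0.77, detect_php=0.95, detect_other=0.60)
print(f"true share = {p:.1%}")  # 67.9%
```

When both detection rates are equal, the unknown sites mirror the known ones and the reported share equals the true share; only a gap between the rates introduces bias, and the size of that gap is exactly what nobody outside the survey can measure.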

I quickly checked some websites with BuiltWith and Wappalyzer, and from my personal, totally unscientific, small sample, they seem to detect PHP more easily than other languages such as Python.

Again, I don't know. I picked 5% to be optimistic; it could be 30% or 50%, and then the whole picture changes.

Edit: Funny thing, it even reports PHP for some sites I'm almost sure don't use PHP. For GitHub, it lists Ruby (true) and PHP with Drupal (???).