What does HackerNews think of webpackage?

Web packaging format

Language: Go

There are signed & bundled HTTP requests that, I think, will do much of this job better in the long run. They also help accelerate the everyday web browsing experience/page load speed. Webpackage.

I really like the idea of being able to take a webpage I've downloaded & hand it to my friend. Retransmissible webapps. If the app is offline-capable, they'll have the full experience, then & there.

Memento is earlier & very interesting, a great effort. I still think webpackage/web bundles is more likely to gain traction & turn into what we want.

https://github.com/WICG/webpackage

Not sure if this is entirely what you're after, but check out https://github.com/WICG/webpackage
Signed web bundles [1] let you package up a page as a fixed resource, distributed with a signature that verifies the content of that page is what the author intended it to be, so a site like Twitter that embeds it can be sure that what it's embedding is always the original resource.

However, this is part of AMP, and so web developers who prefer the "do whatever the fuck you want" aspect of the internet push back against it. The ability to ensure that a web link contains the same content at a future date as it did when it was initially crawled was one of the things we destroyed along with AMP.

[1] https://github.com/WICG/webpackage

The Web Packaging specifications [1] are surprisingly close to turning URLs into a content-addressed system. Content-addressed networking is all about names for content, after all, which philosophically is a close match for what a URL is; it just so happens that HTTP uses URLs to resolve servers that serve the content. But with Web Package, content is signed by the origin and can be distributed via other means.

One of the primary use cases for Web Package is to let two users exchange content while offline, perhaps via a USB stick or whatnot. This isn't part of the specification, but we could begin to imagine sites that list the web packages they have available for download. And we could imagine aggregating those content indexes, and preferring to download from these mirrors rather than from the origin's servers.

I'm hoping eventually we get a fairly content-addressed network, via URLs.

[1] https://github.com/WICG/webpackage
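To make the content-addressing idea above concrete, here's a minimal Go sketch (nothing webpackage-specific, just standard-library hashing): derive a name from the bytes themselves, and then any mirror can serve them, because the fetcher can verify the content no matter which server it came from.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentName derives a content-addressed name from the bytes themselves,
// so identical content gets the same name regardless of which mirror serves it.
func contentName(body []byte) string {
	sum := sha256.Sum256(body)
	return "sha256-" + hex.EncodeToString(sum[:])
}

// verify checks bytes fetched from *any* source against the expected name.
func verify(body []byte, expected string) bool {
	return contentName(body) == expected
}

func main() {
	page := []byte("<html><body>Hello, offline web!</body></html>")
	name := contentName(page)
	fmt.Println("name:", name)
	fmt.Println("verified:", verify(page, name)) // true, no matter where it was downloaded from
}
```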

A Pyrrhic victory for a Web that is basically ChromeOS.

Ever heard of Web Bundles?

https://web.dev/web-bundles/

https://github.com/WICG/webpackage

Data URLs encode things in base64, which bloats the file. Also, the user agent can't just seek over them; it has to parse the entire included base64 content (a quick sketch of the overhead follows the list). There are better ways, but sadly nothing cross-platform:

* https://github.com/WICG/webpackage

* https://en.wikipedia.org/wiki/Webarchive
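As a rough illustration of the bloat point above, a quick standard-library Go sketch shows the ~33% size overhead that base64 encoding adds to an inlined resource:

```go
package main

import (
	"encoding/base64"
	"fmt"
)

func main() {
	// Pretend this is an image or other binary resource inlined as a data: URL.
	payload := make([]byte, 300_000) // 300 KB of raw bytes

	encoded := base64.StdEncoding.EncodeToString(payload)
	fmt.Printf("raw:    %d bytes\n", len(payload))
	fmt.Printf("base64: %d bytes (~%.0f%% larger)\n",
		len(encoded), 100*float64(len(encoded)-len(payload))/float64(len(payload)))
	// And a parser still has to scan through all of it; there is no cheap way
	// to skip an inlined data: URL without reading the whole thing.
}
```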

The AMP team is working on the exact opposite, which is adopting Web Packaging [1] that fixes the URL issue you are describing.

https://amphtml.wordpress.com/2018/11/13/developer-preview-o...

[1] - https://github.com/WICG/webpackage

Google's answer to this is a trifecta of AMP [1], Contributor [2], and Funding Choices [3] -- to deliver content signed by the publisher [4] from Google's CDN and display less-annoying ads in the meantime, or purchase an ad-free pass to the publisher through Google.

This venture is predicated on the assumption that Google and the publishers both need each other: publishers want revenue from ads, revenue whose amount is proportional to the number of viewers, while Google wants quality destinations to which it can direct traffic and/or quality sources of content which it can display in a captive newsreader.

This is a reality in a world where paid newspaper subscriptions are down, media makes money with just-in-time auctioned online display ads, and paywalls interfere with the positive effects of wide distribution, like the likelihood of new customer acquisition.

[1] https://www.ampproject.org/ [2] https://contributor.google.com/ [3] https://fundingchoices.google.com [4] https://github.com/WICG/webpackage

> If it was as robust as you make out then we wouldn't have a need for the cookie consent law

We _don't_ need that law. All it's resulted in is tons of annoying pop-ups all over the web (which I block with an extension). Browsers already have the ability to grant or deny cookie usage per-domain, which is much more user-friendly and effective.

> nor GDPR

No technically-enforceable permissions system in the world will let you control what a company does with data you've already voluntarily given them, so I don't see how better permissions would eliminate the need for GDPR.

> But one idea I have is where sites have to send metadata down (as part of the headers if like?) with each page stating what permissions that page requires to function

Sounds like you're describing [Content Security Policy][1] and [Feature Policy][2]? Obviously those aren't required by default, for backwards-compatibility reasons, but they _do_ exist.
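For what it's worth, wiring those up is just a couple of response headers. A minimal Go sketch (the policy values here are made up for illustration; Feature-Policy has since been renamed Permissions-Policy):

```go
package main

import (
	"fmt"
	"net/http"
)

func handler(w http.ResponseWriter, r *http.Request) {
	// Content Security Policy: only load resources from our own origin.
	w.Header().Set("Content-Security-Policy", "default-src 'self'")
	// Feature Policy: declare up front that this page never needs the
	// camera, microphone, or geolocation.
	w.Header().Set("Feature-Policy", "camera 'none'; microphone 'none'; geolocation 'none'")
	fmt.Fprintln(w, "<html><body>Hello</body></html>")
}

func main() {
	http.HandleFunc("/", handler)
	http.ListenAndServe(":8080", nil)
}
```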

> That way users can see what each page is doing just by looking at the requested permissions even before any popup appears

Can you clarify what you mean by this? Most of the permissions you described are very low-level; not the sort of thing most users would want to concern themselves with (and certainly not for every site they visit).

> Further to that, you can define in the browser which permissions to allow by default - this is a little like what is available already but more granular (eg the self-modifying page example earlier).

Again, most of this seems like the sort of thing users wouldn't want to concern themselves with. What benefit would result from users being able to prevent a page from modifying its own DOM, for example? For permissions that actually impact the user's security or privacy (e.g. Camera, Mic, Clipboard, Cookies, Third-Party Cookies) this already exists. (See chrome://settings/content)

Could you give some more examples of permissions you believe would be valuable to include here? Preferably stuff that has a clear, concrete impact on the user's security or privacy.

> Further to that, some interactions (the exploitable ones) will be logged. eg calls from one domain to another. So some of the more experienced users have the ability to scan for suspicious behaviour without having to trawl through thousands of HTTP get / post requests.

Is this really such a common use case that you think this needs to be built-into the browser rather than handled via an extension or Dev Tools?

I'd also like to point out that in the case of cross-domain calls, a site which wants to hide their behavior could simply proxy calls to third-party sites through their own server.

> I'd also like reminder prompts. eg if a service you allowed to make location requests does it frequently, I'd want a banner across the bottom of the page reminding users that this is happening frequently and that can only be mitigated by "trusting" that service with that API (an additional manual action on top of allowing it initial access to a particular browser feature).

Already exists. Browsers show an icon in the address bar and on the tab when a page is accessing sensitive information like your mic, camera, or location. Mobile browsers display a persistent notification.

> Another cool feature would be if versioning was built into webpages. Each page would be versioned and that version number would be stored with a checksum of the page. Thus if the page changes between refreshes, the browser knows and resets all the permissions on that page to the user defaults (ie any previous prompts would then have to be re-prompted again).

So every time a site makes some minor update to their code, I'd have to re-authorize all permissions on that site? Seems rather inconvenient to me, and might lead to warning blindness (which would decrease overall security). That said, the upcoming [Web Packaging Standard][3] might do the "versioning" part of this at least.
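A toy Go sketch of how that idea could work (purely illustrative; this is not how any browser actually stores permissions): key the granted permissions on a checksum of the page, so any change to the page falls back to the defaults and triggers a re-prompt.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// grants maps a page checksum to the permissions the user approved for
// exactly that version of the page.
var grants = map[[32]byte][]string{}

func permissionsFor(page []byte) []string {
	sum := sha256.Sum256(page)
	if p, ok := grants[sum]; ok {
		return p // same bytes as before: keep the earlier grants
	}
	return nil // page changed (or never seen): back to defaults, re-prompt
}

func main() {
	v1 := []byte("<html>v1</html>")
	grants[sha256.Sum256(v1)] = []string{"geolocation"}

	fmt.Println(permissionsFor(v1))                        // [geolocation]
	fmt.Println(permissionsFor([]byte("<html>v2</html>"))) // [] -> re-prompt
}
```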

> It would also be nice to have some definable CPU and memory caps too - since each tab is basically now a virtual machine.

I'm not sure this is something the user really wants to micromanage. If a tab I'm using needs 100% of my CPU to do its job, I don't think I'd want it to bother me with a prompt; I'd much rather see an indicator of high resource usage so I can intervene if necessary. With the recent rise of JS-based cryptocoin miners, something like this may end up getting implemented in the near future.

> Thus all I ask is you respect the spirit of what I'm saying rather than taking the points literally as a written gospel I'm suggesting we implement tomorrow. There will be problems implementing that verbatim but my point is to illustrate just how little protection the web offers and how much further we need to go given its current nature is basically just allowing anyone to run any untrusted application on their local machine.

Thanks for the reminder. I've gone back over all my points in this post and tried to re-imagine your suggestions in the most constructive way possible.

Sorry if I still seem skeptical. You have some good ideas, but keep in mind that these problems have already been considered in detail by numerous standards committees and browser vendors, so it's no surprise that a lot of the things you've suggested either already exist, or have good reasons for not existing.

[1]: https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP

[2]: https://github.com/WICG/feature-policy

[3]: https://github.com/WICG/webpackage

This is a remarkable deep-dive into the history of RSS, the competing ideas and visions behind it, and how those camps diverged. But despite the article opening with Kevin Werbach's prediction that syndication would become a dominant business model on the web, this never quite came to pass, and RSS took on the role of moving media between aggregators and consumers, rather than between authors and aggregators as originally intended.

With the shift in usage to the edge, RSS found itself in the company of adhoc webpages that disseminated the same content but supported scripted adtech to enable just-in-time display advertising -- advertising which, over time, became a significant source of publisher revenue. RSS was often surfaced in a different user-agent and unable to clearly accommodate scripted ads, so first-party RSS feeds were practically giving away the content for free. With time, this greatly contributed to the waning appetite of commercial publishers for RSS, despite it remaining popular in circles where the content was meant to be spread wide at no cost. Although many news organizations still offer RSS feeds, they're not really marketed prominently, and remain a loss leader to make savvy readers happy.

Since alternate business models never achieved the same uptake, efforts like Google's AMP (and its offshoots like webpackage [1]) are a practical take on the syndication concept again, where the usage of adtech is assumed from the start, and the surfacer of the content becomes decoupled from its author.

[1] https://github.com/WICG/webpackage

I believe this sort of thing will be possible via the [Web Package Standard][1].

[1]: https://github.com/WICG/webpackage

The same way they're trying to fix this problem with AMP: https://github.com/WICG/webpackage (see also: https://amphtml.wordpress.com/2018/01/09/improving-urls-for-...)

Even if you can't package up and ship all of your traditional site to Google's CDN, you could do most of the burdensome/heavy bits. But then Google doesn't get to control your website and define the way it's allowed to look, which is what AMP is really for.

(Author here.) The "Signed HTTP Exchanges standard" is part of the Web Packaging standard, which is, indeed, the whole point of the article. https://github.com/WICG/webpackage

It's the first time I've read about Web Packages [1], but when you combine it with service workers it could really make sense. Just imagine downloading a .wpk and installing it in your local environment like a .deb or .apk package. I think that could be a nice way to solve the omnipresent dependency on servers for PWAs.

https://github.com/WICG/webpackage
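For the serving half, a minimal Go sketch, assuming a prebuilt bundle file (the site.wbn filename is made up, and application/webbundle is the media type the Web Bundles drafts use, possibly with a version parameter):

```go
package main

import "net/http"

func main() {
	http.HandleFunc("/site.wbn", func(w http.ResponseWriter, r *http.Request) {
		// Media type used by the Web Bundles drafts; nosniff keeps browsers
		// from second-guessing it.
		w.Header().Set("Content-Type", "application/webbundle")
		w.Header().Set("X-Content-Type-Options", "nosniff")
		// Prebuilt bundle, e.g. one generated with the tooling in the repo.
		http.ServeFile(w, r, "site.wbn")
	})
	http.ListenAndServe(":8080", nil)
}
```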

(Author here.) That's what's so great about the Web Packaging standard proposed in the article to replace AMP: everybody could use it. https://github.com/WICG/webpackage

Even HN could serve up prerendered Web Packages for all of the sites on the front page.

The result would be a more decentralized web.

No. In fact, the solution detailed in that article accomplishes the exact inverse of one of the letter's demands.

While the letter wants third-party content not to be surfaced in a wrapping viewport that downplays the fact that it's actually Google's AMP Newsreader, the recent AMP announcement details a planned change where emerging tech [1] will be used to make the URL appear as if the content was loaded directly from the distant origin. This works because the content being served has been digitally signed by that origin, and its serving has been delegated to Google, akin to how a run-of-the-mill CDN is delegated the authority to serve content for a domain that's 'spiritually' owned by someone else.

However, the recent AMP announcement does address a very frequent complaint about AMP. Just one that's at odds with the one the letter is requesting.

[1] https://github.com/WICG/webpackage

> If I want to prefetch results, I understand it means people know I'm looking at those results.

Why do that, though, when you can have your cake and eat it? Google's proposed solution allows you to prefetch results _and_ not let the third party know you're looking at those results.

> the counter consideration is that you're implicitly saying you trust Google with that info more than the actual service providers

If I'm using Google search, then yes, obviously I'm fine with Google knowing what I searched for. If I wasn't okay with that, I most certainly would not be using Google search. In contrast, I'm less likely to be okay with a random site in the search results page that I haven't clicked on knowing that I saw a link to their site in a search results page.

Note that if I were using DuckDuckGo instead and DuckDuckGo supported AMP, then my browser would prefetch from DuckDuckGo's AMP cache, not Google's. No additional information is being shared with any party who doesn't already possess that information. (DuckDuckGo already knows what I searched for. Me loading an AMP page from them related to that query reveals no additional information.)

> Indeed. That's the point of HTTPS. As a user I don't expect that contract undermined even by a claimed benevolent Google

Could you explain how Google's proposed solution here undermines HTTPS? Note that the OP talks about using the upcoming [Web Package standard][1] to distribute AMP pages. This standard would allow the integrity guarantees of HTTPS to be preserved even when the page in question is being served by Google's AMP cache rather than the original server.

[1]: https://github.com/WICG/webpackage

The meat of the story is:

"We embarked on a multi-month long effort, and today we finally feel confident that we found a solution: As recommended by the W3C TAG [1], we intend to implement a new version of AMP Cache serving based on the emerging Web Packaging standard [2]."

I'm just reading through this so I'm gleaning as I go, but it looks like the W3C TAG came out with a recommendation for 'Distributed and Syndicated Content' [1] that specifically addresses AMP by name, and recommends strategies to do this kind of content syndication in a way that preserves the original provenance of the data.

The Web Packaging Format [2] aims, apparently [3], to solve packing together not plain resources but rather HTTP request-response pairs, maybe HPACKed, signed and hashed for integrity, in a flat hierarchy, in a CBOR envelope that nonetheless has MIME-like properties? I'm still digesting what's all involved.

[1] https://www.w3.org/2001/tag/doc/distributed-content/ [2] https://github.com/WICG/webpackage [3] https://github.com/WICG/webpackage/blob/master/explainer.md

Web Packaging [https://github.com/WICG/webpackage] allows the authority over an origin to delegate serving to a third party (and supports things like offline serving). This is similar to, but better than, what e.g. a CDN does today, because the third party does not need access to the private key.
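A toy Go sketch of why that works (this is the plain asymmetric-signature idea, not the actual Signed Exchanges format): the origin signs the content once with its private key, hands the signed blob to anyone, and a verifier only ever needs the public key.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

func main() {
	// The origin holds the private key and never shares it.
	priv, _ := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)

	content := []byte("<html><body>Signed by example.org</body></html>")
	digest := sha256.Sum256(content)
	sig, _ := ecdsa.SignASN1(rand.Reader, priv, digest[:])

	// The signed content + signature can now be handed to any cache or CDN.
	// The cache never sees the private key, yet a verifier can still check
	// that the bytes are exactly what the origin published.
	fmt.Println("verified:", ecdsa.VerifyASN1(&priv.PublicKey, digest[:], sig)) // true

	// Any tampering by the intermediary breaks verification.
	tampered := sha256.Sum256([]byte("<html><body>Modified!</body></html>"))
	fmt.Println("tampered verifies:", ecdsa.VerifyASN1(&priv.PublicKey, tampered[:], sig)) // false
}
```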