The Hacker News API is still up: https://github.com/HackerNews/API https://hacker-news.firebaseio.com/v0/topstories.json?print=...
To @dang or whoever is reading at Algolia: you can set up continuous monitoring for problems at the DNS/TCP/TLS/HTTP layers from just your URL at https://heiioncall.com/ and we’d definitely alert you if the DNS record disappears or it’s otherwise unreachable for more than X minutes. (Email me if you need help getting it set up.)
I do not foresee this happening. The moderation and voting would require a significant level of trust in the other instances, admins, and moderators. Being associated with other multimedia sites introduces legal risks I would be surprised a VC was willing to take on. (This assumes you mean a bi-directional integration with ActivityPub.)
That said, one could run something like HN and moderate it independently. Maybe Postmill [1a][1b]. There are other sites that ingest data from HN's API [2] for search, statistical analytics, etc. Something similar could probably be done where HN data is ingested read-only into ActivityPub from another site, but I would be quite surprised to see this site implement ActivityPub directly. I don't know what the API rate limits are for this site, or whether Mastodon could operate within them.
[1a] - https://postmill.xyz/
[2] - https://github.com/HackerNews/API
and it is a much more certain thing. I have (i) a model that predicts "will this headline get more than 10 votes?" and (ii) one that predicts "if this headline gets more than 10 votes, does it get a ratio of comments to votes greater than the median (roughly 0.5)?"
The best model I have for (i) is still a bag-of-words model that doesn't try to correct for time-series variations. The AUC is atrocious, maybe around 65%, but I like the model because high-scoring headlines look like a parody of high-scoring headlines; I think "Richard Stallman has died" could be the best possible headline. (It's silly to think you could get good performance at this, because the model can't see whether the article has a flashy picture or other attractive attributes that would raise the vote rate.) I've made other models with fancier methods, but none perform better nor are more entertaining.
As for (ii), the most-commented articles tend to be clickbaity, so it would be irresponsible to submit a feed of high-scoring articles that isn't well curated. I am getting an AUC of around 72%, which is what I got with my first recommender.
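For the curious, here is a minimal sketch of that kind of bag-of-words setup in Python, assuming scikit-learn and a handful of made-up (title, votes) pairs; my actual features and training data aren't shown here, so treat everything below as illustrative:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# hypothetical training pairs; real ones would be collected from the HN API
titles = [
    "Richard Stallman has died",
    "Show HN: a tiny weekend project",
    "SQLite is all the database you need",
    "My notes on CSS grid",
    "The FBI raids a well-known startup",
    "Ask HN: keyboard recommendations?",
]
votes = [500, 4, 120, 2, 230, 8]

y = [1 if v > 10 else 0 for v in votes]    # label for (i): "more than 10 votes?"
vec = CountVectorizer(ngram_range=(1, 2))  # plain bag of words (plus bigrams)
clf = LogisticRegression().fit(vec.fit_transform(titles), y)

print(clf.predict_proba(vec.transform(["GCC has died"]))[:, 1])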
https://github.com/HackerNews/API
https://www.npmjs.com/package/hackernews2nntp
https://github.com/gromnitsky/hackernews2nntp
Honorable mention for gwene.org, which converts RSS feeds into NNTP, and gmane.org, which converts mailing lists into NNTP.
Shouldn't be needed, there's an API.
The HN API is linked to from every page on HN, at the bottom of the page in the footer. Look for the link named “API”.
As a trick, you can use the API [1]. Get the story ID in the URL of your post, and place it at the end of the API call (e.g. [2]).
If the result has `dead: true`, your post is… well… dead ;)
[1] https://github.com/HackerNews/API [2] https://hacker-news.firebaseio.com/v0/item/35700745.json?pri...
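In Python, that check is a couple of lines (a sketch assuming the `requests` library; the item ID is the one from [2]):

import requests

item_id = 35700745  # the story ID from your post's URL
url = f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"
item = requests.get(url, timeout=10).json()

# dead items carry `dead: true`; live items omit the field entirely
print("dead" if item and item.get("dead") else "alive")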
For example, for the March one it is ID 34983767 (from the algolia search or a "there's only so many of them, here's a list that I'll add to each month").
You can then get a list of all the top level comments at https://hacker-news.firebaseio.com/v0/item/34983767.json?pri...
And then pulling up a comment at https://hacker-news.firebaseio.com/v0/item/35255027.json?pri... to not have to parse any of its child comments or the HTML of the page.
(late edit: and re-reading the blog post while not trying to pay half attention to a meeting... that is what you are doing)
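Roughly, in Python (a sketch assuming `requests`; the thread ID is the March one above):

import requests

base = "https://hacker-news.firebaseio.com/v0/item/{}.json"
thread = requests.get(base.format(34983767), timeout=10).json()

# `kids` holds the top-level comment IDs; each child's own `kids`
# (and the page's HTML) can be ignored entirely
for cid in thread.get("kids", []):
    comment = requests.get(base.format(cid), timeout=10).json()
    if comment and not comment.get("deleted"):
        print(comment["by"], "-", comment.get("text", "")[:60])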
Currently only supports read operations, but planning to add login and commenting eventually.
I used the Algolia search API, it has extremely generous rate limits and page limits.
[0]: https://github.com/HackerNews/API [1]: https://hn.algolia.com/api [2]: https://console.cloud.google.com/bigquery?p=bigquery-public-...
There's also a Google dataset but I don't know the URL for it or if it's up to date.
Nevertheless, I needed this to make my life easier in a project I'm working on, so, here you go.
This package provides a wrapper for the Firebase Hacker News API [1] in Go.
Besides the obvious callouts to the documented endpoints, it also provides utility functions that can be useful when working with data returned from this API.
I'm a beginner at Go, so feedback to improve my skills is more than welcome. Together with PRs to the repo, of course!
> "The v0 API is essentially a dump of our in-memory data structures. We know, what works great locally in memory isn't so hot over the network. Many of the awkward things are just the way HN works internally. Want to know the total number of comments on an article? Traverse the tree and count. Want to know the children of an item? Load the item and get their IDs, then load them. The newest page? Starts at item maxid and walks backward, keeping only the top level stories. Same for Ask, Show, etc. I'm not saying this to defend it - It's not the ideal public API, but it's the one we could release in the time we had. While awkward, it's possible to implement most of HN using it."
https://github.com/HackerNews/API
If one wants to get the comments for a post on HN from the API, they have to first fetch the post, then fetch every single comment of the post one by one. So if the post has 500 comments, one basically has to send over 500 network requests.
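To illustrate, even just counting the comments on a story means one GET per item (a sketch assuming `requests`, not anyone's production code):

import requests

base = "https://hacker-news.firebaseio.com/v0/item/{}.json"

def count_comments(item_id):
    # fetch the item, then recurse into its kids, one request each
    item = requests.get(base.format(item_id), timeout=10).json()
    if not item:
        return 0
    kids = item.get("kids", [])
    return len(kids) + sum(count_comments(k) for k in kids)

print(count_comments(8863))  # a story with 500 comments -> ~500 requests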
They did say that they plan on releasing a better API in the future. However, as far as I know, it will be read-only initially, so it won't be of much use for my app.
- Parent / context are in the menu just to keep the design clean. But I understand that's not what everyone wants, so I will add a setting to show these with the other icons, instead of in the menu.
- Prev / next are moved to the "control pad" (pro feature), but my intention is for the free version to support all current HN features, so I will add those back in the next update.
- The firebase permission is for accessing the HN API [1] to grab user profile info, etc. The other is for the pro upgrade using ExtPay [2]
The first commit is from Oct 2, 2014 while the article was published on April 1, 2018.
However, it seems that you did not read the first paragraph of the article which states the following:
> While building hackd I faced a problem - the official Hacker News API doesn’t allow for interaction, such as upvoting, posting and commenting. I wanted hackd to be a full featured Hacker News client, so this wasn’t going to cut it.
So to answer your question:
> Didn’t HN get an official API a long time ago? Or was that read-only?
Yes, it appears that at the time of publication, the API was read-only.
https://github.com/HackerNews/API
I'm not sure what rate limiting policy is in place, but in theory you can start with a request for maxitem and from that point on just GET all items down to zero until you hit some sort of blocker.
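A sketch of that walk in Python (the endpoints are from the API docs; the sleep is an assumed politeness delay, not a documented limit):

import time
import requests

v0 = "https://hacker-news.firebaseio.com/v0"
max_id = requests.get(f"{v0}/maxitem.json", timeout=10).json()

for item_id in range(max_id, 0, -1):  # walk backwards toward item 1
    item = requests.get(f"{v0}/item/{item_id}.json", timeout=10).json()
    if item:
        print(item_id, item.get("type"))
    time.sleep(0.1)  # assumed politeness delay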
The REST endpoints do not provide a way to query the items with variable parameters, so I could not support that on the GraphQL side either.
You can check my effort at https://github.com/hsblhsn/hn.hsblhsn.me
Paste this hacky snippet into your browser's console to save all of the links into your clipboard:
(() => {
  // Grab every anchor inside a table cell (HN lays comments out in tables),
  // keeping only external links (skip HN-internal and javascript: pseudo-links)
  let result = [...document.querySelectorAll('td a')]
    .map((link) =>
      link &&
      !link.href.includes('ycombinator') &&
      !link.href.includes('javascript:void')
        ? link.href
        : ''
    )
    .filter(Boolean);
  result = [...new Set(result)]; // de-duplicate
  result.sort();
  copy(result.join('\n\n')); // `copy` is a DevTools console helper
})();
Just to help, here's a compiled list of all of the links shared here so far:

http://journal.stuffwithstuff.com/
http://www.stargrave.org/LinksCatPersonal.html
http://xahlee.info/kbd/keyboard_hardware_and_key_choices.htm...
https://austinhenley.com/blog.html
https://bas.codes/posts/python-slicing
https://bernsteinbear.com/pl-resources/
https://blog.codinghorror.com/
https://blog.johnnyreilly.com/
https://blog.ploeh.dk/archive/
https://bloggingfordevs.com/trends/
https://bowtiedfox.substack.com/
https://briancallahan.net/blog
https://collection.mataroa.blog/
https://earthly.dev/blog/authors/adam/
https://eli.thegreenplace.net/
https://fsharpforfunandprofit.com/
https://github.com/crispgm/awesome-engineering-blogs
https://github.com/HackerNews/API
https://github.com/jkup/awesome-personal-blogs
https://github.com/markodenic/awesome-tech-blogs
https://github.com/search?q=list+of+awesome+blogs
https://headrush.typepad.com/creating_passionate_users/
https://hn.algolia.com/?query=Ask%20HN%3A%20Great%20Blogs%20...
https://journal.stuffwithstuff.com/
https://jvns.ca/blog/2016/04/09/some-of-my-favorite-blogs/
https://learnbyexample.github.io/py_resources/miscellaneous....
https://matt.might.net/articles/
https://noobmaker.substack.com/
https://paulmck.livejournal.com/
https://randomascii.wordpress.com/
https://scottlocklin.wordpress.com/
https://simpleprogrammer.com/ultimate-list-software-develope...
https://staysaasy.com/engineering/2020/05/30/Picking-Your-Te...
https://staysaasy.com/software/2022/01/17/complexity.html
https://tenthousandmeters.com/
https://www.buildthestage.com/
https://www.davidvlijmincx.com/
https://www.hanselman.com/blog/
https://www.husseinnasser.com/search
https://www.jeffgeerling.com/blog
https://www.kalzumeus.com/archive/
https://www.stochasticlifestyle.com/
https://www.theerlangelist.com/
If you're scraping HN, please wait 30 seconds (https://news.ycombinator.com/robots.txt) - our app server still runs on a single core, so we don't have a lot of performance to spare. (Hopefully that will change this year.)
If you need to check more frequently, https://github.com/HackerNews/API works fine and you can get JSON that way anyhow.
Why are you scraping? We have an API, it's linked at the bottom of every page[0].
[1]: https://github.com/HackerNews/API
[2]: https://console.cloud.google.com/marketplace/product/y-combi...
On the other hand, they gave me an idea: I can structure the crawler differently. Right now, I am following the 'next' page links (apparently there is a limit on the number of 'next' pages). Instead, it is possible to get the maximum item ID, then just decrement the item ID down and down again.
> I'm not saying this to defend it - It's not the ideal public API, but it's the one we could release in the time we had. While awkward, it's possible to implement most of HN using it.
https://github.com/HackerNews/API
My words may have been a bit too harsh, but I still think I have a point. The only use I could find for the API was getting the user's details (karma, description, date created, etc.). Other than that, I ended up relying on scraping.
Unfortunately, unless the JSON API offers write access, it won't be of much use for my app's use cases. Let's just hope nothing major breaks in the HTML of the site and there aren't any major redesigns in the pipeline.
https://hacker-news.firebaseio.com/v0/item/28060112.json?pri...
See also https://github.com/HackerNews/API
Unfortunately (I guess this is a big reason why people don't use it), it doesn't sort the comments – if you need the order, you'll have to parse HN's HTML (or just use the official API).
Still, just two requests (the HN site, the Algolia API) is much better than making a hundred requests recursively, so I use this approach in my client[2].
[0]: https://hn.algolia.com/api
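For comparison, a sketch of the one-request Algolia fetch (endpoint per [0]; the story ID and traversal here are just illustrative):

import requests

# a single request returns the whole comment tree, nested under `children`
story = requests.get("https://hn.algolia.com/api/v1/items/8863", timeout=10).json()

def walk(node, depth=0):
    for child in node.get("children", []):
        print("  " * depth + str(child.get("author")))
        walk(child, depth + 1)

walk(story)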
> Any reason you're connecting to firebase rather than storing preferences locally?
All of your preferences are stored locally — the firebase connection is due to the official HN API[0] being implemented on Firebase. Your HN credentials never leave the app (except to news.ycombinator.com) — they're tucked in your keychain.
I’m using two different APIs — the inefficient API that requires querying in a tree fashion is the official HN API[0], but there’s also the Algolia API[1], which is much faster and gives a much more sensible data shape. I also fetch the HN website itself — it’s needed for account/voting features. With these sources, it’s much faster than a usual client that gets its data from the official API.
> The HN website I think is pretty ideal for the type of content its displaying, a native app doesn't give you much over having a HN tab open.
I guess it’s a bit of a difference in how one uses HN? I didn’t like HN tabs being mixed with other work-related tabs. I felt that having an app would be a perfect solution for me, but YMMV I guess.
> The only benefit I see from a native app is the ability to save articles and comments for offline reading
I guess that’s one more feature that I should add to my backlog :)
https://github.com/HackerNews/API doesn't list this as a valid endpoint. Indeed, when I try to access it via curl, I get an HTTP 405 with a "Permission denied" error (the same result as when I try to access nonexistent-endpoint.json).
Based on the HN search on the website, I'd expect the correct autocomplete to involve hn.algolia.com [0].
[0] https://hn.algolia.com/api points at https://hn.algolia.com/api/v1/search?query=...
To me, this points at the need for human input with a system like this. There is a Firebase endpoint, yes, and Copilot found that correctly! But then it invented a new endpoint that doesn't exist.
https://news.ycombinator.com/item?id=1390685
And there's also an API.
https://github.com/HackerNews/API
There's lots of Wikipedia-like software available, too.
I guess it depends on what you mean by "build" a community. (Though if you're trying to do it online, the key is going to be comments and maybe also individual accounts.) You could just try starting your own subreddit.
Another option is just announcing meetups -- maybe an online (or live?) book club for geeks or programmers, or a geek discussion circle.
https://github.com/HackerNews/API
Does something like the ability to download your data count?
Totally hilarious that you posted this on a discussion site whose API doesn't require tokens.
The official Hacker News APIs[1], on the other hand, use a specific algorithm to determine the ranking of submissions and allow you to get at most 500 stories.
[0]: https://hn.algolia.com/api/v1/search?tags=front_page&hitsPer...
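A sketch of that front-page query in Python (parameters per the Algolia docs at [0]; the `requests` usage is an assumption):

import requests

resp = requests.get(
    "https://hn.algolia.com/api/v1/search",
    params={"tags": "front_page", "hitsPerPage": 30},
    timeout=10,
).json()

for hit in resp["hits"]:
    print(hit["points"], hit["title"])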
No, for retrieving the Hacker News Posts we were using the public Hacker News API, which returns the posts in JSON format: https://github.com/HackerNews/API
The crawling speed of 100...1000 pages per second refers to crawling the external pages linked from Hacker News posts. As they are from different domains, we can achieve a high overall crawling speed while remaining a polite crawler with a low crawling rate per domain.
Especially because the API is pretty solid, you can build whatever stuff you want.
https://github.com/HackerNews/API
You could implement your own tagging system and see if people bite.
For instance I built a comment notifier.
feel free to hit me up about it if you're curious
Based on the official item API, this focuses only on the main posts with at least 2 engagements. The posts in the datasets are about 1.5% of the items returned.
With posts as a starting point, you can easily trace the hierarchy from the "kids" fields.
Potential Usage:
- Generate popular titles.
- Collect comments in a hierarchical order for further training.
- Analyze popular topics in the engineering community.
- Identify the best time to post for maximum engagement.
It's simplistic but you can still use it to write a sh*ttier version of HN.
https://github.com/HackerNews/API
I just wish they'd open source their We're-Not-Reddit behaviors library.
Also, CSS is an abomination, aesthetics have always belonged on the client side.
https://hacker-news.firebaseio.com/v0/item/22975749.json?pri...
That said, the documentation says that field is HTML, so I'm not sure what to think.
I'm using it to track common items here and on Lobste.rs (and Proggit):
Here's the endpoint for the latest 500 submissions:
https://hacker-news.firebaseio.com/v0/newstories.json
here's the one for the current top stories:
https://hacker-news.firebaseio.com/v0/topstories.json
It's actually quite nice to work with. I don't know how to keep track of comments moving from thread to thread, because that's not a metric I'm interested in, but it should be possible to track somehow.
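For anyone wanting to try this, a sketch of polling those endpoints (the interval and dedup logic are simplified assumptions, not my actual tracker):

import time
import requests

v0 = "https://hacker-news.firebaseio.com/v0"
seen = set()

while True:
    for story_id in requests.get(f"{v0}/newstories.json", timeout=10).json():
        if story_id not in seen:
            seen.add(story_id)
            print("new submission:", story_id)
    time.sleep(60)  # assumed polling interval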
If you need more than that, you should use the Firebase-based API (https://github.com/HackerNews/API). The public dataset is also available as a Google BigQuery table: https://bigquery.cloud.google.com/dataset/bigquery-public-da....
Edit: since this subthread is not really on topic I detached it from https://news.ycombinator.com/item?id=21617478.
Even though the Hacker News API (https://github.com/HackerNews/API) is somewhat old, it's a much more kosher way of getting data.
Even better is to use the public data dump in BigQuery (https://console.cloud.google.com/marketplace/details/y-combi...). Quick query to get all top-level comments in posts by whoishiring:
#standardSQL
WITH whoishiring_posts AS (
  SELECT id FROM `bigquery-public-data.hacker_news.full`
  WHERE `by` = "whoishiring" AND type = "story"
)
SELECT text
FROM `bigquery-public-data.hacker_news.full`
WHERE type = "comment"
  AND parent IN (SELECT id FROM whoishiring_posts)
Here is the starting point:
https://hacker-news.firebaseio.com/v0/showstories.json?print...
More - https://github.com/HackerNews/API
This could be a plain HTML page, which you can bookmark and check daily.
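A sketch of generating such a page (the file name and markup are assumptions):

import requests

v0 = "https://hacker-news.firebaseio.com/v0"
ids = requests.get(f"{v0}/showstories.json", timeout=10).json()[:30]

rows = []
for story_id in ids:
    item = requests.get(f"{v0}/item/{story_id}.json", timeout=10).json()
    url = item.get("url", f"https://news.ycombinator.com/item?id={story_id}")
    rows.append(f'<li><a href="{url}">{item.get("title", "")}</a></li>')

with open("show-hn.html", "w") as f:
    f.write("<ul>\n" + "\n".join(rows) + "\n</ul>\n")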
I've never used it, but the README states that there is currently no rate limit (as of ~8 months ago).
The HN API [1] has been around in various forms for years and includes the same public data that's used to generate the public pages on the HN site, but rather than returning HTML pages designed for human consumption, the API returns the data in a JSON serialized form [2] designed for machine consumption [3].
When the HN API went live, it reduced the overhead and redundant work from all the programmers having to independently crawl and parse the site. The HN BigQuery dataset is the same data returned by the HN API; Google just took the next step and did the work of loading it into BigQuery.
[1] https://github.com/HackerNews/API
[2] https://en.wikipedia.org/wiki/Category:Data_serialization_fo...
See here:
I knocked this little page up tonight to try and add a bit of randomness to my HN experience, to see comments I might not normally see. I thought I'd share in case anyone else feels the same way!
Built using the HN Firebase API: https://github.com/HackerNews/API
I mean, they provide free access to their content API [1], so that's a good sign that they want people to use the content. Although I'm sure they want you to reference the Hacker News source link in whatever you use it for.
That's also a pretty good idea, have you got much.
Why did you prefer using the HTML format instead of HN API? https://github.com/HackerNews/API
I made one myself in about an hour or two. http://morphical.ml:4000/s/17111778
You do realize HN provides an API that allows you to request any item by using an ID? [1]
> Stories, comments, jobs, Ask HNs and even polls are just items. They're identified by their ids, which are unique integers, and live under /v0/item/.
If you really know better than everyone else who has replied to you on this story, why don't you point out the exact law that states accessing resources over HTTP is forbidden if not initiated from another resource originating from the target server? Otherwise, I'll assume your "analysis" is simply a subjective view on how you would like the web to work. A pretty limited and unrealistic view that wouldn't work in the real world. For example, here is the link to the first story posted on HN: https://news.ycombinator.com/item?id=1
1. I don't think you can access that story by starting from the front page, because scrolling for more stories only gets you to page 25. Does that mean the intention is that the story is private?
2. You can now access it by using the DOM element generated for my comment. Does that mean it's public?
[1] https://github.com/HackerNews/API [2] https://hacker-news.firebaseio.com/v0/item/15943530.json?pri... [3] https://hacker-news.firebaseio.com/v0/item/15943534.json?pri...
The problem is that the HackerNews API (https://github.com/HackerNews/API) only provides a total score for a comment, instead of separate up and down votes, so you'd have to modify your controversy formula to use an appropriate measure of dispersion instead (https://en.wikipedia.org/wiki/Statistical_dispersion).
I'm not a statistician myself, so I can't help further.
My own take on it in general is that for personal/research use I'm not morally opposed to scraping, even when it's in violation of the ToS, with two conditions: that it doesn't place an unreasonable burden on the server, and that it doesn't invade people's privacy. The legal significance of the ToS is murky at best (disclaimer: I'm not a lawyer) but if the site asks you specifically to stop scraping them or puts up a technical barrier you should stop (morally and, in the US at least, legally: see craigslist v 3taps)
Do you happen to have any idea what actually happened internally with this? I ask this coming from the standpoint of "ouch, another example of ignored paying customers". Obviously this is a difficult question to answer generally, but extra detail about what happened has the potential to instantly pull this specific instance out of the generic "Google support is insufficiently human" bucket, which might be interesting. (Please note that I'm asking this to get the other side of the story about this, I'm not trying to shoot the messenger :) )
OK, now for my offtopic question. I think you're probably the perfect person to ask this.
https://github.com/HackerNews/API (linked from the bottom of every HN page except the add-comment page) describes HN's Firebase-based API. The current API design tends to require a lot of discrete requests to get at high-level information due to the fact that it doesn't support batching (and the page acknowledges this, with "It's not the ideal public API, but it's the one we could release in the time we had.").
Now... that page also says "There is currently no rate limit."
For some time I've wanted to track page votes over time. These are not logged, so this operation is necessarily very realtime. There are lots of posts, and when one of them goes viral the vote goes up very quickly. Perhaps you can see where this is going :)
If I wanted to try and overcome the poor API design by requesting individual items every 500ms, or 250ms, or 100ms... or 50ms...
a) at what point am I likely to get hard IP-blocked? (I'm also wondering how bad it/I would be if I used a bunch of different IPs, at least in terms of technical load.)
b) what rate should I tend to prefer so I can be nice to HN (I'm not sure what tier they're on)?
https://github.com/HackerNews/API
The firebase API is excellent. I have been using that to keep http://searchhn.com up to date in real time.
Also, BigQuery is updated every day with all comments and posts. https://bigquery.cloud.google.com/dataset/bigquery-public-da...
This is what I started with to update the Searchera (https://searchera.io) index which powers Searchhn
You're definitely right about there being miscellaneous rules in there. Something that I mentioned in passing in the article is that many stories exhibit a significant drop in position once they're 15 hours old. If you look closely at the typical story trajectories, you can also see various other jumps of about 10-30 positions, which I would guess are triggered by these various rules.
The stories listed in the article exhibit very different behavior, where they jump hundreds of positions instantaneously. It's absolutely possible that this is triggered by some automatic mechanism, but if that's the case then there's an enormous amount of significance being assigned to the corresponding rules. If there's some random component to the ranking, then I highly doubt that it would be responsible for jumps of this magnitude.
I try to emphasize in the article that I do think it's possible that there's a hidden flagging threshold that's responsible and that the data can't tell us with certainty whether or not that's the case. I just personally find it unlikely that that's what happened for all of these stories. If you ran a site like Hacker News then would you put an admin link next to each post that pushes it off of the front page? I know that I would.
Yes. There are a few HN APIs/sources out there:
As to why some posts are flagged, and some to the point of '[flagged][dead]': from what I've witnessed, it's not all that dissimilar to why some posts or comments get up-voted and others not. Certain occurrences of flagging will catch our eye based on our own perspectives.
I haven't been convinced strongly enough that there's anything nefarious afoot to warrant diving into the post statistics (though I do think it would be interesting to do so). From your initial comment I might assume that you think this might be the case. If so, I encourage you to do such a study yourself. The HN APIs will likely give you enough data to dig into it.
No need (for me). Recipe to roll your own comments for your DIY blog:
- write your own templates, content and engine
- crack out your python/alt-language and pull from the HN API https://github.com/HackerNews/API for favs/stories posted
- find your firebase json repository https://hacker-news.firebaseio.com/v0/user/bootload.json?pri...
- download json posts by id. For example ^this post^ cf: https://hacker-news.firebaseio.com/v0/item/12904458.json?pri...
- parse and roll into your blog (see the sketch below)
Wasn't that hard, was it?
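A minimal sketch of the download-and-parse steps, assuming `requests` and your own username in place of mine:

import requests

v0 = "https://hacker-news.firebaseio.com/v0"
user = requests.get(f"{v0}/user/bootload.json", timeout=10).json()

# `submitted` lists the IDs of the user's stories and comments
for item_id in user.get("submitted", [])[:20]:
    item = requests.get(f"{v0}/item/{item_id}.json", timeout=10).json()
    if item and item.get("type") == "comment":
        print(item.get("text", ""))  # an HTML fragment, ready for your template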
Well, then disregard what I said. What I wrote was the opinion from an end-user/page-reader perspective, not from the person who'll handle tweaking the styling.
>I am not sure if this is a problem from me not being native (I'm Spanish)
I wouldn't be able to tell, not being native myself (I'm Algerian).
>What twitter does is keeping the control+developing their own code, an API/library that HN doesn't have available. And of course I'm now going down the road of asking for people's username/password of the different networks.
Pardon me, but have you looked at this: https://github.com/HackerNews/API http://blog.ycombinator.com/hacker-news-api ? I'm sure you are aware Reddit has an API. Again, I'm sorry for failing to see what the very particular problem to be solved is, not being a developer and all.
From the commit notes in that repo, the only changes from the initial release in 2014 are "minor README updates."
[0] https://github.com/HackerNews/API
Also, there's nothing really wrong with having a blog with no recent updates. I know there's this idea that a blog needs to be regularly updated. But a blog with just 2 valuable posts made 2 years ago, is a valuable thing!
If you just want to write some stuff and put it online, without the feeling of some ongoing obligation -- here are a few options:
https://medium.com/@jason.sackey/apps-for-super-fast-web-pub...
I hope you will be encouraged to post some stuff.
I figure since a comment of mine sparked 'saved comments' being a thing that lightning could strike twice...but then I guess the API would need to allow authentication for private data.
The same kind of thinking (Reddit->FUSE->Files) could be applied to a Reddit -> Gopher proxy. To reduce the cost of the project, I'd make it self-host the Gopher server on the user's PC and make the API calls from there, rather than setting up gopher-hackernews.xyz:70
My gut feeling is that the slowest part of this would be the API call to HN [2] or Reddit [3].
[2]: https://github.com/HackerNews/API
This extension was created using the official Hacker News API: https://github.com/HackerNews/API
[1] https://github.com/HackerNews/API
[2] https://github.com/algolia/hn-search
[3] https://archive.org/details/HackerNewsStoriesAndCommentsDump
[4] https://ia902503.us.archive.org/33/items/HackerNewsStoriesAn...
[5] http://shitalshah.com/p/downloading-all-of-hacker-news-posts...
But it specifically says: "We hope to improve it over time, and may later enable access to private per-user data using OAuth." So I'm assuming that login, votes, comments, and submissions are for a future release of the API. That might be why the apps you use resort to scraping HN.
For reference, here's the blog post talking about the release of the API: http://blog.ycombinator.com/hacker-news-api
http://blog.ycombinator.com/hacker-news-api
It uses the Aylien text analysis API to summarize and the Hacker News API to get the articles.
Code here: https://github.com/Bachmann1234/hn-tldr