What does HackerNews think of API?

Documentation and Samples for the Official HN API

My best guess is that someone migrated DNS providers (or some engineer playing with infrastructure as code…) and it’s just an unintentional missing DNS record, oops.

The Hacker News API is still up: https://github.com/HackerNews/API https://hacker-news.firebaseio.com/v0/topstories.json?print=...

To @dang or whoever is reading at Algolia: you can set up continuous monitoring for problems at the DNS/TCP/TLS/HTTP layers from just your URL at https://heiioncall.com/ and we’d definitely alert you if the DNS record disappears or it’s otherwise unreachable for more than X minutes. (Email me if you need help getting it set up.)

What would it take for HN to become ActivityPub compatible?

I do not foresee this happening. The moderation and voting would require a significant level of trust in the other instances, admins, and moderators. Being associated with other multi-media sites introduces legal risks that I would be surprised a VC was willing to take on. (This assumes you mean a bi-directional integration with ActivityPub.)

That said, one could run something like HN and moderate it independently. Maybe Postmill [1a][1b]. There are other sites that ingest data from HN's API [2] for search, statistical analytics, etc. Something similar could probably be done to ingest this site read-only into ActivityPub from another site, but I would be quite surprised to see this site implement ActivityPub directly. I don't know what the API rate limits are for this site, or whether Mastodon could operate within them.

[1a] - https://postmill.xyz/

[1b] - https://gitlab.com/postmill/Postmill

[2] - https://github.com/HackerNews/API

If you care about the number of votes or comments an article gets, you can get it out of Firebase

https://github.com/HackerNews/API

and it is a much more certain thing. I have (i) a model that predicts "will this headline get more than 10 votes?" and (ii) one that predicts "if this headline gets more than 10 votes, does it get a ratio of comments to votes greater than the median (roughly 0.5)?"

The best model I have for (i) is still a bag-of-words model that doesn't try to correct for time series variations. The AUC is atrocious, maybe around 65%, but I like the model because high-scoring headlines look like a parody of high-scoring headlines; I think "Richard Stallman has died" could be the best possible headline. (It's silly to think you could get good performance at this, because the model can't see whether the article has a flashy picture or other attractive attributes that would raise the vote rate.) I've made other models with fancier methods, but none perform better or are more entertaining.

As for (ii), the most commented articles tend to be clickbaity, so it would be irresponsible to submit a feed of high-scoring articles that isn't well curated. I am getting an AUC of around 72%, which is what I got with my first recommender.
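
For anyone wanting to build similar labels, here's a minimal sketch of pulling the raw counts out of Firebase. This is just an illustration, not the exact pipeline described above; it assumes the Python `requests` library, and item 8863 is the example story from the API docs:

    import requests

    BASE = "https://hacker-news.firebaseio.com/v0"

    def fetch_item(item_id):
        # Each item is a single JSON document; v0 has no batch endpoint.
        return requests.get(f"{BASE}/item/{item_id}.json", timeout=10).json()

    story = fetch_item(8863)
    title = story.get("title", "")
    votes = story.get("score", 0)
    comments = story.get("descendants", 0)  # total comments in the tree

    # Labels corresponding to (i) and (ii) above.
    label_i = votes > 10
    label_ii = votes > 10 and comments / votes > 0.5
    print(title, label_i, label_ii)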

There is a Hacker News API, so somebody could set up an NNTP server with an HN newsgroup. Looks like it's already been done, at least for personal use. If a newsgroup is read-only and only on one NNTP server, does it count as Usenet? Is that like the sound of one hand clapping?

https://github.com/HackerNews/API

https://www.npmjs.com/package/hackernews2nntp

https://github.com/gromnitsky/hackernews2nntp

Honorable mention for gwene.org, which converts RSS feeds into NNTP, and gmane.org, which converts mailing lists into NNTP.

The author should have a look at https://github.com/HackerNews/API instead of scraping HN

The HN API is linked to from every page on HN, at the bottom of the page in the footer. Look for the link named “API”.

Your app looks great! Re data: we just get the front page every few minutes from https://github.com/HackerNews/API. The table is almost 10MB right now, so pretty small.

I don't know why, but I agree it can be annoying…

As a trick, you can use the API [1]. Get the story ID in the URL of your post, and place it at the end of the API call (e.g. [2]).

If the result has `dead: true`, your post is… well… dead ;)

[1] https://github.com/HackerNews/API [2] https://hacker-news.firebaseio.com/v0/item/35700745.json?pri...
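
A minimal sketch of that trick in Python (assuming the `requests` library; the story ID is the one from [2]):

    import requests

    def is_dead(story_id):
        # Flagged/killed items carry `dead: true`; the field is absent otherwise.
        url = f"https://hacker-news.firebaseio.com/v0/item/{story_id}.json"
        item = requests.get(url, timeout=10).json()
        return bool(item and item.get("dead"))

    print(is_dead(35700745))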

You can make the intermediate step a bit more structured too via https://github.com/HackerNews/API

For example, for the March one the ID is 34983767 (from the Algolia search, or a "there are only so many of them, here's a list that I'll add to each month").

You can then get a list of all the top-level comments at https://hacker-news.firebaseio.com/v0/item/34983767.json?pri...

And then pull up an individual comment at https://hacker-news.firebaseio.com/v0/item/35255027.json?pri... so you don't have to parse any of its child comments or the HTML of the page.

(late edit: and on re-reading the blog post while only half paying attention to a meeting... that is what you are doing)
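
Here's a sketch of that flow in Python, just as an illustration (assuming the `requests` library; the IDs are the ones mentioned above):

    import requests

    BASE = "https://hacker-news.firebaseio.com/v0"

    def fetch(item_id):
        return requests.get(f"{BASE}/item/{item_id}.json", timeout=10).json()

    thread = fetch(34983767)               # the March thread mentioned above
    for kid_id in thread.get("kids", []):  # kids holds top-level comment IDs only
        comment = fetch(kid_id)
        if comment and not comment.get("deleted"):
            print(comment["id"], (comment.get("text") or "")[:80])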

Built a terminal browser for HN based on the HN API https://github.com/HackerNews/API and replbuilder: https://github.com/Aperocky/replbuilder

Currently only supports read operations, but planning to add login and commenting eventually.

In case no one posts hard numbers, you can try to calculate them yourself using https://github.com/HackerNews/API (though it does not include [dead] comments, nor page views).

There are a few sources. There's the official API [0], the Algolia search API [1], and the BigQuery dataset, which is pretty up to date [2].

I used the Algolia search API, it has extremely generous rate limits and page limits.

[0]: https://github.com/HackerNews/API [1]: https://hn.algolia.com/api [2]: https://console.cloud.google.com/bigquery?p=bigquery-public-...

There's an API[0] but it's frustratingly limited in capabilities (albeit not rate-limited.) You'll have to iterate all post IDs, download each post as JSON and get the titles that way.

There's also a Google dataset but I don't know the URL for it or if it's up to date.

[0]https://github.com/HackerNews/API

It's accessible through the API: https://github.com/HackerNews/API (although it doesn't seem to be sorted by score).

I know, I know, you don't need an external package just to interface with the HN API.

Nevertheless, I needed this to make my life easier in a project I'm working on, so, here you go.

This package provides a wrapper for the Firebase Hacker News API [1] in Go.

Besides the obvious calls to the documented endpoints, it also provides utility functions that can be useful when working with data returned from this API.

I'm a beginner at Go, so feedback to improve my skills is more than welcome. Together with PRs to the repo, of course!

[1] https://github.com/HackerNews/API

They actually even admit that the API is garbage. They say:

> "The v0 API is essentially a dump of our in-memory data structures. We know, what works great locally in memory isn't so hot over the network. Many of the awkward things are just the way HN works internally. Want to know the total number of comments on an article? Traverse the tree and count. Want to know the children of an item? Load the item and get their IDs, then load them. The newest page? Starts at item maxid and walks backward, keeping only the top level stories. Same for Ask, Show, etc. I'm not saying this to defend it - It's not the ideal public API, but it's the one we could release in the time we had. While awkward, it's possible to implement most of HN using it."

https://github.com/HackerNews/API

If one wants to get the comments for a post on HN from the API, they have to first fetch the post, then fetch every single comment one by one. So if the post has 500 comments, one basically has to send over 500 network requests...
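
To make the cost concrete, here is a naive sketch of the "traverse the tree and count" approach from the quote above (assuming the `requests` library):

    import requests

    BASE = "https://hacker-news.firebaseio.com/v0"

    def count_comments(item_id):
        # One HTTP round trip per node: a 500-comment story really
        # does cost roughly 500 requests to count.
        item = requests.get(f"{BASE}/item/{item_id}.json", timeout=10).json()
        if not item:
            return 0
        kids = item.get("kids", [])
        return len(kids) + sum(count_comments(k) for k in kids)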

They did say that they plan on releasing a better API in the future. However, as far as I know, it will be read-only initially, so it won't be of much use for my app.

- Good point, I'll add a setting in the next update to always show icons.

- Parent / context are in the menu just to keep the design clean. But I understand that's not what everyone wants, so I will add a setting to show these with the other icons, instead of in the menu.

- Prev / next are moved to the "control pad" (pro feature), but my intention is for the free version to support all current HN features, so I will add those back in the next update.

- The permission to firebase is to access the HN API [1] to grab user profile info etc. The other is for the pro upgrade using ExtPay [2]

[1] https://github.com/HackerNews/API

[2] https://github.com/glench/ExtPay

The link at the bottom of Hacker News that says “API” points to this GitHub repository [1].

The first commit is from Oct 2, 2014, while the article was published on April 1, 2018.

However, it seems that you did not read the first paragraph of the article which states the following:

> While building hackd I faced a problem - the official Hacker News API doesn’t allow for interaction, such as upvoting, posting and commenting. I wanted hackd to be a full featured Hacker News client, so this wasn’t going to cut it.

So to answer your question:

> Didn’t HN get an official API a long time ago? Or was that read-only?

Yes, it appears that at the time of publication, the API was read-only.

[1] https://github.com/HackerNews/API

Maybe they used the API [0] to get the top stories and then just parsed all the domains.

[0] https://github.com/HackerNews/API

HN does have a REST API which is quite easy to use.

https://github.com/HackerNews/API

I'm not sure what rate limiting policy is in place, but in theory you can start with a request for maxitem and from that point on just GET all items down to zero until you hit some sort of blocker.
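
A sketch of that walk in Python (an illustration only, assuming the `requests` library; the demo is bounded to 100 items):

    import requests

    BASE = "https://hacker-news.firebaseio.com/v0"

    max_id = requests.get(f"{BASE}/maxitem.json", timeout=10).json()
    # Walk backwards from the newest item toward zero.
    for item_id in range(max_id, max(max_id - 100, 0), -1):
        item = requests.get(f"{BASE}/item/{item_id}.json", timeout=10).json()
        if item:
            print(item_id, item.get("type"), item.get("title", ""))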

Thanks! The GraphQL API is just a wrapper around the HackerNews REST API (docs: https://github.com/HackerNews/API)

The REST endpoints do not provide a way to query items with variable parameters, so I could not support that on the GraphQL side either.

You can check my effort at https://github.com/hsblhsn/hn.hsblhsn.me

I've been using the Hacker News API [0] and Python's Flask web server to create my own curated version of the website. Not only does it place articles that are new since my last refresh at the top of the list and mark them with an asterisk, but it also features a carefully considered blocklist of terms and combinations of terms that I tend to fall into a rabbit hole on. Also, some subjects do not show up in my feed at all, particularly highly charged political content that, while important and deserving of careful consideration, I would simply prefer to discuss elsewhere.

[0]https://github.com/HackerNews/API

EDIT:

Paste this hacky snippet into your browser's console to save all of the links into your clipboard:

  (() => {
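    // Collect the href of every link inside a table cell, skipping
    // HN-internal links and javascript:void placeholders.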
    let result = [...document.querySelectorAll('td a')]
        .map((link) =>
            link &&
            !link.href.includes('ycombinator') &&
            !link.href.includes('javascript:void')
                ? link.href
                : ''
        )
        .filter(Boolean);
    result = [...new Set(result)];
    result.sort();
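    // `copy` is a DevTools console helper that writes to the clipboard.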
    copy(result.join('\n\n'));
  })();


Just to help, here's a compiled list of all of the links shared here so far:

http://ayende.com/blog

http://blog.cleancoder.com/

http://journal.stuffwithstuff.com/

http://www.stargrave.org/LinksCatPersonal.html

http://xahlee.info/kbd/keyboard_hardware_and_key_choices.htm...

https://0xd34df00d.me/

https://2ality.com/

https://andymatuschak.org/

https://austinhenley.com/blog.html

https://bas.codes/

https://bas.codes/posts/python-slicing

https://bernsteinbear.com/pl-resources/

https://blog.aawadia.dev/

https://blog.acolyer.org/

https://blog.benjojo.co.uk/

https://blog.codinghorror.com/

https://blog.esteetey.dev/

https://blog.johnnyreilly.com/

https://blog.kwatafana.org/

https://blog.ploeh.dk/archive/

https://bloggingfordevs.com/trends/

https://blogsurf.io/

https://bowtiedfox.substack.com/

https://brandur.org/articles

https://briancallahan.net/blog

https://brooker.co.za/blog/

https://ciechanow.ski/

https://collection.mataroa.blog/

https://danluu.com/

https://drewdevault.com/

https://dsebastien.net/

https://dusted.codes/

https://earthly.dev/blog/authors/adam/

https://eli.thegreenplace.net/

https://fabiensanglard.net/

https://fasterthanli.me/

https://flak.tedunangst.com/

https://fsharpforfunandprofit.com/

https://github.com/crispgm/awesome-engineering-blogs

https://github.com/HackerNews/API

https://github.com/jkup/awesome-personal-blogs

https://github.com/markodenic/awesome-tech-blogs

https://github.com/search?q=list+of+awesome+blogs

https://headrush.typepad.com/creating_passionate_users/

https://hn.algolia.com/?query=Ask%20HN%3A%20Great%20Blogs%20...

https://joelonsoftware.com/

https://journal.stuffwithstuff.com/

https://jrsinclair.com/

https://justine.lol/

https://jvns.ca/

https://jvns.ca/blog/2016/04/09/some-of-my-favorite-blogs/

https://jwstanly.com/blog/

https://kerkour.com/

https://kinduff.com/

https://learnbyexample.github.io/py_resources/miscellaneous....

https://lemire.me/blog/

https://lmy.medium.com/

https://martinfowler.com/

https://matklad.github.io/

https://matt.might.net/articles/

https://michelenasti.com/

https://modfoss.com/

https://nickp.svbtle.com/

https://noobmaker.substack.com/

https://nullprogram.com/

https://paulmck.livejournal.com/

https://poor.dev/blog

https://prog21.dadgum.com/

https://randomascii.wordpress.com/

https://scottlocklin.wordpress.com/

https://simonwillison.net/

https://simpleprogrammer.com/ultimate-list-software-develope...

https://staysaasy.com/

https://staysaasy.com/engineering/2020/05/30/Picking-Your-Te...

https://staysaasy.com/software/2022/01/17/complexity.html

https://tenthousandmeters.com/

https://unixsheikh.com/

https://vadimkravcenko.com/

https://www.buildthestage.com/

https://www.codingshorts.io/

https://www.davidvlijmincx.com/

https://www.fluentcpp.com/

https://www.go350.com/

https://www.hanselman.com/blog/

https://www.husseinnasser.com/search

https://www.jeffgeerling.com/blog

https://www.joshwcomeau.com/

https://www.kalzumeus.com/archive/

https://www.stochasticlifestyle.com/

https://www.swyx.io/rss

https://www.taniarascia.com/

https://www.theerlangelist.com/

https://www.youtube.com/channel/UC_ML5xP23TOWKUcc-oAE_Eg

https://www.yusufaytas.com/

https://www.zhenghao.io/

FYI, this uses the Hacker News API, which I didn't know existed until researching this: https://github.com/HackerNews/API

Quite surprised that you're scraping HN and parsing the DOM. Maybe you should try using the API? https://github.com/HackerNews/API

> I generally wait about 5 seconds between checks of a profile

If you're scraping HN, please wait 30 seconds (https://news.ycombinator.com/robots.txt) - our app server still runs on a single core, so we don't have a lot of performance to spare. (Hopefully that will change this year.)

If you need to check more frequently, https://github.com/HackerNews/API works fine and you can get JSON that way anyhow.

>for something like Hacker News it's simply future-proofing in case they decide to block scrapers at some point in the future

Why are you scraping? We have an API, it's linked at the bottom of every page[0].

0: https://github.com/HackerNews/API

You may not need scraping. HN has a rudimentary but functioning API [1], and there is dump on BigQuery [2].

[1]: https://github.com/HackerNews/API

[2]: https://console.cloud.google.com/marketplace/product/y-combi...

No; a copy of the HN database is synced regularly to Firebase (https://github.com/HackerNews/API), but IIRC the site itself runs on a single process on a single machine with a standby ready.

edit: Yup. https://news.ycombinator.com/item?id=28479595

I didn't know about it. https://github.com/HackerNews/API . I checked it out, but the API wouldn't have helped me: the API response for this flagged story doesn't indicate that it has been flagged. https://hacker-news.firebaseio.com/v0/item/29103056.json?pri...

On the other hand: they gave me an idea. I can structure the crawler differently. Right now, I am following the 'next' links (apparently there is a limit on the number of 'next' pages). Instead, it is possible to get the maximum item ID, then just decrement it down and down again.

Sorry, didn't mean admitting to "absolute garbage" in a literal way. Mostly referring to statements such as:

> I'm not saying this to defend it - It's not the ideal public API, but it's the one we could release in the time we had. While awkward, it's possible to implement most of HN using it.

https://github.com/HackerNews/API

My words may have been a bit too harsh, but I still think I have a point. The only use I could find for the API was getting a user's details (karma, description, date created, etc.). Other than that, I ended up relying on scraping.

Unfortunately, unless the JSON API offers write access, it won't be of much use for my app's use cases. Let's just hope nothing major breaks in the HTML of the site and there aren't any major redesigns in the pipeline.

Hmm, it seems to be working. There are some nice examples on https://github.com/HackerNews/API

A bit offtopic, but I don't see many people knowing about/using the Algolia API [0]. It's much better to use than the official HN API [1], since it returns the whole tree of data in one request.

Unfortunately (I guess this is a big reason why people don't use it), it doesn't sort the comments; if you need the order, you'll have to parse the HN HTML (or just use the official API).

Still, just two requests (the HN site, the Algolia API) are much better than recursively making a hundred requests, so I use this approach in my client [2].

[0]: https://hn.algolia.com/api

[1]: https://github.com/HackerNews/API

[2]: https://github.com/goranmoomin/HackerNews
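
For reference, a sketch of the one-request approach against Algolia's items endpoint (assuming the Python `requests` library; the item ID is arbitrary):

    import requests

    # Unlike the official v0 API, Algolia returns the whole comment tree
    # nested under `children` in a single response.
    item = requests.get("https://hn.algolia.com/api/v1/items/1", timeout=10).json()

    def walk(node, depth=0):
        print("  " * depth + (node.get("author") or "[deleted]"))
        for child in node.get("children", []):
            walk(child, depth + 1)

    walk(item)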

Thanks for the feature requests! Will definitely add a text zoom feature and a ‘My Posts’ category.

> Any reason you're connecting to firebase rather than storing preferences locally?

All of your preferences are stored locally — the firebase connection is due to the official HN API[0] implemented on Firebase. Your HN credentials never leave the app (except for news.ycombinator.com) — it’s tucked in your keychain.

[0]: https://github.com/HackerNews/API

> Does the Algolia hackernews api ( the specific API this client uses ) still require you to query in a tree like fashion, every comment item id if you want an entire page of comments?

I’m using two different APIs. The inefficient one that requires querying in a tree fashion is the official HN API[0], but there’s also the Algolia API[1], which is much faster and gives a much more sensible data shape. I do actually fetch the HN website as well; it’s needed for account/voting features. With these sources, it’s much faster than a usual client that gets its data from the official API.

> The HN website I think is pretty ideal for the type of content its displaying, a native app doesn't give you much over having a HN tab open.

I guess it’s a bit of a difference in how one uses HN? I didn’t like HN tabs being mixed with other work-related tabs. I felt that having an app would be a perfect solution for me, but YMMV I guess.

> The only benefit I see from a native app is the ability to save articles and comments for offline reading

I guess that’s one more feature that I should add to my backlog :)

[0]: https://github.com/HackerNews/API

[1]: https://hn.algolia.com/api

Is there actually a search.json endpoint?

https://github.com/HackerNews/API doesn't list this as a valid endpoint. Indeed, when I try to access it via curl, I get a HTTP 405 with a "Permission denied" error (same result when I try to access nonexistent-endpoint.json).

Based on the HN search on the website, I'd expect the correct autocomplete to involve hn.algolia.com [0].

[0] https://hn.algolia.com/api points at https://hn.algolia.com/api/v1/search?query=...

To me, this points at the need for human input with a system like this. There is a Firebase endpoint, yes, and Copilot found that correctly! But then it invented a new endpoint that doesn't exist.

Interesting idea. (Years ago someone once took the source code for Slashdot and created their own clone.) And a while back someone asked if the source code for Hacker News was openly available.

https://news.ycombinator.com/item?id=1390685

And there's also an API.

https://github.com/HackerNews/API

There's lots of Wikipedia-like software available, too.

I guess it depends on what you mean by "build" a community. (Though if you're trying to do it online, the key is going to be comments and maybe also individual accounts.) You could just try starting your own subreddit.

Another option is just announcing meetups -- maybe an online (or live?) book club for geeks or programmers, or a geek discussion circle.

Last time I "archived" my account data on HN, I used https://github.com/HackerNews/API, which seems to be working well enough for my needs.

There is a public API for HN data

https://github.com/HackerNews/API

Does that count as the ability to download your data?

> As for API tokens, that's unfortunately just the current trend of basically every other site with user-generated content.

Totally hilarious that you posted this on a discussion site whose API doesn't require tokens.

https://github.com/HackerNews/API

If I understand correctly, the Algolia front-page API [0] only allows you to get at most 34 stories (sorted either by points or by submission date).

The official Hacker News API [1], on the other hand, uses a specific algorithm to determine the ranking of submissions and allows you to get at most 500 stories.

[0]: https://hn.algolia.com/api/v1/search?tags=front_page&hitsPer...

[1]: https://github.com/HackerNews/API

>> you were crawling news.ycombinator.com, right?

No, for retrieving the Hacker News posts we were using the public Hacker News API, which returns the posts in JSON format: https://github.com/HackerNews/API

The crawling speed of 100...1000 pages per second refers to crawling the external pages linked from Hacker News posts. As they are from different domains, we can achieve a high crawling speed while being a polite crawler with a low crawling rate per domain.

I notice there's an API https://github.com/HackerNews/API but don't see any of the core site code. Is there any way people can contribute back?

Not sure what you mean; the content is from the API https://github.com/HackerNews/API

I'm in the core site minimalism crowd.

Especially because the API is pretty solid, you can build whatever stuff you want:

https://github.com/HackerNews/API

You could implement your own tagging system and see if people bite.

For instance I built a comment notifier.

http://hacknotescenter.com/

Feel free to hit me up about it if you're curious.

At the very least, the public APIs are exposed via Firebase.

https://github.com/HackerNews/API

OP: Hacker News offers an excellent API via Firebase: https://github.com/HackerNews/API. Unfortunately, the fetch method doesn't offer any way to filter by item type, so collecting all posts takes excessively long thanks to all the comments.

Based on the official item API, this focuses only on the main posts with at least 2 engagements. The posts in the dataset are about 1.5% of the items returned.

With posts as a starting point, you can easily trace the hierarchy from "kids" fields.

Potential Usage:

- Generate popular titles.

- Collect comments in a hierarchical order for further training.

- Analyze popular topics in the engineering community.

- Identify the best time to post for maximum engagement.

You can only sort by popularity, which maps to 'score' in the data structure that the API sorts. You can see that here: https://github.com/HackerNews/API The highest position on the front page is not, AFAICT, stored in any way and is not searchable.

HackerNews has an open API (https://github.com/HackerNews/API). It doesn't say anything specific about legality, and there have been many HackerNews clones in the past. So as far as I know it should not be a problem, but if anyone can correct me I would love to hear it!

Use this as an example, but for real stuff prefer using the official API https://github.com/HackerNews/API

I set up a job posting sentiment analysis ML app on the Who's Hiring post recently and use the public Firebase REST API: https://github.com/HackerNews/API . It mostly works for my use case, but its search capabilities are severely lacking over REST, so it might not be the best in your case. I never got around to trying a Firebase client, so I have no clue if it interacts with a different interface; might be worth taking a quick look at.

What is the difference between the Algolia HN API and the Hacker News API (https://github.com/HackerNews/API)? The Hacker News API doesn't have any rate limits but appears to be similar to the Algolia API. Does Algolia have its own HN dataset?

[flagged] is a visible indicator, so you can just do that analysis yourself. Either use the API https://github.com/HackerNews/API or, if that doesn't include the flag status, scrape it from the website. Then classify whatever number of flagged stories you end up collecting based on their topic.

Don't forget about their API on GitHub.

It's simplistic but you can still use it to write a sh*ttier version of HN.

https://github.com/HackerNews/API

I just wish they'd open source their We're-Not-Reddit behaviors library.

Just use one of those CSS extensions and roll your own using HN's API: https://github.com/HackerNews/API

Also, CSS is an abomination; aesthetics have always belonged on the client side.

If you look at the comment through the Hacker News API, there are two spaces before the last sentence.

https://hacker-news.firebaseio.com/v0/item/22975749.json?pri...

That said, the documentation says that field is HTML, so I'm not sure what to think.

https://github.com/HackerNews/API

Not to discount your effort, but is there a benefit for scraping the data when HN offers an official API[1]? Does the API not expose all the data you need?

[1] https://github.com/HackerNews/API

It's a shame favorites aren't exposed in the official HN API: https://github.com/HackerNews/API - this is a smart workaround.

This is a console application that I wrote for browsing HN in a terminal. It has an ncurses UI and uses libcurl to query the HN API [0]. It currently works on Linux and Mac. I also ported it to the web with Emscripten's Fetch API so it can be easily demonstrated [1].

[0] - https://github.com/HackerNews/API

[1] - https://hnterm.ggerganov.com

HN has an API: https://github.com/HackerNews/API

I'm using it to track common items here and on Lobste.rs (and Proggit):

http://gerikson.com/hnlo/

Here's the endpoint for the latest 500 submissions:

https://hacker-news.firebaseio.com/v0/newstories.json

here's the one for the current top stories:

https://hacker-news.firebaseio.com/v0/topstories.json

It's actually quite nice to work with. I don't know how to keep track of comments moving from thread to thread, because that's not a metric I'm interested in, but it should be possible to track somehow.
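
A minimal sketch of hydrating those ID lists in Python (assuming the `requests` library):

    import requests

    BASE = "https://hacker-news.firebaseio.com/v0"

    # Both endpoints return a plain JSON array of item IDs.
    top_ids = requests.get(f"{BASE}/topstories.json", timeout=10).json()
    for story_id in top_ids[:5]:
        story = requests.get(f"{BASE}/item/{story_id}.json", timeout=10).json()
        print(story.get("score"), story.get("title"))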

Once per 30 seconds. That's in our robots.txt: https://news.ycombinator.com/robots.txt. We've been working for a long time on (edit: what we expect to be) some serious performance improvements that might allow us to relax that limit. For now though, HN's process still runs on a single core and we don't have much performance to spare.

If you need more than that, you should use the Firebase-based API (https://github.com/HackerNews/API). The public dataset is also available as a Google BigQuery table: https://bigquery.cloud.google.com/dataset/bigquery-public-da....

Edit: since this subthread is not really on topic I detached it from https://news.ycombinator.com/item?id=21617478.

While the article seems intended as a web scraping tutorial, it's good to remember that there is an official HN API [1] in case you really want to do data science on the site.

[1] https://github.com/HackerNews/API

HTML scraping should be the last option you consider to get data after all else fails.

Even though the Hacker News API (https://github.com/HackerNews/API) is somewhat old, it's a much more kosher way of getting data.

Even better is to use the public data dump in BigQuery (https://console.cloud.google.com/marketplace/details/y-combi...). Quick query to get all top-level comments in posts by whoishiring:

    #standardSQL
    WITH whoishiring_posts AS (
      SELECT id from `bigquery-public-data.hacker_news.full`
      WHERE `by`="whoishiring" AND type="story"
    )

    SELECT text
    FROM `bigquery-public-data.hacker_news.full`
    WHERE type="comment"
    AND parent IN (SELECT id from whoishiring_posts)

You can build a simple webpage for yourself using the APIs

Here is the start point -

https://hacker-news.firebaseio.com/v0/showstories.json?print...

More - https://github.com/HackerNews/API

This could be a plain HTML page, which you can bookmark and check daily.

Did you try the HackerNews API? [0].

I've never used it, but it states in the README that there is no rate limit currently (~8 months ago).

0. https://github.com/HackerNews/API

The Internet is written in ink. You should assume that any and all public posts you make have already been replicated and archived by countless parties in countless ways by the time you hit delete. HN public postings are no different.

The HN API [1] has been around in various forms for years and includes the same public data that's used to generate the public pages on the HN site, but rather than returning HTML pages designed for human consumption, the API returns the data in a JSON serialized form [2] designed for machine consumption [3].

When the HN API went live, it reduced the overhead and redundant work from all the programmers having to independently crawl and parse site. The HN BigQuery dataset is the same data returned by the HN API, Google just took the next step and did the work of loading it into BigQuery.

[1] https://github.com/HackerNews/API

[2] https://en.wikipedia.org/wiki/Category:Data_serialization_fo...

[3] https://en.wikipedia.org/wiki/Machine_to_machine

Not directly, but you may find their API useful: https://github.com/HackerNews/API

And here's yet another searchable/filterable HN archive: https://hn.algolia.com. You can find more info on the (free) HN API here: https://github.com/HackerNews/API
There’s an API[0]. If no one else has done so already, you could write your own notification system.

0: https://github.com/HackerNews/API

It's using the official Hacker News API (https://github.com/HackerNews/API), not scraping web sites.

Yes. Some sites as well as apps use it (so scraping the web pages is not required).

See here:

https://github.com/HackerNews/API

HN's comments section is filled with arcane information, strange links and great insights (and sometimes really weird stuff.) Often the comments are more interesting than the links they're in response to.

I knocked this little page up tonight to try and add a bit of randomness to my HN experience, to see comments I might not normally see. I thought I'd share in case anyone else feels the same way!

Built using the HN Firebase API: https://github.com/HackerNews/API

Thanks for having a look. I'm not a lawyer either, but it appears they just want the rights to do whatever they want with the content from Hacker News. They don't seem to explicitly forbid the use of content from the site.

I mean, they provide free access to their content API [1], so that's a good sign; surely they want people to use the content. Although I'm sure they want you to reference the Hacker News source link however you use it.

That's also a pretty good idea, have you got much.

[1] https://github.com/HackerNews/API

Good job.

Why did you prefer using the HTML format instead of the HN API? https://github.com/HackerNews/API

Well, you could always try creating your own HN frontend[1]

I made one myself in about an hour or two. http://morphical.ml:4000/s/17111778

[1]https://github.com/HackerNews/API

The API is linked at the bottom of this page: https://github.com/HackerNews/API

> I think most people would assume that documents that can only be accessed by editing an ID were not meant to be accessed. And that really is the end of the analysis.

You do realize HN provides an API that allows you to request any item by using an ID? [1]

    Stories, comments, jobs, Ask HNs and even polls are just items. 
    They're identified by their ids, which are unique integers, and 
    live under /v0/item/.
If you really know better than everyone else who has replied to you on this story, why don't you point out the exact law that states accessing resources over HTTP is forbidden if not initiated from another resource originating from the target server? Otherwise, I'll assume your "analysis" is simply a subjective view on how you would like the web to work. A pretty limited and unrealistic view that wouldn't work in the real world.

For example: here is the link to the first story posted on HN: https://news.ycombinator.com/item?id=1

1. I don't think you can access that story by starting from the front page, because scrolling for more stories only gets you to page 25. Does that mean the intention is that the story is private?

2. You can now access it by using the DOM element generated for my comment. Does that mean it's public?

[1] https://github.com/HackerNews/API

The last Who Is Hiring went on for something like 5 pages, so it was a bit clunky to search in-browser, but the API makes it pretty easy to scrape/search whatever you like: https://github.com/HackerNews/API

@siddhant's recommendation to use the Algolia search is probably the best way for simple searching of what you want. You could also always use the API and build a UI that better suits your needs, should existing ones not suffice.

https://github.com/HackerNews/API

One way to calculate this would be to use this controversy formula (https://math.stackexchange.com/a/318510/10887) on all comments of a user whose total karma exceeds a certain threshold, say 1000, and then add up those controversy scores.

The problem is that the HackerNews API (https://github.com/HackerNews/API) only provides a total score for a comment, instead of up and down votes, so you'd have to modify your controversy formula to use an appropriate measure of dispersion instead (https://en.wikipedia.org/wiki/Statistical_dispersion).

I'm not a statistician myself, so I can't help further.

In a couple of places it sounds like you're interested in scraping HN. If that's the case, there is an official API: https://github.com/HackerNews/API

My own take on it in general is that for personal/research use I'm not morally opposed to scraping, even when it's in violation of the ToS, with two conditions: that it doesn't place an unreasonable burden on the server, and that it doesn't invade people's privacy. The legal significance of the ToS is murky at best (disclaimer: I'm not a lawyer) but if the site asks you specifically to stop scraping them or puts up a technical barrier you should stop (morally and, in the US at least, legally: see craigslist v 3taps)

Hi! One related question, and one offtopic question.

Do you happen to have any idea what actually happened internally with this? I ask this coming from the standpoint of "ouch, another example of ignored paying customers". Obviously this is a difficult question to answer generally, but extra detail about what happened has the potential to instantly pull this specific instance out of the generic "Google support is insufficiently human" bucket, which might be interesting. (Please note that I'm asking this to get the other side of the story about this, I'm not trying to shoot the messenger :) )

OK, now for my offtopic question. I think you're probably the perfect person to ask this.

https://github.com/HackerNews/API (linked from the bottom of every HN page except the add-comment page) describes HN's Firebase-based API. The current API design tends to require a lot of discrete requests to get at high-level information due to the fact that it doesn't support batching (and the page acknowledges this, with "It's not the ideal public API, but it's the one we could release in the time we had.").

Now... that page also says "There is currently no rate limit."

For some time I've wanted to track page votes over time. These are not logged, so this operation is necessarily very real-time. There are lots of posts, and when one of them goes viral the vote count goes up very quickly. Perhaps you can see where this is going :)

If I wanted to try and overcome the poor API design by requesting individual items every 500ms or 250ms, or 100ms.... or 50ms......

a) at what point am I likely to get hard IP-blocked? (I'm also wondering how bad it/I would be if I used a bunch of different IPs, at least in terms of technical load.)

b) what rate should I tend to prefer so I can be nice to HN (I'm not sure what tier they're on)?
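
For what it's worth, here's the kind of polling loop I have in mind, as a sketch (assuming the `requests` library; the 30-second default is just a guess at a polite rate, since the docs state no official limit):

    import time
    import requests

    def poll_score(story_id, interval_s=30.0, samples=10):
        # Samples (timestamp, score) pairs for one story at a fixed interval.
        url = f"https://hacker-news.firebaseio.com/v0/item/{story_id}.json"
        for _ in range(samples):
            item = requests.get(url, timeout=10).json()
            print(time.time(), item.get("score"))
            time.sleep(interval_s)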

This is awesome! Congrats.

https://github.com/HackerNews/API

The Firebase API is excellent. I have been using it to keep http://searchhn.com up to date in real time.

Also, BigQuery is updated every day with all comments and posts. https://bigquery.cloud.google.com/dataset/bigquery-public-da...

This is what I started with to update the Searchera (https://searchera.io) index, which powers Searchhn.

I'm not sure if we are ready to document the process, but I can tell you that having an API based on Firebase helps a lot:

- https://github.com/HackerNews/API

Very cool! If you don't want to worry about crawling HN, you can use the API: https://github.com/HackerNews/API

Good points! I use the official Hacker News API [1], and my requests come from an IP address that is completely disconnected from my personal account. Even if the API usage were a red flag, there would be no way to automatically connect it to me.

You're definitely right about there being miscellaneous rules in there. Something that I mentioned in passing in the article is that many stories exhibit a significant drop in position once they're 15 hours old. If you look closely at the typical story trajectories, you can also see various other jumps of about 10-30 positions, which I would guess are triggered by these various rules.

The stories listed in the article exhibit very different behavior, where they jump hundreds of positions instantaneously. It's absolutely possible that this is triggered by some automatic mechanism, but if that's the case then there's an enormous amount of significance being assigned to the corresponding rules. If there's some random component to the ranking, then I highly doubt that it would be responsible for jumps of this magnitude.

I try to emphasize in the article that I do think it's possible that there's a hidden flagging threshold that's responsible and that the data can't tell us with certainty whether or not that's the case. I just personally find it unlikely that that's what happened for all of these stories. If you ran a site like Hacker News then would you put an admin link next to each post that pushes it off of the front page? I know that I would.

[1] - https://github.com/HackerNews/API

As to why it was flagged, the submission has no direct tech link, and while it's politically relevant and of interest to some, it wouldn't surprise me that some users have flagged it, seeing it unlikely to produce a constructive discussion while producing a lot of flames and uncivil behavior.

As to why some posts are flagged, and some to the point of '[flagged][dead]', from as much as I've witnessed it's not all that dissimilar to why some posts or comments get up-voted and others not. Certain occurrences of flagging will catch our eye based on our own perspectives.

I haven't been convinced strongly enough that there's anything nefarious afoot to warrant diving into the post statistics (though I do think it would be interesting to do so). From your initial comment I might assume that you think this might be the case. If so, I encourage you to do such a study yourself. The HN APIs will likely give you enough data to dig into it.

- https://github.com/HackerNews/API

- https://hn.algolia.com/api

I think you might be able to get at some of that through one of the HN APIs, but it would be aggregate in the sense that you would get the total score of a story, not the votes that comprise it.

- https://github.com/HackerNews/API

- https://hn.algolia.com/api

" rel="nofollow">https://comments.network/comments.js">

No need (for me). Recipe to roll your own comments for your DIY blog:

- write your own templates, content and engine

- crack out your Python (or alt language) and use the HN API https://github.com/HackerNews/API to download your favs/stories posted

- find your Firebase JSON user record https://hacker-news.firebaseio.com/v0/user/bootload.json?pri...

- download JSON posts by ID. For example ^this post^ cf: https://hacker-news.firebaseio.com/v0/item/12904458.json?pri...

- parse and roll into your blog.

Wasn't that hard, was it?
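
A compressed version of that recipe as a Python sketch (assuming the `requests` library; the username is the one from the example URLs above):

    import requests

    BASE = "https://hacker-news.firebaseio.com/v0"

    user = requests.get(f"{BASE}/user/bootload.json", timeout=10).json()
    for item_id in user.get("submitted", [])[:20]:
        post = requests.get(f"{BASE}/item/{item_id}.json", timeout=10).json()
        if post and post.get("type") == "comment":
            # `text` is raw HTML; strip or render it before rolling it into a blog.
            print(post.get("text", "")[:80])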

>By embed I meant to embed the content; I've heard from quite a few people that they love that they can personalize it with simple CSS.

Well, then disregard what I said. What I wrote was the opinion from an end-user/page-reader perspective, not from the person who'll handle tweaking the styling.

>I am not sure if this is a problem from me not being native (I'm Spanish)

I wouldn't be able to tell, not being native myself (I'm Algerian).

>What twitter does is keeping the control+developing their own code, an API/library that HN doesn't have available. And of course I'm now going down the road of asking for people's username/password of the different networks.

Pardon me, but have you looked at this: https://github.com/HackerNews/API http://blog.ycombinator.com/hacker-news-api ? I'm sure you are aware Reddit has an API. Again, I'm sorry for failing to see what the very particular problem to be solved is, not being a developer and all.

Since the Hacker News API (https://github.com/HackerNews/API) used in this scraping is being brought up again, I'll ask a burning question: is development of the API dead?

From the commit notes in that repo, the only changes from the initial release in 2014 are "minor README updates."

I just dug a little into the HN API (https://github.com/HackerNews/API). Unfortunately, they don't publish comment ratings... :-( Unless that is changed, implementing an automatic "best comment finder" would become rather more tedious... (And perhaps even turn out to be AI-complete ;-) )

Surprised that nobody has mentioned it since we're here, but the Hacker News API is powered by Firebase: https://github.com/HackerNews/API

That sounds like a cool idea. HN has an API [0] ... not sure if this is possible or allowed though.

[0] https://github.com/HackerNews/API

Also, there's nothing really wrong with having a blog with no recent updates. I know there's this idea that a blog needs to be regularly updated. But a blog with just 2 valuable posts made 2 years ago is a valuable thing!

If you just want to write some stuff and put it online, without the feeling of some ongoing obligation -- here are a few options:

https://medium.com/@jason.sackey/apps-for-super-fast-web-pub...

I hope you will be encouraged to post some stuff.

Cool, I saw there are APIs: https://github.com/HackerNews/API ... Has anybody implemented one?

Putting in the request now for HN to update their API to provide this: https://github.com/HackerNews/API

I figure since a comment of mine sparked 'saved comments' being a thing that lightning could strike twice...but then I guess the API would need to allow authentication for private data.

I have too many projects; one that I shelved was "Reddit-as-a-filesystem", because I was messing around with Dokany [1] at the time.

The same kind of thinking (Reddit->FUSE->Files) could be applied to a Reddit -> Gopher proxy. To reduce the cost of the project, I'd make it self-host the Gopher server on the user's PC and make the API calls from there, rather than setting up gopher-hackernews.xyz:70

My gut feeling is that the slowest part of this would be the API call to HN [2] or Reddit [3].

[2]: https://github.com/HackerNews/API

[3]: https://www.reddit.com/dev/api

[1]: https://github.com/dokan-dev/dokany

Major problem with it going offline, hence this new tool, which may or may not go offline at some point again. I am sure you could build an extension that polls the HN API every couple of seconds or so to check for replies and have it show a popup when you get a notification as well as send an email.

https://github.com/HackerNews/API

The HN API? https://github.com/HackerNews/API (Haven't used it yet, so no comment on how well it works/how far back it goes.)

This is a simple extension for the Google Chrome browser that lets you read the top news from Y Combinator Hacker News.

This extension was created using the official Hacker News Api: https://github.com/HackerNews/API

They've been telling client developers that they should move from screen-scraping to using the API (https://github.com/HackerNews/API) for a while now. I don't know if that's the problem with this specific client, but you may want to check with the developers and see if it is, and if so, what their ETA is for getting onto the API.

There's the main HackerNews API [1] via firebase.com, and there's also the Algolia HN Search API [2]. Over the years I've seen quite a few collections of data [3, 4, 5, 6], but how complete they are and whether or not they've been maintained is unknown.

[1] https://github.com/HackerNews/API

[2] https://github.com/algolia/hn-search

[3] https://archive.org/details/HackerNewsStoriesAndCommentsDump

[4] https://ia902503.us.archive.org/33/items/HackerNewsStoriesAn...

[5] http://shitalshah.com/p/downloading-all-of-hacker-news-posts...

[6] https://news.ycombinator.com/item?id=7835605

Since the official API (https://github.com/HackerNews/API) does not support authentication, how are you collecting and storing auth credentials for login/posting?

There's an API for HN (https://github.com/HackerNews/API); it's not that hard to make one yourself. This is Hacker News, after all.

We offer a lot of data (in JSON and XML) via our Firebase powered API:

https://github.com/HackerNews/API

As far as I know, HN offers an official API here: https://github.com/HackerNews/API

But it specifically says "We hope to improve it over time, and may later enable access to private per-user data using OAuth." So I'm assuming that login, votes, comments, submission is for a future release of the API. That might be why the apps you use resort to scraping HN.

For reference, here's the blog post talking about the release of the API: http://blog.ycombinator.com/hacker-news-api

You're using the Hacker News Search API from Algolia (which is great), but you'll probably want to read about the main HN API:

http://blog.ycombinator.com/hacker-news-api

https://news.ycombinator.com/item?id=8422599

https://github.com/HackerNews/API

There are similar projects out there, but I wanted to play with some APIs and to try out Python 3's asyncio library.

It uses the Aylien text analysis API to summarize and the Hacker News API to get the articles.

Code here: https://github.com/Bachmann1234/hn-tldr

http://aylien.com/ https://github.com/HackerNews/API

The issue tracker on https://github.com/HackerNews/API appears to be disabled, though the summary says "Documentation, Samples, and Issue Tracking for the Official HN API."