What does HackerNews think of bert?

TensorFlow code and pre-trained models for BERT

Language: Python

#4 in Google
#4 in TensorFlow
> Has Google/Alphabet publicly released any of their AI models yet?

You mean NLP field-changing models from Google like BERT [1]? Or the Transformers paper [2]? Or the T5 model [3] (used by the company doing ChatGPT-like search currently on the front page of HN)?

1. https://arxiv.org/abs/1810.04805 code+models: https://github.com/google-research/bert

2. https://arxiv.org/abs/2112.04426

3. https://arxiv.org/abs/1910.10683 code+models: https://github.com/google-research/text-to-text-transfer-tra...

> resulting in large programs with lots of boilerplate

That was what I was trying to say when I said "the code required to implement the challenges is large enough that they are considered too inconvenient to use". This makes sense to me.

Thank you for this benchmark! I'll probably switch from jq to spyql now.

> So, orjson is part of the reason why a python-based tool outperforms tools written in C, Go, etc and deserves credit.

Yes, I definitely think this is worth mentioning upfront in the future, since, IIUC, orjson's core uses Rust (the serde library, specifically). The initial title gave me the impression that a pure-Python JSON parsing-and-querying solution was the fastest out there.
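For anyone curious, here is a quick illustrative timing sketch of that point (the payload and iteration count are made up; only `orjson.loads` and the stdlib `json.loads` are assumed):

```python
# Parse the same small document with the stdlib json module and with
# orjson (whose core is Rust/serde); timings are only illustrative.
import json
import timeit

import orjson  # pip install orjson

payload = json.dumps({"id": 1, "tags": ["a", "b"], "score": 3.14})

stdlib_time = timeit.timeit(lambda: json.loads(payload), number=100_000)
orjson_time = timeit.timeit(lambda: orjson.loads(payload), number=100_000)
print(f"stdlib json: {stdlib_time:.3f}s  orjson: {orjson_time:.3f}s")
```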

A parallel I find helpful is a claim like "the fastest BERT implementation is written in Python [0]". While the linked implementation is written in Python, it offloads the performance-critical parts to C/C++ through TensorFlow.

I'm not sure how such claims advance our understanding of the tradeoffs of programming languages. I initially thought I was going to change my mind about my impression that "Python is not a good tool for implementing fast parsing/querying", but I haven't, so I do think the title is a bit misleading.

[0] https://github.com/google-research/bert

" BERT is a method of pre-training language representations, meaning that we train a general-purpose "language understanding" model on a large text corpus (like Wikipedia), and then use that model for downstream NLP tasks that we care about (like question answering). BERT outperforms previous methods because it is the first unsupervised, deeply bidirectional system for pre-training NLP."

https://github.com/google-research/bert
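As a rough sketch of that pre-train/fine-tune pattern (using the Hugging Face transformers port of the released checkpoints rather than the repo's own TensorFlow code; "bert-base-uncased" is one of the published models, and the two-label head here is just an illustration):

```python
# Load a pre-trained BERT encoder and attach a fresh classification head
# for a downstream task; the head is randomly initialized and would still
# need fine-tuning on task data.
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tokenizer("Is this review positive?", return_tensors="pt")
logits = model(**inputs).logits  # shape (1, 2); meaningless until fine-tuned
print(logits)
```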

This video explains a legendary paper, BERT. It leverages the Transformer encoder and introduces an innovative way to pre-train language models (masked language modeling). BERT has had a significant influence on how people approach NLP problems and has inspired a lot of follow-up studies and BERT variants.

Code: https://github.com/google-research/bert (TensorFlow), https://github.com/huggingface/transformers (PyTorch)
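A tiny demo of the masked-language-modeling objective described above, using the PyTorch port listed here (the example sentence is made up):

```python
# Ask BERT to fill in the [MASK] token; predictions come back ranked
# by probability.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("BERT is pre-trained on a large text [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```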

The project mentioned in the article (BERT) is really interesting; you should look into it.

https://github.com/google-research/bert

I don't think you can get access to the actual models used to run e.g. Google Translate, but if you just want a big pretrained model as a starting point, their research departments release things pretty frequently.

For example, https://github.com/google-research/bert (the multilingual model) might be a pretty good starting point for a translator. It will probably still be a lot of work to get it hooked up to a decoder and trained, though.
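One way that hook-up could look, as a sketch only: this warm-starts both the encoder and the decoder from the multilingual checkpoint via Hugging Face transformers, and the cross-attention weights start out random, so the model still needs translation training before it is useful.

```python
# Warm-start a seq2seq model from multilingual BERT; encoder and decoder
# reuse the pre-trained weights, cross-attention is freshly initialized.
from transformers import BertTokenizer, EncoderDecoderModel

checkpoint = "bert-base-multilingual-cased"
tokenizer = BertTokenizer.from_pretrained(checkpoint)
model = EncoderDecoderModel.from_encoder_decoder_pretrained(checkpoint, checkpoint)

# Settings the decoder needs for training and generation.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
```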

There's probably a better pretrained model out there specifically for translation, but I'm not sure where you'd find it.