What does HackerNews think of fastText?
Library for fast text representation and classification.
- accuracy is generally a few points better than Naive Bayes (NB)
- better generalisability: the word embeddings act as a bottleneck on expressiveness, whereas NB or logistic regression essentially model all words / bigrams as independent features
- trained with cross-entropy, so model scores can be used more effectively as a 'confidence'. For example, in spam filtering you might want a rule like "if prediction score > X, then filter"; Naive Bayes is not ideal here because the 'naive' independence assumption makes its scores very poorly calibrated (it tends to give extremely high or low confidence scores, and this gets worse with document length).
- is completely linear (or at least log-linear, like NB), so explainability is very simple.
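The architecture described above can be sketched in a few lines: average the word embeddings of a document, apply one linear layer, and softmax the scores into probabilities. This is a toy illustration of the idea, not the real library; the embeddings and weights below are hand-picked stand-ins, not trained values.

```python
import math

# Toy 2-d word embeddings (hand-picked, NOT trained).
EMB = {
    "free": [1.0, 0.0],
    "money": [0.9, 0.1],
    "meeting": [0.0, 1.0],
    "tomorrow": [0.1, 0.9],
}

# One weight row per class (hand-picked, NOT trained).
W = [[1.5, -1.5],   # spam
     [-1.5, 1.5]]   # ham
LABELS = ["spam", "ham"]

def embed(doc):
    """Average the embeddings of known words: the 'bottleneck'."""
    vecs = [EMB[w] for w in doc.split() if w in EMB]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(2)]

def predict(doc):
    """Linear scores + softmax -> (label, probability)."""
    x = embed(doc)
    scores = [sum(wi * xi for wi, xi in zip(row, x)) for row in W]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = probs.index(max(probs))
    return LABELS[best], probs[best]

print(predict("free money tomorrow"))
```

Because the model is linear, explaining a prediction reduces to inspecting which words pushed the averaged vector toward which class; and because training (in the real library) minimizes cross-entropy, the softmax output is a usable confidence score for thresholding rules like the one above.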
disclaimer: I haven't really thought about NLP for about 3 years, so there may be something better than this now
Fasttext is also available in the popular NLP Python library gensim, with a good demo notebook: https://radimrehurek.com/gensim/models/fasttext.html
And of course, if you have a GPU, recurrent neural networks (or other deep learning architectures) are the endgame for the remaining 10% of problems (a good example is SpaCy's DL implementation: https://spacy.io/). Or use those libraries to incorporate fasttext for text encoding, which has worked well in my use cases.
It's worth noting for future reference that for supervised learning of labels from a text-document input, fastText (https://github.com/facebookresearch/fastText) is leagues ahead of conventional approaches in both accuracy and training speed. There is also a Python interface (https://github.com/salestock/fastText.py) for use with Django/Flask (unfortunately, recent fastText changes have broken the interface for now).
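For the supervised mode, fastText expects a plain-text training file with one example per line, where labels carry a `__label__` prefix. A stdlib-only sketch that writes such a file (the example texts and file name are made up):

```python
import os
import tempfile

# Each training line is: "__label__<label> <text>"
examples = [
    ("spam", "win free money now"),
    ("ham", "are we still meeting tomorrow"),
]

path = os.path.join(tempfile.mkdtemp(), "train.txt")
with open(path, "w", encoding="utf-8") as f:
    for label, text in examples:
        f.write(f"__label__{label} {text}\n")

# The file can then be fed to the command-line tool, e.g.:
#   fasttext supervised -input train.txt -output model
with open(path, encoding="utf-8") as f:
    print(f.read())
```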
The world owes a big THANK YOU to Tomáš Mikolov, one of the creators of word2vec[0] and fastText[1], and also to Radim Řehůřek, the interviewer, who is the creator of gensim[2].
The number of software developers and researchers in industry and academia who rely on the work of these two individuals is large and growing every day.
[0] https://code.google.com/p/word2vec/
[1] https://github.com/facebookresearch/fastText
[2] https://radimrehurek.com/gensim/
The big result here is the 15,000x speedup compared to a neural network, a gap that widens as the size of the dataset increases. But this doesn't mean neural networks are worthless. From the paper:
"Although deep neural networks have in theory much higher representational power than shallow models, it is not clear if simple text classification problems such as sentiment analysis are the right ones to evaluate them."