What does HackerNews think of duckling?

Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.

Language: Haskell

> There are fantastic deterministic libraries out there that can turn strings like "next Tuesday" into timestamps with extremely high accuracy.

Which libraries? I know of Duckling [0], but what others are there?

0. https://github.com/facebook/duckling

For the reasons others have said, I don't see it replacing 'traditional' scraping soon. But I am looking forward to it replacing current methods of extracting data from the scraped content.

I've been using Duckling [0] to extract fuzzy dates and times from text. It does a good job, but I needed a custom build with extra rules to turn that into a great job. And that's just for dates, one of the 13 dimensions it supports. Being able to use an AI that handles them with better accuracy will be fantastic.

Does a specialised model trained to extract times and dates already exist? It's entity tagging, but a specialised form (especially when dealing with historical documents, where you may need both the Gregorian and Julian calendars).

[0] https://github.com/facebook/duckling
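The kind of phrase-to-offset rules the commenter is adding can be sketched in plain Haskell. This is a toy illustration, not Duckling's actual API: the `rules` table and the `resolve` function are made-up names, and real Duckling rules are far richer (regex patterns, composition, latent values).

```haskell
-- Toy rule-based fuzzy-date resolution (illustration only, not Duckling's API):
-- each rule maps a phrase to an offset in days from a reference date.
import Data.Char (toLower)
import Data.List (isInfixOf)
import Data.Time.Calendar (Day, addDays, fromGregorian)

-- Hypothetical rule table: phrase pattern and day offset from the reference.
rules :: [(String, Integer)]
rules =
  [ ("today",     0)
  , ("tomorrow",  1)
  , ("next week", 7)
  ]

-- Resolve the first matching rule against a reference date.
resolve :: Day -> String -> Maybe Day
resolve ref text =
  case [off | (pat, off) <- rules, pat `isInfixOf` map toLower text] of
    (off:_) -> Just (addDays off ref)
    []      -> Nothing

main :: IO ()
main = do
  let ref = fromGregorian 2024 1 15
  print (resolve ref "see you Tomorrow at noon")  -- Just 2024-01-16
  print (resolve ref "sometime next week")        -- Just 2024-01-22
```

Extending such a build with custom rules is then just adding entries (or, in real Duckling, new rule modules per dimension and locale).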

It also powers the backend of Wit.ai, which FB owns. Wit's open-source entity parser, Duckling, is written entirely in Haskell. https://github.com/facebook/duckling

> It appears that the intent is to focus on pain points in the Haskell toolchain and libraries.

Good. I set myself the challenge of compiling a Haskell program [1] during the Christmas holidays. It was meant to be a "one mince pie" challenge, but after an hour I discovered the VM I was using didn't have enough RAM (compilation was approaching 4GB), then I ran out of disk space, as Stack's files approach 5GB and I had other stuff installed. Once a few hours had gone by (this program isn't fast to compile), I had a working program. I now have to figure out whether I can distribute just the resulting binary to other servers, or whether it needs other software, like GHC, installed. Having finished the pack of mince pies, that can wait for another day.

I know when I first started compiling C/C++ software there was a learning curve and it took hours the first time, but I found it easier to get started. With Haskell, the way one version of GHC is installed first and then Stack installs a completely isolated version is confusing; and then there are the inscrutable error messages (I haven't got it to hand, but one actually means OOM without saying so; it takes a Google search to find the GitHub issue that explains it).

And this is all before I try to experiment with, or decide to learn, some Haskell. Apart from the error messages, these aren't issues with Haskell per se, but they contribute to the experience of it.

1. https://github.com/facebook/duckling

There are two parts to this: (1) labeling something as a date or time and (2) normalizing it to a time stamp. The first part is the tagging. The second part is temporal normalization.

There are several libraries for temporal normalization:

- Duckling: https://github.com/facebook/duckling
- JChronic: https://github.com/samtingleff/jchronic
- Chronic, the Ruby library that JChronic was ported from

Stanford NLP and SpaCy also do tagging:

- https://github.com/stanfordnlp/stanfordnlp
- https://spacy.io/usage/linguistic-features#named-entities

Edit: Stanford NLP does not do temporal normalization. Added SpaCy.
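The two-part split described above can be sketched in plain Haskell. This is a toy illustration with made-up names (`tagTemporal`, `normalize`, `extractDate`), not the API of any of the libraries listed:

```haskell
-- Toy two-step pipeline (illustration only): (1) tagging finds the span of a
-- temporal expression; (2) normalization turns the tagged text into a
-- concrete date relative to a reference day.
import Data.Char (toLower)
import Data.List (findIndex, isPrefixOf, tails)
import Data.Time.Calendar (Day, addDays, fromGregorian)

-- Step 1: tagging. Return the (start, length) span of a known phrase.
tagTemporal :: String -> Maybe (Int, Int)
tagTemporal text =
  let lowered = map toLower text
      phrases = ["tomorrow", "yesterday"]
      hits = [ (i, length p)
             | p <- phrases
             , Just i <- [findIndex (p `isPrefixOf`) (tails lowered)] ]
  in case hits of
       (h:_) -> Just h
       []    -> Nothing

-- Step 2: normalization. Map the tagged phrase to a day offset.
normalize :: Day -> String -> Maybe Day
normalize ref phrase = case map toLower phrase of
  "tomorrow"  -> Just (addDays 1 ref)
  "yesterday" -> Just (addDays (-1) ref)
  _           -> Nothing

-- Glue: tag, slice out the phrase, normalize it.
extractDate :: Day -> String -> Maybe Day
extractDate ref text = do
  (start, len) <- tagTemporal text
  normalize ref (take len (drop start text))

main :: IO ()
main = print (extractDate (fromGregorian 2024 1 15) "Meet me tomorrow at 3")
-- Just 2024-01-16
```

Keeping the two steps separate is what lets a pure tagger (like Stanford NLP) and a pure normalizer be mixed and matched.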

> Everything in NLP nowadays involve ML.

Some really nice projects do NLP without using ML at all. For instance, Duckling [1] (a library made by Facebook to find entities in text) works 100% with parsing rules, and is surprisingly efficient.

I agree with your point though: most of the time there is ML at some point in your pipeline, so you can't really avoid learning it!

[1] https://github.com/facebook/duckling
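The rule-only approach can be illustrated with a minimal Haskell sketch (assumed names, not Duckling's real API): each rule is a small matcher, and alternatives compose with `<|>`, which tries them in order.

```haskell
-- Toy composable rules (illustration only, not Duckling's API): a Rule is a
-- function from input text to a possible match, and rules compose with <|>.
import Control.Applicative ((<|>))
import Data.Char (toLower)
import Data.List (isInfixOf)

type Rule a = String -> Maybe a

-- Build a rule that yields a value when a phrase occurs in the text.
phrase :: String -> a -> Rule a
phrase pat val text
  | pat `isInfixOf` map toLower text = Just val
  | otherwise                        = Nothing

-- Individual rules combined into one; longer phrases go first so
-- "day after tomorrow" isn't shadowed by "tomorrow".
dayOffset :: Rule Integer
dayOffset text =
  phrase "day after tomorrow" 2 text
    <|> phrase "tomorrow" 1 text
    <|> phrase "today" 0 text

main :: IO ()
main = print (dayOffset "let's ship the Day After Tomorrow")  -- Just 2
```

No ML anywhere: adding coverage means adding rules, which is also why such systems are deterministic and cheap to run.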

Facebook Duckling https://github.com/facebook/duckling

Rewritten from Clojure; used to power smart products at Facebook.