What does HackerNews think of base2048?

Binary encoding optimised for Twitter

Language: JavaScript

Adding to this: here [1] is a chart that shows how several encodings affect size ratios under UTF-8, UTF-16, and UTF-32. Of course, that gets into the discussions [2][3] of using Unicode in passwords, and I have no idea how many password schemes support this beyond the ones I use.

[1] - https://github.com/qntm/base2048

[2] - https://security.stackexchange.com/questions/85663/is-it-a-g...

[3] - https://security.stackexchange.com/questions/4943/will-using...

They mention base64-encoding messages to evade filters. There were actually other base-N encodings [1] created specifically for Twitter to be more space-efficient, though not as readily available as base64 on most operating systems. I guess this is less useful if they really are expanding the text limit to 4k soon, but I figured I would add it in case they ever add a parser for base64.

This assumes most people would be able to:

    npm install base2048
I think it might be interesting if Mastodon added a function to automatically detect and decode base2048 in the browser, since JavaScript is required to view a Mastodon site anyway. Bots would then have to adopt this logic, rendering most of them useless until they adapt and evolve. But I am not a developer, and maybe this just isn't possible.

[1] - https://github.com/qntm/base2048

See also https://github.com/qntm/base2048. "Base2048 is a binary encoding optimised for transmitting data through Twitter."
I think the last line is a reply, encoded in base2048: https://github.com/qntm/base2048

But when I tried to decode your reply, it didn't work. Perhaps HN's text editor mangled something.

There's base2048 [0], which can cram 11 bits into each code point, or 110 bytes into 80 characters.

[0] https://github.com/qntm/base2048
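To make the arithmetic concrete, here is a toy sketch of the 11-bits-per-code-point packing. This is not qntm's actual library, which uses a hand-picked repertoire of 2048 Twitter-safe code points; the CJK-block offset below is purely illustrative.

```javascript
// Toy illustration of the base2048 idea: pack 8-bit bytes into
// 11-bit groups and map each group to one Unicode code point.
const BASE = 0x4E00; // arbitrary illustrative starting code point

function toyEncode(bytes) {
  let out = '';
  let buf = 0;   // bit accumulator
  let bits = 0;  // number of valid bits currently in buf
  for (const b of bytes) {
    buf = (buf << 8) | b;
    bits += 8;
    while (bits >= 11) {
      bits -= 11;
      out += String.fromCodePoint(BASE + ((buf >> bits) & 0x7FF));
      buf &= (1 << bits) - 1;
    }
  }
  if (bits > 0) {
    // pad the final partial group with zero bits
    out += String.fromCodePoint(BASE + ((buf << (11 - bits)) & 0x7FF));
  }
  return out;
}

// 110 bytes * 8 = 880 bits = exactly 80 code points of 11 bits each
console.log(toyEncode(new Uint8Array(110)).length); // 80
```

The real library also uses a second, smaller repertoire to mark a final partial group unambiguously, which this sketch glosses over.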

This is what you're looking for:

https://github.com/qntm/base2048

It can store 385 bytes per tweet. The link includes a bit more technical explanation of how Twitter counts characters toward the limit. Apparently, using the entire range of Unicode characters does not improve density, because emojis and many other characters are double-weighted, as described in TFA. It links to a base131072 encoding, which can only store 297 bytes per tweet.
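As a rough back-of-the-envelope check (my own arithmetic, assuming base2048 characters count as 1 unit and base131072 characters count as 2 units against Twitter's 280-unit limit, per the weighting described above):

```javascript
// base2048: 280 chars/tweet, 11 bits each -> 3080 bits -> 385 bytes
const base2048Bytes = Math.floor((280 * 11) / 8);

// base131072: 17 bits per char, but each char is double-weighted,
// so only 140 fit -> 2380 bits -> 297 whole bytes
const base131072Bytes = Math.floor((140 * 17) / 8);

console.log(base2048Bytes, base131072Bytes); // 385 297
```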

There are several of these. base65536 is the one that seems to pop up the most often on HN, although base2048 is more useful for Twitter. On the GitHub page the dev helpfully links to the various implementations: https://github.com/qntm/base2048
Or you can use base2048 [1] to compress it down to 3 tweets (4175 nucleobases * 2 bits per nucleobase / 3080 bits per base2048 tweet ≈ 2.7 tweets, rounded up).

[1] https://github.com/qntm/base2048/
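Spelling out that comment's arithmetic (assuming the usual 2-bits-per-nucleobase packing and a 280-character tweet of 11-bit base2048 code points):

```javascript
const bases = 4175;            // nucleobases in the sequence
const bitsPerBase = 2;         // A/C/G/T -> 2 bits each
const bitsPerTweet = 280 * 11; // 3080 bits of base2048 per tweet

const exact = (bases * bitsPerBase) / bitsPerTweet; // ~2.71
const tweets = Math.ceil(exact);
console.log(tweets); // 3
```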

Superseded by Base2048 [1] by now.

[1] https://github.com/qntm/base2048