twuuenc: the uuencode of web2.0

01 Jun 2009

Ataraxia Consulting


For far too long now, you have been limited by the 140-character cap of microblogging sites like Twitter. I present you with twuuenc: take your tweet that is longer than 140 characters and stuff it into fewer Unicode characters.

Take for instance the beginning of the Gettysburg Address: “Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.”

That’s a total of 175 characters, and Twitter just won’t have it. But if you run it through twuuenc you get:

ɹҺξɌǶᖜᘫᓿϧѶ✢Εǚᖣ¼Ј|ᓅɋöᖛӪ❡Ѯ9ᒡ✲ᐑᖨᙃ¥ѰҺΨᙦɖǞᗍʼО;◆ɌɺǪᙃ¦јѺĖ◐ΑȕөŤѐƢᘟ◐ýǪᗌᘋŇğ◆ᘕ˼ᖸᒋӽШѥÛᙵŶᖨ᙮ŰŇ{ᒧɋɸᕰᔋоҀƞ✲❼ȖᓫᓿѠᑺ◊◡ǹᖰᔌ❚ŇᓹĆ✮Pᖐᕣ°ЈѥĆ✒˘ǒᘫᕀҀƞ`ᘶýᕤᕤᗟ

That tweet is only 128 characters (130 with markers), which leaves 12 characters (10 with markers) for you to insert a wisecrack!
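To see where the 128 comes from, here is a minimal sketch of the general idea: pack the input bits into 11-bit groups and map each group to one character of a 2048-symbol alphabet. The alphabet below is a stand-in (2048 consecutive CJK code points), not the alphabet twuuenc actually uses, and the padding scheme (a single 1 bit followed by zeros) is an assumption made so the round trip is unambiguous. The real code may handle both differently.

```python
BITS_PER_CHAR = 11  # 2**11 == 2048 symbols

# Stand-in alphabet (hypothetical): 2048 consecutive CJK code points.
# This is NOT the alphabet the real twuuenc uses.
ALPHABET = [chr(0x4E00 + i) for i in range(2 ** BITS_PER_CHAR)]
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode(data: bytes) -> str:
    """Map bytes to characters, 11 bits per character."""
    # Input as a bit string, plus a single 1 bit as an end marker
    # (assumed padding scheme, chosen so decode() is unambiguous).
    bits = "".join(f"{byte:08b}" for byte in data) + "1"
    # Zero-pad on the right to a whole number of 11-bit groups.
    bits += "0" * (-len(bits) % BITS_PER_CHAR)
    return "".join(
        ALPHABET[int(bits[i:i + BITS_PER_CHAR], 2)]
        for i in range(0, len(bits), BITS_PER_CHAR)
    )


def decode(text: str) -> bytes:
    """Inverse of encode()."""
    bits = "".join(f"{INDEX[ch]:0{BITS_PER_CHAR}b}" for ch in text)
    # Drop the zero padding and the 1-bit end marker.
    bits = bits[:bits.rindex("1")]
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


text = ("Four score and seven years ago our fathers brought forth on this "
        "continent a new nation, conceived in Liberty, and dedicated to the "
        "proposition that all men are created equal.")
encoded = encode(text.encode("ascii"))
print(len(text), len(encoded))  # 175 128
```

The 175 bytes are 1400 bits, and 1400 / 11 rounds up to 128 characters. With this sketch's end-marker bit, some input lengths cost one character more than plain zero padding would.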

What’s better is that this allows you to send binary data over Twitter. Imagine the possibilities! Just for starters, let’s add some additional compression to the address through zlib:

ңᙦ˥ᖐąᒟǐТ;ӤУᗔ○Õ¢Ñ◀ᓫӆᕪ✼ᙜ❉ӆʃ%ᗽᐉᘰҫᑻǧɪȰŠᖔᙱ¯ķ▭ᑜᔨɳĺᘗŐ◇ᔇђʗᑵËᘎᙁᑶʓǿ⟘nΗˀЧЏȜ➒⟘Ɲᖁʉᘧ➳ːȰ✩❈❢✜¨Ȁѣ➫ᗧңᔯᓞϓ❯ᒑȰ❰˒Ӝń

Now our tweet only takes up 93 characters (95 with markers). You have a full 47 characters (45 with markers) to be clever!
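The compression step can be tried on its own with Python’s standard zlib module. This is only a sketch: the compression level (9 here) is an assumption, and the exact compressed size can vary with the zlib build, so the character estimate it prints may not match the 93 above exactly.

```python
import zlib

text = ("Four score and seven years ago our fathers brought forth on this "
        "continent a new nation, conceived in Liberty, and dedicated to the "
        "proposition that all men are created equal.")

raw = text.encode("ascii")
packed = zlib.compress(raw, 9)  # level 9 is an assumption

# Characters needed at 11 bits per character, rounding up.
chars_plain = -(-len(raw) * 8 // 11)
chars_packed = -(-len(packed) * 8 // 11)

print(len(raw), "bytes ->", chars_plain, "characters uncompressed")
print(len(packed), "bytes ->", chars_packed, "characters compressed")

# Decompressing recovers the original bytes.
restored = zlib.decompress(packed)
```

For very short messages the zlib header and checksum can outweigh the savings, so it is worth comparing both sizes before choosing.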

You can also optionally include the markers around the message to signify that the enclosed message should be twuuenc decoded. A message wrapped in ☹ is twuuenc encoded but not compressed, while a message wrapped in ☺ is encoded and compressed with zlib.
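A receiver could use the markers to decide how to handle a tweet. The helpers below are a hypothetical sketch of that check. The names `wrap` and `unwrap` are made up for illustration and are not taken from the twuuenc source.

```python
PLAIN_MARKER = "☹"       # encoded, not compressed
COMPRESSED_MARKER = "☺"  # encoded and zlib-compressed


def wrap(payload: str, compressed: bool) -> str:
    """Put the matching marker on both ends of an encoded payload."""
    marker = COMPRESSED_MARKER if compressed else PLAIN_MARKER
    return marker + payload + marker


def unwrap(message: str):
    """Return (payload, compressed) for a marked message, else None."""
    if len(message) >= 2 and message[0] == message[-1]:
        if message[0] == PLAIN_MARKER:
            return message[1:-1], False
        if message[0] == COMPRESSED_MARKER:
            return message[1:-1], True
    return None  # no markers: treat as an ordinary tweet


print(unwrap(wrap("abc", compressed=True)))  # ('abc', True)
print(unwrap("just a normal tweet"))         # None
```

The two markers add two characters to the tweet, which is where the “130 with markers” and “95 with markers” counts come from.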

The alphabet twuuenc uses has only 2048 characters, which works out to 11 bits per character. If you can get that up to 4096, that’s another whole bit you can store per character in your tweet.
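The effect of alphabet size is easy to work out. The sketch below is plain arithmetic, not code from twuuenc. It assumes every character carries a whole number of bits and ignores markers.

```python
def bits_per_char(alphabet_size: int) -> int:
    """Whole bits one character can carry: floor(log2(alphabet_size))."""
    return alphabet_size.bit_length() - 1


def chars_needed(n_bytes: int, alphabet_size: int) -> int:
    """Characters needed to carry n_bytes, rounding up."""
    return -(-n_bytes * 8 // bits_per_char(alphabet_size))


def max_bytes(n_chars: int, alphabet_size: int) -> int:
    """Whole bytes that fit in n_chars characters."""
    return n_chars * bits_per_char(alphabet_size) // 8


for size in (2048, 4096):
    print(size, bits_per_char(size), chars_needed(175, size), max_bytes(140, size))
# 2048 11 128 192
# 4096 12 117 210
```

Going from 11 to 12 bits per character lifts the capacity of a 140-character tweet from 192 bytes to 210.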

You can find the source for the encoder and decoder here, under an MIT license. The code relies on http://code.google.com/p/python-bitstring/, which is also in the git repo (similarly licensed).