Wednesday, April 30, 2008

the value of a reliable email archiving solution

the picture says it all really, but this is the bit i love:
" In 1994, the Clinton administration reacted to the previous year's court decision by rolling out an automated e-mail-archiving system to work with the Lotus-Notes-based e-mail software that was in use at the time. The system automatically categorized e-mails based on the requirements of the FRA and PRA, and it included safeguards to ensure that e-mails were not deliberately or unintentionally altered or deleted.
When the Bush administration took office, it decided to replace the Lotus Notes-based e-mail system used under the Clinton Administration with Microsoft Outlook and Exchange. The transition broke compatibility with the old archiving system, and the White House IT shop did not immediately have a new one to put in its place.

Instead, the White House has instituted a comically primitive system called "journaling," in which (to quote from a recent Congressional report) "a White House staffer or contractor would collect from a 'journal' e-mail folder in the Microsoft Exchange system copies of e-mails sent and received by White House employees." These would be manually named and saved as ".pst" files on White House servers. "

Sunday, April 27, 2008

a new unit: The Wikipedia

question: 'how do people find the time to contribute to free projects?'
answer: they're watching less tv...
we now have a unit for the attention span of a group: the wikipedia, which is the number of person-hours of thought invested in wikipedia, about 100 million hours. about 2000 wikipedias (roughly 200 billion hours) are spent watching TV each year in the US. **each year**!

Personally I don't watch very much TV anymore. I go to the movies and watch DVDs occasionally, but I try to avoid actually watching TV because I find it an almost complete waste of time (with notable exceptions like The Chaser's War on Everything, The Daily Show etc).

So what do I do with all this extra time? (I used to spend 20-30 hours a week watching TV.) Well, mainly I use it to hang out with Real People (tm) instead of the idiot box. I think this is a good thing. If you find yourself watching an hour or more of TV a day several times a week, then take the pepsi challenge: pack your TV up and do away with it for a month. You'll be surprised what will happen. Maybe you'll reconnect with your friends or spouse or kids, maybe you'll just get the motivation to go for that exercise you've been meaning to do. try it...

Sunday, April 20, 2008

Kitten Auth Busted!

last week /. rocked to the news that bots had been able to get significant success rates cracking the CAPTCHAs that protect gmail & livemail. People half-seriously joked about replacing text-based CAPTCHAs with kitten-auth or hot-or-not-auth, despite the high success rate of even a random bot attack.

now behold! Kitten Auth Busted! We only reveal this video of a live Kitten Auth CAPTCHA being broken now that the author of Kitten Auth has been notified of the vulnerability and the hole plugged. Previously there was no variation in the hash of the same image between multiple displays, so if a human being taught the bot to recognise different animals, it could then recognise and correctly categorise the cute furry animals to get past kitten-auth and spam the hapless creator with viagra ads & offers of penis enlargement.

The vuln has now been fixed: every image now has a random number embedded in it, which makes it impossible to recognise an image by its hash alone.
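
To see why an unchanging hash was the weak spot, here's a minimal sketch in Python. It is not the actual exploit (that was JavaScript) or the real kitten-auth code: the SHA-256 hash, the fake image bytes and the trick of appending random bytes to stand in for "a random number embedded" are all assumptions made for illustration.

```python
import hashlib
import os


def image_hash(data: bytes) -> str:
    """Hash the raw bytes the server sends (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def serve_static(image: bytes) -> bytes:
    """Old behaviour: the same image is served byte-for-byte every time."""
    return image


def serve_salted(image: bytes) -> bytes:
    """Fixed behaviour (sketch): random bytes are embedded on every display."""
    return image + os.urandom(8)


# Hypothetical stand-in for a real image file.
kitten = b"fake-kitten-image-bytes"

# Training phase: a human labels the image once and the bot stores its hash.
learned = {image_hash(serve_static(kitten)): "kitten"}

# Against the old scheme the bot recognises the image on every later display.
static_hit = image_hash(serve_static(kitten)) in learned

# Against the fixed scheme the hash differs each time, so the lookup misses.
salted_hit = image_hash(serve_salted(kitten)) in learned

print(static_hit, salted_hit)
```

A bot could of course still try real image comparison instead of a hash lookup, but that is a much harder job than a dictionary lookup.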

The exploit was written in JavaScript and run from inside Greasemonkey, not by me but by the anonymous uber-h4xx0r, captain meat...
(note: the video has been sped up. the first section is training the bot, after which it begins spamming constantly until disabled)

Getting things done

One of the talks at barcamp was on Getting Things Done, and they used a variation of the above flowchart to demonstrate a workflow for personal items. I haven't read the book yet (although it is on my to-read list...), but the flow used in GTD above made me think about my own personal paradigm for dealing with todos.

So, work first. Currently we use Jira, just cos it rocks, and that takes care of assigning tasks among our team, recording requirements, work done, progress comments etc. However I find that the best way for me personally to deal with tasks is to make a pipeline of tasks in my immediate horizon (which is a subset of my currently assigned tasks), and record them in a low-tech format, currently a plain-text document called todo.txt which I leave open in textpad (textpad is never closed in my session ;-p ), ordered by descending priority.

Then under each item I add a breakdown of the step-by-step tasks (down to an hour or two each) that I'll need to do to complete that task. As I complete sub-items I delete them from the task (for greater satisfaction they can be moved to a separate list of done things at the bottom of the page, that way you can look back over what you've done, feels good). This gives a really fast way to track immediate tasks as well as a handy scratch pad for dumping text. You don't get a lot of semantic tools, but if you're like me there are some mornings where you come in and you're like "what the hell am I supposed to be doing?". Keeping todo.txt helps me answer that question and push my context back onto the stack.

For my personal life, the usual answer is that I don't keep todo lists (although I know I should...). When I do though (usually when there's lots to get done and time-constraints requiring juggling), I just use the old pen & paper. I write three headings, short term (today or this week), medium term (next few weeks or months) and long term (one year to lifetime). Then I just brain-dump all the things I currently want to do under these headings. Sometimes I need to break it out a bit if there's a goal I want to achieve that doesn't fit into one of those timelines.

The short term goals are the most satisfying. When you can get home at the end of a day and tick off all the items on your list, that's sex on wheels man. Medium term goals help focus the short-term goals and provide fodder for further short-term goals. Long term goals are more strategic and tend to reflect values, eg - getting married, having kids, becoming financially independent. I really should make them less fuzzy and more measurable though...

Measurable outcomes are key. This is easy with short-term goals, but with medium- and long-term goals it gets harder. You need to keep the pressure on by referring back to the goals periodically, otherwise you can't measure your performance (or lack thereof) against them, and likely you're just spinning your wheels as time goes by. It's a good idea to make these goals highly visible and stick them on your wall/mirror/door etc. When I was at uni I had my unit-per-year plan in the front of my folder and avidly crossed out the units as I completed them. Big feeling of satisfaction!

A few comments on how all this relates to GTD. I think the work flow above is very similar to the process of information-sifting that any decent geek does almost without thinking, that is, to cast the net as broad as possible and then filter it for "interesting" items to either action now, later, or to pass on to another. It's good to think about all the things you could/should do, but then you have to prune to the things that you realistically can do, and then prioritise actually doing them. This helps triage between things that you vaguely think would be a good idea, and the things that you actually should do.

And if you keep on getting things done, then you're guaranteed to be better off than if you hadn't done anything at all, and ideally this should build momentum in your personal life for getting the things that you want.

Lastly, don't forget to add "do nothing" to your immediate todo list every now and then... :-)

Thursday, April 17, 2008

musings on the latest in CAPTCHA technology...

with the recent reports of bots being able to crack the CAPTCHAs on windows live and gmail, the cutting edge in human recognition has shifted to the next generation of image recognition combined with advanced heuristics.
the first example is kitten auth, which uses an AI-hard problem to defeat captcha bots, with a database of cute furry kittens to detect humans:
also good is hot captcha, backed by the hot or not database
both of these schemes use a grid of 9 images, 3 of which are of the target (either hot, or a kitten, depending on auth scheme). Using the combinatorial formula 9 choose 3, where ordering isn't relevant, there are 84 possible picks, so a random guess has a 1 in 84 chance of choosing the correct 3 images. So say you combine hot captcha or kitten auth with an IP-blocking scheme: you keep a pool of blocked IPs, and after three failed attempts you add that IP to the pool, where it stays until a given timeout expires (half an hour?).
Then a random bot gets three guesses per lockout, which works out to roughly a 1 in 28 chance of cracking your captcha, a success rate of about 3.5%. That is probably still enough for most spam bots, but better than the 15% success rate for the latest round of captcha bots against OCR captchas.
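
Here's a rough Python sketch of the arithmetic and of the IP-blocking idea described above. It assumes each attempt shows a fresh random grid, so guesses are independent. The names `random_guess_odds` and `FailedAttemptBlocklist`, and the in-memory dictionaries, are made up for illustration and aren't taken from kitten auth or hot captcha.

```python
import math
import time


def random_guess_odds(grid_size=9, targets=3, attempts=3):
    """Return (number of possible picks, chance of a hit within `attempts`)."""
    combos = math.comb(grid_size, targets)       # unordered picks of the targets
    p_single = 1 / combos                        # one blind guess
    p_window = 1 - (1 - p_single) ** attempts    # at least one hit before lockout
    return combos, p_window


class FailedAttemptBlocklist:
    """Block an IP for a while once it has failed too many times (sketch)."""

    def __init__(self, max_failures=3, timeout_seconds=30 * 60,
                 clock=time.monotonic):
        self._max_failures = max_failures
        self._timeout = timeout_seconds
        self._clock = clock            # injectable so the timeout can be tested
        self._failures = {}            # ip -> failed attempts so far
        self._blocked_until = {}       # ip -> time at which the block expires

    def is_blocked(self, ip):
        expiry = self._blocked_until.get(ip)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._blocked_until[ip]    # block has timed out
            return False
        return True

    def record_failure(self, ip):
        count = self._failures.get(ip, 0) + 1
        if count >= self._max_failures:
            self._blocked_until[ip] = self._clock() + self._timeout
            self._failures.pop(ip, None)   # start counting afresh after the block
        else:
            self._failures[ip] = count


combos, p_window = random_guess_odds()
print(combos)               # 84
print(f"{p_window:.1%}")    # 3.5%
```

A botnet with lots of IPs gets three fresh guesses per address, so the blocklist only slows down an attacker with a single machine.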
To attack this captcha, it would help if you had... oh, say, a massively distributed botnet at your disposal and some kind of distributed multimedia database to record the randomly correct categorisations of images to improve your hit rate. You would end up duplicating the multimedia database of the target captcha, but once you did your hit rate would climb towards 100%.
So at the end of the day even these kinds of captchas are vulnerable, but would be good as a failover from your regular captcha mechanism in the event it gets broken...
what we really need is some kind of open-ended input based on an image that relies on human "common sense". eg - a picture of george bush where 'miserable failure', 'ass-muppet' or 'clown' would be valid inputs to verify a human being. We're stuck in an arms race, and as long as spam is profitable spammers will be paying crackers to do AI...
(note: this is all written with tongue very firmly in cheek... ;-p )