Archive for the 'Web' Category

Ubiquity

A mate of mine working in the US sent me a link to a Mozilla Labs tool called Ubiquity.

Essentially, the idea behind it is to let users describe what they want to do with the Internet rather than where they want to go.  The web at the moment is really based around sites that you have to actually visit to be able to use the information.  Their idea is to skip this visiting stuff and let the tool (or commands written for it) do all the work for you, giving you only the information you were after.  They have a really good description on their blog post (linked above), and I can’t really think of a better way to describe it than them (which is probably good), so have a read if you’re interested.

It’s really lightweight and very easy to install, but at the moment it only works for Firefox.  It’s also surprisingly easy to write new commands.  I’ve been a .NET kid for a while now but it only took me a couple of hours to hack up a new command.  Admittedly the Ubiquity language is JavaScript so it’s not terribly unfamiliar.  Still, that’s a pretty short learning curve.

Commands have the ability to give you a real-time Ajax preview before you actually execute the command.  In most cases, I don’t even use the execute functionality and I suspect that for the small things, most people won’t.

Anyway, when I was trying it, the built-in “define” command wasn’t working for me.  It would look up a word when you pressed enter to execute the command, but it didn’t give me a preview.  So, in order to get a nicer dictionary lookup and at the same time try out this funky little thing, I decided to create my own.  Of course once I’d got mine working, the define command started working properly.  I like mine better - it gives prettier results.  It’s not just because I made it myself… but that’s the main reason.

You can go here to have a look at it.  Subscribe to it if you want to use it or hey, just steal the code or whatever, I don’t care, it’s public license - do what you want.

So in summary, it’s easy to use and I think it has the potential to be something really big.  If you have a look at the examples of what they’re ultimately trying to be able to do, it’s pretty impressive.  Imagine typing “find restaurants near me with reviews over 4 stars” and being given a google map with markers and summarised reviews of each.  Very useful, no?

Hmm, combine that with voice recognition and you’ve got a genuine futuristic computer from the movies!  Rad.

Google Streetview in Australia

So I discovered that Google Streetview is now available in Australia.

For example, here’s the Queen Street Mall in Brisbane.

I’m fairly blown away by the sheer number of streets that are covered to be honest.  This map gives you an idea, but once you get to the capital cities, you can see just how dense the coverage is.

3D CAPTCHA

In a previous post, I talked about the limitations of CAPTCHA systems and proposed a partially-automated Turing test to keep non-humans out.

I had a few discussions about this with a friend and he was more interested in the CG Kittens idea I very briefly alluded to in the main text. To summarise, the idea was that if the kittens in KittenAuth were 3D models rendered on the fly, you’d get an essentially unlimited number of images.

Well someone has gone and done something very similar. On a small site called SpamFizzle, there’s a description of a 3D CAPTCHA design that renders simple objects and asks the user questions about them. It looks like a great idea to me. It could require some fine-tuning, but I think the premise is sound.

I’d love to credit the author some more, but there are no details on that page. I presume it’s the same author as the only other page on the site, Michael G. Kaplan.

Regardless, it’s a great idea and worth a read if you’re interested.

Translink Fail

The Queensland Government recently introduced the Go card to provide a single intelligent ticketing mechanism for (almost) all public transport in South East Queensland.

The technology was developed by Cubic Transportation Systems and similar cards are in use all over the world.  The idea is when you get onto a bus or a train or anything, you touch your Go card to the sensor.  When you get off, you touch it again and the appropriate amount of money gets debited from your card balance.  Presuming it works, it’s a sensible system in my opinion.

Despite catching a bus to (not from) work nearly every working day, I had originally avoided the new system for a few reasons.  The main one was that it provided no financial benefit to me.  There’s a refundable deposit that’s payable when you buy a card, and the cost of an individual one-way ticket was the same whether you used the card or paid cash on the bus.  Discounts only came when you used it more than 6 times in a week.  I very, very rarely travel by bus more than half a dozen times a week.  In early August however, the fares will come down for the Go card only.  This makes it more attractive, so I went to purchase one.

The TransLink website provides an online web ticketing service that lets you purchase a card online.  Presumably they send it out to you but I didn’t get that far because frankly, I was too scared.  Let me show you.

After a couple of short screens asking you about the type of card you want, you come across this screen:

TransLink Online Web Ticketing - First Screen

Notice the “Billing Account Question” at the bottom.  There’s no more information on what this is for, but I presume it’s some kind of verification question you have to answer in order to make payments or maybe changes to your billing details.  The default question is, “What is my name?”.  That’s probably the worst security question I’ve ever heard! Ok, I’m generous, so I’ll give them the benefit of the doubt here and assume that this isn’t used for anything important.  You can change it anyway, and if you’re sensible, you probably will.

Let’s look at the next screen:

TransLink Online Web Ticketing - Second Screen

The first thing I noticed was that there was another “Cardholder Question”.  Is this different from the other one?  Again, there’s no help available to tell you what it’s for.  At least the question is slightly more difficult to guess this time.  I wasn’t terribly concerned at this point, so I continued.

Here’s the next screen:

TransLink Online Web Ticketing - Third Screen

Now I’m quite concerned.  Firstly, it appears that despite this being a Queensland Government website, I’m suddenly being charged in pounds.  On one of the first screens, I was told that the charge was $5 so I could probably assume that they just got the currency symbol wrong, but this is a big deal.  What if I am going to end up paying the equivalent of just over $10? I had a look at the address bar to make sure I was still in the right place, and yes, it’s an Australian domain.  I’m growing more and more reluctant to sign up to this thing.  Of course by this stage, I’ve already given them my credit card details, and who knows whether they’ve been stored.

So next, I clicked on the terms and conditions link at the bottom of the page.  Here’s what the pop-up window said:

TransLink Online Web Ticketing - Terms and Conditions

So that’s it.  I’m done.  No way I’m going to buy online using a credit card from a site with that many problems.  The other thing that the terms and conditions error showed me was that they appear to be using Lotus Domino version 4.6.7a.  The current stable version is version 8.  And does that “a” indicate an alpha version?  The Wikipedia page on Lotus Domino doesn’t even recognise the software before version 5, and the page on Lotus Notes suggests that version 4.6.7 was released sometime prior to 1999.  I’d hate to think what kind of exploits could be carried out on that server.  Colour me scared.

Now, I’m sure I could have continued on my merry way, bought the card, and everything would have worked out fine, but I wasn’t convinced that the transaction would work or even that my information was safe.  SSL or no, the currency problems and the information gathered from that error page just scare me too much.

To be honest, I’m not sure I’m comfortable buying the card at all any more.  The cards have to be registered, so I assume I have to give them some kind of personal information.  With web software that old, I simply can’t trust that it’s safe.

I certainly hope they sort all this out soon if they plan to decommission their other ticketing options.

Damo

Doomed from the start

There’s no shortage of people lambasting the recording industry for keeping their business model firmly behind the times.  The ability to download music online seems to have presented the industry with a challenge it simply can’t accept.  They have a hammer that’s been serving them well for a very long time, and by god they’ll bash away at this problem even though it no longer in any way resembles a nail.

I’m not going to rehash the arguments or the history here, but suffice it to say, the recording industry has very firmly clung to the premise that no matter how much money you pay, they must retain control over what you do with that music.

Enter a new business model that is doomed from the start.  Lala.com presents a model that is essentially a rental scheme.  I found out about this site via Slashdot and Michael Robertson.  Lala has a large number of songs on it (over 5 million) that you can search, listen to from start to finish once, and then add to your playlist for 10c.  Once the song is on your list, you can play it whenever you like.

Here’s the catch: you can only listen to the song via the Lala website.

Ok, that’s not entirely true, you can (sometimes) pay more money and buy the track in mp3 format, but the new business model Lala is going for is clearly listening to your 10c tracks via their website.  They get to control everything because you stream the music.  You can’t put it on any devices or burn it to CD.

So why is this doomed from the start?  To be fair, I’m sure there would be some people who would be happy to pay such a small amount of money to be able to listen to their music from any Internet-connected computer, but I’m not one of them.

I generally listen to music in several places using different devices.  In the car, I’ll listen to music on the radio, on CD, or on my iPod.  At the gym, I’ll listen to their music or my iPod.  At parties, I’ll listen to music from a stereo via an iPod, CD, or yes, a computer.  And sure, sometimes I’ll listen to music at home or at work from a computer.  The vast majority of my music-listening is done via a little plastic disk or a little portable music device.

Now, there’s some criticism of the restrictions forced upon you by online music stores like iTunes.  iTunes lets you download music, but the files contain DRM that restricts what you can do with them.  According to the website, you’re allowed to burn it to CD as many times as you like, and copy it to as many iPods as you like.  You can only put it on up to 5 computers though.  Not much of a restriction - who has more than 5 computers?

Lala on the other hand won’t let you burn to CD and won’t let you put a track on a music device, but they’ll let you play it on as many computers as you like (via their website).  For me, that means I can no longer listen to my music in the car, at the gym, or anywhere without an Internet connection.  That’s a deal-breaker for me, and I’m sure for a lot of people.

The argument is between consumers who want to be able to purchase something without strings, and record companies who want to control their rights and insist that you’re only buying a license to play the music they still own.  Launching a website that tightens control over what you can do with the music is stupid.  It’s giving people the exact opposite of what they want.  Mark my words, this idea will fail, and it will fail hard.

Update: Ars Technica, Wired, and CNet have written about this beta of Lala.com now as well.  The general feeling seems to be similar to my own - they’re unsure of the 10c streaming model.  Also, see the comments section of this post for a reply from one of Lala.com’s employees.

New Template

If you’ve visited here before, you’ll have noticed that the template I’m using has changed.

While the old one looked good to me for a while, the more I looked at it, the more I thought it just looked too busy.  There were borders all over the place and gradients and background textures and all sorts of things.  It was too much.

So I’ve gone the minimalist approach and have chosen a template that’s very simple.  I hope you like it.  I think it makes the posts easier to read.

Damo

CNN T-Shirts

From a few places comes news that CNN is selling t-shirts with news headlines on them.  An interesting/unusual idea, sure, but they’ve done one thing very wrong.

The text on the shirt is part of the URL.

All of which means I can presumably order (or at least create an image of) a shirt that says, “CNN slept with your mum” or “I like short shorts”.  The shirts come with a line underneath the headline that says, “I just saw it on CNN.com” and the date of the news story.  A person could have fun with that…

Now I’m all for RESTful architecture when it’s appropriate, but there should really be some checking to make sure the headline exists…
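Here’s a rough sketch of the kind of check I mean, in Python.  I have no idea how CNN’s shirt page actually works, so everything here is an assumption: the set of headlines stands in for a lookup against whatever store holds the published stories, and the function names are made up for illustration.

```python
from urllib.parse import unquote

# Hypothetical stand-in for a lookup against the site's published stories.
PUBLISHED_HEADLINES = {
    "Example headline one",
    "Example headline two",
}


def normalise(text: str) -> str:
    """Decode URL escapes, collapse whitespace and ignore case."""
    return " ".join(unquote(text).split()).casefold()


def headline_is_real(url_text: str, published=PUBLISHED_HEADLINES) -> bool:
    """Only accept shirt text that matches a headline that was actually published."""
    return normalise(url_text) in {normalise(h) for h in published}


if __name__ == "__main__":
    print(headline_is_real("Example%20headline%20one"))   # a known headline
    print(headline_is_real("I%20like%20short%20shorts"))  # made-up text
```

The URL can stay as readable as it likes; the point is just that the server refuses to print anything it didn’t publish.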

Damo

CAPTCHA is Dead, Long Live PAPTCHA?

Slashdot today carries a link to a story claiming that the CAPTCHA algorithm for Hotmail (or Windows Live Hotmail or whatever it’s called now) has been defeated by a spambot and the exploits have started.  So that’s Gmail, Yahoo Mail, and now Hotmail.

CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is a great idea, but if it doesn’t work, then it doesn’t work.

CAPTCHAs were developed to tell humans apart from software.  They’re essentially a Turing Test across a very limited domain, and because of the limited domain, they’re much easier to attack.  In the case of a standard warped-text CAPTCHA, the attacker knows that the challenge will be an image with a certain number of letters and/or numbers, and that it will be warped in one or more ways.  The software can be written with this in mind.  Additionally, even if there is only a minuscule success rate, it’s often worthwhile for a spammer, particularly if attempts can be automated and run several times a second.

So what’s the solution?

Slashdot made a tongue-in-cheek reference to Kitten Auth, suggested in 2006.  It may have been a playful suggestion, but I think they’re on the right track.  Kitten Auth basically presents the user with a number of pictures of cute fluffy animals, and tells the user to select all the kittens.  The premise is the same as the text-based CAPTCHAs - easy for humans, hard for computers - but it doesn’t use text, making OCR useless.

Something like Kitten Auth could work as long as there’s no predictability.  If the same images are repeatedly used, a brute force attack would work.  If you needed to select three kittens out of nine pictures, all you need is one random success and bam, you have copies of three images that are kittens.  Given enough time, the software could learn enough images to be viable as a solution.
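To put a rough number on that “one random success”, here’s a quick sketch in Python.  It assumes each challenge shows nine pictures of which exactly three are kittens, and that the bot picks three at random; the function names are mine, not Kitten Auth’s.

```python
from math import comb


def blind_guess_probability(pictures: int = 9, kittens: int = 3) -> float:
    """Chance that choosing `kittens` pictures at random picks exactly the kittens."""
    return 1 / comb(pictures, kittens)


def expected_attempts(pictures: int = 9, kittens: int = 3) -> int:
    """Average number of blind attempts up to and including the first success."""
    return comb(pictures, kittens)


if __name__ == "__main__":
    p = blind_guess_probability()
    print(f"Blind success: 1 in {expected_attempts()} ({p:.2%})")
```

That works out to one blind success in 84 attempts on average, which is nothing to a bot, and every success hands it three images it now knows are kittens.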

Alternatively, if OCR can be trained to learn letters and numbers that are very warped and modified, then why not pictures of kittens?  It’s harder, sure, but if we mere mortals can tell a kitten apart from a possum, then why not a computer? These spammers and malware authors are pretty determined you know.

So what else?

Maybe the problem with CAPTCHAs is the “CA” part.  Completely Automated.  What about PAPTCHA? Partially Automated. Sure, it ruins the contrived acronym, but it might be more effective.

Arguably, Kitten Auth is already a PAPTCHA.  The pictures of kittens can’t really be completely automated unless there are 3D models of kittens rendered from different angles with different lighting each time… hmm… that’s an idea… but I digress.

If Microsoft and Google and Yahoo were to put some effort into changing their “PTCHA” regularly, by real people, maybe there’s a solution.

Here’s how it could work:

  • Twenty people, armed with cameras, walk the streets for a few hours taking photos of random objects or scenery.
  • They get back to the office and upload the photos to today’s collection.
  • They link each photo to some standard questions (e.g. “what is the main object in this photo?”) and provide acceptable responses.
  • They provide additional specific questions for each photo (e.g. “How many white horses are there in the field?”) and provide acceptable responses.
  • One or more other staff members look at the photo and each question for quality control.  They can add more acceptable answers, remove them, or reject photos or questions outright.
  • Photos are retired after a time to prevent them being learned.

As a very rough estimate, I’d expect that a person would be able to add at least fifty photos with ten questions each every day.  With 20 people, that equals 10,000 new PTCHAs every day - 50,000 per working week. Surely that’d be enough.  Is 20 people too many?  Even with five people you’d have 12,500 new challenges every week.  If you expire the questions after a month, you’d still have an incredibly large number to choose from.
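The arithmetic is simple enough to sketch in Python so you can plug in your own guesses.  The defaults are just my rough estimates from above, and I’m treating “a month” as four working weeks, which is an assumption.

```python
def challenges_per_week(staff: int, photos_per_day: int = 50,
                        questions_per_photo: int = 10,
                        working_days: int = 5) -> int:
    """New photo-and-question challenges produced in one working week."""
    return staff * photos_per_day * questions_per_photo * working_days


def pool_size(staff: int, weeks_before_expiry: int = 4) -> int:
    """Challenges live at once if each is retired after `weeks_before_expiry` weeks."""
    return challenges_per_week(staff) * weeks_before_expiry


if __name__ == "__main__":
    for staff in (20, 5):
        print(f"{staff} staff: {challenges_per_week(staff):,} per week, "
              f"{pool_size(staff):,} in the pool")
```

With 20 people that gives 50,000 a week and a pool of 200,000; with five people it’s 12,500 a week and a pool of 50,000.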

Current CAPTCHAs effectively have an infinite number of possibilities, but they’re still in a narrow domain.  By expanding the domain to include any question about any photo, there’s no pattern to learn - no possible algorithm to solve the problem.

Is it foolproof?  Definitely not.  However, I’d suggest that implemented properly (and that means a lot of QA), it would be a lot harder to break than current CAPTCHA methods.

There could be a business in this you know… I’d be interested to know what you think!

Damo

Edit: I’ve been having a discussion with a friend of mine who has outlined exactly why 50,000 new challenges per week is not enough.  In short, if x people are creating these challenges, then some fraction of x can be employed to decipher them (answering is quicker than asking).  The answers get added to a massive database along with copies of the images, and there’ll be enough solutions saved to give some malicious code a decent success rate.  If the image and question match one in the database, then the answer will be there.

Repetition of challenges is therefore a significant problem.  An image-and-question pair that repeats every 200,000 requests (4 weeks of 50,000 per week) is far too repetitive.  If the malicious code runs one request every fifteen minutes on 1,000 nodes, it makes 200,000 requests, as many as there are challenges in the pool, in just over 2 days.
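Here’s that calculation as a small Python sketch.  The second function is my own addition and rests on an assumption: that each request is served a challenge picked uniformly at random from the pool.  Under that assumption, making as many requests as there are challenges shows you only about 63% of them, so harvesting the whole pool takes somewhat longer than two days, but not long enough to matter.

```python
def hours_to_request_pool(pool: int = 200_000, nodes: int = 1_000,
                          minutes_between_requests: float = 15) -> float:
    """Hours for the botnet to make as many requests as there are challenges."""
    requests_per_hour = nodes * 60 / minutes_between_requests
    return pool / requests_per_hour


def expected_fraction_seen(pool: int, requests: int) -> float:
    """Expected share of distinct challenges seen, assuming uniform random draws."""
    return 1 - (1 - 1 / pool) ** requests


if __name__ == "__main__":
    hours = hours_to_request_pool()
    print(f"{hours:.0f} hours, or {hours / 24:.2f} days")
    print(f"{expected_fraction_seen(200_000, 200_000):.1%} of the pool seen")
```

Fifty hours is two days and two hours, which is where “just over 2 days” comes from.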

So to overcome this, here are some ideas:

  • Use existing CAPTCHA technology such as warping the question text and putting it directly on the photo in a semi-random place.  You’d get no exact repeats.  The obvious problem is that this may still allow a malicious program to recognise sections of the photo that haven’t been altered.  With every photo and answer saved, there’s still a one in ten chance (given 10 questions per photo) of getting the question right.  Very unacceptable.
  • Warp not only the text, but the image as well.  Obviously it’d still need to be recognisable, so overlaying a random, semitransparent pattern or something might be all you could do.  It might be enough to slow down matching of the image though.
  • Include a bevy of questions that bear no relation to the image.  These could be added to any of the images.  For example, you could have a picture of a field of horses which renders with the question, “How many legs are most people born with?”

So now I have a system where a modified image is rendered with an overlaid warped-text question which may or may not have anything to do with the image.

Of course all I’m really doing is adding complexity, but as long as it’s complex enough to withstand attacks for the length of time it’s used (one month in my example), it should work.

My other suggestion, the CG kittens, got more interest.  In this case, there would be essentially no repeated images.  You’d probably only need a handful of animal models with a few variables set at random to make it feasible.  Perhaps fur colour, lighting, camera position, and some posture or face variables.
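As a very rough sketch of what “a few variables set at random” might look like, here’s some Python that only generates the parameters for each picture; the actual 3D rendering is left out entirely.  All of the names, lists and angle ranges are made up for illustration.

```python
import random
from dataclasses import dataclass, replace

# Hypothetical model and variation lists; a real system would have its own.
ANIMALS = ("kitten", "puppy", "possum", "rabbit")
FUR_COLOURS = ("black", "white", "ginger", "grey", "tabby")
POSTURES = ("sitting", "standing", "lying", "pouncing")


@dataclass(frozen=True)
class RenderSpec:
    """Everything a renderer would need to draw one picture."""
    animal: str
    fur_colour: str
    posture: str
    camera_azimuth_deg: float
    camera_elevation_deg: float
    light_azimuth_deg: float


def random_render_spec(rng: random.Random) -> RenderSpec:
    """Pick every variable independently at random."""
    return RenderSpec(
        animal=rng.choice(ANIMALS),
        fur_colour=rng.choice(FUR_COLOURS),
        posture=rng.choice(POSTURES),
        camera_azimuth_deg=rng.uniform(0, 360),
        camera_elevation_deg=rng.uniform(-10, 60),
        light_azimuth_deg=rng.uniform(0, 360),
    )


def make_challenge(rng: random.Random, pictures: int = 9,
                   kittens: int = 3) -> list[RenderSpec]:
    """Specs for one challenge: exactly `kittens` kittens among `pictures` animals."""
    others = [a for a in ANIMALS if a != "kitten"]
    animals = ["kitten"] * kittens
    animals += [rng.choice(others) for _ in range(pictures - kittens)]
    rng.shuffle(animals)
    return [replace(random_render_spec(rng), animal=a) for a in animals]


if __name__ == "__main__":
    # SystemRandom draws from the OS, so the sequence can't be replayed from a seed.
    for spec in make_challenge(random.SystemRandom()):
        print(spec)
```

Because the camera and lighting angles are continuous, two renders of the same model will essentially never be pixel-for-pixel identical, which is the whole point.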