THERE WERE TWO talks on Captchas at Defcon 16, and they both said the same thing: Captchas are pointless and stupid. This was backed up with a lot of science and code, but the end result is the same: if you have a clue and some time, you can break them on a whim.
The easiest way to crack them is to make a table of what the Captchas are: basically, get one of every possibility and look it up. Impossible? Well, PHP-Nuke uses a six-digit numerical Captcha without any injection of randomness, meaning 10^6 possibilities. Since you can refresh the page and get another one, you just hit refresh and scrape the pics. Make a checksum of each image, and voila, this Captcha is a simple database lookup away.
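A minimal sketch of that lookup table, assuming each code always renders as a byte-identical image (which is what "no injection of randomness" implies) and that each image has been solved once by a human or by OCR. The function names and the stand-in byte strings are hypothetical, not taken from either talk.

```python
import hashlib

def image_key(image_bytes: bytes) -> str:
    """Checksum of the raw image bytes, used as the lookup key."""
    return hashlib.sha256(image_bytes).hexdigest()

def build_table(solved):
    """solved: iterable of (image_bytes, answer) pairs collected earlier."""
    return {image_key(img): answer for img, answer in solved}

def solve(table, image_bytes):
    """Return the stored answer, or None if this image has not been seen."""
    return table.get(image_key(image_bytes))

# Stand-in byte strings; a real table would hold the scraped image files.
table = build_table([
    (b"fake-image-for-123456", "123456"),
    (b"fake-image-for-654321", "654321"),
])
```

Once the table is built, solving a Captcha costs one hash and one dictionary lookup. Any per-request noise in the image would defeat an exact checksum, which is why the lack of randomness matters.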
Picking on an open source project is easy; banks would never do that, right? I mean, Paypal would never... They use a five-character alphanumeric Captcha, meaning 36^5 possibilities, or about 60 million. You can refresh it too for another try. Damn secure there, guys: a decent botnet can rip this apart in a few hours. Luckily they have all sorts of disclaimers in their EULA, so their problems are likely your problems.
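The arithmetic behind those figures can be checked in a few lines. The botnet size and request rate below are assumptions picked for illustration, not numbers from the talks, and the result is a lower bound, since random refreshes return repeats before every image has been seen.

```python
def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct codes of the given length."""
    return alphabet_size ** length

def hours_to_enumerate(space: int, machines: int, requests_per_second: float) -> float:
    """Hours needed to fetch `space` images, ignoring repeats."""
    return space / (machines * requests_per_second) / 3600

php_nuke = keyspace(10, 6)   # six digits
paypal = keyspace(36, 5)     # five characters from 0-9 and a-z

# Assumed botnet: 10,000 machines, one request per second each.
paypal_hours = hours_to_enumerate(paypal, machines=10_000, requests_per_second=1)
```

Under those assumptions the 36^5 space takes a little under two hours of fetching, which is in line with the "few hours" claim.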
Audio Captchas are vulnerable too, this time to comparison by what is called Hamming distance. Using tools made for bioinformatics, you can simply look at the .WAVs and pick them apart. Levenshtein distances, basically fuzzy Hamming, make things a lot more accurate.
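For reference, a minimal sketch of the two distance measures named above, plus a nearest-template match. It assumes the audio has already been reduced to a sequence of symbols (for example, quantised samples); that preprocessing step, the `closest` helper and the template format are hypothetical, not the speakers' tooling.

```python
def hamming(a, b):
    """Count positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))

def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute, free if equal
        prev = curr
    return prev[-1]

def closest(sample, templates):
    """Label of the template with the smallest edit distance to the sample."""
    return min(templates, key=lambda label: levenshtein(sample, templates[label]))
```

Hamming only works when the two clips line up sample for sample. Levenshtein tolerates clips that are stretched or shifted, which is why it copes better with spoken digits of varying length.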
But those are the obvious ways. What if the Captcha makers have a clue, not that any seem to at the moment? Some use a set URL for a set Captcha, so you can just look at the return URL in the code to see the result. Duh.
Others have miserable random number generators. With the appropriate mathematical knowledge, you can reconstruct a five-digit Rand()-based Captcha from 12 to 13 samples, and it is all downhill from there. Random() is a little better, but not enough to make things hard for hackers.
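To show why a weak generator is fatal, here is a deliberately simplified sketch: a linear congruential generator with a known prime modulus whose full outputs are visible. That is an assumption made for illustration; an attack on real five-digit Captchas only sees truncated output, which is presumably why it needs 12 to 13 samples rather than three. The parameters and function names are hypothetical, and `pow(x, -1, m)` needs Python 3.8 or later.

```python
def lcg(seed, a, c, modulus):
    """Yield successive outputs of x_{n+1} = (a * x_n + c) % modulus."""
    x = seed
    while True:
        x = (a * x + c) % modulus
        yield x

def recover_lcg(x0, x1, x2, modulus):
    """Recover (a, c) from three consecutive outputs and a known modulus."""
    diff = (x1 - x0) % modulus
    # x2 - x1 = a * (x1 - x0) mod m, so divide by the difference.
    a = (x2 - x1) * pow(diff, -1, modulus) % modulus
    c = (x1 - a * x0) % modulus
    return a, c

# Hypothetical parameters, chosen only for this demonstration.
M = 2**31 - 1  # prime, so any non-zero difference has an inverse
gen = lcg(seed=2024, a=48271, c=12345, modulus=M)
x0, x1, x2, x3 = (next(gen) for _ in range(4))

a, c = recover_lcg(x0, x1, x2, M)
predicted = (a * x2 + c) % M  # should match the generator's next output
```

Once the multiplier and increment are known, every future value is predictable, so every future Captcha is solved before it is drawn.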
The next way to do things is to use off-the-shelf OCR software, and that makes life very easy. The only real work is cleaning up the image and training the OCR, something that can be done quite easily with simple Gimp filters. If the Captcha puts in thin lines or dots, a blur filter removes them easily. Edge finding also does wonders, as do contrast controls. Grids fall prey to algorithmic assault, and in general, a little thinking will break most of them.
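The clean-up step can be sketched without any image library. The example below uses a 3x3 median filter, a close cousin of the blur filter mentioned above rather than the Gimp filter itself, on a black-and-white image stored as a list of rows. The tiny test image is made up for illustration.

```python
def median_filter(img):
    """3x3 median filter over a 2D list of 0/1 pixels (1 = ink)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect the pixel and its neighbours, clipped at the borders.
            window = sorted(img[j][i]
                            for j in range(max(0, y - 1), min(h, y + 2))
                            for i in range(max(0, x - 1), min(w, x + 2)))
            out[y][x] = window[len(window) // 2]
    return out

# Made-up image: a one-pixel noise line on row 1, a thick 3x3 stroke below it.
noisy = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
cleaned = median_filter(noisy)
```

The thin line is outvoted by its blank neighbours and disappears, while the core of the thick stroke survives, though its corners get rounded off. The cleaned image is what would then be handed to the OCR software.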
The last bastion is the cultural knowledge based Captcha, basically 'hot or not' as a security scheme. If you are presented with four pictures, three kittens and one puppy, picking out the puppy is hard to do via image recognition. Brute forcing is easy here, though: it is very hard to have an image library big enough to keep a botnet from downloading them all. Any image library you can buy, so can the bad guys.
In the end, Captchas are easy to break. The only question is how much effort it takes, and percentages are everything. If you have a 1 in 10^5 chance of guessing the result, even a botnet isn't going to have much success pounding away. A technique that ups that to only a 20 per cent success rate will sound stupid to a person, but 10K machines will eat that for lunch. Game over for Captchas. µ