The History Of CAPTCHA- Characteristics, Inventorship claims, Relation , Accessibility, paid services, CAPTCHAs schemas, Notable attacks, NB Inspire
The term was coined in 2003 by Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford. The most common type of CAPTCHA (displayed as Version 1.0) was first invented in 1997 by two groups working in parallel. This form of CAPTCHA requires someone to correctly evaluate and enter a sequence of letters or numbers perceptible in a distorted image displayed on their screen. Because the test is administered by a computer, in contrast to the standard Turing test that is administered by a human, a CAPTCHA is sometimes described as a reverse Turing test.
This user identification procedure has received many criticisms, especially from people with disabilities, but also from other people who feel that their everyday work is slowed down by distorted words that are difficult to read. It takes the average person approximately 10 seconds to solve a typical CAPTCHA.
History
Since the early days of the Internet, users have wanted to make text illegible to computers. The first such people were hackers, posting about sensitive topics to Internet forums they thought were being automatically monitored on keywords. To circumvent such filters, they replaced a word with look-alike characters. HELLO could become |-|3|_|_() or )-(3££0, as well as numerous other variants, such that a filter could not possibly detect all of them. This later became known as leetspeak.
One of the earliest commercial uses of CAPTCHAs was in the Gausebeck–Levchin test. In 2000, idrive.com began to protect its signup page with a CAPTCHA and prepared to file a patent[5] on this seemingly novel technique. In 2001, PayPal used such tests as part of a fraud prevention strategy in which they asked humans to "retype distorted text that programs have difficulty." PayPal cofounder and CTO Max Levchin helped commercialize this early use.
A popular deployment of CAPTCHA technology, reCAPTCHA, was acquired by Google in 2009. In addition to preventing bot fraud for its users, Google used reCAPTCHA and CAPTCHA technology to digitize the archives of The New York Times and books from Google Books in 2011.
Inventorship claims
Two teams have claimed to be the first to invent the CAPTCHAs used widely on the web today. The first team with Mark D. Lillibridge, Martin Abadi, Krishna Bharat, and Andrei Broder, used CAPTCHAs in 1997 at AltaVista to prevent bots from adding Uniform Resource Locator (URLs) to their web search engine. Looking for a way to make their images resistant to optical character recognition (OCR) attack, the team looked at the manual of their Brother scanner, which had recommendations for improving OCR's results (similar typefaces, plain backgrounds, etc.). The team created puzzles by attempting to simulate what the manual claimed would cause bad OCR.
The second team to claim to be the first to invent CAPTCHAs with Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford, first described publication of CAPTCHAs in a 2003 and subsequently received much coverage in the popular press. Their notion of CAPTCHA covers any program that can distinguish humans from computers.
The controversy of inventorship has been resolved by the existence of a 1997 priority date patent application by Eran Reshef, Gili Raanan and Eilon Solan (second group) who worked at Sanctum on Application Security Firewall. Their patent application details that "The invention is based on applying human advantage in applying sensory and cognitive skills to solving simple problems that prove to be extremely hard for computer software. Such skills include, but are not limited to processing of sensory information such as identification of objects and letters within a noisy graphical environment". Lillibridge, Abadi, Bharat, and Broder (first group) published their patent in 1998. Both patents predate other publications by several years, though they do not use the term CAPTCHA, they describe the ideas in detail and precisely depict the graphical CAPTCHAs used in the Web today.
Characteristics
CAPTCHAs are, by definition, fully automated, requiring little human maintenance or intervention to administer, producing benefits in cost and reliability.
The algorithm used to create the CAPTCHA must be made public, though it may be covered by a patent. This is done to demonstrate that breaking it requires the solution to a difficult problem in the field of artificial intelligence (AI) rather than just the discovery of the (secret) algorithm, which could be obtained through reverse engineering or other means.
Modern text-based CAPTCHAs are designed such that they require the simultaneous use of three separate abilities—invariant recognition, segmentation, and parsing—to correctly complete the task with any consistency.
*Invariant recognition refers to the ability to recognize the large amount of variation in the shapes of letters. There is an overwhelmingly large number of versions of each character that a human brain can successfully identify. The same is not true for a computer, and teaching it to recognize all those differing formations is a challenging task.
*Segmentation, or the ability to separate one letter from another, is also made difficult in CAPTCHAs, as characters are crowded together with no white space in between.
*Context is also critical. The CAPTCHA must be understood holistically to correctly identify each character. For example, in one segment of a CAPTCHA, a letter might look like an "m". Only when the whole word is taken into context does it become clear that it is a u and an n.
Each of these problems poses a significant challenge for a computer, even in isolation. The presence of all three at the same time is what makes CAPTCHAs difficult to solve.
Unlike computers, humans excel at this type of task. While segmentation and recognition are two separate processes necessary for understanding an image for a computer, they are part of the same process for a person. For example, when an individual understands that the first letter of a CAPTCHA is an a, that individual also understands where the contours of that a are, and also where it melds with the contours of the next letter. Additionally, the human brain is capable of dynamic thinking based upon context. It is able to keep multiple explanations alive and then pick the one that is the best explanation for the whole input based upon contextual clues. This also means it will not be fooled by variations in letters.
Relation to AI
While used mostly for security reasons, CAPTCHAs also serve as a benchmark task for artificial intelligence technologies. According to an article by Ahn, Blum and Langford, "any program that passes the tests generated by a CAPTCHA can be used to solve a hard unsolved AI problem."
They argue that the advantages of using hard AI problems as a means for security are twofold. Either the problem goes unsolved and there remains a reliable method for distinguishing humans from computers, or the problem is solved and a difficult AI problem is resolved along with it. In the case of image and text based CAPTCHAs, if an AI were capable of accurately completing the task without exploiting flaws in a particular CAPTCHA design, then it would have solved the problem of developing an AI that is capable of complex object recognition in scenes.
Accessibility
CAPTCHAs based on reading text — or other visual-perception tasks — prevent blind or visually impaired users from accessing the protected resource. However, CAPTCHAs do not have to be visual. Any hard artificial intelligence problem, such as speech recognition, can be used as the basis of a CAPTCHA. Some implementations of CAPTCHAs permit users to opt for an audio CAPTCHA, though a 2011 paper demonstrated a technique for defeating the popular schemes at the time.
For non-sighted users (for example blind users, or color blind people on a color-using test), visual CAPTCHAs present serious problems. Because CAPTCHAs are designed to be unreadable by machines, common assistive technology tools such as screen readers cannot interpret them. Since sites may use CAPTCHAs as part of the initial registration process, or even every login, this challenge can completely block access. In certain jurisdictions, site owners could become targets of litigation if they are using CAPTCHAs that discriminate against certain people with disabilities. For example, a CAPTCHA may make a site incompatible with Section 508 in the United States. In other cases, those with sight difficulties can choose to identify a word being read to them.
While providing an audio CAPTCHA allows blind users to read the text, it still hinders those who are both blind and deaf. According to sense.org.uk, about 4% of people over 60 in the UK have both vision and hearing impairments. There are about 23,000 people in the UK who have serious vision and hearing impairments. According to The National Technical Assistance Consortium for Children and Young Adults Who Are Deaf-Blind (NTAC), the number of deafblind children in the USA increased from 9,516 to 10,471 during the period 2004 to 2012. Gallaudet University quotes 1980 to 2007 estimates which suggest upwards of 35,000 fully deafblind adults in the USA. Deafblind population estimates depend heavily on the degree of impairment used in the definition.
The use of CAPTCHA thus excludes a small percentage of users from using significant subsets of such common Web-based services as PayPal, Gmail, Orkut, Yahoo!, many forum and weblog systems, etc.
Even for perfectly sighted individuals, new generations of graphical CAPTCHAs, designed to overcome sophisticated recognition software, can be very hard or impossible to read.
A method of improving CAPTCHA to ease the work with it was proposed by ProtectWebForm and named "Smart CAPTCHA". Developers are advised to combine CAPTCHA with JavaScript. Since it is hard for most bots to parse and execute JavaScript, a combinatory method which fills the CAPTCHA fields and hides both the image and the field from human eyes was proposed.
One alternative method involves displaying to the user a simple mathematical equation and requiring the user to enter the solution as verification. Although these are much easier to defeat using software, they are suitable for scenarios where graphical imagery is not appropriate, and they provide a much higher level of accessibility for blind users than the image-based CAPTCHAs. These are sometimes referred to as MAPTCHAs (M = "mathematical"). However, these may be difficult for users with a cognitive disorder.
Other kinds of challenges, such as those that require understanding the meaning of some text (e.g., a logic puzzle, trivia question, or instructions on how to create a password) can also be used as a CAPTCHA. Again, there is little research into their resistance against countermeasures.
Cheap or unwitting human labor
It is possible to subvert CAPTCHAs by relaying them to a sweatshop of human operators who are employed to decode CAPTCHAs. A 2005 paper from a W3C working group stated that such an operator could verify hundreds per hour. In 2010, the University of California at San Diego conducted a large scale study of CAPTCHA farms and found out that the retail price for solving one million CAPTCHAs was as low as $1,000.
Another technique that has been described of using a script to re-post the target site's CAPTCHA as a CAPTCHA to a site owned by the attacker, which unsuspecting humans visit and correctly solve within a short while for the script to use. This technique is likely to be economically unfeasible for most attackers due to the cost of attracting enough users and running a popular site.
Outsourcing to paid services
There are multiple Internet companies like 2Captcha and DeathByCaptcha that offer human and machine backed CAPTCHA solving services for as low as US$0.50 per 1000 solved CAPTCHAs. These services offer APIs and libraries that enable users to integrate CAPTCHA circumvention into the tools that CAPTCHAs were designed to block in the first place.
Insecure implementation
Howard Yeend has identified two implementation issues with poorly designed CAPTCHA systems:
*Some CAPTCHA protection systems can be bypassed without using OCR simply by reusing the session ID of a known CAPTCHA image.
*CAPTCHAs residing on shared servers also present a problem; a security issue on another virtual host may leave the CAPTCHA issuer's site vulnerable.
Sometimes, if part of the software generating the CAPTCHA is client-side (the validation is done on a server but the text that the user is required to identify is rendered on the client side), then users can modify the client to display the unrendered text. Some CAPTCHA systems use MD5 hashes stored client-side, which may leave the CAPTCHA vulnerable to a brute-force attack.
Notable attacks
Some notable attacks against various CAPTCHAs schemas include:
*Mori et al. published a paper in IEEE CVPR'03 detailing a method for defeating one of the most popular CAPTCHAs, EZ-Gimpy, which was tested as being 92% accurate in defeating it. The same method was also shown to defeat the more complex and less-widely deployed Gimpy program 33% of the time. However, the existence of implementations of their algorithm in actual use is indeterminate at this time.
*PWNtcha has made significant progress in defeating commonly used CAPTCHAs, which has contributed to a general migration towards more sophisticated CAPTCHAs.
Podec, a trojan discovered by the security company Kaspersky, forwards CAPTCHA requests to an online human translation service that converts the image to text, fooling the system. Podec targets Android mobile devices.
Alternative CAPTCHAs schemas
With the demonstration that text distortion based CAPTCHAs are vulnerable to machine learning based attacks, some researchers have proposed alternatives including image recognition CAPTCHAs which require users to identify simple objects in the images presented. The argument in favor of these schemes is that tasks like object recognition are typically more complex to perform than text recognition and therefore should be more resilient to machine learning based attacks. Here are some of notable alternative CAPTCHA schemas:
*Chew et al. published their work in the 7th International Information Security Conference, ISC'04, proposing three different versions of image recognition CAPTCHAs, and validating the proposal with user studies. It is suggested that one of the versions, the anomaly CAPTCHA, is best with 100% of human users being able to pass an anomaly CAPTCHA with at least 90% probability in 42 seconds.
*Datta et al. published their paper in the ACM Multimedia '05 Conference, named IMAGINATION (IMAGE Generation for Internet Authentication), proposing a systematic way to image recognition CAPTCHAs. Images are distorted in such a way that state-of-the-art image recognition approaches (which are potential attack technologies) fail to recognize them.
*Microsoft (Jeremy Elson, John R. Douceur, Jon Howell, and Jared Saul) claim to have developed Animal Species Image Recognition for Restricting Access (ASIRRA) which ask users to distinguish cats from dogs. Microsoft had a beta version of this for websites to use. They claim "Asirra is easy for users; it can be solved by humans 99.6% of the time in under 30 seconds. Anecdotally, users seemed to find the experience of using Asirra much more enjoyable than a text-based CAPTCHA." This solution was described in a 2007 paper to Proceedings of 14th ACM Conference on Computer and Communications Security (CCS). However, this project was closed in October 2014 and is no longer available.
Post a Comment
0Comments