reCAPTCHA Inc.[1] is a CAPTCHA system owned by Google. It enables web hosts to distinguish between human and automated access to websites. The original version asked users to decipher hard-to-read text or match images. Version 2 also asked users to decipher text or match images if the analysis of cookies and canvas rendering suggested the page was being downloaded automatically.[2] Since version 3, reCAPTCHA will never interrupt users and is intended to run automatically when users load pages or click buttons.[3]

reCAPTCHA Inc.
Original author(s)
Developer(s): Google
Initial release: May 27, 2007; 17 years ago
Type: Classic version: CAPTCHA; new version: behavioral analysis
Website: google.com/recaptcha

The original iteration of the service was a mass collaboration platform designed for the digitization of books, particularly those that were too illegible to be scanned by computers. The verification prompts utilized pairs of words from scanned pages, with one known word used as a control for verification, and the second used to crowdsource the reading of an uncertain word.[4] reCAPTCHA was originally developed by Luis von Ahn, David Abraham, Manuel Blum, Michael Crawford, Ben Maurer, Colin McMillen, and Edison Tan at Carnegie Mellon University's main Pittsburgh campus.[5] It was acquired by Google in September 2009.[6] The system helped to digitize the archives of The New York Times, and was subsequently used by Google Books for similar purposes.[7]

The system was reported as displaying over 100 million CAPTCHAs every day,[8] on sites such as Facebook, TicketMaster, Twitter, 4chan, CNN.com, StumbleUpon,[9] Craigslist (since June 2008),[10] and the U.S. National Telecommunications and Information Administration's digital TV converter box coupon program website (as part of the US DTV transition).[11]

In 2014, Google pivoted the service away from its original concept, with a focus on reducing the amount of user interaction needed to verify a user, and only presenting human recognition challenges (such as identifying images in a set that satisfy a specific prompt) if behavioral analysis suspects that the user may be a bot.

In October 2023, it was found that OpenAI's GPT-4 chatbot could solve CAPTCHAs.[12]

Origin


Distributed Proofreaders was the first project to volunteer its time to decipher scanned text that could not be read by optical character recognition (OCR) programs. It works with Project Gutenberg to digitize public domain material and uses methods quite different from reCAPTCHA.

The reCAPTCHA program originated with Guatemalan computer scientist Luis von Ahn,[13] and was aided by a MacArthur Fellowship. An early CAPTCHA developer, he realized "he had unwittingly created a system that was frittering away, in ten-second increments, millions of hours of a most precious resource: human brain cycles".[14]

Operation


ReCAPTCHA v1 (human-assisted OCR)

An example of how a reCAPTCHA challenge looked in 2007,[15] containing the words "following" and "finding". The waviness and horizontal stroke were added to increase the difficulty of breaking the CAPTCHA with a computer program.
Image identification CAPTCHA, which requires users to select the appropriate images to verify they are human

Scanned text is analyzed by two different OCR programs. Any word that the two programs decipher differently, or that is not in an English dictionary, is marked as "suspicious" and converted into a CAPTCHA. The suspicious word is displayed out of context, sometimes along with a control word that is already known. If the human types the control word correctly, the response to the questionable word is accepted as probably valid. If enough users correctly type the control word but incorrectly type the second word, which OCR had failed to recognize, the digital version of the document could end up containing the incorrect word. The identification performed by each OCR program is given a value of 0.5 points, and each interpretation by a human is given a full point. Once a given identification reaches 2.5 points, the word is considered valid. Words that are consistently given a single identity by human judges are later recycled as control words.[16] If the first three guesses match each other but do not match either of the OCR readings, they are considered a correct answer, and the word becomes a control word.[17] When six users reject a word before any correct spelling is chosen, the word is discarded as unreadable.[17]
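The point scheme described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the project's actual code: the function name `resolve_word`, the use of `None` to represent a user rejecting a word, and the returned status labels are all hypothetical. Only the weights (0.5 per OCR reading, 1 per human answer), the 2.5-point threshold, and the six-rejection limit come from the description.

```python
from collections import defaultdict

OCR_WEIGHT = 0.5        # each OCR program's reading counts for half a point
HUMAN_WEIGHT = 1.0      # each human transcription counts for a full point
ACCEPT_THRESHOLD = 2.5  # points at which a reading is considered valid
REJECT_LIMIT = 6        # rejections after which the word is discarded


def resolve_word(ocr_readings, human_answers):
    """Tally votes for one suspicious word.

    ocr_readings: the strings produced by the OCR programs.
    human_answers: transcriptions in arrival order; None stands for a
        user rejecting the word (a hypothetical encoding).
    Returns ("valid", word), ("unreadable", None) or ("pending", None).
    """
    scores = defaultdict(float)
    for reading in ocr_readings:
        scores[reading] += OCR_WEIGHT

    rejections = 0
    for answer in human_answers:
        if answer is None:
            rejections += 1
            if rejections >= REJECT_LIMIT:
                return ("unreadable", None)
            continue
        scores[answer] += HUMAN_WEIGHT
        if scores[answer] >= ACCEPT_THRESHOLD:
            return ("valid", answer)
    return ("pending", None)
```

Under this tally, two humans agreeing with one OCR reading reach exactly 2.5 points, and three humans agreeing with each other reach 3 points even when neither OCR reading matches, which is consistent with the three-guess rule described above.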

The original reCAPTCHA method was designed to show the questionable words separately, as out-of-context correction, rather than in use, such as within a phrase of five words from the original document.[18] Also, the control word might mislead the context for the second word, such as a request of "/metal/ /fife/" being entered as "metal file" due to the logical connection of filing with a metal tool being considered more common than the musical instrument "fife".[citation needed]

In 2012, reCAPTCHA began using photographs taken from the Google Street View project, in addition to scanned words.[19] It asks the user to identify images of crosswalks, street lights, and other objects. It has been hypothesized that the data is used by Waymo (a Google subsidiary) to train autonomous vehicles, though an unnamed representative has denied this, claiming the data was only being used to improve Google Maps as of mid-2021.[20]

Google charges for the use of reCAPTCHA on websites that make over a million reCAPTCHA queries a month.[21]

No CAPTCHA reCAPTCHA (v2+)

The NoCAPTCHA reCAPTCHA

In 2013, reCAPTCHA began implementing behavioral analysis of the browser's interactions to predict whether the user was a human or a bot. The following year, Google began to deploy a new reCAPTCHA API, featuring the "no CAPTCHA reCAPTCHA", where users deemed to be of low risk only need to click a single checkbox to verify their identity. A CAPTCHA may still be presented if the system is uncertain of the user's risk; Google also introduced a new type of CAPTCHA challenge designed to be more accessible to mobile users, where the user must select images matching a specific prompt from a grid.[2][22]

In 2017, Google introduced a new "invisible" reCAPTCHA, where verification occurs in the background, and no challenges are displayed at all if the user is deemed to be of low risk.[23][24][25] According to former Google "click fraud czar" Shuman Ghosemajumder, this capability "creates a new sort of challenge that very advanced bots can still get around, but introduces a lot less friction to the legitimate human."[25]

reCAPTCHA v1 was declared end-of-life and shut down on March 31, 2018.[26]

Implementation


The reCAPTCHA tests are displayed from the central site of the reCAPTCHA project, which supplies the words to be deciphered. This is done through a JavaScript API with the server making a callback to reCAPTCHA after the request has been submitted. The reCAPTCHA project provides libraries for various programming languages and applications to make this process easier. reCAPTCHA is a free-of-charge service provided to websites for assistance with the decipherment,[27] but the reCAPTCHA software is not open-source.[28]

Also, reCAPTCHA offers plugins for several web-application platforms including ASP.NET, Ruby, and PHP, to ease the implementation of the service.[29]
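The server-side callback described above can be sketched in Python using only the standard library. This is a minimal illustration, not an official client. The endpoint URL and the field names (`secret`, `response`, `remoteip`, `success`, `score`) follow Google's publicly documented verification API for current reCAPTCHA versions, as far as that documentation is understood here. The function names and the 0.5 score cut-off are assumptions made for the example.

```python
import json
import urllib.parse
import urllib.request

# Verification endpoint as publicly documented by Google for current versions.
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret, token, remote_ip=None):
    """Build the POST request the web server sends after a form submission.

    secret: the site's private key; token: the value the browser widget
    placed in the submitted form; remote_ip: optional client address.
    """
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(VERIFY_URL, data=data, method="POST")


def is_human(verify_response, min_score=0.5):
    """Interpret the decoded JSON reply.

    A "score" is only expected from score-based versions; the 0.5
    cut-off is an assumed policy that each site would tune.
    """
    if not verify_response.get("success"):
        return False
    score = verify_response.get("score")
    return score is None or score >= min_score


def verify_token(secret, token, remote_ip=None, timeout=5.0):
    """Perform the callback over the network and return True or False."""
    request = build_verify_request(secret, token, remote_ip)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return is_human(json.load(reply))
```

Splitting request construction and response interpretation from the network call keeps the decision logic testable without contacting the service; a site would call `verify_token` with its own secret key before accepting a submitted form.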

Security

An example of how reCAPTCHA challenges were presented in 2010,[30] containing the words "and chisels"

The main purpose of a CAPTCHA system is to block spambots while allowing human users. On December 14, 2009, Jonathan Wilkins released a paper describing weaknesses in reCAPTCHA that allowed bots to achieve a solve rate of 18%.[31][32][33]

On August 1, 2010, Chad Houck gave a presentation to the DEF CON 18 Hacking Conference detailing a method to reverse the distortion added to images, which allowed a computer program to determine a valid response 10% of the time.[34][35] The reCAPTCHA system was modified on July 21, 2010, before Houck was to speak on his method. Houck adapted his method to what he described as an "easier" CAPTCHA, determining a valid response 31.8% of the time. Houck also mentioned security defenses in the system, including a high-security lockout if an invalid response is given 32 times in a row.[36]

On May 26, 2012, Adam, C-P, and Jeffball of DC949 gave a presentation at the LayerOne hacker conference detailing how they were able to achieve an automated solution with an accuracy rate of 99.1%.[37] Their tactic was to use techniques from machine learning, a subfield of artificial intelligence, to analyze the audio version of reCAPTCHA, which is available for the visually impaired. Google released a new version of reCAPTCHA just hours before their talk, making major changes to both the audio and visual versions of the service. In this release, the audio version was increased in length from 8 seconds to 30 seconds and became much more difficult to understand, for both humans and bots. In response to this update and the following one, the members of DC949 released two more versions of Stiltwalker, which beat reCAPTCHA with an accuracy of 60.95% and 59.4% respectively. After each successive break, Google updated reCAPTCHA within a few days. According to DC949, Google often reverted to features that had been previously hacked.

On June 27, 2012, Claudia Cruz, Fernando Uceda, and Leobardo Reyes published a paper showing a system running on reCAPTCHA images with an accuracy of 82%.[38] The authors have not said if their system can solve recent reCAPTCHA images, although they claim their work to be intelligent OCR and robust to some, if not all, changes in the image database.

In an August 2012 presentation given at BsidesLV 2012, DC949 called the latest version "unfathomably impossible for humans": they were not able to solve them manually either.[37] The web accessibility organization WebAIM reported in May 2012, "Over 90% of respondents [screen reader users] find CAPTCHA to be very or somewhat difficult".[39]

Criticism


The original iteration of reCAPTCHA was criticized as a source of unpaid work used to assist transcription efforts.[40]

Google has also been criticized for profiting from reCAPTCHA users, who act as unpaid workers helping to improve its AI research.[41]

Privacy


The current iteration of the system has been criticized for its reliance on tracking cookies and promotion of vendor lock-in with Google services; administrators are encouraged to include reCAPTCHA tracking code on all pages of their website to analyze the behavior and "risk" of users, which determines the level of friction presented when a reCAPTCHA prompt is used.[42] Google stated in its privacy policy that user data collected in this manner is not used for personalized advertising. It was also discovered that the system favors those who have an active Google account login, and assigns a higher risk to those using anonymizing proxies and VPN services.[23]

Concerns were raised regarding privacy when Google announced reCAPTCHA v3.0, as it allows Google to track users on non-Google websites.[23]

In April 2020, Cloudflare switched from reCAPTCHA to hCaptcha, citing privacy concerns over Google's potential use of the data it collects through reCAPTCHA for targeted advertising,[43] as well as a wish to cut operating costs, since a considerable portion of Cloudflare's customers are non-paying. In response, Google told PC Magazine that the data from reCAPTCHA is never used for personalized advertising purposes.[21]

Accessibility


Google's help center states that reCAPTCHA is not supported for the deafblind community,[44] effectively locking such users out of all pages that use the service. However, reCAPTCHA does currently have the longest list of accessibility considerations of any CAPTCHA service.[45]

Interface


In one variant of the CAPTCHA challenge, images are not incrementally highlighted; instead, each clicked image fades out and is replaced by a new image fading in, resembling whack-a-mole.

Criticism has been aimed at the long duration taken for the images to fade out and in.[46]

Derivative projects


reCAPTCHA also created the Mailhide project, which protects email addresses on web pages from being harvested by spammers.[47] By default, the email address was converted into a format that did not allow a crawler to see the full email address; for example, "[email protected]" would have been converted to "[email protected]". The visitor would then click on the "..." and solve the CAPTCHA to obtain the full email address. One could also edit the pop-up code so that none of the addresses were visible. Mailhide was discontinued in 2018 because it relied on reCAPTCHA v1.[48]

References

  1. ^ "Recaptcha Inc". OpenCorporates. August 28, 2007. Archived from the original on August 20, 2023. Retrieved August 20, 2023.
  2. ^ a b Shet, Vinay (December 3, 2014). "Are you a robot? Introducing 'No CAPTCHA reCAPTCHA'". Archived from the original on September 3, 2020. Retrieved February 24, 2021.
  3. ^ "reCAPTCHA v3". Archived from the original on September 25, 2020. Retrieved September 8, 2020.
  4. ^ Ahn, Luis von (December 6, 2011), Massive-scale online collaboration, archived from the original on July 15, 2020, retrieved April 14, 2020
  5. ^ "reCAPTCHA: About Us". Archived from the original on June 11, 2010. Retrieved August 14, 2018.
  6. ^ "Teaching computers to read: Google acquires reCAPTCHA". Archived from the original on May 19, 2013. Retrieved September 16, 2009.
  7. ^ "Deciphering Old Texts, One Woozy, Curvy Word at a Time". The New York Times. March 28, 2011. Archived from the original on November 17, 2017. Retrieved November 20, 2017.
  8. ^ "reCAPTCHA FAQ". Archived from the original on July 5, 2010. Retrieved June 12, 2011.
  9. ^ Rubens, Paul (October 2, 2007). "Spam weapon helps preserve books". BBC. Archived from the original on May 18, 2013. Retrieved October 3, 2007.
  10. ^ "Fight Spam, Digitize Books". Craigslist Blog. June 2008. Archived from the original on July 6, 2010. Retrieved June 17, 2008.
  11. ^ "TV Converter Box Program". dtv2009.gov. Archived from the original on November 4, 2009.
  12. ^ Edwards, Benj (October 2, 2023). "Dead grandma locket request tricks Bing Chat's AI into solving security puzzle". Ars Technica. Archived from the original on October 10, 2023. Retrieved October 25, 2023.
  13. ^ "'Full Interview: Luis von Ahn on Duolingo', Spark, November 2011". Canadian Broadcasting Corporation. November 30, 2011. Archived from the original on June 3, 2012. Retrieved July 10, 2013.
  14. ^ Hutchinson, Alex (March 12, 2009). "Human Resources: The job you didn't even know you had". The Walrus. Archived from the original on December 3, 2015. Retrieved December 7, 2015.
  15. ^ "reCAPTCHA: Using Captchas To Digitize Books". TechCrunch. September 16, 2007. Archived from the original on June 3, 2017. Retrieved June 25, 2017.
  16. ^ Timmer, John (August 14, 2008). "CAPTCHAs work? for digitizing old, damaged texts, manuscripts". Ars Technica. Archived from the original on January 24, 2009. Retrieved December 9, 2008.
  17. ^ a b von Ahn, Luis; Maurer, Ben; McMillen, Colin; Abraham, David; Blum, Manuel (2008). "reCAPTCHA: Human-Based Character Recognition via Web Security Measures". Science. 321 (5895): 1465–1468. Bibcode:2008Sci...321.1465V. CiteSeerX 10.1.1.141.6563. doi:10.1126/science.1160379. PMID 18703711. S2CID 18371056.
  18. ^ "'questionable validity of results if words are presented out of context', Google Groups, August 29, 2008". Archived from the original on April 30, 2011. Retrieved July 10, 2013.
  19. ^ Perez, Sarah (March 29, 2012). "Google Now Using ReCAPTCHA To Decode Street View Addresses". TechCrunch. Archived from the original on August 23, 2012. Retrieved July 10, 2013.
  20. ^ Vega, Edward (May 14, 2021). "Why captchas are getting harder". Vox. Archived from the original on April 15, 2022. Retrieved April 15, 2022.
  21. ^ a b "Cloudflare Dumps Google's ReCAPTCHA Over Privacy Concerns, Costs". PCMag. Archived from the original on July 19, 2020. Retrieved July 18, 2020.
  22. ^ Greenberg, Andy (December 3, 2014). "Google Can Now Tell You're Not a Robot with Just One Click". Wired. Archived from the original on October 2, 2015. Retrieved October 1, 2015.
  23. ^ a b c Schwab, Katharine (June 27, 2019). "Google's new reCAPTCHA has a dark side". Fast Company. Archived from the original on June 28, 2019. Retrieved April 8, 2020.
  24. ^ Amadeo, Ron (March 9, 2017). "Google's reCAPTCHA turns 'invisible,' will separate bots from people without challenges". Ars Technica. Archived from the original on August 6, 2020. Retrieved April 14, 2020.
  25. ^ a b "Google just made the internet a tiny bit less annoying". Popular Science. March 10, 2017. Archived from the original on February 5, 2021. Retrieved April 5, 2017.
  26. ^ "Google reCAPTCHA v1 API Shutting Down in March 2018". ProgrammableWeb. Archived from the original on June 20, 2020. Retrieved April 14, 2020.
  27. ^ "FAQ". reCAPTCHA.net. Archived from the original on July 16, 2012.
  28. ^ "reCAPTCHA: Stop Spam, Read Books". Archived from the original on June 19, 2020. Retrieved January 14, 2014.
  29. ^ "Developer's Guide—reCAPTCHA". Google Inc. Archived from the original on November 24, 2017. Retrieved January 14, 2014.
  30. ^ Greenberg, Andy (June 18, 2010). "Those Scrambled Word Tests For Stopping Spambots Are Tough For Humans Too". Forbes. Archived from the original on September 9, 2017. Retrieved September 10, 2017.
  31. ^ "Strong CAPTCHA Guidelines" (PDF). Archived (PDF) from the original on July 23, 2011. Retrieved January 31, 2011.
  32. ^ "Google's reCAPTCHA busted by new attack". The Register. Archived from the original on August 10, 2017. Retrieved August 10, 2017.
  33. ^ "Google's reCAPTCHA dented". Archived from the original on March 10, 2010. Retrieved January 31, 2011.
  34. ^ "Def Con 18 Speakers". defcon.org. Archived from the original on October 20, 2010. Retrieved November 17, 2010.
  35. ^ "Decoding reCAPTCHA Paper". Chad Houck. Archived from the original on August 19, 2010.
  36. ^ "Decoding reCAPTCHA Power Point". Chad Houck. Archived from the original on October 24, 2010.
  37. ^ a b "Project Stiltwalker". Archived from the original on July 2, 2012. Retrieved May 28, 2012.
  38. ^ Claudia Cruz-Perez; Oleg Starostenko; Fernando Uceda-Ponga; Vicente Alarcon-Aquino; Leobardo Reyes-Cabrera (June 30, 2012). "Breaking reCAPTCHAs with Unpredictable Collapse: Heuristic Character Segmentation and Recognition". In Carrasco-Ochoa, Jesús Ariel; Martínez-Trinidad, José Francisco; Olvera López, José Arturo; Boyer, Kim L (eds.). Pattern Recognition. Lecture Notes in Computer Science. Vol. 7329. México. pp. 155–165. doi:10.1007/978-3-642-31149-9_16. ISBN 978-3-642-31148-2. S2CID 29097170.
  39. ^ "Screen Reader User Survey #4 Results". Archived from the original on December 10, 2017. Retrieved April 19, 2013.
  40. ^ Harris, David L. (January 23, 2015). "Massachusetts woman's lawsuit accuses Google of using free labor to transcribe books, newspapers". Boston Business Journal. Archived from the original on April 28, 2015. Retrieved September 4, 2015.
  41. ^ "No CAPTCHA: yet another ruse devised by Google to extract free digital labor from you". Archived from the original on November 12, 2020. Retrieved December 3, 2020.
  42. ^ Taylor, Chris (February 26, 2024). "Stop giving your website data away!". Prosopo.
  43. ^ "Moving from reCAPTCHA to hCaptcha". The Cloudflare Blog. April 8, 2020. Archived from the original on August 12, 2020. Retrieved July 18, 2020.
  44. ^ "What is CAPTCHA? - G Suite Admin Help". Archived from the original on August 6, 2020. Retrieved May 11, 2020.
  45. ^ "WCAG 1.1: Text Alternatives [Article]". October 6, 2020. Archived from the original on November 26, 2020. Retrieved December 10, 2020.
  46. ^ "ReCaptcha extremly [sic] slow fading · Issue #268 · google/recaptcha". GitHub. Archived from the original on October 14, 2020. Retrieved October 14, 2020.
  47. ^ "Mailhide: Free Spam Protection". Archived from the original on January 2, 2012. Retrieved May 15, 2011.
  48. ^ "Mailhide: Service discontinued". Archived from the original on November 7, 2012. Retrieved March 3, 2019.
