Link rot

From Wikipedia, the free encyclopedia
Figure: A rotten link usually leads to an error message such as "Page Not Found".

Link rot (also called link death, link breaking, or reference rot) is the phenomenon of hyperlinks tending over time to cease to point to their originally targeted file, web page, or server due to that resource being relocated to a new address or becoming permanently unavailable. A link that no longer points to its target, often called a broken, dead, or orphaned link, is a specific form of dangling pointer.

The rate of link rot is a subject of study and research due to its significance to the internet's ability to preserve information. Estimates of that rate vary dramatically between studies. Information professionals have warned that link rot could make important archival data disappear, potentially impacting the legal system and scholarship.

Broken links on a website often redirect the user immediately to the site's home page, which adds to the confusion and makes it difficult to recover the URL of the broken link.

Prevalence

A number of studies have examined the prevalence of link rot within the World Wide Web, in academic literature that uses URLs to cite web content, and within digital libraries.

A 2002 study suggested that link rot within digital libraries is considerably slower than on the web, finding that about 3% of the objects were no longer accessible after one year[1] (equating to a half-life of nearly 23 years).

A 2003 study found that on the Web, about one link out of every 200 broke each week,[2] suggesting a half-life of 138 weeks. This rate was largely confirmed by a 2016–2017 study of links in Yahoo! Directory (which had stopped updating in 2014 after 21 years of development) that found the half-life of the directory's links to be two years.[3]
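The half-life figures quoted in the two studies above can be reproduced from the reported loss rates. The sketch below assumes that links disappear at a constant rate per period (simple exponential decay), which is a modelling assumption rather than something the studies necessarily claim; the function name half_life is illustrative.

```python
import math


def half_life(loss_per_period: float) -> float:
    """Number of periods until half the links remain, assuming a constant loss rate."""
    if not 0.0 < loss_per_period < 1.0:
        raise ValueError("loss_per_period must be strictly between 0 and 1")
    # The surviving fraction after t periods is (1 - loss)^t.
    # Solving (1 - loss)^t = 0.5 for t gives the half-life.
    return math.log(0.5) / math.log(1.0 - loss_per_period)


# Digital-library figure: about 3% of objects lost per year.
print(round(half_life(0.03), 1))     # prints 22.8 (years)

# Web figure: about one link in 200 lost per week.
print(round(half_life(1 / 200), 1))  # prints 138.3 (weeks)
```

Under this assumption the results agree with the "nearly 23 years" and "138 weeks" figures given in the text.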

A 2004 study showed that subsets of Web links (such as those targeting specific file types or those hosted by academic institutions) could have dramatically different half-lives.[4] The URLs selected for publication appear to have greater longevity than the average URL. A 2015 study by Weblock analyzed more than 180,000 links from references in the full-text corpora of three major open access publishers and found a half-life of about 14 years,[5] generally confirming a 2005 study that found that half of the URLs cited in D-Lib Magazine articles were active 10 years after publication.[6] Other studies have found higher rates of link rot in academic literature but typically suggest a half-life of four years or greater.[7][8] A 2013 study in BMC Bioinformatics analyzed nearly 15,000 links in abstracts from Thomson Reuters's Web of Science citation index and found that the median lifespan of web pages was 9.3 years, and just 62% were archived.[9] A 2021 study of external links in New York Times articles published between 1996 and 2019 found a half-life of about 15 years (with significant variance among content topics) but noted that 13% of functional links no longer lead to the original content, a phenomenon called content drift.[10]

A 2013 study found that 49% of links in U.S. Supreme Court opinions are dead.[11]

A 2023 study looking at United States COVID-19 dashboards found that 23% of the state dashboards available in February 2021 were no longer available at the previous URLs in April 2023.[12]

Pew Research found that 38% of web pages that existed in 2013 were no longer accessible in 2023. Also in 2023, 54% of English Wikipedia articles had at least one dead link in the 'references' section and 23% of news articles linked to at least one dead URL.[13]

Causes

Link rot can result from several occurrences. A target web page may be removed. The server that hosts the target page could fail, be removed from service, or relocate to a new domain name. As far back as 1999, it was noted that with the amount of material that can be stored on a hard drive, "a single disk failure could be like the burning of the library at Alexandria."[14] A domain name's registration may lapse or be transferred to another party. Some causes result in the link failing to find any target and returning an error such as HTTP 404. Others lead the link to target content other than what its author intended.

Other reasons for broken links include:

  • the restructuring of websites that causes changes in URLs (e.g. domain.net/pine_tree might be moved to domain.net/tree/pine)
  • relocation of formerly free content to behind a paywall[12]
  • a change in server architecture that results in code such as PHP functioning differently
  • dynamic page content such as search results that changes by design
  • deletion of the target page and/or its content
  • the presence of user-specific information (such as a login name) within the link
  • deliberate blocking by content filters or firewalls
  • the expiration of a domain name registration

Prevention and detection

Strategies for preventing link rot can focus on placing content where its likelihood of persisting is higher, authoring links that are less likely to be broken, taking steps to preserve existing links, or repairing links whose targets have been relocated or removed.[citation needed]

The creation of URLs that will not change with time is the fundamental method of preventing link rot. Preventive planning has been championed by Tim Berners-Lee and other web pioneers.[15]
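When a URL does have to change, for instance in a site restructuring like the domain.net/pine_tree example given under Causes, one widely used way to keep the old address working is a permanent (HTTP 301) redirect. The following is only a minimal sketch of that idea, not a method prescribed by the sources cited here; the MOVED table, the resolve function, and the live_paths parameter are all hypothetical names.

```python
from typing import Optional, Set, Tuple

# Hypothetical table mapping retired paths to their current locations.
MOVED = {
    "/pine_tree": "/tree/pine",
}


def resolve(path: str, live_paths: Set[str]) -> Tuple[int, Optional[str]]:
    """Return (HTTP status code, redirect target or None) for a requested path."""
    if path in live_paths:
        return 200, None       # the page still lives at this address
    target = MOVED.get(path)
    if target is not None:
        return 301, target     # 301 Moved Permanently: old links keep working
    return 404, None           # nothing known about this path


live = {"/tree/pine"}
print(resolve("/pine_tree", live))   # prints (301, '/tree/pine')
print(resolve("/oak_tree", live))    # prints (404, None)
```

Keeping such a table up to date whenever pages move means that links authored against the old structure continue to reach the intended content.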

Strategies pertaining to the authorship of links include:

Strategies pertaining to the protection of existing links include:

The detection of broken links may be done manually or automatically. Automated methods include plug-ins for content management systems as well as standalone broken-link checkers such as Xenu's Link Sleuth. Automatic checking may not detect links that return a soft 404 or links that return a 200 OK response but point to content that has changed.[25]
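The blind spots described above can be narrowed with a few heuristics layered on top of the status code. The sketch below is illustrative only: it takes an already-fetched response (so it performs no network access), and the phrase list SOFT_404_HINTS, the function names, and the result labels are assumptions rather than part of any particular link checker.

```python
import hashlib
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical phrases suggesting an error page that was served with 200 OK.
SOFT_404_HINTS = ("page not found", "no longer available", "does not exist")


def fingerprint(body: str) -> str:
    """Hash of the page text, meant to be stored when the link is first created."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def classify(requested_url: str, final_url: str, status: int, body: str,
             expected_fingerprint: Optional[str] = None) -> str:
    """Label a fetched link as broken, suspicious, or ok."""
    if status >= 400:
        return "broken"  # hard error such as HTTP 404
    requested, final = urlsplit(requested_url), urlsplit(final_url)
    # A deep link that ends at the site root was probably redirected to the home page.
    if requested.path not in ("", "/") and final.path in ("", "/"):
        return "redirected-to-home"
    if any(hint in body.lower() for hint in SOFT_404_HINTS):
        return "soft-404"  # error text delivered with a success status
    if expected_fingerprint is not None and fingerprint(body) != expected_fingerprint:
        return "content-changed"  # 200 OK, but not the text that was originally cited
    return "ok"


url = "http://domain.net/pine_tree"
saved = fingerprint("Pine trees are conifers.")
print(classify(url, url, 200, "Sorry, page not found."))       # prints soft-404
print(classify(url, url, 200, "Something unrelated.", saved))  # prints content-changed
```

An exact hash flags any change at all, including trivial edits, so a practical checker would likely need a more tolerant comparison; the phrase list is similarly crude and would produce both false positives and misses.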

Further reading

  • Markwell, John; Brooks, David W. (2002). "Broken Links: The Ephemeral Nature of Educational WWW Hyperlinks". Journal of Science Education and Technology. 11 (2): 105–108. doi:10.1023/A:1014627511641. S2CID 60802264.
  • Gomes, Daniel; Silva, Mário J. (2006). "Modelling Information Persistence on the Web" (PDF). Proceedings of the 6th International Conference on Web Engineering. ICWE'06. Archived from the original (PDF) on 2011-07-16. Retrieved 14 September 2010.
  • Dellavalle, Robert P.; Hester, Eric J.; Heilig, Lauren F.; Drake, Amanda L.; Kuntzman, Jeff W.; Graber, Marla; Schilling, Lisa M. (2003). "Going, Going, Gone: Lost Internet References". Science. 302 (5646): 787–788. doi:10.1126/science.1088234. PMID 14593153. S2CID 154604929.
  • Koehler, Wallace (1999). "An Analysis of Web Page and Web Site Constancy and Permanence". Journal of the American Society for Information Science. 50 (2): 162–180. doi:10.1002/(SICI)1097-4571(1999)50:2<162::AID-ASI7>3.0.CO;2-B.
  • Sellitto, Carmine (2005). "The impact of impermanent Web-located citations: A study of 123 scholarly conference publications" (PDF). Journal of the American Society for Information Science and Technology. 56 (7): 695–703. CiteSeerX 10.1.1.473.2732. doi:10.1002/asi.20159.

References

  1. Nelson, Michael L.; Allen, B. Danette (2002). "Object Persistence and Availability in Digital Libraries". D-Lib Magazine. 8 (1). doi:10.1045/january2002-nelson. Archived from the original on 2020-07-19. Retrieved 2019-09-24.
  2. Fetterly, Dennis; Manasse, Mark; Najork, Marc; Wiener, Janet (2003). "A large-scale study of the evolution of web pages". Proceedings of the 12th international conference on World Wide Web. Archived from the original on 9 July 2011. Retrieved 14 September 2010.
  3. van der Graaf, Hans. "The half-life of a link is two year". ZOMDir's blog. Archived from the original on 2017-10-17. Retrieved 2019-01-31.
  4. Koehler, Wallace (2004). "A longitudinal study of web pages continued: a consideration of document persistence". Information Research. 9 (2). Archived from the original on 2017-09-11. Retrieved 2019-01-31.
  5. "All-Time Weblock Report". August 2015. Archived from the original on 4 March 2016. Retrieved 12 January 2016.
  6. McCown, Frank; Chan, Sheffan; Nelson, Michael L.; Bollen, Johan (2005). "The Availability and Persistence of Web References in D-Lib Magazine" (PDF). Proceedings of the 5th International Web Archiving Workshop and Digital Preservation (IWAW'05). Archived from the original (PDF) on 2012-07-17. Retrieved 2005-10-12.
  7. Spinellis, Diomidis (2003). "The Decay and Failures of Web References". Communications of the ACM. 46 (1): 71–77. CiteSeerX 10.1.1.12.9599. doi:10.1145/602421.602422. S2CID 17750450. Archived from the original on 2020-07-23. Retrieved 2007-09-29.
  8. Steve Lawrence; David M. Pennock; Gary William Flake; et al. (March 2001). "Persistence of Web References in Scientific Research". Computer. 34 (3): 26–31. CiteSeerX 10.1.1.97.9695. doi:10.1109/2.901164. ISSN 0018-9162. Wikidata Q21012586.
  9. Hennessey, Jason; Xijin Ge, Steven (2013). "A Cross Disciplinary Study of Link Decay and the Effectiveness of Mitigation Techniques". BMC Bioinformatics. 14 (Suppl 14): S5. doi:10.1186/1471-2105-14-S14-S5. PMC 3851533. PMID 24266891.
  10. "What the ephemerality of the Web means for your hyperlinks". Columbia Journalism Review. Archived from the original on 2021-08-02. Retrieved 2021-08-02.
  11. Garber, Megan (2013-09-23). "49% of the Links Cited in Supreme Court Decisions Are Broken". The Atlantic. Retrieved 2024-01-10.
  12. Adams, Aaron M.; Chen, Xiang; Li, Weidong; Chuanrong, Zhang (27 July 2023). "Normalizing the pandemic: exploring the cartographic issues in state government COVID-19 dashboards". Journal of Maps. 19 (5): 1–9. doi:10.1080/17445647.2023.2235385.
  13. Chapekis, Athena; Bestvater, Samuel; Remy, Emma; Rivero, Gonzalo (May 17, 2024). "When Online Content Disappears". Pew Research Center. Retrieved May 19, 2024.
  14. McGranaghan, Matthew (1999). "The Web, Cartography and Trust". Cartographic Perspectives (32): 3–5. doi:10.14714/CP32.624.
  15. Berners-Lee, Tim (1998). "Cool URIs Don't Change". Archived from the original on 2000-03-02. Retrieved 2019-01-31.
  16. Kille, Leighton Walter (8 November 2014). "The Growing Problem of Internet 'Link Rot' and Best Practices for Media and Online Publishers". Journalist's Resource, Harvard Kennedy School. Archived from the original on 12 January 2015. Retrieved 16 January 2015.
  17. Sicilia, Miguel-Angel, et al. "Decentralized Persistent Identifiers: a basic model for immutable handlers". Archived 2023-05-10 at the Wayback Machine. Procedia Computer Science 146 (2019): 123-130.
  18. "Internet Archive: Digital Library of Free Books, Movies, Music & Wayback Machine". 2001-03-10. Archived from the original on 26 January 1997. Retrieved 7 October 2013.
  19. Eysenbach, Gunther; Trudel, Mathieu (2005). "Going, going, still there: Using the WebCite service to permanently archive cited web pages". Journal of Medical Internet Research. 7 (5): e60. doi:10.2196/jmir.7.5.e60. PMC 1550686. PMID 16403724.
  20. Zittrain, Jonathan; Albert, Kendra; Lessig, Lawrence (12 June 2014). "Perma: Scoping and Addressing the Problem of Link and Reference Rot in Legal Citations" (PDF). Legal Information Management. 14 (2): 88–99. doi:10.1017/S1472669614000255. S2CID 232390360. Archived (PDF) from the original on 1 November 2020. Retrieved 10 June 2020.
  21. "Harvard University's Berkman Center Releases Amber, a 'Mutual Aid' Tool for Bloggers & Website Owners to Help Keep the Web Available | Berkman Center". cyber.law.harvard.edu. Archived from the original on 2016-02-02. Retrieved 2016-01-28.
  22. "Arweave - A community-driven ecosystem". arweave.org. Archived from the original on 2023-03-15. Retrieved 2023-03-15.
  23. Rønn-Jensen, Jesper (2007-10-05). "Software Eliminates User Errors And Linkrot". Justaddwater.dk. Archived from the original on 11 October 2007. Retrieved 5 October 2007.
  24. Mueller, John (2007-12-14). "FYI on Google Toolbar's Latest Features". Google Webmaster Central Blog. Archived from the original on 13 September 2008. Retrieved 9 July 2008.
  25. Bar-Yossef, Ziv; Broder, Andrei Z.; Kumar, Ravi; Tomkins, Andrew (2004). "Sic transit gloria telae: towards an understanding of the Web's decay". Proceedings of the 13th international conference on World Wide Web – WWW '04. pp. 328–337. CiteSeerX 10.1.1.1.9406. doi:10.1145/988672.988716. ISBN 978-1581138443.
