Wikipedia:OABOT
OAbot is a tool that makes it easy to edit articles so that academic citations link to open access publications (see list of edits made).
Wikipedia links to hundreds of thousands of paywalled sources. Our community does not prohibit or even discourage citing paywalled sources, but at the same time there is absolutely no prohibition on surfacing open access (OA) versions right alongside those citations, as long as the link does not violate any copyrights. Indeed, a good citation will have as much information as possible to let the reader find (and use) it in the way that is easiest for them.
Bot
Workflow
The bot looks for CS1 citation templates, and for each of them:
- parses the citation using wikiciteparser
- queries the Dissemin API and Unsub with the metadata it has extracted
- translates the pdf_url it returns to a parameter of the citation (|arxiv=, |pmc=, |doi= or |url= as a fallback)
- if there is no such parameter in the template, and if no link is already free to read, it adds it to the template.
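The translation step in the workflow above can be sketched as follows. This is a minimal illustration, not the bot's actual code: the function name `url_to_parameter` and the URL patterns are assumptions made for this example, and the real rules may recognise more hosts and URL shapes.

```python
import re

# Hypothetical URL patterns, checked in order. The bot's real rules may differ.
IDENTIFIER_PATTERNS = [
    ("arxiv", re.compile(r"^https?://arxiv\.org/(?:abs|pdf)/(?P<id>.+?)(?:\.pdf)?$")),
    ("pmc", re.compile(r"^https?://www\.ncbi\.nlm\.nih\.gov/pmc/articles/PMC(?P<id>\d+)")),
    ("doi", re.compile(r"^https?://(?:dx\.)?doi\.org/(?P<id>10\..+)$")),
]


def url_to_parameter(pdf_url):
    """Map a free-to-read URL to a (parameter name, value) pair.

    A specific identifier parameter is preferred; |url= is the fallback.
    """
    for name, pattern in IDENTIFIER_PATTERNS:
        match = pattern.match(pdf_url)
        if match:
            return name, match.group("id")
    return "url", pdf_url


print(url_to_parameter("https://arxiv.org/pdf/0908.3347.pdf"))
print(url_to_parameter("https://example.org/files/paper.pdf"))
```

The first call yields an |arxiv= value, while the second falls back to |url= because no identifier pattern matches.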
Examples
- Adding a free to read |url=:
  - Before: Groussard, M.; Rauchs, G.; Landeau, B.; Viader, F.; Desgranges, B.; Eustache, F.; Platel, H. (2010). "The neural substrates of musical memory revealed by fMRI and two semantic tasks". NeuroImage. 53 (4): 1301–1309. doi:10.1016/j.neuroimage.2010.07.013. PMID 20627131. S2CID 8955075.
  - After: Groussard, M.; Rauchs, G.; Landeau, B.; Viader, F.; Desgranges, B.; Eustache, F.; Platel, H. (2010). "The neural substrates of musical memory revealed by fMRI and two semantic tasks" (PDF). NeuroImage. 53 (4): 1301–1309. doi:10.1016/j.neuroimage.2010.07.013. PMID 20627131. S2CID 8955075.
- Adding a |citeseerx=:
  - Before: Selinger, Peter (2011). "A survey of graphical languages for monoidal categories". New Structures for Physics. Lecture Notes in Physics. Vol. 813. Springer. pp. 289–233.
  - After: Selinger, Peter (2011). "A survey of graphical languages for monoidal categories". New Structures for Physics. Lecture Notes in Physics. Vol. 813. Springer. pp. 289–233. CiteSeerX 10.1.1.216.4918.
- Signalling openness of an existing DOI with |doi-access=free:
  - Before: Lambek, Joachim (1972). "Bicommutators of nice injectives". Journal of Algebra. 21: 60–73. doi:10.1016/0021-8693(72)90034-8. ISSN 0021-8693. MR 0301052.
  - After: Lambek, Joachim (1972). "Bicommutators of nice injectives". Journal of Algebra. 21: 60–73. doi:10.1016/0021-8693(72)90034-8. ISSN 0021-8693. MR 0301052.
Code
You are very welcome to contribute to the code (for instance by pull requests on GitHub) and join the development team on wmflabs. You can request access to the Tools project.
If you want to make suggestions or report bugs, please add a task to the Phabricator project.
Questions
How does the bot work?
OABOT extracts the citations from an article and searches various indexes, APIs, and repositories for free-to-read versions of non-OA articles. OABOT can use the Dissemin backend to find these versions from sources like CrossRef, BASE, DOAI and SHERPA/RoMEO. When it finds an alternative version, it checks whether that version is already in the citation. If it is not, it adds a free-to-read link to the citation. This helps readers access the full text.
What kind of links does the bot add?
The bot adds a link with one of the following parameters:
|arxiv=
|hdl=
|doi=
|pmc=
|citeseerx=
|url=
The bot only uses |url= if none of the other more specific parameters is known or applicable.
The bot only adds a parameter if that parameter was previously empty (so the bot does not erase any information from the templates).
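The no-overwrite rule can be illustrated with a short sketch. It assumes, purely for illustration, that a citation template is represented as a plain dictionary of parameter names to values; `add_if_empty` is a hypothetical helper, not a function from the bot's code base.

```python
def add_if_empty(template_params, name, value):
    """Set a citation parameter only when it is missing or blank.

    Returns True if the parameter was added, False if existing
    content was left untouched.
    """
    existing = template_params.get(name, "").strip()
    if existing:
        return False
    template_params[name] = value
    return True


# Illustrative citation, represented as a plain dict (an assumption of this sketch).
citation = {"title": "A survey of graphical languages for monoidal categories", "doi": "10.1000/example"}
print(add_if_empty(citation, "citeseerx", "10.1.1.216.4918"))  # parameter was absent, so it is added
print(add_if_empty(citation, "doi", "10.1000/other"))  # existing |doi= is kept as it was
```

Returning a boolean makes it easy for a caller to tell whether an edit is actually needed before saving the page.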
What kinds of links won't the bot add?
- The bot won't add a link to a version not in CrossRef, BASE, DOAI, or SHERPA/RoMEO (it's not an open-web search for any version or PDF; it only draws from curated sources).
- The bot won't add a link to an alternative version of a source that is already signaled as free to read (that is, if the free-to-read lock icon appears in the rendered source).
- The bot won't generally replace an existing |url= with a different one, or add a second |url=.
- The bot will ignore sources in free form: it only considers citation templates.
- The bot will try not to add redundant links, such as links to publisher versions already linked through a DOI.
What repositories is the bot querying and pulling from?
The bot currently queries:
- Dissemin. Dissemin relies on several sources, including Zenodo, ORCID, and BASE; see https://dev.dissem.in/datasources.html
- Unpaywall (formerly OAdoi). Unpaywall crawls the sources listed at https://api.unpaywall.org/data/sources.csv
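As an illustration of what such a query can look like, here is a minimal sketch against Unpaywall's public REST API (version 2), which looks up a work by DOI and requires a contact email. The helper names and the sample record are assumptions made for this example; how the bot itself calls Unpaywall is not described on this page.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

UNPAYWALL_API = "https://api.unpaywall.org/v2/"


def unpaywall_url(doi, email):
    """Build the lookup URL for one DOI (the email parameter is required by Unpaywall)."""
    return UNPAYWALL_API + quote(doi, safe="/") + "?" + urlencode({"email": email})


def fetch_record(doi, email):
    """Fetch the JSON record for a DOI. Needs network access; not called below."""
    with urlopen(unpaywall_url(doi, email)) as response:
        return json.load(response)


def best_pdf_url(record):
    """Return the best free-to-read link in a record, or None if there is none."""
    location = record.get("best_oa_location") or {}
    return location.get("url_for_pdf") or location.get("url")


# Made-up record containing only the fields used in this sketch.
sample_record = {
    "is_oa": True,
    "oa_status": "green",
    "best_oa_location": {"url_for_pdf": "https://example.org/paper.pdf", "url": "https://example.org/record"},
}
print(unpaywall_url("10.1016/j.neuroimage.2010.07.013", "you@example.org"))
print(best_pdf_url(sample_record))
```

Keeping URL construction and response parsing in separate functions lets both be checked without touching the network.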
In the future we could add Internet Archive Scholar (or any others, like CORE, SHARE Notify, Handle.net, MLA CORE, CHORUS), once their indexes provide additional benefit and have a workable API.
What's the copyright status of the proposed links?
The bot adds links to gratis copies offered by repositories and publishers under a variety of licenses: some are not freely licensed or don't have a public license at all, for example bronze open access copies by publishers or some archival copies. Publishers and repositories obtain the right to do so in a variety of ways.
Our sources (listed above) only link reputable archives and open access repositories, typically run by libraries or research institutions, which are not known to violate copyright law. For example, under European Union copyright law, which is more restrictive than the copyright law of the United States, a secondary publication right or other copyright limitation exists in various countries (including Belgium, France, Germany, the Netherlands, Slovenia, Bulgaria), allowing repositories to obtain and provide a license. Such jurisdictions also tend to host the bigger repositories (like HAL).
However, mistakes are always possible. If you know or reasonably suspect a publisher or repository to have provided a work in error, do not link it.
Finally, publishers don't always endorse the existing laws of all countries, and may profess to have the right to prevent such archival efforts. You can learn the publisher's opinion from any copyright statements available at the DOI's location and from the SHERPA/RoMEO summary of each journal's policy.
For additional information see also:
Why did the bot not add this identifier?
OABOT tries to perform the minimum changes required to make a citation open access.
The identifier you have in mind may not be known to provide an open access copy, or it may be one of many identifiers not currently supported. Alternatively, another identifier is present which already auto-links the title and guarantees the open access status of the work (most commonly it's PubMed Central).
Why did the bot remove a doi-access parameter?
The work is now considered closed access at Unpaywall, so we're no longer sure that the DOI actually provides a full text PDF. Usually this happens for bronze open access (gratis, non-libre) works, such as works temporarily made accessible at the height of the COVID pandemic.
The status of works with a free Creative Commons license or hosted by an open access repository tends to be more durable.
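The removal described above can be sketched in a few lines. This is an illustration under stated assumptions, not the bot's implementation: citations are modelled as plain dictionaries, `refresh_doi_access` is a hypothetical helper, and `oa_status` is taken to be the status string reported by Unpaywall (for example "gold", "bronze", "green" or "closed").

```python
def refresh_doi_access(params, oa_status):
    """Return citation parameters with |doi-access=free dropped for closed works.

    The input dict is not modified; a copy is returned when a change is needed.
    """
    if oa_status == "closed" and params.get("doi-access") == "free":
        updated = dict(params)
        del updated["doi-access"]
        return updated
    return params


citation = {"doi": "10.1000/example", "doi-access": "free"}
print(refresh_doi_access(citation, "closed"))  # the doi-access flag is dropped
print(refresh_doi_access(citation, "bronze"))  # still open, so nothing changes
```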
How do I stop the bot from removing a link?
As discussed above, the bot tries to avoid touching citations which already clearly provide an open access copy.
The best way to ensure a citation keeps linking your preferred copy is to add a direct link to an archived PDF or an open access repository identifier. For example, if you provide a PubMed Central identifier, {{cite journal}} will keep linking the PMC copy, which is often a publisher-provided copy of the published version, even if the doi-access parameter changes.
A publisher-provided copy can be linked more permanently by adding the URL of an Internet Archive preserved version, which can often be found through https://fatcat.wiki search or identifier lookup (or even a Google Scholar search): see example edit. If no archived copy is available, but the publisher provides a Creative Commons licensed copy, you can manually download that and archive it on Zenodo (Dissemin can be used for this; if you upload directly to Zenodo, don't forget to use the publisher's DOI, otherwise Unpaywall won't match the copy), and link the Zenodo copy in the URL parameter.
Why did the bot remove an URL?
The URL may be redundant with an identifier parameter (for example the DOI) or may need to be removed in order to provide the best known open access copy.
Many existing URLs need to be removed in order to be able to follow the recommendations for Convenience links and Access indicators for url-holding parameters. In hundreds of thousands of cases a redundant and paywalled URL has been added to {{cite journal}} due to a bug in VisualEditor/Citoid (T232771) and not a conscious choice by the person who added the citation.
In other cases, the URL may have changed, for example because an open repository changed URL structure (and we're unable to use handle.net identifiers for it) or because the canonical location changed (for example, a copy preserved by the Internet Archive may be reachable from multiple URLs under web.archive.org, archive.org or scholar.archive.org, as well as partnering libraries like biodiversitylibrary.org).
Why does the oabot tool make edits the bot doesn't?
The oabot tool allows users to perform edits which are not yet allowed for User:OAbot to run automatically, such as certain link removals or additions.
I am a publisher. How do I make sure OAbot recognizes my full texts?
You should make sure that
- You comply with the Google Scholar guidelines for exposing your full texts. In particular, the landing page for articles that are free to read should contain the meta tag citation_pdf_url with a direct link to a PDF file.
- Zotero is able to import metadata and the full text from any landing page. This should be straightforward if you comply with Google Scholar's guidelines. Otherwise, you can fix the Zotero translator yourself by submitting a pull request to Zotero.
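To check whether a landing page exposes the citation_pdf_url meta tag mentioned above, a small script along these lines can help. It is a sketch using only the Python standard library; the class and function names, and the sample HTML, are made up for this example and are unrelated to how Google Scholar, Zotero or the bot read the page.

```python
from html.parser import HTMLParser


class PdfUrlFinder(HTMLParser):
    """Collect the content of every <meta name="citation_pdf_url"> tag."""

    def __init__(self):
        super().__init__()
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("name") == "citation_pdf_url" and attributes.get("content"):
            self.pdf_urls.append(attributes["content"])


def find_pdf_urls(html):
    """Return the PDF links advertised in the page's meta tags."""
    finder = PdfUrlFinder()
    finder.feed(html)
    return finder.pdf_urls


# Illustrative landing page (made-up URLs).
landing_page = """
<html><head>
<meta name="citation_title" content="An example article">
<meta name="citation_pdf_url" content="https://example.org/articles/1.pdf">
</head><body></body></html>
"""
print(find_pdf_urls(landing_page))
```

An empty list means the page does not advertise a direct PDF link in the form the guidelines ask for.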
In addition, it is also useful if you make sure that
- All your fully open-access journals are registered in DOAJ.
- The CrossRef metadata includes the correct license for each article: it should be straightforward to tell whether the article is free to read simply by looking at this piece of information.
Once you comply with these guidelines, the bot should mark your DOIs as free to read in Wikipedia, with a green lock:
- Lambek, Joachim (1972), "Bicommutators of nice injectives", Journal of Algebra, 21: 60–73, doi:10.1016/0021-8693(72)90034-8, ISSN 0021-8693, MR 0301052
I run a repository. How do I make sure OAbot can add links to my repository?
- Get a valid OAI-PMH interface which should be harvested by BASE
- Comply with the Google Scholar guidelines for exposing your full texts. In particular, the landing page for articles that are free to read should contain the meta tag citation_pdf_url with a direct link to a PDF file.
- Zotero should be able to retrieve metadata and the full text from any landing page. This should be straightforward if you comply with Google Scholar's guidelines. Otherwise, you can fix the Zotero translator yourself by submitting a pull request to Zotero.
I am a researcher. How do I make sure OAbot finds full texts for my papers?
Make sure all your papers are deposited in a mature repository (that complies with the guidelines above) such as Zenodo. You can use http://dissem.in/ for that. Other large repositories such as PubMed Central, arXiv or HAL will work too. The repository should give free access to the full text (not just the abstract). Records with ongoing embargoes are not considered.
Full texts stored on personal homepages will generally not be considered.
How many links should the bot add?
The bot only adds one link, even if it finds multiple alternative versions. For example, if OABOT finds a preprint on arXiv, a post-print on a university repository, and a PDF on the author's website, then it chooses only one, based on a ranking algorithm in Dissemin.
What does the citation look like?
When the URL parameter is changed, the citation doesn't have any additional text or graphical elements, just an additional link.
Can we signal the version type (preprint, postprint, published version)?
At the moment, no. For most repositories this metadata just doesn't exist or isn't well-curated.
How can the bot be localized/globalized to work on any wiki?
The bot can function on any wiki, but it is limited by whether that wiki uses the CS1 citation templates, and uses them in the same way.
Edge cases for future development
OABot will find situations where there is already a URL present which is not open, but the bot can locate a free-to-read version. In some cases we can add the secondary link as an identifier, but there are edge cases we need consensus on where the bot behavior is undetermined:
- When the |url= matches an existing identifier:
  - Say we have |doi=10.1004/1543 and |url=http://doi.org/10.1004/1543. Can we overwrite |url= to put a free-to-read repository there?
- When we can't match the |url= with an existing identifier but OABot finds a repository version:
  - For instance if we find |url=http:// sciencedirect /science/article/pii/S1535610816303981, we won't overwrite |url= automatically, but we would like to add the free repository URL somewhere else. If the free URLs we want to add stem from few repositories, is it appropriate to create templates for these specific repositories, and add them as |id={{my repository|12345}}?
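Detecting the first edge case, a |url= that merely repeats the |doi=, can be sketched as below. The function name and the list of resolver hosts are assumptions made for this illustration; whether such a URL may then be overwritten is exactly the open question above, and the code does not settle it.

```python
from urllib.parse import unquote, urlparse

# Hosts treated as DOI resolvers in this sketch (an assumption, not an exhaustive list).
DOI_RESOLVERS = {"doi.org", "dx.doi.org", "www.doi.org"}


def url_matches_doi(url, doi):
    """Return True when the URL is just a resolver link for the given DOI."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in DOI_RESOLVERS:
        return False
    # DOIs are case-insensitive, so compare in lower case.
    return unquote(parsed.path).lstrip("/").lower() == doi.strip().lower()


print(url_matches_doi("http://doi.org/10.1004/1543", "10.1004/1543"))
print(url_matches_doi("https://example.org/article/1543", "10.1004/1543"))
```

The first call reports a match, which corresponds to the redundant |url= in the example above; the second URL points elsewhere and falls under the second edge case.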
Next steps
- Localize/Globalize bot on translatewiki.net
- Integrate with Wikidata
- Use a citation parser to format references without templates
Resources
- Pywikibot framework documentation
- wikiciteparser, a Python parser for citation templates based on mwparserfromhell
- Wikicite – a way to record and track clusters of related articles
See also:
People
- Ocaasi (WMF), Jake Orlowitz, founder of The Wikipedia Library
- Pintoch (talk), from the Dissemin project, main developer
- symac, coordinator for French speaking TWL and pywikibot-owner
- Andrew Su
- James Webber
- a3nm
- Sckott
- Christian Pietsch, responsible for APIs at BASE (search engine)