OpenAI Codex

OpenAI Codex is an artificial intelligence model developed by OpenAI. It parses natural language and generates code in response. It powers GitHub Copilot, a programming autocompletion tool for select IDEs, like Visual Studio Code and Neovim.^[1] Codex is a descendant of OpenAI's GPT-3 model, fine-tuned for use in programming applications.

OpenAI released an API for Codex in closed beta.^[1] In March 2023, OpenAI shut down access to Codex.^[2] Due to public appeals from researchers, OpenAI reversed course.^[3] The Codex model can still be used by researchers in the OpenAI Researcher Access Program.^[4]

Capabilities

Based on GPT-3, a neural network trained on text, Codex was additionally trained on 159 gigabytes of Python code from 54 million GitHub repositories.^[5]^[6] A typical use case of Codex is for a user to type a comment, such as "//compute the moving average of an array for a given window size", then use the AI to suggest a block of code that satisfies that comment prompt.^[7] OpenAI stated that Codex can complete approximately 37% of requests and is meant to make human programming faster rather than to replace it. According to OpenAI's blog, Codex excels most at "mapping... simple problems to existing code", which they describe as "probably the least fun part of programming".^[8]^[9] Jeremy Howard, co-founder of Fast.ai, stated that "Codex is a way of getting code written without having to write as much code", and that "it is not always correct, but it is just close enough".^[10] According to a paper written by OpenAI researchers, when Codex attempted each test case 100 times, it generated working solutions for 70.2% of prompts.^[11]
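As an illustration of the comment-prompt workflow described above, the following is a hand-written sketch of the kind of Python function such a comment might be expected to yield. It is not actual Codex output; the function name `moving_average` and its behavior on edge cases (an empty result when the window is longer than the input) are assumptions made for this example.

```python
from typing import List


# compute the moving average of an array for a given window size
def moving_average(values: List[float], window: int) -> List[float]:
    """Return the mean of each consecutive `window`-sized slice of `values`."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    if window > len(values):
        # Assumption for this sketch: no full window fits, so return nothing.
        return []

    # Keep a running sum so each step adds one value and drops one value
    # instead of re-summing the whole window.
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages
```

A call such as `moving_average([1, 2, 3, 4, 5], 2)` would average the pairs (1, 2), (2, 3), (3, 4), and (4, 5). A suggestion of this kind would still need review by the programmer, consistent with the stated aim of speeding up human programming rather than replacing it.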

OpenAI claims that Codex can create code in over a dozen programming languages, including Go, JavaScript, Perl, PHP, Ruby, Shell, Swift, and TypeScript, though it is most effective in Python.^[1] According to VentureBeat, demonstrations uploaded by OpenAI showed impressive coreference resolution capabilities. The demonstrators were able to create a browser game in JavaScript and generate data science charts using matplotlib.^[9]

Codex was built specifically to generate code in response to natural language commands. It works with a large number of programming languages and libraries, and it can provide code completions, parse natural language queries, and help with debugging.^[12]

OpenAI showed that Codex can interface with services and apps such as Mailchimp, Microsoft Word, Spotify, and Google Calendar.^[9]^[13] Microsoft is reportedly interested in exploring Codex's capabilities.^[13]

Issues

OpenAI demonstrations showcased flaws such as inefficient code and one-off quirks in code samples.^[9] In an interview with The Verge, OpenAI chief technology officer Greg Brockman said that "sometimes [Codex] doesn't quite know exactly what you're asking" and that it can require some trial and error.^[13] OpenAI researchers found that Codex struggles with multi-step and higher-level prompts, often failing or yielding counter-intuitive behavior. Additionally, they brought up several safety issues, such as over-reliance by novice programmers, biases based on the training data, and security impacts due to vulnerable code.^[11]

VentureBeat stated that because Codex is trained on public data, it could be vulnerable to "data poisoning" via intentional uploads of malicious code.^[9] According to a study by researchers from New York University, approximately 40% of code generated by GitHub Copilot (which uses Codex) in scenarios relevant to high-risk CWEs included glitches or other exploitable design flaws.^[14]
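To make the notion of an exploitable flaw concrete, the sketch below contrasts a vulnerable and a safer database lookup in Python. It is a hypothetical illustration and is not taken from the New York University study or from Copilot output; SQL injection (CWE-89) is used only as a familiar weakness class, and the table layout and the function names `find_user_unsafe`, `find_user_safe`, and `make_demo_db` are assumptions made for this example.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: user input is spliced directly into the SQL text,
    # so quote characters in `name` can change the query's meaning.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Safer: the value is bound as a parameter and never parsed as SQL.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical in-memory table used only to exercise the two lookups.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)",
                     [("alice",), ("bob",)])
    return conn
```

With an input such as `x' OR '1'='1`, the unsafe version matches every row in the table, while the parameterized version treats the whole string as a literal name and matches none. Both versions look plausible at a glance, which is why generated code of this kind calls for review.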

Copyright

The Free Software Foundation expressed concerns that code snippets generated by Copilot and Codex could violate copyright, in particular the condition of the GPL that requires derivative works to be licensed under equivalent terms.^[15] Issues they raised include whether training on public repositories falls into fair use or not, how developers could discover infringing generated code, whether trained machine learning models could be considered modifiable source code or a compilation of the training data, and if machine learning models could themselves be copyrighted and by whom.^[15]^[16] An internal GitHub study found that approximately 0.1% of generated code contained direct copies from the training data. In one example the model output the training data code implementing the fast inverse square root algorithm, including comments and an incorrect copyright notice.^[7]

In response, OpenAI stated that "legal uncertainty on the copyright implications of training AI systems imposes substantial costs on AI developers and so should be authoritatively resolved."^[7]

The copyright issues with Codex have been compared to the Authors Guild, Inc. v. Google, Inc. court case, in which judges ruled that Google Books's use of text snippets from millions of scanned books constituted fair use.^[7]^[17] However, a text snippet from a book provides a reliable reference to the copyright owner, whereas the final output of a model trained on compiled works is produced without any such reference.

References

[1] Zaremba, Wojciech (August 10, 2021). "OpenAI Codex". OpenAI. Archived from the original on 2023-02-03. Retrieved 2021-09-03.
[2] Kemper, Jonathan (2023-03-22). "OpenAI kills its Codex code model, recommends GPT3.5 instead". THE DECODER. Archived from the original on 2023-06-01. Retrieved 2023-03-29.
[3] Logan Kilpatrick [@OfficialLoganK] (March 22, 2023). "Hey Carolyn, we will continue to support Codex access via our Researcher Access Program. Sorry for any confusion and hopefully the research is going well!" (Tweet). Retrieved 2023-04-08 – via Twitter.
[4] "Researcher Access Program application". openai.com. Archived from the original on 2023-10-10. Retrieved 2023-04-08.
[5] Wiggers, Kyle (July 8, 2021). "OpenAI warns AI behind GitHub's Copilot may be susceptible to bias". VentureBeat. Archived from the original on 2023-02-03. Retrieved 2021-09-03.
[6] Alford, Anthony (August 31, 2021). "OpenAI Announces 12 Billion Parameter Code-Generation AI Codex". InfoQ. Archived from the original on 2022-07-09. Retrieved 2021-09-03.
[7] Anderson, Tim; Quach, Katyanna (July 6, 2021). "GitHub Copilot auto-coder snags emerge, from seemingly spilled secrets to bad code, but some love it". The Register. Archived from the original on 2023-06-02. Retrieved 2021-09-04.
[8] Dorrier, Jason (August 15, 2021). "OpenAI's Codex Translates Everyday Language Into Computer Code". SingularityHub. Archived from the original on 2023-05-26. Retrieved 2021-09-03.
[9] Dickson, Ben (August 16, 2021). "What to expect from OpenAI's Codex API". VentureBeat. Archived from the original on 2023-02-03. Retrieved 2021-09-03.
[10] Metz, Cade (September 9, 2021). "A.I. Can Now Write Its Own Computer Code. That's Good News for Humans". The New York Times. Archived from the original on 2022-03-30. Retrieved 2021-09-16.
[11] Chen, Mark; Tworek, Jerry; Jun, Heewoo; Yuan, Qiming; Pinto, Henrique Ponde de Oliveira; Kaplan, Jared; Edwards, Harri; Burda, Yuri; Joseph, Nicholas; Brockman, Greg; Ray, Alex (2021-07-14). "Evaluating Large Language Models Trained on Code". arXiv:2107.03374 [cs].
[12] "Best AI Headshot Generators". Retrieved 2024-03-12.
[13] Vincent, James (August 10, 2021). "OpenAI can translate English into code with its new machine learning software Codex". The Verge. Archived from the original on 2021-09-02. Retrieved 2021-09-03.
[14] Pearce, Hammond; Ahmad, Baleegh; Tan, Benjamin; Dolan-Gavitt, Brendan; Karri, Ramesh (2021-12-16). "Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions". arXiv:2108.09293 [cs.CR].
[15] Krill, Paul (August 2, 2021). "GitHub Copilot is 'unacceptable and unjust,' says Free Software Foundation". InfoWorld. Archived from the original on 2021-09-03. Retrieved 2021-09-03.
[16] Robertson, Donald (2021-07-28). "FSF-funded call for white papers on philosophical and legal questions around Copilot: Submit before Monday, August 23, 2021". Free Software Foundation. Archived from the original on 2021-08-11. Retrieved 2021-09-04.
[17] Barber, Gregory (July 12, 2021). "GitHub's Commercial AI Tool Was Built From Open Source Code". WIRED. Archived from the original on 2021-07-25. Retrieved 2021-09-04.

