GPT-J

Developer(s): EleutherAI
Initial release: June 9, 2021
Type: Large language model
License: Open-source
Website: 6b.eleuther.ai

GPT-J or GPT-J-6B is an open-source large language model (LLM) developed by EleutherAI in 2021.[1] As the name suggests, it is a generative pre-trained transformer model designed to produce human-like text that continues from a prompt. The optional "6B" in the name refers to the fact that it has 6 billion parameters.[2]

Architecture

GPT-J is a GPT-3-like model with 6 billion parameters.[3] Like GPT-3, it is an autoregressive, decoder-only transformer model designed to solve natural language processing (NLP) tasks by predicting how a piece of text will continue.[1]
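A minimal sketch of this autoregressive continuation, using the Hugging Face Transformers library and the GPT-J checkpoint hosted there;[2] the full 6-billion-parameter weights are roughly 24 GB in float32, so a smaller causal language model can be substituted to exercise the same code path.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the published GPT-J checkpoint; substitute a smaller model id here
    # if downloading 24 GB of float32 weights is impractical.
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b")

    prompt = "EleutherAI is a research group that"
    inputs = tokenizer(prompt, return_tensors="pt")

    # Greedy decoding: the model repeatedly predicts the most likely next
    # token and appends it, which is what "autoregressive" means in practice.
    output_ids = model.generate(**inputs, max_new_tokens=40, do_sample=False)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))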

Its architecture differs from GPT-3 in three main ways.[1] The attention and feedforward sublayers of each transformer block are computed in parallel rather than sequentially, which improves training efficiency. It replaces GPT-3's learned absolute position embeddings with rotary position embeddings (RoPE), a method found to match or surpass other ways of injecting positional information into transformers.[4][5] And it uses dense attention in every layer rather than the alternating dense and locally banded sparse attention patterns used in GPT-3.
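The following toy NumPy sketch illustrates the rotary scheme only; it is not EleutherAI's implementation, and in GPT-J the rotation is applied only to a subset of each attention head's query and key dimensions.

    import numpy as np

    def rotary_embed(x, base=10000.0):
        """Rotate each pair of feature dimensions of x by a position-dependent angle.

        x has shape (seq_len, dim) with dim even; row m is the vector at position m.
        """
        seq_len, dim = x.shape
        half = dim // 2
        freqs = base ** (-2.0 * np.arange(half) / dim)          # theta_i
        angles = np.arange(seq_len)[:, None] * freqs[None, :]   # m * theta_i
        cos, sin = np.cos(angles), np.sin(angles)
        x1, x2 = x[:, 0::2], x[:, 1::2]
        out = np.empty_like(x)
        out[:, 0::2] = x1 * cos - x2 * sin
        out[:, 1::2] = x1 * sin + x2 * cos
        return out

    # The useful property: dot products of rotated queries and keys depend only
    # on the relative offset between positions, not on absolute position.
    q = rotary_embed(np.ones((8, 16)))
    k = rotary_embed(np.ones((8, 16)))
    print(np.allclose(q[2] @ k[5], q[4] @ k[7]))  # True: both offsets are 3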

Beyond that, the model has 28 transformer layers and 16 attention heads. Its vocabulary size is 50257 tokens, the same size as GPT-2's.[2] It has a context window size of 2048 tokens.[6]
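These hyperparameters can be checked against the published checkpoint without downloading the full weights, since only the small configuration and tokenizer files are fetched; a hedged sketch, assuming the Transformers GPTJConfig attribute names:

    from transformers import AutoConfig, AutoTokenizer

    config = AutoConfig.from_pretrained("EleutherAI/gpt-j-6b")
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")

    print(config.n_layer)        # transformer layers (28)
    print(config.n_head)         # attention heads (16)
    print(config.n_positions)    # context window in tokens (2048)
    print(tokenizer.vocab_size)  # GPT-2 BPE vocabulary (50257); the checkpoint's
                                 # embedding matrix may be padded slightly past this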

It was trained on the Pile dataset,[2][3] using the Mesh Transformer JAX library in JAX to handle the parallelization scheme.[2][7]
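For illustration only, the sketch below shows JAX's basic SPMD primitive (jax.pmap); it is not the Mesh Transformer JAX sharding scheme, which additionally splits the model's parameters across TPU cores, but it conveys the kind of parallelism the library builds on.

    import jax
    import jax.numpy as jnp

    @jax.pmap  # replicate the computation across all available devices
    def scaled_matmul(x, w):
        return jnp.dot(x, w) * 0.5

    n = jax.local_device_count()
    x = jnp.ones((n, 4, 8))  # leading axis: one shard of activations per device
    w = jnp.ones((n, 8, 8))  # weights replicated on every device
    print(scaled_matmul(x, w).shape)  # (n, 4, 8)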

Performance

GPT-J was designed to generate English text from a prompt. It was not designed for translation, for generating text in other languages, or for use without first fine-tuning the model for a specific task.[2] Nonetheless, GPT-J performs reasonably well even without fine-tuning, including at translation (at least from English to French).[8]
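A sketch of how such translation can be attempted without fine-tuning, via few-shot prompting; the sentence pairs are invented for illustration, and model and tokenizer refer to the objects loaded in the earlier sketch.

    # `model` and `tokenizer` are the objects loaded in the earlier sketch.
    prompt = (
        "English: The cat sleeps on the sofa.\n"
        "French: Le chat dort sur le canapé.\n"
        "English: I would like a cup of coffee.\n"
        "French:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    print(tokenizer.decode(new_tokens).split("\n")[0].strip())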

When neither model is fine-tuned, GPT-J-6B performs almost as well as the 6.7-billion-parameter GPT-3 (Curie) on a variety of tasks.[3] It even outperforms the 175-billion-parameter GPT-3 (Davinci) on code generation tasks.[9] With fine-tuning, it outperforms an untuned GPT-3 (Davinci) on a number of tasks.[1]

Like all LLMs, it is not programmed to give factually accurate information, only to generate text based on probability.[2]
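The sketch below, again reusing the model and tokenizer from the earlier sketch, shows that the model's raw output is a probability distribution over its vocabulary for the next token rather than a verified statement of fact.

    import torch

    # `model` and `tokenizer` are the objects loaded in the earlier sketch.
    inputs = tokenizer("The capital of France is", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]     # scores for the next token only
    probs = torch.softmax(logits, dim=-1)          # turn scores into probabilities
    top = torch.topk(probs, k=5)
    for p, idx in zip(top.values, top.indices):
        print(f"{tokenizer.decode([int(idx)])!r:>12}  {p.item():.3f}")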

Applications

The untuned GPT-J is available on EleutherAI's website,[10] NVIDIA's Triton Inference Server,[11] and NLP Cloud's website.[12] Cerebras[1] and Amazon Web Services[13][14] offer services to fine-tune the GPT-J model for company-specific tasks. Graphcore offers both fine-tuning and hosting services for the untuned GPT-J, as well as hosting for fine-tuned models after they are produced.[15] CoreWeave offers hosting services for both the untuned GPT-J and fine-tuned variants.[16][17]

In March 2023, Databricks released Dolly, an Apache-licensed, instruction-following model created by fine-tuning GPT-J on the Stanford Alpaca dataset.[18] NovelAI's Sigurd[19] and Genji-JP 6B[20] models are both fine-tuned versions of GPT-J. NovelAI also offers further fine-tuning services to produce and host custom models.[21]

EleutherAI has received praise from Cerebras,[1] GPT-3 Demo,[3] NLP Cloud,[12] and Databricks[18] for making the model open-source, and its open-source status is often cited as a major advantage when choosing which model to use.[9][15][22]

References

  1. Vassilieva, Natalia (22 June 2022). "Cerebras Makes It Easy to Harness the Predictive Power of GPT-J". Cerebras. Retrieved 14 June 2023.
  2. "GPT-J 6B". Hugging Face. Retrieved 13 June 2023.
  3. "GPT-J". GPT-3 Demo. Retrieved 13 June 2023.
  4. Biderman, Stella; Black, Sid; Foster, Charles; Gao, Leo; Hallahan, Eric; He, Horace; Wang, Ben; Wang, Phil (20 April 2021). "Rotary Embeddings: A Relative Revolution". EleutherAI. Retrieved 14 June 2023. "In general we have found that across a large suite of setups including regular, linear, and local self-attention, it either matches or surpasses all other methods currently available for injecting positional information into transformers."
  5. Su, Jianlin; Lu, Yu; Pan, Shengfeng; Murtadha, Ahmed; Wen, Bo; Liu, Yunfeng (9 August 2022). "RoFormer: Enhanced Transformer with Rotary Position Embedding". arXiv:2104.09864 [cs.CL].
  6. "GPT-J". GitHub. Hugging Face. Retrieved 23 June 2023.
  7. Wang, Ben; Komatsuzaki, Aran (May 2021). "Mesh Transformer JAX". GitHub. Retrieved 13 June 2023.
  8. Forefront (14 October 2021). "GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront". Medium. Forefront. Retrieved 13 June 2023.
  9. "GPT-J Reviews". Slashdot. Retrieved 23 June 2023.
  10. "Test the EAI models". EleutherAI. 2021. Retrieved 30 June 2023.
  11. Timonin, Denis; Hsueh, Bo Yang; Singal, Dhruv; Nguyen, Vinh (3 August 2022). "Deploying GPT-J and T5 with NVIDIA Triton Inference Server". NVIDIA. Retrieved 30 June 2023.
  12. Vettier, Pauline (16 September 2021). "NLP Cloud now supports GPT-J, the open-source GPT-3 alternative" (Press release). Grenoble, France: NLP Cloud. Retrieved 30 June 2023.
  13. Awrahman, Zmnako; Tsitiridou, Anastasia Pachni; Patel, Dhawalkumar; Huilgol, Rahul; Bains, Roop; Stobieniecka, Wioletta (12 June 2023). "Fine-tune GPT-J using an Amazon SageMaker Hugging Face estimator and the model parallel library". Amazon Web Services. Retrieved 30 June 2023.
  14. Schmid, Philipp (11 January 2022). "Deploy GPT-J 6B for inference using Hugging Face Transformers and Amazon SageMaker". Hugging Face. Retrieved 30 June 2023.
  15. Liguori, Sofia (9 June 2023). "Fine-Tune GPT-J: A Cost-Effective GPT-4 Alternative for Many NLP Tasks". Graphcore. Retrieved 23 June 2023.
  16. "GPT-J-6B". CoreWeave. 23 June 2023. Retrieved 30 June 2023.
  17. Hjelm, Max. "CoreWeave Powers a World of Possibility with GPT-J". CoreWeave. Retrieved 30 June 2023.
  18. Conover, Mike; Hayes, Matt; Mathur, Ankit; Meng, Xiangrui; Xie, Jianwei; Wan, Jun; Ghodsi, Ali; Wendell, Patrick; Zaharia, Matei (24 March 2023). "Hello Dolly: Democratizing the magic of ChatGPT with open models". Databricks. Retrieved 18 June 2023.
  19. NovelAI (9 May 2022). "The faces of NovelAI's AI Models: Part 1". Medium. Retrieved 1 July 2023.
  20. NovelAI (3 November 2021). "Data Efficient Language Transfer with GPT-J". Medium. Retrieved 1 July 2023.
  21. NovelAI (29 July 2021). "Introducing Custom AI Modules". Medium. Retrieved 1 July 2023.
  22. Shiraly, Karthik (26 February 2023). "See GPT-J vs. GPT-3 Go Head-to-Head on Popular Language Tasks". Width.ai. Retrieved 23 June 2023.