GPT-J
Developer(s) | EleutherAI |
---|---|
Initial release | June 9, 2021 |
License | Open-source |
Website | 6b |
GPT-J or GPT-J-6B is an open-source large language model (LLM) developed by EleutherAI in 2021.[1] As the name suggests, it is a generative pre-trained transformer model designed to produce human-like text that continues from a prompt. The optional "6B" in the name refers to the fact that it has 6 billion parameters.[2]
Architecture
GPT-J is a GPT-3-like model with 6 billion parameters.[3] Like GPT-3, it is an autoregressive, decoder-only transformer model designed to solve natural language processing (NLP) tasks by predicting how a piece of text will continue.[1]
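As a rough illustration of what "autoregressive" means here, the following sketch generates text one token at a time, feeding each output back in as context. The lookup table `TOY_SCORES` and the helpers `next_token` and `generate` are hypothetical stand-ins for a real model, which would score the whole vocabulary from the full context rather than from the last token alone.

```python
# Hypothetical stand-in for a language model: maps the most recent token
# to scores for candidate next tokens.
TOY_SCORES = {
    "the": {"cat": 2.0, "mat": 1.0},
    "cat": {"sat": 3.0, "the": 0.5},
    "sat": {"on": 2.5},
    "on": {"the": 2.0},
    "mat": {"<eos>": 1.0},
}


def next_token(context):
    """Pick the highest-scoring continuation of the last token (greedy decoding)."""
    scores = TOY_SCORES.get(context[-1], {"<eos>": 1.0})
    return max(scores, key=scores.get)


def generate(prompt, max_new_tokens=5):
    """Extend the prompt one token at a time, feeding each output back as input."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        token = next_token(tokens)
        if token == "<eos>":
            break
        tokens.append(token)
    return tokens


continuation = generate(["the", "cat"])
print(" ".join(continuation))
```

The loop structure is the point: each new token is chosen from the tokens produced so far, and generation stops at an end marker or a length limit.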
Its architecture differs from GPT-3 in three main ways.[1]
- The attention and feedforward neural network were computed in parallel during training, allowing for greater efficiency.
- The GPT-J model uses rotary position embeddings, which have been found to be a superior method of injecting positional information into transformers.[4][5]
- GPT-J uses dense attention rather than the efficient sparse attention used in GPT-3.
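The first two differences can be sketched in plain Python. This is a minimal illustration under stated assumptions, not GPT-J's actual code: `rotate` pairs adjacent features and uses a base of 10000 (the convention in the RoFormer paper), and the layer functions take toy scalar callables in place of real attention and feedforward sublayers. All function names are hypothetical.

```python
import math


def rotate(vec, position, base=10000.0):
    """Rotate each adjacent pair of features by a position-dependent angle."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sequential_layer(x, norm1, attn, norm2, ffn):
    """Sequential layout: the feedforward input depends on the attention output."""
    h = x + attn(norm1(x))
    return h + ffn(norm2(h))


def parallel_layer(x, norm, attn, ffn):
    """Parallel layout: both sublayers read the same normalized input."""
    n = norm(x)
    return x + attn(n) + ffn(n)


# Rotary embeddings: the query-key score depends only on the position offset.
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]
score_near = dot(rotate(q, 5), rotate(k, 3))
score_far = dot(rotate(q, 105), rotate(k, 103))
print(math.isclose(score_near, score_far, abs_tol=1e-9))

# Toy scalar stand-ins (hypothetical) for normalization, attention, and feedforward.
identity = lambda v: v
toy_attn = lambda v: 0.5 * v
toy_ffn = lambda v: 2.0 * v
seq_out = sequential_layer(1.0, identity, toy_attn, identity, toy_ffn)
par_out = parallel_layer(1.0, identity, toy_attn, toy_ffn)
print(seq_out, par_out)
```

In the parallel layout neither sublayer waits on the other's result, which is what allows the two to be computed at the same time. The rotary check shows why the method encodes relative position: shifting both positions by the same amount leaves the attention score unchanged.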
Beyond that, the model has 28 transformer layers and 16 attention heads. Its vocabulary size is 50257 tokens, the same size as GPT-2's.[2] It has a context window size of 2048 tokens.[6]
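These figures can be checked against the "6 billion" in the name with a back-of-the-envelope count. The sketch below makes several assumptions that are not stated in this article: a hidden width of 4096, a feedforward inner width of four times the hidden width, and separate (untied) input and output embedding matrices. It also ignores biases and normalization weights, so it gives an estimate rather than the exact parameter count.

```python
# Values stated in the article.
N_LAYERS = 28
N_HEADS = 16
VOCAB_SIZE = 50257

# Assumption: the hidden width is not given in the article.
D_MODEL = 4096


def estimate_parameters(n_layers, d_model, vocab_size):
    """Approximate weight count, ignoring biases and layer normalization."""
    attention = 4 * d_model * d_model      # query, key, value, output projections
    feedforward = 8 * d_model * d_model    # two matrices with a 4x inner width
    embeddings = 2 * vocab_size * d_model  # input embedding plus untied output layer
    return n_layers * (attention + feedforward) + embeddings


total = estimate_parameters(N_LAYERS, D_MODEL, VOCAB_SIZE)
head_dim = D_MODEL // N_HEADS
print(f"{total / 1e9:.2f} billion parameters, {head_dim} dimensions per head")
```

Under these assumptions almost all of the weights sit in the 28 transformer layers, with the embeddings contributing only a few percent of the total.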
It was trained on the Pile dataset,[2][3] using the Mesh Transformer JAX library in JAX to handle the parallelization scheme.[2][7]
Performance
GPT-J was designed to generate English text from a prompt. It was not designed to translate or generate text in other languages, nor to be used without first being fine-tuned for a specific task.[2] Nonetheless, GPT-J performs reasonably well without fine-tuning, including in translation (at least from English to French).[8]
When neither is fine-tuned, GPT-J-6B performs almost as well as the 6.7 billion parameter GPT-3 (Curie) on a variety of tasks.[3] It even outperforms the 175 billion parameter GPT-3 (Davinci) on code generation tasks.[9] With fine-tuning, it outperforms an untuned GPT-3 (Davinci) on a number of tasks.[1]
Like all LLMs, it is not programmed to give factually accurate information, only to generate text based on probability.[2]
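A small sketch can make the distinction between "probable" and "accurate" concrete. The candidate tokens and scores below are invented for illustration and do not come from GPT-J; `softmax` and `sample_token` are hypothetical helper names. The example shows a case where the factually wrong continuation happens to carry the highest probability and is therefore drawn most often.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities that sum to 1."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, rng, temperature=1.0):
    """Draw one token at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Invented scores for completing "The capital of Australia is".
tokens = ["Canberra", "Sydney", "Melbourne"]
logits = [1.2, 1.5, 0.3]

probs = softmax(logits)
rng = random.Random(0)  # fixed seed so repeated runs draw the same samples
draws = [sample_token(tokens, logits, rng) for _ in range(1000)]
counts = {t: draws.count(t) for t in tokens}
print(probs)
print(counts)
```

Nothing in the sampling step consults a source of truth: the output is whichever token the probability distribution favors, correct or not.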
Applications
The untuned GPT-J is available on EleutherAI's website,[10] NVIDIA's Triton Inference Server,[11] and NLP Cloud's website.[12] Cerebras[1] and Amazon Web Services[13][14] offer services to fine-tune the GPT-J model for company-specific tasks. Graphcore offers both fine-tuning and hosting services for the untuned GPT-J, as well as offering to host the fine-tuned models after they are produced.[15] CoreWeave offers hosting services for both the untuned GPT-J and fine-tuned variants.[16][17]
In March 2023, Databricks released Dolly, an Apache-licensed, instruction-following model created by fine-tuning GPT-J on the Stanford Alpaca dataset.[18] NovelAI's Sigurd[19] and Genji-JP 6B[20] models are both fine-tuned versions of GPT-J. NovelAI also offers further fine-tuning services to produce and host custom models.[21]
EleutherAI has received praise from Cerebras,[1] GPT-3 Demo,[3] NLP Cloud,[12] and Databricks[18] for making the model open-source, and its open-source status is often cited as a major advantage when choosing which model to use.[9][15][22]
References
1. Vassilieva, Natalia (22 June 2022). "Cerebras Makes It Easy to Harness the Predictive Power of GPT-J". Cerebras. Retrieved 14 June 2023.
2. "GPT-J 6B". Hugging Face. Retrieved 13 June 2023.
3. "GPT-J". GPT-3 Demo. Retrieved 13 June 2023.
4. Biderman, Stella; Black, Sid; Foster, Charles; Gao, Leo; Hallahan, Eric; He, Horace; Wang, Ben; Wang, Phil (20 April 2021). "Rotary Embeddings: A Relative Revolution". EleutherAI. Retrieved 14 June 2023. "In general we have found that across a large suite of setups including regular, linear, and local self-attention, it either matches or surpasses all other methods currently available for injecting positional information into transformers."
5. Su, Jianlin; Lu, Yu; Pan, Shengfeng; Murtadha, Ahmed; Wen, Bo; Liu, Yunfeng (9 August 2022). "RoFormer: Enhanced Transformer with Rotary Position Embedding". arXiv:2104.09864 [cs.CL].
6. "GPT-J". GitHub. Hugging Face. Retrieved 23 June 2023.
7. Wang, Ben; Komatsuzaki, Aran (May 2021). "Mesh Transformer JAX". GitHub. Retrieved 13 June 2023.
8. Forefront (14 October 2021). "GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront". Medium. Forefront. Retrieved 13 June 2023.
9. "GPT-J Reviews". Slashdot. Retrieved 23 June 2023.
10. "Test the EAI models". EleutherAI. 2021. Retrieved 30 June 2023.
11. Timonin, Denis; Hsueh, Bo Yang; Singal, Dhruv; Nguyen, Vinh (3 August 2022). "Deploying GPT-J and T5 with NVIDIA Triton Inference Server". NVIDIA. Retrieved 30 June 2023.
12. Vettier, Pauline (16 September 2021). "NLP Cloud now supports GPT-J, the open-source GPT-3 alternative" (Press release). Grenoble, France: NLP Cloud. Retrieved 30 June 2023.
13. Awrahman, Zmnako; Tsitiridou, Anastasia Pachni; Patel, Dhawalkumar; Huilgol, Rahul; Bains, Roop; Stobieniecka, Wioletta (12 June 2023). "Fine-tune GPT-J using an Amazon SageMaker Hugging Face estimator and the model parallel library". Amazon Web Services. Retrieved 30 June 2023.
14. Schmid, Philipp (11 January 2022). "Deploy GPT-J 6B for inference using Hugging Face Transformers and Amazon SageMaker". Hugging Face. Retrieved 30 June 2023.
15. Liguori, Sofia (9 June 2023). "Fine-Tune GPT-J: A Cost-Effective GPT-4 Alternative for Many NLP Tasks". Graphcore. Retrieved 23 June 2023.
16. "GPT-J-6B". CoreWeave. 23 June 2023. Retrieved 30 June 2023.
17. Hjelm, Max. "CoreWeave Powers a World of Possibility with GPT-J". CoreWeave. Retrieved 30 June 2023.
18. Conover, Mike; Hayes, Matt; Mathur, Ankit; Meng, Xiangrui; Xie, Jianwei; Wan, Jun; Ghodsi, Ali; Wendell, Patrick; Zaharia, Matei (24 March 2023). "Hello Dolly: Democratizing the magic of ChatGPT with open models". Databricks. Retrieved 18 June 2023.
19. NovelAI (9 May 2022). "The faces of NovelAI's AI Models: Part 1". Medium. Retrieved 1 July 2023.
20. NovelAI (3 November 2021). "Data Efficient Language Transfer with GPT-J". Medium. Retrieved 1 July 2023.
21. NovelAI (29 July 2021). "Introducing Custom AI Modules". Medium. Retrieved 1 July 2023.
22. Shiraly, Karthik (26 February 2023). "See GPT-J vs. GPT-3 Go Head-to-Head on Popular Language Tasks". Width.ai. Retrieved 23 June 2023.