GPT-J

Developer(s): EleutherAI
Initial release: June 9, 2021
Type: Large language model
License: Open-source
Website: 6b.eleuther.ai

GPT-J or GPT-J-6B is an open-source large language model (LLM) developed by EleutherAI in 2021.[1] As the name suggests, it is a generative pre-trained transformer model designed to produce human-like text that continues from a prompt. The optional "6B" in the name refers to the fact that it has 6 billion parameters.[2]

Architecture

GPT-J is a GPT-3-like model with 6 billion parameters.[3] Like GPT-3, it is an autoregressive, decoder-only transformer model designed to solve natural language processing (NLP) tasks by predicting how a piece of text will continue.[1]
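A minimal sketch of this autoregressive continuation, using the Hugging Face Transformers library and the GPT-J checkpoint hosted there;[2] the full 6-billion-parameter weights are roughly 24 GB in float32, so a smaller causal language model can be substituted to exercise the same code path.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the published GPT-J checkpoint; substitute a smaller model id here
    # if downloading 24 GB of float32 weights is impractical.
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6b")

    prompt = "EleutherAI is a research group that"
    inputs = tokenizer(prompt, return_tensors="pt")

    # Greedy decoding: the model repeatedly predicts the most likely next
    # token and appends it, which is what "autoregressive" means in practice.
    output_ids = model.generate(**inputs, max_new_tokens=40, do_sample=False)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))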

Its architecture differs from GPT-3 in three main ways.[1] The attention and feedforward sublayers of each transformer block are computed in parallel rather than sequentially, which improves training efficiency. It replaces GPT-3's learned absolute position embeddings with rotary position embeddings (RoPE), a method found to match or surpass other ways of injecting positional information into transformers.[4][5] And it uses dense attention in every layer rather than the alternating dense and locally banded sparse attention patterns used in GPT-3.
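The following toy NumPy sketch illustrates the rotary scheme only; it is not EleutherAI's implementation, and in GPT-J the rotation is applied only to a subset of each attention head's query and key dimensions.

    import numpy as np

    def rotary_embed(x, base=10000.0):
        """Rotate each pair of feature dimensions of x by a position-dependent angle.

        x has shape (seq_len, dim) with dim even; row m is the vector at position m.
        """
        seq_len, dim = x.shape
        half = dim // 2
        freqs = base ** (-2.0 * np.arange(half) / dim)          # theta_i
        angles = np.arange(seq_len)[:, None] * freqs[None, :]   # m * theta_i
        cos, sin = np.cos(angles), np.sin(angles)
        x1, x2 = x[:, 0::2], x[:, 1::2]
        out = np.empty_like(x)
        out[:, 0::2] = x1 * cos - x2 * sin
        out[:, 1::2] = x1 * sin + x2 * cos
        return out

    # The useful property: dot products of rotated queries and keys depend only
    # on the relative offset between positions, not on absolute position.
    q = rotary_embed(np.ones((8, 16)))
    k = rotary_embed(np.ones((8, 16)))
    print(np.allclose(q[2] @ k[5], q[4] @ k[7]))  # True: both offsets are 3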

Beyond that, the model has 28 transformer layers and 16 attention heads. Its vocabulary size is 50257 tokens, the same size as GPT-2's.[2] It has a context window size of 2048 tokens.[6]
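These hyperparameters can be checked against the published checkpoint without downloading the full weights, since only the small configuration and tokenizer files are fetched; a hedged sketch, assuming the Transformers GPTJConfig attribute names:

    from transformers import AutoConfig, AutoTokenizer

    config = AutoConfig.from_pretrained("EleutherAI/gpt-j-6b")
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6b")

    print(config.n_layer)        # transformer layers (28)
    print(config.n_head)         # attention heads (16)
    print(config.n_positions)    # context window in tokens (2048)
    print(tokenizer.vocab_size)  # GPT-2 BPE vocabulary (50257); the checkpoint's
                                 # embedding matrix may be padded slightly past this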

It was trained on the Pile dataset,[2][3] using the Mesh Transformer JAX library in JAX to handle the parallelization scheme.[2][7]
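For illustration only, the sketch below shows JAX's basic SPMD primitive (jax.pmap); it is not the Mesh Transformer JAX sharding scheme, which additionally splits the model's parameters across TPU cores, but it conveys the kind of parallelism the library builds on.

    import jax
    import jax.numpy as jnp

    @jax.pmap  # replicate the computation across all available devices
    def scaled_matmul(x, w):
        return jnp.dot(x, w) * 0.5

    n = jax.local_device_count()
    x = jnp.ones((n, 4, 8))  # leading axis: one shard of activations per device
    w = jnp.ones((n, 8, 8))  # weights replicated on every device
    print(scaled_matmul(x, w).shape)  # (n, 4, 8)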

Performance

GPT-J was designed to generate English text from a prompt. It was not designed for translation, for generating text in other languages, or for use without first fine-tuning the model for a specific task.[2] Nonetheless, GPT-J performs reasonably well even without fine-tuning, including at translation (at least from English to French).[8]
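A sketch of how such translation can be attempted without fine-tuning, via few-shot prompting; the sentence pairs are invented for illustration, and model and tokenizer refer to the objects loaded in the earlier sketch.

    # `model` and `tokenizer` are the objects loaded in the earlier sketch.
    prompt = (
        "English: The cat sleeps on the sofa.\n"
        "French: Le chat dort sur le canapé.\n"
        "English: I would like a cup of coffee.\n"
        "French:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    print(tokenizer.decode(new_tokens).split("\n")[0].strip())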

When neither model is fine-tuned, GPT-J-6B performs almost as well as the 6.7-billion-parameter GPT-3 (Curie) on a variety of tasks.[3] It even outperforms the 175-billion-parameter GPT-3 (Davinci) on code generation tasks.[9] With fine-tuning, it outperforms an untuned GPT-3 (Davinci) on a number of tasks.[1]

Like all LLMs, it is not programmed to give factually accurate information, only to generate text based on probability.[2]
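The sketch below, again reusing the model and tokenizer from the earlier sketch, shows that the model's raw output is a probability distribution over its vocabulary for the next token rather than a verified statement of fact.

    import torch

    # `model` and `tokenizer` are the objects loaded in the earlier sketch.
    inputs = tokenizer("The capital of France is", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]     # scores for the next token only
    probs = torch.softmax(logits, dim=-1)          # turn scores into probabilities
    top = torch.topk(probs, k=5)
    for p, idx in zip(top.values, top.indices):
        print(f"{tokenizer.decode([int(idx)])!r:>12}  {p.item():.3f}")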

Applications

The untuned GPT-J is available on EleutherAI's website,[10] NVIDIA's Triton Inference Server,[11] and NLP Cloud's website.[12] Cerebras[1] and Amazon Web Services[13][14] offer services to fine-tune the GPT-J model for company-specific tasks. Graphcore offers both fine-tuning and hosting services for the untuned GPT-J, as well as hosting for fine-tuned models after they are produced.[15] CoreWeave offers hosting services for both the untuned GPT-J and fine-tuned variants.[16][17]

In March 2023, Databricks released Dolly, an Apache-licensed, instruction-following model created by fine-tuning GPT-J on the Stanford Alpaca dataset.[18] NovelAI's Sigurd[19] and Genji-JP 6B[20] models are both fine-tuned versions of GPT-J. NovelAI also offers further fine-tuning services to produce and host custom models.[21]

EleutherAI has received praise from Cerebras,[1] GPT-3 Demo,[3] NLP Cloud,[12] and Databricks[18] for making the model open-source, and its open-source status is often cited as a major advantage when choosing which model to use.[9][15][22]

References

  1. Vassilieva, Natalia (22 June 2022). "Cerebras Makes It Easy to Harness the Predictive Power of GPT-J". Cerebras. Retrieved 14 June 2023.
  2. "GPT-J 6B". Hugging Face. Retrieved 13 June 2023.
  3. "GPT-J". GPT-3 Demo. Retrieved 13 June 2023.
  4. Biderman, Stella; Black, Sid; Foster, Charles; Gao, Leo; Hallahan, Eric; He, Horace; Wang, Ben; Wang, Phil (20 April 2021). "Rotary Embeddings: A Relative Revolution". EleutherAI. Retrieved 14 June 2023. "In general we have found that across a large suite of setups including regular, linear, and local self-attention, it either matches or surpasses all other methods currently available for injecting positional information into transformers."
  5. Su, Jianlin; Lu, Yu; Pan, Shengfeng; Murtadha, Ahmed; Wen, Bo; Liu, Yunfeng (9 August 2022). "RoFormer: Enhanced Transformer with Rotary Position Embedding". arXiv:2104.09864 [cs.CL].
  6. "GPT-J". GitHub. Hugging Face. Retrieved 23 June 2023.
  7. Wang, Ben; Komatsuzaki, Aran (May 2021). "Mesh Transformer JAX". GitHub. Retrieved 13 June 2023.
  8. Forefront (14 October 2021). "GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront". Medium. Forefront. Retrieved 13 June 2023.
  9. "GPT-J Reviews". Slashdot. Retrieved 23 June 2023.
  10. "Test the EAI models". EleutherAI. 2021. Retrieved 30 June 2023.
  11. Timonin, Denis; Hsueh, Bo Yang; Singal, Dhruv; Nguyen, Vinh (3 August 2022). "Deploying GPT-J and T5 with NVIDIA Triton Inference Server". NVIDIA. Retrieved 30 June 2023.
  12. Vettier, Pauline (16 September 2021). "NLP Cloud now supports GPT-J, the open-source GPT-3 alternative" (Press release). Grenoble, France: NLP Cloud. Retrieved 30 June 2023.
  13. Awrahman, Zmnako; Tsitiridou, Anastasia Pachni; Patel, Dhawalkumar; Huilgol, Rahul; Bains, Roop; Stobieniecka, Wioletta (12 June 2023). "Fine-tune GPT-J using an Amazon SageMaker Hugging Face estimator and the model parallel library". Amazon Web Services. Retrieved 30 June 2023.
  14. Schmid, Philipp (11 January 2022). "Deploy GPT-J 6B for inference using Hugging Face Transformers and Amazon SageMaker". Hugging Face. Retrieved 30 June 2023.
  15. Liguori, Sofia (9 June 2023). "Fine-Tune GPT-J: A Cost-Effective GPT-4 Alternative for Many NLP Tasks". Graphcore. Retrieved 23 June 2023.
  16. "GPT-J-6B". CoreWeave. 23 June 2023. Retrieved 30 June 2023.
  17. Hjelm, Max. "CoreWeave Powers a World of Possibility with GPT-J". CoreWeave. Retrieved 30 June 2023.
  18. Conover, Mike; Hayes, Matt; Mathur, Ankit; Meng, Xiangrui; Xie, Jianwei; Wan, Jun; Ghodsi, Ali; Wendell, Patrick; Zaharia, Matei (24 March 2023). "Hello Dolly: Democratizing the magic of ChatGPT with open models". Databricks. Retrieved 18 June 2023.
  19. NovelAI (9 May 2022). "The faces of NovelAI's AI Models: Part 1". Medium. Retrieved 1 July 2023.
  20. NovelAI (3 November 2021). "Data Efficient Language Transfer with GPT-J". Medium. Retrieved 1 July 2023.
  21. NovelAI (29 July 2021). "Introducing Custom AI Modules". Medium. Retrieved 1 July 2023.
  22. Shiraly, Karthik (26 February 2023). "See GPT-J vs. GPT-3 Go Head-to-Head on Popular Language Tasks". Width.ai. Retrieved 23 June 2023.