Generative pre-trained transformer

family of language models


[Figure: The original GPT model]

Generative pre-trained transformers (GPT) are a family of language models generally trained on a large corpus of text to generate human-like text. They are built from several stacked blocks of the transformer architecture and can be fine-tuned for natural language processing tasks such as text generation, language translation, and text classification. The "pre-training" in the name refers to an initial training stage on a large text corpus in which the model learns to predict the next word in a passage; this gives the model a foundation for performing well on downstream tasks even when only limited task-specific data is available.
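The next-word prediction objective can be illustrated with a deliberately tiny stand-in. The sketch below is not a transformer: it uses a bigram counting model, a made-up ten-word corpus, and hypothetical helper names (`train_bigram`, `next_word_probs`, `average_nll`). It only shows what "learning to predict the next word" and the associated loss look like in code.

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Turn the follow-up counts for `prev` into a probability distribution."""
    total = sum(counts[prev].values())
    return {word: c / total for word, c in counts[prev].items()}

def average_nll(counts, tokens):
    """Average negative log-likelihood of each actual next word.

    This is the quantity a language model is trained to make small.
    """
    losses = []
    for prev, nxt in zip(tokens, tokens[1:]):
        p = next_word_probs(counts, prev).get(nxt, 0.0)
        losses.append(-math.log(p) if p > 0 else float("inf"))
    return sum(losses) / len(losses)

# Toy corpus, invented for illustration only.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)
probs = next_word_probs(model, "the")
loss = average_nll(model, corpus)

print(probs)  # distribution over words seen after "the"
print(loss)   # average next-word loss on the training text
```

A GPT model replaces the counting table with a transformer that conditions on a long window of preceding tokens, but the training signal has the same shape: assign high probability to the token that actually comes next.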



On June 11, 2018, OpenAI released a paper entitled "Improving Language Understanding by Generative Pre-Training", which introduced the Generative Pre-trained Transformer (GPT).[6] At that time, the best-performing neural NLP models relied primarily on supervised learning from large amounts of manually labeled data. This reliance limited their use on datasets that were not well annotated and made it prohibitively expensive and time-consuming to train extremely large models.[6][7] Many languages (such as Swahili or Haitian Creole) are difficult to translate and interpret with such models because too little text is available for corpus-building.[7] In contrast, GPT's "semi-supervised" approach involved two stages: an unsupervised generative "pre-training" stage, in which a language modeling objective was used to set the initial parameters, and a supervised discriminative "fine-tuning" stage, in which these parameters were adapted to a target task.[6]
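The two stages can be written as two objectives. The block below is a sketch of how the cited paper [6] formulates them, to the best of this reconstruction; the symbols (an unlabeled token sequence U, a labeled dataset C, parameters Θ, context window k, and weight λ) are not defined elsewhere in this article and should be checked against the paper itself.

```latex
% Stage 1: unsupervised pre-training on unlabeled tokens U = (u_1, ..., u_n).
% Maximize the log-probability of each token given the k tokens before it.
L_1(\mathcal{U}) = \sum_{i} \log P\left(u_i \mid u_{i-k}, \ldots, u_{i-1}; \Theta\right)

% Stage 2: supervised fine-tuning on labeled examples (x^1, ..., x^m, y) in C.
% Maximize the log-probability of the label given the input tokens.
L_2(\mathcal{C}) = \sum_{(x, y)} \log P\left(y \mid x^1, \ldots, x^m\right)

% Optional combined objective: keep language modeling as an auxiliary term.
L_3(\mathcal{C}) = L_2(\mathcal{C}) + \lambda \, L_1(\mathcal{C})
```

The parameters Θ learned in the first stage serve as the starting point for the second, which is why only a comparatively small labeled dataset is needed for the target task.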

GPT versions

| Model | Architecture | Parameter count | Training data |
| --- | --- | --- | --- |
| GPT-1 | 12-level, 12-headed Transformer decoder (no encoder), followed by linear-softmax. | 0.12 billion | BookCorpus:[8] 4.5 GB of text, from 7000 unpublished books of various genres. |
| GPT-2 | GPT-1, but with modified normalization. | 1.5 billion | WebText: 40 GB of text, 8 million documents, from 45 million webpages upvoted on Reddit. |
| GPT-3 | GPT-2, but with modification to allow larger scaling. | 175 billion | 570 GB plaintext, 0.4 trillion tokens. Mostly CommonCrawl, WebText, English Wikipedia, and two books corpora (Books1 and Books2). |
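The parameter counts in the table above can be roughly reproduced with back-of-the-envelope arithmetic. The sketch below uses a common approximation for decoder-only transformers (about 12 × d_model² weights per block, plus embeddings) and ignores biases and layer normalization. The hidden sizes, layer counts, vocabulary sizes, and context lengths are assumptions taken from the respective papers and released models as commonly reported; none of them appear in the table, and `estimate_params` is a hypothetical helper.

```python
def estimate_params(n_layers, d_model, vocab_size, context_len):
    """Rough decoder-only transformer parameter count (biases, layer norms ignored)."""
    attention = 4 * d_model * d_model     # query, key, value and output projections
    feed_forward = 8 * d_model * d_model  # two linear maps with a 4x wider hidden layer
    blocks = n_layers * (attention + feed_forward)
    # Token embeddings plus learned position embeddings.
    embeddings = (vocab_size + context_len) * d_model
    return blocks + embeddings

# Assumed hyperparameters (not given in the table above).
CONFIGS = {
    "GPT-1": dict(n_layers=12, d_model=768, vocab_size=40478, context_len=512),
    "GPT-2": dict(n_layers=48, d_model=1600, vocab_size=50257, context_len=1024),
    "GPT-3": dict(n_layers=96, d_model=12288, vocab_size=50257, context_len=2048),
}

estimates = {name: estimate_params(**cfg) for name, cfg in CONFIGS.items()}

for name, count in estimates.items():
    print(f"{name}: about {count / 1e9:.2f} billion parameters")
```

Under these assumptions the estimates come out at roughly 0.12, 1.56, and 175 billion, close to the figures in the table. Almost all of the growth comes from the per-block term, which scales with the number of layers times the square of the hidden size.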


  1. ^ Roose, Kevin (5 December 2022). "The Brilliance and Weirdness of ChatGPT". The New York Times. Archived from the original on January 18, 2023. Retrieved 26 December 2022. Like those tools, ChatGPT — which stands for "generative pre-trained transformer" — landed with a splash.
  2. ^ Quinn, Joanne (2020). Dive into deep learning: tools for engagement. Thousand Oaks, California. p. 551. ISBN 9781544361376. Archived from the original on January 10, 2023. Retrieved 10 January 2023.
  3. ^ Luo R, Sun L, Xia Y, Qin T, Zhang S, Poon H; et al. (2022). "BioGPT: generative pre-trained transformer for biomedical text generation and mining". Brief Bioinform. 23 (6). doi:10.1093/bib/bbac409. PMID 36156661.
  4. ^ Matthias Bastian (2023-01-29). "BioGPT is a Microsoft language model trained for biomedical tasks". The Decoder.
  5. ^ Ferruz, N.; Schmidt, S.; Höcker, B.; et al. (2022). "ProtGPT2 is a deep unsupervised language model for protein design". Nature Communications. 13. doi:10.1038/s41467-022-32007-7.
  6. ^ a b c Radford, Alec; Narasimhan, Karthik; Salimans, Tim; Sutskever, Ilya (11 June 2018). "Improving Language Understanding by Generative Pre-Training" (PDF). OpenAI. p. 12. Archived (PDF) from the original on 26 January 2021. Retrieved 23 January 2021.
  7. ^ a b Tsvetkov, Yulia (22 June 2017). "Opportunities and Challenges in Working with Low-Resource Languages" (PDF). Carnegie Mellon University. Archived (PDF) from the original on 31 March 2020. Retrieved 23 January 2021.
  8. ^ Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja (2015). "Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books": 19–27.
Original content from Wikipedia, shared with licence Creative Commons By-Sa - Generative pre-trained transformer