List of large language models

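A large language model (LLM) is a type of machine learning model designed for natural language processing tasks such as language generation. LLMs are language models with many parameters, trained with self-supervised learning on large amounts of text.

This page lists notable large language models.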

In the training cost column, 1 petaFLOP-day = 1 petaFLOP/sec × 1 day = 8.64×10¹⁹ FLOP. Only the cost of the largest model is listed.
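
As a rough illustration of the unit conversion above, the following Python sketch converts between petaFLOP-days and FLOP. The function and constant names are hypothetical helpers chosen for this example and are not part of any library. The two sample figures (3.8E25 FLOPs for Llama 3.1 and 1e24 FLOPs for DeepSeek LLM 67B) come from the notes column of the table.

```python
# Minimal sketch of the petaFLOP-day conversion; names are illustrative only.
SECONDS_PER_DAY = 24 * 60 * 60           # 86,400 seconds
FLOP_PER_PETAFLOP = 10**15               # 1 petaFLOP = 1e15 FLOP
FLOP_PER_PETAFLOP_DAY = FLOP_PER_PETAFLOP * SECONDS_PER_DAY  # 8.64e19 FLOP


def petaflop_days_to_flop(pf_days: float) -> float:
    """Convert a training cost in petaFLOP-days to total FLOP."""
    return pf_days * FLOP_PER_PETAFLOP_DAY


def flop_to_petaflop_days(flop: float) -> float:
    """Convert a total FLOP count to petaFLOP-days."""
    return flop / FLOP_PER_PETAFLOP_DAY


if __name__ == "__main__":
    # Llama 3.1 405B: 3.8E25 FLOPs (notes column) is roughly 440,000 petaFLOP-days.
    print(flop_to_petaflop_days(3.8e25))
    # DeepSeek LLM 67B: 1e24 FLOPs (notes column) is roughly 12,000 petaFLOP-days.
    print(flop_to_petaflop_days(1e24))
```

Both results agree, after rounding, with the training cost values listed for those models in the table.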


Name	Release date^[a]	Developer	Number of parameters (billion)^[b]	Corpus size	Training cost (petaFLOP-day)	License^[c]	Notes
Attention Is All You Need	June 2017	Vaswani et al. at Google	0.213	36 million English-French sentence pairs	0.09^[1]	Unreleased	Trained for 300,000 steps on 8 NVIDIA P100 GPUs. Training and evaluation code was released under the Apache 2.0 license.^[2]
GPT-1	June 2018	OpenAI	0.117		1^[3]	MIT^[4]	First GPT model, a decoder-only transformer. Trained for 30 days on 8 P600 GPUs.
BERT	October 2018	Google	0.340^[5]	3.3 billion words^[5]	9^[6]	Apache 2.0^[7]	An early and influential language model.^[8] Encoder-only, and thus not built to be prompted or generative.^[9] Training took 4 days on 64 TPUv2 chips.^[10]
T5	October 2019	Google	11^[11]	34 billion tokens^[11]		Apache 2.0^[12]	Base model for many Google projects, such as Imagen.^[13]
XLNet	June 2019	Google	0.340^[14]	33 billion words	330	Apache 2.0^[15]	An alternative to BERT, designed as encoder-only. Trained for 5.5 days on 512 TPU v3 chips.^[16]
GPT-2	February 2019	OpenAI	1.5^[17]	40 GB^[18] (~10 billion tokens)^[19]	28^[20]	MIT^[21]	Trained for one week on 32 TPU v3 chips.^[20]
GPT-3	May 2020	OpenAI	175^[22]	300 billion tokens^[19]	3640^[23]	Proprietary	In 2022, a fine-tuned variant of GPT-3 called GPT-3.5 was made available to the public through a web interface called ChatGPT.^[24]
GPT-Neo	March 2021	EleutherAI	2.7^[25]	825 GiB^[26]		MIT^[27]	The first of a series of free GPT-3 alternatives released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks, but was significantly worse than the largest GPT-3.^[27]
GPT-J	June 2021	EleutherAI	6^[28]	825 GiB^[26]	200^[29]	Apache 2.0	GPT-3-style language model.
Megatron-Turing NLG	October 2021^[30]	Microsoft and Nvidia	530^[31]	338.6 billion tokens^[31]	38000^[32]	Restricted web access	Trained for 3 months on over 2000 A100 GPUs on the NVIDIA Selene Supercomputer, for over 3 million GPU-hours.^[32]
Ernie 3.0 Titan	December 2021	Baidu	260^[33]	4 TB		Proprietary	Chinese-language LLM. Ernie Bot is based on this model.
Claude^[34]	December 2021	Anthropic	52^[35]	400 billion tokens^[35]		beta	Fine-tuned for desirable behavior in conversations.^[36]
GLaM (Generalist Language Model)	December 2021	Google	1200^[37]	1.6 trillion tokens^[37]	5600^[37]	Proprietary	Sparse mixture of experts model, making it more expensive to train but cheaper to run inference compared to GPT-3.
Gopher	December 2021	DeepMind	280^[38]	300 billion tokens^[39]	5833^[40]	Proprietary	Later developed into the Chinchilla model.
LaMDA (Language Models for Dialog Applications)	January 2022	Google	137^[41]	1.56T words,^[41] 168 billion tokens^[39]	4110^[42]	Proprietary	Specialized for response generation in conversations.
GPT-NeoX	February 2022	EleutherAI	20^[43]	825 GiB^[26]	740^[29]	Apache 2.0	Based on the Megatron architecture.
Chinchilla	March 2022	DeepMind	70^[44]	1.4 trillion tokens^[44]^[39]	6805^[40]	Proprietary	Reduced-parameter model trained on more data. Used in the Sparrow bot. Often cited for its neural scaling law.
PaLM (Pathways Language Model)	April 2022	Google	540^[45]	768 billion tokens^[44]	29,250^[40]	Proprietary	Trained for ~60 days on ~6000 TPU v4 chips.^[40] As of October 2024, it is the largest dense Transformer published.
OPT (Open Pretrained Transformer)	May 2022	Meta	175^[46]	180 billion tokens^[47]	310^[29]	Non-commercial research^[d]	GPT-3 architecture with some adaptations from Megatron. Uniquely, the training logbook written by the team was published.^[48]
YaLM 100B	June 2022	Yandex	100^[49]	1.7 TB^[49]		Apache 2.0	English-Russian model based on Microsoft's Megatron-LM.
Minerva	June 2022	Google	540^[50]	38.5B tokens from webpages filtered for mathematical content and from papers submitted to the arXiv preprint server^[50]		Proprietary	For solving "mathematical and scientific questions using step-by-step reasoning".^[51] Initialized from PaLM models, then finetuned on mathematical and scientific data.
BLOOM	July 2022	Large collaboration led by Hugging Face	175^[52]	350 billion tokens (1.6 TB)^[53]		Responsible AI	Essentially GPT-3 but trained on a multilingual corpus (30% English, excluding programming languages).
Galactica	November 2022	Meta	120	106 billion tokens^[54]	Unknown	CC-BY-NC-4.0	Trained on scientific text and modalities.
AlexaTM (Teacher Models)	November 2022	Amazon	20^[55]	1.3 trillion^[56]		Proprietary^[57]	Bidirectional sequence-to-sequence architecture.
LLaMA (Large Language Model Meta AI)	February 2023	Meta AI	65^[58]	1.4 trillion^[58]	6300^[59]	Non-commercial research^[e]	Corpus has 20 languages. "Overtrained" (compared to Chinchilla scaling law) for better performance with fewer parameters.^[58]
GPT-4	March 2023	OpenAI	Unknown^[f] (According to rumors: 1760)^[61]	Unknown	Unknown	Proprietary	Available for ChatGPT Plus users and used in several products.
Chameleon	June 2024	Meta AI	34^[62]	4.4 trillion
Cerebras-GPT	March 2023	Cerebras	13^[63]		270^[29]	Apache 2.0	Trained with Chinchilla formula.
Falcon	March 2023	Technology Innovation Institute	40^[64]	1 trillion tokens, from RefinedWeb (filtered web text corpus)^[65] plus some "curated corpora".^[66]	2800^[59]	Apache 2.0^[67]
BloombergGPT	March 2023	Bloomberg L.P.	50	363 billion token dataset based on Bloomberg's data sources, plus 345 billion tokens from general purpose datasets^[68]		Proprietary	Trained on financial data from proprietary sources, for financial tasks.
PanGu-Σ	March 2023	Huawei	1085	329 billion tokens^[69]		Proprietary
OpenAssistant^[70]	March 2023	LAION	17	1.5 trillion tokens		Apache 2.0	Trained on crowdsourced open data.
Jurassic-2^[71]	March 2023	AI21 Labs	Unknown	Unknown		Proprietary	Multilingual^[72]
PaLM 2 (Pathways Language Model 2)	May 2023	Google	340^[73]	3.6 trillion tokens^[73]	85,000^[59]	Proprietary	Was used in Bard chatbot.^[74]
Llama 2	July 2023	Meta AI	70^[75]	2 trillion tokens^[75]	21,000	Llama 2 license	1.7 million A100-hours.^[76]
Claude 2	July 2023	Anthropic	Unknown	Unknown	Unknown	Proprietary	Used in Claude chatbot.^[77]
Granite 13b	July 2023	IBM	Unknown	Unknown	Unknown	Proprietary	Used in IBM Watsonx.^[78]
Mistral 7B	September 2023	Mistral AI	7.3^[79]	Unknown		Apache 2.0
Claude 2.1	November 2023	Anthropic	Unknown	Unknown	Unknown	Proprietary	Used in Claude chatbot. Has a context window of 200,000 tokens, or ~500 pages.^[80]
Grok-1^[81]	November 2023	xAI	314	Unknown	Unknown	Apache 2.0	Used in Grok chatbot. Grok-1 has a context length of 8,192 tokens and has access to X (Twitter).^[82]
Gemini 1.0	December 2023	Google DeepMind	Unknown	Unknown	Unknown	Proprietary	Multimodal model, comes in three sizes. Used in the chatbot of the same name.^[83]
Mixtral 8x7B	December 2023	Mistral AI	46.7	Unknown	Unknown	Apache 2.0	Outperforms GPT-3.5 and Llama 2 70B on many benchmarks.^[84] Mixture of experts model, with 12.9 billion parameters activated per token.^[85]
Mixtral 8x22B	April 2024	Mistral AI	141	Unknown	Unknown	Apache 2.0	^[86]
DeepSeek LLM	November 29, 2023	DeepSeek	67	2T tokens^[87]	12,000	DeepSeek License	Trained on English and Chinese text. 1e24 FLOPs for 67B. 1e23 FLOPs for 7B.^[87]
Phi-2	December 2023	Microsoft	2.7	1.4T tokens	419^[88]	MIT	Trained on real and synthetic "textbook-quality" data, for 14 days on 96 A100 GPUs.^[88]
Gemini 1.5	February 2024	Google DeepMind	Unknown	Unknown	Unknown	Proprietary	Multimodal model, based on a Mixture-of-Experts (MoE) architecture. Context window above 1 million tokens.^[89]
Gemini Ultra	February 2024	Google DeepMind	Unknown	Unknown	Unknown
Gemma	February 2024	Google DeepMind	7	6T tokens	Unknown	Gemma Terms of Use^[90]
Claude 3	March 2024	Anthropic	Unknown	Unknown	Unknown	Proprietary	Includes three models, Haiku, Sonnet, and Opus.^[91]
Nova	October 2024	Rubik's AI	Unknown	Unknown	Unknown	Proprietary	Includes three models, Nova-Instant, Nova-Air, and Nova-Pro.
DBRX	March 2024	Databricks and Mosaic ML	136	12T tokens		Databricks Open Model License	Training cost 10 million USD.
Fugaku-LLM	May 2024	Fujitsu, Tokyo Institute of Technology, etc.	13	380B tokens			The largest model ever trained on CPU-only, on the Fugaku.^[92]
Phi-3	April 2024	Microsoft	14^[93]	4.8T tokens		MIT	Microsoft markets them as "small language model".^[94]
Granite Code Models	May 2024	IBM	Unknown	Unknown	Unknown	Apache 2.0
Qwen2	June 2024	Alibaba Cloud	72^[95]	3T tokens	Unknown	Qwen License	Multiple sizes, the smallest being 0.5B.
DeepSeek V2	June 2024	DeepSeek	236	8.1T tokens	28,000	DeepSeek License	1.4M hours on H800.^[96]
Nemotron-4	June 2024	Nvidia	340	9T tokens	200,000	NVIDIA Open Model License	Trained for 1 epoch. Trained on 6144 H100 GPUs between December 2023 and May 2024.^[97]^[98]
Llama 3.1	July 2024	Meta AI	405	15.6T tokens	440,000	Llama 3 license	405B version took 31 million hours on H100-80GB, at 3.8E25 FLOPs.^[99]^[100]
DeepSeek V3	December 2024	DeepSeek	671	14.8T tokens	56,000	DeepSeek License	Trained for 2.788 million hours on H800 GPUs.^[101]
Amazon Nova	December 2024	Amazon	Unknown	Unknown	Unknown	Proprietary	Includes three models, Nova Micro, Nova Lite, and Nova Pro.^[102]
DeepSeek R1	January 2025	DeepSeek	671	Unknown	Unknown	MIT	No pretraining; reinforcement learning on top of V3-Base.^[103]^[104]
Qwen2.5	January 2025	Alibaba	72	18T tokens	Unknown	Qwen License	^[105]
MiniMax-Text-01	January 2025	Minimax	456	4.7T tokens^[106]	Unknown	Minimax Model license	^[107]^[106]
Gemini 2.0	February 2025	Google DeepMind	Unknown	Unknown	Unknown	Proprietary	Three models released: Flash, Flash-Lite and Pro.^[108]^[109]^[110]
Mistral Large	November 2024	Mistral AI	123	Unknown	Unknown	Mistral Research License	Upgraded over time. The latest version is 24.11.^[111]
Pixtral	November 2024	Mistral AI	123	Unknown	Unknown	Mistral Research License	Multimodal. There is also a 12B version which is under Apache 2 license.^[111]
Grok 3	February 2025	xAI	Unknown	Unknown	Unknown, estimated 5,800,000.	Proprietary	Training cost claimed "10x the compute of previous state-of-the-art models".^[112]
Llama 4	April 5, 2025	Meta AI	400	40T tokens		Llama 4 license	^[113]^[114]
Qwen3	April 2025	Alibaba Cloud	235	36T tokens	Unknown	Apache 2.0	Multiple sizes, the smallest being 0.6B.^[115]
GPT-OSS	August 5, 2025	OpenAI	117	Unknown	Unknown	Apache 2.0	Released in two model sizes, 20B and 120B.^[116]
Claude 4.1	August 5, 2025	Anthropic	Unknown	Unknown	Unknown	Proprietary	Includes one model, Opus.^[117]
GPT-5	August 7, 2025	OpenAI	Unknown	Unknown	Unknown	Proprietary	Includes three models: GPT-5, GPT-5 mini, and GPT-5 nano. GPT-5 is available in ChatGPT and its API, and includes thinking abilities.^[118]^[119]
DeepSeek-V3.1	August 21, 2025	DeepSeek	671	15.639T		MIT	Training size: 14.8T tokens of DeepSeek V3 plus 839B tokens from the extension phases (630B + 209B).^[120] It is a hybrid model that can switch between thinking and non-thinking modes.^[121]
Apertus	September 2, 2025	ETH Zurich and EPF Lausanne	70	15 trillion^[122]	Unknown	Apache 2.0	Claimed to be the first LLM compliant with the EU's Artificial Intelligence Act.^[123]
Claude 4.5	September 29, 2025	Anthropic	Unknown	Unknown	Unknown	Proprietary	^[124]
DeepSeek-V3.2-Exp	September 29, 2025	DeepSeek	685			MIT	This experimental model is built on v3.1-Terminus and uses a custom efficient mechanism called DeepSeek Sparse Attention (DSA).^[125]^[126]^[127]
GLM-4.6	September 30, 2025	Zhipu AI	357			Apache 2.0	^[128]^[129]^[130]
Kimi K2 Thinking	November 6, 2025	Moonshot AI	1000			MIT	^[131]^[132]^[133]
GPT-5.1	November 12, 2025	OpenAI				Proprietary	^[134]
Grok 4.1	November 17, 2025	xAI				Proprietary	^[135]
Gemini 3	November 18, 2025	Google DeepMind				Proprietary	^[136]
Claude Opus 4.5	November 25, 2025	Anthropic				Proprietary	^[137]
DeepSeek-V3.2	December 1, 2025	DeepSeek	685			MIT	Balances reasoning ability and output length; suited to everyday use such as question answering and general agent tasks.^[138]^[139]^[140]^[141]
DeepSeek-V3.2-Speciale	December 1, 2025	DeepSeek	685			MIT	Pushes the reasoning ability of open-source models to the limit and explores the boundary of model capability; for research use only, with no support for tool calling.^[142]^[143]^[144]^[145]

