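Supported Models and Datasets

Models

The table below lists the models integrated into ms-swift. Its columns are:

  • Model ID: the ModelScope model ID

  • HF Model ID: the Hugging Face model ID

  • Model Type: the model type

  • Default Template: the default chat template

  • Requires: additional dependencies needed to use the model

  • Tags: tags describing the model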
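To make the column meanings concrete, here is a minimal sketch of how one row could be represented and how a Requires entry such as `transformers>=4.37` or `transformers<4.42` could be checked against installed package versions. The names `ModelRow`, `parse_version`, `compare`, and `satisfies` are hypothetical helpers written for this illustration and are not part of ms-swift. The version comparison is deliberately simplified and looks only at the leading numeric part of a version string.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ModelRow:
    """One table row. Field names are illustrative, not ms-swift identifiers."""
    model_id: str                # ModelScope model ID
    model_type: str
    default_template: str
    requires: Optional[str]      # e.g. "transformers>=4.37"; None for "-"
    tags: Optional[str]          # None for "-"
    hf_model_id: Optional[str]   # None for "-"


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '4.37.2' into (4, 37, 2), reading only the leading numeric part."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def compare(a: Tuple[int, ...], b: Tuple[int, ...]) -> int:
    """Return -1, 0 or 1 after zero-padding both tuples to the same length."""
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return (a > b) - (a < b)


# Maps each supported operator to a predicate on the compare() result.
_OPERATORS = {
    ">=": lambda c: c >= 0,
    "<=": lambda c: c <= 0,
    "==": lambda c: c == 0,
    ">": lambda c: c > 0,
    "<": lambda c: c < 0,
}

# Matches "name<op>version"; a bare package name such as "aqlm" does not match.
_SPEC = re.compile(r"^([A-Za-z0-9_.\-]+?)\s*(>=|<=|==|>|<)\s*([0-9][0-9.]*)$")


def satisfies(requires: Optional[str], installed: Dict[str, str]) -> bool:
    """Check a comma-separated Requires cell against {package: version}."""
    if not requires or requires == "-":
        return True
    for spec in (part.strip() for part in requires.split(",")):
        match = _SPEC.match(spec)
        name = match.group(1) if match else spec
        if name not in installed:
            return False  # the package is required but missing
        if match:
            result = compare(parse_version(installed[name]),
                             parse_version(match.group(3)))
            if not _OPERATORS[match.group(2)](result):
                return False
    return True


# Example row taken from the table; the installed version is made up.
row = ModelRow("Qwen/Qwen3-8B", "qwen3", "qwen3",
               "transformers>=4.51", None, "Qwen/Qwen3-8B")
print(satisfies(row.requires, {"transformers": "4.51.3"}))
```

In a real environment the installed versions could be read with `importlib.metadata.version`, and the `packaging` library handles pre-release and local version suffixes more rigorously than this sketch does.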

Large Language Models

| Model ID | Model Type | Default Template | Requires | Support Megatron | Tags | HF Model ID |
| --- | --- | --- | --- | --- | --- | --- |
| Qwen/Qwen-1_8B-Chat | qwen | qwen | - | | - | Qwen/Qwen-1_8B-Chat |
| Qwen/Qwen-7B-Chat | qwen | qwen | - | | - | Qwen/Qwen-7B-Chat |
| Qwen/Qwen-14B-Chat | qwen | qwen | - | | - | Qwen/Qwen-14B-Chat |
| Qwen/Qwen-72B-Chat | qwen | qwen | - | | - | Qwen/Qwen-72B-Chat |
| Qwen/Qwen-1_8B | qwen | qwen | - | | - | Qwen/Qwen-1_8B |
| Qwen/Qwen-7B | qwen | qwen | - | | - | Qwen/Qwen-7B |
| Qwen/Qwen-14B | qwen | qwen | - | | - | Qwen/Qwen-14B |
| Qwen/Qwen-72B | qwen | qwen | - | | - | Qwen/Qwen-72B |
| Qwen/Qwen-1_8B-Chat-Int4 | qwen | qwen | - | | - | Qwen/Qwen-1_8B-Chat-Int4 |
| Qwen/Qwen-7B-Chat-Int4 | qwen | qwen | - | | - | Qwen/Qwen-7B-Chat-Int4 |
| Qwen/Qwen-14B-Chat-Int4 | qwen | qwen | - | | - | Qwen/Qwen-14B-Chat-Int4 |
| Qwen/Qwen-72B-Chat-Int4 | qwen | qwen | - | | - | Qwen/Qwen-72B-Chat-Int4 |
| Qwen/Qwen-1_8B-Chat-Int8 | qwen | qwen | - | | - | Qwen/Qwen-1_8B-Chat-Int8 |
| Qwen/Qwen-7B-Chat-Int8 | qwen | qwen | - | | - | Qwen/Qwen-7B-Chat-Int8 |
| Qwen/Qwen-14B-Chat-Int8 | qwen | qwen | - | | - | Qwen/Qwen-14B-Chat-Int8 |
| Qwen/Qwen-72B-Chat-Int8 | qwen | qwen | - | | - | Qwen/Qwen-72B-Chat-Int8 |
| TongyiFinance/Tongyi-Finance-14B-Chat | qwen | qwen | - | | financial | jxy/Tongyi-Finance-14B-Chat |
| TongyiFinance/Tongyi-Finance-14B | qwen | qwen | - | | financial | - |
| TongyiFinance/Tongyi-Finance-14B-Chat-Int4 | qwen | qwen | - | | financial | jxy/Tongyi-Finance-14B-Chat-Int4 |
| Qwen/Qwen1.5-0.5B-Chat | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-0.5B-Chat |
| Qwen/Qwen1.5-1.8B-Chat | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-1.8B-Chat |
| Qwen/Qwen1.5-4B-Chat | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-4B-Chat |
| Qwen/Qwen1.5-7B-Chat | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-7B-Chat |
| Qwen/Qwen1.5-14B-Chat | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-14B-Chat |
| Qwen/Qwen1.5-32B-Chat | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-32B-Chat |
| Qwen/Qwen1.5-72B-Chat | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-72B-Chat |
| Qwen/Qwen1.5-110B-Chat | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-110B-Chat |
| Qwen/Qwen1.5-0.5B | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-0.5B |
| Qwen/Qwen1.5-1.8B | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-1.8B |
| Qwen/Qwen1.5-4B | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-4B |
| Qwen/Qwen1.5-7B | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-7B |
| Qwen/Qwen1.5-14B | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-14B |
| Qwen/Qwen1.5-32B | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-32B |
| Qwen/Qwen1.5-72B | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-72B |
| Qwen/Qwen1.5-110B | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-110B |
| Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4 | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4 |
| Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4 | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int4 |
| Qwen/Qwen1.5-4B-Chat-GPTQ-Int4 | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-4B-Chat-GPTQ-Int4 |
| Qwen/Qwen1.5-7B-Chat-GPTQ-Int4 | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-7B-Chat-GPTQ-Int4 |
| Qwen/Qwen1.5-14B-Chat-GPTQ-Int4 | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-14B-Chat-GPTQ-Int4 |
| Qwen/Qwen1.5-32B-Chat-GPTQ-Int4 | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-32B-Chat-GPTQ-Int4 |
| Qwen/Qwen1.5-72B-Chat-GPTQ-Int4 | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-72B-Chat-GPTQ-Int4 |
| Qwen/Qwen1.5-110B-Chat-GPTQ-Int4 | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-110B-Chat-GPTQ-Int4 |
| Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8 | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int8 |
| Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8 | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-1.8B-Chat-GPTQ-Int8 |
| Qwen/Qwen1.5-4B-Chat-GPTQ-Int8 | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-4B-Chat-GPTQ-Int8 |
| Qwen/Qwen1.5-7B-Chat-GPTQ-Int8 | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-7B-Chat-GPTQ-Int8 |
| Qwen/Qwen1.5-14B-Chat-GPTQ-Int8 | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-14B-Chat-GPTQ-Int8 |
| Qwen/Qwen1.5-72B-Chat-GPTQ-Int8 | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-72B-Chat-GPTQ-Int8 |
| Qwen/Qwen1.5-0.5B-Chat-AWQ | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-0.5B-Chat-AWQ |
| Qwen/Qwen1.5-1.8B-Chat-AWQ | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-1.8B-Chat-AWQ |
| Qwen/Qwen1.5-4B-Chat-AWQ | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-4B-Chat-AWQ |
| Qwen/Qwen1.5-7B-Chat-AWQ | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-7B-Chat-AWQ |
| Qwen/Qwen1.5-14B-Chat-AWQ | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-14B-Chat-AWQ |
| Qwen/Qwen1.5-32B-Chat-AWQ | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-32B-Chat-AWQ |
| Qwen/Qwen1.5-72B-Chat-AWQ | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-72B-Chat-AWQ |
| Qwen/Qwen1.5-110B-Chat-AWQ | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen1.5-110B-Chat-AWQ |
| Qwen/CodeQwen1.5-7B | qwen2 | qwen | transformers>=4.37 | | coding | Qwen/CodeQwen1.5-7B |
| Qwen/CodeQwen1.5-7B-Chat | qwen2 | qwen | transformers>=4.37 | | coding | Qwen/CodeQwen1.5-7B-Chat |
| Qwen/CodeQwen1.5-7B-Chat-AWQ | qwen2 | qwen | transformers>=4.37 | | coding | Qwen/CodeQwen1.5-7B-Chat-AWQ |
| Qwen/Qwen2-0.5B-Instruct | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen2-0.5B-Instruct |
| Qwen/Qwen2-1.5B-Instruct | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen2-1.5B-Instruct |
| Qwen/Qwen2-7B-Instruct | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen2-7B-Instruct |
| Qwen/Qwen2-72B-Instruct | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen2-72B-Instruct |
| Qwen/Qwen2-0.5B | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen2-0.5B |
| Qwen/Qwen2-1.5B | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen2-1.5B |
| Qwen/Qwen2-7B | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen2-7B |
| Qwen/Qwen2-72B | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen2-72B |
| Qwen/Qwen2-0.5B-Instruct-GPTQ-Int4 | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen2-0.5B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4 | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen2-1.5B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2-7B-Instruct-GPTQ-Int4 | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen2-7B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2-72B-Instruct-GPTQ-Int4 | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen2-72B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2-0.5B-Instruct-GPTQ-Int8 | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen2-0.5B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2-1.5B-Instruct-GPTQ-Int8 | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen2-1.5B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2-7B-Instruct-GPTQ-Int8 | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen2-7B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2-72B-Instruct-GPTQ-Int8 | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen2-72B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2-0.5B-Instruct-AWQ | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen2-0.5B-Instruct-AWQ |
| Qwen/Qwen2-1.5B-Instruct-AWQ | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen2-1.5B-Instruct-AWQ |
| Qwen/Qwen2-7B-Instruct-AWQ | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen2-7B-Instruct-AWQ |
| Qwen/Qwen2-72B-Instruct-AWQ | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen2-72B-Instruct-AWQ |
| Qwen/Qwen2-Math-1.5B-Instruct | qwen2 | qwen | transformers>=4.37 | | math | Qwen/Qwen2-Math-1.5B-Instruct |
| Qwen/Qwen2-Math-7B-Instruct | qwen2 | qwen | transformers>=4.37 | | math | Qwen/Qwen2-Math-7B-Instruct |
| Qwen/Qwen2-Math-72B-Instruct | qwen2 | qwen | transformers>=4.37 | | math | Qwen/Qwen2-Math-72B-Instruct |
| Qwen/Qwen2-Math-1.5B | qwen2 | qwen | transformers>=4.37 | | math | Qwen/Qwen2-Math-1.5B |
| Qwen/Qwen2-Math-7B | qwen2 | qwen | transformers>=4.37 | | math | Qwen/Qwen2-Math-7B |
| Qwen/Qwen2-Math-72B | qwen2 | qwen | transformers>=4.37 | | math | Qwen/Qwen2-Math-72B |
| Qwen/Qwen2.5-7B-Instruct-1M | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen2.5-7B-Instruct-1M |
| Qwen/Qwen2.5-14B-Instruct-1M | qwen2 | qwen | transformers>=4.37 | | - | Qwen/Qwen2.5-14B-Instruct-1M |
| PowerInfer/SmallThinker-3B-Preview | qwen2 | qwen | transformers>=4.37 | | - | PowerInfer/SmallThinker-3B-Preview |
| Qwen/Qwen2.5-0.5B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-0.5B-Instruct |
| Qwen/Qwen2.5-1.5B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-1.5B-Instruct |
| Qwen/Qwen2.5-3B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-3B-Instruct |
| Qwen/Qwen2.5-7B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-7B-Instruct |
| Qwen/Qwen2.5-14B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-14B-Instruct |
| Qwen/Qwen2.5-32B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-32B-Instruct |
| Qwen/Qwen2.5-72B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-72B-Instruct |
| Qwen/Qwen2.5-0.5B | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-0.5B |
| Qwen/Qwen2.5-1.5B | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-1.5B |
| Qwen/Qwen2.5-3B | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-3B |
| Qwen/Qwen2.5-7B | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-7B |
| Qwen/Qwen2.5-14B | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-14B |
| Qwen/Qwen2.5-32B | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-32B |
| Qwen/Qwen2.5-72B | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-72B |
| Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-3B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-3B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2.5-3B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-3B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2.5-7B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-7B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2.5-14B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-14B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2.5-32B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-32B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2.5-72B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-72B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2.5-0.5B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-0.5B-Instruct-AWQ |
| Qwen/Qwen2.5-1.5B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-1.5B-Instruct-AWQ |
| Qwen/Qwen2.5-3B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-3B-Instruct-AWQ |
| Qwen/Qwen2.5-7B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-7B-Instruct-AWQ |
| Qwen/Qwen2.5-14B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-14B-Instruct-AWQ |
| Qwen/Qwen2.5-32B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-32B-Instruct-AWQ |
| Qwen/Qwen2.5-72B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | Qwen/Qwen2.5-72B-Instruct-AWQ |
| Qwen/Qwen2.5-Coder-0.5B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-0.5B-Instruct |
| Qwen/Qwen2.5-Coder-1.5B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-1.5B-Instruct |
| Qwen/Qwen2.5-Coder-3B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-3B-Instruct |
| Qwen/Qwen2.5-Coder-7B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-7B-Instruct |
| Qwen/Qwen2.5-Coder-14B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-14B-Instruct |
| Qwen/Qwen2.5-Coder-32B-Instruct | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-32B-Instruct |
| Qwen/Qwen2.5-Coder-0.5B | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-0.5B |
| Qwen/Qwen2.5-Coder-1.5B | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-1.5B |
| Qwen/Qwen2.5-Coder-3B | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-3B |
| Qwen/Qwen2.5-Coder-7B | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-7B |
| Qwen/Qwen2.5-Coder-14B | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-14B |
| Qwen/Qwen2.5-Coder-32B | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-32B |
| Qwen/Qwen2.5-Coder-0.5B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-0.5B-Instruct-AWQ |
| Qwen/Qwen2.5-Coder-1.5B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-1.5B-Instruct-AWQ |
| Qwen/Qwen2.5-Coder-3B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-3B-Instruct-AWQ |
| Qwen/Qwen2.5-Coder-7B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-7B-Instruct-AWQ |
| Qwen/Qwen2.5-Coder-14B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-14B-Instruct-AWQ |
| Qwen/Qwen2.5-Coder-32B-Instruct-AWQ | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-32B-Instruct-AWQ |
| Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-0.5B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-1.5B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-3B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-7B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-14B-Instruct-GPTQ-Int8 |
| Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int4 | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int4 |
| Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int8 | qwen2_5 | qwen2_5 | transformers>=4.37 | | coding | Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int8 |
| moonshotai/Kimi-Dev-72B | qwen2_5 | qwen2_5 | transformers>=4.37 | | - | moonshotai/Kimi-Dev-72B |
| Qwen/Qwen2.5-Math-1.5B-Instruct | qwen2_5_math | qwen2_5_math | transformers>=4.37 | | math | Qwen/Qwen2.5-Math-1.5B-Instruct |
| Qwen/Qwen2.5-Math-7B-Instruct | qwen2_5_math | qwen2_5_math | transformers>=4.37 | | math | Qwen/Qwen2.5-Math-7B-Instruct |
| Qwen/Qwen2.5-Math-72B-Instruct | qwen2_5_math | qwen2_5_math | transformers>=4.37 | | math | Qwen/Qwen2.5-Math-72B-Instruct |
| Qwen/Qwen2.5-Math-1.5B | qwen2_5_math | qwen2_5_math | transformers>=4.37 | | math | Qwen/Qwen2.5-Math-1.5B |
| Qwen/Qwen2.5-Math-7B | qwen2_5_math | qwen2_5_math | transformers>=4.37 | | math | Qwen/Qwen2.5-Math-7B |
| Qwen/Qwen2.5-Math-72B | qwen2_5_math | qwen2_5_math | transformers>=4.37 | | math | Qwen/Qwen2.5-Math-72B |
| Qwen/Qwen1.5-MoE-A2.7B-Chat | qwen2_moe | qwen | transformers>=4.40 | | - | Qwen/Qwen1.5-MoE-A2.7B-Chat |
| Qwen/Qwen1.5-MoE-A2.7B | qwen2_moe | qwen | transformers>=4.40 | | - | Qwen/Qwen1.5-MoE-A2.7B |
| Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 | qwen2_moe | qwen | transformers>=4.40 | | - | Qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4 |
| Qwen/Qwen2-57B-A14B-Instruct | qwen2_moe | qwen | transformers>=4.40 | | - | Qwen/Qwen2-57B-A14B-Instruct |
| Qwen/Qwen2-57B-A14B | qwen2_moe | qwen | transformers>=4.40 | | - | Qwen/Qwen2-57B-A14B |
| Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4 | qwen2_moe | qwen | transformers>=4.40 | | - | Qwen/Qwen2-57B-A14B-Instruct-GPTQ-Int4 |
| Qwen/QwQ-32B-Preview | qwq_preview | qwq_preview | transformers>=4.37 | | - | Qwen/QwQ-32B-Preview |
| Qwen/QwQ-32B | qwq | qwq | transformers>=4.37 | | - | Qwen/QwQ-32B |
| Qwen/QwQ-32B-AWQ | qwq | qwq | transformers>=4.37 | | - | Qwen/QwQ-32B-AWQ |
| Qwen/Qwen3-0.6B-Base | qwen3 | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-0.6B-Base |
| Qwen/Qwen3-1.7B-Base | qwen3 | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-1.7B-Base |
| Qwen/Qwen3-4B-Base | qwen3 | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-4B-Base |
| Qwen/Qwen3-8B-Base | qwen3 | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-8B-Base |
| Qwen/Qwen3-14B-Base | qwen3 | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-14B-Base |
| Qwen/Qwen3-0.6B | qwen3 | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-0.6B |
| Qwen/Qwen3-1.7B | qwen3 | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-1.7B |
| Qwen/Qwen3-4B | qwen3 | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-4B |
| Qwen/Qwen3-8B | qwen3 | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-8B |
| Qwen/Qwen3-14B | qwen3 | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-14B |
| Qwen/Qwen3-32B | qwen3 | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-32B |
| Qwen/Qwen3-0.6B-FP8 | qwen3 | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-0.6B-FP8 |
| Qwen/Qwen3-1.7B-FP8 | qwen3 | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-1.7B-FP8 |
| Qwen/Qwen3-4B-FP8 | qwen3 | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-4B-FP8 |
| Qwen/Qwen3-8B-FP8 | qwen3 | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-8B-FP8 |
| Qwen/Qwen3-14B-FP8 | qwen3 | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-14B-FP8 |
| Qwen/Qwen3-32B-FP8 | qwen3 | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-32B-FP8 |
| Qwen/Qwen3-4B-AWQ | qwen3 | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-4B-AWQ |
| Qwen/Qwen3-8B-AWQ | qwen3 | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-8B-AWQ |
| Qwen/Qwen3-14B-AWQ | qwen3 | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-14B-AWQ |
| Qwen/Qwen3-32B-AWQ | qwen3 | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-32B-AWQ |
| swift/Qwen3-32B-AWQ | qwen3 | qwen3 | transformers>=4.51 | | - | - |
| Qwen/Qwen3Guard-Gen-0.6B | qwen3_guard | qwen3_guard | transformers>=4.51 | | - | Qwen/Qwen3Guard-Gen-0.6B |
| Qwen/Qwen3Guard-Gen-4B | qwen3_guard | qwen3_guard | transformers>=4.51 | | - | Qwen/Qwen3Guard-Gen-4B |
| Qwen/Qwen3Guard-Gen-8B | qwen3_guard | qwen3_guard | transformers>=4.51 | | - | Qwen/Qwen3Guard-Gen-8B |
| Qwen/Qwen3-4B-Thinking-2507 | qwen3_thinking | qwen3_thinking | transformers>=4.51 | | - | Qwen/Qwen3-4B-Thinking-2507 |
| Qwen/Qwen3-4B-Thinking-2507-FP8 | qwen3_thinking | qwen3_thinking | transformers>=4.51 | | - | Qwen/Qwen3-4B-Thinking-2507-FP8 |
| Qwen/Qwen3-30B-A3B-Instruct-2507 | qwen3_nothinking | qwen3_nothinking | transformers>=4.51 | | - | Qwen/Qwen3-30B-A3B-Instruct-2507 |
| Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 | qwen3_nothinking | qwen3_nothinking | transformers>=4.51 | | - | Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 |
| Qwen/Qwen3-235B-A22B-Instruct-2507 | qwen3_nothinking | qwen3_nothinking | transformers>=4.51 | | - | Qwen/Qwen3-235B-A22B-Instruct-2507 |
| Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 | qwen3_nothinking | qwen3_nothinking | transformers>=4.51 | | - | Qwen/Qwen3-235B-A22B-Instruct-2507-FP8 |
| swift/Qwen3-235B-A22B-Instruct-2507-AWQ | qwen3_nothinking | qwen3_nothinking | transformers>=4.51 | | - | - |
| Qwen/Qwen3-4B-Instruct-2507 | qwen3_nothinking | qwen3_nothinking | transformers>=4.51 | | - | Qwen/Qwen3-4B-Instruct-2507 |
| Qwen/Qwen3-4B-Instruct-2507-FP8 | qwen3_nothinking | qwen3_nothinking | transformers>=4.51 | | - | Qwen/Qwen3-4B-Instruct-2507-FP8 |
| Qwen/Qwen3-Coder-30B-A3B-Instruct | qwen3_coder | qwen3_coder | transformers>=4.51 | | coding | Qwen/Qwen3-Coder-30B-A3B-Instruct |
| Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8 | qwen3_coder | qwen3_coder | transformers>=4.51 | | coding | Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8 |
| Qwen/Qwen3-Coder-480B-A35B-Instruct | qwen3_coder | qwen3_coder | transformers>=4.51 | | coding | Qwen/Qwen3-Coder-480B-A35B-Instruct |
| Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 | qwen3_coder | qwen3_coder | transformers>=4.51 | | coding | Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 |
| swift/Qwen3-Coder-480B-A35B-Instruct-AWQ | qwen3_coder | qwen3_coder | transformers>=4.51 | | coding | - |
| Qwen/Qwen3-30B-A3B-Base | qwen3_moe | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-30B-A3B-Base |
| Qwen/Qwen3-30B-A3B | qwen3_moe | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-30B-A3B |
| Qwen/Qwen3-235B-A22B | qwen3_moe | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-235B-A22B |
| Qwen/Qwen3-30B-A3B-FP8 | qwen3_moe | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-30B-A3B-FP8 |
| Qwen/Qwen3-235B-A22B-FP8 | qwen3_moe | qwen3 | transformers>=4.51 | | - | Qwen/Qwen3-235B-A22B-FP8 |
| swift/Qwen3-30B-A3B-AWQ | qwen3_moe | qwen3 | transformers>=4.51 | | - | cognitivecomputations/Qwen3-30B-A3B-AWQ |
| swift/Qwen3-235B-A22B-AWQ | qwen3_moe | qwen3 | transformers>=4.51 | | - | cognitivecomputations/Qwen3-235B-A22B-AWQ |
| iic/Tongyi-DeepResearch-30B-A3B | qwen3_moe | qwen3 | transformers>=4.51 | | - | Alibaba-NLP/Tongyi-DeepResearch-30B-A3B |
| Qwen/Qwen3-30B-A3B-Thinking-2507 | qwen3_moe_thinking | qwen3_thinking | transformers>=4.51 | | - | Qwen/Qwen3-30B-A3B-Thinking-2507 |
| Qwen/Qwen3-30B-A3B-Thinking-2507-FP8 | qwen3_moe_thinking | qwen3_thinking | transformers>=4.51 | | - | Qwen/Qwen3-30B-A3B-Thinking-2507-FP8 |
| Qwen/Qwen3-235B-A22B-Thinking-2507 | qwen3_moe_thinking | qwen3_thinking | transformers>=4.51 | | - | Qwen/Qwen3-235B-A22B-Thinking-2507 |
| Qwen/Qwen3-235B-A22B-Thinking-2507-FP8 | qwen3_moe_thinking | qwen3_thinking | transformers>=4.51 | | - | Qwen/Qwen3-235B-A22B-Thinking-2507-FP8 |
| swift/Qwen3-235B-A22B-Thinking-2507-AWQ | qwen3_moe_thinking | qwen3_thinking | transformers>=4.51 | | - | - |
| Qwen/Qwen3-Next-80B-A3B-Instruct | qwen3_next | qwen3_nothinking | transformers>=4.57 | | - | - |
| Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 | qwen3_next | qwen3_nothinking | transformers>=4.57 | | - | - |
| Qwen/Qwen3-Next-80B-A3B-Thinking | qwen3_next_thinking | qwen3_thinking | transformers>=4.57 | | - | - |
| Qwen/Qwen3-Next-80B-A3B-Thinking-FP8 | qwen3_next_thinking | qwen3_thinking | transformers>=4.57 | | - | - |
| Qwen/Qwen3-Embedding-0.6B | qwen3_emb | qwen3_emb | - | | - | Qwen/Qwen3-Embedding-0.6B |
| Qwen/Qwen3-Embedding-4B | qwen3_emb | qwen3_emb | - | | - | Qwen/Qwen3-Embedding-4B |
| Qwen/Qwen3-Embedding-8B | qwen3_emb | qwen3_emb | - | | - | Qwen/Qwen3-Embedding-8B |
| Qwen/Qwen3-Reranker-0.6B | qwen3_reranker | qwen3_reranker | - | | - | Qwen/Qwen3-Reranker-0.6B |
| Qwen/Qwen3-Reranker-4B | qwen3_reranker | qwen3_reranker | - | | - | Qwen/Qwen3-Reranker-4B |
| Qwen/Qwen3-Reranker-8B | qwen3_reranker | qwen3_reranker | - | | - | Qwen/Qwen3-Reranker-8B |
| iic/gte_Qwen2-1.5B-instruct | qwen2_gte | dummy | - | | - | Alibaba-NLP/gte-Qwen2-1.5B-instruct |
| iic/gte_Qwen2-7B-instruct | qwen2_gte | dummy | - | | - | Alibaba-NLP/gte-Qwen2-7B-instruct |
| BAAI/bge-reranker-base | bge_reranker | bge_reranker | - | | - | BAAI/bge-reranker-base |
| BAAI/bge-reranker-v2-m3 | bge_reranker | bge_reranker | - | | - | BAAI/bge-reranker-v2-m3 |
| BAAI/bge-reranker-large | bge_reranker | bge_reranker | - | | - | BAAI/bge-reranker-large |
| codefuse-ai/CodeFuse-QWen-14B | codefuse_qwen | codefuse | - | | coding | codefuse-ai/CodeFuse-QWen-14B |
| iic/ModelScope-Agent-7B | modelscope_agent | modelscope_agent | - | | - | - |
| iic/ModelScope-Agent-14B | modelscope_agent | modelscope_agent | - | | - | - |
| AIDC-AI/Marco-o1 | marco_o1 | marco_o1 | transformers>=4.37 | | - | AIDC-AI/Marco-o1 |
| modelscope/Llama-2-7b-ms | llama | llama | - | | - | meta-llama/Llama-2-7b-hf |
| modelscope/Llama-2-13b-ms | llama | llama | - | | - | meta-llama/Llama-2-13b-hf |
| modelscope/Llama-2-70b-ms | llama | llama | - | | - | meta-llama/Llama-2-70b-hf |
| modelscope/Llama-2-7b-chat-ms | llama | llama | - | | - | meta-llama/Llama-2-7b-chat-hf |
| modelscope/Llama-2-13b-chat-ms | llama | llama | - | | - | meta-llama/Llama-2-13b-chat-hf |
| modelscope/Llama-2-70b-chat-ms | llama | llama | - | | - | meta-llama/Llama-2-70b-chat-hf |
| AI-ModelScope/chinese-llama-2-1.3b | llama | llama | - | | - | hfl/chinese-llama-2-1.3b |
| AI-ModelScope/chinese-llama-2-7b | llama | llama | - | | - | hfl/chinese-llama-2-7b |
| AI-ModelScope/chinese-llama-2-7b-16k | llama | llama | - | | - | hfl/chinese-llama-2-7b-16k |
| AI-ModelScope/chinese-llama-2-7b-64k | llama | llama | - | | - | hfl/chinese-llama-2-7b-64k |
| AI-ModelScope/chinese-llama-2-13b | llama | llama | - | | - | hfl/chinese-llama-2-13b |
| AI-ModelScope/chinese-llama-2-13b-16k | llama | llama | - | | - | hfl/chinese-llama-2-13b-16k |
| AI-ModelScope/chinese-alpaca-2-1.3b | llama | llama | - | | - | hfl/chinese-alpaca-2-1.3b |
| AI-ModelScope/chinese-alpaca-2-7b | llama | llama | - | | - | hfl/chinese-alpaca-2-7b |
| AI-ModelScope/chinese-alpaca-2-7b-16k | llama | llama | - | | - | hfl/chinese-alpaca-2-7b-16k |
| AI-ModelScope/chinese-alpaca-2-7b-64k | llama | llama | - | | - | hfl/chinese-alpaca-2-7b-64k |
| AI-ModelScope/chinese-alpaca-2-13b | llama | llama | - | | - | hfl/chinese-alpaca-2-13b |
| AI-ModelScope/chinese-alpaca-2-13b-16k | llama | llama | - | | - | hfl/chinese-alpaca-2-13b-16k |
| AI-ModelScope/Llama-2-7b-AQLM-2Bit-1x16-hf | llama | llama | transformers>=4.38, aqlm, torch>=2.2.0 | | - | ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf |
| LLM-Research/Meta-Llama-3-8B-Instruct | llama3 | llama3 | - | | - | meta-llama/Meta-Llama-3-8B-Instruct |
| LLM-Research/Meta-Llama-3-70B-Instruct | llama3 | llama3 | - | | - | meta-llama/Meta-Llama-3-70B-Instruct |
| LLM-Research/Meta-Llama-3-8B | llama3 | llama3 | - | | - | meta-llama/Meta-Llama-3-8B |
| LLM-Research/Meta-Llama-3-70B | llama3 | llama3 | - | | - | meta-llama/Meta-Llama-3-70B |
| swift/Meta-Llama-3-8B-Instruct-GPTQ-Int4 | llama3 | llama3 | - | | - | study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int4 |
| swift/Meta-Llama-3-8B-Instruct-GPTQ-Int8 | llama3 | llama3 | - | | - | study-hjt/Meta-Llama-3-8B-Instruct-GPTQ-Int8 |
| swift/Meta-Llama-3-8B-Instruct-AWQ | llama3 | llama3 | - | | - | study-hjt/Meta-Llama-3-8B-Instruct-AWQ |
| swift/Meta-Llama-3-70B-Instruct-GPTQ-Int4 | llama3 | llama3 | - | | - | study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int4 |
| swift/Meta-Llama-3-70B-Instruct-GPTQ-Int8 | llama3 | llama3 | - | | - | study-hjt/Meta-Llama-3-70B-Instruct-GPTQ-Int8 |
| swift/Meta-Llama-3-70B-Instruct-AWQ | llama3 | llama3 | - | | - | study-hjt/Meta-Llama-3-70B-Instruct-AWQ |
| ChineseAlpacaGroup/llama-3-chinese-8b-instruct | llama3 | llama3 | - | | - | hfl/llama-3-chinese-8b-instruct |
| ChineseAlpacaGroup/llama-3-chinese-8b | llama3 | llama3 | - | | - | hfl/llama-3-chinese-8b |
| LLM-Research/Meta-Llama-3.1-8B-Instruct | llama3_1 | llama3_2 | transformers>=4.43 | | - | meta-llama/Meta-Llama-3.1-8B-Instruct |
| LLM-Research/Meta-Llama-3.1-70B-Instruct | llama3_1 | llama3_2 | transformers>=4.43 | | - | meta-llama/Meta-Llama-3.1-70B-Instruct |
| LLM-Research/Meta-Llama-3.1-405B-Instruct | llama3_1 | llama3_2 | transformers>=4.43 | | - | meta-llama/Meta-Llama-3.1-405B-Instruct |
| LLM-Research/Meta-Llama-3.1-8B | llama3_1 | llama3_2 | transformers>=4.43 | | - | meta-llama/Meta-Llama-3.1-8B |
| LLM-Research/Meta-Llama-3.1-70B | llama3_1 | llama3_2 | transformers>=4.43 | | - | meta-llama/Meta-Llama-3.1-70B |
| LLM-Research/Meta-Llama-3.1-405B | llama3_1 | llama3_2 | transformers>=4.43 | | - | meta-llama/Meta-Llama-3.1-405B |
| LLM-Research/Meta-Llama-3.1-70B-Instruct-FP8 | llama3_1 | llama3_2 | transformers>=4.43 | | - | meta-llama/Meta-Llama-3.1-70B-Instruct-FP8 |
| LLM-Research/Meta-Llama-3.1-405B-Instruct-FP8 | llama3_1 | llama3_2 | transformers>=4.43 | | - | meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 |
| LLM-Research/Meta-Llama-3.1-8B-Instruct-BNB-NF4 | llama3_1 | llama3_2 | transformers>=4.43 | | - | hugging-quants/Meta-Llama-3.1-8B-Instruct-BNB-NF4 |
| LLM-Research/Meta-Llama-3.1-70B-Instruct-bnb-4bit | llama3_1 | llama3_2 | transformers>=4.43 | | - | unsloth/Meta-Llama-3.1-70B-Instruct-bnb-4bit |
| LLM-Research/Meta-Llama-3.1-405B-Instruct-BNB-NF4 | llama3_1 | llama3_2 | transformers>=4.43 | | - | hugging-quants/Meta-Llama-3.1-405B-Instruct-BNB-NF4 |
| LLM-Research/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4 | llama3_1 | llama3_2 | transformers>=4.43 | | - | hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4 |
| LLM-Research/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 | llama3_1 | llama3_2 | transformers>=4.43 | | - | hugging-quants/Meta-Llama-3.1-70B-Instruct-GPTQ-INT4 |
| LLM-Research/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4 | llama3_1 | llama3_2 | transformers>=4.43 | | - | hugging-quants/Meta-Llama-3.1-405B-Instruct-GPTQ-INT4 |
| LLM-Research/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 | llama3_1 | llama3_2 | transformers>=4.43 | | - | hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 |
| LLM-Research/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 | llama3_1 | llama3_2 | transformers>=4.43 | | - | hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 |
| LLM-Research/Meta-Llama-3.1-405B-Instruct-AWQ-INT4 | llama3_1 | llama3_2 | transformers>=4.43 | | - | hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4 |
| AI-ModelScope/Llama-3.1-Nemotron-70B-Instruct-HF | llama3_1 | llama3_2 | transformers>=4.43 | | - | nvidia/Llama-3.1-Nemotron-70B-Instruct-HF |
| LLM-Research/Llama-3.2-1B | llama3_2 | llama3_2 | transformers>=4.43 | | - | meta-llama/Llama-3.2-1B |
| LLM-Research/Llama-3.2-3B | llama3_2 | llama3_2 | transformers>=4.43 | | - | meta-llama/Llama-3.2-3B |
| LLM-Research/Llama-3.2-1B-Instruct | llama3_2 | llama3_2 | transformers>=4.43 | | - | meta-llama/Llama-3.2-1B-Instruct |
| LLM-Research/Llama-3.2-3B-Instruct | llama3_2 | llama3_2 | transformers>=4.43 | | - | meta-llama/Llama-3.2-3B-Instruct |
| LLM-Research/Llama-3.3-70B-Instruct | llama3_2 | llama3_2 | transformers>=4.43 | | - | meta-llama/Llama-3.3-70B-Instruct |
| unsloth/Llama-3.3-70B-Instruct-bnb-4bit | llama3_2 | llama3_2 | transformers>=4.43 | | - | unsloth/Llama-3.3-70B-Instruct-bnb-4bit |
| LLM-Research/Reflection-Llama-3.1-70B | reflection | reflection | transformers>=4.43 | | - | mattshumer/Reflection-Llama-3.1-70B |
| InfiniAI/Megrez-3b-Instruct | megrez | megrez | - | | - | Infinigence/Megrez-3B-Instruct |
| 01ai/Yi-6B | yi | chatml | - | | - | 01-ai/Yi-6B |
| 01ai/Yi-6B-200K | yi | chatml | - | | - | 01-ai/Yi-6B-200K |
| 01ai/Yi-6B-Chat | yi | chatml | - | | - | 01-ai/Yi-6B-Chat |
| 01ai/Yi-6B-Chat-4bits | yi | chatml | - | | - | 01-ai/Yi-6B-Chat-4bits |
| 01ai/Yi-6B-Chat-8bits | yi | chatml | - | | - | 01-ai/Yi-6B-Chat-8bits |
| 01ai/Yi-9B | yi | chatml | - | | - | 01-ai/Yi-9B |
| 01ai/Yi-9B-200K | yi | chatml | - | | - | 01-ai/Yi-9B-200K |
| 01ai/Yi-34B | yi | chatml | - | | - | 01-ai/Yi-34B |
| 01ai/Yi-34B-200K | yi | chatml | - | | - | 01-ai/Yi-34B-200K |
| 01ai/Yi-34B-Chat | yi | chatml | - | | - | 01-ai/Yi-34B-Chat |
| 01ai/Yi-34B-Chat-4bits | yi | chatml | - | | - | 01-ai/Yi-34B-Chat-4bits |
| 01ai/Yi-34B-Chat-8bits | yi | chatml | - | | - | 01-ai/Yi-34B-Chat-8bits |
| 01ai/Yi-1.5-6B | yi | chatml | - | | - | 01-ai/Yi-1.5-6B |
| 01ai/Yi-1.5-6B-Chat | yi | chatml | - | | - | 01-ai/Yi-1.5-6B-Chat |
| 01ai/Yi-1.5-9B | yi | chatml | - | | - | 01-ai/Yi-1.5-9B |
| 01ai/Yi-1.5-9B-Chat | yi | chatml | - | | - | 01-ai/Yi-1.5-9B-Chat |
| 01ai/Yi-1.5-9B-Chat-16K | yi | chatml | - | | - | 01-ai/Yi-1.5-9B-Chat-16K |
| 01ai/Yi-1.5-34B | yi | chatml | - | | - | 01-ai/Yi-1.5-34B |
| 01ai/Yi-1.5-34B-Chat | yi | chatml | - | | - | 01-ai/Yi-1.5-34B-Chat |
| 01ai/Yi-1.5-34B-Chat-16K | yi | chatml | - | | - | 01-ai/Yi-1.5-34B-Chat-16K |
| AI-ModelScope/Yi-1.5-6B-Chat-GPTQ | yi | chatml | - | | - | modelscope/Yi-1.5-6B-Chat-GPTQ |
| AI-ModelScope/Yi-1.5-6B-Chat-AWQ | yi | chatml | - | | - | modelscope/Yi-1.5-6B-Chat-AWQ |
| AI-ModelScope/Yi-1.5-9B-Chat-GPTQ | yi | chatml | - | | - | modelscope/Yi-1.5-9B-Chat-GPTQ |
| AI-ModelScope/Yi-1.5-9B-Chat-AWQ | yi | chatml | - | | - | modelscope/Yi-1.5-9B-Chat-AWQ |
| AI-ModelScope/Yi-1.5-34B-Chat-GPTQ | yi | chatml | - | | - | modelscope/Yi-1.5-34B-Chat-GPTQ |
| AI-ModelScope/Yi-1.5-34B-Chat-AWQ | yi | chatml | - | | - | modelscope/Yi-1.5-34B-Chat-AWQ |
| 01ai/Yi-Coder-1.5B | yi_coder | yi_coder | - | | coding | 01-ai/Yi-Coder-1.5B |
| 01ai/Yi-Coder-9B | yi_coder | yi_coder | - | | coding | 01-ai/Yi-Coder-9B |
| 01ai/Yi-Coder-1.5B-Chat | yi_coder | yi_coder | - | | coding | 01-ai/Yi-Coder-1.5B-Chat |
| 01ai/Yi-Coder-9B-Chat | yi_coder | yi_coder | - | | coding | 01-ai/Yi-Coder-9B-Chat |
| SUSTC/SUS-Chat-34B | sus | sus | - | | - | SUSTech/SUS-Chat-34B |
| openai-mirror/gpt-oss-20b | gpt_oss | gpt_oss | transformers>=4.55 | | - | openai/gpt-oss-20b |
| openai-mirror/gpt-oss-120b | gpt_oss | gpt_oss | transformers>=4.55 | | - | openai/gpt-oss-120b |
| ByteDance-Seed/Seed-OSS-36B-Instruct | seed_oss | seed_oss | transformers>=4.56 | | - | ByteDance-Seed/Seed-OSS-36B-Instruct |
| ByteDance-Seed/Seed-OSS-36B-Base | seed_oss | seed_oss | transformers>=4.56 | | - | ByteDance-Seed/Seed-OSS-36B-Base |
| ByteDance-Seed/Seed-OSS-36B-Base-woSyn | seed_oss | seed_oss | transformers>=4.56 | | - | ByteDance-Seed/Seed-OSS-36B-Base-woSyn |
| codefuse-ai/CodeFuse-CodeLlama-34B | codefuse_codellama | codefuse_codellama | - | | coding | codefuse-ai/CodeFuse-CodeLlama-34B |
| langboat/Mengzi3-13B-Base | mengzi3 | mengzi | - | | - | Langboat/Mengzi3-13B-Base |
| Fengshenbang/Ziya2-13B-Base | ziya | ziya | - | | - | IDEA-CCNL/Ziya2-13B-Base |
| Fengshenbang/Ziya2-13B-Chat | ziya | ziya | - | | - | IDEA-CCNL/Ziya2-13B-Chat |
| AI-ModelScope/NuminaMath-7B-TIR | numina | numina | - | | math | AI-MO/NuminaMath-7B-TIR |
| FlagAlpha/Atom-7B | atom | atom | - | | - | FlagAlpha/Atom-7B |
| FlagAlpha/Atom-7B-Chat | atom | atom | - | | - | FlagAlpha/Atom-7B-Chat |
| ZhipuAI/chatglm2-6b | chatglm2 | chatglm2 | transformers<4.42 | | - | zai-org/chatglm2-6b |
| ZhipuAI/chatglm2-6b-32k | chatglm2 | chatglm2 | transformers<4.42 | | - | zai-org/chatglm2-6b-32k |
| ZhipuAI/codegeex2-6b | chatglm2 | chatglm2 | transformers<4.34 | | coding | zai-org/codegeex2-6b |
| ZhipuAI/chatglm3-6b | chatglm3 | glm4 | transformers<4.42 | | - | zai-org/chatglm3-6b |
| ZhipuAI/chatglm3-6b-base | chatglm3 | glm4 | transformers<4.42 | | - | zai-org/chatglm3-6b-base |
| ZhipuAI/chatglm3-6b-32k | chatglm3 | glm4 | transformers<4.42 | | - | zai-org/chatglm3-6b-32k |
| ZhipuAI/chatglm3-6b-128k | chatglm3 | glm4 | transformers<4.42 | | - | zai-org/chatglm3-6b-128k |
| ZhipuAI/glm-4-9b-chat | glm4 | glm4 | transformers>=4.42 | | - | zai-org/glm-4-9b-chat |
| ZhipuAI/glm-4-9b | glm4 | glm4 | transformers>=4.42 | | - | zai-org/glm-4-9b |
| ZhipuAI/glm-4-9b-chat-1m | glm4 | glm4 | transformers>=4.42 | | - | zai-org/glm-4-9b-chat-1m |
| ZhipuAI/LongWriter-glm4-9b | glm4 | glm4 | transformers>=4.42 | | - | zai-org/LongWriter-glm4-9b |
| ZhipuAI/GLM-4-9B-0414 | glm4_0414 | glm4_0414 | transformers>=4.51 | | - | zai-org/GLM-4-9B-0414 |
| ZhipuAI/GLM-4-32B-0414 | glm4_0414 | glm4_0414 | transformers>=4.51 | | - | zai-org/GLM-4-32B-0414 |
| ZhipuAI/GLM-4-32B-Base-0414 | glm4_0414 | glm4_0414 | transformers>=4.51 | | - | zai-org/GLM-4-32B-Base-0414 |
| ZhipuAI/GLM-Z1-9B-0414 | glm4_0414 | glm4_0414 | transformers>=4.51 | | - | zai-org/GLM-Z1-9B-0414 |
| ZhipuAI/GLM-Z1-32B-0414 | glm4_0414 | glm4_0414 | transformers>=4.51 | | - | zai-org/GLM-Z1-32B-0414 |
| ZhipuAI/GLM-4.5-Air-Base | glm4_5 | glm4_5 | transformers>=4.54 | | - | zai-org/GLM-4.5-Air-Base |
| ZhipuAI/GLM-4.5-Air | glm4_5 | glm4_5 | transformers>=4.54 | | - | zai-org/GLM-4.5-Air |

ZhipuAI/GLM-4.5-Air-FP8

glm4_5

glm4_5

transformers>=4.54

-

zai-org/GLM-4.5-Air-FP8

ZhipuAI/GLM-4.5-Base

glm4_5

glm4_5

transformers>=4.54

-

zai-org/GLM-4.5-Base

ZhipuAI/GLM-4.5

glm4_5

glm4_5

transformers>=4.54

-

zai-org/GLM-4.5

ZhipuAI/GLM-4.5-FP8

glm4_5

glm4_5

transformers>=4.54

-

zai-org/GLM-4.5-FP8

ZhipuAI/GLM-4.6

glm4_5

glm4_5

transformers>=4.54

-

zai-org/GLM-4.6

ZhipuAI/GLM-Z1-Rumination-32B-0414

glm4_z1_rumination

glm4_z1_rumination

transformers>4.51

-

zai-org/GLM-Z1-Rumination-32B-0414

ZhipuAI/glm-edge-1.5b-chat

glm_edge

glm4

transformers>=4.46

-

zai-org/glm-edge-1.5b-chat

ZhipuAI/glm-edge-4b-chat

glm_edge

glm4

transformers>=4.46

-

zai-org/glm-edge-4b-chat

codefuse-ai/CodeFuse-CodeGeeX2-6B

codefuse_codegeex2

codefuse

transformers<4.34

coding

codefuse-ai/CodeFuse-CodeGeeX2-6B

ZhipuAI/codegeex4-all-9b

codegeex4

codegeex4

transformers<4.42

coding

zai-org/codegeex4-all-9b

ZhipuAI/LongWriter-llama3.1-8b

longwriter_llama3_1

longwriter_llama

transformers>=4.43

-

zai-org/LongWriter-llama3.1-8b

Shanghai_AI_Laboratory/internlm-chat-7b

internlm

internlm

-

-

internlm/internlm-chat-7b

Shanghai_AI_Laboratory/internlm-7b

internlm

internlm

-

-

internlm/internlm-7b

Shanghai_AI_Laboratory/internlm-chat-7b-8k

internlm

internlm

-

-

-

Shanghai_AI_Laboratory/internlm-20b

internlm

internlm

-

-

internlm/internlm-20b

Shanghai_AI_Laboratory/internlm-chat-20b

internlm

internlm

-

-

internlm/internlm-chat-20b

Shanghai_AI_Laboratory/internlm2-chat-1_8b

internlm2

internlm2

transformers>=4.38

-

internlm/internlm2-chat-1_8b

Shanghai_AI_Laboratory/internlm2-1_8b

internlm2

internlm2

transformers>=4.38

-

internlm/internlm2-1_8b

Shanghai_AI_Laboratory/internlm2-chat-1_8b-sft

internlm2

internlm2

transformers>=4.38

-

internlm/internlm2-chat-1_8b-sft

Shanghai_AI_Laboratory/internlm2-base-7b

internlm2

internlm2

transformers>=4.38

-

internlm/internlm2-base-7b

Shanghai_AI_Laboratory/internlm2-7b

internlm2

internlm2

transformers>=4.38

-

internlm/internlm2-7b

Shanghai_AI_Laboratory/internlm2-chat-7b

internlm2

internlm2

transformers>=4.38

-

internlm/internlm2-chat-7b

Shanghai_AI_Laboratory/internlm2-chat-7b-sft

internlm2

internlm2

transformers>=4.38

-

internlm/internlm2-chat-7b-sft

Shanghai_AI_Laboratory/internlm2-base-20b

internlm2

internlm2

transformers>=4.38

-

internlm/internlm2-base-20b

Shanghai_AI_Laboratory/internlm2-20b

internlm2

internlm2

transformers>=4.38

-

internlm/internlm2-20b

Shanghai_AI_Laboratory/internlm2-chat-20b

internlm2

internlm2

transformers>=4.38

-

internlm/internlm2-chat-20b

Shanghai_AI_Laboratory/internlm2-chat-20b-sft

internlm2

internlm2

transformers>=4.38

-

internlm/internlm2-chat-20b-sft

Shanghai_AI_Laboratory/internlm2-math-7b

internlm2

internlm2

transformers>=4.38

math

internlm/internlm2-math-7b

Shanghai_AI_Laboratory/internlm2-math-base-7b

internlm2

internlm2

transformers>=4.38

math

internlm/internlm2-math-base-7b

Shanghai_AI_Laboratory/internlm2-math-base-20b

internlm2

internlm2

transformers>=4.38

math

internlm/internlm2-math-base-20b

Shanghai_AI_Laboratory/internlm2-math-20b

internlm2

internlm2

transformers>=4.38

math

internlm/internlm2-math-20b

Shanghai_AI_Laboratory/internlm2_5-1_8b-chat

internlm2

internlm2

transformers>=4.38

-

internlm/internlm2_5-1_8b-chat

Shanghai_AI_Laboratory/internlm2_5-1_8b

internlm2

internlm2

transformers>=4.38

-

internlm/internlm2_5-1_8b

Shanghai_AI_Laboratory/internlm2_5-7b

internlm2

internlm2

transformers>=4.38

-

internlm/internlm2_5-7b

Shanghai_AI_Laboratory/internlm2_5-7b-chat

internlm2

internlm2

transformers>=4.38

-

internlm/internlm2_5-7b-chat

Shanghai_AI_Laboratory/internlm2_5-7b-chat-1m

internlm2

internlm2

transformers>=4.38

-

internlm/internlm2_5-7b-chat-1m

Shanghai_AI_Laboratory/internlm2_5-20b

internlm2

internlm2

transformers>=4.38

-

internlm/internlm2_5-20b

Shanghai_AI_Laboratory/internlm2_5-20b-chat

internlm2

internlm2

transformers>=4.38

-

internlm/internlm2_5-20b-chat

Shanghai_AI_Laboratory/internlm3-8b-instruct

internlm3

internlm2

transformers>=4.48

-

internlm/internlm3-8b-instruct

deepseek-ai/deepseek-llm-7b-base

deepseek

deepseek

-

-

deepseek-ai/deepseek-llm-7b-base

deepseek-ai/deepseek-llm-7b-chat

deepseek

deepseek

-

-

deepseek-ai/deepseek-llm-7b-chat

deepseek-ai/deepseek-llm-67b-base

deepseek

deepseek

-

-

deepseek-ai/deepseek-llm-67b-base

deepseek-ai/deepseek-llm-67b-chat

deepseek

deepseek

-

-

deepseek-ai/deepseek-llm-67b-chat

deepseek-ai/deepseek-math-7b-base

deepseek

deepseek

-

math

deepseek-ai/deepseek-math-7b-base

deepseek-ai/deepseek-math-7b-instruct

deepseek

deepseek

-

math

deepseek-ai/deepseek-math-7b-instruct

deepseek-ai/deepseek-math-7b-rl

deepseek

deepseek

-

math

deepseek-ai/deepseek-math-7b-rl

deepseek-ai/deepseek-coder-1.3b-base

deepseek

deepseek

-

coding

deepseek-ai/deepseek-coder-1.3b-base

deepseek-ai/deepseek-coder-1.3b-instruct

deepseek

deepseek

-

coding

deepseek-ai/deepseek-coder-1.3b-instruct

deepseek-ai/deepseek-coder-6.7b-base

deepseek

deepseek

-

coding

deepseek-ai/deepseek-coder-6.7b-base

deepseek-ai/deepseek-coder-6.7b-instruct

deepseek

deepseek

-

coding

deepseek-ai/deepseek-coder-6.7b-instruct

deepseek-ai/deepseek-coder-33b-base

deepseek

deepseek

-

coding

deepseek-ai/deepseek-coder-33b-base

deepseek-ai/deepseek-coder-33b-instruct

deepseek

deepseek

-

coding

deepseek-ai/deepseek-coder-33b-instruct

deepseek-ai/deepseek-moe-16b-chat

deepseek_moe

deepseek

-

-

deepseek-ai/deepseek-moe-16b-chat

deepseek-ai/deepseek-moe-16b-base

deepseek_moe

deepseek

-

-

deepseek-ai/deepseek-moe-16b-base

deepseek-ai/DeepSeek-Coder-V2-Instruct

deepseek_v2

deepseek

transformers>=4.39.3

-

deepseek-ai/DeepSeek-Coder-V2-Instruct

deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct

deepseek_v2

deepseek

transformers>=4.39.3

-

deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct

deepseek-ai/DeepSeek-Coder-V2-Base

deepseek_v2

deepseek

transformers>=4.39.3

-

deepseek-ai/DeepSeek-Coder-V2-Base

deepseek-ai/DeepSeek-Coder-V2-Lite-Base

deepseek_v2

deepseek

transformers>=4.39.3

-

deepseek-ai/DeepSeek-Coder-V2-Lite-Base

deepseek-ai/DeepSeek-V2-Lite

deepseek_v2

deepseek

transformers>=4.39.3

-

deepseek-ai/DeepSeek-V2-Lite

deepseek-ai/DeepSeek-V2-Lite-Chat

deepseek_v2

deepseek

transformers>=4.39.3

-

deepseek-ai/DeepSeek-V2-Lite-Chat

deepseek-ai/DeepSeek-V2

deepseek_v2

deepseek

transformers>=4.39.3

-

deepseek-ai/DeepSeek-V2

deepseek-ai/DeepSeek-V2-Chat

deepseek_v2

deepseek

transformers>=4.39.3

-

deepseek-ai/DeepSeek-V2-Chat

deepseek-ai/DeepSeek-V2.5

deepseek_v2_5

deepseek_v2_5

transformers>=4.39.3

-

deepseek-ai/DeepSeek-V2.5

deepseek-ai/DeepSeek-V2.5-1210

deepseek_v2_5

deepseek_v2_5

transformers>=4.39.3

-

deepseek-ai/DeepSeek-V2.5-1210

deepseek-ai/DeepSeek-V3-Base

deepseek_v2_5

deepseek_v2_5

transformers>=4.39.3

-

deepseek-ai/DeepSeek-V3-Base

deepseek-ai/DeepSeek-V3

deepseek_v2_5

deepseek_v2_5

transformers>=4.39.3

-

deepseek-ai/DeepSeek-V3

deepseek-ai/DeepSeek-V3-0324

deepseek_v2_5

deepseek_v2_5

transformers>=4.39.3

-

deepseek-ai/DeepSeek-V3-0324

cognitivecomputations/DeepSeek-V3-awq

deepseek_v2_5

deepseek_v2_5

transformers>=4.39.3

-

cognitivecomputations/DeepSeek-V3-AWQ

cognitivecomputations/DeepSeek-V3-0324-AWQ

deepseek_v2_5

deepseek_v2_5

transformers>=4.39.3

-

cognitivecomputations/DeepSeek-V3-0324-AWQ

deepseek-ai/DeepSeek-Prover-V2-7B

deepseek_v2_5

deepseek_v2_5

transformers>=4.39.3

-

deepseek-ai/DeepSeek-Prover-V2-7B

deepseek-ai/DeepSeek-Prover-V2-671B

deepseek_v2_5

deepseek_v2_5

transformers>=4.39.3

-

deepseek-ai/DeepSeek-Prover-V2-671B

unsloth/DeepSeek-V3-bf16

deepseek_v2_5

deepseek_v2_5

transformers>=4.39.3

-

unsloth/DeepSeek-V3-bf16

unsloth/DeepSeek-V3-0324-BF16

deepseek_v2_5

deepseek_v2_5

transformers>=4.39.3

-

unsloth/DeepSeek-V3-0324-BF16

unsloth/DeepSeek-Prover-V2-671B-BF16

deepseek_v2_5

deepseek_v2_5

transformers>=4.39.3

-

unsloth/DeepSeek-Prover-V2-671B-BF16

deepseek-ai/DeepSeek-R1

deepseek_r1

deepseek_r1

transformers>=4.39.3

-

deepseek-ai/DeepSeek-R1

deepseek-ai/DeepSeek-R1-Zero

deepseek_r1

deepseek_r1

transformers>=4.39.3

-

deepseek-ai/DeepSeek-R1-Zero

deepseek-ai/DeepSeek-R1-0528

deepseek_r1

deepseek_r1

transformers>=4.39.3

-

deepseek-ai/DeepSeek-R1-0528

cognitivecomputations/DeepSeek-R1-awq

deepseek_r1

deepseek_r1

transformers>=4.39.3

-

cognitivecomputations/DeepSeek-R1-AWQ

cognitivecomputations/DeepSeek-R1-0528-AWQ

deepseek_r1

deepseek_r1

transformers>=4.39.3

-

cognitivecomputations/DeepSeek-R1-0528-AWQ

unsloth/DeepSeek-R1-BF16

deepseek_r1

deepseek_r1

transformers>=4.39.3

-

unsloth/DeepSeek-R1-BF16

unsloth/DeepSeek-R1-Zero-BF16

deepseek_r1

deepseek_r1

transformers>=4.39.3

-

unsloth/DeepSeek-R1-Zero-BF16

unsloth/DeepSeek-R1-0528-BF16

deepseek_r1

deepseek_r1

transformers>=4.39.3

-

unsloth/DeepSeek-R1-0528-BF16

deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

deepseek_r1_distill

deepseek_r1

transformers>=4.37

-

deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

deepseek-ai/DeepSeek-R1-Distill-Qwen-7B

deepseek_r1_distill

deepseek_r1

transformers>=4.37

-

deepseek-ai/DeepSeek-R1-Distill-Qwen-7B

deepseek-ai/DeepSeek-R1-Distill-Qwen-14B

deepseek_r1_distill

deepseek_r1

transformers>=4.37

-

deepseek-ai/DeepSeek-R1-Distill-Qwen-14B

deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

deepseek_r1_distill

deepseek_r1

transformers>=4.37

-

deepseek-ai/DeepSeek-R1-Distill-Qwen-32B

iic/QwenLong-L1-32B

deepseek_r1_distill

deepseek_r1

transformers>=4.37

-

Tongyi-Zhiwen/QwenLong-L1-32B

deepseek-ai/DeepSeek-R1-Distill-Llama-8B

deepseek_r1_distill

deepseek_r1

-

-

deepseek-ai/DeepSeek-R1-Distill-Llama-8B

deepseek-ai/DeepSeek-R1-Distill-Llama-70B

deepseek_r1_distill

deepseek_r1

-

-

deepseek-ai/DeepSeek-R1-Distill-Llama-70B

deepseek-ai/DeepSeek-R1-0528-Qwen3-8B

deepseek_r1_distill

deepseek_r1

-

-

deepseek-ai/DeepSeek-R1-0528-Qwen3-8B

deepseek-ai/DeepSeek-V3.1-Base

deepseek_v3_1

deepseek_v3_1

transformers>=4.39.3

-

deepseek-ai/DeepSeek-V3.1-Base

deepseek-ai/DeepSeek-V3.1

deepseek_v3_1

deepseek_v3_1

transformers>=4.39.3

-

deepseek-ai/DeepSeek-V3.1

deepseek-ai/DeepSeek-V3.1-Terminus

deepseek_v3_1

deepseek_v3_1

transformers>=4.39.3

-

deepseek-ai/DeepSeek-V3.1-Terminus

OpenBuddy/openbuddy-llama-65b-v8-bf16

openbuddy_llama

openbuddy

-

-

OpenBuddy/openbuddy-llama-65b-v8-bf16

OpenBuddy/openbuddy-llama2-13b-v8.1-fp16

openbuddy_llama

openbuddy

-

-

OpenBuddy/openbuddy-llama2-13b-v8.1-fp16

OpenBuddy/openbuddy-llama2-70b-v10.1-bf16

openbuddy_llama

openbuddy

-

-

OpenBuddy/openbuddy-llama2-70b-v10.1-bf16

OpenBuddy/openbuddy-deepseek-67b-v15.2

openbuddy_llama

openbuddy

-

-

OpenBuddy/openbuddy-deepseek-67b-v15.2

OpenBuddy/openbuddy-llama3-8b-v21.1-8k

openbuddy_llama3

openbuddy2

-

-

OpenBuddy/openbuddy-llama3-8b-v21.1-8k

OpenBuddy/openbuddy-llama3-70b-v21.1-8k

openbuddy_llama3

openbuddy2

-

-

OpenBuddy/openbuddy-llama3-70b-v21.1-8k

OpenBuddy/openbuddy-yi1.5-34b-v21.3-32k

openbuddy_llama3

openbuddy2

-

-

OpenBuddy/openbuddy-yi1.5-34b-v21.3-32k

OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k

openbuddy_llama3

openbuddy2

transformers>=4.43

-

OpenBuddy/openbuddy-llama3.1-8b-v22.1-131k

OpenBuddy/openbuddy-nemotron-70b-v23.2-131k

openbuddy_llama3

openbuddy2

transformers>=4.43

-

OpenBuddy/openbuddy-nemotron-70b-v23.2-131k

OpenBuddy/openbuddy-llama3.3-70b-v24.3-131k

openbuddy_llama3

openbuddy2

transformers>=4.45

-

OpenBuddy/openbuddy-llama3.3-70b-v24.3-131k

OpenBuddy/openbuddy-mistral-7b-v17.1-32k

openbuddy_mistral

openbuddy

transformers>=4.34

-

OpenBuddy/openbuddy-mistral-7b-v17.1-32k

OpenBuddy/openbuddy-zephyr-7b-v14.1

openbuddy_mistral

openbuddy

transformers>=4.34

-

OpenBuddy/openbuddy-zephyr-7b-v14.1

OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k

openbuddy_mixtral

openbuddy

transformers>=4.36

-

OpenBuddy/openbuddy-mixtral-7bx8-v18.1-32k

baichuan-inc/Baichuan-13B-Chat

baichuan

baichuan

transformers<4.34

-

baichuan-inc/Baichuan-13B-Chat

baichuan-inc/Baichuan-13B-Base

baichuan

baichuan

transformers<4.34

-

baichuan-inc/Baichuan-13B-Base

baichuan-inc/baichuan-7B

baichuan

baichuan

transformers<4.34

-

baichuan-inc/Baichuan-7B

baichuan-inc/Baichuan2-7B-Chat

baichuan2

baichuan

-

-

baichuan-inc/Baichuan2-7B-Chat

baichuan-inc/Baichuan2-7B-Base

baichuan2

baichuan

-

-

baichuan-inc/Baichuan2-7B-Base

baichuan-inc/Baichuan2-13B-Chat

baichuan2

baichuan

-

-

baichuan-inc/Baichuan2-13B-Chat

baichuan-inc/Baichuan2-13B-Base

baichuan2

baichuan

-

-

baichuan-inc/Baichuan2-13B-Base

baichuan-inc/Baichuan2-7B-Chat-4bits

baichuan2

baichuan

bitsandbytes<0.41.2, accelerate<0.26

-

baichuan-inc/Baichuan2-7B-Chat-4bits

baichuan-inc/Baichuan2-13B-Chat-4bits

baichuan2

baichuan

bitsandbytes<0.41.2, accelerate<0.26

-

baichuan-inc/Baichuan2-13B-Chat-4bits

baichuan-inc/Baichuan-M1-14B-Instruct

baichuan_m1

baichuan_m1

transformers>=4.48

-

baichuan-inc/Baichuan-M1-14B-Instruct

OpenBMB/MiniCPM-2B-sft-fp32

minicpm

minicpm

transformers>=4.36.0

-

openbmb/MiniCPM-2B-sft-fp32

OpenBMB/MiniCPM-2B-dpo-fp32

minicpm

minicpm

transformers>=4.36.0

-

openbmb/MiniCPM-2B-dpo-fp32

OpenBMB/MiniCPM-1B-sft-bf16

minicpm

minicpm

transformers>=4.36.0

-

openbmb/MiniCPM-1B-sft-bf16

OpenBMB/MiniCPM-2B-128k

minicpm_chatml

chatml

transformers>=4.36

-

openbmb/MiniCPM-2B-128k

OpenBMB/MiniCPM4-0.5B

minicpm_chatml

chatml

transformers>=4.36

-

openbmb/MiniCPM4-0.5B

OpenBMB/MiniCPM4-8B

minicpm_chatml

chatml

transformers>=4.36

-

openbmb/MiniCPM4-8B

OpenBMB/MiniCPM3-4B

minicpm3

chatml

transformers>=4.36

-

openbmb/MiniCPM3-4B

OpenBMB/MiniCPM-MoE-8x2B

minicpm_moe

minicpm

transformers>=4.36

-

openbmb/MiniCPM-MoE-8x2B

TeleAI/TeleChat-7B

telechat

telechat

-

-

Tele-AI/telechat-7B

TeleAI/TeleChat-12B

telechat

telechat

-

-

Tele-AI/TeleChat-12B

TeleAI/TeleChat-12B-v2

telechat

telechat

-

-

Tele-AI/TeleChat-12B-v2

TeleAI/TeleChat-52B

telechat

telechat

-

-

TeleAI/TeleChat-52B

swift/TeleChat-12B-V2-GPTQ-Int4

telechat

telechat

-

-

-

TeleAI/TeleChat2-35B

telechat

telechat

-

-

Tele-AI/TeleChat2-35B

TeleAI/TeleChat2-115B

telechat

telechat

-

-

Tele-AI/TeleChat2-115B

TeleAI/TeleChat2-3B

telechat2

telechat2

-

-

Tele-AI/TeleChat2-3B

TeleAI/TeleChat2-7B-32K

telechat2

telechat2

-

-

Tele-AI/TeleChat2-7B-32K

TeleAI/TeleChat2-35B-32K

telechat2

telechat2

-

-

Tele-AI/TeleChat2-35B-32K

TeleAI/TeleChat2-35B-Nov

telechat2

telechat2

-

-

Tele-AI/TeleChat2-35B-Nov

AI-ModelScope/Mistral-7B-Instruct-v0.1

mistral

llama

transformers>=4.34

-

mistralai/Mistral-7B-Instruct-v0.1

AI-ModelScope/Mistral-7B-Instruct-v0.2

mistral

llama

transformers>=4.34

-

mistralai/Mistral-7B-Instruct-v0.2

LLM-Research/Mistral-7B-Instruct-v0.3

mistral

llama

transformers>=4.34

-

mistralai/Mistral-7B-Instruct-v0.3

AI-ModelScope/Mistral-7B-v0.1

mistral

llama

transformers>=4.34

-

mistralai/Mistral-7B-v0.1

AI-ModelScope/Mistral-7B-v0.2-hf

mistral

llama

transformers>=4.34

-

alpindale/Mistral-7B-v0.2-hf

swift/Codestral-22B-v0.1

mistral

llama

transformers>=4.34

-

mistralai/Codestral-22B-v0.1

mistralai/Devstral-Small-2505

devstral

devstral

transformers>=4.43, mistral-common>=1.5.5

-

mistralai/Devstral-Small-2505

modelscope/zephyr-7b-beta

zephyr

zephyr

transformers>=4.34

-

HuggingFaceH4/zephyr-7b-beta

AI-ModelScope/Mixtral-8x7B-Instruct-v0.1

mixtral

llama

transformers>=4.36

-

mistralai/Mixtral-8x7B-Instruct-v0.1

AI-ModelScope/Mixtral-8x7B-v0.1

mixtral

llama

transformers>=4.36

-

mistralai/Mixtral-8x7B-v0.1

AI-ModelScope/Mixtral-8x22B-v0.1

mixtral

llama

transformers>=4.36

-

mistral-community/Mixtral-8x22B-v0.1

AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf

mixtral

llama

transformers>=4.38, aqlm, torch>=2.2.0

-

ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf

AI-ModelScope/Mistral-Small-Instruct-2409

mistral_nemo

mistral_nemo

transformers>=4.43

-

mistralai/Mistral-Small-Instruct-2409

LLM-Research/Mistral-Large-Instruct-2407

mistral_nemo

mistral_nemo

transformers>=4.43

-

mistralai/Mistral-Large-Instruct-2407

AI-ModelScope/Mistral-Nemo-Base-2407

mistral_nemo

mistral_nemo

transformers>=4.43

-

mistralai/Mistral-Nemo-Base-2407

AI-ModelScope/Mistral-Nemo-Instruct-2407

mistral_nemo

mistral_nemo

transformers>=4.43

-

mistralai/Mistral-Nemo-Instruct-2407

AI-ModelScope/Ministral-8B-Instruct-2410

mistral_nemo

mistral_nemo

transformers>=4.46

-

mistralai/Ministral-8B-Instruct-2410

mistralai/Mistral-Small-24B-Base-2501

mistral_2501

mistral_2501

-

-

mistralai/Mistral-Small-24B-Base-2501

mistralai/Mistral-Small-24B-Instruct-2501

mistral_2501

mistral_2501

-

-

mistralai/Mistral-Small-24B-Instruct-2501

AI-ModelScope/WizardLM-2-7B-AWQ

wizardlm2

wizardlm2

transformers>=4.34

-

MaziyarPanahi/WizardLM-2-7B-AWQ

AI-ModelScope/WizardLM-2-8x22B

wizardlm2_moe

wizardlm2_moe

transformers>=4.36

-

alpindale/WizardLM-2-8x22B

AI-ModelScope/phi-2

phi2

default

-

-

microsoft/phi-2

LLM-Research/Phi-3-small-8k-instruct

phi3_small

phi3

transformers>=4.36

-

microsoft/Phi-3-small-8k-instruct

LLM-Research/Phi-3-small-128k-instruct

phi3_small

phi3

transformers>=4.36

-

microsoft/Phi-3-small-128k-instruct

LLM-Research/Phi-3-mini-4k-instruct

phi3

phi3

transformers>=4.36

-

microsoft/Phi-3-mini-4k-instruct

LLM-Research/Phi-3-mini-128k-instruct

phi3

phi3

transformers>=4.36

-

microsoft/Phi-3-mini-128k-instruct

LLM-Research/Phi-3-medium-4k-instruct

phi3

phi3

transformers>=4.36

-

microsoft/Phi-3-medium-4k-instruct

LLM-Research/Phi-3-medium-128k-instruct

phi3

phi3

transformers>=4.36

-

microsoft/Phi-3-medium-128k-instruct

LLM-Research/Phi-3.5-mini-instruct

phi3

phi3

transformers>=4.36

-

microsoft/Phi-3.5-mini-instruct

LLM-Research/Phi-4-mini-instruct

phi3

phi3

transformers>=4.36

-

microsoft/Phi-4-mini-instruct

LLM-Research/Phi-3.5-MoE-instruct

phi3_moe

phi3

transformers>=4.36

-

microsoft/Phi-3.5-MoE-instruct

LLM-Research/phi-4

phi4

phi4

transformers>=4.36

-

microsoft/phi-4

MiniMax/MiniMax-Text-01

minimax

minimax

-

-

MiniMaxAI/MiniMax-Text-01

MiniMax/MiniMax-M1-40k

minimax_m1

minimax_m1

-

-

MiniMaxAI/MiniMax-M1-40k

MiniMax/MiniMax-M1-80k

minimax_m1

minimax_m1

-

-

MiniMaxAI/MiniMax-M1-80k

AI-ModelScope/gemma-2b-it

gemma

gemma

transformers>=4.38

-

google/gemma-2b-it

AI-ModelScope/gemma-2b

gemma

gemma

transformers>=4.38

-

google/gemma-2b

AI-ModelScope/gemma-7b

gemma

gemma

transformers>=4.38

-

google/gemma-7b

AI-ModelScope/gemma-7b-it

gemma

gemma

transformers>=4.38

-

google/gemma-7b-it

LLM-Research/gemma-2-2b-it

gemma2

gemma

transformers>=4.42

-

google/gemma-2-2b-it

LLM-Research/gemma-2-2b

gemma2

gemma

transformers>=4.42

-

google/gemma-2-2b

LLM-Research/gemma-2-9b

gemma2

gemma

transformers>=4.42

-

google/gemma-2-9b

LLM-Research/gemma-2-9b-it

gemma2

gemma

transformers>=4.42

-

google/gemma-2-9b-it

LLM-Research/gemma-2-27b

gemma2

gemma

transformers>=4.42

-

google/gemma-2-27b

LLM-Research/gemma-2-27b-it

gemma2

gemma

transformers>=4.42

-

google/gemma-2-27b-it

LLM-Research/gemma-3-1b-pt

gemma3_text

gemma3_text

transformers>=4.49

-

google/gemma-3-1b-pt

LLM-Research/gemma-3-1b-it

gemma3_text

gemma3_text

transformers>=4.49

-

google/gemma-3-1b-it

google/gemma-3-270m

gemma3_text

gemma3_text

transformers>=4.49

-

google/gemma-3-270m

google/gemma-3-270m-it

gemma3_text

gemma3_text

transformers>=4.49

-

google/gemma-3-270m-it

skywork/Skywork-13B-base

skywork

skywork

-

-

skywork/Skywork-13B-base

skywork/Skywork-13B-chat

skywork

skywork

-

-

-

AI-ModelScope/Skywork-o1-Open-Llama-3.1-8B

skywork_o1

skywork_o1

transformers>=4.43

-

Skywork/Skywork-o1-Open-Llama-3.1-8B

inclusionAI/Ling-lite

ling

ling

-

-

inclusionAI/Ling-lite

inclusionAI/Ling-plus

ling

ling

-

-

inclusionAI/Ling-plus

inclusionAI/Ling-lite-base

ling

ling

-

-

inclusionAI/Ling-lite-base

inclusionAI/Ling-plus-base

ling

ling

-

-

inclusionAI/Ling-plus-base

inclusionAI/Ling-mini-2.0

ling2

ling2

-

-

inclusionAI/Ling-mini-2.0

inclusionAI/Ling-mini-base-2.0

ling2

ling2

-

-

inclusionAI/Ling-mini-base-2.0

inclusionAI/Ring-mini-2.0

ring2

ring2

-

-

inclusionAI/Ring-mini-2.0

IEITYuan/Yuan2.0-2B-hf

yuan2

yuan

-

-

IEITYuan/Yuan2-2B-hf

IEITYuan/Yuan2.0-51B-hf

yuan2

yuan

-

-

IEITYuan/Yuan2-51B-hf

IEITYuan/Yuan2.0-102B-hf

yuan2

yuan

-

-

IEITYuan/Yuan2-102B-hf

IEITYuan/Yuan2-2B-Janus-hf

yuan2

yuan

-

-

IEITYuan/Yuan2-2B-Janus-hf

IEITYuan/Yuan2-M32-hf

yuan2

yuan

-

-

IEITYuan/Yuan2-M32-hf

OrionStarAI/Orion-14B-Chat

orion

orion

-

-

OrionStarAI/Orion-14B-Chat

OrionStarAI/Orion-14B-Base

orion

orion

-

-

OrionStarAI/Orion-14B-Base

xverse/XVERSE-7B-Chat

xverse

xverse

-

-

xverse/XVERSE-7B-Chat

xverse/XVERSE-7B

xverse

xverse

-

-

xverse/XVERSE-7B

xverse/XVERSE-13B

xverse

xverse

-

-

xverse/XVERSE-13B

xverse/XVERSE-13B-Chat

xverse

xverse

-

-

xverse/XVERSE-13B-Chat

xverse/XVERSE-65B

xverse

xverse

-

-

xverse/XVERSE-65B

xverse/XVERSE-65B-2

xverse

xverse

-

-

xverse/XVERSE-65B-2

xverse/XVERSE-65B-Chat

xverse

xverse

-

-

xverse/XVERSE-65B-Chat

xverse/XVERSE-13B-256K

xverse

xverse

-

-

xverse/XVERSE-13B-256K

xverse/XVERSE-MoE-A4.2B

xverse_moe

xverse

-

-

xverse/XVERSE-MoE-A4.2B

damo/nlp_seqgpt-560m

seggpt

default

-

-

DAMO-NLP/SeqGPT-560M

vivo-ai/BlueLM-7B-Chat-32K

bluelm

bluelm

-

-

vivo-ai/BlueLM-7B-Chat-32K

vivo-ai/BlueLM-7B-Chat

bluelm

bluelm

-

-

vivo-ai/BlueLM-7B-Chat

vivo-ai/BlueLM-7B-Base-32K

bluelm

bluelm

-

-

vivo-ai/BlueLM-7B-Base-32K

vivo-ai/BlueLM-7B-Base

bluelm

bluelm

-

-

vivo-ai/BlueLM-7B-Base

AI-ModelScope/c4ai-command-r-v01

c4ai

c4ai

transformers>=4.39

-

CohereForAI/c4ai-command-r-v01

AI-ModelScope/c4ai-command-r-plus

c4ai

c4ai

transformers>=4.39

-

CohereForAI/c4ai-command-r-plus

AI-ModelScope/dbrx-base

dbrx

dbrx

transformers>=4.36

-

databricks/dbrx-base

AI-ModelScope/dbrx-instruct

dbrx

dbrx

transformers>=4.36

-

databricks/dbrx-instruct

colossalai/grok-1-pytorch

grok

default

-

-

hpcai-tech/grok-1

AI-ModelScope/mamba-130m-hf

mamba

default

transformers>=4.39.0

-

state-spaces/mamba-130m-hf

AI-ModelScope/mamba-370m-hf

mamba

default

transformers>=4.39.0

-

state-spaces/mamba-370m-hf

AI-ModelScope/mamba-390m-hf

mamba

default

transformers>=4.39.0

-

state-spaces/mamba-390m-hf

AI-ModelScope/mamba-790m-hf

mamba

default

transformers>=4.39.0

-

state-spaces/mamba-790m-hf

AI-ModelScope/mamba-1.4b-hf

mamba

default

transformers>=4.39.0

-

state-spaces/mamba-1.4b-hf

AI-ModelScope/mamba-2.8b-hf

mamba

default

transformers>=4.39.0

-

state-spaces/mamba-2.8b-hf

damo/nlp_polylm_13b_text_generation

polylm

default

-

-

DAMO-NLP-MT/polylm-13b

AI-ModelScope/aya-expanse-8b

aya

aya

transformers>=4.44.0

-

CohereForAI/aya-expanse-8b

AI-ModelScope/aya-expanse-32b

aya

aya

transformers>=4.44.0

-

CohereForAI/aya-expanse-32b

moonshotai/Moonlight-16B-A3B

moonlight

moonlight

transformers<4.49

-

moonshotai/Moonlight-16B-A3B

moonshotai/Moonlight-16B-A3B-Instruct

moonlight

moonlight

transformers<4.49

-

moonshotai/Moonlight-16B-A3B-Instruct

moonshotai/Kimi-K2-Base

moonlight

moonlight

transformers<4.49

-

moonshotai/Kimi-K2-Base

moonshotai/Kimi-K2-Instruct

moonlight

moonlight

transformers<4.49

-

moonshotai/Kimi-K2-Instruct

moonshotai/Kimi-K2-Instruct-0905

moonlight

moonlight

transformers<4.49

-

moonshotai/Kimi-K2-Instruct-0905

XiaomiMiMo/MiMo-7B-Base

mimo

qwen

transformers>=4.37

-

XiaomiMiMo/MiMo-7B-Base

XiaomiMiMo/MiMo-7B-SFT

mimo

qwen

transformers>=4.37

-

XiaomiMiMo/MiMo-7B-SFT

XiaomiMiMo/MiMo-7B-RL-Zero

mimo

qwen

transformers>=4.37

-

XiaomiMiMo/MiMo-7B-RL-Zero

XiaomiMiMo/MiMo-7B-RL

mimo

qwen

transformers>=4.37

-

XiaomiMiMo/MiMo-7B-RL

XiaomiMiMo/MiMo-7B-RL-0530

mimo_rl

mimo_rl

transformers>=4.37

-

XiaomiMiMo/MiMo-7B-RL-0530

rednote-hilab/dots.llm1.base

dots1

dots1

transformers>=4.53

-

rednote-hilab/dots.llm1.base

rednote-hilab/dots.llm1.inst

dots1

dots1

transformers>=4.53

-

rednote-hilab/dots.llm1.inst

Tencent-Hunyuan/Hunyuan-A13B-Instruct

hunyuan_moe

hunyuan_moe

-

-

tencent/Hunyuan-A13B-Instruct

Tencent-Hunyuan/Hunyuan-0.5B-Instruct

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-0.5B-Instruct

Tencent-Hunyuan/Hunyuan-1.8B-Instruct

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-1.8B-Instruct

Tencent-Hunyuan/Hunyuan-4B-Instruct

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-4B-Instruct

Tencent-Hunyuan/Hunyuan-7B-Instruct

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-7B-Instruct

Tencent-Hunyuan/Hunyuan-0.5B-Pretrain

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-0.5B-Pretrain

Tencent-Hunyuan/Hunyuan-1.8B-Pretrain

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-1.8B-Pretrain

Tencent-Hunyuan/Hunyuan-4B-Pretrain

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-4B-Pretrain

Tencent-Hunyuan/Hunyuan-7B-Pretrain

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-7B-Pretrain

Tencent-Hunyuan/Hunyuan-0.5B-Instruct-FP8

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-0.5B-Instruct-FP8

Tencent-Hunyuan/Hunyuan-1.8B-Instruct-FP8

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-1.8B-Instruct-FP8

Tencent-Hunyuan/Hunyuan-4B-Instruct-FP8

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-4B-Instruct-FP8

Tencent-Hunyuan/Hunyuan-7B-Instruct-FP8

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-7B-Instruct-FP8

Tencent-Hunyuan/Hunyuan-0.5B-Instruct-AWQ-Int4

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-0.5B-Instruct-AWQ-Int4

Tencent-Hunyuan/Hunyuan-1.8B-Instruct-AWQ-Int4

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-1.8B-Instruct-AWQ-Int4

Tencent-Hunyuan/Hunyuan-4B-Instruct-AWQ-Int4

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-4B-Instruct-AWQ-Int4

Tencent-Hunyuan/Hunyuan-7B-Instruct-AWQ-Int4

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-7B-Instruct-AWQ-Int4

Tencent-Hunyuan/Hunyuan-0.5B-Instruct-GPTQ-Int4

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-0.5B-Instruct-GPTQ-Int4

Tencent-Hunyuan/Hunyuan-1.8B-Instruct-GPTQ-Int4

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-1.8B-Instruct-GPTQ-Int4

Tencent-Hunyuan/Hunyuan-4B-Instruct-GPTQ-Int4

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-4B-Instruct-GPTQ-Int4

Tencent-Hunyuan/Hunyuan-7B-Instruct-GPTQ-Int4

hunyuan

hunyuan

transformers>=4.55.0.dev0

-

tencent/Hunyuan-7B-Instruct-GPTQ-Int4

PaddlePaddle/ERNIE-4.5-0.3B-Base-PT

ernie

ernie

-

-

baidu/ERNIE-4.5-0.3B-PT

PaddlePaddle/ERNIE-4.5-0.3B-PT

ernie

ernie

-

-

baidu/ERNIE-4.5-0.3B-PT

PaddlePaddle/ERNIE-4.5-21B-A3B-Base-PT

ernie

ernie

-

-

baidu/ERNIE-4.5-21B-A3B-Base-PT

PaddlePaddle/ERNIE-4.5-21B-A3B-PT

ernie

ernie

-

-

baidu/ERNIE-4.5-21B-A3B-PT

PaddlePaddle/ERNIE-4.5-300B-A47B-Base-PT

ernie

ernie

-

-

baidu/ERNIE-4.5-300B-A47B-Base-PT

PaddlePaddle/ERNIE-4.5-300B-A47B-PT

ernie

ernie

-

-

baidu/ERNIE-4.5-300B-A47B-PT

google/embeddinggemma-300m

gemma_emb

dummy

-

-

google/embeddinggemma-300m

PaddlePaddle/ERNIE-4.5-21B-A3B-Thinking

ernie_thinking

ernie_thinking

-

-

baidu/ERNIE-4.5-21B-A3B-Thinking

meituan-longcat/LongCat-Flash-Chat

longchat

longchat

transformers>=4.54,<4.56

-

meituan-longcat/LongCat-Flash-Chat

meituan-longcat/LongCat-Flash-Chat-FP8

longchat

longchat

transformers>=4.54,<4.56

-

meituan-longcat/LongCat-Flash-Chat-FP8

answerdotai/ModernBERT-base

modern_bert

dummy

transformers>=4.48

bert

answerdotai/ModernBERT-base

answerdotai/ModernBERT-large

modern_bert

dummy

transformers>=4.48

bert

answerdotai/ModernBERT-large

iic/gte-modernbert-base

modern_bert_gte

dummy

transformers>=4.48

bert, embedding

Alibaba-NLP/gte-modernbert-base

iic/gte-reranker-modernbert-base

modern_bert_gte_reranker

bert

transformers>=4.48

bert, reranker

Alibaba-NLP/gte-reranker-modernbert-base

iic/nlp_structbert_backbone_base_std

bert

dummy

-

bert

-

Shanghai_AI_Laboratory/internlm2-1_8b-reward

internlm2_reward

internlm2_reward

transformers>=4.38

-

internlm/internlm2-1_8b-reward

Shanghai_AI_Laboratory/internlm2-7b-reward

internlm2_reward

internlm2_reward

transformers>=4.38

-

internlm/internlm2-7b-reward

Shanghai_AI_Laboratory/internlm2-20b-reward

internlm2_reward

internlm2_reward

transformers>=4.38

-

internlm/internlm2-20b-reward

Qwen/Qwen2-Math-RM-72B

qwen2_reward

qwen

transformers>=4.37

-

Qwen/Qwen2-Math-RM-72B

Qwen/Qwen2.5-Math-PRM-7B

qwen2_5_prm

qwen2_5_math_prm

transformers>=4.37

-

Qwen/Qwen2.5-Math-PRM-7B

Qwen/Qwen2.5-Math-7B-PRM800K

qwen2_5_prm

qwen2_5_math_prm

transformers>=4.37

-

Qwen/Qwen2.5-Math-7B-PRM800K

Qwen/Qwen2.5-Math-PRM-72B

qwen2_5_prm

qwen2_5_math_prm

transformers>=4.37

-

Qwen/Qwen2.5-Math-PRM-72B

Qwen/Qwen2.5-Math-RM-72B

qwen2_5_math_reward

qwen2_5_math

transformers>=4.37

-

Qwen/Qwen2.5-Math-RM-72B

AI-ModelScope/Skywork-Reward-Llama-3.1-8B

llama3_2_reward

llama3_2

transformers>=4.43

-

Skywork/Skywork-Reward-Llama-3.1-8B

AI-ModelScope/Skywork-Reward-Llama-3.1-8B-v0.2

llama3_2_reward

llama3_2

transformers>=4.43

-

Skywork/Skywork-Reward-Llama-3.1-8B-v0.2

AI-ModelScope/GRM_Llama3.1_8B_rewardmodel-ft

llama3_2_reward

llama3_2

transformers>=4.43

-

Ray2333/GRM_Llama3.1_8B_rewardmodel-ft

AI-ModelScope/GRM-llama3.2-3B-rewardmodel-ft

llama3_2_reward

llama3_2

transformers>=4.43

-

Ray2333/GRM-llama3.2-3B-rewardmodel-ft

AI-ModelScope/Skywork-Reward-Gemma-2-27B

gemma_reward

gemma

transformers>=4.42

-

Skywork/Skywork-Reward-Gemma-2-27B

AI-ModelScope/Skywork-Reward-Gemma-2-27B-v0.2

gemma_reward

gemma

transformers>=4.42

-

Skywork/Skywork-Reward-Gemma-2-27B-v0.2
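
The Requires column mixes two uses of the comma: it separates packages, and it also separates multiple version constraints on one package (for example `transformers>=4.45,<4.49, librosa`). The sketch below shows one way such a cell could be split into packages and their constraints. It is an illustration only: `parse_requires` is a hypothetical helper, not part of ms-swift, and it assumes that any entry starting with a comparison operator belongs to the package named just before it.

```python
import re

# Comparison operators that can start a version constraint.
_OPERATORS = ("<=", ">=", "==", "!=", "<", ">")

# A package name followed by an optional constraint, e.g. "moviepy<2".
_TOKEN = re.compile(r"^([A-Za-z0-9_.\-]+)\s*(.*)$")


def parse_requires(cell):
    """Map each package in a Requires cell to its list of version constraints.

    Hypothetical helper for illustration; not an ms-swift API.
    """
    packages = {}
    current = None  # most recently seen package name
    if cell.strip() in ("", "-"):
        return packages  # "-" marks a row with no extra dependencies
    for token in (part.strip() for part in cell.split(",")):
        if not token:
            continue
        if token.startswith(_OPERATORS):
            # A bare constraint such as "<4.49" continues the previous package.
            if current is None:
                raise ValueError(f"constraint without a package: {token!r}")
            packages[current].append(token)
            continue
        match = _TOKEN.match(token)
        if match is None:
            raise ValueError(f"unrecognised entry: {token!r}")
        current, constraint = match.group(1), match.group(2).strip()
        packages.setdefault(current, [])
        if constraint:
            packages[current].append(constraint)
    return packages


print(parse_requires("transformers>=4.45,<4.49, librosa"))
# {'transformers': ['>=4.45', '<4.49'], 'librosa': []}
```

The resulting constraint strings can then be handed to whatever version-checking tool is in use. Comparing them against installed versions is deliberately left out of this sketch.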

Multimodal Large Models

Model ID

Model Type

Default Template

Requires

Support Megatron

Tags

HF Model ID

Qwen/Qwen-VL-Chat

qwen_vl

qwen_vl

-

vision

Qwen/Qwen-VL-Chat

Qwen/Qwen-VL

qwen_vl

qwen_vl

-

vision

Qwen/Qwen-VL

Qwen/Qwen-VL-Chat-Int4

qwen_vl

qwen_vl

-

vision

Qwen/Qwen-VL-Chat-Int4

Qwen/Qwen-Audio-Chat

qwen_audio

qwen_audio

-

audio

Qwen/Qwen-Audio-Chat

Qwen/Qwen-Audio

qwen_audio

qwen_audio

-

audio

Qwen/Qwen-Audio

Qwen/Qwen2-VL-2B-Instruct

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-2B-Instruct

Qwen/Qwen2-VL-7B-Instruct

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-7B-Instruct

Qwen/Qwen2-VL-72B-Instruct

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-72B-Instruct

Qwen/Qwen2-VL-2B

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-2B

Qwen/Qwen2-VL-7B

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-7B

Qwen/Qwen2-VL-72B

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-72B

Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int4

Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4

Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4

Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-2B-Instruct-GPTQ-Int8

Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int8

Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int8

Qwen/Qwen2-VL-2B-Instruct-AWQ

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-2B-Instruct-AWQ

Qwen/Qwen2-VL-7B-Instruct-AWQ

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-7B-Instruct-AWQ

Qwen/Qwen2-VL-72B-Instruct-AWQ

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2-VL-72B-Instruct-AWQ

bytedance-research/UI-TARS-2B-SFT

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

bytedance-research/UI-TARS-2B-SFT

bytedance-research/UI-TARS-7B-SFT

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

bytedance-research/UI-TARS-7B-SFT

bytedance-research/UI-TARS-7B-DPO

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

bytedance-research/UI-TARS-7B-DPO

bytedance-research/UI-TARS-72B-SFT

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

bytedance-research/UI-TARS-72B-SFT

bytedance-research/UI-TARS-72B-DPO

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

bytedance-research/UI-TARS-72B-DPO

allenai/olmOCR-7B-0225-preview

qwen2_vl

qwen2_vl

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

allenai/olmOCR-7B-0225-preview

Qwen/Qwen2.5-VL-3B-Instruct

qwen2_5_vl

qwen2_5_vl

transformers>=4.49, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2.5-VL-3B-Instruct

Qwen/Qwen2.5-VL-7B-Instruct

qwen2_5_vl

qwen2_5_vl

transformers>=4.49, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2.5-VL-7B-Instruct

Qwen/Qwen2.5-VL-32B-Instruct

qwen2_5_vl

qwen2_5_vl

transformers>=4.49, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2.5-VL-32B-Instruct

Qwen/Qwen2.5-VL-72B-Instruct

qwen2_5_vl

qwen2_5_vl

transformers>=4.49, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2.5-VL-72B-Instruct

Qwen/Qwen2.5-VL-3B-Instruct-AWQ

qwen2_5_vl

qwen2_5_vl

transformers>=4.49, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2.5-VL-3B-Instruct-AWQ

Qwen/Qwen2.5-VL-7B-Instruct-AWQ

qwen2_5_vl

qwen2_5_vl

transformers>=4.49, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2.5-VL-7B-Instruct-AWQ

Qwen/Qwen2.5-VL-32B-Instruct-AWQ

qwen2_5_vl

qwen2_5_vl

transformers>=4.49, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2.5-VL-32B-Instruct-AWQ

Qwen/Qwen2.5-VL-72B-Instruct-AWQ

qwen2_5_vl

qwen2_5_vl

transformers>=4.49, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/Qwen2.5-VL-72B-Instruct-AWQ

Qwen/Qwen2.5-Omni-3B

qwen2_5_omni

qwen2_5_omni

transformers>=4.50, soundfile, qwen_omni_utils, decord

vision, video, audio

Qwen/Qwen2.5-Omni-3B

Qwen/Qwen2.5-Omni-7B

qwen2_5_omni

qwen2_5_omni

transformers>=4.50, soundfile, qwen_omni_utils, decord

vision, video, audio

Qwen/Qwen2.5-Omni-7B

Qwen/Qwen3-Omni-30B-A3B-Instruct

qwen3_omni

qwen3_omni

transformers>=4.57.dev0, soundfile, decord, qwen_omni_utils

vision, video, audio

Qwen/Qwen3-Omni-30B-A3B-Instruct

Qwen/Qwen3-Omni-30B-A3B-Thinking

qwen3_omni

qwen3_omni

transformers>=4.57.dev0, soundfile, decord, qwen_omni_utils

vision, video, audio

Qwen/Qwen3-Omni-30B-A3B-Thinking

Qwen/Qwen3-Omni-30B-A3B-Captioner

qwen3_omni

qwen3_omni

transformers>=4.57.dev0, soundfile, decord, qwen_omni_utils

vision, video, audio

Qwen/Qwen3-Omni-30B-A3B-Captioner

Qwen/Qwen2-Audio-7B-Instruct

qwen2_audio

qwen2_audio

transformers>=4.45,<4.49, librosa

audio

Qwen/Qwen2-Audio-7B-Instruct

Qwen/Qwen2-Audio-7B

qwen2_audio

qwen2_audio

transformers>=4.45,<4.49, librosa

audio

Qwen/Qwen2-Audio-7B

Qwen/Qwen3-VL-2B-Instruct

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-2B-Instruct

Qwen/Qwen3-VL-2B-Thinking

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-2B-Thinking

Qwen/Qwen3-VL-2B-Instruct-FP8

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-2B-Instruct-FP8

Qwen/Qwen3-VL-2B-Thinking-FP8

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-2B-Thinking-FP8

Qwen/Qwen3-VL-4B-Instruct

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-4B-Instruct

Qwen/Qwen3-VL-4B-Thinking

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-4B-Thinking

Qwen/Qwen3-VL-4B-Instruct-FP8

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-4B-Instruct-FP8

Qwen/Qwen3-VL-4B-Thinking-FP8

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-4B-Thinking-FP8

Qwen/Qwen3-VL-8B-Instruct

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-8B-Instruct

Qwen/Qwen3-VL-8B-Thinking

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-8B-Thinking

Qwen/Qwen3-VL-8B-Instruct-FP8

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-8B-Instruct-FP8

Qwen/Qwen3-VL-8B-Thinking-FP8

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-8B-Thinking-FP8

Qwen/Qwen3-VL-32B-Instruct

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-32B-Instruct

Qwen/Qwen3-VL-32B-Thinking

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-32B-Thinking

Qwen/Qwen3-VL-32B-Instruct-FP8

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-32B-Instruct-FP8

Qwen/Qwen3-VL-32B-Thinking-FP8

qwen3_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-32B-Thinking-FP8

Qwen/Qwen3-VL-30B-A3B-Instruct

qwen3_moe_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-30B-A3B-Instruct

Qwen/Qwen3-VL-30B-A3B-Thinking

qwen3_moe_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-30B-A3B-Thinking

Qwen/Qwen3-VL-30B-A3B-Instruct-FP8

qwen3_moe_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-30B-A3B-Instruct-FP8

Qwen/Qwen3-VL-30B-A3B-Thinking-FP8

qwen3_moe_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-30B-A3B-Thinking-FP8

Qwen/Qwen3-VL-235B-A22B-Instruct

qwen3_moe_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-235B-A22B-Instruct

Qwen/Qwen3-VL-235B-A22B-Thinking

qwen3_moe_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-235B-A22B-Thinking

Qwen/Qwen3-VL-235B-A22B-Instruct-FP8

qwen3_moe_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-235B-A22B-Instruct-FP8

Qwen/Qwen3-VL-235B-A22B-Thinking-FP8

qwen3_moe_vl

qwen3_vl

transformers>=4.57, qwen_vl_utils>=0.0.14, decord

vision, video

Qwen/Qwen3-VL-235B-A22B-Thinking-FP8

Qwen/QVQ-72B-Preview

qvq

qvq

transformers>=4.45, qwen_vl_utils>=0.0.6, decord

vision, video

Qwen/QVQ-72B-Preview

iic/gme-Qwen2-VL-2B-Instruct

qwen2_gme

qwen2_gme

-

vision

Alibaba-NLP/gme-Qwen2-VL-2B-Instruct

iic/gme-Qwen2-VL-7B-Instruct

qwen2_gme

qwen2_gme

-

vision

Alibaba-NLP/gme-Qwen2-VL-7B-Instruct

AIDC-AI/Ovis1.6-Gemma2-9B

ovis1_6

ovis1_6

transformers>=4.42

vision

AIDC-AI/Ovis1.6-Gemma2-9B

AIDC-AI/Ovis1.6-Gemma2-9B-GPTQ-Int4

ovis1_6

ovis1_6

transformers>=4.42

vision

AIDC-AI/Ovis1.6-Gemma2-9B-GPTQ-Int4

AIDC-AI/Ovis1.6-Gemma2-27B

ovis1_6

ovis1_6

transformers>=4.42

vision

AIDC-AI/Ovis1.6-Gemma2-27B

AIDC-AI/Ovis1.6-Llama3.2-3B

ovis1_6_llama3

ovis1_6_llama3

-

vision

AIDC-AI/Ovis1.6-Llama3.2-3B

AIDC-AI/Ovis2-1B

ovis2

ovis2

transformers>=4.46.2, moviepy<2

vision

AIDC-AI/Ovis2-1B

AIDC-AI/Ovis2-2B

ovis2

ovis2

transformers>=4.46.2, moviepy<2

vision

AIDC-AI/Ovis2-2B

AIDC-AI/Ovis2-4B

ovis2

ovis2

transformers>=4.46.2, moviepy<2

vision

AIDC-AI/Ovis2-4B

AIDC-AI/Ovis2-8B

ovis2

ovis2

transformers>=4.46.2, moviepy<2

vision

AIDC-AI/Ovis2-8B

AIDC-AI/Ovis2-16B

ovis2

ovis2

transformers>=4.46.2, moviepy<2

vision

AIDC-AI/Ovis2-16B

AIDC-AI/Ovis2-34B

ovis2

ovis2

transformers>=4.46.2, moviepy<2

vision

AIDC-AI/Ovis2-34B

AIDC-AI/Ovis2.5-2B

ovis2_5

ovis2_5

transformers>=4.46.2, moviepy<2

vision

AIDC-AI/Ovis2.5-2B

AIDC-AI/Ovis2.5-9B

ovis2_5

ovis2_5

transformers>=4.46.2, moviepy<2

vision

AIDC-AI/Ovis2.5-9B

XiaomiMiMo/MiMo-VL-7B-SFT

mimo_vl

mimo_vl

transformers>=4.49, qwen_vl_utils>=0.0.6, decord

vision, video

XiaomiMiMo/MiMo-VL-7B-SFT

XiaomiMiMo/MiMo-VL-7B-RL

mimo_vl

mimo_vl

transformers>=4.49, qwen_vl_utils>=0.0.6, decord

vision, video

XiaomiMiMo/MiMo-VL-7B-RL

mispeech/midashenglm-7b

midashenglm

midashenglm

transformers>=4.52, soundfile

audio

mispeech/midashenglm-7b

ZhipuAI/glm-4v-9b

glm4v

glm4v

transformers>=4.42,<4.45

-

zai-org/glm-4v-9b

ZhipuAI/cogagent-9b-20241220

glm4v

glm4v

transformers>=4.42

-

zai-org/cogagent-9b-20241220

ZhipuAI/GLM-4.1V-9B-Base

glm4_1v

glm4_1v

transformers>=4.53

-

zai-org/GLM-4.1V-9B-Base

ZhipuAI/GLM-4.1V-9B-Thinking

glm4_1v

glm4_1v

transformers>=4.53

-

zai-org/GLM-4.1V-9B-Thinking

ZhipuAI/GLM-4.5V

glm4_5v

glm4_5v

transformers>=4.56

-

zai-org/GLM-4.5V

ZhipuAI/GLM-4.5V-FP8

glm4_5v

glm4_5v

transformers>=4.56

-

zai-org/GLM-4.5V-FP8

ZhipuAI/glm-edge-v-2b

glm_edge_v

glm_edge_v

transformers>=4.46

vision

zai-org/glm-edge-v-2b

ZhipuAI/glm-edge-4b-chat

glm_edge_v

glm_edge_v

transformers>=4.46

vision

zai-org/glm-edge-4b-chat

ZhipuAI/cogvlm-chat

cogvlm

cogvlm

transformers<4.42

-

zai-org/cogvlm-chat-hf

ZhipuAI/cogagent-vqa

cogagent_vqa

cogagent_vqa

transformers<4.42

-

zai-org/cogagent-vqa-hf

ZhipuAI/cogagent-chat

cogagent_chat

cogagent_chat

transformers<4.42, timm

-

zai-org/cogagent-chat-hf

ZhipuAI/cogvlm2-llama3-chat-19B

cogvlm2

cogvlm2

transformers<4.42

-

zai-org/cogvlm2-llama3-chat-19B

ZhipuAI/cogvlm2-llama3-chinese-chat-19B

cogvlm2

cogvlm2

transformers<4.42

-

zai-org/cogvlm2-llama3-chinese-chat-19B

ZhipuAI/cogvlm2-video-llama3-chat

cogvlm2_video

cogvlm2_video

decord, pytorchvideo, transformers>=4.42

video

zai-org/cogvlm2-video-llama3-chat

OpenGVLab/Mini-InternVL-Chat-2B-V1-5

internvl

internvl

transformers>=4.35, timm

vision

OpenGVLab/Mini-InternVL-Chat-2B-V1-5

AI-ModelScope/InternVL-Chat-V1-5

internvl

internvl

transformers>=4.35, timm

vision

OpenGVLab/InternVL-Chat-V1-5

AI-ModelScope/InternVL-Chat-V1-5-int8

internvl

internvl

transformers>=4.35, timm

vision

OpenGVLab/InternVL-Chat-V1-5-int8

OpenGVLab/Mini-InternVL-Chat-4B-V1-5

internvl_phi3

internvl_phi3

transformers>=4.35,<4.42, timm

vision

OpenGVLab/Mini-InternVL-Chat-4B-V1-5

OpenGVLab/InternVL2-1B

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-1B

OpenGVLab/InternVL2-2B

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-2B

OpenGVLab/InternVL2-8B

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-8B

OpenGVLab/InternVL2-26B

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-26B

OpenGVLab/InternVL2-40B

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-40B

OpenGVLab/InternVL2-Llama3-76B

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-Llama3-76B

OpenGVLab/InternVL2-2B-AWQ

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-2B-AWQ

OpenGVLab/InternVL2-8B-AWQ

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-8B-AWQ

OpenGVLab/InternVL2-26B-AWQ

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-26B-AWQ

OpenGVLab/InternVL2-40B-AWQ

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-40B-AWQ

OpenGVLab/InternVL2-Llama3-76B-AWQ

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-Llama3-76B-AWQ

OpenGVLab/InternVL2-8B-MPO

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-8B-MPO

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-1B-Pretrain

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-1B-Pretrain

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-2B-Pretrain

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-2B-Pretrain

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-4B-Pretrain

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-4B-Pretrain

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-8B-Pretrain

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-8B-Pretrain

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-26B-Pretrain

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-26B-Pretrain

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-40B-Pretrain

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-40B-Pretrain

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-Llama3-76B-Pretrain

internvl2

internvl2

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2-Pretrain-Models:InternVL2-Llama3-76B-Pretrain

OpenGVLab/InternVL2-4B

internvl2_phi3

internvl2_phi3

transformers>=4.36,<4.42, timm

vision, video

OpenGVLab/InternVL2-4B

OpenGVLab/InternVL2_5-1B

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-1B

OpenGVLab/InternVL2_5-2B

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-2B

OpenGVLab/InternVL2_5-4B

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-4B

OpenGVLab/InternVL2_5-8B

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-8B

OpenGVLab/InternVL2_5-26B

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-26B

OpenGVLab/InternVL2_5-38B

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-38B

OpenGVLab/InternVL2_5-78B

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-78B

OpenGVLab/InternVL2_5-4B-AWQ

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-4B-AWQ

OpenGVLab/InternVL2_5-8B-AWQ

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-8B-AWQ

OpenGVLab/InternVL2_5-26B-AWQ

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-26B-AWQ

OpenGVLab/InternVL2_5-38B-AWQ

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-38B-AWQ

OpenGVLab/InternVL2_5-78B-AWQ

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-78B-AWQ

OpenGVLab/InternVL2_5-1B-MPO

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-1B-MPO

OpenGVLab/InternVL2_5-2B-MPO

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-2B-MPO

OpenGVLab/InternVL2_5-4B-MPO

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-4B-MPO

OpenGVLab/InternVL2_5-8B-MPO

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-8B-MPO

OpenGVLab/InternVL2_5-26B-MPO

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-26B-MPO

OpenGVLab/InternVL2_5-38B-MPO

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-38B-MPO

OpenGVLab/InternVL2_5-78B-MPO

internvl2_5

internvl2_5

transformers>=4.36, timm

vision, video

OpenGVLab/InternVL2_5-78B-MPO

OpenGVLab/InternVL3-1B-Pretrained

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-1B-Pretrained

OpenGVLab/InternVL3-2B-Pretrained

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-2B-Pretrained

OpenGVLab/InternVL3-8B-Pretrained

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-8B-Pretrained

OpenGVLab/InternVL3-9B-Pretrained

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-9B-Pretrained

OpenGVLab/InternVL3-14B-Pretrained

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-14B-Pretrained

OpenGVLab/InternVL3-38B-Pretrained

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-38B-Pretrained

OpenGVLab/InternVL3-78B-Pretrained

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-78B-Pretrained

OpenGVLab/InternVL3-1B-Instruct

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-1B-Instruct

OpenGVLab/InternVL3-2B-Instruct

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-2B-Instruct

OpenGVLab/InternVL3-8B-Instruct

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-8B-Instruct

OpenGVLab/InternVL3-9B-Instruct

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-9B-Instruct

OpenGVLab/InternVL3-14B-Instruct

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-14B-Instruct

OpenGVLab/InternVL3-38B-Instruct

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-38B-Instruct

OpenGVLab/InternVL3-78B-Instruct

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-78B-Instruct

OpenGVLab/InternVL3-1B

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-1B

OpenGVLab/InternVL3-2B

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-2B

OpenGVLab/InternVL3-8B

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-8B

OpenGVLab/InternVL3-9B

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-9B

OpenGVLab/InternVL3-14B

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-14B

OpenGVLab/InternVL3-38B

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-38B

OpenGVLab/InternVL3-78B

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-78B

OpenGVLab/InternVL3-1B-AWQ

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-1B-AWQ

OpenGVLab/InternVL3-2B-AWQ

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-2B-AWQ

OpenGVLab/InternVL3-8B-AWQ

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-8B-AWQ

OpenGVLab/InternVL3-9B-AWQ

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-9B-AWQ

OpenGVLab/InternVL3-14B-AWQ

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-14B-AWQ

OpenGVLab/InternVL3-38B-AWQ

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-38B-AWQ

OpenGVLab/InternVL3-78B-AWQ

internvl3

internvl2_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3-78B-AWQ

OpenGVLab/InternVL3-1B-hf

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3-1B-hf

OpenGVLab/InternVL3-2B-hf

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3-2B-hf

OpenGVLab/InternVL3-8B-hf

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3-8B-hf

OpenGVLab/InternVL3-9B-hf

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3-9B-hf

OpenGVLab/InternVL3-14B-hf

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3-14B-hf

OpenGVLab/InternVL3-38B-hf

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3-38B-hf

OpenGVLab/InternVL3-78B-hf

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3-78B-hf

OpenGVLab/InternVL3_5-1B-HF

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3_5-1B-HF

OpenGVLab/InternVL3_5-2B-HF

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3_5-2B-HF

OpenGVLab/InternVL3_5-4B-HF

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3_5-4B-HF

OpenGVLab/InternVL3_5-8B-HF

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3_5-8B-HF

OpenGVLab/InternVL3_5-14B-HF

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3_5-14B-HF

OpenGVLab/InternVL3_5-38B-HF

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3_5-38B-HF

OpenGVLab/InternVL3_5-30B-A3B-HF

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3_5-30B-A3B-HF

OpenGVLab/InternVL3_5-241B-A28B-HF

internvl_hf

internvl_hf

transformers>=4.52.1, timm

vision, video

OpenGVLab/InternVL3_5-241B-A28B-HF

OpenGVLab/InternVL3_5-1B-Pretrained

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-1B-Pretrained

OpenGVLab/InternVL3_5-2B-Pretrained

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-2B-Pretrained

OpenGVLab/InternVL3_5-4B-Pretrained

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-4B-Pretrained

OpenGVLab/InternVL3_5-8B-Pretrained

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-8B-Pretrained

OpenGVLab/InternVL3_5-14B-Pretrained

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-14B-Pretrained

OpenGVLab/InternVL3_5-38B-Pretrained

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-38B-Pretrained

OpenGVLab/InternVL3_5-30B-A3B-Pretrained

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-30B-A3B-Pretrained

OpenGVLab/InternVL3_5-241B-A28B-Pretrained

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-241B-A28B-Pretrained

OpenGVLab/InternVL3_5-1B-Instruct

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-1B-Instruct

OpenGVLab/InternVL3_5-2B-Instruct

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-2B-Instruct

OpenGVLab/InternVL3_5-4B-Instruct

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-4B-Instruct

OpenGVLab/InternVL3_5-8B-Instruct

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-8B-Instruct

OpenGVLab/InternVL3_5-14B-Instruct

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-14B-Instruct

OpenGVLab/InternVL3_5-38B-Instruct

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-38B-Instruct

OpenGVLab/InternVL3_5-30B-A3B-Instruct

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-30B-A3B-Instruct

OpenGVLab/InternVL3_5-241B-A28B-Instruct

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-241B-A28B-Instruct

OpenGVLab/InternVL3_5-1B-MPO

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-1B-MPO

OpenGVLab/InternVL3_5-2B-MPO

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-2B-MPO

OpenGVLab/InternVL3_5-4B-MPO

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-4B-MPO

OpenGVLab/InternVL3_5-8B-MPO

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-8B-MPO

OpenGVLab/InternVL3_5-14B-MPO

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-14B-MPO

OpenGVLab/InternVL3_5-38B-MPO

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-38B-MPO

OpenGVLab/InternVL3_5-30B-A3B-MPO

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-30B-A3B-MPO

OpenGVLab/InternVL3_5-241B-A28B-MPO

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-241B-A28B-MPO

OpenGVLab/InternVL3_5-1B

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-1B

OpenGVLab/InternVL3_5-2B

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-2B

OpenGVLab/InternVL3_5-4B

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-4B

OpenGVLab/InternVL3_5-8B

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-8B

OpenGVLab/InternVL3_5-14B

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-14B

OpenGVLab/InternVL3_5-38B

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-38B

OpenGVLab/InternVL3_5-30B-A3B

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-30B-A3B

OpenGVLab/InternVL3_5-241B-A28B

internvl3_5

internvl3_5

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-241B-A28B

OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview

internvl3_5_gpt

internvl3_5_gpt

transformers>=4.37.2, timm

vision, video

OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview

OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview-HF

internvl_gpt_hf

internvl_hf

transformers>=4.55.0, timm

vision, video

OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview-HF

Shanghai_AI_Laboratory/Intern-S1-mini

interns1

interns1

transformers>=4.55.2,<4.56

vision, video

internlm/Intern-S1-mini

Shanghai_AI_Laboratory/Intern-S1

interns1

interns1

transformers>=4.55.2,<4.56

vision, video

internlm/Intern-S1

Shanghai_AI_Laboratory/Intern-S1-mini-FP8

interns1

interns1

transformers>=4.55.2,<4.56

vision, video

internlm/Intern-S1-mini-FP8

Shanghai_AI_Laboratory/Intern-S1-FP8

interns1

interns1

transformers>=4.55.2,<4.56

vision, video

internlm/Intern-S1-FP8

Shanghai_AI_Laboratory/internlm-xcomposer2-7b

xcomposer2

ixcomposer2

-

vision

internlm/internlm-xcomposer2-7b

Shanghai_AI_Laboratory/internlm-xcomposer2-4khd-7b

xcomposer2_4khd

ixcomposer2

-

vision

internlm/internlm-xcomposer2-4khd-7b

Shanghai_AI_Laboratory/internlm-xcomposer2d5-7b

xcomposer2_5

xcomposer2_5

decord

vision

internlm/internlm-xcomposer2d5-7b

Shanghai_AI_Laboratory/internlm-xcomposer2d5-ol-7b:base

xcomposer2_5

xcomposer2_5

decord

vision

internlm/internlm-xcomposer2d5-ol-7b:base

Shanghai_AI_Laboratory/internlm-xcomposer2d5-ol-7b:audio

xcomposer2_5_ol_audio

qwen2_audio

transformers>=4.45

audio

internlm/internlm-xcomposer2d5-ol-7b:audio

LLM-Research/Llama-3.2-11B-Vision-Instruct

llama3_2_vision

llama3_2_vision

transformers>=4.45

vision

meta-llama/Llama-3.2-11B-Vision-Instruct

LLM-Research/Llama-3.2-90B-Vision-Instruct

llama3_2_vision

llama3_2_vision

transformers>=4.45

vision

meta-llama/Llama-3.2-90B-Vision-Instruct

LLM-Research/Llama-3.2-11B-Vision

llama3_2_vision

llama3_2_vision

transformers>=4.45

vision

meta-llama/Llama-3.2-11B-Vision

LLM-Research/Llama-3.2-90B-Vision

llama3_2_vision

llama3_2_vision

transformers>=4.45

vision

meta-llama/Llama-3.2-90B-Vision

LLM-Research/Llama-4-Scout-17B-16E

llama4

llama4

transformers>=4.51

vision

meta-llama/Llama-4-Scout-17B-16E

LLM-Research/Llama-4-Maverick-17B-128E

llama4

llama4

transformers>=4.51

vision

meta-llama/Llama-4-Maverick-17B-128E

LLM-Research/Llama-4-Scout-17B-16E-Instruct

llama4

llama4

transformers>=4.51

vision

meta-llama/Llama-4-Scout-17B-16E-Instruct

LLM-Research/Llama-4-Maverick-17B-128E-Instruct-FP8

llama4

llama4

transformers>=4.51

vision

meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8

LLM-Research/Llama-4-Maverick-17B-128E-Instruct

llama4

llama4

transformers>=4.51

vision

meta-llama/Llama-4-Maverick-17B-128E-Instruct

ICTNLP/Llama-3.1-8B-Omni

llama3_1_omni

llama3_1_omni

openai-whisper

audio

ICTNLP/Llama-3.1-8B-Omni

llava-hf/llava-1.5-7b-hf

llava1_5_hf

llava1_5_hf

transformers>=4.36

vision

llava-hf/llava-1.5-7b-hf

llava-hf/llava-1.5-13b-hf

llava1_5_hf

llava1_5_hf

transformers>=4.36

vision

llava-hf/llava-1.5-13b-hf

llava-hf/llava-v1.6-mistral-7b-hf

llava1_6_mistral_hf

llava1_6_mistral_hf

transformers>=4.39

vision

llava-hf/llava-v1.6-mistral-7b-hf

llava-hf/llava-v1.6-vicuna-7b-hf

llava1_6_vicuna_hf

llava1_6_vicuna_hf

transformers>=4.39

vision

llava-hf/llava-v1.6-vicuna-7b-hf

llava-hf/llava-v1.6-vicuna-13b-hf

llava1_6_vicuna_hf

llava1_6_vicuna_hf

transformers>=4.39

vision

llava-hf/llava-v1.6-vicuna-13b-hf

llava-hf/llava-v1.6-34b-hf

llava1_6_yi_hf

llava1_6_yi_hf

transformers>=4.39

vision

llava-hf/llava-v1.6-34b-hf

llava-hf/llama3-llava-next-8b-hf

llama3_llava_next_hf

llama3_llava_next_hf

transformers>=4.39

vision

llava-hf/llama3-llava-next-8b-hf

llava-hf/llava-next-72b-hf

llava_next_qwen_hf

llava_next_qwen_hf

transformers>=4.39

vision

llava-hf/llava-next-72b-hf

llava-hf/llava-next-110b-hf

llava_next_qwen_hf

llava_next_qwen_hf

transformers>=4.39

vision

llava-hf/llava-next-110b-hf

llava-hf/LLaVA-NeXT-Video-7B-DPO-hf

llava_next_video_hf

llava_next_video_hf

transformers>=4.42, av

video

llava-hf/LLaVA-NeXT-Video-7B-DPO-hf

llava-hf/LLaVA-NeXT-Video-7B-32K-hf

llava_next_video_hf

llava_next_video_hf

transformers>=4.42, av

video

llava-hf/LLaVA-NeXT-Video-7B-32K-hf

llava-hf/LLaVA-NeXT-Video-7B-hf

llava_next_video_hf

llava_next_video_hf

transformers>=4.42, av

video

llava-hf/LLaVA-NeXT-Video-7B-hf

llava-hf/LLaVA-NeXT-Video-34B-hf

llava_next_video_yi_hf

llava_next_video_hf

transformers>=4.42, av

video

llava-hf/LLaVA-NeXT-Video-34B-hf

llava-hf/llava-onevision-qwen2-0.5b-ov-hf

llava_onevision_hf

llava_onevision_hf

transformers>=4.45

vision, video

llava-hf/llava-onevision-qwen2-0.5b-ov-hf

llava-hf/llava-onevision-qwen2-7b-ov-hf

llava_onevision_hf

llava_onevision_hf

transformers>=4.45

vision, video

llava-hf/llava-onevision-qwen2-7b-ov-hf

llava-hf/llava-onevision-qwen2-72b-ov-hf

llava_onevision_hf

llava_onevision_hf

transformers>=4.45

vision, video

llava-hf/llava-onevision-qwen2-72b-ov-hf

01ai/Yi-VL-6B

yi_vl

yi_vl

transformers>=4.34

vision

01-ai/Yi-VL-6B

01ai/Yi-VL-34B

yi_vl

yi_vl

transformers>=4.34

vision

01-ai/Yi-VL-34B

swift/llava-llama3.1-8b

llava_llama3_1_hf

llava_llama3_1_hf

transformers>=4.41

vision

-

AI-ModelScope/llava-llama-3-8b-v1_1-transformers

llava_llama3_hf

llava_llama3_hf

transformers>=4.36

vision

xtuner/llava-llama-3-8b-v1_1-transformers

AI-ModelScope/llava-v1.6-mistral-7b

llava1_6_mistral

llava1_6_mistral

transformers>=4.34

vision

liuhaotian/llava-v1.6-mistral-7b

AI-ModelScope/llava-v1.6-34b

llava1_6_yi

llava1_6_yi

transformers>=4.34

vision

liuhaotian/llava-v1.6-34b

AI-ModelScope/llava-next-72b

llava_next_qwen

llava_next_qwen

transformers>=4.42, av

vision

lmms-lab/llava-next-72b

AI-ModelScope/llava-next-110b

llava_next_qwen

llava_next_qwen

transformers>=4.42, av

vision

lmms-lab/llava-next-110b

AI-ModelScope/llama3-llava-next-8b

llama3_llava_next

llama3_llava_next

transformers>=4.42, av

vision

lmms-lab/llama3-llava-next-8b

deepseek-ai/deepseek-vl-1.3b-chat

deepseek_vl

deepseek_vl

-

vision

deepseek-ai/deepseek-vl-1.3b-chat

deepseek-ai/deepseek-vl-7b-chat

deepseek_vl

deepseek_vl

-

vision

deepseek-ai/deepseek-vl-7b-chat

deepseek-ai/deepseek-vl2-tiny

deepseek_vl2

deepseek_vl2

transformers<4.42

vision

deepseek-ai/deepseek-vl2-tiny

deepseek-ai/deepseek-vl2-small

deepseek_vl2

deepseek_vl2

transformers<4.42

vision

deepseek-ai/deepseek-vl2-small

deepseek-ai/deepseek-vl2

deepseek_vl2

deepseek_vl2

transformers<4.42

vision

deepseek-ai/deepseek-vl2

deepseek-ai/Janus-1.3B

deepseek_janus

deepseek_janus

-

vision

deepseek-ai/Janus-1.3B

deepseek-ai/Janus-Pro-1B

deepseek_janus_pro

deepseek_janus_pro

-

vision

deepseek-ai/Janus-Pro-1B

deepseek-ai/Janus-Pro-7B

deepseek_janus_pro

deepseek_janus_pro

-

vision

deepseek-ai/Janus-Pro-7B

OpenBMB/MiniCPM-V

minicpmv

minicpmv

timm, transformers<4.42

vision

openbmb/MiniCPM-V

OpenBMB/MiniCPM-V-2

minicpmv

minicpmv

timm, transformers<4.42

vision

openbmb/MiniCPM-V-2

OpenBMB/MiniCPM-Llama3-V-2_5

minicpmv2_5

minicpmv2_5

timm, transformers>=4.36

vision

openbmb/MiniCPM-Llama3-V-2_5

OpenBMB/MiniCPM-V-2_6

minicpmv2_6

minicpmv2_6

timm, transformers>=4.36, decord

vision, video

openbmb/MiniCPM-V-2_6

OpenBMB/MiniCPM-o-2_6

minicpmo2_6

minicpmo2_6

timm, transformers>=4.36, decord, soundfile

vision, video, omni, audio

openbmb/MiniCPM-o-2_6

OpenBMB/MiniCPM-V-4

minicpmv4

minicpmv4

timm, transformers>=4.36, decord

vision, video

openbmb/MiniCPM-V-4

OpenBMB/MiniCPM-V-4_5

minicpmv4_5

minicpmv4_5

timm, transformers>=4.36, decord

vision, video

openbmb/MiniCPM-V-4_5

MiniMax/MiniMax-VL-01

minimax_vl

minimax_vl

-

vision

MiniMaxAI/MiniMax-VL-01

iic/mPLUG-Owl2

mplug_owl2

mplug_owl2

transformers<4.35, icecream

vision

MAGAer13/mplug-owl2-llama2-7b

iic/mPLUG-Owl2.1

mplug_owl2_1

mplug_owl2

transformers<4.35, icecream

vision

Mizukiluke/mplug_owl_2_1

iic/mPLUG-Owl3-1B-241014

mplug_owl3

mplug_owl3

transformers>=4.36, icecream, decord

vision, video

mPLUG/mPLUG-Owl3-1B-241014

iic/mPLUG-Owl3-2B-241014

mplug_owl3

mplug_owl3

transformers>=4.36, icecream, decord

vision, video

mPLUG/mPLUG-Owl3-2B-241014

iic/mPLUG-Owl3-7B-240728

mplug_owl3

mplug_owl3

transformers>=4.36, icecream, decord

vision, video

mPLUG/mPLUG-Owl3-7B-240728

iic/mPLUG-Owl3-7B-241101

mplug_owl3_241101

mplug_owl3_241101

transformers>=4.36, icecream

vision, video

mPLUG/mPLUG-Owl3-7B-241101

iic/DocOwl2

doc_owl2

doc_owl2

transformers>=4.36, icecream

vision

mPLUG/DocOwl2

BAAI/Emu3-Gen

emu3_gen

emu3_gen

-

t2i

BAAI/Emu3-Gen

BAAI/Emu3-Chat

emu3_chat

emu3_chat

transformers>=4.44.0

vision

BAAI/Emu3-Chat

stepfun-ai/GOT-OCR2_0

got_ocr2

got_ocr2

-

vision

stepfun-ai/GOT-OCR2_0

stepfun-ai/GOT-OCR-2.0-hf

got_ocr2_hf

got_ocr2_hf

-

vision

stepfun-ai/GOT-OCR-2.0-hf

stepfun-ai/Step-Audio-Chat

step_audio

step_audio

funasr, sox, conformer, openai-whisper, librosa

audio

stepfun-ai/Step-Audio-Chat

stepfun-ai/Step-Audio-2-mini

step_audio2_mini

step_audio2_mini

transformers==4.53.3, torchaudio, librosa

audio

stepfun-ai/Step-Audio-2-mini

moonshotai/Kimi-VL-A3B-Instruct

kimi_vl

kimi_vl

transformers<4.49

-

moonshotai/Kimi-VL-A3B-Instruct

moonshotai/Kimi-VL-A3B-Thinking

kimi_vl

kimi_vl

transformers<4.49

-

moonshotai/Kimi-VL-A3B-Thinking

moonshotai/Kimi-VL-A3B-Thinking-2506

kimi_vl

kimi_vl

transformers<4.49

-

moonshotai/Kimi-VL-A3B-Thinking-2506

Kwai-Keye/Keye-VL-8B-Preview

keye_vl

keye_vl

keye_vl_utils

vision

Kwai-Keye/Keye-VL-8B-Preview

Kwai-Keye/Keye-VL-1_5-8B

keye_vl_1_5

keye_vl_1_5

keye_vl_utils>=1.5.2

vision

Kwai-Keye/Keye-VL-1_5-8B

rednote-hilab/dots.ocr

dots_ocr

dots_ocr

transformers>=4.51.0

-

rednote-hilab/dots.ocr

BytedanceDouyinContent/SAIL-VL2-2B

sail_vl2

sail_vl2

transformers<=4.51.3

vision

BytedanceDouyinContent/SAIL-VL2-2B

BytedanceDouyinContent/SAIL-VL2-2B-Thinking

sail_vl2

sail_vl2

transformers<=4.51.3

vision

BytedanceDouyinContent/SAIL-VL2-2B-Thinking

BytedanceDouyinContent/SAIL-VL2-8B

sail_vl2

sail_vl2

transformers<=4.51.3

vision

BytedanceDouyinContent/SAIL-VL2-8B

BytedanceDouyinContent/SAIL-VL2-8B-Thinking

sail_vl2

sail_vl2

transformers<=4.51.3

vision

BytedanceDouyinContent/SAIL-VL2-8B-Thinking

LLM-Research/Phi-3-vision-128k-instruct

phi3_vision

phi3_vision

transformers>=4.36

vision

microsoft/Phi-3-vision-128k-instruct

LLM-Research/Phi-3.5-vision-instruct

phi3_vision

phi3_vision

transformers>=4.36

vision

microsoft/Phi-3.5-vision-instruct

LLM-Research/Phi-4-multimodal-instruct

phi4_multimodal

phi4_multimodal

transformers>=4.36,<4.49, backoff, soundfile

vision, audio

microsoft/Phi-4-multimodal-instruct

AI-ModelScope/Florence-2-base-ft

florence

florence

-

vision

microsoft/Florence-2-base-ft

AI-ModelScope/Florence-2-base

florence

florence

-

vision

microsoft/Florence-2-base

AI-ModelScope/Florence-2-large

florence

florence

-

vision

microsoft/Florence-2-large

AI-ModelScope/Florence-2-large-ft

florence

florence

-

vision

microsoft/Florence-2-large-ft

AI-ModelScope/Idefics3-8B-Llama3

idefics3

idefics3

transformers>=4.45

vision

HuggingFaceM4/Idefics3-8B-Llama3

AI-ModelScope/paligemma-3b-pt-224

paligemma

paligemma

transformers>=4.41

vision

google/paligemma-3b-pt-224

AI-ModelScope/paligemma-3b-pt-448

paligemma

paligemma

transformers>=4.41

vision

google/paligemma-3b-pt-448

AI-ModelScope/paligemma-3b-pt-896

paligemma

paligemma

transformers>=4.41

vision

google/paligemma-3b-pt-896

AI-ModelScope/paligemma-3b-mix-224

paligemma

paligemma

transformers>=4.41

vision

google/paligemma-3b-mix-224

AI-ModelScope/paligemma-3b-mix-448

paligemma

paligemma

transformers>=4.41

vision

google/paligemma-3b-mix-448

AI-ModelScope/paligemma2-3b-pt-224

paligemma

paligemma

transformers>=4.41

vision

google/paligemma2-3b-pt-224

AI-ModelScope/paligemma2-3b-pt-448

paligemma

paligemma

transformers>=4.41

vision

google/paligemma2-3b-pt-448

AI-ModelScope/paligemma2-3b-pt-896

paligemma

paligemma

transformers>=4.41

vision

google/paligemma2-3b-pt-896

AI-ModelScope/paligemma2-10b-pt-224

paligemma

paligemma

transformers>=4.41

vision

google/paligemma2-10b-pt-224

AI-ModelScope/paligemma2-10b-pt-448

paligemma

paligemma

transformers>=4.41

vision

google/paligemma2-10b-pt-448

AI-ModelScope/paligemma2-10b-pt-896

paligemma

paligemma

transformers>=4.41

vision

google/paligemma2-10b-pt-896

AI-ModelScope/paligemma2-28b-pt-224

paligemma

paligemma

transformers>=4.41

vision

google/paligemma2-28b-pt-224

AI-ModelScope/paligemma2-28b-pt-448

paligemma

paligemma

transformers>=4.41

vision

google/paligemma2-28b-pt-448

AI-ModelScope/paligemma2-28b-pt-896

paligemma

paligemma

transformers>=4.41

vision

google/paligemma2-28b-pt-896

AI-ModelScope/paligemma2-3b-ft-docci-448

paligemma

paligemma

transformers>=4.41

vision

google/paligemma2-3b-ft-docci-448

AI-ModelScope/paligemma2-10b-ft-docci-448

paligemma

paligemma

transformers>=4.41

vision

google/paligemma2-10b-ft-docci-448

LLM-Research/Molmo-7B-O-0924

molmo

molmo

transformers>=4.45

vision

allenai/Molmo-7B-O-0924

LLM-Research/Molmo-7B-D-0924

molmo

molmo

transformers>=4.45

vision

allenai/Molmo-7B-D-0924

LLM-Research/Molmo-72B-0924

molmo

molmo

transformers>=4.45

vision

allenai/Molmo-72B-0924

LLM-Research/MolmoE-1B-0924

molmoe

molmo

transformers>=4.45

vision

allenai/MolmoE-1B-0924

AI-ModelScope/pixtral-12b

pixtral

pixtral

transformers>=4.45

vision

mistral-community/pixtral-12b

InfiniAI/Megrez-3B-Omni

megrez_omni

megrez_omni

-

vision, audio

Infinigence/Megrez-3B-Omni

bytedance-research/Valley-Eagle-7B

valley

valley

transformers>=4.42, av

vision

-

LLM-Research/gemma-3-4b-pt

gemma3_vision

gemma3_vision

transformers>=4.49

-

google/gemma-3-4b-pt

LLM-Research/gemma-3-4b-it

gemma3_vision

gemma3_vision

transformers>=4.49

-

google/gemma-3-4b-it

LLM-Research/gemma-3-12b-pt

gemma3_vision

gemma3_vision

transformers>=4.49

-

google/gemma-3-12b-pt

LLM-Research/gemma-3-12b-it

gemma3_vision

gemma3_vision

transformers>=4.49

-

google/gemma-3-12b-it

LLM-Research/gemma-3-27b-pt

gemma3_vision

gemma3_vision

transformers>=4.49

-

google/gemma-3-27b-pt

LLM-Research/gemma-3-27b-it

gemma3_vision

gemma3_vision

transformers>=4.49

-

google/gemma-3-27b-it

google/gemma-3n-E2B

gemma3n

gemma3n

transformers>=4.53.1

-

google/gemma-3n-E2B

google/gemma-3n-E4B

gemma3n

gemma3n

transformers>=4.53.1

-

google/gemma-3n-E4B

google/gemma-3n-E2B-it

gemma3n

gemma3n

transformers>=4.53.1

-

google/gemma-3n-E2B-it

google/gemma-3n-E4B-it

gemma3n

gemma3n

transformers>=4.53.1

-

google/gemma-3n-E4B-it

mistralai/Mistral-Small-3.1-24B-Base-2503

mistral_2503

mistral_2503

transformers>=4.49

-

mistralai/Mistral-Small-3.1-24B-Base-2503

mistralai/Mistral-Small-3.1-24B-Instruct-2503

mistral_2503

mistral_2503

transformers>=4.49

-

mistralai/Mistral-Small-3.1-24B-Instruct-2503

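Datasets

The table below describes the datasets integrated into ms-swift:

  • Dataset ID: ModelScope dataset ID

  • HF Dataset ID: HuggingFace dataset ID

  • Subset Name: name of the subset

  • Dataset Size: size of the dataset

  • Statistic: dataset statistics. We report token counts, which helps when tuning the max_length hyperparameter. The datasets are tokenized with the qwen2.5 tokenizer. Statistics differ between tokenizers; to get token statistics for another model's tokenizer, you can compute them yourself with a script.

  • Tags: tags of the dataset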
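The Statistic field can be reproduced for another tokenizer with a few lines of Python. The following is a minimal sketch, not the script ms-swift itself uses: `token_stats` and `whitespace_tokenize` are hypothetical helpers, the whitespace tokenizer is only a stand-in so the example runs offline, and the use of the population standard deviation is an assumption about how the table was computed.

```python
import statistics
from typing import Callable, Iterable, List


def token_stats(texts: Iterable[str], tokenize: Callable[[str], List]) -> str:
    """Summarize per-sample token counts as 'mean±std, min=..., max=...'."""
    lengths = [len(tokenize(t)) for t in texts]
    if not lengths:
        raise ValueError("no samples to summarize")
    mean = statistics.fmean(lengths)
    # Population standard deviation (assumption: the table may use another estimator).
    std = statistics.pstdev(lengths)
    return f"{mean:.1f}±{std:.1f}, min={min(lengths)}, max={max(lengths)}"


def whitespace_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer so the sketch runs without downloading a model."""
    return text.split()


samples = ["a b c", "a b c d e", "a"]
summary = token_stats(samples, whitespace_tokenize)
print(summary)  # 3.0±1.6, min=1, max=5
```

To measure a real model, one option is to load its tokenizer with `transformers.AutoTokenizer.from_pretrained(...)` and pass `lambda t: tokenizer(t)["input_ids"]` as the `tokenize` argument. Which checkpoint produced the numbers in the table is not stated here, so results may differ slightly from the published values.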

Dataset ID Subset Name Dataset Size Statistic (token) Tags HF Dataset ID
AI-MO/NuminaMath-1.5 default 896215 116.1±80.8, min=31, max=5064 grpo, math AI-MO/NuminaMath-1.5
AI-MO/NuminaMath-CoT default 859494 113.1±60.2, min=35, max=2120 grpo, math AI-MO/NuminaMath-CoT
AI-MO/NuminaMath-TIR default 72441 100.9±52.2, min=36, max=1683 grpo, math, 🔥 AI-MO/NuminaMath-TIR
AI-ModelScope/COIG-CQIA chinese_traditional
coig_pc
exam
finance
douban
human_value
logi_qa
ruozhiba
segmentfault
wiki
wikihow
xhs
zhihu
44694 331.2±693.8, min=34, max=19288 general, 🔥 -
AI-ModelScope/CodeAlpaca-20k default 20022 99.3±57.6, min=30, max=857 code, en HuggingFaceH4/CodeAlpaca_20K
AI-ModelScope/DISC-Law-SFT default 166758 1799.0±474.9, min=769, max=3151 chat, law, 🔥 ShengbinYue/DISC-Law-SFT
AI-ModelScope/DISC-Med-SFT default 464885 426.5±178.7, min=110, max=1383 chat, medical, 🔥 Flmc/DISC-Med-SFT
AI-ModelScope/Duet-v0.5 default 5000 1157.4±189.3, min=657, max=2344 CoT, en G-reen/Duet-v0.5
AI-ModelScope/GuanacoDataset default 31563 250.3±70.6, min=95, max=987 chat, zh JosephusCheung/GuanacoDataset
AI-ModelScope/LLaVA-Instruct-150K default 623302 630.7±143.0, min=301, max=1166 chat, multi-modal, vision -
AI-ModelScope/LLaVA-Pretrain default huge dataset - chat, multi-modal, quality liuhaotian/LLaVA-Pretrain
AI-ModelScope/LaTeX_OCR default
human_handwrite
human_handwrite_print
synthetic_handwrite
small
162149 117.6±44.9, min=41, max=312 chat, ocr, multi-modal, vision linxy/LaTeX_OCR
AI-ModelScope/LongAlpaca-12k default 11998 9941.8±3417.1, min=4695, max=25826 long-sequence, QA Yukang/LongAlpaca-12k
AI-ModelScope/M3IT coco
vqa-v2
shapes
shapes-rephrased
coco-goi-rephrased
snli-ve
snli-ve-rephrased
okvqa
a-okvqa
viquae
textcap
docvqa
science-qa
imagenet
imagenet-open-ended
imagenet-rephrased
coco-goi
clevr
clevr-rephrased
nlvr
coco-itm
coco-itm-rephrased
vsr
vsr-rephrased
mocheg
mocheg-rephrased
coco-text
fm-iqa
activitynet-qa
msrvtt
ss
coco-cn
refcoco
refcoco-rephrased
multi30k
image-paragraph-captioning
visual-dialog
visual-dialog-rephrased
iqa
vcr
visual-mrc
ivqa
msrvtt-qa
msvd-qa
gqa
text-vqa
ocr-vqa
st-vqa
flickr8k-cn
huge dataset - chat, multi-modal, vision -
AI-ModelScope/MATH-lighteval default 7500 104.4±92.8, min=36, max=1683 grpo, math DigitalLearningGmbH/MATH-lighteval
AI-ModelScope/Magpie-Qwen2-Pro-200K-Chinese default 200000 448.4±223.5, min=87, max=4098 chat, sft, 🔥, zh Magpie-Align/Magpie-Qwen2-Pro-200K-Chinese
AI-ModelScope/Magpie-Qwen2-Pro-200K-English default 200000 609.9±277.1, min=257, max=4098 chat, sft, 🔥, en Magpie-Align/Magpie-Qwen2-Pro-200K-English
AI-ModelScope/Magpie-Qwen2-Pro-300K-Filtered default 300000 556.6±288.6, min=175, max=4098 chat, sft, 🔥 Magpie-Align/Magpie-Qwen2-Pro-300K-Filtered
AI-ModelScope/MathInstruct default 262040 253.3±177.4, min=42, max=2193 math, cot, en, quality TIGER-Lab/MathInstruct
AI-ModelScope/MovieChat-1K-test default 162 39.7±2.0, min=32, max=43 chat, multi-modal, video Enxin/MovieChat-1K-test
AI-ModelScope/Open-Platypus default 24926 389.0±256.4, min=55, max=3153 chat, math, quality garage-bAInd/Open-Platypus
AI-ModelScope/OpenO1-SFT default 125894 1080.7±622.9, min=145, max=11637 chat, general, o1 O1-OPEN/OpenO1-SFT
AI-ModelScope/OpenOrca default
3_5M
huge dataset - chat, multilingual, general -
AI-ModelScope/OpenOrca-Chinese default huge dataset - QA, zh, general, quality yys/OpenOrca-Chinese
AI-ModelScope/SFT-Nectar default 131201 441.9±307.0, min=45, max=3136 cot, en, quality AstraMindAI/SFT-Nectar
AI-ModelScope/ShareGPT-4o image_caption 57289 599.8±140.4, min=214, max=1932 vqa, multi-modal OpenGVLab/ShareGPT-4o
AI-ModelScope/ShareGPT4V ShareGPT4V
ShareGPT4V-PT
huge dataset - chat, multi-modal, vision -
AI-ModelScope/SkyPile-150B default huge dataset - pretrain, quality, zh Skywork/SkyPile-150B
AI-ModelScope/WizardLM_evol_instruct_V2_196k default 109184 483.3±338.4, min=27, max=3735 chat, en WizardLM/WizardLM_evol_instruct_V2_196k
AI-ModelScope/alpaca-cleaned default 51760 170.1±122.9, min=29, max=1028 chat, general, bench, quality yahma/alpaca-cleaned
AI-ModelScope/alpaca-gpt4-data-en default 52002 167.6±123.9, min=29, max=607 chat, general, 🔥 vicgalle/alpaca-gpt4
AI-ModelScope/alpaca-gpt4-data-zh default 48818 157.2±93.2, min=27, max=544 chat, general, 🔥 llm-wizard/alpaca-gpt4-data-zh
AI-ModelScope/blossom-math-v2 default 10000 175.4±59.1, min=35, max=563 chat, math, 🔥 Azure99/blossom-math-v2
AI-ModelScope/captcha-images default 8000 47.0±0.0, min=47, max=47 chat, multi-modal, vision -
AI-ModelScope/chartqa_digit_r1v_format default 11399 48.3±5.1, min=37, max=82 grpo zyang39/chartqa_digit_r1v_format
AI-ModelScope/clevr_cogen_a_train default 70000 67.0±0.0, min=67, max=67 qa, math, vision, grpo leonardPKU/clevr_cogen_a_train
AI-ModelScope/coco default huge dataset - multi-modal, en, vqa, quality detection-datasets/coco
AI-ModelScope/databricks-dolly-15k default 15011 199.0±268.8, min=26, max=5987 multi-task, en, quality databricks/databricks-dolly-15k
AI-ModelScope/deepctrl-sft-data default
en
huge dataset - chat, general, sft, multi-round -
AI-ModelScope/egoschema default
cls
101 191.6±80.7, min=96, max=435 chat, multi-modal, video lmms-lab/egoschema
AI-ModelScope/firefly-train-1.1M default 1649399 204.3±365.3, min=28, max=9306 chat, general YeungNLP/firefly-train-1.1M
AI-ModelScope/function-calling-chatml default 112958 465.3±320.1, min=36, max=6106 agent, en, sft, 🔥 Locutusque/function-calling-chatml
AI-ModelScope/generated_chat_0.4M default 396004 272.7±51.1, min=78, max=579 chat, character-dialogue BelleGroup/generated_chat_0.4M
AI-ModelScope/guanaco_belle_merge_v1.0 default 693987 133.8±93.5, min=30, max=1872 QA, zh Chinese-Vicuna/guanaco_belle_merge_v1.0
AI-ModelScope/hh-rlhf helpful-base
helpful-online
helpful-rejection-sampled
huge dataset - rlhf, dpo -
AI-ModelScope/hh_rlhf_cn hh_rlhf
harmless_base_cn
harmless_base_en
helpful_base_cn
helpful_base_en
362909 142.3±107.5, min=25, max=1571 rlhf, dpo, 🔥 -
AI-ModelScope/lawyer_llama_data default 21476 224.4±83.9, min=69, max=832 chat, law Skepsun/lawyer_llama_data
AI-ModelScope/leetcode-solutions-python default 2359 723.8±233.5, min=259, max=2117 chat, coding, 🔥 -
AI-ModelScope/lmsys-chat-1m default 166211 545.8±3272.8, min=22, max=219116 chat, em lmsys/lmsys-chat-1m
AI-ModelScope/math-trn-format default 11500 102.2±88.9, min=36, max=1683 math -
AI-ModelScope/ms_agent_for_agentfabric default
addition
30000 615.7±198.7, min=251, max=2055 chat, agent, multi-round, 🔥 -
AI-ModelScope/orpo-dpo-mix-40k default 43666 938.1±694.2, min=36, max=8483 dpo, orpo, en, quality mlabonne/orpo-dpo-mix-40k
AI-ModelScope/pile default huge dataset - pretrain EleutherAI/pile
AI-ModelScope/ruozhiba post-annual
title-good
title-norm
85658 40.0±18.3, min=22, max=559 pretrain, 🔥 -
AI-ModelScope/school_math_0.25M default 248481 158.8±73.4, min=39, max=980 chat, math, quality BelleGroup/school_math_0.25M
AI-ModelScope/sharegpt_gpt4 default
V3_format
zh_38K_format
103329 3476.6±5959.0, min=33, max=115132 chat, multilingual, general, multi-round, gpt4, 🔥 -
AI-ModelScope/sql-create-context default 78577 82.7±31.5, min=36, max=282 chat, sql, 🔥 b-mc2/sql-create-context
AI-ModelScope/stack-exchange-paired default huge dataset - hfrl, dpo, pairwise lvwerra/stack-exchange-paired
AI-ModelScope/starcoderdata default huge dataset - pretrain, quality bigcode/starcoderdata
AI-ModelScope/synthetic_text_to_sql default 100000 221.8±69.9, min=64, max=616 nl2sql, en gretelai/synthetic_text_to_sql
AI-ModelScope/texttosqlv2_25000_v2 default 25000 277.3±328.3, min=40, max=1971 chat, sql Clinton/texttosqlv2_25000_v2
AI-ModelScope/the-stack default huge dataset - pretrain, quality bigcode/the-stack
AI-ModelScope/tigerbot-law-plugin default 55895 104.9±51.0, min=43, max=1087 text-generation, law, pretrained TigerResearch/tigerbot-law-plugin
AI-ModelScope/train_0.5M_CN default 519255 128.4±87.4, min=31, max=936 common, zh, quality BelleGroup/train_0.5M_CN
AI-ModelScope/train_1M_CN default huge dataset - common, zh, quality BelleGroup/train_1M_CN
AI-ModelScope/train_2M_CN default huge dataset - common, zh, quality BelleGroup/train_2M_CN
AI-ModelScope/tulu-v2-sft-mixture default 326154 523.3±439.3, min=68, max=2549 chat, multilingual, general, multi-round allenai/tulu-v2-sft-mixture
AI-ModelScope/ultrafeedback-binarized-preferences-cleaned-kto default 230720 471.5±274.3, min=27, max=2232 rlhf, kto -
AI-ModelScope/webnovel_cn default 50000 1455.2±12489.4, min=524, max=490480 chat, novel zxbsmk/webnovel_cn
AI-ModelScope/wikipedia-cn-20230720-filtered default huge dataset - pretrain, quality pleisto/wikipedia-cn-20230720-filtered
AI-ModelScope/zhihu_rlhf_3k default 3460 594.5±365.9, min=31, max=1716 rlhf, dpo, zh liyucheng/zhihu_rlhf_3k
DAMO_NLP/jd default
cls
45012 66.9±87.0, min=41, max=1699 text-generation, classification, 🔥 -
FreedomIntelligence/medical-o1-reasoning-SFT en
zh
50143 98.0±53.6, min=36, max=1508 medical, o1, 🔥 FreedomIntelligence/medical-o1-reasoning-SFT
- default huge dataset - pretrain, quality HuggingFaceFW/fineweb
- auto_math_text
khanacademy
openstax
stanford
stories
web_samples_v1
web_samples_v2
wikihow
huge dataset - multi-domain, en, qa HuggingFaceTB/cosmopedia
HumanLLMs/Human-Like-DPO-Dataset default 10884 47.5±7.9, min=32, max=85 rlhf, dpo HumanLLMs/Human-Like-DPO-Dataset
LLM-Research/xlam-function-calling-60k default
grpo
120000 453.7±219.5, min=164, max=2779 agent, grpo, 🔥 Salesforce/xlam-function-calling-60k
MTEB/scidocs-reranking default 39193 41.9±5.8, min=31, max=107 rerank, 🔥 mteb/scidocs-reranking
MTEB/stackoverflowdupquestions-reranking default 26485 39.9±4.6, min=31, max=77 rerank, 🔥 mteb/stackoverflowdupquestions-reranking
OmniData/Zhihu-KOL default huge dataset - zhihu, qa wangrui6/Zhihu-KOL
OmniData/Zhihu-KOL-More-Than-100-Upvotes default 271261 1003.4±1826.1, min=28, max=52541 zhihu, qa bzb2023/Zhihu-KOL-More-Than-100-Upvotes
PowerInfer/LONGCOT-Refine-500K default 521921 296.5±158.4, min=39, max=4634 chat, sft, 🔥, cot PowerInfer/LONGCOT-Refine-500K
PowerInfer/QWQ-LONGCOT-500K default 498082 310.7±303.1, min=35, max=22941 chat, sft, 🔥, cot PowerInfer/QWQ-LONGCOT-500K
ServiceNow-AI/R1-Distill-SFT v0
v1
1850809 164.2±438.0, min=30, max=32469 chat, sft, cot, r1 ServiceNow-AI/R1-Distill-SFT
TIGER-Lab/MATH-plus train 893929 301.4±196.7, min=50, max=1162 qa, math, en, quality TIGER-Lab/MATH-plus
Tongyi-DataEngine/SA1B-Dense-Caption default huge dataset - zh, multi-modal, vqa -
Tongyi-DataEngine/SA1B-Paired-Captions-Images default 7736284 106.4±18.5, min=48, max=193 zh, multi-modal, vqa -
YorickHe/CoT default 74771 141.6±45.5, min=58, max=410 chat, general -
YorickHe/CoT_zh default 74771 129.1±53.2, min=51, max=401 chat, general -
ZhipuAI/LongWriter-6k default 6000 5009.0±2932.8, min=117, max=30354 long, chat, sft, 🔥 zai-org/LongWriter-6k
- default huge dataset - pretrain, quality allenai/c4
bespokelabs/Bespoke-Stratos-17k default 16710 480.7±236.1, min=266, max=3556 chat, sft, cot, r1 bespokelabs/Bespoke-Stratos-17k
- default huge dataset - pretrain, quality cerebras/SlimPajama-627B
codefuse-ai/CodeExercise-Python-27k default 27224 337.3±154.2, min=90, max=2826 chat, coding, 🔥 -
codefuse-ai/Evol-instruction-66k default 66862 440.1±208.4, min=46, max=2661 chat, coding, 🔥 -
damo/MSAgent-Bench default
mini
638149 859.2±460.1, min=38, max=3479 chat, agent, multi-round -
damo/nlp_polylm_multialpaca_sft ar
de
es
fr
id
ja
ko
pt
ru
th
vi
131867 101.6±42.5, min=30, max=1029 chat, general, multilingual -
damo/zh_cls_fudan-news default 4959 3234.4±2547.5, min=91, max=19548 chat, classification -
damo/zh_ner-JAVE default 1266 118.3±45.5, min=44, max=223 chat, ner -
hjh0119/shareAI-Llama3-DPO-zh-en-emoji default 2449 334.0±162.8, min=36, max=1801 rlhf, dpo shareAI/DPO-zh-en-emoji
huangjintao/AgentInstruct_copy alfworld
db
kg
mind2web
os
webshop
1866 1144.3±635.5, min=206, max=6412 chat, agent, multi-round -
iic/100PoisonMpts default 906 150.6±80.8, min=39, max=656 poison-management, zh -
iic/DocQA-RL-1.6K default 1591 8307.3±7748.9, min=202, max=32563 docqa, rl, long-sequence Tongyi-Zhiwen/DocQA-RL-1.6K
iic/MSAgent-MultiRole default 543 413.0±79.7, min=70, max=936 chat, agent, multi-round, role-play, multi-agent -
iic/MSAgent-Pro default 21910 1978.1±747.9, min=339, max=8064 chat, agent, multi-round, 🔥 -
iic/ms_agent default 30000 645.8±218.0, min=199, max=2070 chat, agent, multi-round, 🔥 -
iic/ms_bench default 316820 353.4±424.5, min=29, max=2924 chat, general, multi-round, 🔥 -
liucong/Chinese-DeepSeek-R1-Distill-data-110k-SFT default 110000 72.1±60.9, min=29, max=2315 chat, sft, cot, r1, 🔥 Congliu/Chinese-DeepSeek-R1-Distill-data-110k-SFT
- default huge dataset - multi-modal, en, vqa, quality lmms-lab/GQA
- 0_30_s_academic_v0_1
0_30_s_youtube_v0_1
1_2_m_academic_v0_1
1_2_m_youtube_v0_1
2_3_m_academic_v0_1
2_3_m_youtube_v0_1
30_60_s_academic_v0_1
30_60_s_youtube_v0_1
1335486 273.7±78.8, min=107, max=638 chat, multi-modal, video lmms-lab/LLaVA-Video-178K
lmms-lab/multimodal-open-r1-8k-verified default 7689 74.0±24.8, min=41, max=214 grpo, vision, 🔥 lmms-lab/multimodal-open-r1-8k-verified
lvjianjin/AdvertiseGen default 97484 130.9±21.9, min=73, max=232 text-generation, 🔥 shibing624/AdvertiseGen
mapjack/openwebtext_dataset default huge dataset - pretrain, zh, quality -
modelscope/DuReader_robust-QG default 17899 242.0±143.1, min=75, max=1416 text-generation, 🔥 -
modelscope/MathR default
clean
6089 188.7±75.3, min=64, max=3341 qa, math -
modelscope/MathR-32B-Distill data 25921 209.4±63.1, min=121, max=3407 qa, math -
modelscope/chinese-poetry-collection default 1710 58.1±8.1, min=31, max=71 text-generation, poetry -
modelscope/clue cmnli 391783 81.6±16.0, min=54, max=157 text-generation, classification clue
modelscope/coco_2014_caption train
validation
454617 389.6±68.4, min=70, max=587 chat, multi-modal, vision, 🔥 -
modelscope/gsm8k main 7473 88.6±21.6, min=41, max=241 qa, math -
open-r1/verifiable-coding-problems-python default 35735 559.0±255.2, min=74, max=6191 grpo, code open-r1/verifiable-coding-problems-python
open-r1/verifiable-coding-problems-python-10k default 1800 581.6±233.4, min=136, max=2022 grpo, code open-r1/verifiable-coding-problems-python-10k
open-r1/verifiable-coding-problems-python-10k_decontaminated default 1574 575.7±234.3, min=136, max=2022 grpo, code open-r1/verifiable-coding-problems-python-10k_decontaminated
open-r1/verifiable-coding-problems-python_decontaminated default 27839 561.9±252.2, min=74, max=6191 grpo, code open-r1/verifiable-coding-problems-python_decontaminated
open-thoughts/OpenThoughts-114k default 113957 413.2±186.9, min=265, max=13868 chat, sft, cot, r1 open-thoughts/OpenThoughts-114k
swift/self-cognition default
qwen3
empty_think
108 58.9±20.3, min=32, max=131 chat, self-cognition, 🔥 modelscope/self-cognition
sentence-transformers/stsb default
positive
generate
reg
5748 21.0±0.0, min=21, max=21 similarity, 🔥 sentence-transformers/stsb
shenweizhou/alpha-umi-toolbench-processed-v2 backbone
caller
planner
summarizer
huge dataset - chat, agent, 🔥 -
simpleai/HC3 finance
finance_cls
medicine
medicine_cls
11021 296.0±153.3, min=65, max=2267 text-generation, classification, 🔥 Hello-SimpleAI/HC3
simpleai/HC3-Chinese baike
baike_cls
open_qa
open_qa_cls
nlpcc_dbqa
nlpcc_dbqa_cls
finance
finance_cls
medicine
medicine_cls
law
law_cls
psychology
psychology_cls
39781 179.9±70.2, min=90, max=1070 text-generation, classification, 🔥 Hello-SimpleAI/HC3-Chinese
speech_asr/speech_asr_aishell1_trainsets train
validation
test
141600 40.8±3.3, min=33, max=53 chat, multi-modal, audio -
swift/A-OKVQA default 18201 43.5±7.9, min=27, max=94 multi-modal, en, vqa, quality HuggingFaceM4/A-OKVQA
swift/ChartQA default 28299 36.8±6.5, min=26, max=74 en, vqa, quality HuggingFaceM4/ChartQA
swift/Chinese-Qwen3-235B-2507-Distill-data-110k-SFT default 110000 72.1±60.9, min=29, max=2315 🔥, distill, sft -
swift/Chinese-Qwen3-235B-Thinking-2507-Distill-data-110k-SFT default 110000 72.1±60.9, min=29, max=2315 🔥, distill, sft, cot, r1, thinking -
swift/GRIT caption
grounding
vqa
huge dataset - multi-modal, en, caption-grounding, vqa, quality zzliang/GRIT
swift/GenQA default huge dataset - qa, quality, multi-task tomg-group-umd/GenQA
swift/Infinity-Instruct 3M
7M
0625
Gen
7M_domains
huge dataset - qa, quality, multi-task BAAI/Infinity-Instruct
swift/Mantis-Instruct birds-to-words
chartqa
coinstruct
contrastive_caption
docvqa
dreamsim
dvqa
iconqa
imagecode
llava_665k_multi
lrv_multi
multi_vqa
nextqa
nlvr2
spot-the-diff
star
visual_story_telling
988115 619.9±156.6, min=243, max=1926 chat, multi-modal, vision -
swift/MideficsDataset default 3800 201.3±70.2, min=60, max=454 medical, en, vqa WinterSchool/MideficsDataset
swift/Multimodal-Mind2Web default 1009 293855.4±331149.5, min=11301, max=3577519 agent, multi-modal osunlp/Multimodal-Mind2Web
swift/OCR-VQA default 186753 32.3±5.8, min=27, max=80 multi-modal, en, ocr-vqa howard-hou/OCR-VQA
swift/OK-VQA_train default 9009 31.7±3.4, min=25, max=56 multi-modal, en, vqa, quality Multimodal-Fatima/OK-VQA_train
swift/OpenHermes-2.5 default huge dataset - cot, en, quality teknium/OpenHermes-2.5
swift/RLAIF-V-Dataset default 83132 99.6±54.8, min=30, max=362 rlhf, dpo, multi-modal, en openbmb/RLAIF-V-Dataset
swift/RedPajama-Data-1T default huge dataset - pretrain, quality togethercomputer/RedPajama-Data-1T
swift/RedPajama-Data-V2 default huge dataset - pretrain, quality togethercomputer/RedPajama-Data-V2
swift/ScienceQA default 16967 101.7±55.8, min=32, max=620 multi-modal, science, vqa, quality derek-thomas/ScienceQA
swift/SlimOrca default 517982 405.5±442.1, min=47, max=8312 quality, en Open-Orca/SlimOrca
swift/TextCaps default
emb
huge dataset - multi-modal, en, caption, quality HuggingFaceM4/TextCaps
swift/ToolBench default 124345 2251.7±1039.8, min=641, max=9451 chat, agent, multi-round -
swift/VQAv2 default huge dataset - en, vqa, quality HuggingFaceM4/VQAv2
swift/VideoChatGPT Generic
Temporal
Consistency
3206 87.4±48.3, min=31, max=398 chat, multi-modal, video, 🔥 lmms-lab/VideoChatGPT
swift/WebInstructSub default huge dataset - qa, en, math, quality, multi-domain, science TIGER-Lab/WebInstructSub
swift/aya_collection aya_dataset 202364 474.6±1539.1, min=25, max=71312 multi-lingual, qa CohereForAI/aya_collection
swift/chinese-c4 default huge dataset - pretrain, zh, quality shjwudp/chinese-c4
swift/cinepile default huge dataset - vqa, en, youtube, video tomg-group-umd/cinepile
swift/classical_chinese_translate default 6655 349.3±77.1, min=61, max=815 chat, play-ground -
swift/cosmopedia-100k default 100000 1037.0±254.8, min=339, max=2818 multi-domain, en, qa HuggingFaceTB/cosmopedia-100k
swift/dolma v1_7 huge dataset - pretrain, quality allenai/dolma
swift/dolphin flan1m-alpaca-uncensored
flan5m-alpaca-uncensored
huge dataset - en cognitivecomputations/dolphin
swift/github-code default huge dataset - pretrain, quality codeparrot/github-code
swift/gpt4v-dataset default huge dataset - en, caption, multi-modal, quality laion/gpt4v-dataset
swift/llava-data llava_instruct 624255 369.7±143.0, min=40, max=905 sft, multi-modal, quality TIGER-Lab/llava-data
swift/llava-instruct-mix-vsft default 13640 178.8±119.8, min=34, max=951 multi-modal, en, vqa, quality HuggingFaceH4/llava-instruct-mix-vsft
swift/llava-med-zh-instruct-60k default 56649 207.9±67.7, min=42, max=594 zh, medical, vqa, multi-modal BUAADreamer/llava-med-zh-instruct-60k
swift/lnqa default huge dataset - multi-modal, en, ocr-vqa, quality vikhyatk/lnqa
swift/longwriter-6k-filtered default 666 4108.9±2636.9, min=1190, max=17050 long, chat, sft, 🔥 -
swift/medical_zh en
zh
2068589 256.4±87.3, min=39, max=1167 chat, medical -
swift/moondream2-coyo-5M-captions default huge dataset - caption, pretrain, quality isidentical/moondream2-coyo-5M-captions
swift/no_robots default 9485 300.0±246.2, min=40, max=6739 multi-task, quality, human-annotated HuggingFaceH4/no_robots
swift/orca_dpo_pairs default 12859 364.9±248.2, min=36, max=2010 rlhf, quality Intel/orca_dpo_pairs
swift/path-vqa default 19654 34.2±6.8, min=28, max=85 multi-modal, vqa, medical flaviagiammarino/path-vqa
swift/pile-val-backup default 214661 1831.4±11087.5, min=21, max=516620 text-generation, awq mit-han-lab/pile-val-backup
swift/pixelprose default huge dataset - caption, multi-modal, vision tomg-group-umd/pixelprose
swift/refcoco caption
grounding
92430 45.4±3.0, min=37, max=63 multi-modal, en, grounding jxu124/refcoco
swift/refcocog caption
grounding
89598 50.3±4.6, min=39, max=91 multi-modal, en, grounding jxu124/refcocog
swift/sharegpt common-zh
unknow-zh
common-en
194063 820.5±366.1, min=25, max=2221 chat, general, multi-round -
swift/swift-sft-mixture sharegpt
firefly
codefuse
metamathqa
huge dataset - chat, sft, general, 🔥 -
swift/tagengo-gpt4 default 76437 468.1±276.8, min=28, max=1726 chat, multi-lingual, quality lightblue/tagengo-gpt4
swift/train_3.5M_CN default huge dataset - common, zh, quality BelleGroup/train_3.5M_CN
swift/ultrachat_200k default 207843 1188.0±571.1, min=170, max=4068 chat, en, quality HuggingFaceH4/ultrachat_200k
swift/wikipedia default huge dataset - pretrain, quality wikipedia
tany0699/garbage265 default 132673 39.0±0.0, min=39, max=39 cls, 🔥, multi-modal -
tastelikefeet/competition_math default 12000 101.9±87.3, min=36, max=1683 qa, math -
- default huge dataset - pretrain, quality tiiuae/falcon-refinedweb
wyj123456/GPT4all default 806199 97.3±20.9, min=62, max=414 chat, general -
wyj123456/code_alpaca_en default 20022 99.3±57.6, min=30, max=857 chat, coding sahil2801/CodeAlpaca-20k
wyj123456/finance_en default 68912 264.5±207.1, min=30, max=2268 chat, financial ssbuild/alpaca_finance_en
wyj123456/instinwild default
subset
103695 125.1±43.7, min=35, max=801 chat, general -
wyj123456/instruct default 888970 271.0±333.6, min=34, max=3967 chat, general -
zouxuhong/Countdown-Tasks-3to4 default 490364 126.6±2.0, min=122, max=130 math -
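
The Statistic (token) column in the table above can also be read programmatically when picking a max_length value. The sketch below parses one cell and applies a simple heuristic (mean plus k standard deviations, capped at the observed maximum). `parse_statistic` and `suggest_max_length` are hypothetical helpers and the heuristic is an assumption for illustration, not an ms-swift rule.

```python
import math
import re
from typing import NamedTuple, Optional


class TokenStat(NamedTuple):
    mean: float
    std: float
    min: int
    max: int


# Matches cells such as "116.1±80.8, min=31, max=5064".
_STAT_RE = re.compile(r"^\s*([\d.]+)±([\d.]+),\s*min=(\d+),\s*max=(\d+)\s*$")


def parse_statistic(cell: str) -> Optional[TokenStat]:
    """Parse a Statistic cell; return None for cells that hold no numbers ('-')."""
    m = _STAT_RE.match(cell)
    if m is None:
        return None
    return TokenStat(float(m.group(1)), float(m.group(2)),
                     int(m.group(3)), int(m.group(4)))


def suggest_max_length(stat: TokenStat, k: float = 3.0) -> int:
    """Heuristic (assumption): mean + k*std, rounded up and capped at the maximum."""
    return min(stat.max, math.ceil(stat.mean + k * stat.std))


stat = parse_statistic("116.1±80.8, min=31, max=5064")
print(stat)
print(suggest_max_length(stat))
```

Rows marked as huge dataset carry `-` in this column, so `parse_statistic` returns `None` for them and the length has to be chosen another way.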