swift

Advanced Research

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Clipped Importance Sampling Policy Optimization (CISPO)
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning
Group Sequence Policy Optimization
On-Policy RL Meets Off-Policy Experts: Harmonizing SFT and RL via Dynamic Weighting (CHORD)
REINFORCE Leave-One-Out (RLOO)
REINFORCE++: An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models
Soft Adaptive Policy Optimization (SAPO)
Training-Inference-Mismatch
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

© Copyright 2022-2025, Alibaba ModelScope.
