swift


Advanced Research

  • Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
  • Clipped Importance Sampling Policy Optimization (CISPO)
  • DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  • DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning
  • Group Sequence Policy Optimization
  • On-Policy RL Meets Off-Policy Experts: Harmonizing SFT and RL via Dynamic Weighting (CHORD)
  • REINFORCE Leave-One-Out (RLOO)
  • REINFORCE++: An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models
  • Soft Adaptive Policy Optimization (SAPO)
  • Training-Inference-Mismatch
  • TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

© Copyright 2022-2025, Alibaba ModelScope.