
GRPO

Get Started

  • Get Started
    • GRPO

Developer Guide

  • Developer Guide
    • Multi-turn Training
    • Multi-Task Training
    • Reward Function
    • Reward Model
    • GYM Environment Training

Advanced Research

  • Advanced Research
    • Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
    • DAPO
    • DeepEyes: Incentivizing “Thinking with Images” via Reinforcement Learning
    • Group Sequence Policy Optimization
    • On-Policy RL Meets Off-Policy Experts: Harmonizing SFT and RL via Dynamic Weighting (CHORD)

© Copyright 2022-2025, Alibaba ModelScope.
