# Qwen3.5 Best Practices

ms-swift supports training [Qwen3.5](https://github.com/QwenLM/Qwen3.5) Dense/MoE models with the transformers and Megatron backends. Qwen3.5 is a multimodal model with hybrid thinking that combines linear attention and full attention. This article describes how to perform inference, instruction fine-tuning, and reinforcement learning on Qwen3.5 Dense/MoE models.

## Environment Setup

```shell
pip install -U ms-swift
pip install -U "transformers>=5.9" "qwen_vl_utils>=0.0.14" peft liger-kernel

# flash-linear-attention
# If you encounter slow training issues, please refer to: https://github.com/fla-org/flash-linear-attention/issues/758
# Please use Python 3.12: https://github.com/fla-org/flash-linear-attention/issues/121
pip install -U "flash-linear-attention>=0.4.2" --no-build-isolation

# causal_conv1d
pip install -U git+https://github.com/Dao-AILab/causal-conv1d --no-build-isolation

# flash-attention
pip install "flash-attn==2.8.3" --no-build-isolation

# deepspeed training
pip install deepspeed

# vllm (torch2.10) for inference/deployment/RL
pip install -U "vllm>=0.17.0"
```

- Qwen3.5 video data training hangs: Reading videos with the decord backend may cause training to hang; see [this issue](https://github.com/dmlc/decord/issues/269). You can use the torchcodec backend instead; for details, refer to the [qwen_vl_utils](https://github.com/QwenLM/Qwen3-VL/blob/50068df2334f309979ff05d75f1078c8309c63ed/qwen-vl-utils/src/qwen_vl_utils/vision_process.py#L390-L400) library.
- If you are using Qwen3.5 on Ascend NPU and want details about the FLA / MindSpeed replacement, the effective patch path, and verified version combinations, please refer to [Qwen3.5 FLA Patch Notes in the NPU Support document](./NPU-support.md#qwen35-fla-patch-notes).

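
Before training, it can help to confirm which of the packages above actually resolved in the current environment and to pin the video reader away from decord. The following is a minimal sketch rather than part of ms-swift: `installed_versions` and `PACKAGES` are hypothetical names introduced here, `torchcodec` is added to the list on the assumption that you install it separately, and the `FORCE_QWENVL_VIDEO_READER` variable is assumed to be how `qwen_vl_utils` selects its backend (please verify it against the linked `vision_process.py`).

```python
import os
from importlib.metadata import PackageNotFoundError, version

# Assumption: qwen_vl_utils reads this variable when it is imported, so it
# should be set before any import of qwen_vl_utils or swift. Check the name
# against the linked vision_process.py before relying on it.
os.environ.setdefault("FORCE_QWENVL_VIDEO_READER", "torchcodec")

# Distribution names from the pip commands above, plus torchcodec
# (not in the original install list; assumed to be installed separately).
PACKAGES = [
    "ms-swift",
    "transformers",
    "qwen_vl_utils",
    "peft",
    "liger-kernel",
    "flash-linear-attention",
    "causal-conv1d",
    "flash-attn",
    "deepspeed",
    "vllm",
    "torchcodec",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print a small report; missing packages are flagged rather than raising.
for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:<24} {ver or 'NOT INSTALLED'}")
```

The report only reads package metadata, so it runs even when the heavy dependencies (for example `flash-attn` or `vllm`) failed to build, which makes it a cheap first check when an import error appears later.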
## Inference

Using ms-swift's `TransformersEngine` for inference:
- Model-specific parameters, such as the `VIDEO_MAX_TOKEN_NUM` environment variable, have the same meaning as for Qwen3-VL; refer to the [Command-line Parameters Documentation](../Instruction/Command-line-parameters.md#qwen3_vl,qwen3_5).

```python
import os
# os.environ['SWIFT_DEBUG'] = '1'
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
os.environ['IMAGE_MAX_TOKEN_NUM'] = '1024'
os.environ['VIDEO_MAX_TOKEN_NUM'] = '128'
os.environ['FPS_MAX_FRAMES'] = '16'

from swift import get_model_processor, get_template
from swift.infer_engine import TransformersEngine, InferRequest, RequestConfig

model, processor = get_model_processor('Qwen/Qwen3.5-4B')  # attn_impl='flash_attention_2'
template = get_template(processor, enable_thinking=False)
engine = TransformersEngine(model, template=template)
infer_request = InferRequest(messages=[{
    "role": "user",
    "content": '