Custom Model
The models built into ms-swift can be used directly by specifying either model_id or model_path: --model <model_id_or_path>. ms-swift determines the model_type based on the suffix of model_id/model_path and the config.json file.
Each model_type has a unique model structure, template, and loading method. You can override the automatic choice by passing --model_type and --template explicitly. The supported model_type values and templates are listed in the Supported Models and Datasets document.
The following introduces how to register a new model and its corresponding template.
Model Registration
Custom models are typically added through model registration. For reference, see the built-in models, the built-in dialogue templates, or the example code in the examples. To load registrations defined in an external file, pass --custom_register_path xxx.py and ms-swift will parse that file; this is convenient for users who installed via pip rather than git clone.
The register_model function registers a model in the MODEL_MAPPING. You can complete the model registration by calling the function register_model(model_meta), where model_meta will store the model’s metadata. The parameter list for ModelMeta is as follows:
model_type: Required. The model type, which is also the unique ID.
model_groups: Required. Lists the ModelScope/HuggingFace model IDs and local paths. Running the run_model_info.py file will automatically generate the supported models documentation and automatically match the model_type based on the --model suffix.
template: Required. The default template type when --template is not specified in the command line.
get_function: Required. The loading function for the model and tokenizer/processor (for multi-modal models). LLM is typically set to get_model_tokenizer_with_flash_attn.
model_arch: The model architecture. Defaults to None. Multi-modal model training requires setting this parameter to determine the prefix for llm/vit/aligner.
architectures: The architectures item in config.json, used to automatically match the model with its model_type. Defaults to [].
additional_saved_files: Files that need to be additionally saved during full parameter training and merge-lora. Defaults to [].
torch_dtype: The default dtype when torch_dtype is not passed during model loading. Defaults to None, read from config.json.
is_multimodal: Indicates whether the model is multi-modal. Defaults to False.
ignore_patterns: File patterns to be ignored when downloading from the hub. Defaults to [].
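To make the role of these fields concrete, here is a minimal, self-contained sketch of a registry of this kind. The ModelMeta dataclass, MODEL_MAPPING dict, and register_model function below are local stand-ins that only mirror the names used above; they are not the ms-swift classes. The duplicate check, the match_by_architecture helper, the load_stub loader, and the model ID 'my-org/my-model-7b' are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


# Local stand-in for illustration only; NOT the ms-swift ModelMeta class.
@dataclass
class ModelMeta:
    model_type: str                      # required, unique ID
    model_groups: List[List[str]]        # simplified: groups of model IDs / paths
    template: str                        # default template type
    get_function: Callable               # loader for model and tokenizer/processor
    model_arch: Optional[str] = None
    architectures: List[str] = field(default_factory=list)
    additional_saved_files: List[str] = field(default_factory=list)
    torch_dtype: Optional[str] = None    # None means "read from config.json"
    is_multimodal: bool = False
    ignore_patterns: List[str] = field(default_factory=list)


# Toy registry keyed by model_type.
MODEL_MAPPING: Dict[str, ModelMeta] = {}


def register_model(model_meta: ModelMeta) -> None:
    # model_type is the unique ID, so this sketch rejects duplicates (assumed behavior).
    if model_meta.model_type in MODEL_MAPPING:
        raise ValueError(f'{model_meta.model_type!r} is already registered')
    MODEL_MAPPING[model_meta.model_type] = model_meta


def match_by_architecture(architecture: str) -> List[str]:
    # Hypothetical helper: find model_types whose `architectures` list
    # contains a value taken from the `architectures` item of config.json.
    return [t for t, m in MODEL_MAPPING.items() if architecture in m.architectures]


def load_stub(model_dir: str):
    # Hypothetical loader; a real one would return (model, tokenizer).
    return None, None


register_model(
    ModelMeta(
        model_type='my_model',
        model_groups=[['my-org/my-model-7b']],  # hypothetical model ID
        template='my_template',
        get_function=load_stub,
        architectures=['MyModelForCausalLM'],
    ))

print(match_by_architecture('MyModelForCausalLM'))
```

In this sketch only the four required fields plus architectures are passed; every other field falls back to the defaults listed above.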
The register_template function registers a dialogue template in TEMPLATE_MAPPING. To complete the registration of the dialogue template, simply call the function register_template(template_meta), where template_meta will store the metadata of the template. The parameter list for TemplateMeta is as follows:
template_type: Required. The type of dialogue template, which also serves as a unique ID.
prefix: Required. The prefix of the dialogue template, usually encompassing parts like system, bos_token, and is generated independently of multi-turn dialogue loops. For example, the prefix for qwen is [].
prompt: Required. Represents the dialogue portion before {{RESPONSE}}. We use {{QUERY}} as a placeholder for the user’s inquiry part. For example, the prompt for qwen is ['<|im_start|>user\n{{QUERY}}<|im_end|>\n<|im_start|>assistant\n'].
chat_sep: Required. The separator for each turn in multi-turn dialogues. If set to None, the template does not support multi-turn dialogue. For example, the chat_sep for qwen is ['<|im_end|>\n'].
suffix: Defaults to [['eos_token_id']]. The suffix part of the dialogue template, generated independently of multi-turn dialogue loops, usually the eos_token. For example, the suffix for qwen is ['<|im_end|>'].
template_cls: Defaults to Template. Customization is generally required when defining templates for multimodal models, particularly in customizing the _encode, _post_encode, and _data_collator functions.
system_prefix: Defaults to None. The prefix for dialogue templates with a system. We use {{SYSTEM}} as a placeholder for the system. For example, the system_prefix for qwen is ['<|im_start|>system\n{{SYSTEM}}<|im_end|>\n'].
Note: If the system is empty and prefix can be replaced by system_prefix, you can write prefix as a prefix including the system without setting system_prefix.
If the prefix does not include {{SYSTEM}} and system_prefix is not set, the template does not support the system.
default_system: Defaults to None. The default system used when --system is not provided. For example, the default_system for qwen is 'You are a helpful assistant.'.
stop_words: Defaults to []. Additional stop words besides eos_token and suffix[-1]. For example, the stop_words for qwen is ['<|endoftext|>'].
Note: During inference, the output response will be filtered by eos_token and suffix[-1], but additional stop_words will be retained.
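The way prefix, system_prefix, prompt, chat_sep, and suffix fit together can be sketched with the qwen values quoted above. The code below is a simplified illustration, not the ms-swift Template implementation: TemplateMeta here is a local stand-in dataclass, render is a hypothetical helper, and the handling of an empty system (substituting an empty string for {{SYSTEM}}) is an assumption. Because the sketch has no tokenizer, suffix must be given explicitly instead of defaulting to [['eos_token_id']].

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


# Local stand-in for illustration only; NOT the ms-swift TemplateMeta class.
@dataclass
class TemplateMeta:
    template_type: str
    prefix: List[str]
    prompt: List[str]
    chat_sep: Optional[List[str]]
    suffix: List[str]                     # explicit here; the sketch has no tokenizer
    system_prefix: Optional[List[str]] = None
    default_system: Optional[str] = None
    stop_words: List[str] = field(default_factory=list)


def render(meta: TemplateMeta, turns: List[Tuple[str, str]],
           system: Optional[str] = None) -> str:
    """Hypothetical helper: join the template parts for (query, response) turns."""
    if system is None:
        system = meta.default_system
    prefix = ''.join(meta.prefix)
    if system is None:
        # Assumption: with no system, any {{SYSTEM}} placeholder becomes empty.
        head = prefix.replace('{{SYSTEM}}', '')
    elif meta.system_prefix is not None:
        head = ''.join(meta.system_prefix).replace('{{SYSTEM}}', system)
    elif '{{SYSTEM}}' in prefix:
        head = prefix.replace('{{SYSTEM}}', system)
    else:
        raise ValueError('this template does not support a system')
    if len(turns) > 1 and meta.chat_sep is None:
        raise ValueError('this template does not support multi-turn dialogue')
    parts = [head]
    for i, (query, response) in enumerate(turns):
        parts.append(''.join(meta.prompt).replace('{{QUERY}}', query))
        parts.append(response)
        is_last = i == len(turns) - 1
        # chat_sep separates turns; suffix closes the final response.
        parts.append(''.join(meta.suffix if is_last else meta.chat_sep))
    return ''.join(parts)


QWEN = TemplateMeta(
    template_type='qwen',
    prefix=[],
    prompt=['<|im_start|>user\n{{QUERY}}<|im_end|>\n<|im_start|>assistant\n'],
    chat_sep=['<|im_end|>\n'],
    suffix=['<|im_end|>'],
    system_prefix=['<|im_start|>system\n{{SYSTEM}}<|im_end|>\n'],
    default_system='You are a helpful assistant.',
    stop_words=['<|endoftext|>'],
)

print(render(QWEN, [('Hi', 'Hello!'), ('Bye', 'See you.')]))
```

Since no system is passed, the sketch falls back to default_system; the first turn ends with chat_sep and the last turn ends with suffix.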