# Embedding Training

SWIFT supports training embedding models, both pure-text and multimodal. The currently supported models are:

1. modernbert embedding model
   - [ModelScope](https://modelscope.cn/models/iic/gte-modernbert-base) [Hugging Face](https://huggingface.co/Alibaba-NLP/gte-modernbert-base)
2. gte embedding models
   - 1.5B: [ModelScope](https://www.modelscope.cn/models/iic/gte_Qwen2-1.5B-instruct) [Hugging Face](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct)
   - 7B: [ModelScope](https://www.modelscope.cn/models/iic/gte_Qwen2-7B-instruct) [Hugging Face](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct)
3. gme embedding models
   - 2B: [ModelScope](https://www.modelscope.cn/models/iic/gme-Qwen2-VL-2B-Instruct) [Hugging Face](https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-2B-Instruct)
   - 7B: [ModelScope](https://www.modelscope.cn/models/iic/gme-Qwen2-VL-7B-Instruct) [Hugging Face](https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-7B-Instruct)
4. qwen3-embedding models
   - 0.6B: [ModelScope](https://www.modelscope.cn/models/Qwen/Qwen3-Embedding-0.6B) [Hugging Face](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B)
   - 4B: [ModelScope](https://www.modelscope.cn/models/Qwen/Qwen3-Embedding-4B) [Hugging Face](https://huggingface.co/Qwen/Qwen3-Embedding-4B)
   - 8B: [ModelScope](https://www.modelscope.cn/models/Qwen/Qwen3-Embedding-8B) [Hugging Face](https://huggingface.co/Qwen/Qwen3-Embedding-8B)

Developers can integrate their own models, provided the model's forward output satisfies:

```text
{"last_hidden_state": some-embedding-tensor}
```

That is, the forward pass should return a dictionary with a `last_hidden_state` key whose value is the embedding tensor. For the input side, you can reuse the templates SWIFT already supports.

Users can also convert any other model into an embedding model for training by specifying the following parameter:

```shell
--task_type embedding
```
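A minimal sketch of the forward-output contract described above, assuming PyTorch. `ToyEmbeddingModel` is hypothetical: a toy embedding table stands in for an LLM backbone, and mean pooling over non-padding tokens is an illustrative choice (many decoder-based embedders use last-token pooling instead). The sketch also assumes `last_hidden_state` holds one pooled vector per sample, L2-normalized at the end of `forward`, in line with the normalization SWIFT's embedding models apply.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyEmbeddingModel(nn.Module):
    """Hypothetical model illustrating the {"last_hidden_state": ...} contract.

    A toy embedding table replaces a real pretrained backbone (assumption).
    """

    def __init__(self, vocab_size: int = 100, hidden_size: int = 16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> dict:
        hidden = self.embed(input_ids)  # (batch, seq, hidden)
        mask = attention_mask.unsqueeze(-1).to(hidden.dtype)  # (batch, seq, 1)
        # Mean-pool over non-padding tokens (an assumed pooling choice).
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        # L2-normalize at the end of forward so each embedding has unit length.
        embedding = F.normalize(pooled, p=2, dim=-1)
        return {"last_hidden_state": embedding}


torch.manual_seed(0)
model = ToyEmbeddingModel()
ids = torch.tensor([[1, 2, 3, 0], [4, 5, 0, 0]])
mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]])
out = model(ids, mask)
print(out["last_hidden_state"].shape)  # torch.Size([2, 16])
```

Whatever backbone and pooling you choose, the part that matters for integration is the return value: a dictionary whose `last_hidden_state` entry is the final, already-normalized embedding.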
Note that the embedding models currently supported by SWIFT are all based on pure-text or multimodal LLMs; training CLIP-type models is not supported yet. In addition, every embedding model supported by SWIFT applies normalization at the end of its forward pass. If you add a new model yourself, remember to include a normalization layer.

## Loss

The embedding models supported by SWIFT can currently use the following loss functions:

- **cosine_similarity**: Cosine similarity loss. It computes the cosine similarity between two embeddings and fits it to the label value; in effect, it is an MSE loss.
- **contrastive**: Contrastive loss with an adjustable margin. Labels must be 0 or 1.
- **online_contrastive**: Contrastive loss computed only over hard negatives and hard positives. Labels must be 0 or 1.
- **infonce**: Computes pairwise cosine similarities between different rows within the same batch, maximizing the similarity between samples in the same row (the anchor and its positive) and minimizing the similarity between samples in different rows. No labels are required.

The source code for the loss functions can be found [here](https://github.com/modelscope/ms-swift/blob/main/swift/plugin/loss.py).

## Dataset Format

> **Note:**
> 1. The `` tag can appear anywhere inside `messages`/`positive_messages`/`negative_messages`. Each group has its own image field, `images`/`positive_images`/`negative_images`, which provides paths or URLs.
> 2. There is no longer any cross-field ordering requirement. Alignment rules:
>    - The length of `images` equals the number of `` tags in `messages`.
>    - `positive_images` and `negative_images` are both lists of lists. Their outer lengths equal the lengths of `positive_messages` and `negative_messages`, respectively. For each outer item, the inner list length equals the number of `` tags in that message sequence.
> 3. `messages` is the anchor sample; `positive_messages` and `negative_messages` are each a list of message sequences (hence the extra level of `[]`).
>    Accordingly, `positive_images`/`negative_images` are also lists of lists and align with them item by item.
> 4. `