Reranker Training

SWIFT supports Reranker model training. Currently supported models include:

modernbert reranker model
- ModelScope Hugging Face
qwen3-reranker model
- 0.6B: ModelScope Hugging Face
- 4B: ModelScope Hugging Face
- 8B: ModelScope Hugging Face

Implementation Methods

SWIFT currently supports two implementation methods for Reranker models, which have significant differences in architecture and loss function computation:

1. Classification Reranker

Applicable Models: modernbert reranker models (e.g., gte-reranker-modernbert-base)

Core Principles:

Based on sequence classification architecture, adding a classification head on top of pre-trained models
Input: query-document pairs, Output: single relevance score

2. Generative Reranker

Applicable Models: qwen3-reranker models (0.6B/4B/8B)

Core Principles:

Based on generative language model architecture (CausalLM)
Input: query-document pairs, Output: probability of specific tokens (e.g., “yes”/”no”)
Classification is performed by comparing logits of specific tokens at the final position

Loss Function Types

SWIFT supports multiple loss functions for training Reranker models:

Pointwise Loss Functions

Pointwise methods transform the ranking problem into a binary classification problem, processing each query-document pair independently:

Core Idea: Binary classification for each query-document pair to determine document relevance to the query
Loss Function: Binary cross-entropy
Use Cases: Simple and efficient, suitable for large-scale data training

Environment variable configuration:

GENERATIVE_RERANKER_POSITIVE_TOKEN: Positive token (default: “yes”)
GENERATIVE_RERANKER_NEGATIVE_TOKEN: Negative token (default: “no”)

Listwise Loss Functions

Listwise methods transform the ranking problem into a multi-classification problem, selecting positive examples from multiple candidate documents:

Core Idea: Multi-classification for each query’s candidate document group (1 positive + n negative examples) to identify positive documents
Loss Function: Multi-class cross-entropy
Use Cases: Learning relative ranking relationships between documents, better aligned with the actual needs of information retrieval

Environment variable configuration:

LISTWISE_RERANKER_TEMPERATURE: Softmax temperature parameter (default: 1.0)
LISTWISE_RERANKER_MIN_GROUP_SIZE: Minimum group size, if the number of documents in the group is less than this value, the loss will not be calculated (default: 2)

Listwise vs Pointwise:

Pointwise: Independent relevance judgment, simple training, but ignores relative relationships between documents
Listwise: Learning relative ranking, better performance, more suitable for the essential needs of ranking tasks

The loss function source code can be found here.

Dataset Format

Common Original Data Format

{"query": "query", "positive": ["relevant_doc1", "relevant_doc2", ...], "negative": ["irrelevant_doc1", "irrelevant_doc2", ...]}

Reference: MTEB/scidocs-reranking

Converted Data Format

{"query": "query", "response": "relevant_doc1", "rejected_response": ["irrelevant_doc1", "irrelevant_doc2", ...]}
{"query": "query", "response": "relevant_doc2", "rejected_response": ["irrelevant_doc1", "irrelevant_doc2", ...]}
...

The final converted data format is required, developers can build their own dataset or reuse MTEBRerankPreprocessor to convert data format.

Training Scripts

SWIFT provides four training script templates: