Reranker Training

SWIFT supports Reranker model training. Currently supported models include:

  1. modernbert reranker model

  2. qwen3-reranker model

Implementation Methods

SWIFT currently supports two implementation methods for Reranker models, which differ significantly in architecture and in how the loss is computed:

1. Classification Reranker

Applicable Models: modernbert reranker models (e.g., gte-reranker-modernbert-base)

Core Principles:

  • Based on sequence classification architecture, adding a classification head on top of pre-trained models

  • Input: query-document pairs, Output: single relevance score

2. Generative Reranker

Applicable Models: qwen3-reranker models (0.6B/4B/8B)

Core Principles:

  • Based on generative language model architecture (CausalLM)

  • Input: query-document pairs, Output: probability of specific tokens (e.g., “yes”/“no”)

  • Classification is performed by comparing logits of specific tokens at the final position
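The logit comparison described above can be illustrated with a minimal sketch. It assumes the relevance score is the two-way softmax over the positive and negative token logits, which equals a sigmoid of their difference; this is one common formulation and is not taken from the SWIFT source. The function name, toy vocabulary, and logit values are hypothetical.

```python
import math

def generative_rerank_score(final_logits, positive_id, negative_id):
    """Relevance score from the logits at the final sequence position.

    Assumption: score = softmax over {positive, negative} token logits,
    i.e. sigmoid(logit_positive - logit_negative).
    """
    diff = final_logits[positive_id] - final_logits[negative_id]
    # Numerically stable sigmoid of the logit difference.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)

# Hypothetical toy vocabulary and final-position logits.
vocab = {"yes": 0, "no": 1, "maybe": 2}
final_logits = [2.0, 0.0, 5.0]
score = generative_rerank_score(final_logits, vocab["yes"], vocab["no"])
```

Only the two chosen tokens matter here: the large logit for "maybe" does not affect the score, because every other vocabulary entry is ignored.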

Loss Function Types

SWIFT supports multiple loss functions for training Reranker models:

Pointwise Loss Functions

Pointwise methods transform the ranking problem into a binary classification problem, processing each query-document pair independently:

  • Core Idea: Binary classification for each query-document pair to determine document relevance to the query

  • Loss Function: Binary cross-entropy

  • Use Cases: Simple and efficient, suitable for large-scale data training
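As an illustration of the pointwise objective, the sketch below computes binary cross-entropy over independent query-document pairs from raw relevance logits. It is a plain-Python approximation of the idea, not SWIFT's implementation; the function name and inputs are hypothetical.

```python
import math

def pointwise_bce_loss(logits, labels):
    """Mean binary cross-entropy over independent query-document pairs.

    logits: raw relevance scores, one per pair
    labels: 1 for a relevant document, 0 for an irrelevant one
    """
    total = 0.0
    for z, y in zip(logits, labels):
        # Stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

# One relevant and one irrelevant pair, both scored confidently and correctly.
loss = pointwise_bce_loss([2.0, -2.0], [1, 0])
```

Each pair contributes to the loss on its own, so the other candidate documents for the same query never influence it.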

Environment variable configuration:

  • GENERATIVE_RERANKER_POSITIVE_TOKEN: Positive token (default: “yes”)

  • GENERATIVE_RERANKER_NEGATIVE_TOKEN: Negative token (default: “no”)

Listwise Loss Functions

Listwise methods transform the ranking problem into a multi-class classification problem, selecting the positive example from multiple candidate documents:

  • Core Idea: Multi-class classification over each query’s candidate document group (1 positive + n negative examples) to identify the positive document

  • Loss Function: Multi-class cross-entropy

  • Use Cases: Learning relative ranking relationships between documents, better aligned with the actual needs of information retrieval

Environment variable configuration:

  • LISTWISE_RERANKER_TEMPERATURE: Softmax temperature parameter (default: 1.0)

  • LISTWISE_RERANKER_MIN_GROUP_SIZE: Minimum group size (default: 2). If a group contains fewer documents than this value, no loss is calculated for it
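The listwise objective and its two settings can be sketched as follows. This is a simplified illustration, not SWIFT's code: it assumes each group is a list of scores with the positive document at index 0, and the function name is hypothetical. The defaults mirror the documented environment variable defaults.

```python
import math

def listwise_loss(groups, temperature=1.0, min_group_size=2):
    """Mean softmax cross-entropy over candidate groups.

    groups: list of score lists, one per query.
    Assumption for this sketch: the positive document is at index 0.
    """
    losses = []
    for scores in groups:
        # Groups below the minimum size contribute no loss.
        if len(scores) < min_group_size:
            continue
        scaled = [s / temperature for s in scores]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(scaled)
        log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
        losses.append(log_z - scaled[0])
    return sum(losses) / len(losses) if losses else 0.0

# Two queries: the second group has a single document and is skipped.
loss = listwise_loss([[0.0, 0.0, 0.0, 0.0], [5.0]])
```

A temperature above 1.0 flattens the softmax over the group, while a value below 1.0 sharpens it and penalizes misranked negatives more heavily.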

Listwise vs Pointwise:

  • Pointwise: Independent relevance judgment, simple training, but ignores relative relationships between documents

  • Listwise: Learns the relative ranking among candidate documents, which generally yields better ranking performance and matches what ranking tasks actually require

The loss function source code can be found here.

Dataset Format

Common Original Data Format

{"query": "query", "positive": ["relevant_doc1", "relevant_doc2", ...], "negative": ["irrelevant_doc1", "irrelevant_doc2", ...]}

Converted Data Format

{"query": "query", "response": "relevant_doc1", "rejected_response": ["irrelevant_doc1", "irrelevant_doc2", ...]}
{"query": "query", "response": "relevant_doc2", "rejected_response": ["irrelevant_doc1", "irrelevant_doc2", ...]}
...

Training requires the converted data format. Developers can build their own dataset in this format or reuse MTEBRerankPreprocessor to convert existing data.
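For a custom dataset, the conversion between the two formats shown above could look like the sketch below. The helper `convert_record` is hypothetical and is not part of SWIFT or MTEBRerankPreprocessor; it only assumes the field names given in the two format examples.

```python
import json

def convert_record(record):
    """Expand one original-format record into converted-format rows.

    Each positive document becomes its own row and keeps the full
    list of negatives as rejected responses.
    """
    return [
        {
            "query": record["query"],
            "response": positive,
            "rejected_response": list(record["negative"]),
        }
        for positive in record["positive"]
    ]

original = {
    "query": "query",
    "positive": ["relevant_doc1", "relevant_doc2"],
    "negative": ["irrelevant_doc1", "irrelevant_doc2"],
}
rows = convert_record(original)
# One JSON object per line, as in the converted format.
jsonl = "\n".join(json.dumps(row) for row in rows)
```

A record with k positive documents yields k rows, each paired with the same negatives.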

Training Scripts

SWIFT provides four training script templates: