Learning to rank
Learning to rank is handled by various classes. Some are located in the learning module.
Listeners
- XPM Config xpmir.letor.learner.ValidationListener(*, id, metrics, dataset, retriever, warmup, validation_interval, early_stop, hooks)
Bases: LearnerListener
Submit type: Any
Learning validation early-stopping
Computes the validation metrics and stores the best result. If early_stop is set (> 0), it signals to the learner that the learning process can stop once no improvement has been observed for early_stop epochs.
- id: str
Unique ID to identify the listener (ignored for signature)
- metrics: Dict[str, bool] = {'map': True}
Dictionary whose keys are the metrics to record (in a format [parseable by ir-measures](https://ir-measur.es/)) and whose boolean values indicate whether the best-performing checkpoint should be kept for the associated metric
- dataset: datamaestro_text.data.ir.Adhoc
The dataset to use
- retriever: xpmir.rankers.Retriever
The retriever for validation
- warmup: int = -1
How many epochs before actually computing the metric
- bestpath: Path (generated)
Path to the best checkpoints
- info: Path (generated)
Path to the JSON file that contains the metric values at each epoch
- validation_interval: int = 1
Epochs between each validation
- early_stop: int = 0
Number of epochs without improvement after which we stop learning. Should be a multiple of validation_interval or 0 (no early stopping)
- hooks: List[xpmir.learning.context.ValidationHook] = []
The list of the hooks during the validation
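The interplay between warmup, validation_interval and early_stop can be pictured with a small standalone sketch. This is not the xpmir implementation: the `EarlyStopTracker` class and its methods are hypothetical, and the sketch assumes that higher metric values are better and that validation is skipped up to and including epoch `warmup`.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class EarlyStopTracker:
    """Hypothetical helper mirroring the documented parameters (not xpmir code)."""

    warmup: int = -1
    validation_interval: int = 1
    early_stop: int = 0
    best: Dict[str, float] = field(default_factory=dict)
    best_epoch: Dict[str, int] = field(default_factory=dict)

    def should_validate(self, epoch: int) -> bool:
        # Assumption: no validation up to `warmup`, then one validation
        # every `validation_interval` epochs.
        return epoch > self.warmup and epoch % self.validation_interval == 0

    def update(self, epoch: int, values: Dict[str, float]) -> bool:
        """Record metric values; return True when learning can stop."""
        for name, value in values.items():
            if name not in self.best or value > self.best[name]:
                self.best[name] = value
                self.best_epoch[name] = epoch
        if self.early_stop <= 0:
            return False  # 0 disables early stopping
        # Stop only when no metric improved during the last `early_stop` epochs
        return all(epoch - e >= self.early_stop for e in self.best_epoch.values())


tracker = EarlyStopTracker(warmup=0, validation_interval=2, early_stop=4)
history = {2: 0.30, 4: 0.35, 6: 0.34, 8: 0.33, 10: 0.36}  # toy MAP values
stopped_at = None
for epoch in range(1, 11):
    if not tracker.should_validate(epoch):
        continue
    if tracker.update(epoch, {"map": history[epoch]}):
        stopped_at = epoch
        break
```

In this toy run the best value is reached at epoch 4, so the tracker signals a stop at epoch 8 and the later improvement at epoch 10 is never seen. This also shows why early_stop should be a multiple of validation_interval: the stopping condition can only be checked at validation epochs.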
Scorers
Scorers are able to give a score to a (query, document) pair. Among the scorers, some have learnable parameters.
- XPM Config xpmir.rankers.Scorer
Bases: Config, Initializable, EasyLogger, ABC
Submit type: xpmir.rankers.Scorer
Query-document scorer
A model able to give a score to a list of documents given a query
- getRetriever(retriever: Retriever, batch_size: int, batcher: Batcher = Config[xpmir.learning.batchers.batcher], top_k=None, device=None)
Returns a two-stage re-ranker built from the given retriever and this scorer
- Parameters:
device – Device for the ranker or None if no change should be made
batch_size – The number of documents in each batch
top_k – Number of documents to re-rank (or None for all)
- initialize(*args, **kwargs)
Main initialization
Calls __initialize__() once (using __initialize__())
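The two-stage re-ranking that getRetriever describes can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the xpmir implementation: `rerank`, `toy_first_stage` and `toy_score_batch` are hypothetical names, and the retriever and the scorer are reduced to simple callables. It shows how batch_size and top_k come into play.

```python
from typing import Callable, List, Optional, Sequence, Tuple

ScoredDoc = Tuple[str, float]


def rerank(
    query: str,
    first_stage: Callable[[str], List[ScoredDoc]],
    score_batch: Callable[[str, Sequence[str]], List[float]],
    batch_size: int,
    top_k: Optional[int] = None,
) -> List[ScoredDoc]:
    """Re-score the first-stage candidates in batches (illustrative only)."""
    candidates = first_stage(query)
    if top_k is not None:
        candidates = candidates[:top_k]  # only the top_k candidates are re-ranked
    docs = [doc for doc, _ in candidates]
    rescored: List[ScoredDoc] = []
    for start in range(0, len(docs), batch_size):
        batch = docs[start : start + batch_size]
        rescored.extend(zip(batch, score_batch(query, batch)))
    # Final ranking: highest scorer output first
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


def toy_first_stage(query: str) -> List[ScoredDoc]:
    # Toy first stage: a fixed ranking, whatever the query
    return [("cats and dogs", 3.0), ("dogs dogs dogs", 2.0), ("birds", 1.0)]


def toy_score_batch(query: str, docs: Sequence[str]) -> List[float]:
    # Toy scorer: number of occurrences of the query term in each document
    return [float(doc.split().count(query)) for doc in docs]


ranking = rerank("dogs", toy_first_stage, toy_score_batch, batch_size=2, top_k=2)
full_ranking = rerank("dogs", toy_first_stage, toy_score_batch, batch_size=2)
```

With top_k=2 the third candidate is dropped before scoring; with top_k=None every first-stage candidate is re-scored.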
- XPM Config xpmir.rankers.RandomScorer(*, random)
Bases: Scorer
Submit type: xpmir.rankers.RandomScorer
A random scorer
- random: xpmir.learning.base.Random
The random number generator
- XPM Config xpmir.rankers.AbstractModuleScorer
Submit type: xpmir.rankers.AbstractModuleScorer
Base class for all learnable scorers
This class provides a compute method that calls the forward method.
- XPM Config xpmir.rankers.LearnableScorer
Bases: AbstractModuleScorer
Submit type: xpmir.rankers.LearnableScorer
Learnable scorer
A scorer with parameters that can be learnt
Adapters
- XPM Config xpmir.rankers.adapters.ScorerTransformAdapter(*, scorer, adapter)
Bases: Scorer
Submit type: xpmir.rankers.adapters.ScorerTransformAdapter
Transforms topic and/or documents output by a scorer when rescoring documents
- scorer: xpmir.rankers.Scorer
The original scorer to be transformed
- adapter: xpmir.letor.samplers.hydrators.SampleTransform
The list of sample transforms to apply
Utility functions
- xpmir.rankers.scorer_retriever(documents: Documents, *, retrievers: RetrieverFactory, scorer: Scorer, **kwargs)
Helper function that returns a two-stage retriever. This is useful when used with partial (when the scorer is not yet known).
- Parameters:
documents – The document collection
retrievers – A retriever factory
scorer – The scorer
- Returns:
A retriever, obtained by calling scorer.getRetriever()
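The "use with partial" pattern mentioned above can be illustrated with `functools.partial` from the standard library. The sketch below does not call xpmir: `make_two_stage` and `keyword_retrievers` are hypothetical stand-ins that only mimic the keyword-only shape of the documented signature, with plain callables in place of configurations.

```python
from functools import partial
from typing import Callable, List

Retriever = Callable[[str], List[str]]


def make_two_stage(
    documents: List[str],
    *,
    retrievers: Callable[[List[str]], Retriever],
    scorer: Callable[[str, str], float],
) -> Retriever:
    """Hypothetical stand-in: first-stage retrieval followed by re-scoring."""
    first_stage = retrievers(documents)

    def retrieve(query: str) -> List[str]:
        return sorted(first_stage(query), key=lambda d: scorer(query, d), reverse=True)

    return retrieve


def keyword_retrievers(documents: List[str]) -> Retriever:
    # Toy retriever factory: keeps the documents that contain the query string
    def retrieve(query: str) -> List[str]:
        return [d for d in documents if query in d]

    return retrieve


# Bind everything except the scorer, which is not known at this point
factory = partial(make_two_stage, retrievers=keyword_retrievers)

# Later, once the scorer is available
docs = ["a dog", "dog dog", "a cat"]
retriever = factory(docs, scorer=lambda q, d: float(d.count(q)))
result = retriever("dog")
```

The partially applied function can be handed to code that owns the scorer (for instance a training or evaluation step) without that code knowing how the first stage was built.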
Retrievers
Scorers can be used as retrievers through a xpmir.rankers.TwoStageRetriever
Samplers
Samplers provide samples in the form of records. They all inherit from:
- class xpmir.letor.samplers.SerializableIterator
Bases: Iterator[T], Generic[T, State]
An iterator that can be serialized through state dictionaries.
This is used when saving the sampler state
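The idea of an iterator that is serialized through a state dictionary can be sketched as follows. The `CountingIterator` class is hypothetical, and the method names `state_dict` and `load_state_dict` are an assumption borrowed from the PyTorch convention, not a confirmed description of the xpmir interface.

```python
from typing import Dict, Iterator, List


class CountingIterator(Iterator[int]):
    """Hypothetical iterator whose position can be saved and restored."""

    def __init__(self, values: List[int]):
        self.values = values
        self.position = 0

    def __next__(self) -> int:
        if self.position >= len(self.values):
            raise StopIteration
        value = self.values[self.position]
        self.position += 1
        return value

    def state_dict(self) -> Dict[str, int]:
        # Everything needed to resume iteration at the same point
        return {"position": self.position}

    def load_state_dict(self, state: Dict[str, int]) -> None:
        self.position = state["position"]


it = CountingIterator([10, 20, 30, 40])
first_two = [next(it), next(it)]
checkpoint = it.state_dict()  # saved alongside the model checkpoint

# After a restart: rebuild the iterator and restore its state
restored = CountingIterator([10, 20, 30, 40])
restored.load_state_dict(checkpoint)
remaining = list(restored)
```

Restoring the sampler state lets an interrupted training run continue with the samples it had not yet seen, instead of starting again from the first one.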
- XPM Config xpmir.letor.samplers.ModelBasedSampler(*, dataset, retriever)
Bases: Sampler
Submit type: xpmir.letor.samplers.ModelBasedSampler
Base class for retriever-based samplers
- dataset: datamaestro_text.data.ir.Adhoc
The IR adhoc dataset
- retriever: xpmir.rankers.Retriever
A retriever to sample negative documents
Records for training
- class xpmir.letor.records.PairwiseRecord(query: Record, positive: Record, negative: Record)
Bases: object
A pairwise record is composed of a query, a positive and a negative document
- class xpmir.letor.records.PointwiseRecord(topic: Record, document: Record, relevance: float | None = None)
Bases: object
A record from a pointwise sampler
Document samplers
Useful for pre-training or when learning index parameters (e.g. for FAISS).
- XPM Config xpmir.documents.samplers.DocumentSampler(*, documents)
Bases: Config, ABC
Submit type: xpmir.documents.samplers.DocumentSampler
How to sample from a document store
- documents: datamaestro_text.data.ir.DocumentStore
- XPM Config xpmir.documents.samplers.HeadDocumentSampler(*, documents, max_count, max_ratio)
Bases: DocumentSampler
Submit type: xpmir.documents.samplers.HeadDocumentSampler
A basic sampler that iterates over the first documents
If max_count is 0, it iterates over all documents
- documents: datamaestro_text.data.ir.DocumentStore
- max_count: int = 0
Maximum number of documents (if 0, no limit)
- max_ratio: float = 0
Maximum ratio of documents (if 0, no limit)
- XPM Config xpmir.documents.samplers.RandomDocumentSampler(*, documents, max_count, max_ratio, random)
Bases: DocumentSampler
Submit type: xpmir.documents.samplers.RandomDocumentSampler
A basic sampler that iterates over randomly selected documents
Either max_count or max_ratio should be non-zero
- documents: datamaestro_text.data.ir.DocumentStore
- max_count: int = 0
Maximum number of documents (if 0, no limit)
- max_ratio: float = 0
Maximum ratio of documents (if 0, no limit)
- random: xpmir.learning.base.Random
Random sampler
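How max_count and max_ratio bound the number of sampled documents can be sketched with standard-library code. The combination rule used here (apply every non-zero limit and keep the smallest) is an assumption, not something the documentation states, and `effective_limit`, `head_sample` and `random_sample` are hypothetical helpers operating on a plain list rather than a DocumentStore.

```python
import random
from itertools import islice
from typing import List, Sequence


def effective_limit(total: int, max_count: int = 0, max_ratio: float = 0) -> int:
    """Assumed rule: a value of 0 means "no limit"; otherwise keep the smallest bound."""
    limit = total
    if max_count > 0:
        limit = min(limit, max_count)
    if max_ratio > 0:
        limit = min(limit, int(total * max_ratio))
    return limit


def head_sample(docs: Sequence[str], max_count: int = 0, max_ratio: float = 0) -> List[str]:
    # Head sampling: the first documents, in collection order
    return list(islice(docs, effective_limit(len(docs), max_count, max_ratio)))


def random_sample(
    docs: Sequence[str], rng: random.Random, max_count: int = 0, max_ratio: float = 0
) -> List[str]:
    # Random sampling without replacement, driven by an explicit generator
    return rng.sample(list(docs), effective_limit(len(docs), max_count, max_ratio))


docs = [f"doc{i}" for i in range(10)]
head = head_sample(docs, max_count=3)
head_all = head_sample(docs)  # both limits at 0: every document
head_ratio = head_sample(docs, max_count=8, max_ratio=0.5)
picked = random_sample(docs, random.Random(0), max_ratio=0.5)
```

Passing the random generator explicitly, as the `random` parameter of RandomDocumentSampler suggests, keeps the sample reproducible for a given seed.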
Adapters
- XPM Config xpmir.letor.samplers.hydrators.SampleTransform
Bases: Config, ABC
Submit type: xpmir.letor.samplers.hydrators.SampleTransform
- XPM Config xpmir.letor.samplers.hydrators.SampleHydrator(*, documentstore, querystore)
Bases: SampleTransform
Submit type: xpmir.letor.samplers.hydrators.SampleHydrator
Base class for document/topic hydrators
- documentstore: datamaestro_text.data.ir.DocumentStore
The store for document texts if needed
- querystore: xpmir.datasets.adapters.TextStore
The store for query texts if needed
- XPM Config xpmir.letor.samplers.hydrators.SamplePrefixAdding(*, query_prefix, document_prefix)
Bases: SampleTransform
Submit type: xpmir.letor.samplers.hydrators.SamplePrefixAdding
Transform the query and documents by adding the prefix
- query_prefix: str
The prefix for the query
- document_prefix: str
The prefix for the document
- XPM Config xpmir.letor.samplers.hydrators.SampleTransformList(*, adapters)
Bases: SampleTransform
Submit type: xpmir.letor.samplers.hydrators.SampleTransformList
A class which groups a list of sample transforms
- adapters: List[xpmir.letor.samplers.hydrators.SampleTransform]
The list of sample transforms to be applied
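The way these adapters chain can be pictured with a standalone sketch. None of the classes below are xpmir classes: `Hydrate`, `AddPrefix` and `TransformList` are hypothetical stand-ins, samples are reduced to dictionaries, and the sketch assumes that hydration means replacing identifiers by the text found in the stores and that transforms are applied in list order.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol

Sample = Dict[str, str]  # hypothetical sample: {"query": ..., "document": ...}


class Transform(Protocol):
    def __call__(self, sample: Sample) -> Sample: ...


@dataclass
class Hydrate:
    """Replace identifiers by the text found in in-memory stores."""

    documentstore: Dict[str, str]
    querystore: Dict[str, str]

    def __call__(self, sample: Sample) -> Sample:
        return {
            "query": self.querystore[sample["query"]],
            "document": self.documentstore[sample["document"]],
        }


@dataclass
class AddPrefix:
    """Prepend a fixed prefix to the query and to the document."""

    query_prefix: str
    document_prefix: str

    def __call__(self, sample: Sample) -> Sample:
        return {
            "query": self.query_prefix + sample["query"],
            "document": self.document_prefix + sample["document"],
        }


@dataclass
class TransformList:
    """Apply each transform in turn, feeding the output of one to the next."""

    adapters: List[Transform]

    def __call__(self, sample: Sample) -> Sample:
        for adapter in self.adapters:
            sample = adapter(sample)
        return sample


pipeline = TransformList(
    [
        Hydrate(documentstore={"d1": "a cat"}, querystore={"q1": "cat"}),
        AddPrefix(query_prefix="query: ", document_prefix="passage: "),
    ]
)
output = pipeline({"query": "q1", "document": "d1"})
```

Order matters in such a chain: the prefix has to be added after hydration, otherwise it would be attached to the identifiers and the store lookup would fail.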