Skip to content

cherche

CrossEncoder

raphaelsty/cherche

CrossEncoder¶

Cross-Encoder as a ranker. CrossEncoder takes both the query and the document as input and outputs a score. The score is a similarity score between the query and the document. The CrossEncoder cannot pre-compute the embeddings of the documents since it need both the query and the document.

Parameters¶

on (Union[List[str], str])

Fields to use to match the query to the documents.
encoder

Sentence Transformer cross-encoder.
k (Optional[int]) – defaults to None
batch_size (int) – defaults to 64

Examples¶

>>> from pprint import pprint as print
>>> from cherche import retrieve, rank, evaluate, data
>>> from sentence_transformers import CrossEncoder

>>> documents, query_answers = data.arxiv_tags(
...    arxiv_title=True, arxiv_summary=False, comment=False
... )

>>> retriever = retrieve.TfIdf(
...    key="uri",
...    on=["prefLabel_text", "altLabel_text"],
...    documents=documents,
...    k=100,
... )

>>> ranker = rank.CrossEncoder(
...     on = ["prefLabel_text", "altLabel_text"],
...     encoder = CrossEncoder("cross-encoder/mmarco-mMiniLMv2-L12-H384-v1").predict,
... )

>>> pipeline = retriever + documents + ranker

>>> match = pipeline("graph neural network", k=5)

>>> for m in match:
...     print(m.get("uri", ""))
'http://www.semanlink.net/tag/graph_neural_networks'
'http://www.semanlink.net/tag/artificial_neural_network'
'http://www.semanlink.net/tag/dans_deep_averaging_neural_networks'
'http://www.semanlink.net/tag/recurrent_neural_network'
'http://www.semanlink.net/tag/convolutional_neural_network'

Methods¶

call

Rank inputs documents based on query.

Parameters

q (str)
documents (list)
batch_size (Optional[int]) – defaults to None
k (Optional[int]) – defaults to None
kwargs

References¶