Encoder
Retriever that embeds documents with an encoder and searches them with a Faiss index.
Parameters
- encoder
- key (str)
  Field identifier of each document.
- on (Union[str, list])
  Field or fields to use to retrieve documents.
- normalize (bool) – defaults to True
  Whether to normalize the embeddings before adding them to the index in order to measure cosine similarity.
- k (Optional[int]) – defaults to None
- batch_size (int) – defaults to 64
- index – defaults to None
  Faiss index that will store the embeddings and perform the similarity search.
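The `normalize` option relies on a standard identity: once two vectors are scaled to unit length, their inner product equals their cosine similarity. The following is a minimal pure-Python sketch of that identity. The helper names `l2_normalize`, `dot` and `cosine` are illustrative only and are not part of cherche or Faiss.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return list(vector) if norm == 0 else [x / norm for x in vector]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed directly from the raw vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical embeddings, chosen only for illustration.
query = [1.0, 2.0, 3.0]
document = [2.0, 0.5, 1.0]

# Inner product of the normalized vectors ...
inner_product_of_normalized = dot(l2_normalize(query), l2_normalize(document))
# ... matches the cosine similarity of the raw vectors (up to rounding).
cosine_similarity = cosine(query, document)
```

This is why an inner-product index over normalized embeddings can return scores that read as cosine similarities.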
Examples
>>> from pprint import pprint as print
>>> from cherche import retrieve
>>> from sentence_transformers import SentenceTransformer
>>> documents = [
...     {"id": 0, "title": "Paris France"},
...     {"id": 1, "title": "Madrid Spain"},
...     {"id": 2, "title": "Montreal Canada"},
... ]
>>> retriever = retrieve.Encoder(
...     encoder=SentenceTransformer("sentence-transformers/all-mpnet-base-v2").encode,
...     key="id",
...     on=["title"],
... )
>>> retriever.add(documents, batch_size=1)
Encoder retriever
key : id
on : title
documents: 3
>>> print(retriever("Spain", k=2))
[{'id': 1, 'similarity': 0.6544566453117681},
 {'id': 0, 'similarity': 0.5405465419981407}]
>>> print(retriever(["Spain", "Montreal"], k=2))
[[{'id': 1, 'similarity': 0.6544566453117681},
  {'id': 0, 'similarity': 0.54054659424589}],
 [{'id': 2, 'similarity': 0.7372165680578416},
  {'id': 0, 'similarity': 0.5185645704259234}]]
Methods
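__call__
Retrieve documents from the index. A single string query returns one ranked list of matches; a list of queries returns one ranked list per query.
Parameters
- q (Union[List[str], str])
- k (Optional[int]) – defaults to None
- batch_size (Optional[int]) – defaults to None
- tqdm_bar (bool) – defaults to True
- kwargs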
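The shape of the results (one ranked list for a string query, a list of ranked lists for a list of queries) can be illustrated without any model or index. The sketch below is not cherche's implementation: `embed` is a toy letter-count embedding standing in for a real encoder, a brute-force loop stands in for the Faiss search, and `search` and `_search_one` are hypothetical names.

```python
import math
import string


def embed(text):
    """Toy stand-in for a real encoder: unit-length letter-count vector."""
    counts = [0.0] * 26
    for char in text.lower():
        if char in string.ascii_lowercase:
            counts[ord(char) - ord("a")] += 1.0
    norm = math.sqrt(sum(c * c for c in counts))
    return counts if norm == 0 else [c / norm for c in counts]


documents = [
    {"id": 0, "title": "Paris France"},
    {"id": 1, "title": "Madrid Spain"},
    {"id": 2, "title": "Montreal Canada"},
]

# Brute-force "index": one normalized vector per document key.
index = {doc["id"]: embed(doc["title"]) for doc in documents}


def _search_one(query, k=None):
    """Rank every document by inner product with the query vector."""
    query_vector = embed(query)
    matches = [
        {"id": key, "similarity": sum(q * d for q, d in zip(query_vector, vector))}
        for key, vector in index.items()
    ]
    matches.sort(key=lambda match: match["similarity"], reverse=True)
    return matches if k is None else matches[:k]


def search(q, k=None):
    """Return one ranked list for a string, or one ranked list per query for a list."""
    if isinstance(q, str):
        return _search_one(q, k)
    return [_search_one(query, k) for query in q]


single = search("Spain", k=2)               # list of dicts
batch = search(["Spain", "Montreal"], k=2)  # list of lists of dicts
```

With this toy embedding the ranking happens to follow the same order as the example above, but the similarity values are unrelated to those produced by a real sentence encoder.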
add
Add documents to the index.
Parameters
- documents (List[Dict[str, str]])
- batch_size (int) – defaults to 64
- tqdm_bar (bool) – defaults to True
- kwargs
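To make the role of `batch_size` concrete, here is a minimal sketch of how documents might be grouped into batches of text before being passed to an encoder. It is an assumption for illustration, not cherche's code: the helper `text_batches` is hypothetical, and joining the `on` fields with a single space is a guess at a reasonable convention.

```python
def text_batches(documents, on, batch_size=64):
    """Yield lists of at most `batch_size` strings built from the `on` fields."""
    fields = [on] if isinstance(on, str) else list(on)
    for start in range(0, len(documents), batch_size):
        chunk = documents[start:start + batch_size]
        # Assumption: multiple fields are concatenated with a single space.
        yield [
            " ".join(str(doc.get(field, "")) for field in fields)
            for doc in chunk
        ]


documents = [
    {"id": 0, "title": "Paris France"},
    {"id": 1, "title": "Madrid Spain"},
    {"id": 2, "title": "Montreal Canada"},
]

# Three documents with batch_size=2 give one full batch and one partial batch.
batches = list(text_batches(documents, on=["title"], batch_size=2))
```

A smaller `batch_size` lowers peak memory use during encoding at the cost of more encoder calls.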