Encoder
Retriever that embeds documents and queries with an encoder and searches them with a Faiss index.
Parameters
- encoder
- key (str)
  Field identifier of each document.
- on (Union[str, list])
  Field or fields to use to retrieve documents.
- normalize (bool) – defaults to True
  Whether to normalize the embeddings before adding them to the index in order to measure cosine similarity.
- k (Optional[int]) – defaults to None
- batch_size (int) – defaults to 64
- index – defaults to None
  Faiss index that will store the embeddings and perform the similarity search.
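The link between `normalize` and cosine similarity can be checked without Faiss. The following is a minimal sketch in plain Python, assuming the index scores documents by inner product (the page does not state which metric the default index uses). The helpers `l2_normalize`, `dot`, and `cosine` and the toy vectors are hypothetical and not part of cherche.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit L2 norm (hypothetical helper, not part of cherche)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def dot(a, b):
    """Inner product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed directly from the raw vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy embeddings standing in for encoder output (assumed values).
query = [1.0, 2.0, 2.0]
document = [2.0, 0.0, 1.0]

# After normalization, the inner product equals the cosine similarity.
inner_product_after_normalization = dot(l2_normalize(query), l2_normalize(document))
cosine_similarity = cosine(query, document)
```

Under that assumption, setting `normalize=False` would rank documents by raw inner product, which also depends on embedding magnitude.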
Examples
>>> from pprint import pprint as print
>>> from cherche import retrieve
>>> from sentence_transformers import SentenceTransformer
>>> documents = [
...     {"id": 0, "title": "Paris France"},
...     {"id": 1, "title": "Madrid Spain"},
...     {"id": 2, "title": "Montreal Canada"},
... ]
>>> retriever = retrieve.Encoder(
...     encoder=SentenceTransformer("sentence-transformers/all-mpnet-base-v2").encode,
...     key="id",
...     on=["title"],
... )
>>> retriever.add(documents, batch_size=1)
Encoder retriever
key : id
on : title
documents: 3
>>> print(retriever("Spain", k=2))
[{'id': 1, 'similarity': 0.6544566453117681},
 {'id': 0, 'similarity': 0.5405465419981407}]
>>> print(retriever(["Spain", "Montreal"], k=2))
[[{'id': 1, 'similarity': 0.6544566453117681},
  {'id': 0, 'similarity': 0.54054659424589}],
 [{'id': 2, 'similarity': 0.7372165680578416},
  {'id': 0, 'similarity': 0.5185645704259234}]]
Methods
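__call__
Retrieve documents from the index. A single query string returns one list of matches; a list of queries returns one list of matches per query.
Parameters
- q (Union[List[str], str])
- k (Optional[int]) – defaults to None
- batch_size (Optional[int]) – defaults to None
- tqdm_bar (bool) – defaults to True
- kwargs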
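The shape of the returned value can be sketched with a brute-force search in plain Python. This is an illustration only, not the Faiss-backed implementation: the functions `top_k` and `retrieve`, the toy index, and the toy encoder are all hypothetical, and `k=None` is assumed here to mean "return every document".

```python
def top_k(query_vector, index, key="id", k=None):
    """Score every stored vector by inner product and keep the k best (hypothetical helper)."""
    scored = [
        {key: doc_id, "similarity": sum(q * d for q, d in zip(query_vector, vector))}
        for doc_id, vector in index.items()
    ]
    scored.sort(key=lambda match: match["similarity"], reverse=True)
    return scored if k is None else scored[:k]


def retrieve(q, encode, index, key="id", k=None):
    """Return one list of matches for a string, or one list per query for a list of strings."""
    if isinstance(q, str):
        return top_k(encode(q), index, key, k)
    return [top_k(encode(query), index, key, k) for query in q]


# Toy unit-length embeddings keyed by document id (assumed values).
toy_index = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.6, 0.8]}

# Toy encoder: a lookup table standing in for a real embedding model.
toy_vectors = {"north": [0.0, 1.0], "east": [1.0, 0.0]}
toy_encode = toy_vectors.__getitem__

single = retrieve("north", toy_encode, toy_index, k=2)
batch = retrieve(["north", "east"], toy_encode, toy_index, k=2)
```

Each match carries the document key and a `similarity` score, mirroring the structure of the outputs shown in the Examples section.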
add
Add documents to the index.
Parameters
- documents (List[Dict[str, str]])
- batch_size (int) – defaults to 64
- tqdm_bar (bool) – defaults to True
- kwargs
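How `on` and `batch_size` might interact when documents are added can be sketched as follows. This is a guess at the general pattern, not cherche's actual code: it assumes the fields listed in `on` are joined with a space before encoding and that texts are encoded in chunks of `batch_size`. The helpers `document_text` and `batches` are hypothetical.

```python
def document_text(document, on):
    """Join the fields named in `on` into one string to encode (assumed behavior)."""
    fields = [on] if isinstance(on, str) else on
    return " ".join(str(document.get(field, "")) for field in fields)


def batches(items, batch_size=64):
    """Yield consecutive chunks of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


documents = [
    {"id": 0, "title": "Paris France"},
    {"id": 1, "title": "Madrid Spain"},
    {"id": 2, "title": "Montreal Canada"},
]

texts = [document_text(document, on=["title"]) for document in documents]

# Each chunk would be passed to the encoder in a single call.
chunks = list(batches(texts, batch_size=2))
```

A smaller `batch_size` lowers peak memory use during encoding at the cost of more encoder calls.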