Faiss
Faiss index dedicated to vector search.
Parameters
- key
  Identifier field for each document.
- index – defaults to None
  Faiss index to use.
- normalize (bool) – defaults to True
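The page gives no description for `normalize`. Assuming it means L2-normalizing embeddings before they are indexed and queried (an assumption, not something stated here), the sketch below shows why that would matter: the inner product of unit-length vectors equals their cosine similarity. The helpers `l2_normalize`, `dot`, and `cosine` are illustrative names, not part of cherche or Faiss.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return list(vector) if norm == 0 else [x / norm for x in vector]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity computed directly from the raw vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [4.0, 3.0]

# Assumption: normalize=True behaves like l2_normalize applied to every embedding.
# The inner product of the normalized vectors then matches the cosine similarity.
print(round(dot(l2_normalize(a), l2_normalize(b)), 6))
print(round(cosine(a, b), 6))
```

If that assumption holds, the `similarity` values in the examples can be read as cosine similarities.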
Examples
>>> from pprint import pprint as print
>>> from cherche import index
>>> from sentence_transformers import SentenceTransformer
>>> documents = [
... {"id": 0, "title": "Paris France"},
... {"id": 1, "title": "Madrid Spain"},
... {"id": 2, "title": "Montreal Canada"}
... ]
>>> encoder = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
>>> faiss_index = index.Faiss(key="id")
>>> faiss_index = faiss_index.add(
... documents = documents,
... embeddings = encoder.encode([document["title"] for document in documents]),
... )
>>> print(faiss_index(embeddings=encoder.encode(["Spain", "Montreal"])))
[[{'id': 1, 'similarity': 0.6544566197822951},
{'id': 0, 'similarity': 0.5405466290777285},
{'id': 2, 'similarity': 0.48717489472604614}],
[{'id': 2, 'similarity': 0.7372165680578416},
{'id': 0, 'similarity': 0.5185646665953703},
{'id': 1, 'similarity': 0.4834444940712032}]]
>>> documents = [
... {"id": 3, "title": "Paris France"},
... {"id": 4, "title": "Madrid Spain"},
... {"id": 5, "title": "Montreal Canada"}
... ]
>>> faiss_index = faiss_index.add(
... documents = documents,
... embeddings = encoder.encode([document["title"] for document in documents]),
... )
>>> print(faiss_index(embeddings=encoder.encode(["Spain", "Montreal"]), k=4))
[[{'id': 1, 'similarity': 0.6544566197822951},
{'id': 4, 'similarity': 0.6544566197822951},
{'id': 0, 'similarity': 0.5405466290777285},
{'id': 3, 'similarity': 0.5405466290777285}],
[{'id': 2, 'similarity': 0.7372165680578416},
{'id': 5, 'similarity': 0.7372165680578416},
{'id': 0, 'similarity': 0.5185646665953703},
{'id': 3, 'similarity': 0.5185646665953703}]]
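The results above contain only the key and the similarity score. A minimal sketch of one way to recover the full documents is to join the results back on the key. `attach_documents` is a hypothetical helper written for illustration, and the similarity values are rounded copies of the output shown above.

```python
documents = [
    {"id": 0, "title": "Paris France"},
    {"id": 1, "title": "Madrid Spain"},
    {"id": 2, "title": "Montreal Canada"},
]

# Results for a single query, in the same shape as the output above
# (similarities rounded for readability).
results = [
    {"id": 1, "similarity": 0.654},
    {"id": 0, "similarity": 0.541},
]


def attach_documents(results, documents, key="id"):
    """Merge each result with the document that shares its key (hypothetical helper)."""
    by_key = {document[key]: document for document in documents}
    return [{**by_key[result[key]], **result} for result in results]


enriched = attach_documents(results, documents)
print(enriched[0])
```

Building the `by_key` dictionary once keeps the join linear in the number of documents and results.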
Methods
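__call__
Search the index. Returns one list per query embedding; each list holds the k most similar documents as dictionaries with the document key and a similarity score, ordered by decreasing similarity.
Parameters
- embeddings (numpy.ndarray)
- k (Optional[int]) – defaults to None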
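To make the return shape concrete, here is a brute-force sketch of a search that produces the same structure: one ranked list of `{key, similarity}` dictionaries per query. It is not how Faiss searches internally, and `search` is a hypothetical function. Treating `k=None` as "return every indexed document" is an assumption, consistent with the first example, where omitting `k` returned all three documents.

```python
def search(queries, vectors, keys, k=None, key="id"):
    """Brute-force inner-product search; one ranked list of matches per query."""
    # Assumption: k=None means "return all indexed documents".
    k = len(vectors) if k is None else min(k, len(vectors))
    output = []
    for query in queries:
        scored = [
            {key: identifier, "similarity": sum(q * v for q, v in zip(query, vector))}
            for identifier, vector in zip(keys, vectors)
        ]
        # Highest similarity first, then keep the top k.
        scored.sort(key=lambda match: match["similarity"], reverse=True)
        output.append(scored[:k])
    return output


vectors = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
keys = [0, 1, 2]
matches = search([[1.0, 0.0], [0.0, 1.0]], vectors, keys, k=2)
print(matches)
```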
add
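Add documents to the Faiss index and export embeddings if the path is provided. Streaming friendly: the method can be called repeatedly to index new batches of documents, as in the second example above.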
Parameters
- documents (list)
- embeddings (numpy.ndarray)
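Because `add` can be called repeatedly, a large collection can be indexed in batches rather than encoded all at once. The sketch below shows only the batching side. `batches` is a hypothetical helper, and the commented call mirrors the `add` usage from the examples, assuming `faiss_index` and `encoder` are defined as they are there.

```python
def batches(documents, size):
    """Yield successive slices of at most `size` documents."""
    for start in range(0, len(documents), size):
        yield documents[start:start + size]


documents = [{"id": i, "title": f"title {i}"} for i in range(5)]

for batch in batches(documents, size=2):
    # With a real index and encoder, each batch would be added as in the examples:
    # faiss_index = faiss_index.add(
    #     documents=batch,
    #     embeddings=encoder.encode([document["title"] for document in batch]),
    # )
    print([document["id"] for document in batch])
```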