Embedding¶
Collaborative filtering as a ranker. Recommend is compatible with the library Implicit.
Parameters¶
- 
key ('str') Field identifier of each document. 
- 
normalize ('bool') – defaults to TrueIf set to True, the similarity measure is cosine similarity, if set to False, similarity measure is dot product. 
- 
k ('typing.Optional[int]') – defaults to None
- 
batch_size ('int') – defaults to 1024
Examples¶
>>> from pprint import pprint as print
>>> from cherche import rank
>>> from sentence_transformers import SentenceTransformer
>>> documents = [
...    {"id": "a", "title": "Paris"},
...    {"id": "b", "title": "Madrid"},
...    {"id": "c", "title": "Montreal"},
... ]
>>> encoder = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
>>> embeddings_documents = encoder.encode([
...    document["title"] for document in documents
... ])
>>> recommend = rank.Embedding(
...    key="id",
... )
>>> recommend.add(
...    documents=documents,
...    embeddings_documents=embeddings_documents,
... )
Embedding ranker
    key      : id
    documents: 3
    normalize: True
>>> match = recommend(
...     q=encoder.encode("Paris"),
...     documents=documents,
...     k=2
... )
>>> print(match)
[{'id': 'a', 'similarity': 1.0, 'title': 'Paris'},
 {'id': 'c', 'similarity': 0.57165134, 'title': 'Montreal'}]
>>> queries = [
...    "Paris",
...    "Madrid",
...    "Montreal"
... ]
>>> match = recommend(
...     q=encoder.encode(queries),
...     documents=[documents] * 3,
...     k=2
... )
>>> print(match)
[[{'id': 'a', 'similarity': 1.0, 'title': 'Paris'},
  {'id': 'c', 'similarity': 0.57165134, 'title': 'Montreal'}],
 [{'id': 'b', 'similarity': 1.0, 'title': 'Madrid'},
  {'id': 'a', 'similarity': 0.49815434, 'title': 'Paris'}],
 [{'id': 'c', 'similarity': 0.9999999, 'title': 'Montreal'},
  {'id': 'a', 'similarity': 0.5716514, 'title': 'Paris'}]]
Methods¶
call
Retrieve documents from user id.
Parameters
- q ('np.ndarray')
- documents ('typing.Union[typing.List[typing.List[typing.Dict[str, str]]], typing.List[typing.Dict[str, str]]]')
- k     ('typing.Optional[int]')     – defaults to None
- batch_size     ('typing.Optional[int]')     – defaults to None
- kwargs
add
Add embeddings both documents and users.
Parameters
- documents ('list')
- embeddings_documents ('typing.List[np.ndarray]')
- kwargs
encode_rank
Encode documents and rank them according to the query.
Parameters
- embeddings_queries (numpy.ndarray)
- documents (List[List[Dict[str, str]]])
- k (int)
- batch_size     (Optional[int])     – defaults to None
rank
Rank inputs documents ordered by relevance among the top k.
Parameters
- embeddings_documents (Dict[str, numpy.ndarray])
- embeddings_queries (numpy.ndarray)
- documents (List[List[Dict[str, str]]])
- k (int)
- batch_size     (Optional[int])     – defaults to None