Fuzz
Retriever that wraps RapidFuzz, a library for rapid fuzzy string matching in Python and C++ based on the Levenshtein distance.
Parameters
- key (str)
  Field identifier of each document.
- on (Union[str, list])
  Fields to use to match the query to the documents.
- fuzzer – defaults to fuzz.partial_ratio
  RapidFuzz scorer used to compare the query with the documents: fuzz.ratio, fuzz.partial_ratio, fuzz.token_set_ratio, fuzz.partial_token_set_ratio, fuzz.token_sort_ratio, fuzz.partial_token_sort_ratio, fuzz.token_ratio, fuzz.partial_token_ratio, fuzz.WRatio, fuzz.QRatio, string_metric.levenshtein, string_metric.normalized_levenshtein
- default_process (bool) – defaults to True
  Pre-processing step. If set to True, documents are pre-processed with RapidFuzz's default processing before matching.
- k (Optional[int]) – defaults to None
  Number of documents to retrieve per query.
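To make the fuzzer and default_process parameters more concrete, here is a minimal standard-library sketch of what a partial-ratio style scorer and a default-style pre-processing step do. It is not RapidFuzz's or cherche's code: difflib.SequenceMatcher is swapped in for the RapidFuzz scorer, and the names simple_process and partial_ratio_sketch are hypothetical helpers introduced only for illustration, so scores will not match RapidFuzz exactly.

```python
import re
from difflib import SequenceMatcher


def simple_process(text: str) -> str:
    """Hypothetical stand-in for default pre-processing (assumption):
    lowercase, replace non-alphanumeric runs with a space, trim."""
    return re.sub(r"[\W_]+", " ", text.lower()).strip()


def partial_ratio_sketch(query: str, text: str) -> float:
    """Best similarity (0 to 100) between the shorter string and any
    window of the same length taken from the longer string."""
    short, long_ = sorted((query, text), key=len)
    if not short:
        return 0.0
    best = 0.0
    for start in range(len(long_) - len(short) + 1):
        window = long_[start:start + len(short)]
        best = max(best, SequenceMatcher(None, short, window).ratio())
    return 100.0 * best


# Without pre-processing, "paris" and "Paris" differ by case.
raw = partial_ratio_sketch("paris", "Paris is in France.")

# With pre-processing, the query matches a window of the document exactly.
clean = partial_ratio_sketch(
    simple_process("paris"), simple_process("Paris is in France.")
)

print(raw, clean)
```

In this sketch the pre-processed pair reaches the maximum score while the raw pair does not, which is the kind of difference default_process=True is meant to remove.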
Examples
>>> from pprint import pprint as print
>>> from cherche import retrieve
>>> from rapidfuzz import fuzz
>>> documents = [
... {"id": 0, "title": "Paris", "article": "Eiffel tower"},
... {"id": 1, "title": "Paris", "article": "Paris is in France."},
... {"id": 2, "title": "Montreal", "article": "Montreal is in Canada."},
... ]
>>> retriever = retrieve.Fuzz(
... key = "id",
... on = ["title", "article"],
... fuzzer = fuzz.partial_ratio,
... )
>>> retriever.add(documents=documents)
Fuzz retriever
key : id
on : title, article
documents: 3
>>> print(retriever(q="paris", k=2))
[{'id': 0, 'similarity': 100.0}, {'id': 1, 'similarity': 100.0}]
>>> print(retriever(q=["paris", "montreal"], k=2))
[[{'id': 0, 'similarity': 100.0}, {'id': 1, 'similarity': 100.0}],
[{'id': 2, 'similarity': 100.0}, {'id': 1, 'similarity': 37.5}]]
>>> print(retriever(q=["unknown", "montreal"], k=2))
[[{'id': 2, 'similarity': 40.0}, {'id': 0, 'similarity': 36.36363636363637}],
[{'id': 2, 'similarity': 100.0}, {'id': 1, 'similarity': 37.5}]]
Methods
__call__
Retrieve documents from the index.
Parameters
- q (Union[List[str], str])
- k (Optional[int]) – defaults to None
- tqdm_bar (bool) – defaults to True
- kwargs
add
Add documents to the retriever. Fuzz is streaming friendly, so documents can be added incrementally.
Parameters
- documents (List[Dict[str, str]])
- kwargs
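The two methods above can be pictured with a small standard-library sketch: add extends the index on every call, and calling the retriever accepts either one query or a list of queries. TinyFuzzRetriever is a hypothetical stand-in written for this illustration, not cherche's implementation, and it scores with difflib.SequenceMatcher rather than a RapidFuzz scorer, so similarity values differ from the examples earlier on this page.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Union


class TinyFuzzRetriever:
    """Hypothetical stand-in for a fuzzy retriever (not cherche's code)."""

    def __init__(self, key: str, on: Union[str, List[str]]) -> None:
        self.key = key
        self.on = [on] if isinstance(on, str) else list(on)
        self.texts: Dict[object, str] = {}

    def add(self, documents: List[Dict[str, object]]) -> "TinyFuzzRetriever":
        # Each call extends the index, so documents can arrive in batches.
        for doc in documents:
            fields = (str(doc.get(field, "")) for field in self.on)
            self.texts[doc[self.key]] = " ".join(fields).lower()
        return self

    def _rank(self, q: str, k: Optional[int]) -> List[dict]:
        scored = [
            {
                self.key: key,
                "similarity": 100.0
                * SequenceMatcher(None, q.lower(), text).ratio(),
            }
            for key, text in self.texts.items()
        ]
        scored.sort(key=lambda item: item["similarity"], reverse=True)
        # k=None keeps every scored document in this sketch.
        return scored if k is None else scored[:k]

    def __call__(self, q: Union[str, List[str]], k: Optional[int] = None):
        # A single string returns one ranked list; a list of strings
        # returns one ranked list per query.
        if isinstance(q, str):
            return self._rank(q, k)
        return [self._rank(query, k) for query in q]


retriever = TinyFuzzRetriever(key="id", on=["title", "article"])
retriever.add([{"id": 0, "title": "Paris", "article": "Eiffel tower"}])
retriever.add(
    [{"id": 2, "title": "Montreal", "article": "Montreal is in Canada."}]
)

single = retriever(q="montreal", k=1)
batch = retriever(q=["paris", "montreal"], k=1)
print(single)
print(batch)
```

Returning self from add mirrors the example earlier on this page, where the call to add displays the retriever summary.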