average¶
Replace KMeans clustering with average clustering when an existing graph is provided.
Parameters¶
- key (str)
- documents (list)
- documents_embeddings (numpy.ndarray | scipy.sparse._csr.csr_matrix)
- graph
- scoring
- device (str)
Examples¶
>>> from neural_tree import clustering, scoring
>>> import numpy as np
>>> documents = [
... {"id": 0, "text": "Paris is the capital of France."},
... {"id": 1, "text": "Berlin is the capital of Germany."},
... {"id": 2, "text": "Paris and Berlin are European cities."},
... {"id": 3, "text": "Paris and Berlin are beautiful cities."},
... ]
>>> documents_embeddings = np.array([
... [1, 1],
... [1, 2],
... [10, 10],
... [1, 3],
... ])
>>> graph = {
...     1: {
...         11: {111: [{"id": 0}, {"id": 3}], 112: [{"id": 1}]},
...         12: {121: [{"id": 2}], 122: [{"id": 3}]},
...     }
... }
>>> clustering.average(
... key="id",
... documents_embeddings=documents_embeddings,
... documents=documents,
... graph=graph[1],
... scoring=scoring.SentenceTransformer(key="id", on=["text"], model=None),
... )
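
To give an intuition for what averaging over an existing graph could look like, the sketch below assigns each node of the nested graph the mean embedding of the documents stored beneath it. It is an illustration under stated assumptions, not the implementation behind `clustering.average`: the helpers `leaf_ids` and `average_nodes` are hypothetical names, row `i` of `documents_embeddings` is assumed to belong to `documents[i]`, and plain Python lists stand in for the NumPy array so that the sketch has no dependencies.

```python
def leaf_ids(node, key):
    """Collect the document keys stored under a node of the nested graph."""
    if isinstance(node, list):  # a leaf is a list of documents
        return [document[key] for document in node]
    ids = []
    for child in node.values():
        ids.extend(leaf_ids(child, key))
    return ids


def average_nodes(graph, key, documents, documents_embeddings):
    """Map every node id in `graph` to the mean embedding of its documents.

    Hypothetical helper: it only illustrates the idea of averaging and is
    not taken from the library.
    """
    # Assumption: row i of documents_embeddings belongs to documents[i].
    embedding_of = {
        document[key]: embedding
        for document, embedding in zip(documents, documents_embeddings)
    }
    centroids = {}

    def visit(children):
        for node_id, child in children.items():
            vectors = [embedding_of[i] for i in leaf_ids(child, key)]
            # Component-wise mean over all documents under this node.
            centroids[node_id] = [
                sum(column) / len(vectors) for column in zip(*vectors)
            ]
            if isinstance(child, dict):  # descend into inner nodes
                visit(child)

    visit(graph)
    return centroids


# Same ids, embeddings, and graph shape as in the example above.
documents = [{"id": 0}, {"id": 1}, {"id": 2}, {"id": 3}]
documents_embeddings = [[1, 1], [1, 2], [10, 10], [1, 3]]
graph = {
    11: {111: [{"id": 0}, {"id": 3}], 112: [{"id": 1}]},
    12: {121: [{"id": 2}], 122: [{"id": 3}]},
}

centroids = average_nodes(graph, "id", documents, documents_embeddings)
print(centroids[11])  # [1.0, 2.0]
print(centroids[12])  # [5.5, 6.5]
```

In this sketch a document that appears in several leaves, such as id 3 here, contributes to the mean of every node it sits under. Whether the library treats duplicated documents the same way is not stated in this page.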