Build an index from an existing tree¶
Neural-Tree can build a tree from an existing graph. This is useful when we have a specific use case where we want to retrieve the right leaf for a query.
The tree we want to pass should follow some rules:
-
We should avoid nodes with a lot of children. The more children a node has, the more time it will take to explore this node.
-
A node must have only one parent. This is a rule for the tree to be a tree. You can somehow duplicate a node to have it in multiple places in the tree.
Let's create a tree which has one root node, two children nodes and two leafs nodes which contains up to 3 documents.
graph = {
"root": {
"science": {
"machine learning": [
{"id": 0, "content": "bayern football team"},
{"id": 1, "content": "toulouse rugby team"},
],
"computer": [
{"id": 2, "content": "Apple Macintosh"},
{"id": 3, "content": "Microsoft Windows"},
{"id": 4, "content": "Linux Ubuntu"},
],
},
"history": {
"france": [
{"id": 5, "content": "history of france"},
{"id": 6, "content": "french revolution"},
],
"italia": [
{"id": 7, "content": "history of rome"},
{"id": 8, "content": "history of venice"},
],
},
}
}
We can now initialize either a TfIdf, a SentenceTransformer or a ColBERT tree using the graph we have created.
from neural_tree import trees
from neural_cherche import models
model = models.ColBERT(
model_name_or_path="raphaelsty/neural-cherche-colbert",
device="cuda" if torch.cuda.is_available() else "cpu",
)
tree = trees.ColBERT(
key="id",
on=["content"],
model=model,
graph=graph,
n_jobs=-1,
)
print(tree)
This will output:
Once we have created our tree we can export it back to json using the tree.to_json()
: