Dataset¶

Custom dataset creation

The Dataset class allows to iterate on the data of a dataset. Dataset takes entities as input, relations, training data and optional validation and test data. Training data, validation and testing must be organized in the form of a triplet list. Entities and relations must be in a dictionary where the key is the label of the entity or relationship and the value must be the index of the entity / relation.

Parameters¶

train
batch_size
entities – defaults to None
relations – defaults to None
valid – defaults to None
test – defaults to None
shuffle – defaults to True
pre_compute – defaults to True
num_workers – defaults to 1
seed – defaults to None

Attributes¶

n_entity (int): Number of entities.

n_relation (int): Number of relations.

Examples¶

>>> from ckb import datasets

>>> train = [
...    ('🐝', 'is', 'animal'),
...    ('🐻', 'is', 'animal'),
...    ('🐍', 'is', 'animal'),
...    ('🦔', 'is', 'animal'),
...    ('🦓', 'is', 'animal'),
...    ('🦒', 'is', 'animal'),
...    ('🦘', 'is', 'animal'),
...    ('🦝', 'is', 'animal'),
...    ('🦞', 'is', 'animal'),
...    ('🦢', 'is', 'animal'),
... ]

>>> test = [
...    ('🐝', 'is', 'animal'),
...    ('🐻', 'is', 'animal'),
...    ('🐍', 'is', 'animal'),
...    ('🦔', 'is', 'animal'),
...    ('🦓', 'is', 'animal'),
...    ('🦒', 'is', 'animal'),
...    ('🦘', 'is', 'animal'),
...    ('🦝', 'is', 'animal'),
...    ('🦞', 'is', 'animal'),
...    ('🦢', 'is', 'animal'),
... ]

>>> dataset = datasets.Dataset(train=train, test=test, batch_size=2, seed=42, shuffle=False)

>>> dataset
Dataset dataset
    Batch size  2
    Entities  11
    Relations  1
    Shuffle  False
    Train triples  10
    Validation triples  0
    Test triples  10

>>> dataset.entities
{'🐝': 0, '🐻': 1, '🐍': 2, '🦔': 3, '🦓': 4, '🦒': 5, '🦘': 6, '🦝': 7, '🦞': 8, '🦢': 9, 'animal': 10}

Methods¶

fetch

get_train_loader

Initialize train dataset loader.

Parameters

mode

mapping_entities

Construct mapping entities.

mapping_relations

Construct mapping relations.

test_dataset

test_stream

validation_dataset

Dataset¶

Parameters¶

Attributes¶

Examples¶

Methods¶

References¶