
Dataset

Custom dataset creation

The Dataset class iterates over the data of a dataset. It takes training data, entities, relations, and optional validation and test data as input. Training, validation, and test data must each be a list of triples such as ('🐝', 'is', 'animal'). When provided, entities and relations must be dictionaries in which each key is the label of an entity or relation and each value is the index of that entity or relation.
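The label-to-index dictionaries described above can be sketched in plain Python. The helper `build_mappings` below is hypothetical and not part of ckb; it assigns indices in order of first appearance, which is an assumption and may differ from the ordering the library produces when it builds the mappings itself.

```python
def build_mappings(triples):
    """Assign a contiguous integer index to every entity and relation label.

    Hypothetical helper for illustration only; not part of ckb.
    """
    entities, relations = {}, {}
    for head, relation, tail in triples:
        # Heads and tails share a single entity index space.
        for label in (head, tail):
            entities.setdefault(label, len(entities))
        relations.setdefault(relation, len(relations))
    return entities, relations


# Small illustrative triple list (not taken from the library).
train = [
    ("bee", "is", "animal"),
    ("bear", "is", "animal"),
    ("bee", "eats", "pollen"),
]

entities, relations = build_mappings(train)
print(entities)
print(relations)
```

Dictionaries of this shape are what the `entities` and `relations` parameters expect: labels as keys, integer indices as values.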

Parameters

  • train

  • batch_size

  • entities – defaults to None

  • relations – defaults to None

  • valid – defaults to None

  • test – defaults to None

  • shuffle – defaults to True

  • pre_compute – defaults to True

  • num_workers – defaults to 1

  • seed – defaults to None

Attributes

  • n_entity (int): Number of entities.

  • n_relation (int): Number of relations.

Examples

>>> from ckb import datasets

>>> train = [
...    ('🐝', 'is', 'animal'),
...    ('🐻', 'is', 'animal'),
...    ('🐍', 'is', 'animal'),
...    ('🦔', 'is', 'animal'),
...    ('🦓', 'is', 'animal'),
...    ('🦒', 'is', 'animal'),
...    ('🦘', 'is', 'animal'),
...    ('🦝', 'is', 'animal'),
...    ('🦞', 'is', 'animal'),
...    ('🦢', 'is', 'animal'),
... ]

>>> test = [
...    ('🐝', 'is', 'animal'),
...    ('🐻', 'is', 'animal'),
...    ('🐍', 'is', 'animal'),
...    ('🦔', 'is', 'animal'),
...    ('🦓', 'is', 'animal'),
...    ('🦒', 'is', 'animal'),
...    ('🦘', 'is', 'animal'),
...    ('🦝', 'is', 'animal'),
...    ('🦞', 'is', 'animal'),
...    ('🦢', 'is', 'animal'),
... ]

>>> dataset = datasets.Dataset(train=train, test=test, batch_size=2, seed=42, shuffle=False)

>>> dataset
Dataset dataset
    Batch size  2
    Entities  11
    Relations  1
    Shuffle  False
    Train triples  10
    Validation triples  0
    Test triples  10

>>> dataset.entities
{'🐝': 0, '🐻': 1, '🐍': 2, '🦔': 3, '🦓': 4, '🦒': 5, '🦘': 6, '🦝': 7, '🦞': 8, '🦢': 9, 'animal': 10}
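To illustrate how such a mapping relates labels to indices, the sketch below converts label triples into index triples. The `encode` helper is hypothetical and not part of ckb, and the `relations` dictionary is an assumption consistent with the single relation reported above; only a subset of the entity mapping is reproduced.

```python
# Subset of the entity mapping printed above.
entities = {"🐝": 0, "🐻": 1, "animal": 10}

# Assumed relation mapping: the example dataset reports a single relation.
relations = {"is": 0}


def encode(triples, entities, relations):
    """Replace each label in a (head, relation, tail) triple with its index.

    Hypothetical helper for illustration only; not part of ckb.
    """
    return [(entities[h], relations[r], entities[t]) for h, r, t in triples]


encoded = encode(
    [("🐝", "is", "animal"), ("🐻", "is", "animal")],
    entities,
    relations,
)
print(encoded)
```

A label missing from either dictionary raises a `KeyError` here, which is a quick way to spot validation or test triples that mention entities absent from the mapping.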

Methods

fetch

get_train_loader

Initialize train dataset loader.

Parameters

  • mode

mapping_entities

Construct mapping entities.

mapping_relations

Construct mapping relations.

test_dataset

test_stream

validation_dataset
