Skip to content

arxiv_tags

Semanlink tags arXiv documents. The objective of this dataset is to evaluate a neural search pipeline for automatic tagging of arXiv documents. This function returns the set of tags and the pairs arXiv documents and tags.

Parameters

  • arxiv_title (bool) – defaults to True

    Include title of the arxiv paper inside the query.

  • arxiv_summary (bool) – defaults to True

    Include summary of the arxiv paper inside the query.

  • comment (bool) – defaults to False

    Include comment of the arxiv paper inside the query.

  • broader_prefLabel_text (bool) – defaults to True

    Include broader_prefLabel as a text field.

  • broader_altLabel_text (bool) – defaults to True

    Include broader_altLabel_text as a text field.

  • prefLabel_text (bool) – defaults to True

    Include prefLabel_text as a text field.

  • altLabel_text (bool) – defaults to True

    Include altLabel_text as a text field.

Examples

>>> from pprint import pprint as print
>>> from cherche import data

>>> documents, query_answers = data.arxiv_tags()

>>> print(list(documents[0].keys()))
['prefLabel',
 'type',
 'broader',
 'creationTime',
 'creationDate',
 'comment',
 'uri',
 'broader_prefLabel',
 'broader_related',
 'broader_prefLabel_text',
 'prefLabel_text']