arxiv_tags¶
Semanlink tags arXiv documents. The objective of this dataset is to evaluate a neural search pipeline for automatic tagging of arXiv documents. This function returns the set of tags and the pairs arXiv documents and tags.
Parameters¶
-
arxiv_title (bool) – defaults to
True
Include title of the arxiv paper inside the query.
-
arxiv_summary (bool) – defaults to
True
Include summary of the arxiv paper inside the query.
-
comment (bool) – defaults to
False
Include comment of the arxiv paper inside the query.
-
broader_prefLabel_text (bool) – defaults to
True
Include broader_prefLabel as a text field.
-
broader_altLabel_text (bool) – defaults to
True
Include broader_altLabel_text as a text field.
-
prefLabel_text (bool) – defaults to
True
Include prefLabel_text as a text field.
-
altLabel_text (bool) – defaults to
True
Include altLabel_text as a text field.
Examples¶
>>> from pprint import pprint as print
>>> from cherche import data
>>> documents, query_answers = data.arxiv_tags()
>>> print(list(documents[0].keys()))
['prefLabel',
'type',
'broader',
'creationTime',
'creationDate',
'comment',
'uri',
'broader_prefLabel',
'broader_related',
'broader_prefLabel_text',
'prefLabel_text']