Union and intersection of rankers
Let's build a pipeline using the union | and intersection & operators.
from cherche import data, rank, retrieve
from sentence_transformers import SentenceTransformer
The first step is to define the corpus on which we will perform the neural search. The towns dataset contains about a hundred documents, each with four attributes: an id, the title of the article, the url, and the content of the article (stored under the article key).
documents = data.load_towns()
documents[:4]
[{'id': 0, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'Paris (French pronunciation: \u200b[paʁi] (listen)) is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018, in an area of more than 105 square kilometres (41 square miles).'}, {'id': 1, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': "Since the 17th century, Paris has been one of Europe's major centres of finance, diplomacy, commerce, fashion, gastronomy, science, and arts."}, {'id': 2, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'The City of Paris is the centre and seat of government of the region and province of Île-de-France, or Paris Region, which has an estimated population of 12,174,880, or about 18 percent of the population of France as of 2017.'}, {'id': 3, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'The Paris Region had a GDP of €709 billion ($808 billion) in 2017.'}]
Union
Let's create the union of two pipelines: the first has high precision but low recall, and the second has better recall.
# Low recall, high precision
precision = retrieve.Flash(key="id", on=["title", "article"], k=30) + rank.Encoder(
    key="id",
    on=["title", "article"],
    encoder=SentenceTransformer("sentence-transformers/all-mpnet-base-v2").encode,
)
# High recall
recall = retrieve.TfIdf(
    key="id", on=["title", "article"], documents=documents, k=30
) + rank.Encoder(
    key="id",
    on=["title", "article"],
    encoder=SentenceTransformer("sentence-transformers/all-mpnet-base-v2").encode,
)
# Union: precision | recall
search = precision | recall
search.add(documents)
Encoder ranker: 100%|████████| 2/2 [00:02<00:00, 1.35s/it] Encoder ranker: 100%|████████| 2/2 [00:02<00:00, 1.32s/it]
Union Pipeline ----- Flash retriever key : id on : title, article documents: 110 Encoder ranker key : id on : title, article normalize : True embeddings: 105 TfIdf retriever key : id on : title, article documents: 105 Encoder ranker key : id on : title, article normalize : True embeddings: 105 -----
search("Paris football", k=30)
Flash retriever: 100%|█████| 1/1 [00:00<00:00, 8473.34it/s] TfIdf retriever: 100%|██████| 1/1 [00:00<00:00, 240.38it/s]
[{'id': 20, 'similarity': 2.074074074074074}, {'id': 24, 'similarity': 0.5}, {'id': 16, 'similarity': 0.738095238095238}, {'id': 21, 'similarity': 0.5689655172413793}, {'id': 22, 'similarity': 0.4645161290322581}, {'id': 1, 'similarity': 0.3958333333333333}, {'id': 0, 'similarity': 0.3463203463203463}, {'id': 2, 'similarity': 0.3088235294117647}, {'id': 25, 'similarity': 0.27936507936507937}, {'id': 6, 'similarity': 0.25555555555555554}, {'id': 3, 'similarity': 0.23587223587223588}, {'id': 23, 'similarity': 0.21929824561403508}, {'id': 14, 'similarity': 0.20512820512820512}, {'id': 7, 'similarity': 0.19163763066202089}, {'id': 8, 'similarity': 0.18095238095238095}, {'id': 17, 'similarity': 0.17151162790697674}, {'id': 9, 'similarity': 0.16310160427807485}, {'id': 13, 'similarity': 0.15555555555555556}, {'id': 12, 'similarity': 0.14874141876430205}, {'id': 15, 'similarity': 0.1425531914893617}, {'id': 5, 'similarity': 0.13605442176870747}, {'id': 10, 'similarity': 0.13012477718360071}, {'id': 19, 'similarity': 0.1254180602006689}, {'id': 11, 'similarity': 0.12037037037037036}, {'id': 4, 'similarity': 0.11636363636363636}, {'id': 18, 'similarity': 0.11263736263736264}, {'id': 56, 'similarity': 0.03333333333333333}, {'id': 51, 'similarity': 0.025}, {'id': 53, 'similarity': 0.020833333333333332}, {'id': 52, 'similarity': 0.02}, {'id': 94, 'similarity': 0.018867924528301886}]
search("speciality Lyon", k=10)
Flash retriever: 100%|█████| 1/1 [00:00<00:00, 5377.31it/s] TfIdf retriever: 100%|██████| 1/1 [00:00<00:00, 609.37it/s]
[{'id': 49, 'similarity': 2.1818181818181817}, {'id': 45, 'similarity': 1.1538461538461537}, {'id': 48, 'similarity': 0.3333333333333333}, {'id': 41, 'similarity': 0.6428571428571428}, {'id': 47, 'similarity': 0.5333333333333333}, {'id': 50, 'similarity': 0.16666666666666666}, {'id': 42, 'similarity': 0.14285714285714285}, {'id': 46, 'similarity': 0.125}, {'id': 44, 'similarity': 0.33986928104575165}, {'id': 43, 'similarity': 0.3111111111111111}, {'id': 56, 'similarity': 0.08333333333333333}, {'id': 55, 'similarity': 0.0625}, {'id': 10, 'similarity': 0.05263157894736842}, {'id': 58, 'similarity': 0.05}]
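To make the idea of a union concrete, here is a minimal sketch in plain Python. It assumes one simple merging rule: results of the first pipeline come first, and documents from the second pipeline are appended only if their id has not been seen yet. The `union` function is a hypothetical helper written for illustration; it is not cherche's implementation, and the library's actual scoring and ordering may differ.

```python
def union(first, second):
    """Merge two ranked lists of {'id': ...} dicts.

    Assumption for this sketch: keep the first occurrence of each id,
    so results from `first` take priority over results from `second`.
    """
    seen = set()
    merged = []
    for doc in first + second:
        if doc["id"] not in seen:
            seen.add(doc["id"])
            merged.append(doc)
    return merged


# Toy result lists standing in for the two pipelines' outputs.
precision_hits = [{"id": 20}, {"id": 24}]
recall_hits = [{"id": 24}, {"id": 16}, {"id": 21}]

merged = union(precision_hits, recall_hits)
print([doc["id"] for doc in merged])  # [20, 24, 16, 21]
```

With this rule, the high-precision results stay at the top and the high-recall pipeline only fills in documents the first one missed.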
We can automatically map document identifiers to their content.
search += documents
search("Paris football", k=30)[:5]
Flash retriever: 100%|████| 1/1 [00:00<00:00, 10866.07it/s] TfIdf retriever: 100%|█████| 1/1 [00:00<00:00, 1649.35it/s]
[{'id': 20, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'The football club Paris Saint-Germain and the rugby union club Stade Français are based in Paris.', 'similarity': 2.074074074074074}, {'id': 24, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'The 1938 and 1998 FIFA World Cups, the 2007 Rugby World Cup, as well as the 1960, 1984 and 2016 UEFA European Championships were also held in the city.', 'similarity': 0.5}, {'id': 16, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'Paris received 12.', 'similarity': 0.738095238095238}, {'id': 21, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'The 80,000-seat Stade de France, built for the 1998 FIFA World Cup, is located just north of Paris in the neighbouring commune of Saint-Denis.', 'similarity': 0.5689655172413793}, {'id': 22, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'Paris hosts the annual French Open Grand Slam tennis tournament on the red clay of Roland Garros.', 'similarity': 0.4645161290322581}]
search("speciality Lyon", k=30)[:5]
Flash retriever: 100%|██████| 1/1 [00:00<00:00, 529.05it/s] TfIdf retriever: 100%|██████| 1/1 [00:00<00:00, 958.04it/s]
[{'id': 52, 'title': 'Lyon', 'url': 'https://en.wikipedia.org/wiki/Lyon', 'article': 'Economically, Lyon is a major centre for banking, as well as for the chemical, pharmaceutical and biotech industries.', 'similarity': 2.1176470588235294}, {'id': 49, 'title': 'Lyon', 'url': 'https://en.wikipedia.org/wiki/Lyon', 'article': 'Lyon was historically an important area for the production and weaving of silk.', 'similarity': 1.1111111111111112}, {'id': 56, 'title': 'Lyon', 'url': 'https://en.wikipedia.org/wiki/Lyon', 'article': "It ranked second in France and 40th globally in Mercer's 2019 liveability rankings.", 'similarity': 0.7719298245614035}, {'id': 45, 'title': 'Lyon', 'url': 'https://en.wikipedia.org/wiki/Lyon', 'article': 'Lyon is the prefecture of the Auvergne-Rhône-Alpes region and seat of the Departmental Council of Rhône (whose jurisdiction, however, no longer extends over the Metropolis of Lyon since 2015).', 'similarity': 0.6}, {'id': 48, 'title': 'Lyon', 'url': 'https://en.wikipedia.org/wiki/Lyon', 'article': "The city is recognised for its cuisine and gastronomy, as well as historical and architectural landmarks; as such, the districts of Old Lyon, the Fourvière hill, the Presqu'île and the slopes of the Croix-Rousse are inscribed on the UNESCO World Heritage List.", 'similarity': 0.49523809523809526}]
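Conceptually, the `search += documents` step behaves like a lookup from identifier to document. The sketch below shows that idea in plain Python; `attach_content` is a hypothetical helper, and the toy documents and scores are made up for illustration. It is not how cherche implements the mapping internally.

```python
def attach_content(results, documents, key="id"):
    """Return each result enriched with the fields of its source document.

    Results whose identifier is not found in `documents` are skipped
    (a choice made for this sketch).
    """
    index = {doc[key]: doc for doc in documents}
    return [{**index[res[key]], **res} for res in results if res[key] in index]


# Toy corpus and toy search results.
toy_documents = [{"id": 20, "title": "Paris"}, {"id": 52, "title": "Lyon"}]
toy_results = [{"id": 20, "similarity": 2.0}, {"id": 999, "similarity": 0.1}]

enriched = attach_content(toy_results, toy_documents)
print(enriched)  # [{'id': 20, 'title': 'Paris', 'similarity': 2.0}]
```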
Intersection
retriever = retrieve.Lunr(key="id", on=["title", "article"], documents=documents)
We will build a set of rankers from two different pre-trained models, combined with the intersection operator &. The pipeline only returns documents that are retrieved by the Lunr retriever and kept by both rankers.
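As an illustration of what intersecting two result lists means, here is a minimal sketch in plain Python. It assumes the simplest rule: keep a document only if its id appears in both lists, preserving the order of the first list. The `intersection` function is a hypothetical helper and does not reproduce how cherche combines or rescales similarity scores.

```python
def intersection(first, second):
    """Keep the documents of `first` whose id also appears in `second`."""
    ids_in_second = {doc["id"] for doc in second}
    return [doc for doc in first if doc["id"] in ids_in_second]


# Toy outputs standing in for the two rankers.
ranker_a = [{"id": 20}, {"id": 24}, {"id": 16}]
ranker_b = [{"id": 16}, {"id": 20}, {"id": 42}]

common = intersection(ranker_a, ranker_b)
print([doc["id"] for doc in common])  # [20, 16]
```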
ranker = rank.Encoder(
    key="id",
    on=["title", "article"],
    encoder=SentenceTransformer("sentence-transformers/all-mpnet-base-v2").encode,
) & rank.Encoder(
    key="id",
    on=["title", "article"],
    encoder=SentenceTransformer(
        "sentence-transformers/multi-qa-mpnet-base-cos-v1"
    ).encode,
)
search = retriever + ranker
search.add(documents)
Encoder ranker: 100%|████████| 2/2 [00:02<00:00, 1.43s/it] Encoder ranker: 100%|████████| 2/2 [00:02<00:00, 1.40s/it]
Lunr retriever key : id on : title, article documents: 105 Intersection ----- Encoder ranker key : id on : title, article normalize : True embeddings: 105 Encoder ranker key : id on : title, article normalize : True embeddings: 105 -----
search("Paris football")
[{'id': 20, 'similarity': 2.0588235294117645}, {'id': 24, 'similarity': 1.0571428571428572}, {'id': 16, 'similarity': 0.7207207207207207}, {'id': 21, 'similarity': 0.5555555555555556}, {'id': 22, 'similarity': 0.45263157894736844}, {'id': 1, 'similarity': 0.3833333333333333}, {'id': 0, 'similarity': 0.33699633699633696}, {'id': 2, 'similarity': 0.2965116279069767}, {'id': 25, 'similarity': 0.261437908496732}, {'id': 6, 'similarity': 0.24878048780487805}, {'id': 3, 'similarity': 0.22529644268774704}, {'id': 23, 'similarity': 0.2074829931972789}, {'id': 14, 'similarity': 0.1982905982905983}, {'id': 7, 'similarity': 0.18541033434650456}, {'id': 8, 'similarity': 0.18095238095238095}, {'id': 42, 'similarity': 0.16346153846153846}, {'id': 32, 'similarity': 0.15931372549019607}, {'id': 17, 'similarity': 0.14747474747474748}, {'id': 9, 'similarity': 0.1429990069513406}, {'id': 27, 'similarity': 0.13703703703703704}, {'id': 13, 'similarity': 0.13523809523809524}, {'id': 12, 'similarity': 0.12599681020733652}, {'id': 15, 'similarity': 0.12143928035982009}, {'id': 5, 'similarity': 0.11666666666666667}, {'id': 70, 'similarity': 0.11174603174603175}, {'id': 10, 'similarity': 0.1076923076923077}, {'id': 19, 'similarity': 0.11952861952861953}, {'id': 94, 'similarity': 0.10267857142857142}, {'id': 11, 'similarity': 0.10122358175750834}, {'id': 4, 'similarity': 0.09696969696969697}, {'id': 59, 'similarity': 0.09730301427815971}, {'id': 28, 'similarity': 0.09639830508474576}, {'id': 18, 'similarity': 0.09632034632034632}]
search("speciality Lyon")
[{'id': 52, 'similarity': 2.1}, {'id': 49, 'similarity': 1.0909090909090908}, {'id': 56, 'similarity': 0.7619047619047619}, {'id': 45, 'similarity': 0.58}, {'id': 48, 'similarity': 0.48695652173913045}, {'id': 41, 'similarity': 0.41025641025641024}, {'id': 54, 'similarity': 0.3482142857142857}, {'id': 47, 'similarity': 0.32407407407407407}, {'id': 50, 'similarity': 0.28888888888888886}, {'id': 53, 'similarity': 0.2689655172413793}, {'id': 42, 'similarity': 0.26515151515151514}, {'id': 51, 'similarity': 0.2238095238095238}, {'id': 46, 'similarity': 0.21266968325791857}, {'id': 55, 'similarity': 0.20346320346320346}, {'id': 44, 'similarity': 0.1978494623655914}, {'id': 43, 'similarity': 0.19642857142857142}, {'id': 32, 'similarity': 0.17027863777089783}, {'id': 28, 'similarity': 0.16666666666666666}, {'id': 59, 'similarity': 0.1593172119487909}]
We can automatically map document identifiers to their content.
search += documents
search("Paris football")
[{'id': 20, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'The football club Paris Saint-Germain and the rugby union club Stade Français are based in Paris.', 'similarity': 2.0588235294117645}, {'id': 24, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'The 1938 and 1998 FIFA World Cups, the 2007 Rugby World Cup, as well as the 1960, 1984 and 2016 UEFA European Championships were also held in the city.', 'similarity': 1.0571428571428572}, {'id': 16, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'Paris received 12.', 'similarity': 0.7207207207207207}, {'id': 21, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'The 80,000-seat Stade de France, built for the 1998 FIFA World Cup, is located just north of Paris in the neighbouring commune of Saint-Denis.', 'similarity': 0.5555555555555556}, {'id': 22, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'Paris hosts the annual French Open Grand Slam tennis tournament on the red clay of Roland Garros.', 'similarity': 0.45263157894736844}, {'id': 1, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': "Since the 17th century, Paris has been one of Europe's major centres of finance, diplomacy, commerce, fashion, gastronomy, science, and arts.", 'similarity': 0.3833333333333333}, {'id': 0, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'Paris (French pronunciation: \u200b[paʁi] (listen)) is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018, in an area of more than 105 square kilometres (41 square miles).', 'similarity': 0.33699633699633696}, {'id': 2, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'The City of Paris is the centre and seat of government of the region and province of Île-de-France, or Paris Region, which has an estimated population of 
12,174,880, or about 18 percent of the population of France as of 2017.', 'similarity': 0.2965116279069767}, {'id': 25, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'Every July, the Tour de France bicycle race finishes on the Avenue des Champs-Élysées in Paris.', 'similarity': 0.261437908496732}, {'id': 6, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'Paris is a major railway, highway, and air-transport hub served by two international airports: Paris–Charles de Gaulle (the second-busiest airport in Europe) and Paris–Orly.', 'similarity': 0.24878048780487805}, {'id': 3, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'The Paris Region had a GDP of €709 billion ($808 billion) in 2017.', 'similarity': 0.22529644268774704}, {'id': 23, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'The city hosted the Olympic Games in 1900, 1924 and will host the 2024 Summer Olympics.', 'similarity': 0.2074829931972789}, {'id': 14, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'The historical district along the Seine in the city centre has been classified as a UNESCO World Heritage Site since 1991; popular landmarks there include the Cathedral of Notre Dame de Paris on the Île de la Cité, now closed for renovation after the 15 April 2019 fire.', 'similarity': 0.1982905982905983}, {'id': 7, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': "Opened in 1900, the city's subway system, the Paris Métro, serves 5.", 'similarity': 0.18541033434650456}, {'id': 8, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': '23 million passengers daily; it is the second-busiest metro system in Europe after the Moscow Metro.', 'similarity': 0.18095238095238095}, {'id': 42, 'title': 'Lyon', 'url': 'https://en.wikipedia.org/wiki/Lyon', 'article': 'It is located at the confluence of the rivers Rhône and Saône, about 470 
km (292 mi) southeast of Paris, 320 km (199 mi) north of Marseille and 56 km (35 mi) northeast of Saint-Étienne.', 'similarity': 0.16346153846153846}, {'id': 32, 'title': 'Toulouse', 'url': 'https://en.wikipedia.org/wiki/Toulouse', 'article': 'The University of Toulouse is one of the oldest in Europe (founded in 1229) and, with more than 103,000 students, it is the fourth-largest university campus in France, after the universities of Paris, Lyon and Lille.', 'similarity': 0.15931372549019607}, {'id': 17, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': '6 million visitors in 2020, measured by hotel stays, a drop of 73 percent from 2019, due to the COVID-19 virus.', 'similarity': 0.14747474747474748}, {'id': 9, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'Gare du Nord is the 24th-busiest railway station in the world, but the busiest located outside Japan, with 262 million passengers in 2015.', 'similarity': 0.1429990069513406}, {'id': 27, 'title': 'Toulouse', 'url': 'https://en.wikipedia.org/wiki/Toulouse', 'article': 'The city is on the banks of the River Garonne, 150 kilometres (93 miles) from the Mediterranean Sea, 230 km (143 mi) from the Atlantic Ocean and 680 km (420 mi) from Paris.', 'similarity': 0.13703703703703704}, {'id': 13, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'The Musée Rodin and Musée Picasso exhibit the works of two noted Parisians.', 'similarity': 0.13523809523809524}, {'id': 12, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': "The Pompidou Centre Musée National d'Art Moderne has the largest collection of modern and contemporary art in Europe.", 'similarity': 0.12599681020733652}, {'id': 15, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'Other popular tourist sites include the Gothic royal chapel of Sainte-Chapelle, also on the Île de la Cité; the Eiffel Tower, constructed for the Paris Universal 
Exposition of 1889; the Grand Palais and Petit Palais, built for the Paris Universal Exposition of 1900; the Arc de Triomphe on the Champs-Élysées, and the hill of Montmartre with its artistic history and its Basilica of Sacré-Coeur.', 'similarity': 0.12143928035982009}, {'id': 5, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'Another source ranked Paris as most expensive, on par with Singapore and Hong Kong, in 2018.', 'similarity': 0.11666666666666667}, {'id': 70, 'title': 'Bordeaux', 'url': 'https://en.wikipedia.org/wiki/Bordeaux', 'article': 'Bordeaux is an international tourist destination for its architectural and cultural heritage with more than 350 historic monuments, making it, after Paris, the city with the most listed or registered monuments in France.', 'similarity': 0.11174603174603175}, {'id': 10, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'Paris is especially known for its museums and architectural landmarks: the Louvre remained the most-visited museum in the world with 2,677,504 visitors in 2020, despite the long museum closings caused by the COVID-19 virus.', 'similarity': 0.1076923076923077}, {'id': 19, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'Museums re-opened in 2021, with limitations on the number of visitors at a time and a requirement that visitors wear masks.', 'similarity': 0.11952861952861953}, {'id': 94, 'title': 'Montreal', 'url': 'https://en.wikipedia.org/wiki/Montreal', 'article': 'Montreal is the second-largest primarily French-speaking city in the developed world, after Paris.', 'similarity': 0.10267857142857142}, {'id': 11, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': "The Musée d'Orsay, Musée Marmottan Monet and Musée de l'Orangerie are noted for their collections of French Impressionist art.", 'similarity': 0.10122358175750834}, {'id': 4, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 
'article': 'According to the Economist Intelligence Unit Worldwide Cost of Living Survey in 2018, Paris was the second most expensive city in the world, after Singapore and ahead of Zürich, Hong Kong, Oslo, and Geneva.', 'similarity': 0.09696969696969697}, {'id': 59, 'title': 'Bordeaux', 'url': 'https://en.wikipedia.org/wiki/Bordeaux', 'article': 'Bordeaux is the centre of Bordeaux Métropole that has a population of 796,273 (2019), the sixth-largest in France after Paris, Lyon, Marseille, Toulouse and Lille with its immediate suburbs and closest satellite towns.', 'similarity': 0.09730301427815971}, {'id': 28, 'title': 'Toulouse', 'url': 'https://en.wikipedia.org/wiki/Toulouse', 'article': 'It is the fourth-largest commune in France, with 479,553 inhabitants within its municipal boundaries (as of January 2017), after Paris, Marseille and Lyon, ahead of Nice; it has a population of 1,360,829 within its wider metropolitan area (also as of January 2017).', 'similarity': 0.09639830508474576}, {'id': 18, 'title': 'Paris', 'url': 'https://en.wikipedia.org/wiki/Paris', 'article': 'The number of foreign visitors declined by 80.', 'similarity': 0.09632034632034632}]
search("speciality Lyon")
[{'id': 52, 'title': 'Lyon', 'url': 'https://en.wikipedia.org/wiki/Lyon', 'article': 'Economically, Lyon is a major centre for banking, as well as for the chemical, pharmaceutical and biotech industries.', 'similarity': 2.1}, {'id': 49, 'title': 'Lyon', 'url': 'https://en.wikipedia.org/wiki/Lyon', 'article': 'Lyon was historically an important area for the production and weaving of silk.', 'similarity': 1.0909090909090908}, {'id': 56, 'title': 'Lyon', 'url': 'https://en.wikipedia.org/wiki/Lyon', 'article': "It ranked second in France and 40th globally in Mercer's 2019 liveability rankings.", 'similarity': 0.7619047619047619}, {'id': 45, 'title': 'Lyon', 'url': 'https://en.wikipedia.org/wiki/Lyon', 'article': 'Lyon is the prefecture of the Auvergne-Rhône-Alpes region and seat of the Departmental Council of Rhône (whose jurisdiction, however, no longer extends over the Metropolis of Lyon since 2015).', 'similarity': 0.58}, {'id': 48, 'title': 'Lyon', 'url': 'https://en.wikipedia.org/wiki/Lyon', 'article': "The city is recognised for its cuisine and gastronomy, as well as historical and architectural landmarks; as such, the districts of Old Lyon, the Fourvière hill, the Presqu'île and the slopes of the Croix-Rousse are inscribed on the UNESCO World Heritage List.", 'similarity': 0.48695652173913045}, {'id': 41, 'title': 'Lyon', 'url': 'https://en.wikipedia.org/wiki/Lyon', 'article': 'Lyon or Lyons (UK: , US: , French: [ljɔ̃] (listen); Arpitan: Liyon, pronounced [ʎjɔ̃]) is the third-largest city and second-largest urban area of France.', 'similarity': 0.41025641025641024}, {'id': 54, 'title': 'Lyon', 'url': 'https://en.wikipedia.org/wiki/Lyon', 'article': 'Lyon hosts the international headquarters of Interpol, the International Agency for Research on Cancer, as well as Euronews.', 'similarity': 0.3482142857142857}, {'id': 47, 'title': 'Lyon', 'url': 'https://en.wikipedia.org/wiki/Lyon', 'article': 'Lyon became a major economic hub during the Renaissance.', 
'similarity': 0.32407407407407407}, {'id': 50, 'title': 'Lyon', 'url': 'https://en.wikipedia.org/wiki/Lyon', 'article': 'Lyon played a significant role in the history of cinema: it is where Auguste and Louis Lumière invented the cinematograph.', 'similarity': 0.28888888888888886}, {'id': 53, 'title': 'Lyon', 'url': 'https://en.wikipedia.org/wiki/Lyon', 'article': 'The city contains a significant software industry with a particular focus on video games; in recent years it has fostered a growing local start-up sector.', 'similarity': 0.2689655172413793}, {'id': 42, 'title': 'Lyon', 'url': 'https://en.wikipedia.org/wiki/Lyon', 'article': 'It is located at the confluence of the rivers Rhône and Saône, about 470 km (292 mi) southeast of Paris, 320 km (199 mi) north of Marseille and 56 km (35 mi) northeast of Saint-Étienne.', 'similarity': 0.26515151515151514}, {'id': 51, 'title': 'Lyon', 'url': 'https://en.wikipedia.org/wiki/Lyon', 'article': 'It is also known for its light festival, the Fête des Lumières, which begins every 8 December and lasts for four days, earning Lyon the title of "Capital of Lights".', 'similarity': 0.2238095238095238}, {'id': 46, 'title': 'Lyon', 'url': 'https://en.wikipedia.org/wiki/Lyon', 'article': 'Former capital of the Gauls at the time of the Roman Empire, Lyon is the seat of an archbishopric whose holder bears the title of Primate of the Gauls.', 'similarity': 0.21266968325791857}, {'id': 55, 'title': 'Lyon', 'url': 'https://en.wikipedia.org/wiki/Lyon', 'article': 'According to the Globalization and World Rankings Research Institute, Lyon is considered a Beta city, as of 2018.', 'similarity': 0.20346320346320346}, {'id': 44, 'title': 'Lyon', 'url': 'https://en.wikipedia.org/wiki/Lyon', 'article': 'Lyon and 58 suburban municipalities have formed since 2015 the Metropolis of Lyon, a directly elected metropolitan authority now in charge of most urban issues, with a population of 1,385,927 in 2017.', 'similarity': 0.1978494623655914}, {'id': 
43, 'title': 'Lyon', 'url': 'https://en.wikipedia.org/wiki/Lyon', 'article': 'The City of Lyon proper had a population of 516,092 in 2017 within its small municipal territory of 48 km2 (19 sq mi), but together with its suburbs and exurbs the Lyon metropolitan area had a population of 2,323,221 that same year, the second-most populated in France.', 'similarity': 0.19642857142857142}, {'id': 32, 'title': 'Toulouse', 'url': 'https://en.wikipedia.org/wiki/Toulouse', 'article': 'The University of Toulouse is one of the oldest in Europe (founded in 1229) and, with more than 103,000 students, it is the fourth-largest university campus in France, after the universities of Paris, Lyon and Lille.', 'similarity': 0.17027863777089783}, {'id': 28, 'title': 'Toulouse', 'url': 'https://en.wikipedia.org/wiki/Toulouse', 'article': 'It is the fourth-largest commune in France, with 479,553 inhabitants within its municipal boundaries (as of January 2017), after Paris, Marseille and Lyon, ahead of Nice; it has a population of 1,360,829 within its wider metropolitan area (also as of January 2017).', 'similarity': 0.16666666666666666}, {'id': 59, 'title': 'Bordeaux', 'url': 'https://en.wikipedia.org/wiki/Bordeaux', 'article': 'Bordeaux is the centre of Bordeaux Métropole that has a population of 796,273 (2019), the sixth-largest in France after Paris, Lyon, Marseille, Toulouse and Lille with its immediate suburbs and closest satellite towns.', 'similarity': 0.1593172119487909}]