Datasets
Tourpedia contains two main datasets, both in the tourism domain:
- Places
- Reviews about places
License
Tourpedia is released under the Creative Commons CCZero license.
Places
Places include accommodations, restaurants, attractions and points of interest. They were retrieved from the following social media: Facebook, Foursquare, Google Places and Booking. They cover the following locations: Amsterdam, Tuscany, Barcelona, Berlin, Dubai, London, Paris and Rome.
The following table describes the fields of each place.
Field | Description |
id | the unique identifier of the place |
name | name of the place (e.g. a hotel name) |
address | address of the place |
category | one among accommodation, attraction, restaurant, poi (point of interest) |
location | one among Rome, Amsterdam, London, Paris, Berlin, Dubai, Barcelona, Tuscany |
lat | Latitude |
lng | Longitude |
services | the list of services provided by the place. It is set only if the place is an accommodation. |
phone_number | national phone number associated with the place |
international_phone_number | international phone number associated with the place |
website | URL of the website associated with the place |
Icon | picture associated with the place |
description | description of the place in the six languages of the OpeNER project |
external_urls | external URLs associated with the place. It contains the URLs of Foursquare, Facebook, GooglePlaces and Booking (the last one is present only if the place is an accommodation) |
statistics | statistics associated with the place; they are retrieved from Foursquare and Facebook |
subCategory | The category provided by the source. It is more specific than the category field |
polarity | The opinion about the place |
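As an illustration of the constraints in the table above, the following Python sketch models a subset of the place fields and checks them. The class name `Place`, the function `validate_place`, the Python types chosen for each field, and the coordinate range checks are assumptions made for this example; they are not part of Tourpedia.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Allowed values, copied from the field table above.
CATEGORIES = {"accommodation", "attraction", "restaurant", "poi"}
LOCATIONS = {"Rome", "Amsterdam", "London", "Paris",
             "Berlin", "Dubai", "Barcelona", "Tuscany"}


@dataclass
class Place:
    """Hypothetical container for a subset of the place fields.

    The field types are assumptions; the source does not specify them.
    """
    id: str
    name: str
    category: str
    location: str
    lat: float
    lng: float
    address: Optional[str] = None
    services: List[str] = field(default_factory=list)


def validate_place(place: Place) -> List[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    if place.category not in CATEGORIES:
        problems.append(f"unknown category: {place.category!r}")
    if place.location not in LOCATIONS:
        problems.append(f"unknown location: {place.location!r}")
    if not -90 <= place.lat <= 90:
        problems.append(f"latitude out of range: {place.lat}")
    if not -180 <= place.lng <= 180:
        problems.append(f"longitude out of range: {place.lng}")
    # The table states that services is set only for accommodations.
    if place.services and place.category != "accommodation":
        problems.append("services is set but category is not accommodation")
    return problems


hotel = Place(id="1", name="Hotel Example", category="accommodation",
              location="Rome", lat=41.9, lng=12.5, services=["wifi"])
print(validate_place(hotel))
```

A check of this kind can be useful when merging places from several sources, since it flags records whose category or location falls outside the documented values.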
Reviews
The Reviews collection contains reviews of the places described above. The following table describes the schema of each review.
Field | Description |
Id | the unique identifier of the review |
Text | the text of the review |
language | The language of the review |
source | one among GooglePlaces, Foursquare, Facebook |
rating | Rating given by the user, ranging from 1 to 5 |
Time | Date of the review |
wordsCount | Number of words of the text |
analysis.kaf | The result of the OpeNER pipeline in KAF |
analysis.json | The result of the OpeNER pipeline in KAF-JSON |
polarity | The polarity of the review. It is extracted from the Polarity tagger module |
place.id | id of the place associated with the review |
place.name | Name of the place associated with the review |
place.location | Location of the place associated with the review |
place.category | Category of the place associated with the review |
authorName | The name of the review author |
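To show how the review schema above might be used, here is a minimal Python sketch that averages ratings per place. It assumes that each review is available as a dictionary and that dotted field names such as place.id denote a nested `place` object. The function name `mean_rating_by_place`, the sample records, and the language codes `"en"` and `"it"` are hypothetical.

```python
from collections import defaultdict


def mean_rating_by_place(reviews, language=None):
    """Average the rating of the reviews for each place id.

    Reviews without a rating, or with a rating outside the documented
    1 to 5 range, are skipped. If language is given, only reviews in
    that language are counted.
    """
    totals = defaultdict(lambda: [0.0, 0])  # place id -> [sum, count]
    for review in reviews:
        if language is not None and review.get("language") != language:
            continue
        rating = review.get("rating")
        if rating is None or not 1 <= rating <= 5:
            continue
        entry = totals[review["place"]["id"]]
        entry[0] += rating
        entry[1] += 1
    return {place_id: total / count
            for place_id, (total, count) in totals.items()}


# Invented sample records that follow the schema above.
sample_reviews = [
    {"language": "en", "rating": 4, "place": {"id": 1}},
    {"language": "en", "rating": 5, "place": {"id": 1}},
    {"language": "it", "rating": 3, "place": {"id": 2}},
]
print(mean_rating_by_place(sample_reviews, language="en"))
```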
Download datasets
A complete RDF dump of Tourpedia is available here.
Datasets are divided per category and location.
Places
Location | Accommodation | Restaurant | POI | Attraction |
Amsterdam | CSV | CSV | CSV | CSV |
Barcelona | CSV | CSV | CSV | CSV |
Berlin | CSV | CSV | CSV | CSV |
Dubai | CSV | CSV | CSV | CSV |
London | CSV | CSV | CSV | CSV |
Paris | CSV | CSV | CSV | CSV |
Rome | CSV | CSV | CSV | CSV |
Tuscany | CSV | CSV | CSV | CSV |
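Because the files are split per category and location, combining several of them usually means recording which file each row came from. The Python sketch below reads one CSV file with the standard `csv` module and tags each row accordingly. It assumes that each file starts with a header row; the column names in the sample data and the function name `load_places` are hypothetical, since the actual CSV layout is not described here.

```python
import csv
import io


def load_places(stream, category, location):
    """Read places from an open CSV stream and tag each row.

    The category and location are added only when the file does not
    already provide columns with those names.
    """
    rows = []
    for row in csv.DictReader(stream):
        row.setdefault("category", category)
        row.setdefault("location", location)
        rows.append(row)
    return rows


# Invented sample content standing in for a downloaded file.
sample = io.StringIO("id,name,lat,lng\n1,Hotel Example,41.9,12.5\n")
places = load_places(sample, category="accommodation", location="Rome")
print(places[0]["name"], places[0]["category"], places[0]["location"])
```

For a file on disk, the same function can be called with a stream opened via `open(path, newline="", encoding="utf-8")`; the encoding is also an assumption.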