public class TutorialDocumentProcessor extends Object implements EntityDocumentProcessor
An EntityDocumentProcessor that can be modified to try your own code.
Exercise 1: Just run the code as it is and have a look at the output. It will print a lot of data about item documents to the console. You can see roughly what the data looks like. Find the data for one item and look up the item on wikidata.org. Find the data that you can see on the Web page in the printout (note that some details might have changed since your local data is based on a dump).
Exercise 2: The code below already counts how many items and properties it processes. Add additional counters to count: (1) the number of labels, (2) the number of aliases, (3) the number of statements, (4) the number of site links. Print this data at the end or write it to the file if you like.
Exercise 3: Extend your code from Exercise 2 to count how many items have a link to English Wikipedia (or another Wikipedia of your choice). The site identifier used in the data for English Wikipedia is "enwiki".
Exercise 4: Building on the code of Exercise 3, count the number of site links for all sites that are linked. Use, for example, a hashmap to store integer counters for each site id you encounter. Print the results to a CSV file and load the file into a spreadsheet application (this can also be an online application such as Google Drive). You can order the data by count and create a diagram. The number of site links should be close to the number of articles in the project.
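The per-site bookkeeping described in Exercise 4 could be sketched roughly as follows. This is a minimal illustration that uses only the Java standard library; the class `SiteLinkCounter` and its methods are hypothetical names introduced here and are not part of Wikidata Toolkit.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Wikidata Toolkit): counts site links per site id. */
class SiteLinkCounter {
    // One integer counter per site id, e.g. "enwiki" -> 42.
    private final Map<String, Integer> counts = new HashMap<>();

    /** Increments the counter of every site id in the given collection. */
    void countSites(Collection<String> siteIds) {
        for (String siteId : siteIds) {
            counts.merge(siteId, 1, Integer::sum);
        }
    }

    /** Returns the number of links seen for a site id, or 0 if it never occurred. */
    int getCount(String siteId) {
        return counts.getOrDefault(siteId, 0);
    }

    /** Renders "site,count" rows, largest count first, ties ordered by site id. */
    String toCsv() {
        List<Map.Entry<String, Integer>> rows = new ArrayList<>(counts.entrySet());
        rows.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        StringBuilder csv = new StringBuilder("site,count\n");
        for (Map.Entry<String, Integer> row : rows) {
            csv.append(row.getKey()).append(',').append(row.getValue()).append('\n');
        }
        return csv.toString();
    }
}
```

Inside processItemDocument, such a helper might be fed with `itemDocument.getSiteLinks().keySet()`, assuming the site links are exposed as a map keyed by site id, and storeResults could write the string returned by `toCsv()` to a file.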
Exercise 5: Compute the average life expectancy of people on Wikidata. To do this, consider items with a birth date (P569) and death date (P570). Whenever both dates are found, compute the difference in years between the dates. Store the sum of these lifespans (in years) and the number of people for whom you recorded a lifespan to compute the average. Some hints:
Check the precision of each date with TimeValue.getPrecision(). You should only consider values with precision greater than or equal to TimeValue.PREC_DAY.

Exercise 6: Compute the average lifespan as in Exercise 5, but now grouped by year of birth. This will show you how life expectancy changed over time (at least for people with Wikipedia articles). For this, create arrays or maps to store the sum of the lifespans and the number of people for each year of birth. Finally, compute all the averages and store them in a CSV file that gives the average life expectancy for each year of birth. Load this file into a spreadsheet too to create a diagram. What do you notice? Some hints:
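As a rough illustration of the bookkeeping in Exercises 5 and 6, the sketch below keeps a running sum and count overall and per year of birth. It uses only the Java standard library; `LifespanStats`, its methods, and the `minPrecision` parameter are hypothetical names introduced here, not part of Wikidata Toolkit. Skipping negative lifespans is an extra assumption made for this sketch.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of Wikidata Toolkit): average lifespan per birth year. */
class LifespanStats {
    // Minimum accepted date precision; assumed to be set to TimeValue.PREC_DAY by the caller.
    private final int minPrecision;
    // Birth year -> {sum of lifespans in years, number of people}; TreeMap keeps years sorted.
    private final Map<Long, long[]> byBirthYear = new TreeMap<>();
    private long totalYears = 0;
    private long totalPeople = 0;

    LifespanStats(int minPrecision) {
        this.minPrecision = minPrecision;
    }

    /** Records one person; returns false if the dates are too imprecise or inconsistent. */
    boolean record(long birthYear, int birthPrecision, long deathYear, int deathPrecision) {
        if (birthPrecision < minPrecision || deathPrecision < minPrecision) {
            return false;
        }
        long lifespan = deathYear - birthYear;
        if (lifespan < 0) {
            return false; // assumption of this sketch: skip data where death precedes birth
        }
        long[] cell = byBirthYear.computeIfAbsent(birthYear, year -> new long[2]);
        cell[0] += lifespan;
        cell[1]++;
        totalYears += lifespan;
        totalPeople++;
        return true;
    }

    /** Average over all recorded people (Exercise 5); NaN if nobody was recorded. */
    double overallAverage() {
        return totalPeople == 0 ? Double.NaN : (double) totalYears / totalPeople;
    }

    /** One CSV row per birth year with the average lifespan (Exercise 6). */
    String toCsv() {
        StringBuilder csv = new StringBuilder("birth_year,average_lifespan,people\n");
        for (Map.Entry<Long, long[]> entry : byBirthYear.entrySet()) {
            long[] cell = entry.getValue();
            double average = (double) cell[0] / cell[1];
            csv.append(entry.getKey()).append(',').append(average)
                    .append(',').append(cell[1]).append('\n');
        }
        return csv.toString();
    }
}
```

In the processor, the years and precisions would come from the TimeValue objects found in the P569 and P570 statements of each item, and the helper would be constructed with TimeValue.PREC_DAY as its threshold.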
Constructor and Description |
---|
TutorialDocumentProcessor() |
Modifier and Type | Method and Description |
---|---|
void | processItemDocument(ItemDocument itemDocument): Processes one item document. |
void | processPropertyDocument(PropertyDocument propertyDocument): Processes one property document. |
void | storeResults(): Stores the processing results in a file. |
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface EntityDocumentProcessor: processEntityRedirectDocument, processLexemeDocument, processMediaInfoDocument
public void processItemDocument(ItemDocument itemDocument)
Specified by: processItemDocument in interface EntityDocumentProcessor
Parameters: itemDocument - the ItemDocument

public void processPropertyDocument(PropertyDocument propertyDocument)
Specified by: processPropertyDocument in interface EntityDocumentProcessor
Parameters: propertyDocument - the PropertyDocument

public void storeResults()
Copyright © 2014–2024 Wikidata Toolkit Developers. Generated from source code published under the Apache License 2.0. For more information, see the Wikidata Toolkit homepage