Package org.wikidata.wdtk.examples
Class DataExtractionProcessor
java.lang.Object
org.wikidata.wdtk.examples.DataExtractionProcessor
- All Implemented Interfaces:
EntityDocumentProcessor
This simple EntityDocumentProcessor finds all items that have a GND identifier (property P227) and are also humans (P31 with value Q5). For each of them it extracts the item id, the GND value, and the English and German labels and Wikipedia articles, if any. The results are written to the CSV file "extracted-data.csv". The extracted property can be changed by modifying the value of extractPropertyId. The current code extracts only the first value of this property if several are given. The filter condition (P31::Q5) can also be changed in the code.
- Author:
- Markus Kroetzsch
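The filter and extraction rule described above can be illustrated with a minimal, self-contained sketch. It deliberately does not use the Wikidata Toolkit API: the Item class, the csvEscape and toCsvRow helpers, the column order, and the "enwiki"/"dewiki" site keys are assumptions made for illustration only, and the actual class may format its CSV output differently.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in sketch: none of these types are part of Wikidata Toolkit.
class GndExtractionSketch {
    static final String EXTRACT_PROPERTY_ID = "P227"; // GND identifier
    static final String FILTER_PROPERTY_ID = "P31";   // "instance of"
    static final String FILTER_VALUE = "Q5";          // human

    // Hypothetical simplified item; the real ItemDocument is much richer.
    static final class Item {
        final String id;
        final Map<String, List<String>> statements = new HashMap<>(); // property id -> values
        final Map<String, String> labels = new HashMap<>();           // language code -> label
        final Map<String, String> sitelinks = new HashMap<>();        // site key -> article title

        Item(String id) {
            this.id = id;
        }
    }

    // Quotes a field and doubles embedded quotes; a missing value becomes an empty field.
    static String csvEscape(String value) {
        if (value == null) {
            return "";
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Returns one CSV row for a matching item, or null if the item is filtered out.
    static String toCsvRow(Item item) {
        List<String> types =
                item.statements.getOrDefault(FILTER_PROPERTY_ID, Collections.emptyList());
        if (!types.contains(FILTER_VALUE)) {
            return null; // fails the P31::Q5 filter
        }
        List<String> values =
                item.statements.getOrDefault(EXTRACT_PROPERTY_ID, Collections.emptyList());
        if (values.isEmpty()) {
            return null; // no value for the extracted property
        }
        return String.join(",",
                item.id,
                csvEscape(values.get(0)), // only the first value is used
                csvEscape(item.labels.get("en")),
                csvEscape(item.labels.get("de")),
                csvEscape(item.sitelinks.get("enwiki")),
                csvEscape(item.sitelinks.get("dewiki")));
    }

    public static void main(String[] args) {
        // Placeholder data, not real Wikidata content.
        Item item = new Item("Q-example");
        item.statements.put("P31", Arrays.asList("Q5"));
        item.statements.put("P227", Arrays.asList("gnd-1", "gnd-2"));
        item.labels.put("en", "Example Person");
        item.sitelinks.put("enwiki", "Example Person");
        System.out.println(toCsvRow(item));
    }
}
```

Returning null for filtered-out items keeps the filter and the row formatting in one place; a caller would simply skip null rows when writing the file.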
Constructor Summary
Method Summary
Modifier and Type    Method                                          Description
void                 close()
static void          main(String[] args)                             Main method.
static void          printDocumentation()                            Prints some basic documentation about this program.
void                 printStatus()                                   Prints the current status, time and entity count.
void                 processItemDocument(ItemDocument itemDocument)  Processes the given ItemDocument.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.wikidata.wdtk.datamodel.interfaces.EntityDocumentProcessor
processEntityRedirectDocument, processLexemeDocument, processMediaInfoDocument, processPropertyDocument
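Because the remaining processing methods are inherited from the interface, this class only needs to supply behavior for item documents. The general pattern can be sketched as follows. The interface and class names here are hypothetical stand-ins that take plain strings and are not the Wikidata Toolkit types; the sketch assumes the inherited methods are no-op defaults.

```java
// Hypothetical stand-in for a processor interface with no-op defaults.
interface DocumentProcessorSketch {
    default void processItem(String itemId) { }

    default void processProperty(String propertyId) { }

    default void processLexeme(String lexemeId) { }
}

// Overrides only the item callback; all other callbacks keep the inherited no-op.
class ItemCounterSketch implements DocumentProcessorSketch {
    int itemCount = 0;

    @Override
    public void processItem(String itemId) {
        itemCount++;
    }

    public static void main(String[] args) {
        ItemCounterSketch counter = new ItemCounterSketch();
        counter.processItem("Q-example-1");
        counter.processProperty("P-example"); // ignored by the inherited default
        counter.processItem("Q-example-2");
        System.out.println("Items seen: " + counter.itemCount);
    }
}
```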
Constructor Details
DataExtractionProcessor
- Throws:
IOException
Method Details
main
Main method. Processes the whole dump using this processor. To change which dump file to use and whether to run in offline mode, modify the settings in ExampleHelpers.
- Parameters:
args
- Throws:
IOException
processItemDocument
Description copied from interface: EntityDocumentProcessor
Processes the given ItemDocument.
- Specified by:
processItemDocument in interface EntityDocumentProcessor
- Parameters:
itemDocument - the ItemDocument
printStatus
public void printStatus()
Prints the current status, time and entity count.
printDocumentation
public static void printDocumentation()
Prints some basic documentation about this program.
close
public void close()