Class DataExtractionProcessor

java.lang.Object
org.wikidata.wdtk.examples.DataExtractionProcessor
All Implemented Interfaces:
EntityDocumentProcessor

public class DataExtractionProcessor extends Object implements EntityDocumentProcessor
This simple EntityDocumentProcessor finds all items with a GND identifier (property P227) that are also humans (P31 with value Q5). For each of them it extracts the item id, the GND value, and the English and German labels and Wikipedia articles, if any. The results are written to the CSV file "extracted-data.csv". The extracted property can be changed by modifying the value of extractPropertyId; the current code extracts only the first value of this property if several are given. The filter condition (P31::Q5) can also be changed in the code.
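The filtering and extraction described above can be sketched as follows. This is a minimal, self-contained illustration, not the class's actual code. SimpleItem and ExtractionSketch are hypothetical stand-ins that represent an item as plain maps instead of the Wikidata Toolkit ItemDocument API. The site keys "enwiki"/"dewiki", the column order and the RFC 4180 style quoting are also assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified item model; NOT the Wikidata Toolkit ItemDocument API. */
final class SimpleItem {
    final String id;
    final Map<String, List<String>> statements; // property id -> values, in statement order
    final Map<String, String> labels;           // language code -> label
    final Map<String, String> siteLinks;        // site key (e.g. "enwiki") -> article title

    SimpleItem(String id, Map<String, List<String>> statements,
               Map<String, String> labels, Map<String, String> siteLinks) {
        this.id = id;
        this.statements = statements;
        this.labels = labels;
        this.siteLinks = siteLinks;
    }
}

final class ExtractionSketch {
    static final String EXTRACT_PROPERTY_ID = "P227"; // GND identifier
    static final String FILTER_PROPERTY_ID = "P31";   // instance of
    static final String FILTER_VALUE = "Q5";          // human

    /** Returns one CSV row for a matching item, or null if the item is skipped. */
    static String extractRow(SimpleItem item) {
        // Filter condition P31::Q5: the item must be an instance of human.
        if (!item.statements.getOrDefault(FILTER_PROPERTY_ID, List.of()).contains(FILTER_VALUE)) {
            return null;
        }
        List<String> values = item.statements.getOrDefault(EXTRACT_PROPERTY_ID, List.of());
        if (values.isEmpty()) {
            return null; // no GND identifier
        }
        // Only the first value is used when several are given.
        List<String> fields = List.of(
                item.id,
                values.get(0),
                item.labels.getOrDefault("en", ""),
                item.labels.getOrDefault("de", ""),
                item.siteLinks.getOrDefault("enwiki", ""),
                item.siteLinks.getOrDefault("dewiki", ""));
        List<String> quoted = new ArrayList<>();
        for (String field : fields) {
            quoted.add(csvField(field));
        }
        return String.join(",", quoted);
    }

    /** Quotes a field containing a comma, quote or line break, doubling embedded quotes. */
    static String csvField(String s) {
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }
}
```

Quoting matters here because labels can contain commas (for example "Doe, Jane"), which would otherwise shift the CSV columns.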
Author:
Markus Kroetzsch
  • Constructor Details

  • Method Details

    • main

      public static void main(String[] args) throws IOException
      Main method. Processes the whole dump using this processor. To change which dump file to use and whether to run in offline mode, modify the settings in ExampleHelpers.
      Parameters:
      args -
      Throws:
      IOException
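The main method drives the processor in a push style. A reader walks through the dump and hands each document to the processor, which is closed once the dump is exhausted. The sketch below illustrates that lifecycle under stated assumptions. ItemCallback, CountingProcessor and DumpDriverSketch are hypothetical names, and a List<String> of item ids stands in for the dump. Nothing here reflects how ExampleHelpers actually selects or reads dump files.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.List;

/** Hypothetical callback interface, loosely modelled on EntityDocumentProcessor. */
interface ItemCallback {
    void processItem(String itemId);
}

/** Hypothetical processor: counts items and writes one line per item until closed. */
final class CountingProcessor implements ItemCallback, AutoCloseable {
    private final PrintWriter out;
    int itemCount = 0;

    CountingProcessor(PrintWriter out) {
        this.out = out;
    }

    @Override
    public void processItem(String itemId) {
        itemCount++;
        out.println(itemId);
    }

    @Override
    public void close() {
        out.close(); // Writer.close() flushes the stream before closing it
    }
}

final class DumpDriverSketch {
    /** Pushes every document to the processor, then closes it even if processing fails. */
    static void processAll(List<String> dump, CountingProcessor processor) {
        try {
            for (String itemId : dump) {
                processor.processItem(itemId);
            }
        } finally {
            processor.close();
        }
    }
}
```

Closing in a finally block ensures that the output file is flushed even when processing stops early with an exception.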
    • processItemDocument

      public void processItemDocument(ItemDocument itemDocument)
      Description copied from interface: EntityDocumentProcessor
      Processes the given ItemDocument.
      Specified by:
      processItemDocument in interface EntityDocumentProcessor
      Parameters:
      itemDocument - the ItemDocument
    • printStatus

      public void printStatus()
      Prints the current status, time and entity count.
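A status line with the elapsed time and entity count could be formatted as in the following sketch. StatusSketch, the matchCount parameter (items that passed the filter) and the exact wording are assumptions, not the real output of printStatus. Locale.ROOT keeps the decimal separator stable across platforms.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

final class StatusSketch {
    /** Formats a hypothetical status line from counters and elapsed wall-clock time. */
    static String formatStatus(long entityCount, long matchCount, long elapsedMillis) {
        long seconds = TimeUnit.MILLISECONDS.toSeconds(elapsedMillis);
        // Avoid dividing by zero during the first second of processing.
        double rate = seconds > 0 ? (double) entityCount / seconds : 0.0;
        return String.format(Locale.ROOT,
                "Processed %d entities (%d matching) in %d s (%.1f entities/s)",
                entityCount, matchCount, seconds, rate);
    }

    /** Prints the status; startNanos would be recorded when processing begins. */
    static void printStatus(long entityCount, long matchCount, long startNanos) {
        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
        System.out.println(formatStatus(entityCount, matchCount, elapsedMillis));
    }
}
```

For example, formatStatus(1000, 10, 4000) yields "Processed 1000 entities (10 matching) in 4 s (250.0 entities/s)".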
    • printDocumentation

      public static void printDocumentation()
      Prints some basic documentation about this program.
    • close

      public void close()