Class GenderRatioProcessor

java.lang.Object
org.wikidata.wdtk.examples.GenderRatioProcessor
All Implemented Interfaces:
EntityDocumentProcessor

public class GenderRatioProcessor extends Object implements EntityDocumentProcessor
This document processor calculates the gender ratios of people featured on Wikimedia projects. It is inspired by the investigations of Max Klein.

For each Wikidata item we consider all the Wikimedia projects (Wikipedia etc.) that have an article on this subject. We find out if the Wikidata item is about a human and which sex/gender values it has (if any). We then count, for each site, the pages, the humans, the humans with a gender, and the humans with each particular gender. The program generates intermediate status reports for the biggest sites, and eventually writes a CSV file with all the data for all the sites.
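The per-site counting described above can be sketched in plain Java, without the Wikidata Toolkit classes. This is only an illustration under stated assumptions: the class GenderCountSketch, its SiteCounts record, and the countItem method are hypothetical names, and the site keys and gender item ids are passed in as plain strings rather than read from an ItemDocument.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not the actual GenderRatioProcessor implementation.
public class GenderCountSketch {

    /** Counters kept for a single site (assumed structure). */
    public static class SiteCounts {
        public long pages;
        public long humans;
        public long humansWithGender;
        /** Gender item id -> number of humans with that gender on this site. */
        public final Map<String, Long> byGender = new TreeMap<>();
    }

    /** Site key (e.g. "enwiki") -> counters for that site. */
    public final Map<String, SiteCounts> sites = new HashMap<>();

    /**
     * Counts one item for every site that has a page about it.
     *
     * @param siteKeys  keys of the sites linking to this item
     * @param isHuman   whether the item is about a human
     * @param genderIds ids of the item's sex/gender values, possibly empty
     */
    public void countItem(List<String> siteKeys, boolean isHuman,
            List<String> genderIds) {
        for (String siteKey : siteKeys) {
            SiteCounts counts =
                    sites.computeIfAbsent(siteKey, k -> new SiteCounts());
            counts.pages++;
            if (!isHuman) {
                continue; // only humans contribute to the gender statistics
            }
            counts.humans++;
            if (genderIds.isEmpty()) {
                continue; // human without any recorded gender
            }
            counts.humansWithGender++;
            for (String genderId : genderIds) {
                // Arbitrary item ids are allowed as gender values.
                counts.byGender.merge(genderId, 1L, Long::sum);
            }
        }
    }
}
```

Because the per-gender counters are keyed by item id, a gender value that was not known in advance simply creates a new entry the first time it is seen.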

There are certainly more than two genders, but in fact we cannot even assume a previously known list of genders. So we collect the data in a way that allows arbitrary items as values for gender. We make an effort to find an English label for all of them, but we don't go as far as looking through the dump twice: if we encounter a gender value after the item for that gender was already processed, we cannot go back to fetch its label. It is possible to preconfigure some labels so as to have them set from the very start.
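The single-pass label lookup and its limitation can be illustrated with a small stand-alone sketch. The class GenderLabelSketch and its methods are hypothetical, the ids used in the usage example are placeholders rather than real Wikidata ids, and the choice to let a label found in the dump replace a preconfigured one is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of collecting labels for gender items in one pass.
public class GenderLabelSketch {

    /** Gender item id -> best label known so far. */
    private final Map<String, String> genderLabels = new HashMap<>();

    /** Sets a label before processing starts. */
    public void preconfigure(String genderId, String label) {
        genderLabels.put(genderId, label);
    }

    /** Called when an item id is seen as a gender value of a human. */
    public void registerGender(String genderId) {
        // Until a label is found, the id itself serves as the label.
        genderLabels.putIfAbsent(genderId, genderId);
    }

    /** Called for every item of the dump, in dump order. */
    public void offerLabel(String itemId, String englishLabel) {
        // Only items already known to be gender values are of interest.
        // An item seen before it was registered is not remembered.
        if (englishLabel != null && genderLabels.containsKey(itemId)) {
            genderLabels.put(itemId, englishLabel);
        }
    }

    /** Returns the label if known, and the raw id otherwise. */
    public String labelFor(String genderId) {
        return genderLabels.getOrDefault(genderId, genderId);
    }
}
```

With placeholder ids: if offerLabel("Q_EARLY", "some label") is called before registerGender("Q_EARLY"), then labelFor("Q_EARLY") still returns "Q_EARLY", which is the situation the paragraph above describes. Calling preconfigure("Q_EARLY", "some label") beforehand avoids it.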

The program could also be used to compare, by language, the number of articles about other kinds of entities. For this, the value of filterClass can be changed.
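As a rough illustration of what changing filterClass means, the following stand-alone sketch checks an item's class values against a configurable class id. The class ClassFilterSketch and the method matchesFilter are hypothetical, the field here only stands in for the processor's filterClass, and the sketch assumes that only directly stated classes are checked, with no subclass reasoning.

```java
import java.util.Set;

// Hypothetical sketch of a class filter; not the processor's actual code.
public class ClassFilterSketch {

    /** Stand-in for filterClass; "Q5" is the Wikidata item for "human". */
    public static String filterClass = "Q5";

    /**
     * Returns true if one of the item's directly stated classes
     * (its "instance of" values, given as item ids) equals filterClass.
     */
    public static boolean matchesFilter(Set<String> instanceOfIds) {
        return instanceOfIds.contains(filterClass);
    }
}
```

Setting the field to another class id, for example "Q11424" (the item for films, as far as we know), would make the same counting logic tally articles about that class per site, although the gender-specific columns would then carry little meaning.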

Author:
Markus Kroetzsch
  • Constructor Details

    • GenderRatioProcessor

      public GenderRatioProcessor()
      Constructor.
  • Method Details

    • main

      public static void main(String[] args)
      Main method. Processes the whole dump using this processor and writes the results to a file. To change which dump file to use and whether to run in offline mode, modify the settings in ExampleHelpers.
    • processItemDocument

      public void processItemDocument(ItemDocument itemDocument)
      Description copied from interface: EntityDocumentProcessor
      Processes the given ItemDocument.
      Specified by:
      processItemDocument in interface EntityDocumentProcessor
      Parameters:
      itemDocument - the ItemDocument
    • writeFinalResults

      public void writeFinalResults()
      Writes the results of the processing to a CSV file.
    • printDocumentation

      public static void printDocumentation()
      Prints some basic documentation about this program.