Class GenderRatioProcessor
- All Implemented Interfaces:
EntityDocumentProcessor
For each Wikidata item we consider all the Wikimedia projects (Wikipedia etc.) that have an article on this subject. We find out if the Wikidata item is about a human and which sex/gender values it has (if any). We then count the pages, humans, humans with gender, and humans with each particular gender for each site. The script generates intermediate status reports for the biggest sites, and eventually writes a CSV file with all the data for all the sites.
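The counting scheme described above can be sketched in plain Java, without the Wikidata Toolkit dependency. This is only an illustration under stated assumptions: the names SiteCounts, record, and toCsvRow are hypothetical, item ids stand in for gender values, and the CSV column order is assumed rather than taken from the program's actual output.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical per-site tally; not the processor's actual record class. */
class SiteCounts {
    long pages;            // all items that have a page on this site
    long humans;           // items that are about a human
    long humansWithGender; // humans with at least one gender value
    // Count per gender value, keyed by item id; sorted for stable output.
    final Map<String, Long> genderCounts = new TreeMap<>();

    /** Registers one item that has a page on this site. */
    void record(boolean isHuman, List<String> genderIds) {
        pages++;
        if (!isHuman) {
            return;
        }
        humans++;
        if (!genderIds.isEmpty()) {
            humansWithGender++;
        }
        for (String genderId : genderIds) {
            genderCounts.merge(genderId, 1L, Long::sum);
        }
    }

    /** Formats the tally as one CSV row (assumed column order). */
    String toCsvRow(String siteKey, List<String> genderColumns) {
        StringBuilder row = new StringBuilder(siteKey);
        row.append(',').append(pages);
        row.append(',').append(humans);
        row.append(',').append(humansWithGender);
        for (String genderId : genderColumns) {
            row.append(',').append(genderCounts.getOrDefault(genderId, 0L));
        }
        return row.toString();
    }
}
```

Keeping the per-gender counts in a map rather than in fixed fields means the set of gender values does not have to be known before processing starts.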
There are certainly more than two genders, but in fact we cannot even assume a previously known list of genders. So we collect the data in a way that allows arbitrary items as values for gender. We make an effort to find an English label for all of them, but we don't go as far as looking through the dump twice (if we encounter a gender value after the item for that gender was already processed, we cannot go back to fetch the value). It is possible to preconfigure some labels so as to have them set from the very start.
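The single-pass label lookup can be illustrated with a small registry. This is a minimal sketch, assuming plain string item ids; the class GenderLabels and all of its methods are hypothetical, and falling back to the raw item id when no label was captured is an assumption, not confirmed behaviour of this program.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical label registry; a sketch, not the processor's actual code. */
class GenderLabels {
    private final Set<String> seenGenders = new HashSet<>();
    private final Map<String, String> labels = new HashMap<>();

    /** Preconfigures a label so that it is known from the very start. */
    void preconfigure(String itemId, String label) {
        seenGenders.add(itemId);
        labels.put(itemId, label);
    }

    /** Called when an item id occurs as a gender value of some human. */
    void noteGenderValue(String itemId) {
        seenGenders.add(itemId);
    }

    /** Called once per item, in dump order, with its English label (may be null). */
    void visitItem(String itemId, String englishLabel) {
        // Only items already known to be gender values have their label stored.
        if (seenGenders.contains(itemId) && englishLabel != null) {
            labels.put(itemId, englishLabel);
        }
    }

    /** Returns the captured label, or the raw item id as an assumed fallback. */
    String labelFor(String itemId) {
        return labels.getOrDefault(itemId, itemId);
    }
}
```

A gender item that is visited before any human refers to it keeps its id as its label, which mirrors the limitation of not reading the dump twice.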
The program could also be used to compare the number of other kinds of articles by language. For this, the value of filterClass can be changed.
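As a rough illustration of how such a class filter might work, the sketch below checks whether a set of class ids contains the configured filter value. It assumes that "about a human" is decided from the item's "instance of" (P31) values and that the filter holds the Wikidata id for human (Q5); the class ClassFilter and its method are hypothetical and not part of this API.

```java
import java.util.Set;

/** Hypothetical filter; a sketch of the idea behind filterClass. */
class ClassFilter {
    // Wikidata item id for "human"; another id would count other kinds of articles.
    static String filterClass = "Q5";

    /** True if the given "instance of" (P31) values include filterClass. */
    static boolean matches(Set<String> instanceOfIds) {
        return instanceOfIds.contains(filterClass);
    }
}
```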
- Author:
- Markus Kroetzsch
-
Nested Class Summary
Modifier and Type | Class | Description
static class | | Class to store basic information for each site in a simple format.
static class | | Class to order site records by human page count.
-
Constructor Summary
-
Method Summary
Modifier and Type | Method | Description
static void | main | Main method.
static void | printDocumentation() | Prints some basic documentation about this program.
void | processItemDocument(ItemDocument itemDocument) | Processes the given ItemDocument.
void | writeFinalResults() | Writes the results of the processing to a CSV file.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.wikidata.wdtk.datamodel.interfaces.EntityDocumentProcessor
processEntityRedirectDocument, processLexemeDocument, processMediaInfoDocument, processPropertyDocument
-
Constructor Details
-
GenderRatioProcessor
public GenderRatioProcessor()
Constructor.
-
-
Method Details
-
main
Main method. Processes the whole dump using this processor and writes the results to a file. To change which dump file to use and whether to run in offline mode, modify the settings in ExampleHelpers.
-
processItemDocument
Description copied from interface: EntityDocumentProcessor
Processes the given ItemDocument.
- Specified by:
processItemDocument in interface EntityDocumentProcessor
- Parameters:
itemDocument - the ItemDocument
-
writeFinalResults
public void writeFinalResults()
Writes the results of the processing to a CSV file.
-
printDocumentation
public static void printDocumentation()
Prints some basic documentation about this program.
-