public class GenderRatioProcessor extends Object implements EntityDocumentProcessor
For each Wikidata item we consider all the Wikimedia projects (Wikipedia etc.) that have an article on this subject. We find out whether the Wikidata item is about a human and which sex/gender values it has (if any). We then count, for each site, the pages, the humans, the humans with a gender, and the humans with each particular gender. The script generates intermediate status reports for the biggest sites, and eventually writes a CSV file with all the data for all the sites.
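The counting step described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the toolkit's implementation: `Item`, `SiteCounts` and `countItem` are hypothetical names, an item is reduced to the keys of the sites that have an article on it, a human flag, and its sex/gender item IDs, and a human with several gender values is counted once under each value.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GenderCountSketch {

    // Hypothetical, simplified view of one Wikidata item (not a Wikidata Toolkit type).
    static class Item {
        final List<String> siteKeys;  // sites with an article, e.g. "enwiki"
        final boolean isHuman;        // whether the item is about a human
        final List<String> genderIds; // sex/gender values, e.g. "Q6581072" (female)

        Item(List<String> siteKeys, boolean isHuman, List<String> genderIds) {
            this.siteKeys = siteKeys;
            this.isHuman = isHuman;
            this.genderIds = genderIds;
        }
    }

    // Hypothetical per-site counters.
    static class SiteCounts {
        int pages;
        int humans;
        int humansWithGender;
        final Map<String, Integer> humansByGender = new TreeMap<>();
    }

    // Adds one item's contribution to the counters of every site that has an article on it.
    static void countItem(Item item, Map<String, SiteCounts> sites) {
        for (String site : item.siteKeys) {
            SiteCounts c = sites.computeIfAbsent(site, k -> new SiteCounts());
            c.pages++;
            if (!item.isHuman) {
                continue;
            }
            c.humans++;
            if (!item.genderIds.isEmpty()) {
                c.humansWithGender++;
            }
            // Assumption: a human with several gender values is counted once per value.
            for (String gender : item.genderIds) {
                c.humansByGender.merge(gender, 1, Integer::sum);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, SiteCounts> sites = new HashMap<>();
        countItem(new Item(List.of("enwiki", "dewiki"), true, List.of("Q6581072")), sites);
        countItem(new Item(List.of("enwiki"), true, List.of()), sites);
        countItem(new Item(List.of("enwiki"), false, List.of()), sites);
        SiteCounts en = sites.get("enwiki");
        System.out.println(en.pages + " " + en.humans + " " + en.humansWithGender
                + " " + en.humansByGender);
        // prints "3 2 1 {Q6581072=1}"
    }
}
```

Keeping one counter object per site key means the final per-site report is a single iteration over the map.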
There are certainly more than two genders, and in fact we cannot even assume a previously known list of genders. So we collect the data in a way that allows arbitrary items as values for gender. We make an effort to find an English label for all of them, but we do not go as far as reading the dump twice: if we encounter a gender value after the item for that gender has already been processed, we cannot go back to fetch its label. Some labels can be preconfigured so that they are available from the very start.
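The one-pass label lookup can be illustrated with a small registry. In this hedged sketch, `GenderLabels`, `noteGenderValue`, `offerLabel` and `labelOf` are hypothetical names: a gender value is registered without a label when it is first used, an English label is stored only if the gender's own item is processed afterwards, preconfigured labels are present from the start, and the item ID serves as a fallback when no label was found.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class GenderLabels {

    // Gender item ID -> English label (null while no label is known yet).
    private final Map<String, String> labels = new LinkedHashMap<>();

    // Preconfigured labels are available from the very start.
    GenderLabels(Map<String, String> preconfigured) {
        labels.putAll(preconfigured);
    }

    // Called when some item uses genderId as a sex/gender value.
    void noteGenderValue(String genderId) {
        labels.putIfAbsent(genderId, null);
    }

    // Called for every item in the dump; stores the label only if the item is a
    // gender value seen earlier whose label is still missing. Items processed
    // before their first use as a gender value are never revisited.
    void offerLabel(String itemId, String englishLabel) {
        if (labels.containsKey(itemId) && labels.get(itemId) == null) {
            labels.put(itemId, englishLabel);
        }
    }

    // Falls back to the item ID when no label could be found in one pass.
    String labelOf(String genderId) {
        String label = labels.get(genderId);
        return label != null ? label : genderId;
    }

    public static void main(String[] args) {
        GenderLabels g = new GenderLabels(Map.of("Q6581097", "male"));
        g.offerLabel("Q48270", "non-binary"); // item seen before any use: ignored
        g.noteGenderValue("Q48270");
        g.noteGenderValue("Q6581072");
        g.offerLabel("Q6581072", "female");   // item seen after first use: recorded
        System.out.println(g.labelOf("Q6581097") + ", " + g.labelOf("Q6581072")
                + ", " + g.labelOf("Q48270"));
        // prints "male, female, Q48270"
    }
}
```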
The program could also be used to compare, across languages, the number of articles about subjects of some other class. For this, the value of filterClass can be changed.
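A hedged sketch of how a class filter might decide which items are counted. It assumes that items are matched on their direct "instance of" (P31) values and that claims are available as a simple map from property ID to item IDs; `ClassFilterSketch` and `matchesFilter` are hypothetical names, and the actual type and default of filterClass are not shown on this page. Here Q5 (human) plays the role of the filtered class, and Q515 (city) stands in for an alternative one.

```java
import java.util.List;
import java.util.Map;

class ClassFilterSketch {

    // Assumption: the filter matches direct "instance of" (P31) values only,
    // without following subclass-of chains.
    static final String INSTANCE_OF = "P31";

    // Returns true if the item's P31 values include filterClass.
    static boolean matchesFilter(Map<String, List<String>> claims, String filterClass) {
        return claims.getOrDefault(INSTANCE_OF, List.of()).contains(filterClass);
    }

    public static void main(String[] args) {
        // Simplified claims: property ID -> item IDs it points to.
        Map<String, List<String>> person = Map.of("P31", List.of("Q5"));
        Map<String, List<String>> city = Map.of("P31", List.of("Q515"));

        System.out.println(matchesFilter(person, "Q5"));  // true
        System.out.println(matchesFilter(city, "Q5"));    // false
        System.out.println(matchesFilter(city, "Q515"));  // true
    }
}
```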
Modifier and Type | Class and Description |
---|---|
static class | GenderRatioProcessor.SiteRecord: Class to store basic information for each site in a simple format. |
static class | GenderRatioProcessor.SiteRecordComparator: Class to order site records by human page count. |
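A comparator like the one named above is one way to pick out the biggest sites, for example for intermediate status reports. The sketch below uses a hypothetical stand-in for SiteRecord with assumed field names and assumes a descending order; the counts in the demo are made up for illustration and are not real data.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SiteOrderSketch {

    // Hypothetical stand-in for GenderRatioProcessor.SiteRecord; field names are assumed.
    static class SiteRecord {
        final String siteKey;
        final int humanPageCount;

        SiteRecord(String siteKey, int humanPageCount) {
            this.siteKey = siteKey;
            this.humanPageCount = humanPageCount;
        }
    }

    // Orders records by human page count, largest first (the direction is an assumption).
    static final Comparator<SiteRecord> BY_HUMAN_PAGES_DESC =
            Comparator.comparingInt((SiteRecord r) -> r.humanPageCount).reversed();

    public static void main(String[] args) {
        List<SiteRecord> records = new ArrayList<>(List.of(
                new SiteRecord("frwiki", 500),
                new SiteRecord("enwiki", 1500),
                new SiteRecord("dewiki", 800)));
        records.sort(BY_HUMAN_PAGES_DESC);
        for (SiteRecord r : records) {
            System.out.println(r.siteKey + " " + r.humanPageCount);
        }
        // prints "enwiki 1500", "dewiki 800", "frwiki 500", one per line
    }
}
```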
Constructor and Description |
---|
GenderRatioProcessor(): Constructor. |
Modifier and Type | Method and Description |
---|---|
static void | main(String[] args): Main method. |
static void | printDocumentation(): Prints some basic documentation about this program. |
void | processItemDocument(ItemDocument itemDocument): Processes the given ItemDocument. |
void | writeFinalResults(): Writes the results of the processing to a CSV file. |
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface EntityDocumentProcessor: processEntityRedirectDocument, processLexemeDocument, processMediaInfoDocument, processPropertyDocument
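public static void main(String[] args)
Main method. See also ExampleHelpers.

public void processItemDocument(ItemDocument itemDocument)
Description copied from interface: EntityDocumentProcessor
Processes the given ItemDocument.
Specified by: processItemDocument in interface EntityDocumentProcessor
Parameters: itemDocument - the ItemDocument

public void writeFinalResults()
Writes the results of the processing to a CSV file.

public static void printDocumentation()
Prints some basic documentation about this program.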
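writeFinalResults is documented only as writing the results to a CSV file; its column layout is not given on this page. The sketch below shows one plausible way to emit such a file with plain java.io. The header, the column order and the names `CsvSketch`, `writeCsv`, `writeLine` and `csvField` are hypothetical. Fields are quoted in RFC 4180 style because an arbitrary gender label could contain a comma or a quote.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CsvSketch {

    // Quotes a field if it contains a comma, a double quote or a line break.
    static String csvField(String s) {
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    // Writes the given fields as one CSV line.
    static void writeLine(Writer out, List<String> fields) throws IOException {
        List<String> quoted = new ArrayList<>();
        for (String f : fields) {
            quoted.add(csvField(f));
        }
        out.write(String.join(",", quoted));
        out.write("\n");
    }

    // Hypothetical layout: site key, pages, humans, humans with gender,
    // then one count per gender label (in the order of genderLabels).
    static void writeCsv(Writer out, List<String> genderLabels,
                         Map<String, List<Integer>> countsBySite) throws IOException {
        List<String> header = new ArrayList<>(List.of("site", "pages", "humans", "humans with gender"));
        header.addAll(genderLabels);
        writeLine(out, header);
        for (Map.Entry<String, List<Integer>> e : countsBySite.entrySet()) {
            List<String> row = new ArrayList<>();
            row.add(e.getKey());
            for (int n : e.getValue()) {
                row.add(Integer.toString(n));
            }
            writeLine(out, row);
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, List<Integer>> counts = new LinkedHashMap<>();
        counts.put("enwiki", List.of(3, 2, 1, 0, 1));
        StringWriter out = new StringWriter();
        writeCsv(out, List.of("male", "female"), counts);
        System.out.print(out);
        // prints:
        // site,pages,humans,humans with gender,male,female
        // enwiki,3,2,1,0,1
    }
}
```

Real code would pass a file-backed Writer (ideally buffered) instead of the StringWriter used in the demo; a LinkedHashMap keeps the rows in the order in which the sites were inserted, for example after sorting them by size.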
Copyright © 2014–2024 Wikidata Toolkit Developers. Generated from source code published under the Apache License 2.0. For more information, see the Wikidata Toolkit homepage