Class StatisticsMwRevisionProcessor

java.lang.Object
org.wikidata.wdtk.dumpfiles.StatisticsMwRevisionProcessor
All Implemented Interfaces:
MwRevisionProcessor

public class StatisticsMwRevisionProcessor extends Object implements MwRevisionProcessor
A simple revision processor that counts basic figures, such as the number of revisions processed, and logs the results.
Author:
Markus Kroetzsch
  • Constructor Details

    • StatisticsMwRevisionProcessor

      public StatisticsMwRevisionProcessor(String name, int logFrequency)
      Creates a statistics processor with the given name and logging frequency.
      Parameters:
      name - a string name used in log messages to refer to this processor
      logFrequency - the number of revisions after which an intermediate status report should be logged; or -1 if no such reports should be logged
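The meaning of logFrequency can be sketched as a simple check on the running count. This is a minimal illustration, not the library's code: the class LogFrequencySketch and the method reportDue are hypothetical names, and treating every non-positive value like -1 is an assumption.

```java
// Hypothetical stand-in; not the WDTK implementation.
final class LogFrequencySketch {

    /**
     * Returns true if an intermediate report is due after `count` revisions.
     * Assumption: -1 (and any other non-positive value) disables reports.
     */
    static boolean reportDue(long count, int logFrequency) {
        return logFrequency > 0 && count > 0 && count % logFrequency == 0;
    }

    public static void main(String[] args) {
        int logFrequency = 3; // illustrative value
        for (long count = 1; count <= 7; count++) {
            if (reportDue(count, logFrequency)) {
                System.out.println("processed " + count + " revisions");
            }
        }
    }
}
```

With a real dump, a much larger value would normally be chosen so that reports stay infrequent.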
  • Method Details

    • getTotalRevisionCount

      public long getTotalRevisionCount()
      Returns the total number of revisions processed so far.
      Returns:
      the total number of revisions processed so far
    • getCurrentRevisionCount

      public long getCurrentRevisionCount()
      Returns the number of revisions processed in the current run.
      Returns:
      the number of revisions processed in the current run
    • startRevisionProcessing

      public void startRevisionProcessing(String siteName, String baseUrl, Map<Integer,String> namespaces)
      Description copied from interface: MwRevisionProcessor
      Initialises the revision processor for processing revisions. General information about the configuration of the site for which revisions are being processed is provided.
      Specified by:
      startRevisionProcessing in interface MwRevisionProcessor
      Parameters:
      siteName - the name of the site
      baseUrl - the base URL of the site
      namespaces - map from integer namespace ids to namespace prefixes; the prefixes omit the final ":" that MediaWiki uses to separate a namespace prefix from an article title, and they use spaces rather than the underscores found in MediaWiki URLs
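The shape of the namespaces map can be illustrated with a small example. This is a sketch under stated assumptions: the class NamespaceMapSketch and its methods are hypothetical, the entries are illustrative values rather than a complete site configuration, and falling back to an empty prefix for an unknown id is a choice made only for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of the WDTK API.
final class NamespaceMapSketch {

    /** Builds a small map in the documented shape (illustrative entries). */
    static Map<Integer, String> exampleNamespaces() {
        Map<Integer, String> namespaces = new HashMap<>();
        namespaces.put(0, "");                // main namespace: empty prefix
        namespaces.put(1, "Talk");
        namespaces.put(120, "Property");
        namespaces.put(121, "Property talk"); // space, not underscore; no trailing ":"
        return namespaces;
    }

    /** Joins a namespace prefix and a title, adding the ":" that the map omits. */
    static String prefixedTitle(Map<Integer, String> namespaces, int namespaceId, String title) {
        // Assumption for this sketch: unknown ids are treated like the main namespace.
        String prefix = namespaces.getOrDefault(namespaceId, "");
        return prefix.isEmpty() ? title : prefix + ":" + title;
    }

    public static void main(String[] args) {
        Map<Integer, String> namespaces = exampleNamespaces();
        System.out.println(prefixedTitle(namespaces, 121, "P31"));
        System.out.println(prefixedTitle(namespaces, 0, "Q42"));
    }
}
```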
    • processRevision

      public void processRevision(MwRevision mwRevision)
      Description copied from interface: MwRevisionProcessor
      Processes the given MediaWiki revision.
      Specified by:
      processRevision in interface MwRevisionProcessor
      Parameters:
      mwRevision - the revision to process
    • finishRevisionProcessing

      public void finishRevisionProcessing()
      Description copied from interface: MwRevisionProcessor
      Performs final actions that should be done after all revisions in a batch have been processed. This is usually called after a whole dump file has been processed.

      This method may be called many times during one processing run. Its main purpose is to signal the completion of one file, not of the whole run. It is used only to manage the control flow of revision processing (e.g., to ensure that the most recent revision of a page has been found). It must not be used for actions that should happen at the very end of a run, such as writing a file with results.

      Specified by:
      finishRevisionProcessing in interface MwRevisionProcessor
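The per-file nature of finishRevisionProcessing can be illustrated with a minimal, self-contained sketch. It does not use the WDTK classes: LifecycleSketch, Counter, run and demo are hypothetical names, revisions are represented as plain strings, and resetting the current count on every start call is an assumption about what "current run" means here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in that mirrors the start/process/finish callback order.
final class LifecycleSketch {

    static final class Counter {
        long total;        // revisions seen across all files
        long current;      // revisions seen since the most recent start call
        int finishedFiles; // number of finish signals received

        void start() { current = 0; } // assumption: each start begins a new count
        void process(String revision) { total++; current++; }
        void finish() { finishedFiles++; } // per-file signal only; no final output here
    }

    /** Feeds each file (a list of revisions) through start, process, finish. */
    static Counter run(List<List<String>> files) {
        Counter counter = new Counter();
        for (List<String> file : files) {
            counter.start();
            for (String revision : file) {
                counter.process(revision);
            }
            counter.finish();
        }
        // End-of-run work, such as writing results, belongs here in the caller.
        return counter;
    }

    /** Runs the sketch on two illustrative files of three and two revisions. */
    static Counter demo() {
        return run(Arrays.asList(
                Arrays.asList("r1", "r2", "r3"),
                Arrays.asList("r4", "r5")));
    }

    public static void main(String[] args) {
        Counter c = demo();
        System.out.println("total=" + c.total + " current=" + c.current
                + " files=" + c.finishedFiles);
    }
}
```

The finish callback fires once per file, so anything that must happen exactly once is placed after the loop in the caller rather than inside the processor.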