Interface MwRevisionProcessor

All Known Implementing Classes:
MwRevisionProcessorBroker, StatisticsMwRevisionProcessor, WikibaseRevisionProcessor

public interface MwRevisionProcessor
General interface for classes that process revisions of MediaWiki pages.
Author:
Markus Kroetzsch
  • Method Details

    • startRevisionProcessing

      void startRevisionProcessing(String siteName, String baseUrl, Map<Integer,String> namespaces)
      Initialises the revision processor before any revisions are processed, and provides general information about the configuration of the site whose revisions are being processed.
      Parameters:
      siteName - the name of the site
      baseUrl - the base URL of the site
      namespaces - map from integer namespace ids to namespace prefixes; the prefixes do not include the final ":" that MediaWiki uses to separate a namespace prefix from an article title, and they use spaces rather than the underscores found in MediaWiki URLs (e.g., "User talk", not "User_talk").
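      To illustrate the prefix convention described above, the sketch below rebuilds a full page title from a namespace id and a bare title by re-adding the ":" separator, then converts it to the underscore form used in URLs. The helper names fullTitle and urlTitle, the sample namespace map, and the assumption that the main namespace (id 0) maps to an empty prefix are illustrative only and are not part of this interface.

```java
import java.util.HashMap;
import java.util.Map;

public class NamespaceTitles {

    // Hypothetical helper: rebuilds the full title as shown in MediaWiki by
    // re-adding the ":" separator that the namespace prefixes omit.
    // Unknown namespace ids fall back to no prefix.
    public static String fullTitle(Map<Integer, String> namespaces, int namespaceId, String title) {
        String prefix = namespaces.getOrDefault(namespaceId, "");
        return prefix.isEmpty() ? title : prefix + ":" + title;
    }

    // Hypothetical helper: converts spaces to the underscores used in MediaWiki URLs.
    public static String urlTitle(String fullTitle) {
        return fullTitle.replace(' ', '_');
    }

    public static void main(String[] args) {
        // Sample map of the kind passed to startRevisionProcessing (values assumed).
        Map<Integer, String> namespaces = new HashMap<>();
        namespaces.put(0, "");          // main namespace: empty prefix (assumed)
        namespaces.put(1, "Talk");
        namespaces.put(3, "User talk"); // spaces, not underscores

        String title = fullTitle(namespaces, 3, "Example user");
        System.out.println(title);                           // User talk:Example user
        System.out.println(urlTitle(title));                 // User_talk:Example_user
        System.out.println(fullTitle(namespaces, 0, "Q42")); // Q42
    }
}
```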
    • processRevision

      void processRevision(MwRevision mwRevision)
      Processes the given MediaWiki revision.
      Parameters:
      mwRevision - the revision to process
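      A minimal sketch of an implementation, using simplified stand-ins for MwRevision and this interface so that it compiles on its own: the hypothetical NamespaceCountingProcessor keeps the namespace map received in startRevisionProcessing and uses it in processRevision to count revisions per namespace prefix. The getNamespace() accessor on the stand-in MwRevision is an assumption, and the site name and URL are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the library's MwRevision; getNamespace() is assumed.
interface MwRevision {
    int getNamespace();
}

// Copy of the interface documented above, so the sketch compiles on its own.
interface MwRevisionProcessor {
    void startRevisionProcessing(String siteName, String baseUrl, Map<Integer, String> namespaces);

    void processRevision(MwRevision mwRevision);

    void finishRevisionProcessing();
}

// Hypothetical processor that counts revisions per namespace prefix.
class NamespaceCountingProcessor implements MwRevisionProcessor {
    private Map<Integer, String> namespaces = new HashMap<>();
    private final Map<String, Integer> counts = new HashMap<>();

    @Override
    public void startRevisionProcessing(String siteName, String baseUrl, Map<Integer, String> namespaces) {
        // Keep a copy of the site's namespace configuration for processRevision.
        this.namespaces = new HashMap<>(namespaces);
    }

    @Override
    public void processRevision(MwRevision mwRevision) {
        // Unknown namespace ids are counted under "?".
        String prefix = namespaces.getOrDefault(mwRevision.getNamespace(), "?");
        counts.merge(prefix, 1, Integer::sum);
    }

    @Override
    public void finishRevisionProcessing() {
        // Nothing to do per file: the counts simply accumulate across files.
    }

    public Map<String, Integer> getCounts() {
        return counts;
    }
}

public class CountingDemo {
    public static void main(String[] args) {
        NamespaceCountingProcessor processor = new NamespaceCountingProcessor();
        // Placeholder site data; in the sample map the main namespace has prefix "".
        processor.startRevisionProcessing("example", "https://example.org/wiki/", Map.of(0, "", 1, "Talk"));
        // The stand-in MwRevision has one abstract method, so lambdas can act as revisions.
        processor.processRevision(() -> 0);
        processor.processRevision(() -> 1);
        processor.processRevision(() -> 0);
        processor.finishRevisionProcessing();
        System.out.println(processor.getCounts().get(""));     // 2
        System.out.println(processor.getCounts().get("Talk")); // 1
    }
}
```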
    • finishRevisionProcessing

      void finishRevisionProcessing()
      Performs final actions after all revisions in a batch have been processed. This is usually called after a whole dump file has been completely processed.

      Note that this method may be called many times during one processing run: its main purpose is to signal the completion of one file, not of the whole processing. It is used only to manage the control flow of revision processing (e.g., to be certain that the most recent revision of a page has been found). It must not be used for actions that should happen at the very end of a run, such as writing a file with results.
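The sketch below illustrates this distinction, again using simplified stand-in types (the getPageId() and getRevisionId() accessors are assumed). The hypothetical LatestRevisionProcessor tracks the newest revision id per page within the current file, merges it into run-wide results each time finishRevisionProcessing is called, and leaves final output to a separate finishRun method that the caller invokes once at the end of the run; finishRun is not part of MwRevisionProcessor. The sketch makes no assumption about how often startRevisionProcessing is called.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the library's MwRevision; both accessors are assumed.
interface MwRevision {
    int getPageId();

    long getRevisionId();
}

// Copy of the interface documented above, so the sketch compiles on its own.
interface MwRevisionProcessor {
    void startRevisionProcessing(String siteName, String baseUrl, Map<Integer, String> namespaces);

    void processRevision(MwRevision mwRevision);

    void finishRevisionProcessing();
}

// Plain revision object used only in this sketch.
class SimpleRevision implements MwRevision {
    private final int pageId;
    private final long revisionId;

    SimpleRevision(int pageId, long revisionId) {
        this.pageId = pageId;
        this.revisionId = revisionId;
    }

    public int getPageId() { return pageId; }

    public long getRevisionId() { return revisionId; }
}

// Hypothetical processor: newest revision id per page, tracked per file and merged per run.
class LatestRevisionProcessor implements MwRevisionProcessor {
    private final Map<Integer, Long> currentFile = new HashMap<>();
    private final Map<Integer, Long> wholeRun = new HashMap<>();
    private int filesFinished = 0;

    @Override
    public void startRevisionProcessing(String siteName, String baseUrl, Map<Integer, String> namespaces) {
        // Only per-file state is reset, so this is safe however often it is called.
        currentFile.clear();
    }

    @Override
    public void processRevision(MwRevision mwRevision) {
        currentFile.merge(mwRevision.getPageId(), mwRevision.getRevisionId(), Long::max);
    }

    @Override
    public void finishRevisionProcessing() {
        // End of one file, not of the run: fold this file's results into the run-wide map.
        currentFile.forEach((page, rev) -> wholeRun.merge(page, rev, Long::max));
        currentFile.clear();
        filesFinished++;
    }

    // Hypothetical end-of-run hook, not part of MwRevisionProcessor; the caller invokes
    // it once after the last file, e.g. before writing results to disk.
    public Map<Integer, Long> finishRun() {
        return new HashMap<>(wholeRun);
    }

    public int getFilesFinished() {
        return filesFinished;
    }
}

public class LatestRevisionDemo {
    public static void main(String[] args) {
        LatestRevisionProcessor processor = new LatestRevisionProcessor();
        processor.startRevisionProcessing("example", "https://example.org/wiki/", Map.of());

        // First dump file.
        processor.processRevision(new SimpleRevision(1, 10));
        processor.processRevision(new SimpleRevision(1, 12));
        processor.processRevision(new SimpleRevision(2, 11));
        processor.finishRevisionProcessing();

        // Second dump file, with a newer revision of page 1.
        processor.processRevision(new SimpleRevision(1, 15));
        processor.finishRevisionProcessing();

        Map<Integer, Long> latest = processor.finishRun();
        System.out.println(latest.get(1)); // 15
        System.out.println(latest.get(2)); // 11
    }
}
```

Because run-wide state is only merged, never reset, in finishRevisionProcessing, this design gives the same result whether a run consists of one dump file or many.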