Class MwRevisionProcessorBroker

java.lang.Object
org.wikidata.wdtk.dumpfiles.MwRevisionProcessorBroker
All Implemented Interfaces:
MwRevisionProcessor

public class MwRevisionProcessorBroker extends Object implements MwRevisionProcessor
This MwRevisionProcessor distributes revisions to subscribers that register their interest in some type of message (revision). Duplicate revisions are filtered. The broker also allows subscribers to receive only the most current revision of a page rather than all revisions. To compute this efficiently, the broker assumes that blocks of revisions are processed in inverse chronological order, as is the case when MediaWiki dump files are processed from newest to oldest. Revisions within a single block of revisions for one page do not need to be ordered in any specific way.
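The duplicate filtering and "most current revision" selection described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not the Wikidata Toolkit implementation: the class LatestRevisionSketch, its nested Rev type, and the methods process, finish and delivered are hypothetical names, a larger revision id is assumed to mean a more recent revision, and the revisions of one page are assumed to be contiguous within a dump.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of the broker idea; not the WDTK code. */
final class LatestRevisionSketch {

    /** Stand-in for a revision, reduced to the two fields this sketch needs. */
    static final class Rev {
        final int pageId;
        final long revisionId;

        Rev(int pageId, long revisionId) {
            this.pageId = pageId;
            this.revisionId = revisionId;
        }
    }

    private final Set<Long> seenRevisionIds = new HashSet<>();
    private final Set<Integer> finishedPages = new HashSet<>();
    private final List<Rev> delivered = new ArrayList<>();
    private Rev candidate = null; // best revision of the page block in progress

    void process(Rev rev) {
        if (!seenRevisionIds.add(rev.revisionId)) {
            return; // duplicate revision: filtered
        }
        if (candidate != null && candidate.pageId != rev.pageId) {
            flush(); // the block for the previous page has ended
        }
        if (finishedPages.contains(rev.pageId)) {
            return; // a newer block already settled this page
        }
        // Assumption: a larger revision id means a more recent revision.
        if (candidate == null || rev.revisionId > candidate.revisionId) {
            candidate = rev;
        }
    }

    /** Called at the end of a dump: the pending candidate is now final. */
    void finish() {
        flush();
    }

    List<Rev> delivered() {
        return delivered;
    }

    private void flush() {
        if (candidate != null) {
            delivered.add(candidate);
            finishedPages.add(candidate.pageId);
            candidate = null;
        }
    }

    public static void main(String[] args) {
        LatestRevisionSketch sketch = new LatestRevisionSketch();
        sketch.process(new Rev(1, 10));
        sketch.process(new Rev(1, 12));
        sketch.process(new Rev(2, 20));
        sketch.finish();
        for (Rev rev : sketch.delivered()) {
            System.out.println(rev.pageId + " -> " + rev.revisionId);
        }
    }
}
```

Because newer dumps are fed first, a page that has already been settled never needs to be reconsidered, so the model keeps only one candidate revision in memory at a time. The model covers only subscribers that ask for current revisions; delivery of all revisions to other subscribers is left out.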
Author:
Markus Kroetzsch
  • Constructor Details

    • MwRevisionProcessorBroker

      public MwRevisionProcessorBroker()
  • Method Details

    • registerMwRevisionProcessor

      public void registerMwRevisionProcessor(MwRevisionProcessor mwRevisionProcessor, String model, boolean onlyCurrentRevisions)
      Registers an MwRevisionProcessor, which will henceforth be notified of the revisions encountered in the dump that match the given model and onlyCurrentRevisions settings.

      Importantly, the MwRevision that registered processors receive is owned by this MwRevisionProcessorBroker. Its data is valid only during the execution of MwRevisionProcessor.processRevision(MwRevision) and is not guaranteed to persist afterwards. If the data is to be retained, the revision processor needs to make its own copy.

      Parameters:
      mwRevisionProcessor - the revision processor to register
      model - the content model that the processor is registered for; it will only be notified of revisions in that model; if null is given, all revisions will be processed regardless of their model
      onlyCurrentRevisions - if true, then the subscriber is only notified of the most current revisions; if false, then it will receive all revisions, current or not
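      The ownership rule above means a subscriber should copy whatever it wants to keep while its callback is still running. The following stand-alone sketch illustrates why. It does not use the Wikidata Toolkit types: ReusedRevision, Snapshot and CollectingSubscriber are hypothetical stand-ins, and the assumption that the owner overwrites one shared object between calls is made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of copying revision data inside the callback. */
final class RevisionCopySketch {

    /** Stand-in for a broker-owned revision that is overwritten between calls. */
    static final class ReusedRevision {
        long revisionId;
        String text;
    }

    /** Immutable copy of just the fields this subscriber wants to retain. */
    static final class Snapshot {
        final long revisionId;
        final String text;

        Snapshot(ReusedRevision source) {
            this.revisionId = source.revisionId;
            this.text = source.text;
        }
    }

    /** Subscriber that stores copies instead of the reference it was handed. */
    static final class CollectingSubscriber {
        final List<Snapshot> kept = new ArrayList<>();

        void processRevision(ReusedRevision revision) {
            kept.add(new Snapshot(revision)); // copy while the data is still valid
        }
    }

    public static void main(String[] args) {
        ReusedRevision shared = new ReusedRevision();
        CollectingSubscriber subscriber = new CollectingSubscriber();

        shared.revisionId = 1;
        shared.text = "first";
        subscriber.processRevision(shared);

        // The owner reuses the same object for the next revision.
        shared.revisionId = 2;
        shared.text = "second";
        subscriber.processRevision(shared);

        System.out.println(subscriber.kept.get(0).text); // prints "first"
    }
}
```

      Had the subscriber stored the reference itself, both list entries would point at the same object and show only the last revision's data.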
    • startRevisionProcessing

      public void startRevisionProcessing(String siteName, String baseUrl, Map<Integer,String> namespaces)
      Description copied from interface: MwRevisionProcessor
      Initialises the revision processor for processing revisions. General information about the configuration of the site for which revisions are being processed is provided.
      Specified by:
      startRevisionProcessing in interface MwRevisionProcessor
      Parameters:
      siteName - the name of the site
      baseUrl - the base URL of the site
      namespaces - map from integer namespace ids to namespace prefixes; namespace strings do not include the final ":" used in MediaWiki to separate namespace prefixes from article titles, and the prefixes use spaces, not underscores as in MediaWiki URLs.
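      To make the format of the namespaces map concrete, the sketch below builds a prefixed title from a namespace id and converts it to the underscore form used in URLs. The helper names prefixedTitle and urlForm are hypothetical, the map entries are illustrative values only, and treating a missing or empty prefix as the main namespace is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helpers showing how namespace prefixes from the map can be used. */
final class NamespaceSketch {

    /** Joins prefix and title with the ":" that the map entries leave out. */
    static String prefixedTitle(Map<Integer, String> namespaces, int namespaceId, String title) {
        String prefix = namespaces.get(namespaceId);
        if (prefix == null || prefix.isEmpty()) {
            return title; // assumption: no prefix means the main namespace
        }
        return prefix + ":" + title;
    }

    /** Converts the space-separated form into the underscore form used in URLs. */
    static String urlForm(String prefixedTitle) {
        return prefixedTitle.replace(' ', '_');
    }

    public static void main(String[] args) {
        Map<Integer, String> namespaces = new HashMap<>();
        namespaces.put(0, "");               // illustrative values only
        namespaces.put(121, "Property talk");

        String title = prefixedTitle(namespaces, 121, "P31");
        System.out.println(title);          // prints "Property talk:P31"
        System.out.println(urlForm(title)); // prints "Property_talk:P31"
    }
}
```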
    • processRevision

      public void processRevision(MwRevision mwRevision)
      Description copied from interface: MwRevisionProcessor
      Process the given MediaWiki revision.
      Specified by:
      processRevision in interface MwRevisionProcessor
      Parameters:
      mwRevision - the revision to process
    • finishRevisionProcessing

      public void finishRevisionProcessing()
      Finalises the processing of one dump file (and hence of the current block of pages). In particular, this means that the most current revision found up to this point is really the most current one, so that subscribers should be notified.
      Specified by:
      finishRevisionProcessing in interface MwRevisionProcessor