Class WikibaseRevisionProcessor

java.lang.Object
org.wikidata.wdtk.dumpfiles.WikibaseRevisionProcessor
All Implemented Interfaces:
MwRevisionProcessor

public class WikibaseRevisionProcessor extends Object implements MwRevisionProcessor
A revision processor that processes Wikibase entity content from a dump file. Revisions are parsed to obtain EntityDocument objects.
Author:
Markus Kroetzsch
  • Constructor Details

    • WikibaseRevisionProcessor

      public WikibaseRevisionProcessor(EntityDocumentProcessor entityDocumentProcessor, String siteIri)
      Creates a revision processor that forwards the entity documents it parses to the given EntityDocumentProcessor.
      Parameters:
      entityDocumentProcessor - the object that entity documents will be forwarded to
      siteIri - the IRI of the site that the data comes from, as used in EntityIdValue.getSiteIri()
  • Method Details

    • startRevisionProcessing

      public void startRevisionProcessing(String siteName, String baseUrl, Map<Integer,String> namespaces)
      Description copied from interface: MwRevisionProcessor
      Initialises the revision processor for processing revisions. General information about the configuration of the site for which revisions are being processed is provided.
      Specified by:
      startRevisionProcessing in interface MwRevisionProcessor
      Parameters:
      siteName - the name of the site
      baseUrl - the base URL of the site
      namespaces - map from integer namespace ids to namespace prefixes; the prefixes do not include the final ":" that MediaWiki uses to separate a namespace prefix from the article title, and they use spaces rather than the underscores found in MediaWiki URLs.
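As a rough illustration of the namespaces parameter, the sketch below builds such a map and derives a prefixed page title from it. The class NamespaceExample, the helpers sampleNamespaces and prefixedTitle, and the sample ids and prefixes are assumptions made for this example; they are not part of Wikidata Toolkit.

```java
import java.util.HashMap;
import java.util.Map;

class NamespaceExample {

    // Sample map in the documented shape: prefixes carry no trailing ":"
    // and use spaces instead of underscores. The values are illustrative.
    static Map<Integer, String> sampleNamespaces() {
        Map<Integer, String> namespaces = new HashMap<>();
        namespaces.put(0, "");                 // main namespace, empty prefix
        namespaces.put(120, "Property");
        namespaces.put(121, "Property talk");  // space, not "Property_talk"
        return namespaces;
    }

    // Hypothetical helper: joins prefix and title with ":".
    // Unknown ids and the empty prefix yield the bare title.
    static String prefixedTitle(Map<Integer, String> namespaces,
            int namespaceId, String title) {
        String prefix = namespaces.get(namespaceId);
        if (prefix == null || prefix.isEmpty()) {
            return title;
        }
        return prefix + ":" + title;
    }

    public static void main(String[] args) {
        Map<Integer, String> namespaces = sampleNamespaces();
        System.out.println(prefixedTitle(namespaces, 0, "Q42"));
        System.out.println(prefixedTitle(namespaces, 120, "P31"));
        System.out.println(prefixedTitle(namespaces, 121, "P31"));
    }
}
```

Because the separator is not stored in the map, code that needs a full page title has to add the ":" itself, as prefixedTitle does here.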
    • processRevision

      public void processRevision(MwRevision mwRevision)
      Description copied from interface: MwRevisionProcessor
      Processes the given MediaWiki revision.
      Specified by:
      processRevision in interface MwRevisionProcessor
      Parameters:
      mwRevision - the revision to process
    • processItemRevision

      public void processItemRevision(MwRevision mwRevision)
    • processPropertyRevision

      public void processPropertyRevision(MwRevision mwRevision)
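This page does not say how processRevision relates to processItemRevision and processPropertyRevision. One plausible arrangement, sketched below with stand-in types, is a dispatch on the content model of the revision. The Revision interface, the model names "wikibase-item" and "wikibase-property", and the class RevisionDispatchSketch are assumptions for illustration only and should not be read as the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

class RevisionDispatchSketch {

    // Stand-in for MwRevision; only the two accessors used here are modelled.
    interface Revision {
        String getModel();
        String getPrefixedTitle();
    }

    // Content model names assumed for this sketch.
    static final String MODEL_ITEM = "wikibase-item";
    static final String MODEL_PROPERTY = "wikibase-property";

    final List<String> items = new ArrayList<>();
    final List<String> properties = new ArrayList<>();

    // Routes a revision to the matching handler and ignores
    // revisions with any other content model.
    void processRevision(Revision revision) {
        if (MODEL_ITEM.equals(revision.getModel())) {
            processItemRevision(revision);
        } else if (MODEL_PROPERTY.equals(revision.getModel())) {
            processPropertyRevision(revision);
        }
    }

    void processItemRevision(Revision revision) {
        items.add(revision.getPrefixedTitle());
    }

    void processPropertyRevision(Revision revision) {
        properties.add(revision.getPrefixedTitle());
    }

    // Small factory for test revisions.
    static Revision revision(String model, String title) {
        return new Revision() {
            @Override
            public String getModel() {
                return model;
            }

            @Override
            public String getPrefixedTitle() {
                return title;
            }
        };
    }

    public static void main(String[] args) {
        RevisionDispatchSketch sketch = new RevisionDispatchSketch();
        sketch.processRevision(revision(MODEL_ITEM, "Q42"));
        sketch.processRevision(revision(MODEL_PROPERTY, "Property:P31"));
        sketch.processRevision(revision("wikitext", "Project:Main Page"));
        System.out.println(sketch.items + " " + sketch.properties);
    }
}
```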
    • finishRevisionProcessing

      public void finishRevisionProcessing()
      Description copied from interface: MwRevisionProcessor
      Performs final actions that should be done after all revisions in a batch of revisions have been processed. This is usually called after a whole dump file has been processed.

      It is important to understand that this method might be called many times during one processing run. Its main purpose is to signal the completion of one file, not of the whole processing. This is used only to manage the control flow of revision processing (e.g., to be sure that the most recent revision of a page has certainly been found). This method must not be used to do things that should happen at the very end of a run, such as writing a file with results.

      Specified by:
      finishRevisionProcessing in interface MwRevisionProcessor
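The distinction between the end of a batch and the end of a run can be sketched as follows. BatchAwareCounter is a hypothetical, simplified processor that does not implement the real MwRevisionProcessor interface: its processRevision takes a plain String, and summary() is an invented method that the application would call itself once the whole run is over.

```java
class BatchAwareCounter {

    private int revisionsInCurrentFile = 0;
    private int totalRevisions = 0;
    private int finishedFiles = 0;

    // Simplified stand-in for processRevision(MwRevision).
    void processRevision(String revisionText) {
        revisionsInCurrentFile++;
    }

    // May be called many times per run: it only closes the current batch
    // and never writes final results.
    void finishRevisionProcessing() {
        totalRevisions += revisionsInCurrentFile;
        revisionsInCurrentFile = 0;
        finishedFiles++;
    }

    // Hypothetical end-of-run hook, invoked explicitly by the application.
    String summary() {
        return totalRevisions + " revisions in " + finishedFiles + " files";
    }

    public static void main(String[] args) {
        BatchAwareCounter counter = new BatchAwareCounter();

        // First dump file.
        counter.processRevision("a");
        counter.processRevision("b");
        counter.finishRevisionProcessing();

        // Second dump file.
        counter.processRevision("c");
        counter.finishRevisionProcessing();

        // Final reporting happens outside finishRevisionProcessing().
        System.out.println(counter.summary());
    }
}
```

Keeping the per-file state separate from the run totals means repeated calls to finishRevisionProcessing() stay harmless, while output that must appear only once is produced from a separate call.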