Package org.wikidata.wdtk.dumpfiles
Class MwRevisionProcessorBroker
java.lang.Object
org.wikidata.wdtk.dumpfiles.MwRevisionProcessorBroker
- All Implemented Interfaces:
MwRevisionProcessor
This MwRevisionProcessor distributes revisions to subscribers that register
their interest in some type of message (revision). Duplicate revisions are
filtered.
The broker also allows subscribers to receive only the most current revision
of a page rather than all revisions. To compute this efficiently, the broker
assumes that blocks of revisions are processed in inverse chronological
order, as is the case when processing MediaWiki dump files in inverse
chronological order. Revisions within a single block of revisions for one
page do not need to be ordered in any specific way.
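The selection of the most current revision described above can be sketched as follows. This is only an illustration under stated assumptions, not the broker's actual implementation: `LatestRevisionSketch` and `Rev` are hypothetical stand-ins, "most current" is assumed to mean the highest revision id, and a page whose block has already been completed is assumed to reappear only in older dumps.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; none of these names are part of Wikidata Toolkit. */
final class LatestRevisionSketch {

    /** Hypothetical stand-in for a revision: a page id and a revision id. */
    static final class Rev {
        final int pageId;
        final long revisionId;

        Rev(int pageId, long revisionId) {
            this.pageId = pageId;
            this.revisionId = revisionId;
        }
    }

    private final Set<Integer> finishedPages = new HashSet<>();
    private final List<Rev> notified = new ArrayList<>();
    private Rev best; // most current revision seen in the current block, or null

    /** Revisions of one page are assumed to arrive as one contiguous block. */
    void process(Rev rev) {
        if (best != null && best.pageId != rev.pageId) {
            flush(); // a new page started, so the previous block is complete
        }
        if (finishedPages.contains(rev.pageId)) {
            return; // page already handled in a newer dump: skip older revisions
        }
        if (best == null || rev.revisionId > best.revisionId) {
            best = rev; // the order inside a block does not matter
        }
    }

    /** End of a dump file: the last open block is complete as well. */
    void finish() {
        flush();
    }

    private void flush() {
        if (best != null) {
            notified.add(best);
            finishedPages.add(best.pageId);
            best = null;
        }
    }

    List<Rev> notified() {
        return notified;
    }

    public static void main(String[] args) {
        LatestRevisionSketch sketch = new LatestRevisionSketch();
        // Newest dump first; revisions of page 1 arrive unordered.
        sketch.process(new Rev(1, 12));
        sketch.process(new Rev(1, 15));
        sketch.process(new Rev(1, 11));
        sketch.process(new Rev(2, 20));
        sketch.finish();
        // Older dump: page 1 shows up again and is ignored.
        sketch.process(new Rev(1, 9));
        sketch.finish();
        for (Rev r : sketch.notified()) {
            System.out.println("page " + r.pageId + " -> revision " + r.revisionId);
        }
    }
}
```

Because only one candidate revision per block is held at a time, memory use stays small regardless of how many revisions a page has; the set of finished pages is what relies on the inverse chronological processing order.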
- Author:
- Markus Kroetzsch
-
Constructor Summary
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | finishRevisionProcessing() | Finalises the processing of one dump file (and hence of the current block of pages). |
| void | processRevision(MwRevision mwRevision) | Process the given MediaWiki revision. |
| void | registerMwRevisionProcessor(MwRevisionProcessor mwRevisionProcessor, String model, boolean onlyCurrentRevisions) | Registers an MwRevisionProcessor, which will henceforth be notified of all revisions that are encountered in the dump. |
| void | startRevisionProcessing(String siteName, String baseUrl, Map<Integer, String> namespaces) | Initialises the revision processor for processing revisions. |
-
Constructor Details
-
MwRevisionProcessorBroker
public MwRevisionProcessorBroker()
-
-
Method Details
-
registerMwRevisionProcessor
public void registerMwRevisionProcessor(MwRevisionProcessor mwRevisionProcessor, String model, boolean onlyCurrentRevisions)
Registers an MwRevisionProcessor, which will henceforth be notified of the revisions that are encountered in the dump, subject to the model and onlyCurrentRevisions settings described under Parameters.
Importantly, the MwRevision that the registered processors will receive is owned by this MwRevisionProcessorBroker. Its data is valid only during the execution of MwRevisionProcessor.processRevision(MwRevision) and is not guaranteed to persist afterwards. If the data is to be retained permanently, the revision processor needs to make its own copy.
- Parameters:
mwRevisionProcessor - the revision processor to register
model - the content model that the processor is registered for; it will only be notified of revisions in that model; if null is given, all revisions will be processed whatever their model
onlyCurrentRevisions - if true, then the subscriber is only notified of the most current revisions; if false, then it will receive all revisions, current or not
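The ownership rule above is easiest to see with a small model. The following is a minimal sketch, assuming the broker hands the same mutable object to each call; `MutableRev`, `RevSnapshot`, and `CopyingSubscriber` are hypothetical stand-ins and not Wikidata Toolkit classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; none of these names are part of Wikidata Toolkit. */
final class RetainRevisionSketch {

    /** Hypothetical stand-in for a broker-owned revision that is reused between calls. */
    static final class MutableRev {
        long revisionId;
        String text;
    }

    /** Immutable copy that a subscriber can keep safely. */
    static final class RevSnapshot {
        final long revisionId;
        final String text;

        RevSnapshot(MutableRev rev) {
            this.revisionId = rev.revisionId;
            this.text = rev.text;
        }
    }

    /** Subscriber that stores a copy and, for comparison, the raw reference. */
    static final class CopyingSubscriber {
        final List<RevSnapshot> kept = new ArrayList<>();
        final List<MutableRev> keptByReference = new ArrayList<>(); // anti-pattern

        void processRevision(MutableRev rev) {
            kept.add(new RevSnapshot(rev)); // safe: data is copied during the call
            keptByReference.add(rev);       // unsafe: the owner may overwrite it later
        }
    }

    public static void main(String[] args) {
        MutableRev shared = new MutableRev();
        CopyingSubscriber subscriber = new CopyingSubscriber();

        shared.revisionId = 1;
        shared.text = "first";
        subscriber.processRevision(shared);

        // The owner reuses the same object for the next revision.
        shared.revisionId = 2;
        shared.text = "second";
        subscriber.processRevision(shared);

        System.out.println(subscriber.kept.get(0).text);            // prints "first"
        System.out.println(subscriber.keptByReference.get(0).text); // prints "second"
    }
}
```

A subscriber that only aggregates values during the call (counts, sums, writing to an output stream) does not need a copy at all; copying matters only when the revision data must outlive the call.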
-
startRevisionProcessing
public void startRevisionProcessing(String siteName, String baseUrl, Map<Integer, String> namespaces)
Description copied from interface: MwRevisionProcessor
Initialises the revision processor for processing revisions. General information about the configuration of the site for which revisions are being processed is provided.
- Specified by:
startRevisionProcessing in interface MwRevisionProcessor
- Parameters:
siteName - the name of the site
baseUrl - the base URL of the site
namespaces - map from integer namespace ids to namespace prefixes; the prefixes do not include the final ":" used in MediaWiki to separate namespace prefixes from article titles, and they use spaces, not underscores as in MediaWiki URLs
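The conventions for the namespaces map can be illustrated with a short sketch. The helper names `prefixedTitle` and `urlTitle` and the sample map entries are assumptions made for this example; they are not part of the documented API.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; helper names and sample data are hypothetical. */
final class NamespacePrefixSketch {

    /** Joins a namespace prefix and a title, adding the ":" that the map values omit. */
    static String prefixedTitle(Map<Integer, String> namespaces, int namespaceId, String title) {
        String prefix = namespaces.get(namespaceId);
        if (prefix == null || prefix.isEmpty()) {
            return title; // unknown or main namespace: no prefix
        }
        return prefix + ":" + title;
    }

    /** Converts a title with spaces into the underscore form used in MediaWiki URLs. */
    static String urlTitle(String prefixedTitle) {
        return prefixedTitle.replace(' ', '_');
    }

    public static void main(String[] args) {
        // Sample map in the documented shape: no trailing ":", spaces instead of underscores.
        Map<Integer, String> namespaces = new HashMap<>();
        namespaces.put(0, "");
        namespaces.put(3, "User talk");

        String title = prefixedTitle(namespaces, 3, "Example user");
        System.out.println(title);           // prints "User talk:Example user"
        System.out.println(urlTitle(title)); // prints "User_talk:Example_user"
    }
}
```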
-
processRevision
public void processRevision(MwRevision mwRevision)
Description copied from interface: MwRevisionProcessor
Process the given MediaWiki revision.
- Specified by:
processRevision in interface MwRevisionProcessor
- Parameters:
mwRevision - the revision to process
-
finishRevisionProcessing
public void finishRevisionProcessing()
Finalises the processing of one dump file (and hence of the current block of pages). In particular, this means that the most current revision found up to this point is really the most current one, so that subscribers should be notified.
- Specified by:
finishRevisionProcessing in interface MwRevisionProcessor
-