Package org.wikidata.wdtk.dumpfiles
Class MwRevisionDumpFileProcessor
java.lang.Object
org.wikidata.wdtk.dumpfiles.MwRevisionDumpFileProcessor
- All Implemented Interfaces:
MwDumpFileProcessor
This class processes MediaWiki dump files that contain lists of page revisions
in the specific XML format used by MediaWiki for exporting pages. It extracts
all revisions and forwards them to any registered revision processor. The
class also keeps track of whether a given article or revision has already
been encountered. As a result, no revision is processed twice, and the
registered revision processors can be informed whether a revision is the
first one encountered for its article. The first revision of an article that
is encountered in a MediaWiki dump file is usually the most recent one. If
multiple dump files are processed in reverse chronological order, the first
revision that is encountered is also the most recent one overall.
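The bookkeeping described above can be illustrated with a minimal sketch. It does not use the Wikidata Toolkit API: SketchRevision and RevisionTracker are hypothetical stand-ins invented for this illustration, and the actual class may track its state differently. The sketch only shows the idea of skipping revisions that were already seen and flagging the first revision encountered for each page.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a page revision; NOT the wdtk MwRevision interface.
final class SketchRevision {
    final int pageId;
    final long revisionId;

    SketchRevision(int pageId, long revisionId) {
        this.pageId = pageId;
        this.revisionId = revisionId;
    }
}

// Hypothetical tracker that mimics the described bookkeeping (an assumption,
// not the library's implementation).
class RevisionTracker {
    private final Set<Integer> seenPageIds = new HashSet<>();
    private final Set<Long> seenRevisionIds = new HashSet<>();
    final List<String> log = new ArrayList<>();

    void offer(SketchRevision rev) {
        // Set.add returns false if the revision id was already present:
        // such a revision is skipped so that it is never processed twice.
        if (!seenRevisionIds.add(rev.revisionId)) {
            return;
        }
        // Set.add returns true only the first time a page id is seen.
        boolean firstForPage = seenPageIds.add(rev.pageId);
        log.add(rev.pageId + ":" + rev.revisionId
                + (firstForPage ? ":first" : ":later"));
    }
}

class FirstRevisionDemo {
    public static void main(String[] args) {
        RevisionTracker tracker = new RevisionTracker();
        tracker.offer(new SketchRevision(1, 100L)); // first revision of page 1
        tracker.offer(new SketchRevision(1, 90L));  // older revision of page 1
        tracker.offer(new SketchRevision(1, 100L)); // duplicate, skipped
        tracker.offer(new SketchRevision(2, 50L));  // first revision of page 2
        tracker.log.forEach(System.out::println);
    }
}
```

In this sketch, the "first" flag corresponds to the most recent revision only if revisions arrive newest first, which matches the ordering assumption stated in the class description.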
- Author:
- Markus Kroetzsch
Constructor Summary
Constructor / Description
MwRevisionDumpFileProcessor(MwRevisionProcessor mwRevisionProcessor): Constructor.
Method Summary
Modifier and Type / Method / Description
void processDumpFileContents(InputStream inputStream, MwDumpFile dumpFile): Process dump file data from the given input stream.
void reset(): Resets the internal state of the object.
Constructor Details
MwRevisionDumpFileProcessor
Constructor.
- Parameters:
mwRevisionProcessor - the revision processor to which all revisions will be reported
Method Details
reset
public void reset()
Resets the internal state of the object. All information gathered from previously processed dumps and all related statistics will be forgotten. If this method is not called, then consecutive invocations of processDumpFileContents(InputStream, MwDumpFile) will continue to add to the internal state. This is useful for processing dumps that are split into several parts.
This will not unregister any MwRevisionProcessors.
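The accumulate-until-reset behavior can be sketched as follows. SplitDumpState is a hypothetical class written for this illustration and is not part of Wikidata Toolkit; it assumes that the state consists of a set of seen page ids and that registered listeners are stored separately from that state.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntConsumer;

// Hypothetical illustration of state that accumulates across calls until
// reset() is invoked; NOT the library's implementation.
class SplitDumpState {
    private final List<IntConsumer> listeners = new ArrayList<>();
    private final Set<Integer> seenPageIds = new HashSet<>();

    void register(IntConsumer listener) {
        listeners.add(listener);
    }

    // Processes one part of a split dump; page ids seen in earlier parts
    // are remembered, so each listener is notified once per page id.
    void processPart(int... pageIds) {
        for (int id : pageIds) {
            if (seenPageIds.add(id)) {
                for (IntConsumer listener : listeners) {
                    listener.accept(id);
                }
            }
        }
    }

    // Forgets the gathered state but keeps all registered listeners.
    void reset() {
        seenPageIds.clear();
    }

    int seenCount() {
        return seenPageIds.size();
    }
}

class ResetDemo {
    public static void main(String[] args) {
        List<Integer> firstSeen = new ArrayList<>();
        SplitDumpState state = new SplitDumpState();
        state.register(id -> firstSeen.add(id));

        state.processPart(1, 2); // part 1 of a split dump
        state.processPart(2, 3); // part 2: page 2 is already known
        System.out.println(state.seenCount() + " pages, notified: " + firstSeen);

        state.reset();           // start over for an unrelated dump
        state.processPart(2);    // page 2 counts as new again
        System.out.println(state.seenCount() + " pages, notified: " + firstSeen);
    }
}
```

Keeping listeners outside the resettable state mirrors the note that resetting does not unregister any revision processors.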
processDumpFileContents
Description copied from interface: MwDumpFileProcessor
Process dump file data from the given input stream.
The input stream is obtained from the given dump file via MwDumpFile.getDumpFileStream(). It will be closed by the caller.
- Specified by:
processDumpFileContents in interface MwDumpFileProcessor
- Parameters:
inputStream - to access the contents of the dump
dumpFile - to access further information about this dump
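The ownership rule for the stream (the processor reads it, the caller closes it) can be sketched with plain java.io types. ByteCountingProcessor, CloseTrackingStream, and CallerOwnsStream are hypothetical names used only for this illustration; none of them belongs to Wikidata Toolkit.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical processor: it consumes the stream but never closes it.
class ByteCountingProcessor {
    long bytesSeen = 0;

    void processContents(InputStream in) throws IOException {
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            bytesSeen += n;
        }
    }
}

// Test helper that records whether close() has been called.
class CloseTrackingStream extends ByteArrayInputStream {
    boolean closed = false;

    CloseTrackingStream(byte[] data) {
        super(data);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }
}

class CallerOwnsStream {
    // Returns whether the stream was already closed right after processing,
    // that is, before the caller's try-with-resources block closed it.
    static boolean closedByProcessor(ByteCountingProcessor processor,
            CloseTrackingStream stream) {
        try (CloseTrackingStream in = stream) {
            processor.processContents(in);
            return in.closed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "<mediawiki/>".getBytes(StandardCharsets.UTF_8);
        CloseTrackingStream stream = new CloseTrackingStream(data);
        ByteCountingProcessor processor = new ByteCountingProcessor();
        boolean early = closedByProcessor(processor, stream);
        System.out.println("closed by processor: " + early
                + ", closed by caller: " + stream.closed);
    }
}
```

The try-with-resources block sits on the caller's side, so the stream is released even if processing throws an exception.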