Package org.wikidata.wdtk.dumpfiles.wmf
Class WmfOnlineStandardDumpFile
java.lang.Object
org.wikidata.wdtk.dumpfiles.wmf.WmfDumpFile
org.wikidata.wdtk.dumpfiles.wmf.WmfOnlineStandardDumpFile
- All Implemented Interfaces:
MwDumpFile
Represents dump files published by the Wikimedia Foundation in
the main common location of all dump files. This excludes incremental daily
dumps, which are found in another directory. The dump file and additional
information about its status are online, so web access is needed to fetch this
data on demand.
- Author:
- Markus Kroetzsch
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.wikidata.wdtk.dumpfiles.MwDumpFile
MwDumpFile.DateComparator
-
Field Summary
Fields inherited from class org.wikidata.wdtk.dumpfiles.wmf.WmfDumpFile
dateStamp, DUMP_SITE_BASE_URL, projectName
-
Constructor Summary
Constructor: WmfOnlineStandardDumpFile(String dateStamp, String projectName, WebResourceFetcher webResourceFetcher, DirectoryManager dumpfileDirectoryManager, DumpContentType dumpContentType)
Description: Constructor.
-
Method Summary
Modifier and Type, Method, Description:
protected boolean fetchIsDone() - Finds out if the dump is ready.
DumpContentType getDumpContentType() - Returns information about the content of the dump.
InputStream getDumpFileStream() - Returns an input stream that provides access to the (uncompressed) text content of the dump file.
void prepareDumpFile() - Prepares the dump file for access via MwDumpFile.getDumpFileStream() or MwDumpFile.getDumpFileReader().
Methods inherited from class org.wikidata.wdtk.dumpfiles.wmf.WmfDumpFile
getDateStamp, getDateStampFromDumpFileDirectoryName, getDumpFileCompressionType, getDumpFileDirectoryName, getDumpFileName, getDumpFilePostfix, getDumpFileReader, getDumpFileWebDirectory, getProjectName, isAvailable, isRevisionDumpFile, toString
-
Constructor Details
-
WmfOnlineStandardDumpFile
public WmfOnlineStandardDumpFile(String dateStamp, String projectName, WebResourceFetcher webResourceFetcher, DirectoryManager dumpfileDirectoryManager, DumpContentType dumpContentType)
Constructor.
- Parameters:
dateStamp - dump date in format YYYYMMDD
projectName - project name string
webResourceFetcher - object to use for accessing the web
dumpfileDirectoryManager - the directory manager for the directory where dumps should be downloaded to
dumpContentType - the type of dump this represents
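The dateStamp parameter is expected in YYYYMMDD form. Whether the constructor validates it is not stated here, so a caller may want to check the value first. The following is a minimal sketch using only the Java standard library; the class name DateStampCheck and the method isValidDateStamp are hypothetical helpers and not part of Wikidata Toolkit.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not part of Wikidata Toolkit.
final class DateStampCheck {

    /** Returns true if the string is a real calendar date written as YYYYMMDD. */
    static boolean isValidDateStamp(String dateStamp) {
        // BASIC_ISO_DATE also accepts an optional zone offset suffix,
        // so restrict the input to exactly eight characters first.
        if (dateStamp == null || dateStamp.length() != 8) {
            return false;
        }
        try {
            LocalDate.parse(dateStamp, DateTimeFormatter.BASIC_ISO_DATE);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidDateStamp("20240229")); // true (leap day)
        System.out.println(isValidDateStamp("20240230")); // false (no such day)
        System.out.println(isValidDateStamp("2024-01-01")); // false (wrong length)
    }
}
```

A check like this could run before the date stamp is passed to the constructor, so that a malformed value fails early rather than when the dump is fetched.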
-
-
Method Details
-
getDumpContentType
Description copied from interface: MwDumpFile
Returns information about the content of the dump. Together with the project name and date stamp, this identifies the dump, and it is therefore always available.
- Returns:
- the content type of this dump
-
getDumpFileStream
Description copied from interface: MwDumpFile
Returns an input stream that provides access to the (uncompressed) text content of the dump file. It is important to close the stream after use.
- Returns:
- an input stream to read the dump file
- Throws:
IOException - if the dump file contents could not be accessed
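Because the returned stream must be closed after use, a try-with-resources block is a natural fit. The sketch below is only an illustration: an in-memory stream stands in for the result of getDumpFileStream(), the class DumpStreamExample and the method countLines are hypothetical, and UTF-8 is an assumed encoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not part of Wikidata Toolkit.
final class DumpStreamExample {

    /** Counts the lines in the stream and closes it, even if reading fails. */
    static long countLines(InputStream dumpStream) throws IOException {
        // Closing the reader also closes the wrapped input stream.
        // UTF-8 is an assumption made for this sketch.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(dumpStream, StandardCharsets.UTF_8))) {
            long lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for dumpFile.getDumpFileStream().
        InputStream in = new ByteArrayInputStream(
                "first\nsecond\nthird".getBytes(StandardCharsets.UTF_8));
        System.out.println(countLines(in));
    }
}
```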
-
prepareDumpFile
Description copied from interface: MwDumpFile
Prepares the dump file for access via MwDumpFile.getDumpFileStream() or MwDumpFile.getDumpFileReader(). In particular, this will download any remote files.
- Throws:
IOException - if there was a problem preparing the files
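The call order described above (prepare first, then read) can be sketched as follows. Everything here is a stand-in: PreparableDump is a hypothetical interface that only mirrors the two method names from MwDumpFile, and InMemoryDump is a toy implementation whose "download" copies bytes in memory. The toy refuses to hand out a stream before preparation purely to make the ordering visible; the real class may behave differently.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in mirroring two method names of MwDumpFile. */
interface PreparableDump {
    void prepareDumpFile() throws IOException;
    InputStream getDumpFileStream() throws IOException;
}

/** Toy implementation: "downloading" just copies bytes into memory. */
final class InMemoryDump implements PreparableDump {
    private final byte[] remoteContent;
    private byte[] localCopy; // null until prepareDumpFile() has run

    InMemoryDump(String remoteContent) {
        this.remoteContent = remoteContent.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public void prepareDumpFile() {
        if (localCopy == null) {
            localCopy = remoteContent.clone(); // simulated download
        }
    }

    @Override
    public InputStream getDumpFileStream() throws IOException {
        if (localCopy == null) {
            // Toy behavior only, chosen to make the ordering explicit.
            throw new IOException("dump file has not been prepared");
        }
        return new ByteArrayInputStream(localCopy);
    }
}

final class PrepareThenRead {
    /** Prepares the dump, reads it fully, and closes the stream. */
    static String readAll(PreparableDump dump) throws IOException {
        dump.prepareDumpFile();
        try (InputStream in = dump.getDumpFileStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(new InMemoryDump("<mediawiki/>")));
    }
}
```

Separating preparation from access lets a caller pay the download cost at a time of its choosing, for example before starting a long processing run.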
-
fetchIsDone
protected boolean fetchIsDone()
Description copied from class: WmfDumpFile
Finds out if the dump is ready. For online dumps, this should return true if the file can be fetched from the Web. For local dumps, this should return true if the file is complete and not corrupted. For some types of dumps, there are ways of checking this easily (i.e., without reading the full file). If this is not possible, then the method should just return true.
- Specified by:
fetchIsDone in class WmfDumpFile
- Returns:
- true if the dump is done
-