Package org.wikidata.wdtk.dumpfiles.wmf
Class WmfDumpFile
java.lang.Object
org.wikidata.wdtk.dumpfiles.wmf.WmfDumpFile
- All Implemented Interfaces:
MwDumpFile
- Direct Known Subclasses:
JsonOnlineDumpFile, WmfLocalDumpFile, WmfOnlineStandardDumpFile
Abstract base class for dump files provided by the Wikimedia Foundation.
- Author:
- Markus Kroetzsch
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.wikidata.wdtk.dumpfiles.MwDumpFile
MwDumpFile.DateComparator
-
Field Summary
-
Constructor Summary
-
Method Summary
Modifier and Type, Method, Description
protected abstract boolean fetchIsDone()
Finds out if the dump is ready.
String getDateStamp()
Returns the date stamp for this dump.
static String getDateStampFromDumpFileDirectoryName(DumpContentType dumpContentType, String directoryName)
Extracts the date stamp from a dumpfile directory name in the form that is created by getDumpFileDirectoryName(DumpContentType, String).
static CompressionType getDumpFileCompressionType(String fileName)
Returns the compression type of this kind of dump file using file suffixes.
static String getDumpFileDirectoryName(DumpContentType dumpContentType, String dateStamp)
Returns the name of the directory where the dumpfile of the given type and date should be stored.
static String getDumpFileName(DumpContentType dumpContentType, String projectName, String dateStamp)
Returns the name of the dump file for the given type, project, and date.
static String getDumpFilePostfix(DumpContentType dumpContentType)
Returns the ending used by the Wikimedia-provided dumpfile names of the given type.
BufferedReader getDumpFileReader()
Returns a buffered reader that provides access to the (uncompressed) text content of the dump file.
static String getDumpFileWebDirectory(DumpContentType dumpContentType, String projectName)
Returns the absolute directory on the Web site where dumpfiles of the given type can be found.
String getProjectName()
Returns the project name for this dump.
boolean isAvailable()
Checks if the dump is actually available.
static boolean isRevisionDumpFile(DumpContentType dumpContentType)
Returns true if the given dump file type contains page revisions and false if it does not.
String toString()
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.wikidata.wdtk.dumpfiles.MwDumpFile
getDumpContentType, getDumpFileStream, prepareDumpFile
-
Field Details
-
DUMP_SITE_BASE_URL
The default URL of the website to obtain the dump files from.
-
dateStamp
-
projectName
-
-
Constructor Details
-
WmfDumpFile
-
-
Method Details
-
getProjectName
Description copied from interface: MwDumpFile
Returns the project name for this dump. Together with the dump content type and date stamp, this identifies the dump, and it is therefore always available.
- Specified by:
getProjectName in interface MwDumpFile
- Returns:
- a project name string
-
getDateStamp
Description copied from interface: MwDumpFile
Returns the date stamp for this dump. Together with the project name and dump content type, this identifies the dump, and it is therefore always available.
- Specified by:
getDateStamp in interface MwDumpFile
- Returns:
- a string that represents a date in format YYYYMMDD
-
isAvailable
public boolean isAvailable()
Description copied from interface: MwDumpFile
Checks if the dump is actually available. Should be called before MwDumpFile.getDumpFileReader(). Depending on the type of dumpfile, this will trigger one or more checks to make sure that all relevant data can be accessed for this dump file. This does not guarantee that the download will succeed, since IO errors can always occur, but it helps to detect cases where the dump is clearly not in a usable state.
- Specified by:
isAvailable in interface MwDumpFile
- Returns:
- true if the dump file is likely to be available
-
toString
-
getDumpFileReader
Description copied from interface: MwDumpFile
Returns a buffered reader that provides access to the (uncompressed) text content of the dump file. It is important to close the reader after use.
- Specified by:
getDumpFileReader in interface MwDumpFile
- Returns:
- a buffered reader to read the dump file
- Throws:
IOException
- if the dump file contents could not be accessed
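The calling pattern described for isAvailable() and getDumpFileReader() can be sketched as follows. This is a minimal illustration, not code from the library: ReadableDump is a hypothetical stand-in that copies only the two documented method signatures, and the in-memory dump exists only so that the sketch runs without any download.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical stand-in for the two MwDumpFile methods documented above;
// it is NOT the real Wikidata Toolkit interface.
interface ReadableDump {
    boolean isAvailable();
    BufferedReader getDumpFileReader() throws IOException;
}

class DumpReadingSketch {

    // Counts the lines of a dump, or returns -1 if the dump is not available.
    static long countLines(ReadableDump dump) throws IOException {
        if (!dump.isAvailable()) {
            return -1; // clearly unusable, so do not try to open it
        }
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = dump.getDumpFileReader()) {
            long lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }

    // In-memory dump used only to make the sketch runnable.
    static ReadableDump inMemoryDump(String content, boolean available) {
        return new ReadableDump() {
            @Override
            public boolean isAvailable() {
                return available;
            }

            @Override
            public BufferedReader getDumpFileReader() {
                return new BufferedReader(new StringReader(content));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines(inMemoryDump("a\nb\nc\n", true))); // prints 3
        System.out.println(countLines(inMemoryDump("", false)));         // prints -1
    }
}
```

Even when isAvailable() returns true, the read can still fail with an IOException, so callers should be prepared to handle it.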
-
fetchIsDone
protected abstract boolean fetchIsDone()
Finds out if the dump is ready. For online dumps, this should return true if the file can be fetched from the Web. For local dumps, this should return true if the file is complete and not corrupted. For some types of dumps, there are ways of checking this easily (i.e., without reading the full file). If this is not possible, then the method should just return true.
- Returns:
- true if the dump is done
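One plausible way for a subclass to supply this check is sketched below. All class names are hypothetical, and two points are assumptions rather than documented behavior: that isAvailable() delegates to fetchIsDone() and remembers a positive result, and that "exists and is non-empty" is an acceptable cheap readiness test for a local file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical base class; it only mirrors the isAvailable()/fetchIsDone()
// pairing described above and is not the real WmfDumpFile.
abstract class DumpFileSketch {
    private boolean isDone = false;

    // Assumption: repeat the check until it succeeds once, then cache it.
    public boolean isAvailable() {
        if (!isDone) {
            isDone = fetchIsDone();
        }
        return isDone;
    }

    protected abstract boolean fetchIsDone();
}

// Hypothetical local dump: treats an existing, non-empty file as "done".
class LocalDumpSketch extends DumpFileSketch {
    private final Path file;

    LocalDumpSketch(Path file) {
        this.file = file;
    }

    @Override
    protected boolean fetchIsDone() {
        try {
            return Files.isRegularFile(file) && Files.size(file) > 0;
        } catch (IOException e) {
            return false; // an unreadable file counts as not ready
        }
    }
}

class FetchIsDoneDemo {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("dump-sketch", ".json");
        DumpFileSketch dump = new LocalDumpSketch(tmp);
        System.out.println(dump.isAvailable()); // prints false (file is empty)
        Files.write(tmp, "{}".getBytes());
        System.out.println(dump.isAvailable()); // prints true
        Files.delete(tmp);
    }
}
```

Caching only positive results lets a caller poll a dump that is still being produced without repeating the check once it has succeeded.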
-
getDumpFilePostfix
Returns the ending used by the Wikimedia-provided dumpfile names of the given type.
- Parameters:
dumpContentType
- the type of dump
- Returns:
- postfix of the dumpfile name
- Throws:
IllegalArgumentException
- if the given dump file type is not known
-
getDumpFileWebDirectory
Returns the absolute directory on the Web site where dumpfiles of the given type can be found.
- Parameters:
dumpContentType
- the type of dump
projectName
- the project name, e.g. "wikidatawiki"
- Returns:
- absolute web directory for the current dumpfiles
- Throws:
IllegalArgumentException
- if the given dump file type is not known
-
getDumpFileCompressionType
Returns the compression type of this kind of dump file using file suffixes.
- Parameters:
fileName
- the name of the file
- Returns:
- compression type
- Throws:
IllegalArgumentException
- if the given dump file type is not known
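Suffix-based detection of this kind can be sketched as below. The enum is a hypothetical stand-in for CompressionType, and the mapping (".gz" to GZIP, ".bz2" to BZ2, anything else uncompressed) is an assumption about which suffixes matter, not a statement of what the library does. The file names are invented for illustration.

```java
class CompressionSketch {

    // Hypothetical enum standing in for the library's CompressionType.
    enum Compression { NONE, GZIP, BZ2 }

    // Assumed mapping: ".gz" -> GZIP, ".bz2" -> BZ2, anything else -> NONE.
    static Compression fromFileName(String fileName) {
        if (fileName.endsWith(".gz")) {
            return Compression.GZIP;
        }
        if (fileName.endsWith(".bz2")) {
            return Compression.BZ2;
        }
        return Compression.NONE;
    }

    public static void main(String[] args) {
        System.out.println(fromFileName("dump.json.gz"));  // prints GZIP
        System.out.println(fromFileName("pages.xml.bz2")); // prints BZ2
        System.out.println(fromFileName("sites.sql"));     // prints NONE
    }
}
```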
-
getDumpFileDirectoryName
Returns the name of the directory where the dumpfile of the given type and date should be stored.
- Parameters:
dumpContentType
- the type of the dump
dateStamp
- the date of the dump in format YYYYMMDD
- Returns:
- the local directory name for the dumpfile
-
getDateStampFromDumpFileDirectoryName
public static String getDateStampFromDumpFileDirectoryName(DumpContentType dumpContentType, String directoryName)
Extracts the date stamp from a dumpfile directory name in the form that is created by getDumpFileDirectoryName(DumpContentType, String). The format of the given directory name is not checked; if it is wrong, the result will not be a date stamp but some other string.
- Parameters:
dumpContentType
- the type of the dump
directoryName
- the name of the dumpfile directory
- Returns:
- the date stamp
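The round trip between the two directory-name methods, and the consequence of skipping validation, can be illustrated with a small sketch. The "<type>-<dateStamp>" layout, the plain String type argument, and both method names are assumptions made for illustration; the real directory layout may differ.

```java
class DirectoryNameSketch {

    // Assumed layout "<type>-<dateStamp>"; the real layout may differ.
    static String directoryName(String type, String dateStamp) {
        return type + "-" + dateStamp;
    }

    // Strips the assumed "<type>-" prefix without validating the input.
    static String dateStampFrom(String type, String directoryName) {
        return directoryName.substring(type.length() + 1);
    }

    public static void main(String[] args) {
        String dir = directoryName("json", "20150815");
        System.out.println(dir);                               // prints json-20150815
        System.out.println(dateStampFrom("json", dir));        // prints 20150815
        // A name in the wrong format yields some other string, not a date.
        System.out.println(dateStampFrom("json", "backup-old")); // prints p-old
    }
}
```

A caller that cannot trust the directory name would need to validate the result itself, for example by checking that it consists of eight digits.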
-
getDumpFileName
public static String getDumpFileName(DumpContentType dumpContentType, String projectName, String dateStamp)
Returns the name of the dump file for the given type, project, and date. This is the name used online and also locally when downloading the file.
- Parameters:
dumpContentType
- the type of the dump
projectName
- the project name, e.g. "wikidatawiki"
dateStamp
- the date of the dump in format YYYYMMDD
- Returns:
- file name string
-
isRevisionDumpFile
Returns true if the given dump file type contains page revisions and false if it does not. Dumps that do not contain pages are for auxiliary information such as linked sites.
- Parameters:
dumpContentType
- the type of dump
- Returns:
- true if the dumpfile contains revisions
- Throws:
IllegalArgumentException
- if the given dump file type is not known
-