Package org.wikidata.wdtk.dumpfiles
Interface MwDumpFile
- All Known Implementing Classes:
JsonOnlineDumpFile, MwLocalDumpFile, WmfDumpFile, WmfLocalDumpFile, WmfOnlineStandardDumpFile
public interface MwDumpFile
Representation of MediaWiki dump files, which provides access to important
basic properties of dumps, and to the content of the dump itself.
- Author:
- Markus Kroetzsch
-
Nested Class Summary
Modifier and Type | Description
static class | Comparator to sort dumps by date.
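To illustrate what sorting dumps by date can look like, here is a minimal sketch. It does not reproduce the nested comparator class itself, whose ordering direction is not stated here. It only relies on the documented fact that getDateStamp() returns a YYYYMMDD string, which sorts chronologically under plain string comparison. The `Dump` interface and `DateSortSketch` class are hypothetical stand-ins, not part of the package.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class DateSortSketch {
    // Hypothetical stand-in exposing only the documented getDateStamp() method.
    interface Dump {
        String getDateStamp();
    }

    // YYYYMMDD strings have fixed width, so lexicographic order equals date order.
    static final Comparator<Dump> BY_DATE = Comparator.comparing(Dump::getDateStamp);

    // Wraps each stamp in a Dump, sorts oldest first, and returns the stamps again.
    static List<String> sortedStamps(List<String> stamps) {
        List<Dump> dumps = new ArrayList<>();
        for (String stamp : stamps) {
            dumps.add(() -> stamp);
        }
        dumps.sort(BY_DATE);
        List<String> result = new ArrayList<>();
        for (Dump dump : dumps) {
            result.add(dump.getDateStamp());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [20141208, 20150112, 20150301]
        System.out.println(sortedStamps(Arrays.asList("20150112", "20141208", "20150301")));
    }
}
```

Using `BY_DATE.reversed()` would put the most recent dump first, which is often what a caller wants when picking a dump to process.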
Method Summary
Modifier and Type | Method | Description
String | getDateStamp() | Returns the date stamp for this dump.
DumpContentType | getDumpContentType() | Returns information about the content of the dump.
BufferedReader | getDumpFileReader() | Returns a buffered reader that provides access to the (uncompressed) text content of the dump file.
InputStream | getDumpFileStream() | Returns an input stream that provides access to the (uncompressed) text content of the dump file.
String | getProjectName() | Returns the project name for this dump.
boolean | isAvailable() | Checks if the dump is actually available.
void | prepareDumpFile() | Prepares the dump file for access via getDumpFileStream() or getDumpFileReader().
-
Method Details
-
isAvailable
boolean isAvailable()
Checks if the dump is actually available. Should be called before getDumpFileReader(). Depending on the type of dump file, this triggers one or more checks to make sure that all relevant data can be accessed for this dump file. This is no definite guarantee that the download will succeed, since IO errors can always occur, but it helps to detect cases where the dump is clearly not in a usable state.
- Returns:
- true if the dump file is likely to be available
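The recommended call order can be sketched as follows: check isAvailable() first, then prepare the dump, then read from it. This is only an illustration under stated assumptions. The nested `Dump` interface is a hypothetical stand-in that mirrors three of the documented methods, and `fakeDump` is an in-memory fake invented for the example. Real code would use an actual MwDumpFile implementation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class AvailabilitySketch {
    // Hypothetical stand-in mirroring three documented methods of the interface.
    interface Dump {
        boolean isAvailable();
        void prepareDumpFile() throws IOException;
        BufferedReader getDumpFileReader() throws IOException;
    }

    // Returns the first line of the dump, or null if the dump is unavailable or empty.
    static String firstLine(Dump dump) throws IOException {
        if (!dump.isAvailable()) {
            return null; // clearly unusable, so do not attempt to read
        }
        dump.prepareDumpFile();
        // Even after a positive check, reading may still fail with an IOException.
        try (BufferedReader reader = dump.getDumpFileReader()) {
            return reader.readLine();
        }
    }

    // In-memory fake used only for this example.
    static Dump fakeDump(final boolean available, final String content) {
        return new Dump() {
            @Override public boolean isAvailable() { return available; }
            @Override public void prepareDumpFile() { /* nothing to download */ }
            @Override public BufferedReader getDumpFileReader() {
                return new BufferedReader(new StringReader(content));
            }
        };
    }

    // Convenience wrapper that converts the checked exception for the demo.
    static String demo(boolean available) {
        try {
            return firstLine(fakeDump(available, "line one\nline two\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(true));  // prints line one
        System.out.println(demo(false)); // prints null
    }
}
```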
-
getProjectName
String getProjectName()
Returns the project name for this dump. Together with the dump content type and date stamp, this identifies the dump, and it is therefore always available.
- Returns:
- a project name string
-
getDateStamp
String getDateStamp()
Returns the date stamp for this dump. Together with the project name and dump content type, this identifies the dump, and it is therefore always available.
- Returns:
- a string that represents a date in format YYYYMMDD
-
getDumpContentType
DumpContentType getDumpContentType()
Returns information about the content of the dump. Together with the project name and date stamp, this identifies the dump, and it is therefore always available.
- Returns:
- the content type of this dump
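Since project name, content type, and date stamp together identify a dump, one possible use is to build a composite key from them, for example to skip dumps that were already processed. The sketch below assumes a hypothetical `key` helper and made-up example values ("wikidatawiki", "daily", "full"). With a real dump, the three parts would come from getProjectName(), getDumpContentType(), and getDateStamp().

```java
import java.util.HashSet;
import java.util.Set;

public class DumpKeySketch {
    // Hypothetical helper: joins the three identifying properties into one key.
    static String key(String projectName, String contentType, String dateStamp) {
        return projectName + "-" + contentType + "-" + dateStamp;
    }

    // Counts how many distinct dumps appear in a list of {project, type, date} triples.
    static int countDistinct(String[][] dumps) {
        Set<String> seen = new HashSet<>();
        for (String[] dump : dumps) {
            seen.add(key(dump[0], dump[1], dump[2]));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        String[][] dumps = {
            {"wikidatawiki", "daily", "20150112"},
            {"wikidatawiki", "daily", "20150112"}, // same dump listed twice
            {"wikidatawiki", "full", "20150112"},  // same date, different content type
        };
        System.out.println(countDistinct(dumps)); // prints 2
    }
}
```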
-
getDumpFileStream
InputStream getDumpFileStream() throws IOException
Returns an input stream that provides access to the (uncompressed) text content of the dump file. It is important to close the stream after use.
- Returns:
- an input stream to read the dump file
- Throws:
- IOException: if the dump file contents could not be accessed
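One way to make sure the stream is always closed is Java's try-with-resources statement. The sketch below is an assumption-laden illustration, not code from the package. `TrackingStream` is a hypothetical in-memory stream that records whether close() was called, standing in for whatever getDumpFileStream() would return.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class StreamCloseSketch {
    // Hypothetical stream that remembers whether it has been closed.
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Reads the stream to the end and returns the byte count.
    // try-with-resources closes the stream even if read() throws.
    static long drain(InputStream in) throws IOException {
        try (InputStream stream = in) {
            long total = 0;
            byte[] buffer = new byte[8192];
            int n;
            while ((n = stream.read(buffer)) != -1) {
                total += n;
            }
            return total;
        }
    }

    // Runs the sketch on three bytes of sample content and reports the outcome.
    static String demo() {
        TrackingStream stream = new TrackingStream("abc".getBytes(StandardCharsets.UTF_8));
        try {
            long bytes = drain(stream);
            return bytes + " bytes, closed=" + stream.closed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 3 bytes, closed=true
    }
}
```

The same pattern applies to the reader returned by getDumpFileReader(), since BufferedReader is also closeable.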
-
getDumpFileReader
BufferedReader getDumpFileReader() throws IOException
Returns a buffered reader that provides access to the (uncompressed) text content of the dump file. It is important to close the reader after use.
- Returns:
- a buffered reader to read the dump file
- Throws:
- IOException: if the dump file contents could not be accessed
-
prepareDumpFile
void prepareDumpFile() throws IOException
Prepares the dump file for access via getDumpFileStream() or getDumpFileReader(). In particular, this will download any remote files.
- Throws:
- IOException: if there was a problem preparing the files
-