Package org.wikidata.wdtk.dumpfiles
Class MwLocalDumpFile
java.lang.Object
org.wikidata.wdtk.dumpfiles.MwLocalDumpFile
- All Implemented Interfaces:
MwDumpFile
Class for representing dump files that are found at arbitrary (local) file
paths. The meta-data for the dump file (content type, time stamp, etc.) can
be set explicitly, or be guessed from the file name (to the extent possible).
- Author:
- Markus Damm, Markus Kroetzsch
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.wikidata.wdtk.dumpfiles.MwDumpFile
MwDumpFile.DateComparator
-
Constructor Summary
Constructor    Description
MwLocalDumpFile(String filepath)
- Constructor.
MwLocalDumpFile(String filePath, DumpContentType dumpContentType, String dateStamp, String projectName)
- Constructor.
-
Method Summary
Modifier and Type    Method    Description
getDateStamp()
- Returns the date stamp for this dump.
getDumpContentType()
- Returns information about the content of the dump.
getDumpFileReader()
- Returns a buffered reader that provides access to the (uncompressed) text content of the dump file.
getDumpFileStream()
- Returns an input stream that provides access to the (uncompressed) text content of the dump file.
getPath()
- Returns the absolute path to this dump file.
getProjectName()
- Returns the project name for this dump.
boolean isAvailable()
- Checks if the dump is actually available.
void prepareDumpFile()
- Prepares the dump file for access via MwDumpFile.getDumpFileStream() or MwDumpFile.getDumpFileReader().
toString()
-
Constructor Details
-
MwLocalDumpFile
Constructor. The DumpContentType will be inferred from the name of the file, if possible. If that is not possible, it will be set to JSON by default.
- Parameters:
filepath
- Path to the dump file in the file system
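To illustrate the fallback behaviour described for this constructor, here is a minimal sketch of inferring a content type from a file name and defaulting to JSON. The enum SketchContentType, the class ContentTypeGuess, and the matching rules are all hypothetical and chosen for illustration only; they are not the rules this class actually applies.

```java
import java.util.Locale;

/** Hypothetical stand-in for the dump content types; not the real DumpContentType. */
enum SketchContentType { JSON, SITES, FULL }

class ContentTypeGuess {
    /**
     * Guesses a content type from a file name using made-up rules,
     * falling back to JSON when nothing matches.
     */
    static SketchContentType guess(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.contains(".json")) {
            return SketchContentType.JSON;
        }
        if (name.contains(".sql")) {
            return SketchContentType.SITES;
        }
        if (name.contains(".xml")) {
            return SketchContentType.FULL;
        }
        // Nothing recognisable in the name: use the documented default.
        return SketchContentType.JSON;
    }

    public static void main(String[] args) {
        System.out.println(guess("wikidatawiki-20150105-all.json.gz"));
        System.out.println(guess("mystery-file.bin"));
    }
}
```

Because an unrecognised name silently becomes JSON in this sketch, callers who know the content type may prefer the four-argument constructor and pass it explicitly.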
-
MwLocalDumpFile
public MwLocalDumpFile(String filePath, DumpContentType dumpContentType, String dateStamp, String projectName)
Constructor.
- Parameters:
filePath
- Path to the dump file in the file system
dumpContentType
- DumpContentType of the dump file, or null if it is not known and should be guessed from the file name; this information is essential to invoke the correct processing code to read the dump file
dateStamp
- dump date in format YYYYMMDD, or null if it is not known and should be guessed from the file name; this is mainly used for logs and messages
projectName
- project name string, or null to use a default string; this is mainly used for logs and messages
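Passing null for dateStamp asks for the date to be guessed from the file name. The following sketch shows one way such a guess could work, by looking for an isolated run of eight digits. The class DateStampGuess and its regular expression are assumptions made for this example and may differ from what the class does internally; the sketch also does not check that the digits form a valid calendar date.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateStampGuess {
    // Exactly eight digits that are not part of a longer run of digits.
    private static final Pattern EIGHT_DIGITS =
            Pattern.compile("(?<!\\d)(\\d{8})(?!\\d)");

    /** Returns the first YYYYMMDD-looking token in the file name, if any. */
    static Optional<String> guess(String fileName) {
        Matcher matcher = EIGHT_DIGITS.matcher(fileName);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(guess("wikidatawiki-20150105-all.json.gz").orElse("unknown"));
        System.out.println(guess("dump.json.gz").orElse("unknown"));
    }
}
```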
-
-
Method Details
-
getPath
Returns the absolute path to this dump file.
- Returns:
- path
-
isAvailable
public boolean isAvailable()
Description copied from interface: MwDumpFile
Checks if the dump is actually available. Should be called before MwDumpFile.getDumpFileReader(). Depending on the type of dump file, this will trigger one or more checks to make sure that all relevant data can be accessed for this dump file. This is still no definite guarantee that the download will succeed, since there can always be IO errors, but it helps to detect cases where the dump is clearly not in a usable state.
- Specified by:
isAvailable in interface MwDumpFile
- Returns:
- true if the dump file is likely to be available
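For a file on the local disk, an availability check of this kind could plausibly amount to testing that the path points to a readable regular file. The sketch below shows that idea with the standard java.nio.file API; the class LocalAvailability and the exact conditions are assumptions, not a description of the checks this method performs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class LocalAvailability {
    /** A plausible local check: the path is a regular file that can be read. */
    static boolean looksAvailable(Path path) {
        return Files.isRegularFile(path) && Files.isReadable(path);
    }

    public static void main(String[] args) throws IOException {
        Path existing = Files.createTempFile("dump-sketch", ".json");
        try {
            System.out.println(looksAvailable(existing));
            System.out.println(looksAvailable(Paths.get("no-such-dump-file.json")));
        } finally {
            Files.deleteIfExists(existing);
        }
    }
}
```

As the description notes, a positive answer is only a hint: the file can still disappear or fail to read between the check and the actual access.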
-
getProjectName
Description copied from interface: MwDumpFile
Returns the project name for this dump. Together with the dump content type and date stamp, this identifies the dump, and it is therefore always available.
- Specified by:
getProjectName in interface MwDumpFile
- Returns:
- a project name string
-
getDateStamp
Description copied from interface: MwDumpFile
Returns the date stamp for this dump. Together with the project name and dump content type, this identifies the dump, and it is therefore always available.
- Specified by:
getDateStamp in interface MwDumpFile
- Returns:
- a string that represents a date in format YYYYMMDD
-
getDumpContentType
Description copied from interface: MwDumpFile
Returns information about the content of the dump. Together with the project name and date stamp, this identifies the dump, and it is therefore always available.
- Specified by:
getDumpContentType in interface MwDumpFile
- Returns:
- the content type of this dump
-
getDumpFileStream
Description copied from interface: MwDumpFile
Returns an input stream that provides access to the (uncompressed) text content of the dump file. It is important to close the stream after use.
- Specified by:
getDumpFileStream in interface MwDumpFile
- Returns:
- an input stream to read the dump file
- Throws:
IOException
- if the dump file contents could not be accessed
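The sketch below illustrates what "uncompressed" access and closing the stream can look like in practice. It uses only the JDK: a hypothetical helper open() wraps a gzip-compressed file in a GZIPInputStream based on its extension, and the caller reads it inside try-with-resources so the stream is closed even if reading fails. The class UncompressedStreamSketch and the extension rule are assumptions for illustration; the compression formats this method actually handles are not stated here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class UncompressedStreamSketch {
    /** Opens the file, unpacking gzip when the name ends in .gz. */
    static InputStream open(Path path) throws IOException {
        InputStream raw = Files.newInputStream(path);
        if (!path.getFileName().toString().endsWith(".gz")) {
            return raw;
        }
        try {
            return new GZIPInputStream(raw);
        } catch (IOException e) {
            raw.close(); // do not leak the file handle if the gzip header is bad
            throw e;
        }
    }

    /** Writes a small gzip file, reads it back through open(), and returns the text. */
    static String roundTrip(String text) throws IOException {
        Path file = Files.createTempFile("dump-sketch", ".json.gz");
        try {
            try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(file))) {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            }
            // try-with-resources closes the stream even if reading fails.
            try (InputStream in = open(file)) {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("{\"id\":\"Q42\"}"));
    }
}
```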
-
getDumpFileReader
Description copied from interface: MwDumpFile
Returns a buffered reader that provides access to the (uncompressed) text content of the dump file. It is important to close the reader after use.
- Specified by:
getDumpFileReader in interface MwDumpFile
- Returns:
- a buffered reader to read the dump file
- Throws:
IOException
- if the dump file contents could not be accessed
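A typical way to honour the "close the reader after use" requirement is to consume the reader line by line inside try-with-resources. The sketch below does this with an in-memory StringReader standing in for the reader that this method would return; the class ReaderSketch and the idea of counting non-empty lines are illustrative assumptions only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class ReaderSketch {
    /** Counts non-empty lines and closes the reader when done. */
    static int countRecords(BufferedReader source) throws IOException {
        int count = 0;
        // The reader is closed automatically, also when readLine() throws.
        try (BufferedReader reader = source) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader standIn = new BufferedReader(new StringReader("first\n\nsecond\n"));
        System.out.println(countRecords(standIn));
    }
}
```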
-
prepareDumpFile
public void prepareDumpFile()
Description copied from interface: MwDumpFile
Prepares the dump file for access via MwDumpFile.getDumpFileStream() or MwDumpFile.getDumpFileReader(). In particular, this will download any remote files.
- Specified by:
prepareDumpFile in interface MwDumpFile
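Taken together, the method descriptions suggest an access sequence of checking availability, preparing the file, and then reading it. The sketch below shows one plausible ordering using a hypothetical interface, DumpSourceSketch, that only mirrors the documented method names, and an in-memory implementation so the example runs without any dump file. Neither type is part of the library, and the checked exceptions declared here are assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;

/** Hypothetical interface that only mirrors the method names documented above. */
interface DumpSourceSketch {
    boolean isAvailable();
    void prepareDumpFile() throws IOException;
    BufferedReader getDumpFileReader() throws IOException;
}

/** In-memory stand-in so the example runs without any dump file. */
class InMemoryDump implements DumpSourceSketch {
    private final String content;
    private boolean prepared = false;

    InMemoryDump(String content) {
        this.content = content;
    }

    public boolean isAvailable() {
        return content != null;
    }

    public void prepareDumpFile() {
        prepared = true; // nothing to download for in-memory content
    }

    public BufferedReader getDumpFileReader() throws IOException {
        if (!prepared) {
            throw new IOException("dump was not prepared");
        }
        return new BufferedReader(new StringReader(content));
    }
}

class AccessOrderSketch {
    /** Checks availability, prepares the dump, then reads its first line. */
    static Optional<String> firstLine(DumpSourceSketch dump) throws IOException {
        if (!dump.isAvailable()) {
            return Optional.empty();
        }
        dump.prepareDumpFile();
        try (BufferedReader reader = dump.getDumpFileReader()) {
            return Optional.ofNullable(reader.readLine());
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstLine(new InMemoryDump("line one\nline two")).orElse("(none)"));
    }
}
```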
-
toString
-