Interface MwDumpFile

All Known Implementing Classes:
JsonOnlineDumpFile, MwLocalDumpFile, WmfDumpFile, WmfLocalDumpFile, WmfOnlineStandardDumpFile

public interface MwDumpFile
Representation of a MediaWiki dump file. It provides access to the basic properties of the dump and to the content of the dump itself.
Author:
Markus Kroetzsch
  • Method Details

    • isAvailable

      boolean isAvailable()
      Checks whether the dump is actually available. Should be called before getDumpFileReader(). Depending on the type of dump file, this triggers one or more checks to make sure that all relevant data can be accessed for this dump file. This does not guarantee that the download will succeed, since IO errors can always occur, but it helps to detect cases where the dump is clearly not in a usable state.
      Returns:
      true if the dump file is likely to be available
    • getProjectName

      String getProjectName()
      Returns the project name for this dump. Together with the dump content type and date stamp, the project name identifies the dump and is therefore always available.
      Returns:
      a project name string
    • getDateStamp

      String getDateStamp()
      Returns the date stamp for this dump. Together with the project name and dump content type, the date stamp identifies the dump and is therefore always available.
      Returns:
      a string that represents a date in format YYYYMMDD
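Since the date stamp is documented as a string in the format YYYYMMDD, it can be turned into a date object with the standard java.time API. The following is a minimal sketch; the class name DateStampExample and the sample value "20240101" are illustrative assumptions, and in real code the string would come from getDateStamp().

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper class; not part of the documented interface.
final class DateStampExample {

    // BASIC_ISO_DATE parses dates written without separators, such as "20240101".
    // A string that does not match throws java.time.format.DateTimeParseException.
    static LocalDate parseDateStamp(String dateStamp) {
        return LocalDate.parse(dateStamp, DateTimeFormatter.BASIC_ISO_DATE);
    }

    public static void main(String[] args) {
        // Sample value; a real value would be returned by getDateStamp().
        LocalDate date = parseDateStamp("20240101");
        System.out.println(date); // prints 2024-01-01
    }
}
```

Working with a LocalDate rather than the raw string makes it straightforward to compare or sort dumps by date.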
    • getDumpContentType

      DumpContentType getDumpContentType()
      Returns information about the content of the dump. Together with the project name and date stamp, the content type identifies the dump and is therefore always available.
      Returns:
      the content type of this dump
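The three getters above are described as jointly identifying a dump. One way to use that, sketched here under assumptions, is a small value class that serves as a map key or for duplicate detection. DumpKey is a hypothetical class, the content type is stored as a plain string (for example the result of calling toString() on the returned DumpContentType), and the sample values are made up for illustration.

```java
import java.util.Objects;

// Hypothetical value class; not part of the documented interface.
final class DumpKey {
    private final String projectName;
    private final String contentType;
    private final String dateStamp;

    DumpKey(String projectName, String contentType, String dateStamp) {
        this.projectName = Objects.requireNonNull(projectName);
        this.contentType = Objects.requireNonNull(contentType);
        this.dateStamp = Objects.requireNonNull(dateStamp);
    }

    // Two keys are equal only if all three identifying parts match.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof DumpKey)) return false;
        DumpKey that = (DumpKey) other;
        return projectName.equals(that.projectName)
                && contentType.equals(that.contentType)
                && dateStamp.equals(that.dateStamp);
    }

    @Override
    public int hashCode() {
        return Objects.hash(projectName, contentType, dateStamp);
    }

    public static void main(String[] args) {
        // Sample values; real ones would come from getProjectName(),
        // getDumpContentType().toString() and getDateStamp().
        DumpKey a = new DumpKey("wikidatawiki", "JSON", "20240101");
        DumpKey b = new DumpKey("wikidatawiki", "JSON", "20240108");
        System.out.println(a.equals(b)); // prints false: the date stamps differ
    }
}
```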
    • getDumpFileStream

      InputStream getDumpFileStream() throws IOException
      Returns an input stream that provides access to the (uncompressed) text content of the dump file.

      It is important to close the stream after use.

      Returns:
      an input stream to read the dump file
      Throws:
      IOException - if the dump file contents could not be accessed
    • getDumpFileReader

      BufferedReader getDumpFileReader() throws IOException
      Returns a buffered reader that provides access to the (uncompressed) text content of the dump file.

      It is important to close the reader after use.

      Returns:
      a buffered reader to read the dump file
      Throws:
      IOException - if the dump file contents could not be accessed
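Both getDumpFileStream() and getDumpFileReader() ask the caller to close the returned resource. A try-with-resources statement is the usual Java idiom for this. The sketch below uses an in-memory reader as a stand-in for the one returned by getDumpFileReader(); the class name ReaderExample and the method countLines are illustrative assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical helper class; not part of the documented interface.
final class ReaderExample {

    // Counts the lines of the given reader and closes it afterwards,
    // even if reading fails with an exception.
    static long countLines(BufferedReader reader) throws IOException {
        try (BufferedReader r = reader) {
            long count = 0;
            while (r.readLine() != null) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for dumpFile.getDumpFileReader().
        BufferedReader reader = new BufferedReader(new StringReader("first\nsecond\nthird"));
        System.out.println(countLines(reader)); // prints 3
    }
}
```

The same pattern applies to the InputStream returned by getDumpFileStream().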
    • prepareDumpFile

      void prepareDumpFile() throws IOException
      Prepares the dump file for access via getDumpFileStream() or getDumpFileReader(). In particular, this will download any remote files.
      Throws:
      IOException - if there was a problem preparing the files
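Taken together, the methods above suggest a call order of isAvailable(), then prepareDumpFile(), then getDumpFileReader(). The sketch below illustrates that order. To stay self-contained it declares DumpSource, a hypothetical stand-in interface that copies only three of the documented signatures, plus an in-memory fake; whether an explicit prepareDumpFile() call is required before reading is an assumption here, not something the documentation above states.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;

// Hypothetical example class; not part of the documented interface.
final class DumpWorkflowExample {

    // Stand-in that copies three method signatures documented above.
    interface DumpSource {
        boolean isAvailable();
        void prepareDumpFile() throws IOException;
        BufferedReader getDumpFileReader() throws IOException;
    }

    // Returns the first line of the dump, or empty if the dump is
    // unavailable or has no content.
    static Optional<String> firstLine(DumpSource dump) throws IOException {
        if (!dump.isAvailable()) {
            return Optional.empty();
        }
        dump.prepareDumpFile(); // assumed step: fetch remote files up front
        try (BufferedReader reader = dump.getDumpFileReader()) {
            return Optional.ofNullable(reader.readLine());
        }
    }

    // In-memory fake used only to make the sketch runnable.
    static DumpSource inMemory(final String content, final boolean available) {
        return new DumpSource() {
            @Override public boolean isAvailable() { return available; }
            @Override public void prepareDumpFile() { /* nothing to download */ }
            @Override public BufferedReader getDumpFileReader() {
                return new BufferedReader(new StringReader(content));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstLine(inMemory("line one\nline two", true)).orElse("(none)"));
    }
}
```

Even when isAvailable() returns true, the later calls can still fail with an IOException, so callers should be prepared to handle it.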