Class MwLocalDumpFile

java.lang.Object
org.wikidata.wdtk.dumpfiles.MwLocalDumpFile
All Implemented Interfaces:
MwDumpFile

public class MwLocalDumpFile extends Object implements MwDumpFile
Represents dump files found at arbitrary local file paths. The metadata for the dump file (content type, date stamp, etc.) can be set explicitly or guessed from the file name, to the extent possible.
Author:
Markus Damm, Markus Kroetzsch
  • Constructor Details

    • MwLocalDumpFile

      public MwLocalDumpFile(String filepath)
      Constructor. The DumpContentType is inferred from the file name if possible; otherwise it defaults to JSON.
      Parameters:
      filepath - Path to the dump file in the file system
    • MwLocalDumpFile

      public MwLocalDumpFile(String filePath, DumpContentType dumpContentType, String dateStamp, String projectName)
      Constructor.
      Parameters:
      filePath - Path to the dump file in the file system
      dumpContentType - DumpContentType of the dump file, or null if it is unknown and should be guessed from the file name; this information is essential for invoking the correct processing code to read the dump file
      dateStamp - dump date in format YYYYMMDD, or null if it is unknown and should be guessed from the file name; this is mainly used for logs and messages
      projectName - project name string, or null to use a default string; this is mainly used for logs and messages
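      The following is a minimal sketch of how a YYYYMMDD date stamp could be guessed from a file name. It is an illustration only: the class DateStampGuesser and its guess method are hypothetical names that are not part of WDTK, and the matching rule (first standalone run of eight digits that forms a valid calendar date) is an assumption, not the rule this class actually applies.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of WDTK: guesses a YYYYMMDD stamp from a file name. */
final class DateStampGuesser {

    // A run of exactly eight digits that is not embedded in a longer digit sequence.
    private static final Pattern EIGHT_DIGITS =
            Pattern.compile("(?<!\\d)(\\d{8})(?!\\d)");

    private DateStampGuesser() {
    }

    /**
     * Returns the first eight-digit run in the file name that parses as a
     * calendar date in YYYYMMDD form, or an empty Optional if there is none.
     */
    static Optional<String> guess(String fileName) {
        Matcher matcher = EIGHT_DIGITS.matcher(fileName);
        while (matcher.find()) {
            String candidate = matcher.group(1);
            try {
                // BASIC_ISO_DATE is the JDK formatter for the yyyyMMdd layout.
                LocalDate.parse(candidate, DateTimeFormatter.BASIC_ISO_DATE);
                return Optional.of(candidate);
            } catch (DateTimeParseException e) {
                // Eight digits, but not a real date; keep looking.
            }
        }
        return Optional.empty();
    }
}
```

      A caller could fall back to an explicit dateStamp argument when the returned Optional is empty, which mirrors the choice the four-argument constructor offers between explicit values and null.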
  • Method Details

    • getPath

      public Path getPath()
      Returns the absolute path to this dump file.
      Returns:
      path
    • isAvailable

      public boolean isAvailable()
      Description copied from interface: MwDumpFile
      Checks if the dump is actually available. Should be called before MwDumpFile.getDumpFileReader(). Depending on the type of dump file, this will trigger one or more checks to make sure that all relevant data can be accessed for this dump file. This does not guarantee that the download will succeed, since I/O errors can always occur, but it helps to detect cases where the dump is clearly not in a usable state.
      Specified by:
      isAvailable in interface MwDumpFile
      Returns:
      true if the dump file is likely to be available
    • getProjectName

      public String getProjectName()
      Description copied from interface: MwDumpFile
      Returns the project name for this dump. Together with the dump content type and date stamp, this identifies the dump, and it is therefore always available.
      Specified by:
      getProjectName in interface MwDumpFile
      Returns:
      a project name string
    • getDateStamp

      public String getDateStamp()
      Description copied from interface: MwDumpFile
      Returns the date stamp for this dump. Together with the project name and dump content type, this identifies the dump, and it is therefore always available.
      Specified by:
      getDateStamp in interface MwDumpFile
      Returns:
      a string that represents a date in format YYYYMMDD
    • getDumpContentType

      public DumpContentType getDumpContentType()
      Description copied from interface: MwDumpFile
      Returns information about the content of the dump. Together with the project name and date stamp, this identifies the dump, and it is therefore always available.
      Specified by:
      getDumpContentType in interface MwDumpFile
      Returns:
      the content type of this dump
    • getDumpFileStream

      public InputStream getDumpFileStream() throws IOException
      Description copied from interface: MwDumpFile
      Returns an input stream that provides access to the (uncompressed) text content of the dump file.

      It is important to close the stream after use.

      Specified by:
      getDumpFileStream in interface MwDumpFile
      Returns:
      an input stream to read the dump file
      Throws:
      IOException - if the dump file contents could not be accessed
    • getDumpFileReader

      public BufferedReader getDumpFileReader() throws IOException
      Description copied from interface: MwDumpFile
      Returns a buffered reader that provides access to the (uncompressed) text content of the dump file.

      It is important to close the reader after use.

      Specified by:
      getDumpFileReader in interface MwDumpFile
      Returns:
      a buffered reader to read the dump file
      Throws:
      IOException - if the dump file contents could not be accessed
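      The calling pattern described above (check availability first, then read, then close the reader) can be sketched as follows. To keep the sketch compilable on its own, it declares a small stand-in interface, ReadableDump, that repeats the two MwDumpFile method signatures it needs; ReadableDump and DumpLineCounter are hypothetical names and not part of WDTK. With the library on the class path, an MwDumpFile could be used in the same way.

```java
import java.io.BufferedReader;
import java.io.IOException;

/** Hypothetical helper, not part of WDTK: counts the lines of a dump. */
final class DumpLineCounter {

    /** Stand-in for the two MwDumpFile methods used below (an assumption for this sketch). */
    interface ReadableDump {
        boolean isAvailable();

        BufferedReader getDumpFileReader() throws IOException;
    }

    private DumpLineCounter() {
    }

    /**
     * Counts the lines of the dump. The reader is closed by the
     * try-with-resources statement, even if reading fails midway.
     */
    static long countLines(ReadableDump dump) throws IOException {
        // Check availability before asking for a reader.
        if (!dump.isAvailable()) {
            throw new IOException("Dump file is not available");
        }
        try (BufferedReader reader = dump.getDumpFileReader()) {
            long lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }
}
```

      The same try-with-resources form applies to the InputStream returned by getDumpFileStream(), since InputStream is also AutoCloseable.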
    • prepareDumpFile

      public void prepareDumpFile()
      Description copied from interface: MwDumpFile
      Prepares the dump file for access via MwDumpFile.getDumpFileStream() or MwDumpFile.getDumpFileReader(). In particular, this will download any remote files.
      Specified by:
      prepareDumpFile in interface MwDumpFile
    • toString

      public String toString()
      Overrides:
      toString in class Object