Nessie Server Admin Tool

This page explains how to use the Nessie Server Admin Tool to perform repository maintenance tasks such as:

  • Obtaining information about a Nessie repository;
  • Exporting a Nessie repository to a ZIP file, e.g. to create a backup;
  • Importing a Nessie repository from a ZIP file, e.g. to restore a backup;
  • Migrating from a legacy version store type.

Usage

The Nessie Server Admin Tool requires direct access to the database used by Nessie. The executable is named nessie-server-admin-tool-x.y.z-runner.jar and can be downloaded from the release page on GitHub.

Note

The Nessie Server Admin Tool is an executable jar that can be used to interact with a Nessie database directly. It should not be confused with the Nessie CLI tool, which interacts with Nessie servers via the REST API.

The Nessie Server Admin Tool nessie-server-admin-tool-x.y.z-runner.jar should use the same configuration settings as the Nessie Quarkus server. These settings should be passed to the tool using system properties, environment variables or a configuration file. The most relevant settings are those related to the database connection.
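
When settings are passed as environment variables rather than system properties, the property names have to be translated. The sketch below applies the MicroProfile Config naming rule that Quarkus follows: every character that is not a letter, digit or underscore becomes `_`, and the result is upper-cased. The class and method names are made up for illustration and are not part of Nessie; it is worth checking the resulting names against the Quarkus version in use.

```java
import java.util.List;
import java.util.Locale;

// Illustrative helper, not part of Nessie: derives the environment variable
// name that corresponds to a configuration property name.
class EnvVarNames {

  // MicroProfile Config rule: characters other than letters, digits and '_'
  // are replaced with '_', then the whole name is upper-cased.
  static String toEnvVar(String propertyName) {
    return propertyName.replaceAll("[^A-Za-z0-9_]", "_").toUpperCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    List<String> properties = List.of(
        "nessie.version.store.type",
        "quarkus.mongodb.database",
        "quarkus.mongodb.connection-string");
    for (String property : properties) {
      System.out.println(property + " -> " + toEnvVar(property));
    }
  }
}
```

For example, `nessie.version.store.type` maps to `NESSIE_VERSION_STORE_TYPE`.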

A help command is available to list all available commands and options:

java -jar nessie-server-admin-tool-x.y.z-runner.jar help

Repository information

The simplest command is info, which prints information about the Nessie repository.

For example, here is how to print information about a Nessie repository hosted in a MongoDB database called nessie running on nessie.example.com:27017:

java \
  -Dnessie.version.store.type=MONGODB \
  -Dquarkus.mongodb.database=nessie \
  -Dquarkus.mongodb.connection-string=mongodb://<user>:<password>@nessie.example.com:27017 \
  -jar nessie-server-admin-tool-x.y.z-runner.jar \
  info

The output should look similar to this:

No-ancestor hash:                  2e1cfa82b035c26cbbbdae632cea070514eb8b773f616aaeaf668e2f0be8f10d
Default branch head commit ID:     11b5d0f393ad84da4ae9724654d35b96863eda02101f3ff1e633e0b25e0513db
Default branch commit count:       100
Repository description version:    0
Repository description properties:

From configuration:
-------------------
Version-store type:                MONGODB
Default branch:                    main

Exporting

The following command (replace x.y.z with the version you’re using) exports your Nessie repository to a single ZIP file called my-export-file.zip:

java -jar nessie-server-admin-tool-x.y.z-runner.jar export --path my-export-file.zip

A ZIP file export contains all necessary repository information in a single, compressed file. The export automatically produces a ZIP file only if the output path ends with .zip; otherwise it exports to a directory. You can force either format using the --output-format option.
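
The format selection described above can be sketched as follows. This is an illustration of the documented rule, not the tool's source code: the class, enum and method names are hypothetical, and the plain case-sensitive suffix check is an assumption.

```java
import java.util.Optional;

// Illustrative sketch, not the tool's source code.
class ExportFormatChoice {

  enum OutputFormat { ZIP, DIRECTORY }

  // An explicit --output-format value wins; otherwise the path suffix decides.
  static OutputFormat choose(String path, Optional<OutputFormat> explicitFormat) {
    if (explicitFormat.isPresent()) {
      return explicitFormat.get();
    }
    // Assumption: the suffix check is a plain, case-sensitive ".zip" match.
    return path.endsWith(".zip") ? OutputFormat.ZIP : OutputFormat.DIRECTORY;
  }

  public static void main(String[] args) {
    System.out.println(choose("my-export-file.zip", Optional.empty()));        // ZIP
    System.out.println(choose("my-export-dir", Optional.empty()));             // DIRECTORY
    System.out.println(choose("my-export-dir", Optional.of(OutputFormat.ZIP))); // ZIP
  }
}
```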

Note

Please use the following command for advanced options.

java -jar nessie-server-admin-tool-x.y.z-runner.jar help export

Importing

The following command (replace x.y.z with the version you’re using) imports your Nessie repository from a single ZIP file called my-export-file.zip:

java -jar nessie-server-admin-tool-x.y.z-runner.jar import --path my-export-file.zip

The import fails if the target Nessie repository exists and is not empty. If you intentionally want to overwrite an existing Nessie repository, use the --erase-before-import option.

Note

Please use the following command for advanced options.

java -jar nessie-server-admin-tool-x.y.z-runner.jar help import

Migrating from a legacy version store type

The admin tool can be used to fully migrate a Nessie repository from one version store type to another. See Migration for a detailed example.

Building blocks

  • Export functionality, based on AbstractNessieExporter, to dump commits, named references, and heads + fork points.
  • Import functionality, based on AbstractNessieImporter, to load the exported data.
  • Commit log optimization to:
      • populate the list of parent-commits in all commits, according to the target Nessie repository’s configuration
      • populate the key-lists in the commits, according to the target Nessie repository’s configuration

Code examples

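import java.nio.file.Path;

// Persist, ZipArchiveExporter, ZipArchiveImporter and ImportResult are Nessie
// classes; add the imports that match the Nessie version you build against.
class CodeExamples {

  // Exports the repository reachable through `persist` into a single ZIP file.
  void exportExample(Persist persist, Path exportZipFile) {

    ZipArchiveExporter.builder()
      .outputFile(exportZipFile)
      .persist(persist)
      .build()
      .exportNessieRepository();
  }

  // Imports a previously exported ZIP file into the repository behind `persist`.
  void importExample(Persist persist, Path importZipFile) {

    // The returned ImportResult describes the outcome of the import.
    ImportResult importResult =
      ZipArchiveImporter.builder()
        .sourceZipFile(importZipFile)
        .persist(persist)
        .build()
        .importNessieRepository();
  }
}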

ZipArchiveImporter can be replaced with FileImporter.

Export contents

Each export contains this information:

  • All commits (no specific order)
  • All named references including their heads
  • Heads + fork-points (used to feed the commit-log optimization run after a repository import)
  • Summary and inventory

Content, CommitMeta, global state, et al

A Nessie export contains all Content information without any database-internal information. This means that an export does not record whether the source repository stored Content using, for example, global state. All Content and CommitMeta objects are exported in their public JSON representation.

As a side effect, an export from a Nessie repository with commits that were persisted using global state will be imported using on-reference-state. However, for content that was persisted using global state, there will be multiple on-reference-states referring to the same Iceberg table-metadata.

Technical commit information

Exported commits do not contain key-lists, commit-parents, or similar data. That information is internal and implementation-specific, and the configuration of the target repository, which controls the aggregated key-lists and commit-parent-lists, might differ from that of the source repository.

However, exported commits do contain information about the commit-sequence-number and the technical created-at-timestamp.

Note

The nessie-server-admin-tool’s import command performs a commit-log optimization after all commits and named references have been created. This optimization populates missing aggregated key-lists and commit-parents. Commit-log optimization is needed for good performance when accessing contents and commit logs, but it is not strictly required and can be disabled.

Export contents consistency

Any Nessie export guarantees that the commits referenced by the named references and all their parent commits are contained in the exported data.

A Nessie export may contain unreferenced commits, for example commits that have been created while the export is running or commits that are otherwise unreferenced.

The HEADs of the named references and the heads in the HeadsAndForks structure may not be consistent, for example when commits have been created while the export is running.

Export formats

Exported data can be written either into an empty directory or as a compressed zip file.

Users can optionally zip the contents of a directory export and pass the resulting file to the ZIP-file based importer.
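
Zipping a directory export can be sketched with the JDK's own ZIP support. The example below is a minimal illustration, not part of Nessie: the class and method names are hypothetical, and it assumes the importer expects the exported files at the root of the archive (entry names relative to the export directory), which should be verified against a ZIP produced by the export command itself.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Illustrative sketch, not part of Nessie.
class ZipExportDirectory {

  // Writes every regular file below exportDir into zipFile, using entry names
  // relative to exportDir so the files sit at the root of the archive.
  static void zipDirectory(Path exportDir, Path zipFile) throws IOException {
    List<Path> files;
    try (Stream<Path> walk = Files.walk(exportDir)) {
      files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
    }
    try (OutputStream out = Files.newOutputStream(zipFile);
        ZipOutputStream zip = new ZipOutputStream(out)) {
      for (Path file : files) {
        // ZIP entry names always use '/' as the separator.
        String entryName = exportDir.relativize(file).toString().replace('\\', '/');
        zip.putNextEntry(new ZipEntry(entryName));
        Files.copy(file, zip);
        zip.closeEntry();
      }
    }
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 2) {
      System.err.println("Usage: ZipExportDirectory <export-directory> <zip-file>");
      return;
    }
    zipDirectory(Path.of(args[0]), Path.of(args[1]));
  }
}
```

The file list is collected before the archive is created, so a ZIP file written inside the export directory does not end up containing itself.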

Command Reference

Usage: nessie-server-admin-tool-runner.jar [-hV] [COMMAND]
Nessie Server Admin Tool
  -h, --help      Show this help message and exit.
  -V, --version   Print version information and exit.
Commands:
  info                  Nessie repository information
  help                  Display help information about the specified command.
  cleanup-repository    Cleanup unreferenced data from Nessie's repository.
  cut-history           Advanced commit log manipulation command that removes
                          parents from the specified commit. Read the full help
                          message before using!
  check-content         Check content readability of active keys.
  delete-catalog-tasks  Delete persisted state of Iceberg snapshot loading
                          tasks previously executed by the Nessie Catalog.
  erase-repository      Erase current Nessie repository (all data will be lost)
                          and optionally re-initialize it.
  export                Exports a Nessie repository to the local file system.
  import                Imports a Nessie repository from the local file system.
  show-licenses         Show 3rd party license information.

Below is the output of the Nessie Server Admin tool help for all commands.

info

Usage: nessie-server-admin-tool-runner.jar info [-hV]
Nessie repository information
  -h, --help      Show this help message and exit.
  -V, --version   Print version information and exit.

check-content

Usage: nessie-server-admin-tool-runner.jar check-content [-cEhsV]
       [-B=<batchSize>] [-H=<hash>] [-o=<outputSpec>] [-r=<ref>]
       [-k=<keyElements>]...
Check content readability of active keys.
  -B, --batch=<batchSize>   The max number of keys to load at the same time.
                            If an error occurs while loading or parsing the
                              values for a single key, the error will be
                              propagated to all keys processed in the same
                              batch. In such a case, rerun the check for the
                              affected keys with a batch size of 1.
  -c, --show-content        Include content for each valid key in the output.
  -E, --error-only          Produce JSON only for keys with errors.
  -h, --help                Show this help message and exit.
  -H, --hash=<hash>         Commit hash to use (defaults to the HEAD of the
                              specified reference).
  -k, --key-element=<keyElements>
                            Elements or a specific content key to check (zero
                              or more). If not set, all current keys will be
                              checked.
  -o, --output=<outputSpec> JSON output file name or '-' for STDOUT. If not
                              set, per-key status is not reported.
  -r, --ref=<ref>           Reference name to use (default branch, if not set).
  -s, --summary             Print a summary of results to STDOUT (irrespective
                              of the --output option).
  -V, --version             Print version information and exit.

delete-catalog-tasks

Usage: nessie-server-admin-tool-runner.jar delete-catalog-tasks [-hV]
       [-B=<batchSize>] [-H=<hash>] [-r=<ref>] [-k=<keyElements>]...
       [-s=<statuses>]...
Delete persisted state of Iceberg snapshot loading tasks previously executed by
the Nessie Catalog.
  -B, --batch=<batchSize>   The max number of task IDs to process at the same
                              time.
  -h, --help                Show this help message and exit.
  -H, --hash=<hash>         Commit hash to use (defaults to the HEAD of the
                              specified reference).
  -k, --key-element=<keyElements>
                            Elements or a specific content key to process (zero
                              or more). If not set, all current keys will get
                              their snapshot tasks expired.
  -r, --ref=<ref>           Reference name to use (default branch, if not set).
  -s, --task-status=<statuses>
                            Delete tasks having these statuses (zero or more).
                              If not set, only failed tasks for matching
                              content objects are deleted.
  -V, --version             Print version information and exit.

cleanup-repository

Usage: nessie-server-admin-tool-runner.jar cleanup-repository [-hV]
       [--allow-duplicate-commit-traversal] [--dry-run]
       [--allowed-fpp=<allowedFalsePositiveProbability>]
       [--commit-rate=<resolveCommitRatePerSecond>]
       [--fpp=<falsePositiveProbability>] [--obj-count=<expectedObjCount>]
       [--obj-rate=<resolveObjRatePerSecond>]
       [--pending-objs-batch-size=<pendingObjsBatchSize>]
       [--purge-obj-rate=<purgeDeleteObjRatePerSecond>]
       [--recent-objs-ids-filter-size=<recentObjIdsFilterSize>]
       [--referenced-grace=<objReferencedGrace>]
       [--scan-obj-rate=<purgeScanObjRatePerSecond>]
Cleanup unreferenced data from Nessie's repository.
This is a two-phase implementation that first identifies the objects that are
referenced and the second phase scans the whole repository and deletes objects
that are unreferenced.
It is recommended to run this command regularly, but with appropriate rate
limits using the --commit-rate, --obj-rate, --scan-obj-rate, --purge-obj-rate
which does not overload your backend database system.
The implementation uses a bloom-filter to identify the IDs of referenced
objects. The default setting is to allow for 1000000 objects in the backend
database with an FPP of 1.0E-5. These values should serve most repositories.
However, if your repository is quite big, you should supply a higher expected
object count using the --obj-count option. If the implementation detected that
the bloom-filter would exceed the maximum allowed FPP, it would restart with a
higher number of expected objects.
In rare situations with an extremely huge amount of objects, the data
structures may require a lot of memory. The estimated heap pressure for the
contextual data structures is printed to the console.
If you are unsure whether this command works fine, specify the --dry-run option
to perform all operations except deleting objects.
      --allow-duplicate-commit-traversal
                  Allow traversal of the same commit more than once. This is
                    disabled by default.
      --allowed-fpp=<allowedFalsePositiveProbability>
                  Maximum allowed false-positive-probability to detect
                    referenced objects, defaults to 1.0E-4.
      --commit-rate=<resolveCommitRatePerSecond>
                  Allowed number of commits to process during the 'resolve'
                    phase per second. Default is unlimited.
      --dry-run   Perform all operations, but do not delete any object.
      --fpp=<falsePositiveProbability>
                  Default false-positive-probability to detect referenced
                    objects, defaults to 1.0E-5.
  -h, --help      Show this help message and exit.
      --obj-count=<expectedObjCount>
                  Number of expected objects, defaults to 1000000.
      --obj-rate=<resolveObjRatePerSecond>
                  Allowed number of objects to process during the 'resolve'
                    phase per second. Default is unlimited.
      --pending-objs-batch-size=<pendingObjsBatchSize>

      --purge-obj-rate=<purgeDeleteObjRatePerSecond>
                  Allowed number of objects to delete during the 'purge' phase
                    per second. Default is unlimited.
      --recent-objs-ids-filter-size=<recentObjIdsFilterSize>
                  Size of the filter to recognize recently processed objects.
                    This helps to reduce effort, but should be kept to a
                    reasonable number. Defaults to 100000.
      --referenced-grace=<objReferencedGrace>
                  Grace-time for newly created objects to not be deleted.
                    Default is just "now". Specified using the ISO-8601 format,
                    for example P1D (24 hours) or PT2H (2 hours) or P10DT12H
                    (10 * 24 + 12 hours).
      --scan-obj-rate=<purgeScanObjRatePerSecond>
                  Allowed number of objects to scan during the 'purge' phase
                    per second. Default is unlimited.
  -V, --version   Print version information and exit.
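
Two of the cleanup-repository options can be sanity-checked before a run. The sketch below is an illustration under stated assumptions, not the tool's implementation: it estimates the bloom-filter size with the textbook formula bits = -n * ln(p) / (ln 2)^2 (the tool's actual sizing may differ), and it assumes --referenced-grace values are parsed with java.time.Duration.parse. Under that assumption, a combined days-and-hours value needs the `T` separator, as in P10DT12H (10 * 24 + 12 = 252 hours); a value without it, such as P10D12H, is rejected. The class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.format.DateTimeParseException;

// Illustrative sketch, not the tool's source code.
class CleanupOptionSketch {

  // Textbook bloom-filter sizing: bits = -n * ln(p) / (ln 2)^2.
  static long bloomFilterBits(long expectedObjects, double fpp) {
    return (long) Math.ceil(-expectedObjects * Math.log(fpp) / (Math.log(2) * Math.log(2)));
  }

  // Returns true if java.time.Duration accepts the given grace-time value.
  static boolean isValidGrace(String value) {
    try {
      Duration.parse(value);
      return true;
    } catch (DateTimeParseException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Defaults from the help text: --obj-count=1000000, --fpp=1.0E-5.
    long bits = bloomFilterBits(1_000_000L, 1.0E-5);
    System.out.println("Approximate filter size in bytes: " + bits / 8);

    for (String grace : new String[] {"P1D", "PT2H", "P10DT12H", "P10D12H"}) {
      System.out.println(grace + " valid: " + isValidGrace(grace));
    }
    System.out.println("P10DT12H in hours: " + Duration.parse("P10DT12H").toHours());
  }
}
```

With the default settings the formula gives roughly 24 million bits, or about 3 MB, which illustrates why the defaults are comfortable for most repositories and why memory grows with larger --obj-count values.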

erase-repository

Usage: nessie-server-admin-tool-runner.jar erase-repository [-hV]
       [--confirmation-code=<confirmationCode>] [-r=<newDefaultBranch>]
Erase current Nessie repository (all data will be lost) and optionally
re-initialize it.
      --confirmation-code=<confirmationCode>
                  Confirmation code for erasing the repository (will be emitted
                    by this command if not set).
  -h, --help      Show this help message and exit.
  -r, --re-initialize=<newDefaultBranch>
                  Re-initialize the repository after erasure. If set, provides
                    the default branch name for the new repository.
  -V, --version   Print version information and exit.

export

Usage: nessie-server-admin-tool-runner.jar export [-hV] [--full-scan]
       [-C=<expectedCommitCount>] [--commit-batch-size=<commitBatchSize>]
       [--content-batch-size=<number>] [--export-version=<exportVersion>]
       [-F=<output-format>] [--max-file-size=<maxFileSize>]
       [--output-buffer-size=<outputBufferSize>] -p=<export-to>
       [--single-branch-current-content=<branch-name>]
       [--object-resolvers=<genericObjectResolvers>]...
Exports a Nessie repository to the local file system.
  -C, --expected-commit-count=<expectedCommitCount>
                           Expected number of commits in the repository,
                             defaults to 1000000.
      --commit-batch-size=<commitBatchSize>
                           Batch size when reading commits and their associated
                             contents, defaults to 20.
      --content-batch-size=<number>
                           Group the specified number of content objects into
                             each commit at export time. This option is ignored
                             unless --single-branch-current-content is set. The
                             default value is 100.
      --export-version=<exportVersion>
                           The export version, defaults to 3.
  -F, --output-format=<output-format>
                           Explicitly define the output format to use to the
                             export.
                           If not specified, the implementation chooses the ZIP
                             export, if --path ends in .zip, otherwise will use
                             the directory output format.
                           Possible values: ZIP, DIRECTORY
      --full-scan          Export all commits, including those that are no
                             longer reachable from any named reference. Using
                             this option is _not_ recommended.
  -h, --help               Show this help message and exit.
      --max-file-size=<maxFileSize>
                           Maximum size of a file in bytes inside the export.
      --object-resolvers=<genericObjectResolvers>
                           Additional jars that provide
                             `TransferRelatedObjects` implementations.
                           Jars can be provided as file paths or as URLs.
      --output-buffer-size=<outputBufferSize>
                           Output buffer size, defaults to 32768.
  -p, --path=<export-to>   The ZIP file or directory to create with the export
                             contents.
      --single-branch-current-content=<branch-name>
                           Export only the most recent contents from the
                             specified branch.
  -V, --version            Print version information and exit.

import

Usage: nessie-server-admin-tool-runner.jar import [-ehV]
       [--commit-batch-size=<commitBatchSize>]
       [--input-buffer-size=<inputBufferSize>] -p=<import-from>
Imports a Nessie repository from the local file system.
      --commit-batch-size=<commitBatchSize>
                             Batch size when writing commits, defaults to 20.
  -e, --erase-before-import  Erase an existing repository before the import is
                               started.
                             This will delete all previously existing Nessie
                               data.
                             Using this option has no effect, if the Nessie
                               repository does not already exist.
  -h, --help                 Show this help message and exit.
      --input-buffer-size=<inputBufferSize>
                             Input buffer size, defaults to 32768.
  -p, --path=<import-from>   The ZIP file or directory to read the export from.
                             If this parameter refers to a file, the import
                               will assume that it is a ZIP file, otherwise a
                               directory.
  -V, --version              Print version information and exit.
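
The --path rule for import depends on what the path points to, not on its name. A minimal sketch of that rule, with hypothetical class, enum and method names and no claim to match the tool's code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch, not the tool's source code.
class ImportSourceKind {

  enum Kind { ZIP, DIRECTORY }

  // A regular file is treated as a ZIP archive, anything else as a directory.
  static Kind detect(Path path) {
    return Files.isRegularFile(path) ? Kind.ZIP : Kind.DIRECTORY;
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("nessie-export", ".bin");
    Path dir = Files.createTempDirectory("nessie-export");
    System.out.println(detect(file)); // ZIP, regardless of the file extension
    System.out.println(detect(dir));  // DIRECTORY
  }
}
```

In this sketch a path that does not exist is also classified as a directory; how the tool reports a missing path is not covered here.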

show-licenses

Usage: nessie-server-admin-tool-runner.jar show-licenses [-hV]
Show 3rd party license information.
  -h, --help      Show this help message and exit.
  -V, --version   Print version information and exit.