
Server Configuration

The Nessie server is configurable via properties as listed in the application.properties file.

These properties can be set when starting up the Docker image in two different ways. For example, if you want to configure Nessie to use the JDBC version store and provide a JDBC connection URL, you can either:

  1. Set these values via the JAVA_OPTS_APPEND environment variable in the Docker invocation. Each setting should be inserted inside the variable’s value as -D<name>=<value> pairs:

    docker run -p 19120:19120 \
      -e JAVA_OPTS_APPEND="-Dnessie.version.store.type=JDBC -Dquarkus.datasource.jdbc.url=jdbc:postgresql://host.com:5432/db" \
      ghcr.io/projectnessie/nessie
    
  2. Alternatively, set them via the --env (or -e) option in the Docker invocation. Each setting must be provided separately as --env NAME=value options:

    docker run -p 19120:19120 \
      --env NESSIE_VERSION_STORE_TYPE=JDBC \
      --env QUARKUS_DATASOURCE_JDBC_URL="jdbc:postgresql://host.com:5432/db" \
      ghcr.io/projectnessie/nessie
    

Note how the original property name is converted to an environment variable name, e.g. nessie.version.store.type becomes NESSIE_VERSION_STORE_TYPE. The conversion is done by replacing every character that is not alphanumeric (such as . or -) with _ and converting the name to upper case. See here for more details.
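The conversion rule can be illustrated with a small shell helper. This is a sketch, assuming the MicroProfile Config naming rule used by Quarkus, where every character that is not a letter or digit (including -, not only .) becomes _; the function name to_env_var is hypothetical and not part of Nessie.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Nessie): derive the environment
# variable name for a property, assuming the MicroProfile Config rule:
# every non-alphanumeric character becomes '_', then everything is upper-cased.
to_env_var() {
  printf '%s' "$1" | tr -c '[:alnum:]' '_' | tr '[:lower:]' '[:upper:]'
}

to_env_var nessie.version.store.type; echo                 # NESSIE_VERSION_STORE_TYPE
to_env_var quarkus.datasource.jdbc.url; echo               # QUARKUS_DATASOURCE_JDBC_URL
to_env_var nessie.catalog.service.s3.access-key.name; echo # NESSIE_CATALOG_SERVICE_S3_ACCESS_KEY_NAME
```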

For more information on docker images, see Docker image options below.

Providing secrets

Instead of providing secrets like passwords in clear text, you can also use a keystore. This functionality is provided natively via Quarkus.

Core Nessie Configuration Settings

Core Settings

Nessie server configuration to be injected into the JAX-RS application.

Property Default Value Type Description
nessie.server.default-branch main string The default branch to use if not provided by the user.
nessie.server.send-stacktrace-to-client false boolean Whether stack traces should be sent to the client in case of error. The default is false to not expose internal details for security reasons.
nessie.server.access-checks-batch-size 100 int The number of entity-checks that are grouped into a call to BatchAccessChecker. The default value is quite conservative; it is the responsibility of the operator to adjust this value according to the capabilities of the actual authz implementation. Note that the number of checks can be slightly exceeded by the implementation, depending on the call site.

Related Quarkus settings:

Property Default Value Type Description
quarkus.http.port 19120 int Sets the HTTP port for the Nessie REST API endpoints.
quarkus.management.port 9000 int Sets the HTTP port for management endpoints (health, metrics, Swagger).
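For example, the core settings above and the related Quarkus ports could be collected in a properties file. The following sketch writes such a file; the file name application.properties in the current directory and all values (a default branch named develop, a batch size of 50) are illustrative assumptions, not recommendations.

```shell
#!/bin/sh
# Write an illustrative configuration file; adjust values to your deployment.
cat > application.properties <<'EOF'
# Branch used when a request does not name one (default: main)
nessie.server.default-branch=develop
# Do not expose stack traces to clients (default: false)
nessie.server.send-stacktrace-to-client=false
# Smaller batches for a slower authz implementation (default: 100)
nessie.server.access-checks-batch-size=50
# REST API port and management port (defaults: 19120 and 9000)
quarkus.http.port=19120
quarkus.management.port=9000
EOF
```

The same settings can be passed as environment variables instead, for example NESSIE_SERVER_DEFAULT_BRANCH=develop.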

Info

A complete set of configuration options for Quarkus can be found on quarkus.io.

Catalog and Iceberg REST Settings

Property Default Value Type Description
nessie.catalog.default-warehouse string Name of the default warehouse. This one is used when a warehouse is not specified in a query. If no default warehouse is configured and a request does not specify a warehouse, the request will fail.
nessie.catalog.warehouses.<warehouse-name> Map of warehouse names to warehouse configurations.
nessie.catalog.warehouses.<warehouse-name>.iceberg-config-defaults.<iceberg-property> string Iceberg config defaults specific to this warehouse. They override any defaults specified in nessie.catalog.iceberg-config-defaults.
nessie.catalog.warehouses.<warehouse-name>.iceberg-config-overrides.<iceberg-property> string Iceberg config overrides specific to this warehouse. They override any overrides specified in nessie.catalog.iceberg-config-overrides.
nessie.catalog.warehouses.<warehouse-name>.location string Location of the warehouse. Used to determine the base location of a table.
nessie.catalog.iceberg-config-defaults.<iceberg-property> string Iceberg config defaults applicable to all clients and warehouses. Any properties that are common to all Iceberg clients should be included here. They will be passed to all clients on all warehouses as config defaults. These defaults can be overridden on a per-warehouse basis, see nessie.catalog.warehouses.<warehouse-name>.iceberg-config-defaults.
nessie.catalog.iceberg-config-overrides.<iceberg-property> string Iceberg config overrides applicable to all clients and warehouses. Any properties that are common to all Iceberg clients should be included here. They will be passed to all clients on all warehouses as config overrides. These overrides can be overridden on a per-warehouse basis, see nessie.catalog.warehouses.<warehouse-name>.iceberg-config-overrides.
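A minimal sketch of a warehouse configuration follows. The warehouse name warehouse, the bucket my-bucket, and the use of the Iceberg io-impl property are assumptions chosen for illustration; the warehouse-specific default takes precedence over the global default only for that warehouse.

```shell
#!/bin/sh
# Illustrative catalog configuration with a single warehouse.
cat > application.properties <<'EOF'
# Warehouse used when a request does not specify one
nessie.catalog.default-warehouse=warehouse
# Base location for tables in the warehouse named "warehouse" (hypothetical bucket)
nessie.catalog.warehouses.warehouse.location=s3://my-bucket/warehouse/
# Default passed to all Iceberg clients on all warehouses
nessie.catalog.iceberg-config-defaults.io-impl=org.apache.iceberg.io.ResolvingFileIO
# Warehouse-specific default; takes precedence over the global default above
nessie.catalog.warehouses.warehouse.iceberg-config-defaults.io-impl=org.apache.iceberg.aws.s3.S3FileIO
EOF
```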

S3 settings

Configuration for S3 compatible object stores.

Contains the default settings to be applied to all buckets. Specific settings for each bucket can be specified via the buckets map.

All settings are optional. The defaults of these settings are defined by the AWSSDK Java client.

Property Default Value Type Description
nessie.catalog.service.s3.throttled-retry-after PT10S duration Interval after which a request is retried when S3 responds with a “retry later” response.
nessie.catalog.service.s3.http.expect-continue-enabled boolean Override default behavior whether to expect an HTTP/100-Continue.
nessie.catalog.service.s3.http.connection-time-to-live duration Override default time-to-live of a pooled connection.
nessie.catalog.service.s3.http.connection-max-idle-time duration Override default max idle time of a pooled connection.
nessie.catalog.service.s3.http.connection-acquisition-timeout duration Override default connection acquisition timeout. This is the time a request will wait for a connection from the pool.
nessie.catalog.service.s3.http.connect-timeout duration Override the default TCP connect timeout.
nessie.catalog.service.s3.http.read-timeout duration Override the default connection read timeout.
nessie.catalog.service.s3.http.max-http-connections int Override the default maximum number of pooled connections.
nessie.catalog.service.s3.endpoint uri The default endpoint override to use, if not configured per bucket (see buckets). The endpoint must be specified for private (non-AWS) clouds, either per bucket or here.

If the endpoint URIs for the Nessie server and clients differ, this one defines the endpoint used for the Nessie server.
nessie.catalog.service.s3.external-endpoint uri When using a specific endpoint and the endpoint URIs for the Nessie server and clients differ, you can specify the URI passed down to clients using this setting. Otherwise, clients will receive the value from the endpoint setting.
nessie.catalog.service.s3.path-style-access boolean Whether to use path-style access. If true, path-style access will be used, as in: https://<domain>/<bucket>. If false, a virtual-hosted style will be used instead, as in: https://<bucket>.<domain>. If unspecified, the default will depend on the cloud provider.
nessie.catalog.service.s3.region string The default DNS name of the region to use, if not configured per bucket. The region must be specified for AWS, either per bucket or here.
nessie.catalog.service.s3.access-key The default access-key-id and secret-access-key to use, if not configured per bucket. An access-key-id must be configured, either per bucket or here.
nessie.catalog.service.s3.access-key.name string
nessie.catalog.service.s3.access-key.secret string
nessie.catalog.service.s3.access-point string AWS Access point for this bucket. Access points can be used to perform S3 operations by specifying a mapping of bucket to access points. This is useful for multi-region access, cross-region access, disaster recovery, etc.

See: Access Points
nessie.catalog.service.s3.allow-cross-region-access-point boolean Authorize cross-region calls when contacting an access-point.

By default, attempting to use an access point in a different region will throw an exception. When enabled, this property allows using access points in other regions.
nessie.catalog.service.s3.sts.session-grace-period duration The time period to subtract from the S3 session credentials (assumed role credentials) expiry time to define the time when those credentials become eligible for refreshing.
nessie.catalog.service.s3.sts.session-cache-max-size int Maximum number of entries to keep in the session credentials cache (assumed role credentials).
nessie.catalog.service.s3.sts.clients-cache-max-size int Maximum number of entries to keep in the STS clients cache.
nessie.catalog.service.s3.sts.endpoint uri The Security Token Service endpoint.

This parameter must be set when running in a private (non-AWS) cloud and the catalog is configured to use S3 sessions (e.g. to use the “assume role” functionality).
nessie.catalog.service.s3.assumed-role string The ARN of the role to assume for accessing S3 data. This parameter is required for Amazon S3, but may not be required for other storage providers (e.g. Minio does not use it at all).
nessie.catalog.service.s3.session-iam-policy string IAM policy in JSON format to be used as an inline session policy (optional).

See: AssumeRoleRequest#policy()
nessie.catalog.service.s3.role-session-name string An identifier for the assumed role session. This parameter is most important in cases when the same role is assumed by different principals in different use cases.

See: AssumeRoleRequest#roleSessionName()
nessie.catalog.service.s3.external-id string An identifier for the party assuming the role. This parameter must match the external ID configured in IAM rules that govern the assume role process for the specified role-arn.

This parameter is essential in preventing the Confused Deputy problem.

See: AssumeRoleRequest#externalId()
nessie.catalog.service.s3.auth-mode REQUEST_SIGNING, ASSUME_ROLE Controls the authentication mode for Catalog clients accessing this bucket.
nessie.catalog.service.s3.client-session-duration duration An upper-bound estimate of the expected duration of client “sessions” working with data in this bucket. A session, for example, is the lifetime of an Iceberg REST catalog object on the client side. This value is used for validating expiration times of credentials associated with the warehouse.

This parameter is relevant only when auth-mode is ASSUME_ROLE.
nessie.catalog.service.s3.buckets.<bucket-name> Per-bucket configurations. The effective value for a bucket is taken from the per-bucket setting. If no per-bucket setting is present, the values from the top-level S3 settings are used.
nessie.catalog.service.s3.buckets.<bucket-name>.endpoint uri Endpoint URI, required for private (non-AWS) clouds, specified either per bucket or in the top-level S3 settings.

If the endpoint URIs for the Nessie server and clients differ, this one defines the endpoint used for the Nessie server.
nessie.catalog.service.s3.buckets.<bucket-name>.external-endpoint uri When using a specific endpoint (endpoint) and the endpoint URIs for the Nessie server and clients differ, you can specify the URI passed down to clients using this setting. Otherwise, clients will receive the value from the endpoint setting.
nessie.catalog.service.s3.buckets.<bucket-name>.path-style-access boolean Whether to use path-style access. If true, path-style access will be used, as in: https://<domain>/<bucket>. If false, a virtual-hosted style will be used instead, as in: https://<bucket>.<domain>. If unspecified, the default will depend on the cloud provider.
nessie.catalog.service.s3.buckets.<bucket-name>.region string DNS name of the region, required for AWS. The region must be specified for AWS, either per bucket or in the top-level S3 settings.
nessie.catalog.service.s3.buckets.<bucket-name>.access-key An access-key-id and secret-access-key must be configured using the name and secret fields, either per bucket or in the top-level S3 settings. For STS, this defines the access key ID and secret access key to be used as a basic credential for obtaining temporary session credentials.
nessie.catalog.service.s3.buckets.<bucket-name>.access-key.name string
nessie.catalog.service.s3.buckets.<bucket-name>.access-key.secret string
nessie.catalog.service.s3.buckets.<bucket-name>.access-point string AWS Access point for this bucket. Access points can be used to perform S3 operations by specifying a mapping of bucket to access points. This is useful for multi-region access, cross-region access, disaster recovery, etc.

See: Access Points
nessie.catalog.service.s3.buckets.<bucket-name>.allow-cross-region-access-point boolean Authorize cross-region calls when contacting an access-point.

By default, attempting to use an access point in a different region will throw an exception. When enabled, this property allows using access points in other regions.
nessie.catalog.service.s3.buckets.<bucket-name>.sts-endpoint uri The Security Token Service endpoint.

This parameter must be set when running in a private (non-AWS) cloud and the catalog is configured to use S3 sessions (e.g. to use the “assume role” functionality).
nessie.catalog.service.s3.buckets.<bucket-name>.role-arn string The ARN of the role to assume for accessing S3 data. This parameter is required for Amazon S3, but may not be required for other storage providers (e.g. Minio does not use it at all).
nessie.catalog.service.s3.buckets.<bucket-name>.session-iam-policy string IAM policy in JSON format to be used as an inline session policy (optional).

See: AssumeRoleRequest#policy()
nessie.catalog.service.s3.buckets.<bucket-name>.role-session-name string An identifier for the assumed role session. This parameter is most important in cases when the same role is assumed by different principals in different use cases.

See: AssumeRoleRequest#roleSessionName()
nessie.catalog.service.s3.buckets.<bucket-name>.external-id string An identifier for the party assuming the role. This parameter must match the external ID configured in IAM rules that govern the assume role process for the specified role-arn.

This parameter is essential in preventing the Confused Deputy problem.

See: AssumeRoleRequest#externalId()
nessie.catalog.service.s3.buckets.<bucket-name>.auth-mode REQUEST_SIGNING, ASSUME_ROLE Controls the authentication mode for Catalog clients accessing this bucket.
nessie.catalog.service.s3.buckets.<bucket-name>.client-session-duration duration An upper-bound estimate of the expected duration of client “sessions” working with data in this bucket. A session, for example, is the lifetime of an Iceberg REST catalog object on the client side. This value is used for validating expiration times of credentials associated with the warehouse.

This parameter is relevant only when auth-mode is ASSUME_ROLE.
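As an illustration of how top-level and per-bucket S3 settings combine, the sketch below configures a single bucket on a MinIO server. The bucket name my-bucket, the host names minio and localhost, and the credentials (MinIO's well-known default credentials) are assumptions for local testing only; in production, prefer the mechanisms described under Providing secrets.

```shell
#!/bin/sh
# Illustrative S3 configuration for a local MinIO server (testing only).
cat > application.properties <<'EOF'
# Top-level default, used by every bucket that does not set its own region
nessie.catalog.service.s3.region=us-east-1
# Endpoint used by the Nessie server itself
nessie.catalog.service.s3.buckets.my-bucket.endpoint=http://minio:9000
# Endpoint handed to clients, which reach MinIO under a different host name
nessie.catalog.service.s3.buckets.my-bucket.external-endpoint=http://localhost:9000
nessie.catalog.service.s3.buckets.my-bucket.path-style-access=true
nessie.catalog.service.s3.buckets.my-bucket.access-key.name=minioadmin
nessie.catalog.service.s3.buckets.my-bucket.access-key.secret=minioadmin
nessie.catalog.service.s3.buckets.my-bucket.auth-mode=REQUEST_SIGNING
EOF
```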

Google Cloud Storage settings

Note

Support for GCS is experimental.

Configuration for Google Cloud Storage (GCS) object stores.

Contains the default settings to be applied to all buckets. Specific settings for each bucket can be specified via the buckets map.

All settings are optional. The defaults of these settings are defined by the Google Java SDK client.

Property Default Value Type Description
nessie.catalog.service.gcs.buckets.<bucket-name> Per-bucket configurations. The effective value for a bucket is taken from the per-bucket setting. If no per-bucket setting is present, the defaults from the top-level GCS settings are used.
nessie.catalog.service.gcs.buckets.<bucket-name>.host uri The default endpoint override to use. The endpoint is almost always used for testing purposes.

If the endpoint URIs for the Nessie server and clients differ, this one defines the endpoint used for the Nessie server.
nessie.catalog.service.gcs.buckets.<bucket-name>.external-host uri When using a specific endpoint, see host, and the endpoint URIs for the Nessie server and clients differ, you can specify the URI passed down to clients using this setting. Otherwise, clients will receive the value from the host setting.
nessie.catalog.service.gcs.buckets.<bucket-name>.user-project string Optionally specify the user project (Google term).
nessie.catalog.service.gcs.buckets.<bucket-name>.read-timeout duration Override the default read timeout.
nessie.catalog.service.gcs.buckets.<bucket-name>.connect-timeout duration Override the default connection timeout.
nessie.catalog.service.gcs.buckets.<bucket-name>.project-id string The Google project ID.
nessie.catalog.service.gcs.buckets.<bucket-name>.quota-project-id string The Google quota project ID.
nessie.catalog.service.gcs.buckets.<bucket-name>.client-lib-token string The Google client lib token.
nessie.catalog.service.gcs.buckets.<bucket-name>.auth-type NONE, USER, SERVICE_ACCOUNT, ACCESS_TOKEN The authentication type to use.
nessie.catalog.service.gcs.buckets.<bucket-name>.auth-credentials-json string Auth credentials JSON. This value is the name of the credential to use; the actual credential is defined via secrets.
nessie.catalog.service.gcs.buckets.<bucket-name>.oauth2-token OAuth2 token. This value is the name of the credential to use; the actual credential is defined via secrets.
nessie.catalog.service.gcs.buckets.<bucket-name>.oauth2-token.token string
nessie.catalog.service.gcs.buckets.<bucket-name>.oauth2-token.expires-at instant
nessie.catalog.service.gcs.buckets.<bucket-name>.max-attempts int Override the default maximum number of attempts.
nessie.catalog.service.gcs.buckets.<bucket-name>.logical-timeout duration Override the default logical request timeout.
nessie.catalog.service.gcs.buckets.<bucket-name>.total-timeout duration Override the default total timeout.
nessie.catalog.service.gcs.buckets.<bucket-name>.initial-retry-delay duration Override the default initial retry delay.
nessie.catalog.service.gcs.buckets.<bucket-name>.max-retry-delay duration Override the default maximum retry delay.
nessie.catalog.service.gcs.buckets.<bucket-name>.retry-delay-multiplier double Override the default retry delay multiplier.
nessie.catalog.service.gcs.buckets.<bucket-name>.initial-rpc-timeout duration Override the default initial RPC timeout.
nessie.catalog.service.gcs.buckets.<bucket-name>.max-rpc-timeout duration Override the default maximum RPC timeout.
nessie.catalog.service.gcs.buckets.<bucket-name>.rpc-timeout-multiplier double Override the default RPC timeout multiplier.
nessie.catalog.service.gcs.buckets.<bucket-name>.read-chunk-size int The read chunk size in bytes.
nessie.catalog.service.gcs.buckets.<bucket-name>.write-chunk-size int The write chunk size in bytes.
nessie.catalog.service.gcs.buckets.<bucket-name>.delete-batch-size int The delete batch size.
nessie.catalog.service.gcs.buckets.<bucket-name>.encryption-key string Customer-supplied AES256 key for blob encryption when writing.
nessie.catalog.service.gcs.buckets.<bucket-name>.decryption-key string Customer-supplied AES256 key for blob decryption when reading.
nessie.catalog.service.gcs.host uri The default endpoint override to use. The endpoint is almost always used for testing purposes.

If the endpoint URIs for the Nessie server and clients differ, this one defines the endpoint used for the Nessie server.
nessie.catalog.service.gcs.external-host uri When using a specific endpoint, see host, and the endpoint URIs for the Nessie server and clients differ, you can specify the URI passed down to clients using this setting. Otherwise, clients will receive the value from the host setting.
nessie.catalog.service.gcs.project-id string The Google project ID.
nessie.catalog.service.gcs.quota-project-id string The Google quota project ID.
nessie.catalog.service.gcs.client-lib-token string The Google client lib token.
nessie.catalog.service.gcs.auth-type NONE, USER, SERVICE_ACCOUNT, ACCESS_TOKEN The authentication type to use.
nessie.catalog.service.gcs.auth-credentials-json string Auth credentials JSON. This value is the name of the credential to use; the actual credential is defined via secrets.
nessie.catalog.service.gcs.oauth2-token OAuth2 token. This value is the name of the credential to use; the actual credential is defined via secrets.
nessie.catalog.service.gcs.oauth2-token.token string
nessie.catalog.service.gcs.oauth2-token.expires-at instant
nessie.catalog.service.gcs.max-attempts int Override the default maximum number of attempts.
nessie.catalog.service.gcs.logical-timeout duration Override the default logical request timeout.
nessie.catalog.service.gcs.total-timeout duration Override the default total timeout.
nessie.catalog.service.gcs.initial-retry-delay duration Override the default initial retry delay.
nessie.catalog.service.gcs.max-retry-delay duration Override the default maximum retry delay.
nessie.catalog.service.gcs.retry-delay-multiplier double Override the default retry delay multiplier.
nessie.catalog.service.gcs.initial-rpc-timeout duration Override the default initial RPC timeout.
nessie.catalog.service.gcs.max-rpc-timeout duration Override the default maximum RPC timeout.
nessie.catalog.service.gcs.rpc-timeout-multiplier double Override the default RPC timeout multiplier.
nessie.catalog.service.gcs.read-chunk-size int The read chunk size in bytes.
nessie.catalog.service.gcs.write-chunk-size int The write chunk size in bytes.
nessie.catalog.service.gcs.delete-batch-size int The delete batch size.
nessie.catalog.service.gcs.encryption-key string Customer-supplied AES256 key for blob encryption when writing.
nessie.catalog.service.gcs.decryption-key string Customer-supplied AES256 key for blob decryption when reading.
nessie.catalog.service.gcs.user-project string Optionally specify the user project (Google term).
nessie.catalog.service.gcs.read-timeout duration Override the default read timeout.
nessie.catalog.service.gcs.connect-timeout duration Override the default connection timeout.
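The following sketch points Nessie at a local GCS emulator, one of the testing scenarios the host setting is meant for. The emulator address, the project ID my-project, and the bucket name my-bucket are assumptions; auth-type NONE is only appropriate for such an emulator.

```shell
#!/bin/sh
# Illustrative GCS configuration for a local emulator (testing only).
cat > application.properties <<'EOF'
# Hypothetical emulator address
nessie.catalog.service.gcs.host=http://localhost:4443
nessie.catalog.service.gcs.project-id=my-project
nessie.catalog.service.gcs.auth-type=NONE
# Per-bucket retry tuning; buckets without their own entry fall back
# to the top-level settings
nessie.catalog.service.gcs.buckets.my-bucket.max-attempts=5
nessie.catalog.service.gcs.buckets.my-bucket.initial-retry-delay=PT0.5S
nessie.catalog.service.gcs.buckets.my-bucket.max-retry-delay=PT10S
EOF
```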

ADLS settings

Note

Support for ADLS is experimental.

Configuration for ADLS Gen2 object stores.

Contains the default settings to be applied to all “file systems” (think: buckets). Specific settings for each file system can be specified via the file-systems map.

All settings are optional. The defaults of these settings are defined by the ADLS client supplied by Microsoft. See the Azure SDK for Java documentation.

Property Default Value Type Description
nessie.catalog.service.adls.max-http-connections int Override the default maximum number of HTTP connections that Nessie can use against all ADLS Gen2 object stores.
nessie.catalog.service.adls.connect-timeout duration Override the default TCP connect timeout for HTTP connections against ADLS Gen2 object stores.
nessie.catalog.service.adls.connection-idle-timeout duration Override the default idle timeout for HTTP connections.
nessie.catalog.service.adls.write-timeout duration Override the default write timeout for HTTP connections.
nessie.catalog.service.adls.read-timeout duration Override the default read timeout for HTTP connections.
nessie.catalog.service.adls.response-timeout duration Override the default response timeout for HTTP connections.
nessie.catalog.service.adls.configuration.<name> string Custom settings for the ADLS Java client.
nessie.catalog.service.adls.write-block-size long Override the default write block size used when writing to ADLS.
nessie.catalog.service.adls.read-block-size int Override the default read block size used when reading from ADLS.
nessie.catalog.service.adls.account Fully-qualified account name, e.g. "myaccount.dfs.core.windows.net" and account key, configured using the name and secret fields. If not specified, it will be queried via the configured credentials provider.
nessie.catalog.service.adls.account.name string
nessie.catalog.service.adls.account.secret string
nessie.catalog.service.adls.sas-token string SAS token to access the ADLS file system.
nessie.catalog.service.adls.endpoint string Define a custom HTTP endpoint. In case clients need to use a different URI, use the .external-endpoint setting.
nessie.catalog.service.adls.external-endpoint string Define a custom HTTP endpoint to be used by clients.
nessie.catalog.service.adls.retry-policy NONE, EXPONENTIAL_BACKOFF, FIXED_DELAY Configure the retry strategy.
nessie.catalog.service.adls.max-retries int Mandatory, if any retry-policy is configured.
nessie.catalog.service.adls.try-timeout duration Mandatory, if any retry-policy is configured.
nessie.catalog.service.adls.retry-delay duration Mandatory, if any retry-policy is configured.
nessie.catalog.service.adls.max-retry-delay duration Mandatory, if EXPONENTIAL_BACKOFF is configured.
nessie.catalog.service.adls.file-systems.<filesystem-name> ADLS file-system specific options, per file system name.
nessie.catalog.service.adls.file-systems.<filesystem-name>.account Fully-qualified account name, e.g. "myaccount.dfs.core.windows.net" and account key, configured using the name and secret fields. If not specified, it will be queried via the configured credentials provider.
nessie.catalog.service.adls.file-systems.<filesystem-name>.account.name string
nessie.catalog.service.adls.file-systems.<filesystem-name>.account.secret string
nessie.catalog.service.adls.file-systems.<filesystem-name>.sas-token string SAS token to access the ADLS file system.
nessie.catalog.service.adls.file-systems.<filesystem-name>.endpoint string Define a custom HTTP endpoint. In case clients need to use a different URI, use the .external-endpoint setting.
nessie.catalog.service.adls.file-systems.<filesystem-name>.external-endpoint string Define a custom HTTP endpoint to be used by clients.
nessie.catalog.service.adls.file-systems.<filesystem-name>.retry-policy NONE, EXPONENTIAL_BACKOFF, FIXED_DELAY Configure the retry strategy.
nessie.catalog.service.adls.file-systems.<filesystem-name>.max-retries int Mandatory, if any retry-policy is configured.
nessie.catalog.service.adls.file-systems.<filesystem-name>.try-timeout duration Mandatory, if any retry-policy is configured.
nessie.catalog.service.adls.file-systems.<filesystem-name>.retry-delay duration Mandatory, if any retry-policy is configured.
nessie.catalog.service.adls.file-systems.<filesystem-name>.max-retry-delay duration Mandatory, if EXPONENTIAL_BACKOFF is configured.
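The retry settings depend on each other: max-retries, try-timeout, and retry-delay are mandatory once a retry policy is configured, and max-retry-delay is additionally required for EXPONENTIAL_BACKOFF. A sketch of a consistent configuration, with hypothetical values and a hypothetical file system named my-filesystem, could look like this:

```shell
#!/bin/sh
# Illustrative ADLS retry configuration; all values are examples.
cat > application.properties <<'EOF'
# Default for all file systems: exponential backoff needs all four settings
nessie.catalog.service.adls.retry-policy=EXPONENTIAL_BACKOFF
nessie.catalog.service.adls.max-retries=5
nessie.catalog.service.adls.try-timeout=PT30S
nessie.catalog.service.adls.retry-delay=PT1S
nessie.catalog.service.adls.max-retry-delay=PT10S
# One file system uses a fixed delay instead (no max-retry-delay needed)
nessie.catalog.service.adls.file-systems.my-filesystem.retry-policy=FIXED_DELAY
nessie.catalog.service.adls.file-systems.my-filesystem.max-retries=3
nessie.catalog.service.adls.file-systems.my-filesystem.try-timeout=PT30S
nessie.catalog.service.adls.file-systems.my-filesystem.retry-delay=PT2S
EOF
```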

Advanced catalog settings

Property Default Value Type Description
nessie.catalog.service.imports.max-concurrent 32 int Advanced property, defines the maximum number of concurrent imports from object stores.
nessie.catalog.service.tasks.threads.max -1 int Advanced property, defines the maximum number of threads for async tasks like imports.
nessie.catalog.service.tasks.threads.keep-alive PT2S duration Advanced thread pool setting for async tasks like imports.
nessie.catalog.service.tasks.minimum-delay PT0.001S duration Advanced thread pool setting for async tasks like imports.
nessie.catalog.service.race.wait.min PT0.005S duration Advanced thread pool setting for async tasks like imports.
nessie.catalog.service.race.wait.max PT0.250S duration Advanced thread pool setting for async tasks like imports.

Version Store Settings

Version store configuration.

Property Default Value Type Description
nessie.version.store.type IN_MEMORY IN_MEMORY, ROCKSDB, DYNAMODB, MONGODB, CASSANDRA, JDBC, BIGTABLE Sets which type of version store Nessie uses.
nessie.version.store.events.enable true boolean Sets whether events for the version-store are enabled. In order for events to be published, it’s not enough to enable them in the configuration; you also need to provide at least one implementation of Nessie’s EventListener SPI.

Support for the database specific implementations

Database Status Configuration value for nessie.version.store.type Notes
“in memory” only for development and local testing IN_MEMORY Do not use for any serious use case.
RocksDB production, single node only ROCKSDB
Google BigTable production BIGTABLE
MongoDB production MONGODB
Amazon DynamoDB beta, only tested against the simulator DYNAMODB
PostgreSQL production JDBC
H2 only for development and local testing JDBC Do not use for any serious use case.
MariaDB experimental, feedback welcome JDBC
MySQL experimental, feedback welcome JDBC Works by connecting the MariaDB driver to a MySQL server.
CockroachDB experimental, known issues JDBC Known to raise user-facing “write too old” errors under contention.
Apache Cassandra experimental, known issues CASSANDRA Known to raise user-facing errors due to Cassandra’s concept of letting the driver time out too early, or database timeouts.
ScyllaDB experimental, known issues CASSANDRA Known to raise user-facing errors due to Cassandra’s concept of letting the driver time out too early, or database timeouts. Known to be slow in container based testing. It is unclear how well Scylla’s LWT implementation performs.

BigTable Version Store Settings

When setting nessie.version.store.type=BIGTABLE, which enables Google BigTable as the version store used by the Nessie server, the following configurations are applicable.

Property Default Value Type Description
nessie.version.store.persist.bigtable.instance-id nessie string Sets the instance-id to be used with Google BigTable.
nessie.version.store.persist.bigtable.emulator-port 8086 int When using the BigTable emulator, used to configure the port.
nessie.version.store.persist.bigtable.enable-telemetry true boolean Enables telemetry with OpenCensus.
nessie.version.store.persist.bigtable.table-prefix string Prefix for tables, default is no prefix.
nessie.version.store.persist.bigtable.no-table-admin-client false boolean
nessie.version.store.persist.bigtable.app-profile-id string Sets the profile-id to be used with Google BigTable.
nessie.version.store.persist.bigtable.quota-project-id string Google BigTable quota project ID (optional).
nessie.version.store.persist.bigtable.endpoint string Google BigTable endpoint (if not default).
nessie.version.store.persist.bigtable.mtls-endpoint string Google BigTable MTLS endpoint (if not default).
nessie.version.store.persist.bigtable.emulator-host string When using the BigTable emulator, used to configure the host.
nessie.version.store.persist.bigtable.jwt-audience-mapping.<mapping> string Google BigTable JWT audience mappings (if necessary).
nessie.version.store.persist.bigtable.initial-retry-delay duration Initial retry delay.
nessie.version.store.persist.bigtable.max-retry-delay duration Max retry-delay.
nessie.version.store.persist.bigtable.retry-delay-multiplier double
nessie.version.store.persist.bigtable.max-attempts int Maximum number of attempts for each Bigtable API call (including retries).
nessie.version.store.persist.bigtable.initial-rpc-timeout duration Initial RPC timeout.
nessie.version.store.persist.bigtable.max-rpc-timeout duration
nessie.version.store.persist.bigtable.rpc-timeout-multiplier double
nessie.version.store.persist.bigtable.total-timeout duration Total timeout (including retries) for Bigtable API calls.
nessie.version.store.persist.bigtable.min-channel-count int Minimum number of gRPC channels. Refer to Google docs for details.
nessie.version.store.persist.bigtable.max-channel-count int Maximum number of gRPC channels. Refer to Google docs for details.
nessie.version.store.persist.bigtable.initial-channel-count int Initial number of gRPC channels. Refer to Google docs for details.
nessie.version.store.persist.bigtable.min-rpcs-per-channel int Minimum number of RPCs per channel. Refer to Google docs for details.
nessie.version.store.persist.bigtable.max-rpcs-per-channel int Maximum number of RPCs per channel. Refer to Google docs for details.

Related Quarkus settings:

Property Default Value Type Description
quarkus.google.cloud.project-id string The Google project ID, mandatory.
(Google authentication) See Quarkiverse for documentation.

Info

A complete set of Google Cloud & BigTable configuration options for Quarkus can be found on Quarkiverse.
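A sketch of a BigTable configuration against the BigTable emulator is shown below. The project ID my-project and the emulator host are assumptions; nessie and 8086 are the documented defaults for instance-id and emulator-port and are repeated here only for clarity. Google authentication for a real BigTable instance is configured separately, as described on Quarkiverse.

```shell
#!/bin/sh
# Illustrative BigTable version store configuration using the emulator.
cat > application.properties <<'EOF'
nessie.version.store.type=BIGTABLE
# Mandatory Google project ID (hypothetical value)
quarkus.google.cloud.project-id=my-project
nessie.version.store.persist.bigtable.instance-id=nessie
# Emulator connection (testing only)
nessie.version.store.persist.bigtable.emulator-host=localhost
nessie.version.store.persist.bigtable.emulator-port=8086
EOF
```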

JDBC Version Store Settings

Setting nessie.version.store.type=JDBC enables a transactional RDBMS as the version store used by the Nessie server.

Configuration of the datastore will be done by Quarkus and depends on many factors, such as the actual database to use. The property nessie.version.store.persist.jdbc.datasource will be used to select one of the built-in datasources; currently supported values are: postgresql (which activates the PostgreSQL driver), mariadb (which activates the MariaDB driver), mysql (which targets MySQL backends, but using the MariaDB driver), and h2 (for development and local testing only).

For example, to configure a PostgreSQL connection, the following configuration should be used:

  • nessie.version.store.type=JDBC
  • nessie.version.store.persist.jdbc.datasource=postgresql
  • quarkus.datasource.postgresql.jdbc.url=jdbc:postgresql://localhost:5432/my_database
  • quarkus.datasource.postgresql.username=<your username>
  • quarkus.datasource.postgresql.password=<your password>
  • Other PostgreSQL-specific properties can be set using quarkus.datasource.postgresql.*
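Applying the environment-variable naming described at the top of this page (dots become underscores, the name is upper-cased), the PostgreSQL settings above could be passed to the Docker image roughly as sketched below. The host db.example.com, the database name, and the credentials are placeholders; in a real deployment, prefer a secret-handling mechanism over clear-text passwords on the command line.

```shell
# Sketch: Nessie with the JDBC version store backed by PostgreSQL.
# Host, database name, user and password are placeholders.
docker run -p 19120:19120 \
  -e NESSIE_VERSION_STORE_TYPE=JDBC \
  -e NESSIE_VERSION_STORE_PERSIST_JDBC_DATASOURCE=postgresql \
  -e QUARKUS_DATASOURCE_POSTGRESQL_JDBC_URL="jdbc:postgresql://db.example.com:5432/my_database" \
  -e QUARKUS_DATASOURCE_POSTGRESQL_USERNAME=nessie \
  -e QUARKUS_DATASOURCE_POSTGRESQL_PASSWORD=change-me \
  ghcr.io/projectnessie/nessie
```

The MariaDB and MySQL variants follow the same pattern, with postgresql replaced by mariadb or mysql in both the datasource value and the QUARKUS_DATASOURCE_* variable names.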

To connect to a MariaDB database instead, the following configuration should be used:

  • nessie.version.store.type=JDBC
  • nessie.version.store.persist.jdbc.datasource=mariadb
  • quarkus.datasource.mariadb.jdbc.url=jdbc:mariadb://localhost:3306/my_database
  • quarkus.datasource.mariadb.username=<your username>
  • quarkus.datasource.mariadb.password=<your password>
  • Other MariaDB-specific properties can be set using quarkus.datasource.mariadb.*

To connect to a MySQL database instead, the following configuration should be used:

  • nessie.version.store.type=JDBC
  • nessie.version.store.persist.jdbc.datasource=mysql
  • quarkus.datasource.mysql.jdbc.url=jdbc:mysql://localhost:3306/my_database
  • quarkus.datasource.mysql.username=<your username>
  • quarkus.datasource.mysql.password=<your password>
  • Other MySQL-specific properties can be set using quarkus.datasource.mysql.*

To connect to an H2 in-memory database, the following configuration should be used (note that H2 is not recommended for production):

  • nessie.version.store.type=JDBC
  • nessie.version.store.persist.jdbc.datasource=h2

Note: for MySQL, the MariaDB driver is used, as it is compatible with MySQL. You can use either jdbc:mysql or jdbc:mariadb as the URL prefix.

A complete set of JDBC configuration options can be found on quarkus.io.

Property Default Value Type Description
nessie.version.store.persist.jdbc.datasource string The name of the datasource to use. Must correspond to a configured datasource under quarkus.datasource.<name>. Supported values are: postgresql, mariadb, mysql and h2. If not provided, the default Quarkus datasource, defined using the quarkus.datasource.* configuration keys, will be used (the corresponding driver is PostgreSQL). Note that it is recommended to define “named” JDBC datasources, see Quarkus JDBC config reference.
nessie.version.store.persist.jdbc.catalog string The JDBC catalog name.

Deprecated This setting has never worked as expected and is now ineffective. The catalog must be specified directly in the JDBC URL using the option quarkus.datasource.*.jdbc.url.
nessie.version.store.persist.jdbc.schema string The JDBC schema name.

Deprecated This setting has never worked as expected and is now ineffective. The schema must be specified directly in the JDBC URL using the option quarkus.datasource.*.jdbc.url.

RocksDB Version Store Settings

Setting nessie.version.store.type=ROCKSDB enables RocksDB as the version store used by the Nessie server. The following settings apply.

Property Default Value Type Description
nessie.version.store.persist.rocks.database-path /tmp/nessie-rocksdb-store path Sets RocksDB storage path.
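The default path lies inside the container's own filesystem, so the data does not survive removal of the container. The sketch below stores the database on a Docker volume instead. The volume name nessie-data and the mount point /nessie-data are placeholders, and it is assumed that the user the container runs as can write to the mounted directory. Note that hyphens in property names are also replaced by underscores (database-path becomes DATABASE_PATH).

```shell
# Sketch: Nessie with the RocksDB version store on a named Docker volume.
# Volume name and mount point are placeholders; the container user must be
# able to write to the mounted directory.
docker run -p 19120:19120 \
  -v nessie-data:/nessie-data \
  -e NESSIE_VERSION_STORE_TYPE=ROCKSDB \
  -e NESSIE_VERSION_STORE_PERSIST_ROCKS_DATABASE_PATH=/nessie-data \
  ghcr.io/projectnessie/nessie
```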

Cassandra Version Store Settings

Setting nessie.version.store.type=CASSANDRA enables Apache Cassandra or ScyllaDB as the version store used by the Nessie server. The following settings apply.

Property Default Value Type Description
nessie.version.store.cassandra.dml-timeout PT3S duration Timeout used for queries and updates.
nessie.version.store.cassandra.ddl-timeout PT5S duration Timeout used when creating tables.

Related Quarkus settings:

Property Default values Type Description
quarkus.cassandra.keyspace String The Cassandra keyspace to use.
quarkus.cassandra.contact-points String The Cassandra contact points, see Quarkus docs.
quarkus.cassandra.local-datacenter String The Cassandra local datacenter to use, see Quarkus docs.
quarkus.cassandra.auth.username String Cassandra authentication username, see Quarkus docs.
quarkus.cassandra.auth.password String Cassandra authentication password, see Quarkus docs.
quarkus.cassandra.health.enabled false boolean See Quarkus docs.

Info

A complete set of the Quarkus Cassandra extension configuration options can be found on quarkus.io.
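A sketch of a Cassandra (or ScyllaDB) setup passed as environment variables. The contact point, keyspace, datacenter name, and credentials are placeholders, and the sketch assumes the keyspace already exists.

```shell
# Sketch: Nessie with the Cassandra/ScyllaDB version store.
# Contact point, keyspace, datacenter and credentials are placeholders.
docker run -p 19120:19120 \
  -e NESSIE_VERSION_STORE_TYPE=CASSANDRA \
  -e QUARKUS_CASSANDRA_KEYSPACE=nessie \
  -e QUARKUS_CASSANDRA_CONTACT_POINTS=cassandra.example.com:9042 \
  -e QUARKUS_CASSANDRA_LOCAL_DATACENTER=datacenter1 \
  -e QUARKUS_CASSANDRA_AUTH_USERNAME=nessie \
  -e QUARKUS_CASSANDRA_AUTH_PASSWORD=change-me \
  ghcr.io/projectnessie/nessie
```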

DynamoDB Version Store Settings

Setting nessie.version.store.type=DYNAMODB enables DynamoDB as the version store used by the Nessie server. The following settings apply.

Property Default Value Type Description
nessie.version.store.persist.dynamodb.table-prefix string Prefix for tables, default is no prefix.

Related Quarkus settings:

Property Default values Type Description
quarkus.dynamodb.aws.region String Sets DynamoDB AWS region.
quarkus.dynamodb.aws.credentials.type default String See Quarkiverse docs for possible values. Sets the credentials provider that should be used to authenticate with AWS.
quarkus.dynamodb.endpoint-override URI Sets the endpoint URI with which the SDK should communicate. If not specified, an appropriate endpoint for the given service and region is used.
quarkus.dynamodb.sync-client.type url String Possible values are: url, apache. Sets the type of the sync HTTP client implementation.

Info

A complete set of DynamoDB configuration options for Quarkus can be found on Quarkiverse.
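A sketch of a DynamoDB setup. The region and table prefix are placeholders. With the default credentials type, the AWS SDK's default credentials provider chain is used, which (among other sources) reads the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables; here they are forwarded from the host shell.

```shell
# Sketch: Nessie with the DynamoDB version store.
# Region and table prefix are placeholders. "-e NAME" without a value forwards
# the variable from the host environment into the container.
docker run -p 19120:19120 \
  -e NESSIE_VERSION_STORE_TYPE=DYNAMODB \
  -e NESSIE_VERSION_STORE_PERSIST_DYNAMODB_TABLE_PREFIX=nessie_ \
  -e QUARKUS_DYNAMODB_AWS_REGION=us-west-2 \
  -e AWS_ACCESS_KEY_ID \
  -e AWS_SECRET_ACCESS_KEY \
  ghcr.io/projectnessie/nessie
```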

MongoDB Version Store Settings

Setting nessie.version.store.type=MONGODB enables MongoDB as the version store used by the Nessie server. Besides nessie.version.store.type, only the following Quarkus settings apply.

Related Quarkus settings:

Property Default values Type Description
quarkus.mongodb.database String Sets MongoDB database name.
quarkus.mongodb.connection-string String Sets MongoDB connection string.

Info

A complete set of MongoDB configuration options for Quarkus can be found on quarkus.io.
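A sketch of a MongoDB setup; the connection string and database name are placeholders.

```shell
# Sketch: Nessie with the MongoDB version store.
# Connection string and database name are placeholders.
docker run -p 19120:19120 \
  -e NESSIE_VERSION_STORE_TYPE=MONGODB \
  -e QUARKUS_MONGODB_DATABASE=nessie \
  -e QUARKUS_MONGODB_CONNECTION_STRING="mongodb://mongo.example.com:27017" \
  ghcr.io/projectnessie/nessie
```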

In-Memory Version Store Settings

No special configuration options for this store type.

Version Store Advanced Settings

The following advanced settings control how Nessie stores data in the configured data store.

Usually, only the cache capacity (the cache-capacity-* settings below) needs adjusting, to match the amount of Java heap “available” for the cache. The default is conservative; increasing the cache size is recommended.
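As an illustration of fixed cache sizing, the sketch below pins the Java heap to 8 GB and reserves a fixed 6 GB of it for the cache. Both numbers are illustrative assumptions, not recommendations. Fixed sizing (cache-capacity-mb) must not be combined with fractional sizing (cache-capacity-fraction-of-heap), so only one of them is set.

```shell
# Sketch: fixed 8 GB heap with a fixed 6144 MB object cache.
# Values are illustrative; size them for your own workload.
docker run -p 19120:19120 \
  -e JAVA_OPTS_APPEND="-Xms8g -Xmx8g" \
  -e NESSIE_VERSION_STORE_PERSIST_CACHE_CAPACITY_MB=6144 \
  ghcr.io/projectnessie/nessie
```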

Property Default Value Type Description
nessie.version.store.persist.repository-id (empty) string Nessie repository ID (optional) that identifies a particular Nessie storage repository.

When a remote (shared) database is used, multiple Nessie repositories may co-exist in the same database (and in the same schema). In that case, this configuration parameter can be used to distinguish those repositories.
nessie.version.store.persist.commit-retries 2147483647 int Maximum number of retries for CAS-like operations. When committing to Nessie, if the HEAD (or tip) of a branch changes during the commit, this value defines the maximum number of retries. The default effectively means unlimited.

See: nessie.version.store.persist.retry-max-sleep-millis
nessie.version.store.persist.commit-timeout-millis 5000 long Timeout for CAS-like operations in milliseconds.

See: nessie.version.store.persist.retry-max-sleep-millis
nessie.version.store.persist.retry-initial-sleep-millis-lower 5 long When the commit logic has to retry an operation due to a concurrent, conflicting update to the database state, usually a concurrent change to a branch HEAD, this parameter defines the initial lower bound of the exponential backoff.

See: nessie.version.store.persist.retry-max-sleep-millis
nessie.version.store.persist.retry-initial-sleep-millis-upper 25 long When the commit logic has to retry an operation due to a concurrent, conflicting update to the database state, usually a concurrent change to a branch HEAD, this parameter defines the initial upper bound of the exponential backoff.

See: nessie.version.store.persist.retry-max-sleep-millis
nessie.version.store.persist.retry-max-sleep-millis 250 long When the commit logic has to retry an operation due to a concurrent, conflicting update to the database state, usually a concurrent change to a branch HEAD, this parameter defines the maximum sleep time. Each retry doubles the lower and upper bounds of the random sleep time, unless the doubled upper bound would exceed the value of this configuration property.

See: nessie.version.store.persist.retry-initial-sleep-millis-upper
nessie.version.store.persist.parents-per-commit 20 int Number of parent-commit-hashes stored in each commit. This is used to allow bulk-fetches when accessing the commit log.
nessie.version.store.persist.max-serialized-index-size 204800 int The maximum allowed serialized size of the content index structure in a reference index segment. This value determines when elements in a reference index segment need to be split.

Note: this value must be smaller than a database’s hard item/row size limit.
nessie.version.store.persist.max-incremental-index-size 51200 int The maximum allowed serialized size of the content index structure in a Nessie commit, called the incremental index. This value determines when elements in an incremental index that were kept from previous commits need to be pushed to a new or updated reference index.

Note: this value must be smaller than a database’s hard item/row size limit.
nessie.version.store.persist.max-reference-stripes-per-commit 50 int Maximum number of referenced index objects stored inside commit objects.

If the external reference index for this commit consists of up to this number of stripes, the references to the stripes are stored inside the commit object. If there are more stripes than this, an external index segment is created instead.
nessie.version.store.persist.assumed-wall-clock-drift-micros 5000000 long Assumed wall-clock drift between multiple Nessie instances in microseconds.
nessie.version.store.persist.ref-previous-head-count 20 int Named references keep a history of up to this number of previous HEAD pointers, and up to the configured age.
nessie.version.store.persist.ref-previous-head-time-span-seconds 300 long Named references keep a history of previous HEAD pointers with this age in seconds, and up to the configured number.
nessie.version.store.persist.cache-capacity-mb int Fixed amount of heap used to cache objects, set to 0 to disable the cache entirely. Must not be used with fractional cache sizing. See description for cache-capacity-fraction-of-heap for the default value.
nessie.version.store.persist.cache-capacity-fraction-min-size-mb int When using fractional cache sizing, this amount in MB is the minimum cache size.
nessie.version.store.persist.cache-capacity-fraction-of-heap double Fraction of Java’s max heap size to use for cache objects; set to 0 to disable. Must not be used with fixed cache sizing. If neither this value nor a fixed size is configured, a default of 0.7 (70%) is assumed.
nessie.version.store.persist.cache-capacity-fraction-adjust-mb int When using fractional cache sizing, this amount in MB of the heap will always be “kept free” when calculating the cache size.
nessie.version.store.persist.reference-cache-ttl duration Defines how long references are kept in the cache. Reference caching is enabled if this is set to a positive duration; by default, references are not cached. If reference caching is enabled, it is highly recommended to also enable negative reference caching.

This is an experimental feature, currently only for single Nessie node deployments! If in doubt, leave this unconfigured!
nessie.version.store.persist.reference-cache-negative-ttl duration Defines how long sentinels for non-existing references are kept in the cache (negative reference caching). Negative reference caching is enabled if this is set to a positive duration; by default, it is disabled. If reference caching is enabled, it is highly recommended to also enable negative reference caching.

This is an experimental feature, currently only for single Nessie node deployments! If in doubt, leave this unconfigured!
nessie.version.store.persist.cache-invalidations.service-names list of string Host names or IP addresses or kubernetes headless-service name of all Nessie server instances accessing the same repository.

This value is configured automatically by the Nessie Helm chart or the Kubernetes operator (not released yet); in those deployments, distributed cache invalidations need no additional configuration. If you use your own Helm chart or a custom deployment, make sure to configure the IPs of all Nessie instances here.

Names that start with an equal sign are not resolved but used “as is”.
nessie.version.store.persist.cache-invalidations.valid-tokens list of string List of cache-invalidation tokens to authenticate incoming cache-invalidation messages.
nessie.version.store.persist.cache-invalidations.uri /nessie-management/cache-coherency string URI of the cache-invalidation endpoint, available only on the Quarkus management port (port 9000 by default).
nessie.version.store.persist.cache-invalidations.service-name-lookup-interval PT10S duration Interval of the lookups that resolve the service names (nessie.version.store.persist.cache-invalidations.service-names) into IP addresses.
nessie.version.store.persist.cache-invalidations.batch-size 20 int
nessie.version.store.persist.cache-invalidations.request-timeout duration
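To make the retry backoff described by the retry-* settings above concrete, the following POSIX shell sketch prints the random-sleep window used for each retry, applying the doubling rule exactly as stated for retry-max-sleep-millis. The function retry_windows is hypothetical; it only mirrors the documented rule and is not Nessie's implementation.

```shell
#!/bin/sh
# Hypothetical helper: prints the [lower, upper] sleep window (ms) per retry.
# Rule from retry-max-sleep-millis: both bounds double after every retry,
# unless the doubled upper bound would exceed the maximum sleep time.
retry_windows() {
  lower=$1      # retry-initial-sleep-millis-lower
  upper=$2      # retry-initial-sleep-millis-upper
  max_sleep=$3  # retry-max-sleep-millis
  retries=$4    # number of retries to print
  i=1
  while [ "$i" -le "$retries" ]; do
    echo "retry $i: ${lower}-${upper} ms"
    if [ $((upper * 2)) -le "$max_sleep" ]; then
      lower=$((lower * 2))
      upper=$((upper * 2))
    fi
    i=$((i + 1))
  done
}

# Defaults from the table: lower 5, upper 25, max 250.
retry_windows 5 25 250 6
```

With the defaults, the window grows from 5-25 ms to 40-200 ms by the fourth retry and then stays there, because doubling 200 would exceed 250.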

Authentication settings

Configuration for Nessie authentication settings.

Property Default Value Type Description
nessie.server.authentication.enabled false boolean Enable Nessie authentication.

Related Quarkus settings:

Property Default values Type Description
quarkus.oidc.auth-server-url String Sets the base URL of the OpenID Connect (OIDC) server if nessie.server.authentication.enabled=true.
quarkus.oidc.client-id String Sets client-id of the application if nessie.server.authentication.enabled=true. Each application has a client-id that is used to identify the application.
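A sketch enabling OIDC authentication through environment variables. The issuer URL and client ID are placeholders for the values of your own OpenID Connect provider.

```shell
# Sketch: enable Nessie authentication against an OIDC provider.
# The auth-server URL and client ID are placeholders.
docker run -p 19120:19120 \
  -e NESSIE_SERVER_AUTHENTICATION_ENABLED=true \
  -e QUARKUS_OIDC_AUTH_SERVER_URL=https://idp.example.com/realms/nessie \
  -e QUARKUS_OIDC_CLIENT_ID=nessie \
  ghcr.io/projectnessie/nessie
```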

Authorization settings

Configuration for Nessie authorization settings.

Property Default Value Type Description
nessie.server.authorization.enabled false boolean Enable Nessie authorization.
nessie.server.authorization.type CEL string Sets the authorizer type to use.
nessie.server.authorization.rules.<name> string CEL authorization rules where the key represents the rule id and the value the CEL expression.

Metrics

Metrics are published using Prometheus and can be collected via standard methods. See: Prometheus.

Traces

Since Nessie 0.46.0, traces are published using OpenTelemetry. See Using OpenTelemetry in the Quarkus documentation.

In order for the server to enable OpenTelemetry and publish its traces, the quarkus.otel.exporter.otlp.traces.endpoint property must be defined. Its value must be a valid collector endpoint URL, with either http:// or https:// scheme. The collector must speak the OpenTelemetry protocol (OTLP), and the port must be its gRPC port (by default 4317), e.g. “http://otlp-collector:4317”. If this property is not set, the server will not publish traces.

Alternatively, it’s possible to forcibly disable OpenTelemetry at runtime by setting the following property: quarkus.otel.sdk.disabled=true.
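A sketch of enabling trace export when running the Docker image. The collector host name otlp-collector is a placeholder and is assumed to be resolvable from the Nessie container, for example because both containers are attached to the same user-defined Docker network.

```shell
# Sketch: publish traces to an OTLP collector on its gRPC port 4317.
# "otlp-collector" is a placeholder host name reachable from the container.
docker run -p 19120:19120 \
  -e QUARKUS_OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://otlp-collector:4317 \
  ghcr.io/projectnessie/nessie
```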

Troubleshooting traces

If the server is unable to publish traces, first check the logs for a message like the following:

SEVERE [io.ope.exp.int.grp.OkHttpGrpcExporter] (OkHttp http://localhost:4317/...) Failed to export spans. 
The request could not be executed. Full error message: Failed to connect to localhost/0:0:0:0:0:0:0:1:4317

This means that the server is unable to connect to the collector. Check that the collector is running and that the URL is correct.

Swagger UI

The Swagger UI allows for testing the REST API and reading the API docs. It is available at localhost:9000/q/swagger-ui.

Docker image options

By default, Nessie listens on port 19120. To expose that port on the host, use -p 19120:19120. To expose that port on a different port on the host system, use the -p option and map the internal port to some port on the host. For example, to expose Nessie on port 8080 of the host system, use the following command:

docker run -p 8080:19120 ghcr.io/projectnessie/nessie

Then you can browse Nessie’s UI on the host by pointing your browser to http://localhost:8080.

Note: this doesn’t change the port Nessie listens on, it only changes the port on the host system that is mapped to the port Nessie listens on. Nessie still listens on port 19120 inside the container. If you want to change the port Nessie listens on, you can use the QUARKUS_HTTP_PORT environment variable. For example, to make Nessie listen on port 8080 inside the container, and expose it to the host system also on 8080, use the following command:

docker run -p 8080:8080 -e QUARKUS_HTTP_PORT=8080 ghcr.io/projectnessie/nessie

Nessie Docker image types

Nessie publishes a Java-based multiplatform (amd64, arm64, ppc64le, s390x) image running on OpenJDK 21.

Advanced Docker image tuning (Java images only)

Many environment variables are available to tune the Docker image; if in doubt, leave everything at its default. These variables come from the base image used by Nessie, ubi9/openjdk-21-runtime. The extensive list of supported environment variables can be found here.

Examples

Example docker run option
Using another GC -e GC_CONTAINER_OPTIONS="-XX:+UseShenandoahGC" lets Nessie use Shenandoah GC instead of the default parallel GC.
Set the Java heap size to a fixed amount -e JAVA_OPTS_APPEND="-Xms8g -Xmx8g" lets Nessie use a Java heap of 8g.

Reference

Environment variable Description
JAVA_OPTS or JAVA_OPTIONS NOT RECOMMENDED. JVM options passed to the java command (example: “-verbose:class”). Setting this variable will override all options set by any of the other variables in this table. To pass extra settings, use JAVA_OPTS_APPEND instead.
JAVA_OPTS_APPEND User specified Java options to be appended to generated options in JAVA_OPTS (example: “-Dsome.property=foo”).
JAVA_TOOL_OPTIONS This variable is defined and honored by all OpenJDK distros, see here. Options defined here take precedence over all else; using this variable is generally not necessary, but can be useful e.g. to enforce JVM startup parameters, to set up remote debug, or to define JVM agents.
JAVA_MAX_MEM_RATIO Is used when no -Xmx option is given in JAVA_OPTS. This is used to calculate a default maximal heap memory based on the container's memory limit. If used in a container without any memory constraints, this option has no effect. If there is a memory constraint, -Xmx is set to a ratio of the container's available memory as set here. The default is 50, which means 50% of the available memory is used as an upper boundary. You can skip this mechanism by setting this value to 0, in which case no -Xmx option is added.
JAVA_INITIAL_MEM_RATIO Is used when no -Xms option is given in JAVA_OPTS. This is used to calculate a default initial heap memory based on the maximum heap memory. If used in a container without any memory constraints for the container then this option has no effect. If there is a memory constraint then -Xms is set to a ratio of the -Xmx memory as set here. The default is 25 which means 25% of the -Xmx is used as the initial heap size. You can skip this mechanism by setting this value to 0 in which case no -Xms option is added (example: “25”)
JAVA_MAX_INITIAL_MEM Is used when no -Xms option is given in JAVA_OPTS. This is used to calculate the maximum value of the initial heap memory. If used in a container without any memory constraints, this option has no effect. If there is a memory constraint, -Xms is limited to the value set here. The default is 4096MB, which means the calculated value of -Xms will never be greater than 4096MB. The value of this variable is expressed in MB (example: “4096”)
JAVA_DIAGNOSTICS Set this to get some diagnostics information to standard output when things are happening. This option, if set to true, will set -XX:+UnlockDiagnosticVMOptions. Disabled by default (example: “true”).
JAVA_DEBUG If set, remote debugging will be switched on. Disabled by default (example: “true”).
JAVA_DEBUG_PORT Port used for remote debugging. Defaults to 5005 (example: “8787”).
CONTAINER_CORE_LIMIT A calculated core limit as described in https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt. (example: “2”)
CONTAINER_MAX_MEMORY Memory limit given to the container (example: “1024”).
GC_MIN_HEAP_FREE_RATIO Minimum percentage of heap free after GC to avoid expansion. (example: “20”)
GC_MAX_HEAP_FREE_RATIO Maximum percentage of heap free after GC to avoid shrinking. (example: “40”)
GC_TIME_RATIO Specifies the ratio of the time spent outside the garbage collection. (example: “4”)
GC_ADAPTIVE_SIZE_POLICY_WEIGHT The weighting given to the current GC time versus previous GC times. (example: “90”)
GC_METASPACE_SIZE The initial metaspace size. (example: “20”)
GC_MAX_METASPACE_SIZE The maximum metaspace size. (example: “100”)
GC_CONTAINER_OPTIONS Specify Java GC to use. The value of this variable should contain the necessary JRE command-line options to specify the required GC, which will override the default of -XX:+UseParallelGC (example: -XX:+UseG1GC).
HTTPS_PROXY The location of the https proxy. (example: “myuser@127.0.0.1:8080”)
HTTP_PROXY The location of the http proxy. (example: “myuser@127.0.0.1:8080”)
NO_PROXY A comma-separated list of hosts, IP addresses or domains that can be accessed directly. (example: “foo.example.com,bar.example.com”)
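The interplay of JAVA_MAX_MEM_RATIO, JAVA_INITIAL_MEM_RATIO and JAVA_MAX_INITIAL_MEM, as described in the table above, can be worked through with a small POSIX shell sketch. The function heap_defaults is hypothetical and only mirrors the table's description (all values in MB); the base image's startup script performs the real calculation and may express it differently, and the "set to 0 to skip" behavior is not modeled.

```shell
#!/bin/sh
# Hypothetical helper mirroring the heap-sizing descriptions above (MB).
# Not the base image's actual startup logic; the "0 disables" case is omitted.
heap_defaults() {
  container_mb=$1
  max_ratio=${2:-50}         # JAVA_MAX_MEM_RATIO
  initial_ratio=${3:-25}     # JAVA_INITIAL_MEM_RATIO
  max_initial_mb=${4:-4096}  # JAVA_MAX_INITIAL_MEM
  xmx=$((container_mb * max_ratio / 100))
  xms=$((xmx * initial_ratio / 100))
  # The initial heap is capped at JAVA_MAX_INITIAL_MEM.
  if [ "$xms" -gt "$max_initial_mb" ]; then
    xms=$max_initial_mb
  fi
  echo "-Xms${xms}m -Xmx${xmx}m"
}

heap_defaults 8192    # container limited to 8 GB
heap_defaults 65536   # container limited to 64 GB
```

With the defaults, an 8 GB container yields a 4096 MB maximum heap and a 1024 MB initial heap, while in a 64 GB container the initial heap is capped at the 4096 MB JAVA_MAX_INITIAL_MEM default.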