Server Configuration¶
The Nessie server is configurable via properties as listed in the application.properties file.
These properties can be set when starting up the docker image in two different ways. For example, if you want to set Nessie to use the JDBC
version store and provide a JDBC connection URL, you can either:
- Set these values via the JAVA_OPTS_APPEND option in the Docker invocation. Each setting should be inserted inside the variable’s value as -D<name>=<value> pairs:

    docker run -p 19120:19120 \
      -e JAVA_OPTS_APPEND="-Dnessie.version.store.type=JDBC -Dquarkus.datasource.jdbc.url=jdbc:postgresql://host.com:5432/db" \
      ghcr.io/projectnessie/nessie

- Alternatively, set them via the --env (or -e) option in the Docker invocation. Each setting must be provided separately as --env NAME=value options:

    docker run -p 19120:19120 \
      --env NESSIE_VERSION_STORE_TYPE=JDBC \
      --env QUARKUS_DATASOURCE_JDBC_URL="jdbc:postgresql://host.com:5432/db" \
      ghcr.io/projectnessie/nessie

Note how the original property name is converted to an environment variable, e.g. nessie.version.store.type becomes NESSIE_VERSION_STORE_TYPE. The conversion is done by replacing all . with _ and converting the name to upper case. See here for more details.
For more information on docker images, see Docker image options below.
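Since the server is configured via properties, the same two settings can presumably also be written in properties form. The following is a minimal sketch that reuses the placeholder host and database from the Docker examples; where such a file must be placed for the server to pick it up is not covered here.

```properties
# Same settings as in the Docker examples, expressed as properties.
# host.com and db are the placeholder values used in those examples.
nessie.version.store.type=JDBC
quarkus.datasource.jdbc.url=jdbc:postgresql://host.com:5432/db
```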
Server sizing¶
The minimum resources for Nessie are 4 CPUs and 4 GB RAM.
The recommended resources for Nessie depend on the actual use case and usage pattern(s). We recommend trying various configurations, starting with 8 CPUs and 8 GB RAM.
The efficiency of Nessie’s cache can be monitored using the metrics provided with the cache=nessie-objects tag, especially the cache.gets values for hit/miss and the causes provided by cache.evictions.
Note
Nessie is a stateless service that heavily depends on the performance of the backend database (request duration and throughput) and works best with distributed key-value databases. Nessie has a built-in cache. Caches require memory: the more memory is available, the more efficient the cache is and the fewer operations need to be performed against the backend database.
Tip
You can set the nessie.version.store.persist.reference-cache-ttl configuration option to further reduce the load against the backing database. See Version Store Advanced Settings below.
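As a rough illustration of the tip above, the option could be set as follows. This sketch assumes the option accepts an ISO-8601 duration, and the five-minute value is arbitrary; consult Version Store Advanced Settings for the actual semantics and a suitable value.

```properties
# Illustrative value only (assumption): cache references for 5 minutes,
# written as an ISO-8601 duration.
nessie.version.store.persist.reference-cache-ttl=PT5M
```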
Note
Many things happen in parallel and some libraries that we have to depend on are not written in a “reactive way”, especially with Iceberg REST. While the Iceberg REST parts in Nessie are built in a “reactive way”, most Nessie core APIs are not.
Supported operating systems¶
Operating System | Production | Development & prototyping | Comments |
---|---|---|---|
Linux | | | Primarily supported operating systems, assuming recent kernel and distribution versions. |
macOS | | | Supported for development and testing purposes. |
AIX | | | Not tested, might work or not. |
Solaris | | | Not tested, might work or not. |
Windows | | | Not supported in any way. Nessie server and admin tool refuse to start. |
Providing secrets¶
Instead of providing secrets like passwords in clear text, you can also use a keystore. This functionality is provided natively via Quarkus.
See also the secrets manager settings below for information about Hashicorp Vault, Google Cloud and Amazon Services Secrets Managers.
Core Nessie Configuration Settings¶
Core Settings¶
Nessie server configuration to be injected into the JAX-RS application.
Property | Default Value | Type | Description |
---|---|---|---|
nessie.server.default-branch | main | string | The default branch to use if not provided by the user. |
nessie.server.send-stacktrace-to-client | false | boolean | Whether stack traces should be sent to the client in case of error. The default is false to not expose internal details for security reasons. |
nessie.server.access-checks-batch-size | 100 | int | The number of entity-checks that are grouped into a call to BatchAccessChecker. The default value is quite conservative; it is the responsibility of the operator to adjust this value according to the capabilities of the actual authz implementation. Note that the number of checks can be slightly exceeded by the implementation, depending on the call site. |
Related Quarkus settings:
Property | Default values | Type | Description |
---|---|---|---|
quarkus.http.port | 19120 | int | Sets the HTTP port for the Nessie REST API endpoints. |
quarkus.management.port | 9000 | int | Sets the HTTP port for management endpoints (health, metrics, Swagger) |
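For orientation, the core and Quarkus settings from the two tables above look like this in properties form. The sketch simply spells out the documented default values, so it changes nothing by itself; adjust individual lines as needed.

```properties
# Core Nessie settings (values shown are the documented defaults)
nessie.server.default-branch=main
nessie.server.send-stacktrace-to-client=false
nessie.server.access-checks-batch-size=100

# Related Quarkus settings (values shown are the documented defaults)
quarkus.http.port=19120
quarkus.management.port=9000
```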
Info
A complete set of configuration options for Quarkus can be found on quarkus.io
Info
Reverse Proxy Settings
These config options are mentioned only for documentation purposes. Consult the Quarkus documentation for “Running behind a reverse proxy” and configure those depending on your actual needs.
Do NOT enable these options unless your reverse proxy (for example istio or nginx) is properly set up to set these headers and also to filter them from incoming requests.
Catalog and Iceberg REST Settings¶
Property | Default Value | Type | Description |
---|---|---|---|
nessie.catalog.validate-secrets | false | boolean | Optional: validate at server startup that all referenced secrets can be resolved. Startup will fail if one or more secrets cannot be resolved at startup time, hence the default is false. |
Warehouse defaults¶
Property | Default Value | Type | Description |
---|---|---|---|
nessie.catalog.default-warehouse | | string | Name of the default warehouse. This one is used when a warehouse is not specified in a query. If no default warehouse is configured and a request does not specify a warehouse, the request will fail. |
nessie.catalog.iceberg-config-defaults.<iceberg-property> | | string | Iceberg config defaults applicable to all clients and warehouses. Any properties that are common to all Iceberg clients should be included here. They will be passed to all clients on all warehouses as config defaults. These defaults can be overridden on a per-warehouse basis, see iceberg-config-defaults in Warehouses. |
nessie.catalog.iceberg-config-overrides.<iceberg-property> | | string | Iceberg config overrides applicable to all clients and warehouses. Any properties that are common to all Iceberg clients should be included here. They will be passed to all clients on all warehouses as config overrides. These overrides can be overridden on a per-warehouse basis, see iceberg-config-overrides in Warehouses. |
Warehouses¶
Map of warehouse names to warehouse configurations.
Property | Default Value | Type | Description |
---|---|---|---|
nessie.catalog.warehouses.<warehouse-name>.iceberg-config-defaults.<iceberg-property> | | string | Iceberg config defaults specific to this warehouse, potentially overriding any defaults specified in iceberg-config-defaults in Warehouse defaults. |
nessie.catalog.warehouses.<warehouse-name>.iceberg-config-overrides.<iceberg-property> | | string | Iceberg config overrides specific to this warehouse. They override any overrides specified in iceberg-config-overrides in Warehouse defaults. |
nessie.catalog.warehouses.<warehouse-name>.location | | string | Location of the warehouse. Used to determine the base location of a table. |
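The warehouse settings above are keyed by warehouse name, which is easier to see in a concrete fragment. The following is a sketch only: the warehouse name lakehouse, the bucket path, the io-impl value and the example-key property are all hypothetical placeholders chosen for illustration.

```properties
# "lakehouse" is a hypothetical warehouse name; bucket and path are placeholders.
nessie.catalog.default-warehouse=lakehouse
nessie.catalog.warehouses.lakehouse.location=s3://my-bucket/warehouse/

# An Iceberg client property passed to all clients on all warehouses
# as a config default (illustrative choice of property and value).
nessie.catalog.iceberg-config-defaults.io-impl=org.apache.iceberg.aws.s3.S3FileIO

# A default that applies to the "lakehouse" warehouse only
# ("example-key" is a made-up property name).
nessie.catalog.warehouses.lakehouse.iceberg-config-defaults.example-key=example-value
```

Requests that do not name a warehouse would then be served by lakehouse, and table base locations would be derived from its location.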
S3 settings¶
Configuration for S3 compatible object stores.
Default settings to be applied to all buckets can be set in the default-options
group. Specific settings for each bucket can be specified via the buckets
map.
All settings are optional. The defaults of these settings are defined by the AWSSDK Java client.
S3 default bucket settings¶
Default bucket configuration, default/fallback values for all buckets are taken from this one.
Property | Default Value | Type | Description |
---|---|---|---|
nessie.catalog.service.s3.default-options.endpoint | uri | Endpoint URI, required for private (non-AWS) clouds, specified either per bucket or in the top-level S3 settings. If the endpoint URIs for the Nessie server and clients differ, this one defines the endpoint used for the Nessie server. | |
nessie.catalog.service.s3.default-options.external-endpoint | uri | When using a specific endpoint (endpoint ) and the endpoint URIs for the Nessie server differ, you can specify the URI passed down to clients using this setting. Otherwise, clients will receive the value from the endpoint setting. | |
nessie.catalog.service.s3.default-options.path-style-access | boolean | Whether to use path-style access. If true, path-style access will be used, as in: https://<domain>/<bucket> . If false, a virtual-hosted style will be used instead, as in: https://<bucket>.<domain> . If unspecified, the default will depend on the cloud provider. | |
nessie.catalog.service.s3.default-options.access-point | string | AWS Access point for this bucket. Access points can be used to perform S3 operations by specifying a mapping of bucket to access points. This is useful for multi-region access, cross-region access, disaster recovery, etc. See: Access Points | |
nessie.catalog.service.s3.default-options.allow-cross-region-access-point | boolean | Authorize cross-region calls when contacting an access-point . By default, attempting to use an access point in a different region will throw an exception. When enabled, this property allows using access points in other regions. | |
nessie.catalog.service.s3.default-options.region | string | DNS name of the region, required for AWS. The region must be specified for AWS, either per bucket or in the top-level S3 settings. | |
nessie.catalog.service.s3.default-options.auth-type | APPLICATION_GLOBAL, STATIC | The authentication mode to use by the Catalog server. If not set, the default is STATIC . Depending on the authentication mode, other properties may be required. Valid values are: * APPLICATION_GLOBAL : Use the AWSSDK default credentials provider . * STATIC : Static credentials provided through the access-key option. | |
nessie.catalog.service.s3.default-options.access-key | uri | Name of the basic-credentials secret containing the access-key-id and secret-access-key, either per bucket or in the top-level S3 settings. Required when auth-type is STATIC . For STS, this defines the Access Key ID and Secret Key ID to be used as a basic credential for obtaining temporary session credentials. | |
nessie.catalog.service.s3.default-options.request-signing-enabled | boolean | Optional parameter to disable S3 request signing. Default is to enable S3 request signing. | |
nessie.catalog.service.s3.default-options.sts-endpoint | uri | The Security Token Service endpoint. This parameter must be set when running in a private (non-AWS) cloud and the catalog is configured to use S3 sessions (e.g. to use the “assume role” functionality). | |
nessie.catalog.service.s3.default-options.server-iam.enabled | boolean | Optional parameter to enable assume role (vended credentials). Default is to disable assume role. | |
nessie.catalog.service.s3.default-options.server-iam.policy | string | IAM policy in JSON format to be used as an inline session policy (optional). If specified, this policy will be used for all clients for all locations. Related docs: S3 with IAM and about actions, resources, conditions and policy reference . | |
nessie.catalog.service.s3.default-options.server-iam.assume-role | string | The ARN of the role to assume for accessing S3 data. This parameter is required for Amazon S3, but may not be required for other storage providers (e.g. Minio does not use it at all). If this option is defined, the server will attempt to assume the role at startup and cache the returned session credentials. | |
nessie.catalog.service.s3.default-options.server-iam.role-session-name | string | An identifier for the assumed role session. This parameter is most important in cases when the same role is assumed by different principals in different use cases. | |
nessie.catalog.service.s3.default-options.server-iam.external-id | string | An identifier for the party assuming the role. This parameter must match the external ID configured in IAM rules that govern the assume role process for the specified role-arn . This parameter is essential in preventing the Confused Deputy problem. | |
nessie.catalog.service.s3.default-options.server-iam.session-duration | duration | A higher bound estimate of the expected duration of client “sessions” working with data in this bucket. A session, for example, is the lifetime of an Iceberg REST catalog object on the client side. This value is used for validating expiration times of credentials associated with the warehouse. Must be >= 1 second. | |
nessie.catalog.service.s3.default-options.client-iam.statements | list of string | Additional IAM policy statements to be inserted after the automatically generated S3 location dependent Allow policy statement. Example: ...client-iam.statements[0]={"Effect":"Allow", "Action":"s3:*", "Resource":"arn:aws:s3:::* /alwaysAllowed/*"} ...client-iam.statements[1]={"Effect":"Deny", "Action":"s3:*", "Resource":"arn:aws:s3:::* /blocked/*"} Related docs: S3 with IAM and about actions, resources, conditions and policy reference . | |
nessie.catalog.service.s3.default-options.client-iam.enabled | boolean | Optional parameter to enable assume role (vended credentials). Default is to disable assume role. | |
nessie.catalog.service.s3.default-options.client-iam.policy | string | IAM policy in JSON format to be used as an inline session policy (optional). If specified, this policy will be used for all clients for all locations. Related docs: S3 with IAM and about actions, resources, conditions and policy reference . | |
nessie.catalog.service.s3.default-options.client-iam.assume-role | string | The ARN of the role to assume for accessing S3 data. This parameter is required for Amazon S3, but may not be required for other storage providers (e.g. Minio does not use it at all). If this option is defined, the server will attempt to assume the role at startup and cache the returned session credentials. | |
nessie.catalog.service.s3.default-options.client-iam.role-session-name | string | An identifier for the assumed role session. This parameter is most important in cases when the same role is assumed by different principals in different use cases. | |
nessie.catalog.service.s3.default-options.client-iam.external-id | string | An identifier for the party assuming the role. This parameter must match the external ID configured in IAM rules that govern the assume role process for the specified role-arn . This parameter is essential in preventing the Confused Deputy problem. | |
nessie.catalog.service.s3.default-options.client-iam.session-duration | duration | A higher bound estimate of the expected duration of client “sessions” working with data in this bucket. A session, for example, is the lifetime of an Iceberg REST catalog object on the client side. This value is used for validating expiration times of credentials associated with the warehouse. Must be >= 1 second. |
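A minimal sketch of S3 defaults that apply to every bucket might look as follows. The region and the commented-out endpoint are placeholder values, and APPLICATION_GLOBAL is chosen here only because it needs no additional secret configuration.

```properties
# Defaults applied to all S3 buckets; the region is a placeholder.
nessie.catalog.service.s3.default-options.region=us-west-2
# Use the AWS SDK default credentials provider instead of STATIC credentials.
nessie.catalog.service.s3.default-options.auth-type=APPLICATION_GLOBAL

# Only relevant for private (non-AWS) S3-compatible stores (placeholder URI):
# nessie.catalog.service.s3.default-options.endpoint=http://localhost:9000
# nessie.catalog.service.s3.default-options.path-style-access=true
```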
S3 per bucket settings¶
Per-bucket configurations. The effective value for a bucket is taken from the per-bucket setting. If no per-bucket setting is present, uses the defaults from the top-level S3 settings in default-options
.
Property | Default Value | Type | Description |
---|---|---|---|
nessie.catalog.service.s3.buckets. <key> .endpoint | uri | Endpoint URI, required for private (non-AWS) clouds, specified either per bucket or in the top-level S3 settings. If the endpoint URIs for the Nessie server and clients differ, this one defines the endpoint used for the Nessie server. | |
nessie.catalog.service.s3.buckets. <key> .external-endpoint | uri | When using a specific endpoint (endpoint ) and the endpoint URIs for the Nessie server differ, you can specify the URI passed down to clients using this setting. Otherwise, clients will receive the value from the endpoint setting. | |
nessie.catalog.service.s3.buckets. <key> .path-style-access | boolean | Whether to use path-style access. If true, path-style access will be used, as in: https://<domain>/<bucket> . If false, a virtual-hosted style will be used instead, as in: https://<bucket>.<domain> . If unspecified, the default will depend on the cloud provider. | |
nessie.catalog.service.s3.buckets. <key> .access-point | string | AWS Access point for this bucket. Access points can be used to perform S3 operations by specifying a mapping of bucket to access points. This is useful for multi-region access, cross-region access, disaster recovery, etc. See: Access Points | |
nessie.catalog.service.s3.buckets. <key> .allow-cross-region-access-point | boolean | Authorize cross-region calls when contacting an access-point . By default, attempting to use an access point in a different region will throw an exception. When enabled, this property allows using access points in other regions. | |
nessie.catalog.service.s3.buckets. <key> .region | string | DNS name of the region, required for AWS. The region must be specified for AWS, either per bucket or in the top-level S3 settings. | |
nessie.catalog.service.s3.buckets. <key> .auth-type | APPLICATION_GLOBAL, STATIC | The authentication mode to use by the Catalog server. If not set, the default is STATIC . Depending on the authentication mode, other properties may be required. Valid values are: * APPLICATION_GLOBAL : Use the AWSSDK default credentials provider . * STATIC : Static credentials provided through the access-key option. | |
nessie.catalog.service.s3.buckets. <key> .access-key | uri | Name of the basic-credentials secret containing the access-key-id and secret-access-key, either per bucket or in the top-level S3 settings. Required when auth-type is STATIC . For STS, this defines the Access Key ID and Secret Key ID to be used as a basic credential for obtaining temporary session credentials. | |
nessie.catalog.service.s3.buckets. <key> .request-signing-enabled | boolean | Optional parameter to disable S3 request signing. Default is to enable S3 request signing. | |
nessie.catalog.service.s3.buckets. <key> .sts-endpoint | uri | The Security Token Service endpoint. This parameter must be set when running in a private (non-AWS) cloud and the catalog is configured to use S3 sessions (e.g. to use the “assume role” functionality). | |
nessie.catalog.service.s3.buckets. <key> .server-iam.enabled | boolean | Optional parameter to enable assume role (vended credentials). Default is to disable assume role. | |
nessie.catalog.service.s3.buckets. <key> .server-iam.policy | string | IAM policy in JSON format to be used as an inline session policy (optional). If specified, this policy will be used for all clients for all locations. Related docs: S3 with IAM and about actions, resources, conditions and policy reference . | |
nessie.catalog.service.s3.buckets. <key> .server-iam.assume-role | string | The ARN of the role to assume for accessing S3 data. This parameter is required for Amazon S3, but may not be required for other storage providers (e.g. Minio does not use it at all). If this option is defined, the server will attempt to assume the role at startup and cache the returned session credentials. | |
nessie.catalog.service.s3.buckets. <key> .server-iam.role-session-name | string | An identifier for the assumed role session. This parameter is most important in cases when the same role is assumed by different principals in different use cases. | |
nessie.catalog.service.s3.buckets. <key> .server-iam.external-id | string | An identifier for the party assuming the role. This parameter must match the external ID configured in IAM rules that govern the assume role process for the specified role-arn . This parameter is essential in preventing the Confused Deputy problem. | |
nessie.catalog.service.s3.buckets. <key> .server-iam.session-duration | duration | A higher bound estimate of the expected duration of client “sessions” working with data in this bucket. A session, for example, is the lifetime of an Iceberg REST catalog object on the client side. This value is used for validating expiration times of credentials associated with the warehouse. Must be >= 1 second. | |
nessie.catalog.service.s3.buckets. <key> .client-iam.statements | list of string | Additional IAM policy statements to be inserted after the automatically generated S3 location dependent Allow policy statement. Example: ...client-iam.statements[0]={"Effect":"Allow", "Action":"s3:*", "Resource":"arn:aws:s3:::* /alwaysAllowed/*"} ...client-iam.statements[1]={"Effect":"Deny", "Action":"s3:*", "Resource":"arn:aws:s3:::* /blocked/*"} Related docs: S3 with IAM and about actions, resources, conditions and policy reference . | |
nessie.catalog.service.s3.buckets. <key> .client-iam.enabled | boolean | Optional parameter to enable assume role (vended credentials). Default is to disable assume role. | |
nessie.catalog.service.s3.buckets. <key> .client-iam.policy | string | IAM policy in JSON format to be used as an inline session policy (optional). If specified, this policy will be used for all clients for all locations. Related docs: S3 with IAM and about actions, resources, conditions and policy reference . | |
nessie.catalog.service.s3.buckets. <key> .client-iam.assume-role | string | The ARN of the role to assume for accessing S3 data. This parameter is required for Amazon S3, but may not be required for other storage providers (e.g. Minio does not use it at all). If this option is defined, the server will attempt to assume the role at startup and cache the returned session credentials. | |
nessie.catalog.service.s3.buckets. <key> .client-iam.role-session-name | string | An identifier for the assumed role session. This parameter is most important in cases when the same role is assumed by different principals in different use cases. | |
nessie.catalog.service.s3.buckets. <key> .client-iam.external-id | string | An identifier for the party assuming the role. This parameter must match the external ID configured in IAM rules that govern the assume role process for the specified role-arn . This parameter is essential in preventing the Confused Deputy problem. | |
nessie.catalog.service.s3.buckets. <key> .client-iam.session-duration | duration | A higher bound estimate of the expected duration of client “sessions” working with data in this bucket. A session, for example, is the lifetime of an Iceberg REST catalog object on the client side. This value is used for validating expiration times of credentials associated with the warehouse. Must be >= 1 second. | |
nessie.catalog.service.s3.buckets. <key> .name | string | The human consumable name of the bucket. If unset, the name of the bucket will be extracted from the configuration option name, e.g. if nessie.catalog.service.s3.buckets.bucket1.name=my-bucket is set, the bucket name will be my-bucket; otherwise, it will be bucket1. This can be used if the bucket name contains non-alphanumeric characters, such as dots or dashes. | |
nessie.catalog.service.s3.buckets. <key> .authority | string | The authority part in a storage location URI. This is the bucket name for S3 and GCS, for ADLS this is the storage account name (optionally prefixed with the container/file-system name). Defaults to (#name() ). For S3 and GCS this option should mention the name of the bucket. For ADLS: The value of this option is using the container@storageAccount syntax. It is mentioned as <file_system>@<account_name> in the Azure Docs . Note that the <file_system>@ part is optional, <account_name> is the fully qualified name, usually ending in .dfs.core.windows.net . | |
nessie.catalog.service.s3.buckets. <key> .path-prefix | string | The path prefix for this storage location. |
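The fallback from per-bucket settings to default-options can be illustrated with a short fragment. In this sketch, bucket1 is a hypothetical map key and my-bucket and the region are placeholder values.

```properties
# "bucket1" is a hypothetical map key; "my-bucket" is a placeholder bucket name.
nessie.catalog.service.s3.buckets.bucket1.name=my-bucket
# Overrides the region from default-options for this bucket only.
nessie.catalog.service.s3.buckets.bucket1.region=eu-central-1
# Settings not given here (for example auth-type) fall back to the values
# configured under nessie.catalog.service.s3.default-options.
```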
S3 transport¶
Property | Default Value | Type | Description |
---|---|---|---|
nessie.catalog.service.s3.http.max-http-connections | | int | Override the default maximum number of pooled connections. |
nessie.catalog.service.s3.http.read-timeout | | duration | Override the default connection read timeout. |
nessie.catalog.service.s3.http.connect-timeout | | duration | Override the default TCP connect timeout. |
nessie.catalog.service.s3.http.connection-acquisition-timeout | | duration | Override the default connection acquisition timeout. This is the time a request will wait for a connection from the pool. |
nessie.catalog.service.s3.http.connection-max-idle-time | | duration | Override the default max idle time of a pooled connection. |
nessie.catalog.service.s3.http.connection-time-to-live | | duration | Override the default time-to-live of a pooled connection. |
nessie.catalog.service.s3.http.expect-continue-enabled | | boolean | Override the default behavior whether to expect an HTTP/100-Continue. |
nessie.catalog.service.s3.trust-all-certificates | | boolean | Instruct the S3 HTTP client to accept all SSL certificates, if set to true. Enabling this option is dangerous; it is strongly recommended to leave this option unset or false. |
nessie.catalog.service.s3.trust-store.path | | path | Override to set the file path to a custom SSL key or trust store. nessie.catalog.service.s3.trust-store.type and nessie.catalog.service.s3.trust-store.password must be supplied as well when providing a custom trust store. When running in k8s or Docker, the path is local within the pod/container and must be explicitly mounted. |
nessie.catalog.service.s3.trust-store.type | | string | Override to set the type of the custom SSL key or trust store specified in nessie.catalog.service.s3.trust-store.path. Supported types include JKS, PKCS12, and all key store types supported by Java 17. |
nessie.catalog.service.s3.trust-store.password | | uri | Name of the key-secret containing the password for the custom SSL key or trust store specified in nessie.catalog.service.s3.trust-store.path. |
nessie.catalog.service.s3.key-store.path | | path | Override to set the file path to a custom SSL key or trust store. nessie.catalog.service.s3.key-store.type and nessie.catalog.service.s3.key-store.password must be supplied as well when providing a custom key store. When running in k8s or Docker, the path is local within the pod/container and must be explicitly mounted. |
nessie.catalog.service.s3.key-store.type | | string | Override to set the type of the custom SSL key or trust store specified in nessie.catalog.service.s3.key-store.path. Supported types include JKS, PKCS12, and all key store types supported by Java 17. |
nessie.catalog.service.s3.key-store.password | | uri | Name of the key-secret containing the password for the custom SSL key or trust store specified in nessie.catalog.service.s3.key-store.path. |
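As a sketch of how the S3 transport overrides could be combined, the fragment below tunes the connection pool and timeouts. All values are arbitrary examples rather than recommendations, and the durations are written in ISO-8601 form on the assumption that this format is accepted.

```properties
# Illustrative values only; unset options keep the AWS SDK client defaults.
nessie.catalog.service.s3.http.max-http-connections=50
nessie.catalog.service.s3.http.connect-timeout=PT5S
nessie.catalog.service.s3.http.read-timeout=PT30S
# Time a request waits for a connection from the pool.
nessie.catalog.service.s3.http.connection-acquisition-timeout=PT10S
```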
S3 STS, assume-role global settings¶
Property | Default Value | Type | Description |
---|---|---|---|
nessie.catalog.service.s3.sts.session-grace-period | | duration | The time period to subtract from the S3 session credentials (assumed role credentials) expiry time to define the time when those credentials become eligible for refreshing. |
nessie.catalog.service.s3.sts.session-cache-max-size | | int | Maximum number of entries to keep in the session credentials cache (assumed role credentials). |
nessie.catalog.service.s3.sts.clients-cache-max-size | | int | Maximum number of entries to keep in the STS clients cache. |
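The global STS settings could be set along these lines. This is a sketch with arbitrary example values (no defaults are documented in the table above), and the grace period assumes an ISO-8601 duration is accepted.

```properties
# Refresh assumed-role credentials 5 minutes before they expire (example value).
nessie.catalog.service.s3.sts.session-grace-period=PT5M
# Cache sizes are example values, not recommendations.
nessie.catalog.service.s3.sts.session-cache-max-size=1000
nessie.catalog.service.s3.sts.clients-cache-max-size=50
```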
Google Cloud Storage settings¶
Note
Support for GCS is experimental.
GCS buckets¶
Configuration for Google Cloud Storage (GCS) object stores.
Default settings to be applied to all buckets can be set in the default-options
group. Specific settings for each bucket can be specified via the buckets
map.
All settings are optional. The defaults of these settings are defined by the Google Java SDK client.
GCS default bucket settings¶
Default bucket configuration, default/fallback values for all buckets are taken from this one.
Property | Default Value | Type | Description |
---|---|---|---|
nessie.catalog.service.gcs.default-options.host | uri | The default endpoint override to use. The endpoint is almost always used for testing purposes. If the endpoint URIs for the Nessie server and clients differ, this one defines the endpoint used for the Nessie server. | |
nessie.catalog.service.gcs.default-options.external-host | uri | When using a specific endpoint, see host , and the endpoint URIs for the Nessie server differ, you can specify the URI passed down to clients using this setting. Otherwise, clients will receive the value from the host setting. | |
nessie.catalog.service.gcs.default-options.user-project | string | Optionally specify the user project (Google term). | |
nessie.catalog.service.gcs.default-options.project-id | string | The Google project ID. | |
nessie.catalog.service.gcs.default-options.quota-project-id | string | The Google quota project ID. | |
nessie.catalog.service.gcs.default-options.client-lib-token | string | The Google client lib token. | |
nessie.catalog.service.gcs.default-options.auth-type | NONE, USER, SERVICE_ACCOUNT, ACCESS_TOKEN, APPLICATION_DEFAULT | The authentication type to use. If not set, the default is NONE . | |
nessie.catalog.service.gcs.default-options.auth-credentials-json | uri | Name of the key-secret containing the auth-credentials-JSON, this value is the name of the credential to use, the actual credential is defined via secrets. | |
nessie.catalog.service.gcs.default-options.oauth2-token | uri | Name of the token-secret containing the OAuth2 token, this value is the name of the credential to use, the actual credential is defined via secrets. | |
nessie.catalog.service.gcs.default-options.downscoped-credentials.enable | boolean | Flag to enable the currently experimental option to send short-lived and scoped-down credentials to clients. The current default is to not enable short-lived and scoped-down credentials, but the default may change to enable in the future. | |
nessie.catalog.service.gcs.default-options.downscoped-credentials.expiration-margin | duration | The expiration margin for the scoped down OAuth2 token. Defaults to the Google defaults. | |
nessie.catalog.service.gcs.default-options.downscoped-credentials.refresh-margin | duration | The refresh margin for the scoped down OAuth2 token. Defaults to the Google defaults. | |
nessie.catalog.service.gcs.default-options.read-chunk-size | int | The read chunk size in bytes. | |
nessie.catalog.service.gcs.default-options.write-chunk-size | int | The write chunk size in bytes. | |
nessie.catalog.service.gcs.default-options.delete-batch-size | int | The delete batch size. | |
nessie.catalog.service.gcs.default-options.encryption-key | uri | Name of the key-secret containing the customer-supplied AES256 key for blob encryption when writing. | |
nessie.catalog.service.gcs.default-options.decryption-key | uri | Name of the key-secret containing the customer-supplied AES256 key for blob decryption when reading. |
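A minimal sketch of GCS defaults for all buckets is shown below. The project ID is a placeholder, APPLICATION_DEFAULT is picked only because it avoids referencing a secret, and the chunk size is an arbitrary example.

```properties
# Placeholder project ID.
nessie.catalog.service.gcs.default-options.project-id=my-project
# One of NONE, USER, SERVICE_ACCOUNT, ACCESS_TOKEN, APPLICATION_DEFAULT.
nessie.catalog.service.gcs.default-options.auth-type=APPLICATION_DEFAULT
# Read chunk size in bytes (example value: 1 MiB).
nessie.catalog.service.gcs.default-options.read-chunk-size=1048576
```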
GCS per bucket settings¶
Per-bucket configurations. The effective value for a bucket is taken from the per-bucket setting. If no per-bucket setting is present, uses the defaults from the top-level GCS settings in default-options
.
Property | Default Value | Type | Description |
---|---|---|---|
nessie.catalog.service.gcs.buckets. <key> .host | uri | The default endpoint override to use. The endpoint is almost always used for testing purposes. If the endpoint URIs for the Nessie server and clients differ, this one defines the endpoint used for the Nessie server. | |
nessie.catalog.service.gcs.buckets. <key> .external-host | uri | When using a specific endpoint, see host , and the endpoint URIs for the Nessie server differ, you can specify the URI passed down to clients using this setting. Otherwise, clients will receive the value from the host setting. | |
nessie.catalog.service.gcs.buckets. <key> .user-project | string | Optionally specify the user project (Google term). | |
nessie.catalog.service.gcs.buckets. <key> .project-id | string | The Google project ID. | |
nessie.catalog.service.gcs.buckets. <key> .quota-project-id | string | The Google quota project ID. | |
nessie.catalog.service.gcs.buckets. <key> .client-lib-token | string | The Google client lib token. | |
nessie.catalog.service.gcs.buckets. <key> .auth-type | NONE, USER, SERVICE_ACCOUNT, ACCESS_TOKEN, APPLICATION_DEFAULT | The authentication type to use. If not set, the default is NONE . | |
nessie.catalog.service.gcs.buckets. <key> .auth-credentials-json | uri | Name of the key-secret containing the auth-credentials-JSON, this value is the name of the credential to use, the actual credential is defined via secrets. | |
nessie.catalog.service.gcs.buckets. <key> .oauth2-token | uri | Name of the token-secret containing the OAuth2 token, this value is the name of the credential to use, the actual credential is defined via secrets. | |
nessie.catalog.service.gcs.buckets. <key> .downscoped-credentials.enable | boolean | Flag to enable the currently experimental option to send short-lived and scoped-down credentials to clients. The current default is to not enable short-lived and scoped-down credentials, but the default may change to enable in the future. | |
nessie.catalog.service.gcs.buckets. <key> .downscoped-credentials.expiration-margin | duration | The expiration margin for the scoped down OAuth2 token. Defaults to the Google defaults. | |
nessie.catalog.service.gcs.buckets. <key> .downscoped-credentials.refresh-margin | duration | The refresh margin for the scoped down OAuth2 token. Defaults to the Google defaults. | |
nessie.catalog.service.gcs.buckets. <key> .read-chunk-size | int | The read chunk size in bytes. | |
nessie.catalog.service.gcs.buckets. <key> .write-chunk-size | int | The write chunk size in bytes. | |
nessie.catalog.service.gcs.buckets. <key> .delete-batch-size | int | The delete batch size. | |
nessie.catalog.service.gcs.buckets. <key> .encryption-key | uri | Name of the key-secret containing the customer-supplied AES256 key for blob encryption when writing. | |
nessie.catalog.service.gcs.buckets. <key> .decryption-key | uri | Name of the key-secret containing the customer-supplied AES256 key for blob decryption when reading. | |
nessie.catalog.service.gcs.buckets. <key> .name | string | The human consumable name of the bucket. If unset, the name of the bucket will be extracted from the configuration option name, e.g. if nessie.catalog.service.s3.bucket1.name=my-bucket is set, the bucket name will be my-bucket ; otherwise, it will be bucket1 . This can be used if the bucket name contains non-alphanumeric characters, such as dots or dashes. | |
nessie.catalog.service.gcs.buckets. <key> .authority | string | The authority part in a storage location URI. This is the bucket name for S3 and GCS; for ADLS this is the storage account name (optionally prefixed with the container/file-system name). Defaults to the value of name . For S3 and GCS this option should contain the name of the bucket. For ADLS, the value of this option uses the container@storageAccount syntax, which is referred to as <file_system>@<account_name> in the Azure Docs . Note that the <file_system>@ part is optional and that <account_name> is the fully qualified name, usually ending in .dfs.core.windows.net . | |
nessie.catalog.service.gcs.buckets. <key> .path-prefix | string | The path prefix for this storage location. |
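As an illustration of how per-bucket settings layer on top of the defaults, the following sketch configures one bucket that overrides a single default value. The bucket key bucket1, the bucket name, and the project IDs are hypothetical, and the default-options property names are assumed to mirror the per-bucket names listed above.

```properties
# Defaults applied to every GCS bucket unless a per-bucket value is set.
# (Assumption: default-options uses the same suffixes as the per-bucket table.)
nessie.catalog.service.gcs.default-options.auth-type=APPLICATION_DEFAULT
nessie.catalog.service.gcs.default-options.project-id=my-default-project

# Per-bucket override. "bucket1" is the <key>; all values are hypothetical.
# The explicit name is set because the real bucket name contains dots.
nessie.catalog.service.gcs.buckets.bucket1.name=my.bucket.with.dots
nessie.catalog.service.gcs.buckets.bucket1.project-id=my-other-project
```

With this configuration, bucket1 would use my-other-project as its project ID and inherit the authentication type from default-options.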
GCS transport¶
Property | Default Value | Type | Description |
---|---|---|---|
nessie.catalog.service.gcs.read-timeout | duration | Override the default read timeout. | |
nessie.catalog.service.gcs.connect-timeout | duration | Override the default connection timeout. | |
nessie.catalog.service.gcs.max-attempts | int | Override the default maximum number of attempts. | |
nessie.catalog.service.gcs.logical-timeout | duration | Override the default logical request timeout. | |
nessie.catalog.service.gcs.total-timeout | duration | Override the default total timeout. | |
nessie.catalog.service.gcs.initial-retry-delay | duration | Override the default initial retry delay. | |
nessie.catalog.service.gcs.max-retry-delay | duration | Override the default maximum retry delay. | |
nessie.catalog.service.gcs.retry-delay-multiplier | double | Override the default retry delay multiplier. | |
nessie.catalog.service.gcs.initial-rpc-timeout | duration | Override the default initial RPC timeout. | |
nessie.catalog.service.gcs.max-rpc-timeout | duration | Override the default maximum RPC timeout. | |
nessie.catalog.service.gcs.rpc-timeout-multiplier | double | Override the default RPC timeout multiplier. |
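A minimal sketch of a transport configuration that tightens timeouts and bounds retries is shown below. All values are illustrative assumptions, not recommendations; durations use the same ISO-8601 notation as the defaults elsewhere on this page.

```properties
# Illustrative values only; tune them against your own latency measurements.
nessie.catalog.service.gcs.connect-timeout=PT5S
nessie.catalog.service.gcs.read-timeout=PT30S

# Retry at most 5 times in total, starting with a 0.5 second delay that is
# multiplied by 2.0 after each attempt and capped at 10 seconds.
nessie.catalog.service.gcs.max-attempts=5
nessie.catalog.service.gcs.initial-retry-delay=PT0.5S
nessie.catalog.service.gcs.retry-delay-multiplier=2.0
nessie.catalog.service.gcs.max-retry-delay=PT10S
```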
ADLS settings¶
Note
Support for ADLS is experimental.
Property | Default Value | Type | Description |
---|---|---|---|
nessie.catalog.service.adls.read-block-size | int | Override the default read block size used when reading from ADLS. | |
nessie.catalog.service.adls.write-block-size | long | Override the default write block size used when writing to ADLS. |
ADLS default file-system settings¶
Default file-system configuration, default/fallback values for all file-systems are taken from this one.
Property | Default Value | Type | Description |
---|---|---|---|
nessie.catalog.service.adls.default-options.auth-type | NONE, STORAGE_SHARED_KEY, SAS_TOKEN, APPLICATION_DEFAULT | The authentication type to use. | |
nessie.catalog.service.adls.default-options.account | uri | Name of the basic-credentials secret containing the fully-qualified account name, e.g. "myaccount.dfs.core.windows.net" and account key, configured using the name and secret fields. If not specified, it will be queried via the configured credentials provider. | |
nessie.catalog.service.adls.default-options.sas-token | uri | Name of the key-secret containing the SAS token to access the ADLS file system. | |
nessie.catalog.service.adls.default-options.user-delegation.enable | boolean | Enable short-lived user-delegation SAS tokens per file-system. The current default is to not enable short-lived and scoped-down credentials, but the default may change to enable in the future. | |
nessie.catalog.service.adls.default-options.user-delegation.key-expiry | duration | Expiration time / validity duration of the user-delegation key, this key is not passed to the client. Defaults to 7 days minus 1 minute (the maximum), must be >= 1 second. | |
nessie.catalog.service.adls.default-options.user-delegation.sas-expiry | duration | Expiration time / validity duration of the user-delegation SAS token, which is sent to the client. Defaults to 3 hours, must be >= 1 second. | |
nessie.catalog.service.adls.default-options.endpoint | string | Define a custom HTTP endpoint. In case clients need to use a different URI, use the .external-endpoint setting. | |
nessie.catalog.service.adls.default-options.external-endpoint | string | Define a custom HTTP endpoint, this value is used by clients. | |
nessie.catalog.service.adls.default-options.retry-policy | NONE, EXPONENTIAL_BACKOFF, FIXED_DELAY | Configure the retry strategy. | |
nessie.catalog.service.adls.default-options.max-retries | int | Mandatory, if any retry-policy is configured. | |
nessie.catalog.service.adls.default-options.try-timeout | duration | Mandatory, if any retry-policy is configured. | |
nessie.catalog.service.adls.default-options.retry-delay | duration | Mandatory, if any retry-policy is configured. | |
nessie.catalog.service.adls.default-options.max-retry-delay | duration | Mandatory, if EXPONENTIAL_BACKOFF is configured. |
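The retry settings depend on each other: once a retry policy is chosen, the related options become mandatory. The following sketch shows a complete set for EXPONENTIAL_BACKOFF; the concrete values are illustrative assumptions.

```properties
# Choosing a retry policy makes max-retries, try-timeout and retry-delay mandatory.
nessie.catalog.service.adls.default-options.retry-policy=EXPONENTIAL_BACKOFF
nessie.catalog.service.adls.default-options.max-retries=5
nessie.catalog.service.adls.default-options.try-timeout=PT30S
nessie.catalog.service.adls.default-options.retry-delay=PT0.5S
# Additionally mandatory for EXPONENTIAL_BACKOFF.
nessie.catalog.service.adls.default-options.max-retry-delay=PT10S
```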
ADLS per file-system settings¶
Per-file-system configurations. The effective value for a file-system is taken from the per-file-system setting. If no per-file-system setting is present, the default from the top-level ADLS settings in default-options is used.
Property | Default Value | Type | Description |
---|---|---|---|
nessie.catalog.service.adls.file-systems. <key> .auth-type | NONE, STORAGE_SHARED_KEY, SAS_TOKEN, APPLICATION_DEFAULT | The authentication type to use. | |
nessie.catalog.service.adls.file-systems. <key> .account | uri | Name of the basic-credentials secret containing the fully-qualified account name, e.g. "myaccount.dfs.core.windows.net" and account key, configured using the name and secret fields. If not specified, it will be queried via the configured credentials provider. | |
nessie.catalog.service.adls.file-systems. <key> .sas-token | uri | Name of the key-secret containing the SAS token to access the ADLS file system. | |
nessie.catalog.service.adls.file-systems. <key> .user-delegation.enable | boolean | Enable short-lived user-delegation SAS tokens per file-system. The current default is to not enable short-lived and scoped-down credentials, but the default may change to enable in the future. | |
nessie.catalog.service.adls.file-systems. <key> .user-delegation.key-expiry | duration | Expiration time / validity duration of the user-delegation key, this key is not passed to the client. Defaults to 7 days minus 1 minute (the maximum), must be >= 1 second. | |
nessie.catalog.service.adls.file-systems. <key> .user-delegation.sas-expiry | duration | Expiration time / validity duration of the user-delegation SAS token, which is sent to the client. Defaults to 3 hours, must be >= 1 second. | |
nessie.catalog.service.adls.file-systems. <key> .endpoint | string | Define a custom HTTP endpoint. In case clients need to use a different URI, use the .external-endpoint setting. | |
nessie.catalog.service.adls.file-systems. <key> .external-endpoint | string | Define a custom HTTP endpoint, this value is used by clients. | |
nessie.catalog.service.adls.file-systems. <key> .retry-policy | NONE, EXPONENTIAL_BACKOFF, FIXED_DELAY | Configure the retry strategy. | |
nessie.catalog.service.adls.file-systems. <key> .max-retries | int | Mandatory, if any retry-policy is configured. | |
nessie.catalog.service.adls.file-systems. <key> .try-timeout | duration | Mandatory, if any retry-policy is configured. | |
nessie.catalog.service.adls.file-systems. <key> .retry-delay | duration | Mandatory, if any retry-policy is configured. | |
nessie.catalog.service.adls.file-systems. <key> .max-retry-delay | duration | Mandatory, if EXPONENTIAL_BACKOFF is configured. | |
nessie.catalog.service.adls.file-systems. <key> .name | string | The human consumable name of the bucket. If unset, the name of the bucket will be extracted from the configuration option name, e.g. if nessie.catalog.service.s3.bucket1.name=my-bucket is set, the bucket name will be my-bucket ; otherwise, it will be bucket1 . This can be used if the bucket name contains non-alphanumeric characters, such as dots or dashes. | |
nessie.catalog.service.adls.file-systems. <key> .authority | string | The authority part in a storage location URI. This is the bucket name for S3 and GCS; for ADLS this is the storage account name (optionally prefixed with the container/file-system name). Defaults to the value of name . For S3 and GCS this option should contain the name of the bucket. For ADLS, the value of this option uses the container@storageAccount syntax, which is referred to as <file_system>@<account_name> in the Azure Docs . Note that the <file_system>@ part is optional and that <account_name> is the fully qualified name, usually ending in .dfs.core.windows.net . | |
nessie.catalog.service.adls.file-systems. <key> .path-prefix | string | The path prefix for this storage location. |
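As a sketch of a per-file-system override, the following configuration keeps a default authentication type and switches one file-system to a SAS token. The key fs1, the authority value, and the secret prefix my-secrets.adls-fs1 are hypothetical; the SAS token is referenced as a key-secret resolved through the Quarkus configuration.

```properties
# Default for all file-systems.
nessie.catalog.service.adls.default-options.auth-type=APPLICATION_DEFAULT

# Override for the hypothetical file-system key "fs1".
nessie.catalog.service.adls.file-systems.fs1.authority=mycontainer@myaccount.dfs.core.windows.net
nessie.catalog.service.adls.file-systems.fs1.auth-type=SAS_TOKEN
nessie.catalog.service.adls.file-systems.fs1.sas-token=urn:nessie-secret:quarkus:my-secrets.adls-fs1

# A key-secret consists of a single "key" attribute.
my-secrets.adls-fs1.key=<your SAS token>
```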
ADLS transport¶
Property | Default Value | Type | Description |
---|---|---|---|
nessie.catalog.service.adls.max-http-connections | int | Override the default maximum number of HTTP connections that Nessie can use against all ADLS Gen2 object stores. | |
nessie.catalog.service.adls.connect-timeout | duration | Override the default TCP connect timeout for HTTP connections against ADLS Gen2 object stores. | |
nessie.catalog.service.adls.connection-idle-timeout | duration | Override the default idle timeout for HTTP connections. | |
nessie.catalog.service.adls.write-timeout | duration | Override the default write timeout for HTTP connections. | |
nessie.catalog.service.adls.read-timeout | duration | Override the default read timeout for HTTP connections. |
Advanced catalog settings¶
Error Handling¶
Property | Default Value | Type | Description |
---|---|---|---|
nessie.catalog.object-stores.health-check.enabled | true | boolean | Nessie tries to verify the connectivity to the object stores configured for each warehouse and exposes this information as a readiness check. It is recommended to leave this setting enabled. |
nessie.catalog.error-handling.throttled-retry-after | PT10S | duration | Advanced property. The time interval after which a request is retried when storage I/O responds with some “retry later” response. |
Performance Tuning¶
Property | Default Value | Type | Description |
---|---|---|---|
nessie.catalog.service.imports.max-concurrent | 32 | int | Advanced property, defines the maximum number of concurrent imports from object stores. |
nessie.catalog.service.tasks.threads.max | -1 | int | Advanced property, defines the maximum number of threads for async tasks like imports. |
nessie.catalog.service.tasks.threads.keep-alive | PT2S | duration | Advanced thread pool setting for async tasks like imports. |
nessie.catalog.service.tasks.minimum-delay | PT0.001S | duration | Advanced thread pool setting for async tasks like imports. |
nessie.catalog.service.race.wait.min | PT0.005S | duration | Advanced thread pool setting for async tasks like imports. |
nessie.catalog.service.race.wait.max | PT0.250S | duration | Advanced thread pool setting for async tasks like imports. |
Secrets manager settings¶
Secrets for object stores are strictly separated from the actual configuration entries. This enables the use of external secrets managers. Secrets are referenced using a URN notation.
The URN notation for Nessie secrets is urn:nessie-secret:<provider>:<secret-name>, where <provider> references the name of the provider, for example quarkus to resolve secrets via the Quarkus configuration, and <secret-name> is the secrets manager specific name for the secret to resolve.
Retrieving secrets from external secrets managers like Hashicorp Vault and the Amazon, Google and Azure secrets managers can take some time. Nessie mitigates this cost by caching retrieved secrets for some time, by default 15 minutes (see config reference below). The default allows you to regularly rotate the object store secrets by updating them in the external secrets manager; Nessie will pick up the new values within the configured cache TTL. If you do not intend to rotate your secrets, you can raise the TTL to a very high value to prevent cached secrets from expiring and causing unneeded requests to the secrets manager.
Secrets manager and mapping configuration.
Secrets can always be provided using Quarkus’ built-in mechanisms. Additionally, the following external secrets managers can be enabled:
- VAULT: Hashicorp Vault. See the Quarkus docs for Hashicorp Vault for specific information.
- AMAZON: AWS Secrets Manager. See the Quarkus docs for Amazon Services / Secrets Manager for specific information.
- AZURE: Azure Key Vault. NOT SUPPORTED YET! See the Quarkus docs for Azure Key Vault for specific information.
- GOOGLE: Google Cloud Secrets Manager. NOT SUPPORTED YET!
For details on how secrets are stored, see below.
Property | Default Value | Type | Description |
---|---|---|---|
nessie.secrets.type | ExternalSecretsManagerType | Choose the secrets manager to use, defaults to no secrets manager. | |
nessie.secrets.path | string | The path/prefix used when accessing secrets from the secrets manager. This setting can be useful, if all Nessie related secrets have the same prefix in your external secrets manager. | |
nessie.secrets.cache.enabled | true | boolean | Flag whether the secrets cache is enabled. |
nessie.secrets.cache.max-elements | 1000 | long | Maximum number of cached secrets. |
nessie.secrets.cache.ttl | PT15M | duration | Time until cached secrets expire. |
nessie.secrets.get-secret-timeout | PT2S | duration | Timeout when retrieving a secret from the external secret manager, not supported for AWS. |
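A minimal sketch of a secrets manager configuration using Hashicorp Vault follows. The path value and the adjusted cache and timeout values are illustrative assumptions; the Vault connection itself is configured separately through the Quarkus Vault settings.

```properties
# Enable Hashicorp Vault as the external secrets manager.
nessie.secrets.type=VAULT
# Hypothetical common prefix for all Nessie related secrets.
nessie.secrets.path=nessie-secrets

# Keep secrets cached for one hour instead of the default 15 minutes.
nessie.secrets.cache.enabled=true
nessie.secrets.cache.ttl=PT1H
# Allow slower lookups than the default of 2 seconds.
nessie.secrets.get-secret-timeout=PT5S
```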
Types of Secrets¶
- Basic credentials are composites of a name attribute and a secret attribute. AWS credentials are managed as basic credentials, where the name represents the access key ID and the secret represents the secret access key.
- Tokens are composites of a token attribute and an optional expiresAt attribute, the latter represented as an instant.
- Keys consist of a single key attribute.
Quarkus configuration (incl environment variables)¶
Object store secrets managed via Quarkus’ configuration mechanism (SmallRye Config) resolve components of the secret types (basic credentials, tokens, keys) via individual configuration keys.
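The Quarkus configuration key prefix (or environment variable name prefix) is specified for the secret using the URN notation urn:nessie-secret:quarkus:<quarkus-configuration-key-prefix>. The individual parts of the secret are then read from the configuration keys <quarkus-configuration-key-prefix>.<secret-part>.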
The following example illustrates the Quarkus configuration entries to define the default S3 access-key and secret-access-key:
#
# Prefix of the Quarkus configuration keys for this secret ---+
# |
# The URN for Quarkus secrets ---+ |
# | |
# |------------------------- |--------------------
nessie.catalog.service.s3.default-options.access-key=urn:nessie-secret:quarkus:my-secrets.s3-default
# The AWS access-key and secret-access-key are referenced via the "secret-part" names
# `name` and `secret`, see 'Types of Secrets' above.
# `my-secrets.s3-default` is the configuration key prefix taken from the URN in the above property.
my-secrets.s3-default.name=awsAccessKeyId
my-secrets.s3-default.secret=awsSecretAccessKey
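The same mechanism applies to the other secret types. As a sketch, a token secret for a GCS bucket could be defined like this; the bucket key bucket1 and the prefix my-secrets.gcs-token are hypothetical.

```properties
# Reference the token secret by its Quarkus configuration key prefix.
nessie.catalog.service.gcs.buckets.bucket1.auth-type=ACCESS_TOKEN
nessie.catalog.service.gcs.buckets.bucket1.oauth2-token=urn:nessie-secret:quarkus:my-secrets.gcs-token

# A token secret has a "token" attribute and an optional "expiresAt" attribute.
my-secrets.gcs-token.token=<your OAuth2 token>
my-secrets.gcs-token.expiresAt=2024-12-24T18:00:00Z
```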
Storing Secrets in Hashicorp Vault¶
Secrets in Hashicorp Vault are referenced using the URN prefix urn:nessie-secret:vault: followed by the name/path of the secrets in Hashicorp Vault.
When using Hashicorp Vault make sure to configure the connection settings described in the Quarkus docs.
In Hashicorp Vault, secrets are stored as a map of strings to strings, where the map keys are defined by the type of the secret as mentioned above.
For example, using the vault tool, a basic credential is stored like this:
vault kv put secret/nessie-secrets/... name=the_username secret=the_secret_password
vault kv put secret/nessie-secrets/... name=access_key secret=secret_access_key
A token is stored like this:
vault kv put secret/nessie-secrets/... token=value_of_the_secret_token
vault kv put secret/nessie-secrets/... token=value_of_the_token expiresAt=2024-12-24T18:00:00Z
A key is stored like this:
vault kv put secret/nessie-secrets/... key=value_of_the_secret_key
The paths mentioned above (secret/nessie-secrets/...) contain the path within Hashicorp Vault. Those need to be specified in the Nessie secrets URN notation, following the urn:nessie-secret:vault: prefix.
Storing Secrets in Google Cloud and Amazon Services Secrets Managers and Azure Key Vault¶
Warn
Google Secrets Manager and Azure Key Vault are both not yet supported and considered experimental! The reason is that there is no good way to test those locally and in CI.
Secrets Store specifics:
Secrets Manager | Nessie URN prefix | Configuration Details |
---|---|---|
Google Cloud Secrets Manager | urn:nessie-secret:google: | Quarkus Reference Docs |
Amazon Services Secrets Manager | urn:nessie-secret:amazon: | Quarkus Reference Docs |
Azure Key Vault | urn:nessie-secret:azure: | Quarkus Reference Docs |
In Google Cloud and Amazon Services Secrets Managers and Azure Key Vault all secrets are stored as a single string.
Since credentials consist of multiple values, Nessie expects the stored secret to be a JSON encoded object.
Secrets are generally stored as JSON objects representing a map of strings to strings, where the map keys are defined by the type of the secret as mentioned above.
For example, a basic credential has to be stored as JSON like this, without any leading or trailing whitespace or newlines:
{"name": "mysecret", "secret": "mypassword"}
A token with an expiration date has to be stored as JSON like this, without any leading or trailing whitespace or newlines, where the expiresAt attribute is only needed for tokens that expire:
{"token": "rkljmnfgoi4jfgoiujh23o4irj", "expiresAt": "2024-06-05T20:38:16Z"}
A key however is always stored as is and not encoded in JSON.
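On the Nessie side, such a secret is referenced by its URN. The sketch below assumes a hypothetical secret named nessie-s3-default in AWS Secrets Manager that holds a basic credential encoded as JSON.

```properties
# Enable AWS Secrets Manager as the external secrets manager.
nessie.secrets.type=AMAZON

# "nessie-s3-default" is a hypothetical secret name. Its stored value would be
# a JSON object such as {"name": "<access key ID>", "secret": "<secret access key>"}.
nessie.catalog.service.s3.default-options.access-key=urn:nessie-secret:amazon:nessie-s3-default
```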
Version Store Settings¶
Version store configuration.
Property | Default Value | Type | Description |
---|---|---|---|
nessie.version.store.type | IN_MEMORY | IN_MEMORY, ROCKSDB, DYNAMODB, DYNAMODB2, MONGODB, MONGODB2, CASSANDRA, CASSANDRA2, JDBC, JDBC2, BIGTABLE | Sets which type of version store to use by Nessie. |
nessie.version.store.events.enable | true | boolean | Sets whether events for the version-store are enabled. In order for events to be published, it’s not enough to enable them in the configuration; you also need to provide at least one implementation of Nessie’s EventListener SPI. |
Support for the database specific implementations¶
Database | Status | Configuration value for nessie.version.store.type | Notes |
---|---|---|---|
“in memory” | only for development and local testing | IN_MEMORY | Do not use for any serious use case. |
RocksDB | production, single node only | ROCKSDB | |
Google BigTable | production | BIGTABLE | |
MongoDB | production | MONGODB2 & MONGODB (deprecated) | |
Amazon DynamoDB | beta, only tested against the simulator | DYNAMODB | Not recommended for use with Nessie Catalog (Iceberg REST) due to its restrictive row-size limit. |
PostgreSQL | production | JDBC2 & JDBC (deprecated) | |
H2 | only for development and local testing | JDBC2 & JDBC (deprecated) | Do not use for any serious use case. |
MariaDB | experimental, feedback welcome | JDBC2 & JDBC (deprecated) | |
MySQL | experimental, feedback welcome | JDBC2 & JDBC (deprecated) | Works by connecting the MariaDB driver to a MySQL server. |
CockroachDB | experimental, known issues | JDBC2 & JDBC (deprecated) | Known to raise user-facing “write too old” errors under contention. |
Apache Cassandra | experimental, known issues | CASSANDRA2 & CASSANDRA (deprecated) | Known to raise user-facing errors due to Cassandra’s concept of letting the driver timeout too early, or database timeouts. |
Warn
Prefer the CASSANDRA2 version store type over the CASSANDRA version store type, because it has significantly less storage overhead. The CASSANDRA version store type is deprecated for removal; please use the Nessie Server Admin Tool to migrate from the CASSANDRA version store type to CASSANDRA2.
Warn
Prefer the MONGODB2 version store type over the MONGODB version store type, because it has significantly less storage overhead. The MONGODB version store type is deprecated for removal; please use the Nessie Server Admin Tool to migrate from the MONGODB version store type to MONGODB2.
Warn
Prefer the JDBC2 version store type over the JDBC version store type, because it has significantly less storage overhead. The JDBC version store type is deprecated for removal; please use the Nessie Server Admin Tool to migrate from the JDBC version store type to JDBC2.
Note
Relational databases are generally slower and tend to become a bottleneck when concurrent Nessie commits against the same branch happen. This is a general limitation of relational databases; the actual performance penalty depends on the relational database itself, its configuration, and whether and how replication is enabled.
BigTable Version Store Settings¶
When setting nessie.version.store.type=BIGTABLE, which enables Google BigTable as the version store used by the Nessie server, the following configurations are applicable.
Property | Default Value | Type | Description |
---|---|---|---|
nessie.version.store.persist.bigtable.instance-id | nessie | string | Sets the instance-id to be used with Google BigTable. |
nessie.version.store.persist.bigtable.emulator-port | 8086 | int | When using the BigTable emulator, used to configure the port. |
nessie.version.store.persist.bigtable.enable-telemetry | true | boolean | Enables telemetry with OpenCensus. |
nessie.version.store.persist.bigtable.table-prefix | string | Prefix for tables, default is no prefix. | |
nessie.version.store.persist.bigtable.no-table-admin-client | false | boolean | |
nessie.version.store.persist.bigtable.app-profile-id | string | Sets the profile-id to be used with Google BigTable. | |
nessie.version.store.persist.bigtable.quota-project-id | string | Google BigTable quota project ID (optional). | |
nessie.version.store.persist.bigtable.endpoint | string | Google BigTable endpoint (if not default). | |
nessie.version.store.persist.bigtable.mtls-endpoint | string | Google BigTable MTLS endpoint (if not default). | |
nessie.version.store.persist.bigtable.emulator-host | string | When using the BigTable emulator, used to configure the host. | |
nessie.version.store.persist.bigtable.jwt-audience-mapping. <mapping> | string | Google BigTable JWT audience mappings (if necessary). | |
nessie.version.store.persist.bigtable.initial-retry-delay | duration | Initial retry delay. | |
nessie.version.store.persist.bigtable.max-retry-delay | duration | Max retry-delay. | |
nessie.version.store.persist.bigtable.retry-delay-multiplier | double | ||
nessie.version.store.persist.bigtable.max-attempts | int | Maximum number of attempts for each Bigtable API call (including retries). | |
nessie.version.store.persist.bigtable.initial-rpc-timeout | duration | Initial RPC timeout. | |
nessie.version.store.persist.bigtable.max-rpc-timeout | duration | ||
nessie.version.store.persist.bigtable.rpc-timeout-multiplier | double | ||
nessie.version.store.persist.bigtable.total-timeout | duration | Total timeout (including retries) for Bigtable API calls. | |
nessie.version.store.persist.bigtable.min-channel-count | int | Minimum number of gRPC channels. Refer to Google docs for details. | |
nessie.version.store.persist.bigtable.max-channel-count | int | Maximum number of gRPC channels. Refer to Google docs for details. | |
nessie.version.store.persist.bigtable.initial-channel-count | int | Initial number of gRPC channels. Refer to Google docs for details. | |
nessie.version.store.persist.bigtable.min-rpcs-per-channel | int | Minimum number of RPCs per channel. Refer to Google docs for details. | |
nessie.version.store.persist.bigtable.max-rpcs-per-channel | int | Maximum number of RPCs per channel. Refer to Google docs for details. |
Related Quarkus settings:
Property | Default values | Type | Description |
---|---|---|---|
quarkus.google.cloud.project-id | String | The Google project ID, mandatory. | |
(Google authentication) | See Quarkiverse for documentation. |
Info
A complete set of Google Cloud & BigTable configuration options for Quarkus can be found on Quarkiverse.
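A minimal BigTable setup could look like the following sketch. The project ID and table prefix are hypothetical values, and Google authentication is assumed to be configured separately as described on Quarkiverse.

```properties
nessie.version.store.type=BIGTABLE
# Mandatory Google project ID (hypothetical value).
quarkus.google.cloud.project-id=my-gcp-project
# The instance ID defaults to "nessie"; it is set explicitly here for clarity.
nessie.version.store.persist.bigtable.instance-id=nessie
# Optional table prefix (hypothetical value).
nessie.version.store.persist.bigtable.table-prefix=prod_
```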
JDBC Version Store Settings¶
Setting nessie.version.store.type=JDBC2 enables a transactional RDBMS as the version store used by the Nessie server.
Configuration of the datastore is done by Quarkus and depends on many factors, such as the actual database to use. The property nessie.version.store.persist.jdbc.datasource is used to select one of the built-in datasources; currently supported values are: postgresql (which activates the PostgreSQL driver), mariadb (which activates the MariaDB driver), and mysql (which targets MySQL backends, but using the MariaDB driver).
For example, to configure a PostgreSQL connection, the following configuration should be used:
nessie.version.store.type=JDBC2
nessie.version.store.persist.jdbc.datasource=postgresql
quarkus.datasource.postgresql.jdbc.url=jdbc:postgresql://localhost:5432/my_database
quarkus.datasource.postgresql.username=<your username>
quarkus.datasource.postgresql.password=<your password>
- Other PostgreSQL-specific properties can be set using quarkus.datasource.postgresql.*
To connect to a MariaDB database instead, the following configuration should be used:
nessie.version.store.type=JDBC2
nessie.version.store.persist.jdbc.datasource=mariadb
quarkus.datasource.mariadb.jdbc.url=jdbc:mariadb://localhost:3306/my_database
quarkus.datasource.mariadb.username=<your username>
quarkus.datasource.mariadb.password=<your password>
- Other MariaDB-specific properties can be set using quarkus.datasource.mariadb.*
To connect to a MySQL database instead, the following configuration should be used:
nessie.version.store.type=JDBC2
nessie.version.store.persist.jdbc.datasource=mysql
quarkus.datasource.mysql.jdbc.url=jdbc:mysql://localhost:3306/my_database
quarkus.datasource.mysql.username=<your username>
quarkus.datasource.mysql.password=<your password>
- Other MySQL-specific properties can be set using quarkus.datasource.mysql.*
To connect to an H2 in-memory database, the following configuration should be used (note that H2 is not recommended for production):
nessie.version.store.type=JDBC2
nessie.version.store.persist.jdbc.datasource=h2
Note: for MySQL, the MariaDB driver is used, as it is compatible with MySQL. You can use either jdbc:mysql or jdbc:mariadb as the URL prefix.
A complete set of JDBC configuration options can be found on quarkus.io.
Property | Default Value | Type | Description |
---|---|---|---|
nessie.version.store.persist.jdbc.datasource | string | The name of the datasource to use. Must correspond to a configured datasource under quarkus.datasource.<name> . Supported values are: postgresql , mariadb , mysql and h2 . If not provided, the default Quarkus datasource, defined using the quarkus.datasource.* configuration keys, will be used (the corresponding driver is PostgreSQL). Note that it is recommended to define “named” JDBC datasources, see Quarkus JDBC config reference . |
RocksDB Version Store Settings¶
When setting nessie.version.store.type=ROCKSDB, which enables RocksDB as the version store used by the Nessie server, the following configurations are applicable.
Property | Default Value | Type | Description |
---|---|---|---|
nessie.version.store.persist.rocks.database-path | /tmp/nessie-rocksdb-store | path | Sets RocksDB storage path. |
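Because the default path is located under /tmp, a single-node setup would typically point RocksDB at a persistent location. The path below is a hypothetical example.

```properties
nessie.version.store.type=ROCKSDB
# Hypothetical persistent directory; the default is /tmp/nessie-rocksdb-store.
nessie.version.store.persist.rocks.database-path=/var/lib/nessie/rocksdb
```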
Cassandra Version Store Settings¶
When setting nessie.version.store.type=CASSANDRA2 (or the deprecated CASSANDRA), which enables Apache Cassandra as the version store used by the Nessie server, the following configurations are applicable.
Property | Default Value | Type | Description |
---|---|---|---|
nessie.version.store.cassandra.dml-timeout | PT3S | duration | Timeout used for queries and updates. |
nessie.version.store.cassandra.ddl-timeout | PT5S | duration | Timeout used when creating tables. |
Related Quarkus settings:
Property | Default values | Type | Description |
---|---|---|---|
quarkus.cassandra.keyspace | String | The Cassandra keyspace to use. | |
quarkus.cassandra.contact-points | String | The Cassandra contact points, see Quarkus docs. | |
quarkus.cassandra.local-datacenter | String | The Cassandra local datacenter to use, see Quarkus docs. | |
quarkus.cassandra.auth.username | String | Cassandra authentication username, see Quarkus docs. | |
quarkus.cassandra.auth.password | String | Cassandra authentication password, see Quarkus docs. | |
quarkus.cassandra.health.enabled | false | boolean | See Quarkus docs. |
Info
A complete set of the Quarkus Cassandra extension configuration options can be found on quarkus.io
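Putting these settings together, a Cassandra-backed configuration could be sketched as follows. The host name, keyspace, datacenter name, and timeout value are illustrative assumptions.

```properties
nessie.version.store.type=CASSANDRA2
# Hypothetical connection details.
quarkus.cassandra.keyspace=nessie
quarkus.cassandra.contact-points=cassandra-1.example.com:9042
quarkus.cassandra.local-datacenter=datacenter1
quarkus.cassandra.auth.username=<your username>
quarkus.cassandra.auth.password=<your password>
# Allow more time for queries and updates than the default of 3 seconds.
nessie.version.store.cassandra.dml-timeout=PT5S
```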
DynamoDB Version Store Settings¶
When setting nessie.version.store.type=DYNAMODB, which enables DynamoDB as the version store used by the Nessie server, the following configurations are applicable.
Property | Default Value | Type | Description |
---|---|---|---|
nessie.version.store.persist.dynamodb.table-prefix | string | Prefix for tables, default is no prefix. |
Related Quarkus settings:
Property | Default values | Type | Description |
---|---|---|---|
quarkus.dynamodb.aws.region | String | Sets DynamoDB AWS region. | |
quarkus.dynamodb.aws.credentials.type | default | String | See Quarkiverse docs for possible values. Sets the credentials provider that should be used to authenticate with AWS. |
quarkus.dynamodb.endpoint-override | URI | Sets the endpoint URI with which the SDK should communicate. If not specified, an appropriate endpoint for the given service and region is used. | |
quarkus.dynamodb.sync-client.type | url | String | Possible values are: url , apache . Sets the type of the sync HTTP client implementation. |
Info
A complete set of DynamoDB configuration options for Quarkus can be found on Quarkiverse.
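As a minimal sketch, the settings from the two tables above could be combined as follows. The region, table prefix, and endpoint URL are illustrative assumptions, not recommended values; the endpoint override is only relevant when pointing Nessie at a non-default endpoint such as a local DynamoDB instance.

```properties
# Select DynamoDB as the version store.
nessie.version.store.type=DYNAMODB
# Optional table prefix (assumed value; the default is no prefix).
nessie.version.store.persist.dynamodb.table-prefix=nessie_
# Assumed region; use the region of your DynamoDB tables.
quarkus.dynamodb.aws.region=us-west-2
quarkus.dynamodb.aws.credentials.type=default
# Hypothetical local endpoint; omit this line to use the regular AWS endpoint.
quarkus.dynamodb.endpoint-override=http://localhost:8000
```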
MongoDB Version Store Settings¶
When setting nessie.version.store.type=MONGODB2, which enables MongoDB as the version store used by the Nessie server, the following configurations are applicable in combination with nessie.version.store.type.
Related Quarkus settings:
Property | Default values | Type | Description |
---|---|---|---|
quarkus.mongodb.database | | String | Sets MongoDB database name. |
quarkus.mongodb.connection-string | | String | Sets MongoDB connection string. |
Info
A complete set of MongoDB configuration options for Quarkus can be found on quarkus.io.
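A minimal sketch of a MongoDB-backed configuration, using only the properties listed above. The database name and connection string are placeholder assumptions for a MongoDB instance running locally on its default port.

```properties
# Select MongoDB as the version store.
nessie.version.store.type=MONGODB2
# Assumed database name.
quarkus.mongodb.database=nessie
# Hypothetical connection string for a local MongoDB instance.
quarkus.mongodb.connection-string=mongodb://localhost:27017
```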
In-Memory Version Store Settings¶
No special configuration options for this store type.
Version Store Advanced Settings¶
The following advanced settings control how Nessie stores data in the configured data store.
Usually, only the cache capacity should be adjusted to the amount of Java heap “available” for the cache. The default is conservative; increasing the cache size is recommended.
Property | Default Value | Type | Description |
---|---|---|---|
nessie.version.store.persist.repository-id | (empty) | string | Nessie repository ID (optional) that identifies a particular Nessie storage repository. When remote (shared) database is used, multiple Nessie repositories may co-exist in the same database (and in the same schema). In that case this configuration parameter can be used to distinguish those repositories. |
nessie.version.store.persist.commit-retries | 2147483647 | int | Maximum retries for CAS-like operations. When committing to Nessie and the HEAD (or tip) of a branch changes during the commit, this value defines the maximum number of retries. The default means unlimited. See: retry-max-sleep-millis |
nessie.version.store.persist.commit-timeout-millis | 5000 | long | Timeout for CAS-like operations in milliseconds. See: retry-max-sleep-millis |
nessie.version.store.persist.retry-initial-sleep-millis-lower | 5 | long | When the commit logic has to retry an operation due to a concurrent, conflicting update to the database state, usually a concurrent change to a branch HEAD, this parameter defines the initial lower bound of the exponential backoff. See: retry-max-sleep-millis |
nessie.version.store.persist.retry-initial-sleep-millis-upper | 25 | long | When the commit logic has to retry an operation due to a concurrent, conflicting update to the database state, usually a concurrent change to a branch HEAD, this parameter defines the initial upper bound of the exponential backoff. See: retry-max-sleep-millis |
nessie.version.store.persist.retry-max-sleep-millis | 250 | long | When the commit logic has to retry an operation due to a concurrent, conflicting update to the database state, usually a concurrent change to a branch HEAD, this parameter defines the maximum sleep time. Each retry doubles the lower and upper bounds of the random sleep time, unless the doubled upper bound would exceed the value of this configuration property. See: retry-initial-sleep-millis-upper |
nessie.version.store.persist.parents-per-commit | 20 | int | Number of parent-commit-hashes stored in each commit. This is used to allow bulk-fetches when accessing the commit log. |
nessie.version.store.persist.max-serialized-index-size | 204800 | int | The maximum allowed serialized size of the content index structure in a reference index segment. This value is used to determine when elements in a reference index segment need to be split. Note: this value must be smaller than a database’s hard item/row size limit. |
nessie.version.store.persist.max-incremental-index-size | 51200 | int | The maximum allowed serialized size of the content index structure in a Nessie commit, called incremental index. This value is used to determine when elements in an incremental index, which were kept from previous commits, need to be pushed to a new or updated reference index. Note: this value must be smaller than a database’s hard item/row size limit. |
nessie.version.store.persist.max-reference-stripes-per-commit | 50 | int | Maximum number of referenced index objects stored inside commit objects. If the external reference index for this commit consists of up to this amount of stripes, the references to the stripes will be stored inside the commit object. If there are more than this amount of stripes, an external index segment will be created instead. |
nessie.version.store.persist.assumed-wall-clock-drift-micros | 5000000 | long | Assumed wall-clock drift between multiple Nessie instances in microseconds. |
nessie.version.store.persist.ref-previous-head-count | 20 | int | Named references keep a history of up to this amount of previous HEAD pointers, and up to the configured age. |
nessie.version.store.persist.ref-previous-head-time-span-seconds | 300 | long | Named references keep a history of previous HEAD pointers with this age in seconds, and up to the configured amount. |
nessie.version.store.persist.cache-capacity-mb | | int | Fixed amount of heap used to cache objects, set to 0 to disable the cache entirely. Must not be used with fractional cache sizing. See description for cache-capacity-fraction-of-heap for the default value. |
nessie.version.store.persist.cache-enable-soft-references | true | boolean | In addition to the serialized representation, Nessie keeps so-called soft references to the cached Java objects. This toggle enables or disables that behavior. |
nessie.version.store.persist.cache-capacity-fraction-min-size-mb | | int | When using fractional cache sizing, this amount in MB is the minimum cache size. |
nessie.version.store.persist.cache-capacity-fraction-of-heap | | double | Fraction of Java’s max heap size to use for cache objects, set to 0 to disable. Must not be used with fixed cache sizing. If neither this value nor a fixed size is configured, a default of .6 (60%) is assumed. |
nessie.version.store.persist.cache-capacity-fraction-adjust-mb | | int | When using fractional cache sizing, this amount in MB of the heap will always be “kept free” when calculating the cache size. |
nessie.version.store.persist.reference-cache-ttl | | duration | Defines how long references are kept in the cache. By default, references are not cached. If reference caching is enabled, it is highly recommended to also enable negative reference caching. It is safe to enable this for single node Nessie deployments. The currently recommended value is PT5M for distributed deployments, and high values like PT1H for single node Nessie deployments. This feature is experimental except for single Nessie node deployments! If in doubt, leave this un-configured! |
nessie.version.store.persist.reference-cache-negative-ttl | | duration | Defines how long sentinels for non-existing references are kept in the cache (negative reference caching). Defaults to reference-cache-ttl. Has no effect if reference-cache-ttl is not configured. Default is not enabled. If reference caching is enabled, it is highly recommended to also enable negative reference caching. It is safe to enable this for single node Nessie deployments. The currently recommended value is PT5M for distributed deployments, and high values like PT1H for single node Nessie deployments. This feature is experimental except for single Nessie node deployments! If in doubt, leave this un-configured! |
nessie.version.store.persist.cache-invalidations.service-names | | list of string | Host names, IP addresses, or Kubernetes headless-service names of all Nessie server instances accessing the same repository. This value is configured automatically by the Nessie Helm chart or the Kubernetes operator (not released yet); in that case no additional configuration is needed for distributed cache invalidations. If you have your own Helm chart or custom deployment, make sure to configure the IPs of all Nessie instances here. Names that start with an equal sign are not resolved but used “as is”. |
nessie.version.store.persist.cache-invalidations.valid-tokens | | list of string | List of cache-invalidation tokens to authenticate incoming cache-invalidation messages. |
nessie.version.store.persist.cache-invalidations.uri | /nessie-management/cache-coherency | string | URI of the cache-invalidation endpoint. The endpoint is only available on the Quarkus management port, which defaults to 9000. |
nessie.version.store.persist.cache-invalidations.service-name-lookup-interval | PT10S | duration | Interval of service-name lookups to resolve the service names (see cache-invalidations.service-names) into IP addresses. |
nessie.version.store.persist.cache-invalidations.batch-size | 20 | int | |
nessie.version.store.persist.cache-invalidations.request-timeout | | duration | |
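The following is a sketch of how the cache-related settings above might be combined for a distributed deployment. Only PT5M is taken from the recommendation in the table; the fraction, minimum size, and adjustment values are illustrative assumptions that need to be tuned to the heap actually available.

```properties
# Fractional cache sizing. Do not combine with cache-capacity-mb.
# Assumed fraction; the documented default is 0.6.
nessie.version.store.persist.cache-capacity-fraction-of-heap=0.7
# Assumed minimum cache size in MB.
nessie.version.store.persist.cache-capacity-fraction-min-size-mb=64
# Assumed amount of heap in MB that is always kept free.
nessie.version.store.persist.cache-capacity-fraction-adjust-mb=256

# Reference caching, experimental for multi-node deployments.
nessie.version.store.persist.reference-cache-ttl=PT5M
nessie.version.store.persist.reference-cache-negative-ttl=PT5M
```

To use a fixed cache size instead, the three fractional settings would be replaced by a single nessie.version.store.persist.cache-capacity-mb entry.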
Authentication settings¶
Configuration for Nessie authentication settings.
Property | Default Value | Type | Description |
---|---|---|---|
nessie.server.authentication.enabled | false | boolean | Enable Nessie authentication. |
Related Quarkus settings:
Property | Default values | Type | Description |
---|---|---|---|
quarkus.oidc.auth-server-url | | String | Sets the base URL of the OpenID Connect (OIDC) server if nessie.server.authentication.enabled=true. |
quarkus.oidc.client-id | | String | Sets client-id of the application if nessie.server.authentication.enabled=true. Each application has a client-id that is used to identify the application. |
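As a minimal sketch, enabling authentication against an OIDC provider could look like this. The server URL and client ID are hypothetical placeholders; the exact URL layout depends on the identity provider in use.

```properties
nessie.server.authentication.enabled=true
# Hypothetical OIDC base URL; replace with your provider's issuer URL.
quarkus.oidc.auth-server-url=https://auth.example.com/realms/nessie
# Hypothetical client ID registered with the OIDC provider.
quarkus.oidc.client-id=nessie-server
```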
Authorization settings¶
Configuration for Nessie authorization settings.
Property | Default Value | Type | Description |
---|---|---|---|
nessie.server.authorization.enabled | false | boolean | Enable Nessie authorization. |
nessie.server.authorization.type | CEL | string | Sets the authorizer type to use. |
nessie.server.authorization.rules.<name> | | string | CEL authorization rules where the key represents the rule id and the value the CEL expression. |
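A sketch of a single CEL rule is shown below. The rule id (allow_branch_listing), the role and branch prefixes, and the variable and operation names used in the expression (op, role, ref, VIEW_REFERENCE) are assumptions for illustration; check them against the Nessie authorization documentation before use.

```properties
nessie.server.authorization.enabled=true
nessie.server.authorization.type=CEL
# Hypothetical rule: the key suffix is the rule id, the value is the CEL expression.
nessie.server.authorization.rules.allow_branch_listing=op=='VIEW_REFERENCE' && role.startsWith('test_user') && ref.startsWith('allowedBranch')
```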
Metrics¶
Metrics are published using Micrometer; they are available from Nessie’s management interface (port 9000 by default) under the path /q/metrics. For example, if the server is running on localhost, the metrics can be accessed via http://localhost:9000/q/metrics.
Metrics can be scraped by Prometheus or any compatible metrics scraping server. See: Prometheus for more information.
Additional tags can be added to the metrics by setting the nessie.metrics.tags.* property. Each tag is a key-value pair, where the key is the tag name and the value is the tag value. For example, to add a tag environment=prod to all metrics, set nessie.metrics.tags.environment=prod. Many tags can be added, such as below:
nessie.metrics.tags.service=nessie
nessie.metrics.tags.environment=prod
nessie.metrics.tags.region=us-west-2
Note that by default Nessie adds one tag: application=Nessie. You can override this tag by setting the nessie.metrics.tags.application=<new-value> property.
A standard Grafana dashboard is available in the grafana directory of the Nessie repository (https://github.com/projectnessie/nessie/blob/main/grafana/nessie.json). You can use this dashboard to visualize the metrics scraped by Prometheus. Note that this dashboard is a starting point and may need to be customized to fit your specific needs.
This Grafana dashboard expects the metrics to have a few tags defined: service and instance. The instance tag is generally added by Prometheus automatically, but the service tag needs to be added manually. You can configure Nessie to add this tag to all metrics by setting the below property:
nessie.metrics.tags.service=<service-name>
Alternatively, you can modify the dashboard to remove unnecessary tags, or configure Prometheus to add the missing ones. Here is an example configuration showing how to have the service tag added by Prometheus:
scrape_configs:
  - job_name: 'nessie'
    metrics_path: /q/metrics
    static_configs:
      - targets: ['nessie:9000']
        labels:
          service: nessie
Traces¶
Since Nessie 0.46.0, traces are published using OpenTelemetry. See Using OpenTelemetry in the Quarkus documentation.
In order for the server to enable OpenTelemetry and publish its traces, the quarkus.otel.exporter.otlp.traces.endpoint property must be defined. Its value must be a valid collector endpoint URL, with either the http:// or https:// scheme. The collector must speak the OpenTelemetry protocol (OTLP) and the port must be its gRPC port (by default 4317), e.g. “http://otlp-collector:4317”. If this property is not set, the server will not publish traces.
Alternatively, it’s possible to forcibly disable OpenTelemetry at runtime by setting the following property: quarkus.otel.sdk.disabled=true.
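For instance, a configuration that publishes traces to a collector might look like the sketch below. The host name otlp-collector is a placeholder taken from the example URL above; 4317 is the default OTLP gRPC port.

```properties
# Placeholder collector host; the port must be the collector's gRPC port.
quarkus.otel.exporter.otlp.traces.endpoint=http://otlp-collector:4317
# Uncomment to forcibly disable OpenTelemetry at runtime instead.
# quarkus.otel.sdk.disabled=true
```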
Troubleshooting traces¶
If the server is unable to publish traces, first check the log for an error message like the following:
SEVERE [io.ope.exp.int.grp.OkHttpGrpcExporter] (OkHttp http://localhost:4317/...) Failed to export spans.
The request could not be executed. Full error message: Failed to connect to localhost/0:0:0:0:0:0:0:1:4317
This means that the server is unable to connect to the collector. Check that the collector is running and that the URL is correct.
Swagger UI¶
The Swagger UI allows for testing the REST API and reading the API docs. It is available at SwaggerHub.
Docker image options¶
By default, Nessie listens on port 19120. To expose that port on the host, use -p 19120:19120. To expose that port on a different port on the host system, use the -p option and map the internal port to some port on the host. For example, to expose Nessie on port 8080 of the host system, use the following command:
docker run -p 8080:19120 ghcr.io/projectnessie/nessie
Then you can browse Nessie’s UI on the host by pointing your browser to http://localhost:8080.
Note: this doesn’t change the port Nessie listens on; it only changes the port on the host system that is mapped to it. Nessie still listens on port 19120 inside the container. If you want to change the port Nessie listens on, you can use the QUARKUS_HTTP_PORT environment variable. For example, to make Nessie listen on port 8080 inside the container, and expose it to the host system also on 8080, use the following command:
docker run -p 8080:8080 -e QUARKUS_HTTP_PORT=8080 ghcr.io/projectnessie/nessie
Nessie Docker image types¶
Nessie publishes a Java-based multiplatform (for amd64, arm64, ppc64le, s390x) image running on OpenJDK 21.
Advanced Docker image tuning (Java images only)¶
There are many environment variables available to configure the Docker image. If in doubt, leave everything at its default. You can configure the behavior using the environment variables listed below, which come from the base image used by Nessie, ubi9/openjdk-21-runtime.
Examples¶
Example | docker run option |
---|---|
Using another GC | -e GC_CONTAINER_OPTIONS="-XX:+UseShenandoahGC" lets Nessie use Shenandoah GC instead of the default parallel GC. |
Set the Java heap size to a fixed amount | -e JAVA_OPTS_APPEND="-Xms8g -Xmx8g" lets Nessie use a Java heap of 8g. |
Reference¶
Environment variable | Description |
---|---|
JAVA_OPTS or JAVA_OPTIONS | NOT RECOMMENDED. JVM options passed to the java command (example: “-verbose:class”). Setting this variable will override all options set by any of the other variables in this table. To pass extra settings, use JAVA_OPTS_APPEND instead. |
JAVA_OPTS_APPEND | User specified Java options to be appended to generated options in JAVA_OPTS (example: “-Dsome.property=foo”). |
JAVA_TOOL_OPTIONS | This variable is defined and honored by all OpenJDK distros, see here. Options defined here take precedence over all else; using this variable is generally not necessary, but can be useful e.g. to enforce JVM startup parameters, to set up remote debug, or to define JVM agents. |
JAVA_MAX_MEM_RATIO | Used to calculate a default maximum heap size based on a container’s memory restriction. If used in a container without any memory constraints, this option has no effect. If there is a memory constraint, -XX:MaxRAMPercentage is set to a ratio of the container’s available memory as set here. The default is 80, which means 80% of the available memory is used as an upper boundary. You can skip this mechanism by setting this value to 0, in which case no -XX:MaxRAMPercentage option is added. |
JAVA_DEBUG | If set, remote debugging will be switched on. Disabled by default (example: “true”). |
JAVA_DEBUG_PORT | Port used for remote debugging. Defaults to “5005” (tip: use “*:5005” to enable debugging on all network interfaces). |
GC_MIN_HEAP_FREE_RATIO | Minimum percentage of heap free after GC to avoid expansion. Default is 10. |
GC_MAX_HEAP_FREE_RATIO | Maximum percentage of heap free after GC to avoid shrinking. Default is 20. |
GC_TIME_RATIO | Specifies the ratio of the time spent outside the garbage collection. Default is 4. |
GC_ADAPTIVE_SIZE_POLICY_WEIGHT | The weighting given to the current GC time versus previous GC times. Default is 90. |
GC_METASPACE_SIZE | The initial metaspace size. There is no default (example: “20”). |
GC_MAX_METASPACE_SIZE | The maximum metaspace size. There is no default (example: “100”). |
GC_CONTAINER_OPTIONS | Specify Java GC to use. The value of this variable should contain the necessary JRE command-line options to specify the required GC, which will override the default of -XX:+UseParallelGC (example: -XX:+UseG1GC). |
Troubleshooting configuration issues¶
If you encounter issues with the configuration, you can ask Nessie to print out the configuration it is using. To do this, set the log level for the io.smallrye.config category to DEBUG, and also set the console appender level to DEBUG:
quarkus.log.console.level=DEBUG
quarkus.log.category."io.smallrye.config".level=DEBUG
Warn
This will print out all configuration values, including sensitive ones like passwords. Don’t do this in production, and don’t share this output with anyone you don’t trust!