Nessie Specification

This page documents the complete Nessie specification. This includes:

  • API and its constraints
  • Contract for value objects

API contract

The Nessie API is used by Nessie integrations, for example within Apache Iceberg or Delta Lake, and by user-facing applications such as web UIs.

Nessie defines a REST API (OpenAPI) and implementations for Java and Python.

Content managed by Nessie

General Contract

Content Objects describe the state of a data lake object like a table or view. Nessie currently provides types for Iceberg tables, Delta Lake tables and SQL views. Nessie uses two identifiers for a single Content object:

  1. The Content Id is used to identify a content object across all branches even if the content object is being referred to using different table or view names.
  2. The Content Key is used to look up a content object by name, like a table name or view name. The Content Key changes when the associated table or view is renamed.

Content Key

The Content Key consists of multiple strings and is used to resolve a symbolic name, like a table name or a view name used in SQL statements, to a Content object.

When a table or view is renamed, for example with an SQL ALTER TABLE RENAME operation, Nessie records the rename as a Delete operation on the old key plus a Put operation on the new key (see below).
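The rename behaviour described above can be sketched as follows. This is a minimal illustrative model, not the Nessie client API: the names Content, Put, Delete, rename and apply_operations are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# A Content Key is modelled as a tuple of strings, e.g. ("db", "orders").
ContentKey = Tuple[str, ...]


@dataclass(frozen=True)
class Content:
    id: str     # Content Id: stays the same when the key changes
    state: str  # opaque stand-in for the type-specific state


@dataclass(frozen=True)
class Put:
    key: ContentKey
    content: Content


@dataclass(frozen=True)
class Delete:
    key: ContentKey


Operation = Union[Put, Delete]


def rename(old_key: ContentKey, new_key: ContentKey, content: Content) -> List[Operation]:
    """Express a rename as a Delete on the old key plus a Put on the new key."""
    return [Delete(old_key), Put(new_key, content)]


def apply_operations(
    tree: Dict[ContentKey, Content], operations: List[Operation]
) -> Dict[ContentKey, Content]:
    """Return a new key -> content mapping with the operations applied."""
    result = dict(tree)
    for op in operations:
        if isinstance(op, Delete):
            result.pop(op.key, None)
        else:
            result[op.key] = op.content
    return result


old_key = ("db", "orders")
new_key = ("db", "orders_v2")
table = Content(id="hypothetical-content-id", state="s1")

before = {old_key: table}
after = apply_operations(before, rename(old_key, new_key, table))
```

In this model the Content Id of the object found under the new key is the same as the one previously found under the old key; only the key has changed.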

On Reference State vs Global State

Nessie is designed to support multiple table formats, such as Apache Iceberg and Delta Lake, as well as generic SQL views. Different Nessie commits (think: commits on different branches in Nessie) can refer to the same physical table, but with a different state of the data and potentially a different schema. Some table formats, like Apache Iceberg, therefore require Nessie to refer to a single Global State; in the case of Iceberg, this is the table metadata. This Global State is not versioned in Nessie, because it has to contain enough information to resolve all information in all Nessie commits.

Note

The term all information in all Nessie commits used above precisely means all information in all Nessie commits that are considered “live”, i.e. that have not been garbage-collected by Nessie. See also Management Services.

Content Id

All content objects must have an id field. This field is unique to the object and immutable once created. By convention, it is a UUID, though this is not enforced by this Specification. There are several expectations on this field:

  1. Content Ids are immutable. Once created the object will keep the same id for its entire lifetime.
  2. If the object is moved (e.g. stored under a different Key) it will keep the id.
  3. The same content object, i.e. the same content-id, can be referred to using different keys on different branches.

There is no API to look up an object by id and the intention of an id is not to serve in that capacity. An example usage of the id field might be storing auxiliary data on an object in a local cache and using id to look up that auxiliary data.

Note

A note about caching: The Content objects or the values of the referred information (e.g. schema, partitions etc.) might be cached locally by services using Nessie.

For content types that do not track Global State, the hash of the contents object does uniquely reference an object in the Nessie history and is a suitable key to identify an object at a particular point in its history.

Evolution of the Global State is performed in a way that keeps old contents, as well as contents on different branches (and tags), available. This is the case for Apache Iceberg.

For content types that do track Global State, the Content Id must be included in the cache key.

For simplicity, it is recommended to always include the Content Id.

Since the Content object is immutable, the hash is stable. Since the hash is also disconnected from Nessie’s version store properties, it exists across commits/branches and survives GC and other table maintenance operations.

The commit hash on the other hand makes a poor cache key because multiple commits can refer to the same state of a Content object, e.g. a merge or transplant will change the commit hash but not the state of the Content object.
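The caching advice above can be sketched as a cache key built from the Content Id plus a hash of the Content object, deliberately ignoring the commit hash. This is only an illustration: the dictionary layout, the function names and the use of SHA-256 over canonical JSON are assumptions for this example and are not how Nessie itself computes hashes.

```python
import hashlib
import json
from typing import Dict, Tuple


def content_hash(content: Dict[str, object]) -> str:
    """Hash a content object via a canonical JSON encoding (assumed stand-in)."""
    payload = json.dumps(content, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def cache_key(content: Dict[str, object]) -> Tuple[str, str]:
    """Cache key: (Content Id, content hash). The commit hash is not part of it."""
    return (str(content["id"]), content_hash(content))


# Hypothetical content object; the field names are for illustration only.
table = {"id": "1234", "type": "DELTA_LAKE_TABLE", "lastCheckpoint": "cp-7"}

# The same content state seen via two different commits (e.g. before and
# after a merge) maps to the same cache entry.
key_before_merge = cache_key(table)
key_after_merge = cache_key(dict(table))

# A changed state of the same content object gets a different cache entry.
changed = dict(table, lastCheckpoint="cp-8")
key_changed = cache_key(changed)
```

Always including the Content Id, as recommended above, keeps the key usable for content types with and without Global State.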

Content Types

Nessie is designed to support various table formats, and currently supports the following types. See also Tables & Views.

Iceberg Table

Apache Iceberg describes any table using the so-called table metadata, see Iceberg Table Spec. Each Iceberg operation that modifies data, for example an append or rewrite operation or, more generally, each Iceberg transaction, creates a new Iceberg snapshot. Any Nessie commit refers to a particular Iceberg snapshot for an Iceberg table, which translates to the state of an Iceberg table for a particular Nessie commit.

Nessie needs to track Iceberg’s table metadata as so-called Global State within Nessie to ensure that schema evolution works as expected.

The Nessie IcebergTable object passed to Nessie in a Put operation therefore consists of

  1. the pointer to the Iceberg table metadata and
  2. the ID of the Iceberg snapshot within the Iceberg table metadata.

The pointer to the Iceberg table is recorded as Global State and the ID of the Iceberg snapshot is recorded within the Put operation in a Nessie commit.
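The split between Global State and per-commit state can be illustrated with a small in-memory model. This is a sketch under stated assumptions: the class and field names (IcebergTable, metadata_location, snapshot_id, Store) are chosen for this example and do not reflect Nessie's internal storage layout.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

ContentKey = Tuple[str, ...]


@dataclass(frozen=True)
class IcebergTable:
    id: str                 # Content Id
    metadata_location: str  # pointer to the Iceberg table metadata
    snapshot_id: int        # Iceberg snapshot within that metadata


@dataclass
class Store:
    # Global State: one unversioned metadata pointer per Content Id.
    global_state: Dict[str, str] = field(default_factory=dict)
    # Per-branch state: key -> (Content Id, snapshot id), as recorded by Put operations.
    branches: Dict[str, Dict[ContentKey, Tuple[str, int]]] = field(default_factory=dict)

    def put(self, branch: str, key: ContentKey, table: IcebergTable) -> None:
        """Record the pointer globally and the snapshot id on the branch."""
        self.global_state[table.id] = table.metadata_location
        self.branches.setdefault(branch, {})[key] = (table.id, table.snapshot_id)

    def get(self, branch: str, key: ContentKey) -> IcebergTable:
        """Reassemble the table from branch state plus Global State."""
        content_id, snapshot_id = self.branches[branch][key]
        return IcebergTable(content_id, self.global_state[content_id], snapshot_id)


store = Store()
key = ("db", "events")
store.put("main", key, IcebergTable("id-1", "meta-v1.json", snapshot_id=1))
store.put("dev", key, IcebergTable("id-1", "meta-v2.json", snapshot_id=2))

on_main = store.get("main", key)
on_dev = store.get("dev", key)
```

In this model, reading the table on main still yields snapshot 1, but through the latest metadata pointer. That is why the metadata behind the pointer has to describe every snapshot that any live commit refers to.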

Note

This model puts a strong restriction on the Iceberg table. All metadata JSON documents must be stored and none of the built-in Iceberg maintenance procedures can be used. There are potentially serious issues regarding schema migrations in this model as well. Therefore, the Iceberg table spec should be considered subject to change in the near future.

Delta Lake Table

The state of a Delta Lake Table is represented using the Delta Lake Table attributes metadataLocationHistory, checkpointLocationHistory and lastCheckpoint.

Delta Lake Tables are tracked without a Global State in Nessie, i.e. those three attributes are recorded within the Put Operation of a Nessie commit.

View

The state of an SQL view is represented using the attributes sqlText and dialect (currently one of HIVE, SPARK, DREMIO, PRESTO).

SQL views are tracked without a Global State in Nessie, i.e. those two attributes are recorded within the Put Operation of a Nessie commit.

Operations in a Nessie commit

Each Nessie commit carries one or more operations. Each operation contains the Content Key and comes in one of the following variations.

Put operation

A Put operation modifies the state of the included Content object. It must contain the Content object and, if the Content type tracks Global State, also the expected contents. The expected contents attribute can be omitted if the Content object refers to a new Content Id, e.g. a newly created table or view. See also Conflict Resolution.

A Nessie Put operation is created for everything that modifies a table or a view, either its definition (think: SQL DDL) or data (think: SQL DML).

Delete operation

A Delete operation does not carry any Content object and is used to indicate that a Content object is no longer referenced using the Content Key of the Delete operation.

An example of a Nessie Delete operation is an SQL DROP TABLE. An ALTER TABLE RENAME is mapped to a Delete operation plus a Put operation.

Unmodified operation

An Unmodified operation does not represent any change of the data, but can be included in a Nessie commit operation to enforce strict serializable transactions. The presence of an Unmodified operation means that the Content object referred to via the operation’s Content Key must not have been modified since the Nessie commit’s expectedHash.

Version Store

See Commit Kernel for details.

Conflict Resolution

The API passes an expectedHash parameter with a Nessie commit operation. This is the commit that the client thinks is the most up to date (its HEAD). The Nessie backend checks whether the key has been modified since that expectedHash and, if so, rejects the requested modification with a NessieConflictException. This is essentially an optimistic lock that accounts for the fact that the commit hash is global and that the Nessie branch could have moved on from expectedHash without modifying the key in question.
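The per-key check described above can be sketched with a simple in-memory branch. This is an illustrative model only: Branch, Commit, ConflictError (a stand-in for NessieConflictException) and the counter-based commit hashes are assumptions for this example, not Nessie's implementation. Keys of Unmodified operations are checked like changed keys but are not recorded as changes.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

ContentKey = Tuple[str, ...]


class ConflictError(Exception):
    """Stand-in for NessieConflictException in this sketch."""


@dataclass(frozen=True)
class Commit:
    hash: str
    changed_keys: FrozenSet[ContentKey]


class Branch:
    def __init__(self) -> None:
        self.log: List[Commit] = [Commit("c0", frozenset())]

    def head(self) -> str:
        return self.log[-1].hash

    def commit(
        self,
        expected_hash: str,
        changed_keys: FrozenSet[ContentKey],
        unmodified_keys: FrozenSet[ContentKey] = frozenset(),
    ) -> str:
        hashes = [c.hash for c in self.log]
        if expected_hash not in hashes:
            raise ValueError(f"unknown expected hash {expected_hash}")
        # Only commits made after expected_hash can conflict.
        later_commits = self.log[hashes.index(expected_hash) + 1:]
        checked = changed_keys | unmodified_keys
        for later in later_commits:
            clash = later.changed_keys & checked
            if clash:
                raise ConflictError(f"keys modified since {expected_hash}: {sorted(clash)}")
        new_hash = f"c{len(self.log)}"
        self.log.append(Commit(new_hash, changed_keys))
        return new_hash


branch = Branch()
base = branch.head()
table_a, table_b = ("db", "a"), ("db", "b")

branch.commit(base, frozenset({table_b}))               # someone else updates table B
accepted = branch.commit(base, frozenset({table_a}))    # stale base, but A is untouched

try:
    branch.commit(base, frozenset({table_a}))           # A has changed since base
    second_attempt_rejected = False
except ConflictError:
    second_attempt_rejected = True
```

The second commit succeeds even though the branch has moved on from base, because no commit after base changed table A.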

For content types that track Global State, a Nessie Put operation should pass the so-called expected state. Nessie compares the latest recorded Global State of the content object with the Global State in the expected state of the Put operation. If both values differ, Nessie will reject the operation with a NessieConflictException.
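The expected-state comparison can be sketched as a compare-and-set on the latest Global State per Content Id. The names GlobalStates, put and ConflictError (again a stand-in for NessieConflictException) are assumptions for this example; omitting the expected state is only accepted here for a Content Id that has no recorded Global State yet.

```python
from typing import Dict, Optional


class ConflictError(Exception):
    """Stand-in for NessieConflictException in this sketch."""


class GlobalStates:
    def __init__(self) -> None:
        # Content Id -> latest recorded Global State (e.g. a metadata pointer).
        self._latest: Dict[str, str] = {}

    def latest(self, content_id: str) -> Optional[str]:
        return self._latest.get(content_id)

    def put(self, content_id: str, new_state: str, expected_state: Optional[str] = None) -> None:
        """Update the Global State only if the caller's expectation matches."""
        current = self._latest.get(content_id)
        if current != expected_state:
            raise ConflictError(
                f"expected Global State {expected_state!r}, found {current!r}"
            )
        self._latest[content_id] = new_state


states = GlobalStates()
states.put("id-1", "meta-v1.json")                                 # new Content Id
states.put("id-1", "meta-v2.json", expected_state="meta-v1.json")  # expectation matches

try:
    # A second writer still believes v1 is the latest state.
    states.put("id-1", "meta-v3.json", expected_state="meta-v1.json")
    stale_writer_rejected = False
except ConflictError:
    stale_writer_rejected = True
```

After the rejected attempt, the recorded Global State is still the one written by the first successful update.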

The reason for these conditions is to behave like a ‘real’ database. You shouldn’t have to update your reference before transacting on table A just because someone else happened to update table B whilst you were preparing your transaction.