Nessie Spark SQL Extension Reference

See also the Nessie Spark SQL Extensions main page.

Nessie SQL commands reference

The following syntax descriptions illustrate how commands are used and the order of the clauses.

The commands provided by the Nessie Spark SQL extensions are a subset of the commands available in the Nessie CLI. Unlike the CLI commands, the Nessie Spark SQL commands accept the IN <catalog-name> clause, which the Nessie CLI does not need.

Info

CODE style means the term is a keyword.

**BoldTerms** mean variable input, see <u>[Description of Command Parts below](#command-parts)</u>.

Square brackets `[` `]` mean that the contents are optional (0 or 1 occurrence).

Curly brackets `{` `}` mean that the contents can be repeated 0 or more times.

CREATE BRANCH / TAG

CREATE ReferenceType
[ IF NOT EXISTS ]
ReferenceName
[ IN CatalogName ]
[ FROM ExistingReference ]
[ AT [ TIMESTAMP | COMMIT ] TimestampOrCommit ]

Creates a new Nessie branch or tag with the name given by the ReferenceName parameter. The ReferenceType parameter determines whether a branch or a tag is created.

By default, the new branch or tag is created from the latest commit on the current reference of the Nessie CLI (see USE statement). Another source reference name can be specified using the FROM clause. The optional AT clause allows specifying a different commit ID (hash) or timestamp to create the new reference from.

This command fails if a reference with the name ReferenceName already exists, unless the optional IF NOT EXISTS is specified.
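As an illustration of the syntax above, the statements below sketch a few typical forms. The catalog name nessie and the reference names dev, release_1 and audit_point are hypothetical, and main is assumed to exist. Writing the commit ID as a bare, unquoted value is also an assumption; this reference does not state whether it must be quoted.

```sql
-- Create branch "dev" from the latest commit of "main";
-- IF NOT EXISTS avoids a failure when "dev" already exists.
CREATE BRANCH IF NOT EXISTS dev IN nessie FROM main;

-- Create a tag from the latest commit of the current reference (no FROM clause).
CREATE TAG release_1 IN nessie;

-- Create a tag from a specific commit of "main"
-- (hash taken from the TimestampOrCommit example; unquoted form is an assumption).
CREATE TAG audit_point IN nessie FROM main AT COMMIT fa32a50d5303a53826f65649277561f5c6772eba019e7f1e01a359becb764877;
```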

DROP BRANCH / TAG

DROP ReferenceType
[ IF EXISTS ]
ExistingReference
[ IN CatalogName ]

Drops the Nessie branch or tag with the name given by the ExistingReference parameter. The ReferenceType parameter specifies whether the reference is a branch or a tag.

This command fails if no reference with the given name exists, unless the optional IF EXISTS is specified.
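A minimal sketch of both variants, assuming a hypothetical catalog named nessie and hypothetical references dev and release_1:

```sql
-- Drop branch "dev"; the statement fails if the branch does not exist.
DROP BRANCH dev IN nessie;

-- Drop tag "release_1" only if it is present; no failure otherwise.
DROP TAG IF EXISTS release_1 IN nessie;
```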

ASSIGN BRANCH / TAG

ASSIGN ReferenceType
[ ExistingReference ]
[ TO ExistingReference [ AT [ TIMESTAMP | COMMIT ] TimestampOrCommit ] ]
[ IN CatalogName ]

Assigns the Nessie branch or tag named by the first ExistingReference parameter to another commit. The ReferenceType parameter specifies whether the reference is a branch or a tag.

By default, the branch or tag is updated to the latest commit on the current reference of the Nessie CLI (see USE statement). Another target reference name can be specified using the TO clause. The optional AT clause allows specifying a different commit ID (hash) or timestamp to assign the reference to.
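The following sketch shows the TO and AT clauses in use. The catalog nessie and the references dev and release_1 are hypothetical; the unquoted commit ID is an assumption about how the value is written.

```sql
-- Point branch "dev" at the latest commit of "main".
ASSIGN BRANCH dev TO main IN nessie;

-- Point tag "release_1" at a specific commit of "main"
-- (hash taken from the TimestampOrCommit example; unquoted form is an assumption).
ASSIGN TAG release_1 TO main AT COMMIT fa32a50d5303a53826f65649277561f5c6772eba019e7f1e01a359becb764877 IN nessie;
```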

LIST REFERENCES

LIST REFERENCES
[ FILTER Value
| [ STARTING WITH Value ] [ CONTAINING Value ]
]
[ IN CatalogName ]

Lists all named references.

An optional CEL-filter can be specified, which is evaluated on the server side.

The optional STARTING WITH clause starts the output at the reference name with the given value.

The optional CONTAINING clause only outputs references whose name contains the given value.
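For illustration, two forms of the command are sketched below. The catalog name nessie and the value feature are hypothetical, and writing the value as a single-quoted string is an assumption, since this reference does not define how Value is quoted.

```sql
-- List all named references known to the catalog.
LIST REFERENCES IN nessie;

-- Restrict the output using the CONTAINING clause
-- (single-quoted value is an assumption).
LIST REFERENCES CONTAINING 'feature' IN nessie;
```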

MERGE BRANCH

MERGE
[ DRY ]
[ ReferenceType ]
ExistingReference
[ AT [ TIMESTAMP | COMMIT ] TimestampOrCommit ]
[ INTO ExistingReference ]
[ BEHAVIOR MergeBehaviorKind ]
[ BEHAVIORS ContentKey = MergeBehaviorKind { AND ContentKey = MergeBehaviorKind } ]
[ IN CatalogName ]

Merges a branch or tag into another branch, supporting manual conflict resolution.

The optional DRY keyword tells Nessie to simulate the merge operation. This is useful to check whether a merge operation would succeed.

Specifying the name of the “from” reference is mandatory. By default, the latest commit of the “from” branch or tag will be merged, which can be overridden using the AT clause.

By default, MERGE uses the CLI’s current reference as the target branch. The INTO clause can be used to specify another target branch.

Nessie merge operations currently support three different merge behaviors:

  • NORMAL: the merge succeeds if the content does not have a conflicting change in the target branch.
  • FORCE: the merge always succeeds; the content from the “from” reference is applied onto the target branch.
  • DROP: like NORMAL, but the content does not cause a conflict and therefore does not fail the whole merge operation.

The merge behavior for all contents defaults to NORMAL and can be changed using the BEHAVIOR clause.

Merge behaviors for individual content keys can be set using the BEHAVIORS clause.
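The clauses described above can be combined as in the following sketch. The catalog nessie, the branch dev and the content keys sales.orders and sales.staging are hypothetical; in particular, writing a content key as a bare dotted name is an assumption, not something this reference specifies.

```sql
-- Simulate merging "dev" into "main" to see whether it would succeed.
MERGE DRY BRANCH dev INTO main IN nessie;

-- Perform the merge with the default NORMAL behavior for all contents.
MERGE BRANCH dev INTO main IN nessie;

-- Use FORCE for all contents, so content from "dev" is applied onto "main".
MERGE BRANCH dev INTO main BEHAVIOR FORCE IN nessie;

-- Set behaviors per content key (key notation is an assumption).
MERGE BRANCH dev INTO main BEHAVIORS sales.orders = FORCE AND sales.staging = DROP IN nessie;
```

Running the DRY variant first is a cautious way to find conflicts before choosing FORCE or DROP for individual keys.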

SHOW LOG

SHOW LOG
[ [ ON [ ReferenceType ] ] ExistingReference ]
[ AT [ TIMESTAMP | COMMIT ] TimestampOrCommit ]
[ LIMIT PositiveInt ]
[ IN CatalogName ]

Shows the Nessie commit log.

By default, the commit log is fetched for the current reference of the Nessie CLI, or for the branch or tag given as ExistingReference (optionally introduced by ON). By default, the log starts at the latest commit of the branch or tag, which can be overridden using the AT clause.

The output can be limited using the LIMIT clause. It is safe to omit the LIMIT clause for ANSI terminals, because the commit log is paged without overloading either the Nessie CLI or the Nessie server.
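A short sketch of the command with and without a reference, assuming a hypothetical catalog nessie and a hypothetical branch dev:

```sql
-- Show the commit log of the current reference.
SHOW LOG IN nessie;

-- Show at most 10 log entries for branch "dev".
SHOW LOG ON BRANCH dev LIMIT 10 IN nessie;
```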

SHOW REFERENCE

SHOW REFERENCE
[ ExistingReference ]
[ AT [ TIMESTAMP | COMMIT ] TimestampOrCommit ]
[ IN CatalogName ]

Shows information about the current or given reference.

If no reference is specified, information about the current reference of the Nessie CLI is shown; otherwise, information about the given reference is shown. By default, the information refers to the latest commit of the branch or tag, which can be overridden using the AT clause.
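Both forms are sketched below; the catalog name nessie and the branch name dev are hypothetical.

```sql
-- Show information about the current reference.
SHOW REFERENCE IN nessie;

-- Show information about the reference "dev".
SHOW REFERENCE dev IN nessie;
```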

Command parts

CatalogName

Spark catalog name.

ReferenceType

BRANCH | TAG

ExistingReference

Name of an existing reference in Nessie.

ReferenceName

Nessie reference name.

TimestampOrCommit

Either a Nessie commit ID (hash) or a timestamp in ISO format. Examples:

  • 2024-04-26T10:31:05.277650575Z is a valid ISO timestamp
  • fa32a50d5303a53826f65649277561f5c6772eba019e7f1e01a359becb764877 is a valid Nessie commit ID (hash)