Skip to content

Nessie Spark SQL Extensions

Spark SQL extensions provide an easy way to execute common Nessie commands via SQL.

How to use them

Spark 3.5

In order to be able to use Nessie’s custom Spark SQL extensions with Spark 3.5.x, one needs to configure org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0 along with org.projectnessie.nessie-integrations:nessie-spark-extensions-3.5_2.12:0.79.0

Here’s an example of how this is done when starting the spark-sql shell:

bin/spark-sql 
  --packages "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0,org.projectnessie.nessie-integrations:nessie-spark-extensions-3.5_2.12:0.79.0"
  --conf spark.sql.extensions="org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSparkSessionExtensions"
  --conf <other settings>

Spark 3.4

In order to be able to use Nessie’s custom Spark SQL extensions with Spark 3.4.x, one needs to configure org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.5.0 along with org.projectnessie.nessie-integrations:nessie-spark-extensions-3.4_2.12:0.79.0

Here’s an example of how this is done when starting the spark-sql shell:

bin/spark-sql 
  --packages "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.5.0,org.projectnessie.nessie-integrations:nessie-spark-extensions-3.4_2.12:0.79.0"
  --conf spark.sql.extensions="org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSparkSessionExtensions"
  --conf <other settings>

Spark 3.3

In order to be able to use Nessie’s custom Spark SQL extensions with Spark 3.3.x, one needs to configure org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.5.0 along with org.projectnessie.nessie-integrations:nessie-spark-extensions-3.3_2.12:0.79.0

Here’s an example of how this is done when starting the spark-sql shell:

bin/spark-sql 
  --packages "org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.5.0,org.projectnessie.nessie-integrations:nessie-spark-extensions-3.3_2.12:0.79.0"
  --conf spark.sql.extensions="org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSparkSessionExtensions"
  --conf <other settings>

Additional configuration details can be found in the Spark via Iceberg docs.

Grammar

The current grammar is shown below:

: CREATE (BRANCH|TAG) (IF NOT EXISTS)? reference=identifier (IN catalog=identifier)? (FROM fromRef=identifier)?
| DROP (BRANCH|TAG) (IF EXISTS)? identifier (IN catalog=identifier)?
| USE REFERENCE reference=identifier (AT tsOrHash=identifier)?  (IN catalog=identifier)?
| LIST REFERENCES (IN catalog=identifier)?
| SHOW REFERENCE (IN catalog=identifier)?
| MERGE BRANCH (reference=identifier)? (INTO toRef=identifier)?  (IN catalog=identifier)?
| SHOW LOG (reference=identifier)? (IN catalog=identifier)?
| ASSIGN (BRANCH|TAG) (reference=identifier)? (TO toRef=identifier)? (IN catalog=identifier)?

Creating Branches/Tags

Creating a branch dev in the nessie catalog (in case it doesn’t already exist):

  • CREATE BRANCH IF NOT EXISTS dev IN nessie

Creating a tag devTag in the nessie catalog (in case it doesn’t already exist):

  • CREATE TAG IF NOT EXISTS devTag IN nessie

Creating a branch dev in the nessie catalog off of an existing branch/tag base:

  • CREATE BRANCH IF NOT EXISTS dev IN nessie FROM base

Note that in case base doesn’t exist, Nessie will fall back to the default branch (main).

Dropping Branches/Tags

Dropping a branch dev in the nessie catalog (in case it exists):

  • DROP BRANCH IF EXISTS dev IN nessie

Dropping a tag devTag in the nessie catalog (in case it exists):

  • DROP TAG IF EXISTS devTag IN nessie

Note: the IF EXISTS clause is optional and is only supported for Nessie SQL Extensions 0.72 or higher.

Switching to a Branch/Tag

In order to switch to the HEAD of the branch/tag ref in the nessie catalog:

  • USE REFERENCE ref IN nessie

It is also possible to switch to a specific timestamp on a given branch/tag:

  • USE REFERENCE ref AT `2021-10-06T08:50:37.157602` IN nessie

Additionally, one can switch to a specific hash on a given branch/tag:

  • USE REFERENCE ref AT dd8d46a3dd5478ce69749a5455dba29d74f6d1171188f4c21d0e15ff4a0a9a9b IN nessie

Listing available Branches/Tags

One can list available branches/tags in the nessie catalog via:

  • LIST REFERENCES IN nessie

Showing details of the current Branch/Tag

One can see details about the current branch/tag in the nessie catalog via:

  • SHOW REFERENCE IN nessie

Showing the Commit Log of the current Branch/Tag

It is possible to look at the commit log of a particular branch/tag in the nessie catalog via:

  • SHOW LOG dev IN nessie

Assigning Branches/Tags

Assigning a branch dev to base in catalog nessie can be done via:

  • ASSIGN BRANCH dev TO base IN nessie

Assigning a tag devTag to base in catalog nessie can be done via:

  • ASSIGN TAG devTag TO base IN nessie

Note that in case base doesn’t exist, Nessie will fall back to the default branch (main).

It is also possible to assign a branch/tag to a base at a particular hash:

  • ASSIGN TAG devTag TO base AT dd8d46a3dd5478ce69749a5455dba29d74f6d1171188f4c21d0e15ff4a0a9a9b IN nessie

Merging a Branch into another Branch

Merging branch dev into base for the nessie catalog can be done via:

  • MERGE BRANCH dev INTO base IN nessie

Note that in case base doesn’t exist, Nessie will fall back to the default branch (main).