Nessie on Kubernetes¶
The easiest and recommended way to get started with Nessie on Kubernetes is to use the Helm chart described below.
Note
We are also working on a Kubernetes Operator for Nessie, but it is not available yet. If you are interested in deploying Nessie via an operator, please get in touch.
Note
See the separate page about how to configure Nessie when using a reverse proxy like Istio or NGINX.
For more information on Helm and Helm charts, see the Helm docs.
Installing the Helm chart¶
Add the Nessie Helm repo:
helm repo add nessie-helm https://charts.projectnessie.org
helm repo update
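To verify that the repository was added and see which chart versions are available, you can run:
helm search repo nessie-helm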
Install the Helm chart in the nessie-ns namespace (create the namespace first if it doesn’t exist), and name the release nessie:
helm install -n nessie-ns nessie nessie-helm/nessie
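Once the release is installed, a quick sanity check might look like this (assuming the default resource names derived from the nessie release):
helm status -n nessie-ns nessie
kubectl get pods -n nessie-ns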
Additional docs (incl. all configuration settings) can be found in the Nessie Helm chart docs hosted in Nessie’s GitHub repository.
Customizing the Helm chart¶
For example, to install the Helm chart with a predefined image, simply do this:
helm install -n nessie-ns nessie nessie-helm/nessie \
--set image.repository=ghcr.io/projectnessie/nessie \
--set image.tag=0.101.3
It’s also useful to create more than one replica of the Nessie server. To do this, simply set the replicaCount value:
helm install -n nessie-ns nessie nessie-helm/nessie --set replicaCount=3
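Instead of repeating --set flags, the same customizations can be collected in a values file. A minimal sketch combining the values shown above:
image:
  repository: ghcr.io/projectnessie/nessie
  tag: 0.101.3
replicaCount: 3
Then install with:
helm install -n nessie-ns nessie nessie-helm/nessie -f values.yaml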
Configuring memory and CPU¶
By default, the Helm chart does not set any resource limits. It is generally recommended, though, to set memory requests and limits (usually to the same value), as well as CPU requests.
There are no one-size-fits-all values for these settings, so you should adjust them according to your needs. A good starting point for a production deployment is to set the memory request and limit to 8Gi or higher, and the CPU request to a minimum of 4.
For example, to set the memory request and limit to 8Gi, and the CPU request to 4 cores, you can do this:
helm install -n nessie-ns nessie nessie-helm/nessie \
--set-string resources.requests.memory=8Gi \
--set-string resources.limits.memory=8Gi \
--set-string resources.requests.cpu=4
Regarding memory: when a limit is set with resources.limits.memory, Nessie will by default use 80% of that limit as the maximum heap size. For example, if the limit is 8Gi, then the maximum effective heap size will be 6.4Gi. If you want to change this, you can set the JAVA_MAX_MEM_RATIO environment variable to a different value. For example, to set the maximum heap size to 50% of the memory limit, you can do this:
helm install -n nessie-ns nessie nessie-helm/nessie \
--set-string 'extraEnv[0].name=JAVA_MAX_MEM_RATIO' \
--set-string 'extraEnv[0].value=50'
You can also set JAVA_MAX_MEM_RATIO to 0, in which case the maximum heap size will not be constrained by the ratio and will be bounded only by the container’s memory limit itself.
See the server configuration page for more details about memory settings.
Configuring database authentication¶
Nessie supports a variety of version stores, each of which requires different configuration. For example, the JDBC version store requires a JDBC URL, username and password, while the DynamoDB version store requires AWS credentials.
All database authentication options must be provided as Kubernetes secrets, and these must be created before installing the Helm chart.
Providing secrets for JDBC datastores¶
- Make sure you have a credentials file in the following form (assuming PostgreSQL, but the same applies to other JDBC datastores):
> cat $PWD/postgres-creds
postgres_username=YOUR_USERNAME
postgres_password=YOUR_PASSWORD
- Create the secret from the given file:
kubectl create secret generic postgres-creds --from-env-file="$PWD/postgres-creds"
- The postgres-creds secret will now be picked up when you use JDBC as the version store when installing Nessie (see below).
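Note that the secret must live in the same namespace as the Nessie pods. To double-check what was stored (a quick sketch, assuming the secret was created in the nessie-ns namespace):
kubectl get secret postgres-creds -n nessie-ns -o jsonpath='{.data.postgres_username}' | base64 -d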
Providing secrets for MongoDB¶
- Providing secrets for MongoDB is strongly recommended, but not enforced.
- Make sure you have a credentials file in the following form:
> cat $PWD/mongodb-creds
mongodb_username=YOUR_USERNAME
mongodb_password=YOUR_PASSWORD
- Create the secret from the given file:
kubectl create secret generic mongodb-creds --from-env-file="$PWD/mongodb-creds"
- The mongodb-creds secret will now be picked up when you use MONGODB as the version store when installing Nessie (see below).
Providing secrets for Cassandra¶
- Providing secrets for Cassandra is strongly recommended, but not enforced.
- Make sure you have a credentials file in the following form:
> cat $PWD/cassandra-creds
cassandra_username=YOUR_USERNAME
cassandra_password=YOUR_PASSWORD
- Create the secret from the given file:
kubectl create secret generic cassandra-creds --from-env-file="$PWD/cassandra-creds"
- The cassandra-creds secret will now be picked up when you use CASSANDRA as the version store when installing Nessie (see below).
Providing secrets for DynamoDB¶
- Make sure you have a credentials file in the following form:
> cat $PWD/awscreds
aws_access_key_id=YOURACCESSKEYDATA
aws_secret_access_key=YOURSECRETKEYDATA
- Create the secret from the given file:
kubectl create secret generic awscreds --from-env-file="$PWD/awscreds"
- The awscreds secret will now be picked up when you use DYNAMODB as the version store when installing Nessie (see below).
Providing secrets for Bigtable¶
A secret is not required for Bigtable. If one is present, it is assumed that authentication will use a service account JSON key. See this page for details on how to create a service account key.
If no secret is used, then Workload Identity usage is assumed instead; in this case, make sure that the pod’s service account has been granted access to Bigtable. See this page for details on how to create a suitable service account.
Important: when using Workload Identity, unless the cluster is in Autopilot mode, it is also required to add the following nodeSelector label:
iam.gke.io/gke-metadata-server-enabled: "true"
This is not done automatically by the chart because this selector would be invalid for Autopilot clusters.
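If you rely on the chart’s standard nodeSelector value, this could be done at install time by escaping the dots in the label key (a sketch; verify the exact value name against the chart docs):
helm install -n nessie-ns nessie nessie-helm/nessie \
  --set-string 'nodeSelector.iam\.gke\.io/gke-metadata-server-enabled=true'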
- Make sure you have a credentials file in the following form:
> cat $PWD/bigtable-creds
sa_json=YOUR_SA_JSON_KEY
- Create the secret from the given file:
kubectl create secret generic bigtable-creds --from-env-file="$PWD/bigtable-creds"
- The bigtable-creds secret will now be picked up when you use BIGTABLE as the version store when installing Nessie (see below).
Configuring the version store¶
Configuring JDBC version stores¶
Note
When setting up your SQL backend, both the database (sometimes called catalog) and the schema (sometimes called namespace, for backends that distinguish between database and schema) must be created beforehand, as the Helm chart will not create them for you. Check your database documentation for more information, especially around the CREATE DATABASE and CREATE SCHEMA commands. You must also create a user with the necessary permissions to access the database and schema.
Let’s assume that we want to use a PostgreSQL service, that the database is called nessiedb and the schema nessie. The PostgreSQL service is running at postgres:5432 in the same namespace.
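As stated in the note above, the database, schema and user must exist beforehand. A minimal sketch of that preliminary setup, assuming superuser access to the PostgreSQL service and an illustrative nessie_user role:
psql -h postgres -U postgres -c "CREATE ROLE nessie_user LOGIN PASSWORD 'YOUR_PASSWORD'"
psql -h postgres -U postgres -c "CREATE DATABASE nessiedb OWNER nessie_user"
psql -h postgres -U postgres -d nessiedb -c "CREATE SCHEMA nessie AUTHORIZATION nessie_user"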
Next, we need to configure the Helm chart to use the JDBC version store type and to pull the database credentials from the secret that was created previously. We can do this by creating a values.yaml file with the following content:
versionStoreType: JDBC
jdbc:
jdbcUrl: jdbc:postgresql://postgres:5432/nessiedb?currentSchema=nessie
secret:
name: postgres-creds
username: postgres_username
password: postgres_password
Let’s now assume that we are using MariaDB or MySQL instead of PostgreSQL. These backends do not support schemas, thus only the database name needs to be provided. MariaDB and MySQL share the same JDBC driver (the MariaDB one), so the JDBC URL is roughly the same for both; a minimal JDBC URL for these backends would look like this:
For MariaDB:
jdbcUrl: jdbc:mariadb://mariadb:3306/nessiedb
For MySQL:
jdbcUrl: jdbc:mysql://mysql:3306/nessiedb
In the above examples, mariadb and mysql are the service names of the MariaDB and MySQL services, respectively. The database name is nessiedb.
Note
The exact format of the JDBC URL may vary depending on the database you are using. Also, JDBC drivers usually support various optional connection properties. Check the documentation of your database and its JDBC driver for more information (for PostgreSQL, check out this page and for MariaDB, check out this one).
Note
While the database and the schema must be created beforehand, the required tables can be created automatically by Nessie if they don’t exist, in the target database and schema. If they do exist, they will be used as-is. You must ensure that their structure is up-to-date with the version of Nessie that you are using. Check the Nessie release notes for more information on schema upgrades.
Then, we can install the Helm chart with the following values:
helm install -n nessie-ns nessie nessie-helm/nessie -f values.yaml
Configuring MongoDB version stores¶
Let’s assume that we want to use a MongoDB database. The database must be created beforehand, as the Helm chart will not create it for you. Let’s assume that the database is called nessie. The MongoDB service is running at mongodb:27017 in the same namespace.
Then, we need to configure the Helm chart to use the MONGODB version store type and to pull the database credentials from the secret that was created previously. We can do this by creating a values.yaml file with the following content:
versionStoreType: MONGODB
mongodb:
database: nessie
connectionString: mongodb://mongodb:27017
secret:
name: mongodb-creds
username: mongodb_username
password: mongodb_password
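As with JDBC, install the chart with this values file (the same command applies to the DynamoDB and Bigtable examples below):
helm install -n nessie-ns nessie nessie-helm/nessie -f values.yaml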
Configuring DynamoDB version stores¶
Let’s assume that we want to use a DynamoDB database in the us-west-2 region. The tables will be created automatically by Nessie if they don’t exist.
Then, we need to configure the Helm chart to use the DYNAMODB version store type and to pull the AWS credentials from the secret that was created previously. We can do this by creating a values.yaml file with the following content:
versionStoreType: DYNAMODB
dynamodb:
region: us-west-2
secret:
name: awscreds
awsAccessKeyId: aws_access_key_id
awsSecretAccessKey: aws_secret_access_key
Configuring Bigtable version stores¶
Let’s assume that we want to use a Bigtable instance named nessie-bigtable in the prod-us project, using the default profile id. The tables will be created automatically by Nessie if they don’t exist, but the instance must be created and configured beforehand.
Then, we need to configure the Helm chart to use the BIGTABLE version store type and to pull the Bigtable credentials from the secret that was created previously. We can do this by creating a values.yaml file with the following content:
versionStoreType: BIGTABLE
bigtable:
projectId: prod-us
instanceId: nessie-bigtable
appProfileId: default
The above will use Workload Identity. If you are instead using a service account JSON key as described above, you can also specify it in the values.yaml file:
versionStoreType: BIGTABLE
bigtable:
projectId: prod-us
instanceId: nessie-bigtable
appProfileId: default
secret:
name: bigtable-creds
key: sa_json
Configuring other datasource types¶
Other datasource types are supported, and most of them have mandatory and optional configuration options. Again, check the Nessie Helm chart docs for more information.
Uninstalling the Helm chart¶
To uninstall the Helm chart and delete the nessie release from the nessie-ns namespace:
helm uninstall -n nessie-ns nessie
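Note that helm uninstall does not remove the secrets you created manually, nor the namespace itself; clean them up separately if desired, for example:
kubectl delete secret postgres-creds
kubectl delete namespace nessie-ns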
Troubleshooting¶
The first step in troubleshooting a Nessie Kubernetes deployment is to check the logs of the Nessie server pod. You can do this by running:
kubectl logs -n <namespace> <pod>
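If the pod is crash-looping, the logs of the previous container instance are often more useful:
kubectl logs -n <namespace> <pod> --previous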
You can also check the status of the pod:
kubectl describe pod -n <namespace> <pod>
It’s also possible to get a terminal into the Nessie pod’s main container:
kubectl exec -it -n <namespace> <pod> -- /bin/bash
But beware that the container has few tools installed; for example, curl and wget are not present.
Troubleshooting connectivity¶
Connectivity issues require more powerful tools. One useful technique is to run an ephemeral container in the same pod as the Nessie server, which shares the same network namespace and can access the Nessie server as localhost. This can be done with the kubectl debug command:
kubectl debug -it -n <namespace> <pod> --image=nicolaka/netshoot --target=nessie --share-processes
The above example uses the nicolaka/netshoot image, which contains a lot of useful tools for debugging. See the nicolaka/netshoot Docker Hub page for more information. The command should give you a shell in the Nessie pod, where you can use curl, wget, netstat, dig, tcpdump, etc.
For example, once you get a shell in the debug container, you can:
- Check which processes are running in the Nessie pod with ps aux;
- Check Nessie’s management API on port 9000 to see if the server is healthy:
curl http://127.0.0.1:9000/q/health
- Check the Nessie server’s API endpoint on port 19120:
curl http://127.0.0.1:19120/api/v2/config
Advanced Nessie JVM troubleshooting¶
JVM issues such as memory leaks, high CPU usage, etc. can be debugged using JVM tools.
The Nessie container ships with few utilities, but it does have jcmd, a command-line utility for interacting with the JVM, installed.
First, get a shell in the Nessie container:
kubectl exec -it -n <namespace> <pod> -c nessie -- /bin/bash
Then, you can use jcmd to capture a thread dump, heap dump, etc.:
jcmd 1 Thread.print
jcmd 1 GC.heap_dump /tmp/heapdump.hprof
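The resulting files are written inside the Nessie container; you can copy them to your local machine with kubectl cp, for example:
kubectl cp -n <namespace> <pod>:/tmp/heapdump.hprof heapdump.hprof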
Tip
The Nessie server PID is usually 1. You can double-check with ps aux or jps.
If you need other JVM tools, such as jfr or async-profiler, a more complex setup is required.
First, restart the Nessie pod with some extra JVM options. The most useful option to add is the -XX:+StartAttachListener JVM option; without it, the JVM will not allow attaching to it, and tools like jcmd will fail.
This can be done by modifying the pod template spec in the deployment spec, adding or updating the JAVA_OPTS_APPEND environment variable:
java_opts_append=$(kubectl get deployment -n <namespace> <deployment> -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="JAVA_OPTS_APPEND")].value}')
kubectl set env -n <namespace> deployment <deployment> JAVA_OPTS_APPEND="$java_opts_append -XX:+StartAttachListener"
Warning
The above command will restart all Nessie pods! Unfortunately that’s inevitable, because environment variables cannot be changed directly in the spec of a pod that belongs to a deployment.
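You can watch the rollout finish before proceeding:
kubectl rollout status -n <namespace> deployment/<deployment>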
Once the target pod is ready to be attached, you will need an image with the required tools. One example is the JVM KoolKit image (lightruncom/koolkits:jvm), which contains a JVM-based toolset for debugging Java applications:
kubectl debug -it -n <namespace> <pod> --image=lightruncom/koolkits:jvm --target=nessie --share-processes
See the JVM KoolKits page for more information. Beware that the image is quite large, so it may take some time to download.
A few preliminary commands may be needed before the tools can be used, for example when users and groups don’t match: the Nessie process runs as UID 10000 and GID 10001 by default, while in many debug containers the running user is root (UID 0). If that is the case, you won’t be able to attach to the Nessie JVM. You can solve the problem by creating a new user with the same UID and GID as the Nessie process. In the case of JVM KoolKits, you also need to copy a few files before switching to the new user; here is the whole snippet to run:
addgroup --gid 10001 debug
adduser --home /home/debug --uid 10000 --gid 10001 --disabled-password --gecos "" debug
cp -Rf /root/.sdkman/ /home/debug/ && chown -R 10000:10001 /home/debug/.sdkman
cp /root/.bashrc /home/debug/ && chown 10000:10001 /home/debug/.bashrc
su - debug
After running the above commands, you should be able to use jps, jcmd, jmap, etc., as well as other tools like jfr (Java Flight Recorder). For example, you can use JFR to record a profile for 30 seconds:
jcmd 1 JFR.start duration=30s filename=/tmp/profile.jfr
Note that the profile will be saved in the Nessie container, not the debug container. You can copy it to your local machine with kubectl cp:
kubectl cp -n <namespace> <pod>:/tmp/profile.jfr profile.jfr
Remote debugging¶
If the above doesn’t help, you can also enable remote debugging by attaching to the Nessie pod with a remote debugger.
Again, this will require restarting the Nessie pod with some extra JVM options. The most useful option to add is the -agentlib:jdwp JVM option, which enables the Java Debug Wire Protocol (JDWP).
This can be done most easily by simply setting the JAVA_DEBUG and JAVA_DEBUG_PORT environment variables in the deployment spec:
kubectl set env -n <namespace> deployment <deployment> JAVA_DEBUG="true" JAVA_DEBUG_PORT="*:5005"
Once the pod is ready to be debugged, you must forward port 5005 to your local machine:
kubectl port-forward -n <namespace> <pod> 5005:5005
Then you can attach to the Nessie server with your favorite IDE or command-line debugger.
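For example, with the command-line debugger that ships with the JDK (a minimal sketch, assuming the port-forward above is running):
jdb -attach localhost:5005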
Tip
If you are using IntelliJ IDEA, you should create a new “Remote JVM Debug” run configuration. Using the “Attach to process…” option will not work, since it only supports attaching to local JVM processes.