Nessie Airflow Provider¶
Documentation: https://projectnessie.github.io/nessie_provider
Source Code: https://github.com/projectnessie/nessie_provider
Usage¶
To use the provider in Airflow, install it via pip:
pip install airflow-provider-nessie
See the Nessie documentation for instructions on starting and using a Nessie server.
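Once a server is running, Airflow needs to know where to reach it. One generic way to supply a connection is an `AIRFLOW_CONN_<CONN_ID>` environment variable that holds a connection URI. The sketch below builds such a URI with the standard library only. The connection id `nessie_default`, the `http` connection type, the credentials, and the address `localhost:19120/api/v1` are illustrative assumptions, not values confirmed by this provider's documentation, and `build_connection_uri` is a hypothetical helper.

```python
import os
from urllib.parse import quote, urlsplit


def build_connection_uri(conn_type, host, port=None, login=None, password=None, path=""):
    """Assemble a connection URI of the form type://login:password@host:port/path."""
    auth = ""
    if login is not None:
        # Percent-encode credentials so characters like "/" or "@" stay unambiguous.
        auth = quote(login, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    netloc = host if port is None else f"{host}:{port}"
    return f"{conn_type}://{auth}{netloc}/{path.lstrip('/')}"


# Hypothetical values for a local Nessie server; adjust to your deployment.
uri = build_connection_uri(
    conn_type="http",
    host="localhost",
    port=19120,
    login="nessie_user",
    password="s3cr3t/pass",
    path="api/v1",
)

# Airflow resolves connections from AIRFLOW_CONN_<CONN_ID>, with the id upper-cased.
os.environ["AIRFLOW_CONN_NESSIE_DEFAULT"] = uri

# Parse the URI back to check that host and port survived the round trip.
parts = urlsplit(uri)
```

Setting the variable inside Python is only for demonstration; in practice it would normally be exported in the environment of the Airflow scheduler and workers, or the connection would be created through the Airflow UI or CLI.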
Operators and Hooks¶
To interact with Nessie from an Airflow DAG, you have the following options:
- Nessie Hook: register a connection with Airflow that stores your Nessie URL and credentials
- Create reference operator: create a branch or tag as part of an Airflow DAG
- Delete reference operator: delete a branch or tag as part of an Airflow DAG
- Commit operator: commit objects to the Nessie database on a given branch
- Merge operator: merge one branch into another
These can be seen in action by looking at the Example DAGs. The basic_nessie.py DAG shows each operator in action, and the spark_nessie_iceberg.py DAG shows a more complicated example of performing an Iceberg transaction in Nessie from the Spark operator.
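To make the reference lifecycle concrete, here is a toy in-memory model of the create, commit, merge, and delete steps in the order a DAG might run them. It does not use the provider's operators or the Nessie API: `ToyCatalog` and all of its methods are hypothetical names invented for illustration, and its merge logic is deliberately simplified.

```python
class ToyCatalog:
    """Toy stand-in for a catalog with named references (not the Nessie API)."""

    def __init__(self):
        # Each reference name maps to its ordered list of commit messages.
        self.refs = {"main": []}

    def create_reference(self, name, source="main"):
        """Create a new reference that starts with the source's history."""
        if name in self.refs:
            raise ValueError(f"reference {name!r} already exists")
        self.refs[name] = list(self.refs[source])

    def commit(self, branch, message):
        """Record a change on the given branch."""
        self.refs[branch].append(message)

    def merge(self, source, target):
        """Append commits from source that target does not have yet."""
        missing = [c for c in self.refs[source] if c not in self.refs[target]]
        self.refs[target].extend(missing)

    def delete_reference(self, name):
        """Remove a reference once it is no longer needed."""
        del self.refs[name]


catalog = ToyCatalog()
catalog.create_reference("etl")        # work happens on an isolated branch
catalog.commit("etl", "load table")    # changes are invisible on main so far
catalog.merge("etl", "main")           # publish the changes in one step
catalog.delete_reference("etl")        # clean up the working branch
```

The point of the pattern is isolation: readers of `main` never see a half-finished load, because the changes only appear when the merge step runs.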
Development¶
Set up the environment¶
You should have Pipenv installed. Then, you can install the dependencies with:
pipenv install --dev
After that, activate the virtual environment:
pipenv shell
Run unit tests¶
You can run all the tests with:
make test
Alternatively, you can run pytest yourself:
pytest
Format the code¶
Execute the following command to apply isort and black formatting:
make format
License¶
This project is licensed under the terms of the Apache Software License 2.0.