docs: Add docs on how use custom Python processors #753

Merged
merged 11 commits into from
Feb 26, 2025
@@ -1,4 +1,5 @@
= Loading custom components
= Custom `.nar` files

:description: Load custom NiFi components by using custom Docker images or mounting external volumes with nar files for enhanced functionality.
:nifi-docs-custom-components: https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#introduction

@@ -36,6 +37,7 @@ spec:
Also read the xref:guides:custom-images.adoc[Using customized product images] guide for additional information.

== Using the official image

If you don't want to create a custom image or don't have access to an image registry, you can use the extra volume mount functionality to mount a volume containing your custom components and configure NiFi to read these from the mounted volumes.

For this to work you'll need to prepare a PersistentVolumeClaim (PVC) containing your components.
@@ -0,0 +1,165 @@
= Custom Python processors

NiFi 2.0 added support for custom processors written in Python.
The Stackable images already contain the required tooling, including a supported Python version.

== General configuration

[source,yaml]
----
spec:
  nodes:
    configOverrides:
      nifi.properties:
        # The command used to launch Python.
        # This property must be set to enable Python-based processors.
        nifi.python.command: python3
        # The directory that NiFi should look in to find custom Python-based
        # Processors.
        nifi.python.extensions.source.directory.custom: /nifi-python-extensions
        # The directory that contains the Python framework for communicating
        # between the Python and Java processes.
        nifi.python.framework.source.directory: /stackable/nifi/python/framework/
        # The working directory where NiFi should store artifacts.
        # This property defaults to ./work/python but if you want to mount an
        # emptyDir for the working directory then another directory has to be
        # set to avoid ownership conflicts with ./work/nar.
        nifi.python.working.directory: /nifi-python-working-directory
----

== Getting Python scripts into NiFi

TIP: NiFi should hot-reload the Python scripts. You might need to refresh your browser window to see the new processor.

[#configmap]
=== 1. Mount as ConfigMap

The easiest way is to define a ConfigMap and mount it as follows.
This way, the Python processors are stored and versioned alongside your NifiCluster itself.

// Technically it's YAML, but most of the content is Python
[source,python]
----
apiVersion: v1
kind: ConfigMap
metadata:
  name: nifi-python-extensions
data:
  CreateFlowFileProcessor.py: |
    from nifiapi.flowfilesource import FlowFileSource, FlowFileSourceResult

    class CreateFlowFile(FlowFileSource):
        class Java:
            implements = ['org.apache.nifi.python.processor.FlowFileSource']

        class ProcessorDetails:
            version = '0.0.1-SNAPSHOT'
            description = '''A Python processor that creates FlowFiles.'''

        def __init__(self, **kwargs):
            pass

        def create(self, context):
            return FlowFileSourceResult(
                relationship = 'success',
                attributes = {'greeting': 'hello'},
                contents = 'Hello World!'
            )
----

The Python script is taken from https://nifi.apache.org/nifi-docs/python-developer-guide.html#flowfile-source[the official NiFi Python developer guide].
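
Before mounting a script, it can be useful to check locally that it at least imports and that `create` returns what you expect.
The following is a minimal sketch, not an official test harness: the `nifiapi` package only exists inside NiFi, so the sketch registers hypothetical stand-in classes under the same names.
The stand-ins mimic only what the example processor above uses and say nothing about how the real NiFi classes behave.

```python
import sys
import types


# Hypothetical stand-ins for the classes NiFi provides at runtime.
# They only carry the names and fields used by the example processor.
class FlowFileSource:
    pass


class FlowFileSourceResult:
    def __init__(self, relationship, attributes=None, contents=None):
        self.relationship = relationship
        self.attributes = attributes
        self.contents = contents


def install_stub_nifiapi():
    """Register fake `nifiapi` modules so the processor source can be imported."""
    package = types.ModuleType("nifiapi")
    module = types.ModuleType("nifiapi.flowfilesource")
    module.FlowFileSource = FlowFileSource
    module.FlowFileSourceResult = FlowFileSourceResult
    package.flowfilesource = module
    sys.modules["nifiapi"] = package
    sys.modules["nifiapi.flowfilesource"] = module


# The processor source as it appears in the ConfigMap.
PROCESSOR_SOURCE = '''
from nifiapi.flowfilesource import FlowFileSource, FlowFileSourceResult

class CreateFlowFile(FlowFileSource):
    class Java:
        implements = ['org.apache.nifi.python.processor.FlowFileSource']

    class ProcessorDetails:
        version = '0.0.1-SNAPSHOT'
        description = """A Python processor that creates FlowFiles."""

    def __init__(self, **kwargs):
        pass

    def create(self, context):
        return FlowFileSourceResult(
            relationship = 'success',
            attributes = {'greeting': 'hello'},
            contents = 'Hello World!'
        )
'''


def smoke_test(source, class_name):
    """Execute the processor source against the stubs and call create() once."""
    install_stub_nifiapi()
    namespace = {"__name__": "processor_under_test"}
    exec(compile(source, "CreateFlowFileProcessor.py", "exec"), namespace)
    processor = namespace[class_name]()
    # The example processor ignores the context, so None is enough here.
    return processor.create(context=None)


result = smoke_test(PROCESSOR_SOURCE, "CreateFlowFile")
print(result.relationship, result.attributes, result.contents)
```

A check like this only catches syntax errors and obvious mistakes; whether NiFi actually loads the processor still has to be verified in the cluster.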

You can add multiple Python scripts to the ConfigMap.
Afterwards, mount the Python scripts into `/nifi-python-extensions`:

[source,yaml]
----
spec:
  nodes:
    podOverrides:
      spec:
        containers:
          - name: nifi
            volumeMounts:
              - name: nifi-python-extensions
                mountPath: /nifi-python-extensions
              - name: nifi-python-working-directory
                mountPath: /nifi-python-working-directory
        volumes:
          - name: nifi-python-extensions
            configMap:
              name: nifi-python-extensions
          - name: nifi-python-working-directory
            emptyDir: {}
----
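
If you keep several processors as separate `.py` files, you may prefer to generate the ConfigMap rather than paste each script into it by hand.
The sketch below is one possible approach using only the Python standard library: it collects every `*.py` file in a directory into a manifest and prints it as JSON, which `kubectl apply -f` accepts just like YAML.
The function name `build_configmap` and the demo file names are made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def build_configmap(script_dir, name="nifi-python-extensions"):
    """Return a ConfigMap manifest with one data entry per *.py file in script_dir."""
    data = {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(script_dir).glob("*.py"))
    }
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": data,
    }


# Demo with a throwaway directory; point build_configmap at your own folder instead.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "CreateFlowFileProcessor.py").write_text("# processor code\n", encoding="utf-8")
    Path(tmp, "notes.txt").write_text("not a Python script\n", encoding="utf-8")
    manifest = build_configmap(tmp)

print(json.dumps(manifest, indent=2))
```

The default name matches the ConfigMap referenced in the volume definition above, so the generated manifest can replace the handwritten one without changing the podOverrides.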

[#git-sync]
=== 2. Use git-sync

As an alternative, you can use `git-sync` to keep your Python processors up to date.
To do so, use podOverrides to add a `git-sync` sidecar that syncs the repository into a volume shared between the `nifi` and `git-sync` containers.

The following snippet can serve as a starting point (the Git repo has the folder `processors` with the Python scripts inside).

[source,yaml]
----
spec:
  nodes:
    podOverrides:
      spec:
        containers:
          - name: nifi
            volumeMounts:
              - name: nifi-python-extensions
                mountPath: /nifi-python-extensions
              - name: nifi-python-working-directory
                mountPath: /nifi-python-working-directory
          - name: git-sync
            image: registry.k8s.io/git-sync/git-sync:v4.2.3
            args:
              - --repo=https://github.com/stackabletech/nifi-talk
              - --root=/nifi-python-extensions
              - --period=10s
            volumeMounts:
              - name: nifi-python-extensions
                mountPath: /nifi-python-extensions
        volumes:
          - name: nifi-python-extensions
            emptyDir: {}
          - name: nifi-python-working-directory
            emptyDir: {}
----

Afterwards, update the source directory you configured previously so that it points to the folder inside the synced Git repository.

[source,yaml]
----
spec:
  nodes:
    configOverrides:
      nifi.properties:
        # Replace the property from the previous step
        # Format is /nifi-python-extensions/<git-repo-name>/<git-folder>/
        nifi.python.extensions.source.directory.custom: >
          /nifi-python-extensions/nifi-talk/processors/
----
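
The path format in the comment above can be derived mechanically from the `--root` and `--repo` arguments and the folder name.
The following sketch illustrates this, assuming the repository name is the last path component of the repository URL, as it is in the example; the helper name `extensions_source_directory` is made up for illustration.

```python
from posixpath import join
from urllib.parse import urlparse


def extensions_source_directory(root, repo_url, folder):
    """Build <root>/<git-repo-name>/<git-folder>/ from the git-sync settings."""
    # Assumption: the repo name is the last path component of the URL,
    # without an optional ".git" suffix.
    repo_name = urlparse(repo_url).path.rstrip("/").split("/")[-1]
    if repo_name.endswith(".git"):
        repo_name = repo_name[: -len(".git")]
    return join(root, repo_name, folder.strip("/")) + "/"


source_directory = extensions_source_directory(
    root="/nifi-python-extensions",
    repo_url="https://github.com/stackabletech/nifi-talk",
    folder="processors",
)
print(source_directory)
```

For the values used in this guide, the result matches the path configured in the override above.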

=== 3. Use PersistentVolume

You can also mount a PVC below `/nifi-python-extensions` using podOverrides and shell into the NiFi Pod to make changes.
However, the <<configmap>> or <<git-sync>> approach is recommended.

== Check processors have been loaded

NiFi logs every Python processor it finds.
You can use these log lines to check whether your processors have been loaded.

[source,console]
----
$ kubectl logs nifi-2-0-0-node-default-0 -c nifi \
| grep 'Discovered.*Python Processor'
… INFO [main] … Discovered Python Processor PythonZgrepProcessor
… INFO [main] … Discovered Python Processor TransformOpenskyStates
… INFO [main] … Discovered Python Processor UpdateAttributeFileLookup
… INFO [main] … Discovered or updated 3 Python Processors in 64 millis
----
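
If you want to automate this check, for example in a deployment script, the processor names can be extracted from the log text.
This is a minimal sketch that assumes the log format shown above; the function name and the shortened sample text are illustrative only.

```python
import re

# Matches the per-processor lines, but not the final summary line
# ("Discovered or updated 3 Python Processors ...").
DISCOVERED = re.compile(r"Discovered Python Processor (\S+)")


def discovered_processors(log_text):
    """Return the processor names found in NiFi log output, in order of appearance."""
    return DISCOVERED.findall(log_text)


# Shortened sample taken from the log excerpt above.
sample_log = """\
… INFO [main] … Discovered Python Processor PythonZgrepProcessor
… INFO [main] … Discovered Python Processor TransformOpenskyStates
… INFO [main] … Discovered Python Processor UpdateAttributeFileLookup
… INFO [main] … Discovered or updated 3 Python Processors in 64 millis
"""

names = discovered_processors(sample_log)
missing = {"TransformOpenskyStates", "CreateFlowFile"} - set(names)
print(names)
print(sorted(missing))
```

In practice you would feed the output of `kubectl logs` into `discovered_processors` and fail the script if `missing` is not empty.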
@@ -0,0 +1,9 @@
= Loading custom components
:description: Load custom NiFi components for enhanced functionality.

You can develop or use custom components for Apache NiFi, typically custom processors, to extend its functionality.

There are currently two types of custom components:

1. xref:nifi:usage_guide/custom-components/custom-nars.adoc[]
2. Starting with NiFi 2.0 you can also use xref:nifi:usage_guide/custom-components/custom-python-processors.adoc[]
4 changes: 3 additions & 1 deletion docs/modules/nifi/partials/nav.adoc
@@ -5,7 +5,6 @@
** xref:nifi:usage_guide/listenerclass.adoc[]
** xref:nifi:usage_guide/zookeeper-connection.adoc[]
** xref:nifi:usage_guide/extra-volumes.adoc[]
** xref:nifi:usage_guide/custom_processors.adoc[]
** xref:nifi:usage_guide/external_ports.adoc[]
** xref:nifi:usage_guide/security.adoc[]
** xref:nifi:usage_guide/resource-configuration.adoc[]
@@ -14,6 +13,9 @@
** xref:nifi:usage_guide/updating.adoc[]
** xref:nifi:usage_guide/overrides.adoc[]
** xref:nifi:usage_guide/writing-to-iceberg-tables.adoc[]
** xref:nifi:usage_guide/custom-components/index.adoc[]
*** xref:nifi:usage_guide/custom-components/custom-nars.adoc[]
*** xref:nifi:usage_guide/custom-components/custom-python-processors.adoc[]
** xref:nifi:usage_guide/operations/index.adoc[]
*** xref:nifi:usage_guide/operations/cluster-operations.adoc[]
*** xref:nifi:usage_guide/operations/pod-placement.adoc[]