There are a huge number of connectors available in Apache NiFi. I won’t discuss them all here, but I do want to give an idea of the capabilities NiFi offers. As we have discussed before, NiFi itself doesn’t do any heavy data processing; if you need to run heavy aggregations and joins as part of your pipeline, NiFi is not the right tool — Spark is better suited to that kind of work.
So, let’s think about the ETL process: we have to get some data, do something to it, and load it into a destination system. For getting data, NiFi has lots of connectors, for example:
- GetFile
- GetHDFS
- GetHTTP
- GetFTP
- GetKafka
- FetchS3Object
- SelectHiveQL
- ExecuteSQL
- ConvertJSONToSQL
- LookupRecord
- ExecuteProcess: run a process on the server (e.g. a Python script; see the sketch after this list)
- InvokeHTTP (API usage)
- ListenHTTP
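To give a feel for the ExecuteProcess option, here is a minimal sketch of the kind of script it could run. ExecuteProcess captures the command’s stdout as the content of the resulting flow file, so the script just fetches some JSON from an API and prints it. The URL is a made-up placeholder, not a real endpoint.

```python
# Hypothetical ingest script for NiFi's ExecuteProcess processor.
# ExecuteProcess turns the command's stdout into a flow file, so this
# script simply fetches JSON from an API and writes it to stdout.
import json
import sys
import urllib.request

API_URL = "https://example.com/api/orders"  # placeholder endpoint

def main() -> None:
    with urllib.request.urlopen(API_URL, timeout=30) as resp:
        payload = json.load(resp)
    # Whatever we write to stdout becomes the flow file content in NiFi.
    json.dump(payload, sys.stdout)

if __name__ == "__main__":
    main()
```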
Now that we’ve got our data, we need to do something to it. We can do cool things like:
- EncryptContent
- DecryptContent
- CompressContent
- DecompressContent
- ReplaceText (e.g. change UK to United Kingdom; sketched after this list)
- SplitText (process line by line, or split every n lines)
- MergeContent (merge many flow files into a single file)
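As a concrete illustration of the ReplaceText example above, here is a minimal sketch of the same transformation written as a stdin-to-stdout filter. This is not how ReplaceText itself is configured (that is a regex property on the processor); it is the sort of script you could run instead via ExecuteStreamCommand, which streams the flow file content through the command and takes its stdout as the new content.

```python
# A tiny stdin-to-stdout filter illustrating the ReplaceText idea:
# change "UK" to "United Kingdom" in the flow file content.
import re
import sys

def main() -> None:
    text = sys.stdin.read()
    # \b keeps us from rewriting substrings of longer words.
    updated = re.sub(r"\bUK\b", "United Kingdom", text)
    sys.stdout.write(updated)

if __name__ == "__main__":
    main()
```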
We can also use RouteOnContent (e.g. if the flow file’s content includes UK, route it to processor1, otherwise route it to processor2). This gives us additional flexibility in our ETL process with relative ease.
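Conceptually, RouteOnContent is just a regular-expression match against the flow file’s content, with one relationship per matching rule plus an unmatched relationship. The sketch below mirrors that decision in plain Python; the relationship names are made up for illustration, and in NiFi itself you would express the rule as a regex property on the processor rather than writing code.

```python
# Not NiFi code: a small sketch of the decision RouteOnContent makes.
import re

def route(content: str) -> str:
    if re.search(r"\bUK\b", content):
        return "uk"          # e.g. send on to processor1
    return "unmatched"       # e.g. send on to processor2

print(route("customer_country: UK"))   # -> uk
print(route("customer_country: FR"))   # -> unmatched
```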
Now that we have the data in a format we’re happy with, we need to put it somewhere. We can use connectors like:
- PutEmail (send an email)
- PutKafka
- PutFTP
- PutS3Object
- PutHiveQL
- PostHTTP
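To close the loop, here is a minimal sketch of what the load step amounts to when the destination is an HTTP endpoint, i.e. the kind of delivery PostHTTP or InvokeHTTP would handle for you inside NiFi. The URL and payload are placeholders, not real values.

```python
# Hypothetical load step, shown outside NiFi for illustration:
# POST a JSON record to a destination endpoint.
import json
import urllib.request

TARGET_URL = "https://example.com/api/ingest"  # placeholder endpoint

def deliver(record: dict) -> int:
    req = urllib.request.Request(
        TARGET_URL,
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status  # e.g. 200 on success

if __name__ == "__main__":
    print(deliver({"country": "United Kingdom", "orders": 42}))
```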