Data Provenance in Apache NiFi

Understanding how your data came to be in its current form is critical for any kind of data flow. When you’ve got a simple flow, you can eyeball it and understand what happened and why the outcome is what it is. But when you start forking your flows and adding a lot of complexity, it becomes difficult to map out the full provenance by hand.

Take the flow below. I am going to right-click on ‘Put file in target’ and look at its provenance. Checking manually shows that there are 4 routes a file could have taken to reach this point, so a tool that tracks provenance makes life much easier.
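To show why tracing routes by hand stops scaling, here is a minimal Python sketch that lists every route through a small flow. The processor names other than ‘Put file in target’ and the shape of the graph are hypothetical, made up for illustration; this is not the flow from the screenshot and it does not use any NiFi API.

```python
# Hypothetical flow: each key is a processor, each value lists the
# processors it feeds. Only "Put file in target" comes from the article.
FLOW = {
    "Get file": ["Route A", "Route B"],
    "Route A": ["Transform X", "Transform Y"],
    "Route B": ["Transform X", "Transform Y"],
    "Transform X": ["Put file in target"],
    "Transform Y": ["Put file in target"],
    "Put file in target": [],
}


def all_routes(flow, start, target):
    """Return every path from start to target (assumes the flow has no loops)."""
    if start == target:
        return [[target]]
    routes = []
    for next_processor in flow.get(start, []):
        for tail in all_routes(flow, next_processor, target):
            routes.append([start] + tail)
    return routes


routes = all_routes(FLOW, "Get file", "Put file in target")
for route in routes:
    print(" -> ".join(route))
```

Two forks of two branches each already give 4 routes. Every extra fork multiplies that number, which is the point at which a provenance view becomes more practical than reading the canvas.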

Once you have right-clicked and selected ‘View data provenance’, you can see each of the events that have been recorded for this processor.

If I were looking for something specific, I could filter the list. In this case, I will filter by filename.
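Conceptually, the filename filter just narrows the event list to the entries whose filename matches. The sketch below mimics that on a hand-written list of events. The dictionary keys and the sample filenames are assumptions for illustration, not NiFi's actual event format, and the code does not call NiFi itself.

```python
# Hypothetical provenance events; the field names are illustrative only.
EVENTS = [
    {"event_id": 1, "event_type": "RECEIVE", "component": "Get file", "filename": "orders.csv"},
    {"event_id": 2, "event_type": "RECEIVE", "component": "Get file", "filename": "customers.csv"},
    {"event_id": 3, "event_type": "SEND", "component": "Put file in target", "filename": "orders.csv"},
    {"event_id": 4, "event_type": "SEND", "component": "Put file in target", "filename": "customers.csv"},
]


def filter_by_filename(events, filename):
    """Keep only the events recorded for the given filename."""
    return [event for event in events if event["filename"] == filename]


matches = filter_by_filename(EVENTS, "orders.csv")
for event in matches:
    print(event["event_id"], event["event_type"], event["component"])
```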

Now that I have a list of only the files I am interested in, I can start looking at the route they took and all the actions that were taken against them.
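One way to picture "the route they took" is to group the events by file and put them in time order. The sketch below does that for a few hand-written events. The field names, timestamps, and identifiers are hypothetical, and the event types are only meant to resemble the kind NiFi records (such as RECEIVE, ROUTE, and SEND).

```python
from collections import defaultdict

# Hypothetical events, deliberately out of order; field names are illustrative.
EVENTS = [
    {"timestamp": 3, "flowfile_uuid": "a1", "component": "Put file in target", "event_type": "SEND"},
    {"timestamp": 1, "flowfile_uuid": "a1", "component": "Get file", "event_type": "RECEIVE"},
    {"timestamp": 2, "flowfile_uuid": "a1", "component": "Route A", "event_type": "ROUTE"},
    {"timestamp": 2, "flowfile_uuid": "b2", "component": "Get file", "event_type": "RECEIVE"},
    {"timestamp": 4, "flowfile_uuid": "b2", "component": "Put file in target", "event_type": "SEND"},
]


def routes_by_flowfile(events):
    """Group events per file and order each group by time."""
    grouped = defaultdict(list)
    for event in sorted(events, key=lambda e: e["timestamp"]):
        grouped[event["flowfile_uuid"]].append((event["component"], event["event_type"]))
    return dict(grouped)


timeline = routes_by_flowfile(EVENTS)
for flowfile_uuid, steps in timeline.items():
    print(flowfile_uuid, " -> ".join(component for component, _ in steps))
```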

Kodey