Spark

Testing our UDFs with unittest

Notebooks make it easy to test our code on the fly and check that the output DataFrame looks correct. However, it is good practice to run unit tests covering edge cases: inputs you may not see very often and that may not appear in your sample data. It is also important to check that […]
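As a taste of what the post covers, here is a minimal sketch of unit-testing a PySpark UDF with unittest. The to_upper UDF and the edge-case inputs (nulls, empty strings) are hypothetical examples of my own, not code from the post.

```python
# Sketch: unit-testing a PySpark UDF with unittest.
# to_upper and the edge-case inputs are hypothetical examples.
import unittest

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType


def to_upper(value):
    # Guard against None so the UDF does not raise on null rows.
    return value.upper() if value is not None else None


class ToUpperUdfTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # A small local SparkSession is enough for unit tests.
        cls.spark = (
            SparkSession.builder.master("local[1]").appName("udf-tests").getOrCreate()
        )

    @classmethod
    def tearDownClass(cls):
        cls.spark.stop()

    def test_handles_edge_cases(self):
        # Edge cases that rarely show up in sample data: nulls and empty strings.
        to_upper_udf = udf(to_upper, StringType())
        df = self.spark.createDataFrame([("hello",), (None,), ("",)], ["value"])
        result = [
            row.upper_value
            for row in df.select(to_upper_udf("value").alias("upper_value")).collect()
        ]
        self.assertEqual(result, ["HELLO", None, ""])


if __name__ == "__main__":
    unittest.main()
```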

Spark

Testing in PySpark (Part 1)

Testing is one of the most painful parts of data engineering, especially when you have a very large data set. But it is an absolutely necessary part of every project, as without it we can’t have complete confidence in our data. We can write unit tests using libraries like pytest, which I will cover in […]
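To illustrate the kind of test the post builds towards, here is a minimal pytest sketch. The filter_adults transformation and the session-scoped spark fixture are assumptions for the example, not code from the post.

```python
# Sketch: a pytest-style unit test for a small PySpark transformation.
# filter_adults and the fixture are hypothetical examples.
import pytest
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F


def filter_adults(df: DataFrame) -> DataFrame:
    # Example transformation under test: keep rows with age >= 18.
    return df.filter(F.col("age") >= 18)


@pytest.fixture(scope="session")
def spark():
    # Share one local SparkSession across the test session to keep tests fast.
    session = (
        SparkSession.builder.master("local[1]").appName("pytest-pyspark").getOrCreate()
    )
    yield session
    session.stop()


def test_filter_adults_keeps_only_18_and_over(spark):
    df = spark.createDataFrame([("Ana", 17), ("Ben", 18), ("Cal", 42)], ["name", "age"])
    result = filter_adults(df)
    assert [row.name for row in result.collect()] == ["Ben", "Cal"]
```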
