Spark

Solving the UDF performance problem with Pandas

I mentioned in a previous article that, for performance reasons, you should avoid UDFs wherever possible. That advice still stands, but if you absolutely must use a UDF, consider a Pandas UDF rather than the standard UDFs that come out of the box with Spark. The standard UDFs in Spark operate […]
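To make the contrast concrete, here is a minimal sketch assuming PySpark 3.x with pandas and PyArrow installed. The Celsius-to-Fahrenheit conversion and all function names are hypothetical illustrations, not the article's own example. A standard Python UDF is invoked once per row. A Pandas UDF receives whole batches of values as a `pandas.Series`, transferred via Apache Arrow, so the arithmetic runs vectorized.

```python
import pandas as pd


def celsius_to_fahrenheit_row(c: float) -> float:
    # Hypothetical row-at-a-time logic: a standard Spark UDF calls this
    # once for every row, passing a single Python value each time.
    return c * 9 / 5 + 32


def celsius_to_fahrenheit_vectorized(c: pd.Series) -> pd.Series:
    # Hypothetical vectorized logic: a Pandas UDF calls this once per batch,
    # and pandas applies the arithmetic to the whole Series at once.
    return c * 9 / 5 + 32


def register_udfs():
    # PySpark is imported lazily so the pure-pandas functions above
    # can be exercised without a Spark installation.
    from pyspark.sql.functions import pandas_udf, udf
    from pyspark.sql.types import DoubleType

    # Standard UDF: row-at-a-time execution.
    standard_udf = udf(celsius_to_fahrenheit_row, DoubleType())
    # Pandas UDF: the Series -> Series type hints mark it as a
    # scalar (element-wise) Pandas UDF in Spark 3.x.
    vectorized_udf = pandas_udf(celsius_to_fahrenheit_vectorized, DoubleType())
    return standard_udf, vectorized_udf
```

Both UDFs are applied the same way, for example `df.withColumn("temp_f", vectorized_udf("temp_c"))` on a DataFrame with a hypothetical `temp_c` column. Only the execution model differs.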
