Apache Spark gives us a framework for crunching huge amounts of data efficiently by leveraging parallelism, which is great! However, with great power comes great responsibility, because optimising your scripts to run efficiently is not so easy. Within our scripts, we need to minimise the data we bring in, avoid UDFs […]