Melting our Spark dataframes with Koalas

Melting a dataframe means turning a short, wide table into a long, thin one. The original column headings become categorical values in a new column of the resulting dataframe.

In the example below, the dataframe shows the total kilometres cycled and walked by each person.

Name      BikeKM   WalkKM
Kieran    77       178
Bobby     79       158
Polly     45       124

Sometimes it is more useful to have the same data laid out like this:

Name      Exercise Type   Measure
Kieran    Bike            77
Kieran    Walk            178
Bobby     Bike            79
Bobby     Walk            158
Polly     Bike            45
Polly     Walk            124
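To make the reshape concrete, here is a minimal hand-rolled version in plain Python, written purely for illustration. The function name `melt_records` and the `ExerciseType`/`Measure` keys are hypothetical names, not part of any library. Each person produces one output record per measurement column, and the column heading itself becomes a value:

```python
# Illustrative only: a hand-rolled melt over plain Python records.
wide = [
    {"Name": "Kieran", "BikeKM": 77, "WalkKM": 178},
    {"Name": "Bobby", "BikeKM": 79, "WalkKM": 158},
    {"Name": "Polly", "BikeKM": 45, "WalkKM": 124},
]

def melt_records(rows, id_col, value_cols, var_name, value_name):
    """Emit one long-format record for every (row, value column) pair."""
    long_rows = []
    for row in rows:
        for col in value_cols:
            long_rows.append({
                id_col: row[id_col],   # the identifier is repeated on each record
                var_name: col,         # the old column heading becomes data
                value_name: row[col],  # the cell value moves into a single column
            })
    return long_rows

long_rows = melt_records(wide, "Name", ["BikeKM", "WalkKM"], "ExerciseType", "Measure")
for record in long_rows:
    print(record)
```

The labels come out as `BikeKM` and `WalkKM`; the table above simply shortens them to Bike and Walk.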

Melting dataframes like this makes the data much easier to work with, especially for visualization, where many plotting tools expect one column holding the category and one holding the value. In Pandas, it takes two steps:

  1. First, we define a dataframe from a dictionary.
  2. Then we call the melt method, folding the BikeKM and WalkKM columns into rows, with their headings stored as categorical values in an exercise type column.
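A minimal sketch of those two steps, assuming the figures from the table above. The column names `ExerciseType` and `Measure` are illustrative choices, passed through melt's `var_name` and `value_name` parameters:

```python
import pandas as pd

# Step 1: build the wide dataframe from a dictionary (one key per column).
df = pd.DataFrame({
    "Name": ["Kieran", "Bobby", "Polly"],
    "BikeKM": [77, 79, 45],
    "WalkKM": [178, 158, 124],
})

# Step 2: fold BikeKM and WalkKM into rows. The old headings land in the
# "ExerciseType" column and their values in the "Measure" column.
melted = df.melt(
    id_vars=["Name"],
    value_vars=["BikeKM", "WalkKM"],
    var_name="ExerciseType",
    value_name="Measure",
)

# Optional tidy-up, not part of melt itself: trim the "KM" suffix so the
# labels read "Bike" and "Walk".
melted["ExerciseType"] = melted["ExerciseType"].str.replace("KM", "", regex=False)

print(melted)
```

Melt stacks column by column, so all the Bike rows come before all the Walk rows rather than alternating per person; sort the result if a particular order matters.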

To do the same in Spark, we define a Spark dataframe, import Koalas, and then do exactly what we did in Pandas.

So there we go: a super simple function that can transform the way we work with data.