Pandas is the de facto library for data analysis in Python, and for good reason: it’s extremely flexible, reasonably performant and easy to learn. That said, much of what we do as data scientists and engineers is exploration, trial and error.
This exploratory process can be tedious. We create a dataframe, apply some filters and visualize the data with Matplotlib or Seaborn, over and over, until we know enough about the dataset we’re working with.
I recently stumbled upon PandasGUI, a graphical interface for Pandas that makes all of this a little quicker and less tedious: we can slice and dice our data as we please without writing a single line of code.
To install it, simply run ‘pip3 install pandasgui’. If you get an error similar to the one below, first run ‘pip3 install --upgrade pip’ and then retry the install; this should fix the issue.
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-dzdmy8xr/PyQt5/
Now, let’s get into it. Below, we create a dataframe just as we normally would with Pandas, but instead of calling df.head() or something similar, we call show(df).
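```python
import pandas as pd
from pandasgui import show

# Load the dataset as usual
df = pd.read_csv('/home/Datasets/diabetes2.csv')

# Open the dataframe in the PandasGUI window instead of printing df.head()
show(df)
```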
This opens a new window, like the one below, that displays our dataframe. I hear you: there’s nothing new here, and it looks much like displaying a dataframe in a Jupyter notebook. But keep reading, because this gets better… much better.
What if we then wanted to filter this dataset to show only the records where pregnancies equals 3? We simply head over to the filter tab and type ‘pregnancies == 3’, and that’s it: we’ve filtered the dataframe.
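For comparison, here is roughly what that filter looks like in plain Pandas. The expression typed into the filter tab reads like a DataFrame.query string, so this sketch assumes the two behave the same way; the tiny dataframe and its lowercase column names are made up for illustration rather than taken from diabetes2.csv.

```python
import pandas as pd

# Hypothetical stand-in for the diabetes dataset; column names are assumptions
df = pd.DataFrame({
    "pregnancies": [3, 1, 3, 0, 5],
    "glucose": [148, 85, 183, 89, 116],
})

# The same expression typed into the filter tab, evaluated with DataFrame.query
filtered = df.query("pregnancies == 3")

# The equivalent boolean-mask form, for readers who prefer explicit indexing
mask_filtered = df[df["pregnancies"] == 3]

print(filtered)
```

Both forms keep the original row index, so the filtered rows can still be traced back to the full dataframe.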
Next, we can inspect the dataframe to extract some key statistics for each field, such as the min, max and standard deviation. To do that, we just head over to the statistics tab, which shows the view below.
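The statistics tab covers much of the same ground as calling describe() on the dataframe. A minimal sketch on a made-up two-column frame (hypothetical data, not the real dataset):

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({
    "pregnancies": [3, 1, 3, 0, 5],
    "glucose": [148, 85, 183, 89, 116],
})

# describe() reports count, mean, std, min, quartiles and max per numeric column
stats = df.describe()

# Pick out the rows mentioned in the text
print(stats.loc[["min", "max", "std"]])
```

Note that describe() uses the sample standard deviation (dividing by n - 1), which is the Pandas default for std().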
Now for the really valuable bit. The grapher tab lets us try out different chart types and configurations to explore our data and understand what we are working with. It’s as simple as dragging the field we’re interested in from the left pane to the right and clicking the ‘finish’ button.
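Outside the GUI, we would normally hand a column to a plotting library. To keep this sketch free of any plotting dependency, it computes only the bin counts that a histogram of one field would draw, using numpy.histogram; the glucose values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical glucose readings for illustration only
df = pd.DataFrame({
    "glucose": [148, 85, 183, 89, 116, 78, 115, 197],
})

# Split the column into 4 equal-width bins: these counts are the bar heights
counts, edges = np.histogram(df["glucose"], bins=4)

for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f"{lo:6.1f} - {hi:6.1f}: {n}")
```

The last bin includes its right edge, so the maximum value (197) is counted rather than dropped.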
The final feature, which is pretty cool, is the reshaper. If you need to melt or pivot your dataframes, you can do it in this tab. It lets us trial a few transformations and make sure we have a good handle on the dataset before we feed the data into a model.
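In code, melt and pivot correspond to DataFrame.melt and DataFrame.pivot. The sketch below round-trips a hypothetical wide table to long format and back; the patient_id, glucose and bmi columns are illustrative assumptions, and this is not necessarily what PandasGUI runs internally.

```python
import pandas as pd

# Hypothetical wide table: one row per patient, one column per measurement
wide = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "glucose": [148, 85, 183],
    "bmi": [33.6, 26.6, 23.3],
})

# melt: wide -> long, one row per (patient, measurement) pair
long = wide.melt(id_vars="patient_id", var_name="measure", value_name="value")

# pivot: long -> wide again, one column per distinct measure
back = long.pivot(index="patient_id", columns="measure", values="value")

print(long)
print(back)
```

Because the melted value column mixes integers and floats, Pandas stores it as float64, so glucose comes back as 85.0 rather than 85 after the pivot.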
Personally, I think this tool is really useful. It may not be a game changer, but it certainly streamlines the data exploration phase of a project.