Another Pandas GUI: Pandas Profiling

Another GUI for Pandas! Yes, they’re coming out of the woodwork now. But this one is a little different from the Pandas GUI library I discussed previously. Pandas GUI lets us slice and dice our data, restructure it and visualise it. Pandas Profiling isn’t as feature-rich, but it provides a different way of looking at our dataset.

Out of the box, you get a nice visual summary of every input column. It reports a number of useful statistics for each column, including the mean, minimum and maximum, and it also shows a visualisation of the data distribution, which is very useful to know.
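To give a feel for the kind of per-column numbers involved, here is a minimal sketch in plain pandas. The sample values are made up for illustration, and `column_summary` is a hypothetical helper of my own, not part of Pandas Profiling.

```python
import pandas as pd

# Hypothetical sample values standing in for the diabetes dataset
df = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137],
    "Insulin": [0, 0, 0, 94, 168],
})


def column_summary(frame):
    """Return mean, min, max and missing-value count for each column."""
    # agg() puts the statistics in rows, so transpose to get one row per column
    summary = frame.agg(["mean", "min", "max"]).T
    summary["missing"] = frame.isna().sum()
    return summary


summary = column_summary(df)
print(summary)
```

The report goes further than this by adding a histogram per column, but the underlying figures are the same kind of thing.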

Next, we have a really nice tool that lets us compare two features. In the example below, I have selected Glucose for the X axis and Insulin for the Y axis. We can now look at the relationship between these two features. This is nowhere near as thorough or flexible as the Pandas GUI charting options, but it does give us a very quick way to check the relationship between features.
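If you want a single number to go with that picture, a rough equivalent in plain pandas is the Pearson correlation between the two columns. This is only a sketch, and the values are hypothetical rather than taken from the real dataset.

```python
import pandas as pd

# Hypothetical sample values; the real data would come from the CSV file
df = pd.DataFrame({
    "Glucose": [85, 89, 116, 137, 148, 183],
    "Insulin": [40, 94, 110, 168, 175, 230],
})

# Pearson correlation between the two selected features
pearson_r = df["Glucose"].corr(df["Insulin"])

print(f"Glucose vs Insulin (Pearson): {pearson_r:.3f}")
```

A value close to 1 or -1 suggests a strong linear relationship, and a value near 0 suggests little linear relationship at all.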

The tool also produces a correlation heatmap, like the one below. This is a feature that Pandas GUI does not possess, so the best approach to data exploration is to combine a number of tools.
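The numbers behind a heatmap like this are just a correlation matrix, which pandas can compute directly. Below is a small sketch with made-up values. `strongest_pair` is a hypothetical helper I have added to pull out the most strongly correlated pair of columns, and it is not something the library provides.

```python
import pandas as pd

# Hypothetical sample values
df = pd.DataFrame({
    "Glucose": [85, 89, 116, 137, 148, 183],
    "Insulin": [40, 94, 110, 168, 175, 230],
    "Age": [31, 21, 30, 33, 50, 32],
})

# Pairwise Pearson correlations between all numeric columns
corr_matrix = df.corr()


def strongest_pair(corr):
    """Return (column_a, column_b, abs_correlation) for the strongest pair."""
    best = None
    columns = list(corr.columns)
    for i, first in enumerate(columns):
        # Only look above the diagonal so each pair is checked once
        for second in columns[i + 1:]:
            value = abs(corr.loc[first, second])
            if best is None or value > best[2]:
                best = (first, second, value)
    return best


print(corr_matrix.round(2))
print(strongest_pair(corr_matrix))
```

The matrix is symmetric with ones on the diagonal, which is why a heatmap of it always has that bright stripe from corner to corner.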

The final noteworthy piece of information that Pandas Profiling can give you is a list of warnings about your data, like the ones below.
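To illustrate the sort of checks that sit behind warnings like these, here is a simplified sketch in plain pandas. It is my own stand-in, not the library's logic: the `simple_warnings` function, its `zero_threshold` parameter and the sample data are all assumptions made for the example.

```python
import pandas as pd


def simple_warnings(frame, zero_threshold=0.3):
    """Return a list of plain-text warnings about a DataFrame.

    A simplified, hypothetical stand-in for the report's warnings.
    """
    messages = []
    for name in frame.columns:
        column = frame[name]
        # Flag columns with missing values
        missing = int(column.isna().sum())
        if missing:
            messages.append(f"{name} has {missing} missing value(s)")
        # Flag columns that only ever hold one distinct value
        if column.nunique(dropna=True) <= 1:
            messages.append(f"{name} is constant")
        # Flag numeric columns where a large share of values are zero
        if pd.api.types.is_numeric_dtype(column):
            zero_share = float((column == 0).mean())
            if zero_share >= zero_threshold:
                messages.append(f"{name} has {zero_share:.0%} zeros")
    # Flag fully duplicated rows
    duplicates = int(frame.duplicated().sum())
    if duplicates:
        messages.append(f"Dataset has {duplicates} duplicate row(s)")
    return messages


# Hypothetical sample values chosen to trigger a few of the checks
df = pd.DataFrame({
    "Glucose": [148, 85, None, 89, 137],
    "Insulin": [0, 0, 0, 94, 168],
    "Source": ["clinic"] * 5,
})

for message in simple_warnings(df):
    print(message)
```

The zero check matters for data like this, where a zero in a column such as Insulin may really be a missing measurement rather than a true reading.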

As with Pandas GUI in my previous article, this tool will not solve all of your data exploration needs, but it will let you conduct a lot of analysis in a very short period of time.

Code:

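# pip install -U pandas-profiling
# (newer releases are published as ydata-profiling; import from ydata_profiling instead)
import pandas as pd
from pandas_profiling import ProfileReport

# Load the diabetes dataset
df = pd.read_csv('/home/Datasets/diabetes2.csv')

# Build the report; explorative=True switches on the more detailed analyses
profile = ProfileReport(df, title='Diabetes Data Exploration', explorative=True)

# In a Jupyter notebook, this last line renders the report inline
profile
# Outside a notebook, save it to HTML instead, e.g. profile.to_file('diabetes_report.html')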
Kodey