Python

Wild Wednesday: handling semi-structured JSON data

Wild Wednesday posts are all about taming semi-structured or unstructured data. Today, we’re going to look at ingesting JSON data generated by YARN via its API, loading it into a dataframe, and then writing that information to a Hive table. JSON data can pose problems because it has a flexible schema (i.e. not […]
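As a rough illustration of the flexible-schema problem, here is a minimal sketch that uses only the standard library. It assumes a payload shaped like the YARN ResourceManager `/ws/v1/cluster/apps` response (`{"apps": {"app": [...]}}`); the sample values, the nested `resources` key, and the helper names `flatten` and `to_rows` are hypothetical and not taken from the post. It flattens each record and pads missing fields so every row has the same columns.

```python
import json

# Hypothetical sample shaped like a YARN ResourceManager
# /ws/v1/cluster/apps response; all values are invented for illustration.
SAMPLE = """
{"apps": {"app": [
  {"id": "application_1_0001", "user": "alice", "state": "FINISHED",
   "resources": {"memoryMB": 2048}},
  {"id": "application_1_0002", "user": "bob", "queue": "default"}
]}}
"""


def flatten(record, parent=""):
    """Flatten nested dicts into a single level, joining keys with '_'."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}_{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def to_rows(payload):
    """Return (columns, rows) where every row has every column."""
    # "apps" may be null when no applications match, so fall back to {}.
    apps = (payload.get("apps") or {}).get("app", [])
    flat = [flatten(app) for app in apps]
    # The union of all keys gives a fixed schema; absent fields become None.
    columns = sorted({key for row in flat for key in row})
    rows = [{col: row.get(col) for col in columns} for row in flat]
    return columns, rows


columns, rows = to_rows(json.loads(SAMPLE))
```

Rows in this uniform shape could then be handed to a dataframe library, for example `pandas.DataFrame(rows, columns=columns)`. Fetching the JSON from a live cluster and writing to Hive are left out here, since both depend on the environment.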
