Spark

Starter Sundays: PySpark Basics, Selecting & Filtering Data

Welcome to this week's Starter Sunday, which is where I cover off some basic concepts of Spark, Python or Hive. Today, we're looking at selecting and filtering data from our dataframes in Spark, specifically PySpark.

Select specific columns

Below, we have a dataframe called df_select which will take just two columns from the dataframe […]
