PySpark Introduction

Spark is one of the most in-demand Big Data processing frameworks right now.

This course will take you through the core concepts of PySpark. It aims to let you do most of the things you’d do in SQL or with Python’s pandas library, namely:

  • Getting hold of data
  • Handling missing data and cleaning data up
  • Aggregating your data
  • Filtering it
  • Pivoting it
  • Writing it back

Together, these skills will enable you to leverage Spark on large datasets and start getting value from your data.

You can download the course dataset here.

Course Content

THE COLAB DEVELOPMENT ENVIRONMENT

DATAFRAME & DATASET INTRODUCTION FOR SCENARIO

SPARK CONFIGURATION

INGESTING & CLEANING OUR SCENARIO DATA

ANSWERING OUR SCENARIO QUESTIONS

CORE CONCEPTS: BRINGING DATA INTO DATAFRAMES

CORE CONCEPTS: INSPECTING DATAFRAMES

HANDLING NULL & DUPLICATE VALUES

CORE CONCEPTS: SELECTING & FILTERING DATA

CORE CONCEPTS: APPLYING MULTIPLE FILTERS

CORE CONCEPTS: RUNNING SQL ON DATAFRAMES

CORE CONCEPTS: ADDING CALCULATED COLUMNS

CORE CONCEPTS: GROUP BY & AGGREGATION

CORE CONCEPTS: WRITING DATAFRAMES TO FILES

To take this course, you must be a member.

Join over 30,000 students & access all the Kodey video content for $10 a year.

What our students think

Rating: 5 out of 5.

I really like the speed of this course. It is absolutely practical and is not going too deep, but I think there is big value in teaching me normal processing with Spark really fast, so I can see the big picture quickly. I’m enjoying it, honestly.

Rating: 5 out of 5.

A good way to start learning PySpark. Very useful, with well-connected topics in chronological order. Well done and a big thank you.

Rating: 5 out of 5.

The explanation and the content are clear and crisp. Just the right amount needed for a beginner. The only thing that’s missing is a short preliminary explanation of PySpark and how it works.

Rating: 4.5 out of 5.

So far, this is a great course, and it is exactly what I need for professional purposes. If this course could say more about how to manage storage drive memory in PySpark, it would be awesome.

Rating: 5 out of 5.

I appreciate how much I could learn in 90 minutes. This course gives you all the tools needed to start getting comfortable with PySpark projects.

Rating: 5 out of 5.

Wonderful overview of PySpark. Completed it in one sitting. If you already have experience with pandas or NumPy, this course will suffice for learning the basics of PySpark.

Rating: 5 out of 5.

This is a very good and handy course for beginners who want an introduction to using PySpark. Great work structuring this course so well.

Rating: 5 out of 5.

Amazing and in-depth walkthrough! Perfect for me, as I am an absolute beginner and now feel like a pro.

Rating: 5 out of 5.

The instructor is very thorough and teaches everything hands-on, using good examples of data.