Data Badass: Free Book for Data Leaders

My new book, ‘Data Badass’, is available to view using this link. It has not yet been thoroughly edited and will be added to over time, but I am keen to gather feedback. The book aims to give aspiring data leaders a solid understanding of the major data concepts.

  • Introduction
  • Data Basics
    • An intro to data
    • Data roles & responsibilities
    • Data Lifecycle
    • Summary
  • Data Structures:
    • Data Types
    • Arrays
    • Vectors
    • Matrices
    • Hash tables / hash maps / dictionaries
    • Queues
    • Summary
  • Types of data:
    • Structured
    • Unstructured
    • Semi-Structured
  • Data Modeling:
    • Conceptual data models
    • Logical data models
    • Physical data models
    • Summary
  • Hadoop:
    • Components of a data platform
    • Ingestion tools
      • Sqoop
      • Kafka
      • Flume
      • NiFi
      • Apache Spark
    • Storage:
      • HDFS
      • HBase
      • Hive
  • Statistics:
    • Data types
    • Measures of central tendency
    • Measures of variability
    • Point estimates & confidence intervals
    • Percentiles
    • Skewness
    • Distributions
    • Central Limit Theorem
    • Standard error
    • Measures of relationship
    • Probability
    • Hypothesis Testing
  • Machine learning intro:
    • Introduction to terminology
    • Machine learning introduction
    • Stages of a machine learning project
    • Types of machine learning
  • Machine learning data preparation:
    • Data exploration
    • Data cleaning
    • Duplicate data
    • Dealing with dates
    • Structural issues
    • Feature engineering
    • Dimensionality reduction
    • Split data
    • Data scaling
  • Machine learning models:
    • Linear regression
    • Support vector machines
    • K-Nearest Neighbours
    • Naive Bayes
    • Association rules mining
    • KMeans clustering
    • Random forests
    • Isolation forest
  • Machine learning model accuracy:
    • Measure model accuracy
    • Classification accuracy
    • Confusion matrix
    • ROC Curves & AUC
    • Mean absolute error
    • Root mean square error
    • Cross validation
    • Tuning our models
    • Handling class imbalance
    • Data leakage
    • Hyperparameter tuning
    • Section summary
  • Time series forecasting:
    • Basics of time series analysis
    • Time series terminology
    • Seasonal decomposition
    • ARIMA in depth
  • Deep learning:
    • Introduction
    • Terminology
    • Multi-layer perceptron deep dive
    • More deep learning terminology
  • Roundup