Partitioning Overview in MYSQL

Partitioning Overview in MYSQL

I’ve recently started working with MySQL at a little bit of a larger scale and it’s surprising me. I come from a Hadoop & Apache Hive background and I had assumed that the principles from Apache Hive would carry over to MySQL – they’re both SQL-Like engines, afterall.

But there are some significant differences around syntax, specifically related to partitioning. In Apache Hive, you would say something like the below to insert data into a specific partition. In MYSQL; it kind of sorts itself out, you don’t need to be so explicit.

INSERT OVERWRITE TABLE DB.TABLENAME PARTITION(date='20220601')
SELECT ..... 

So let’s look at how things work in MYSQL. First of all, let’s create a partitioned table. In this case, I will create a range partition which buckets dates into yearly partitions.

CREATE TABLE ran.order_data (
    username VARCHAR(20) NOT NULL,
    order_date DATETIME NOT NULL,
    PRIMARY KEY(username, order_date)
)
PARTITION BY RANGE( YEAR(order_date) )(
    PARTITION p2015 VALUES LESS THAN (2016),
    PARTITION p2016 VALUES LESS THAN (2017),
    PARTITION p2017 VALUES LESS THAN (2018))

Now, let’s insert a little bit of data. You will notice that the first record inserts nicely, but I get an error for record 2.

insert into ran.order_data
select 'bob 2', '2015/06/06';

insert into ran.order_data
select 'bq3ob', '2019/06/06';

The error is ‘Table has no partition value for 2019‘. This is because the record does not meet any of our partition criteria and hence MySQL doesn’t know what to do with it.

We can get around this by adding a partition. We can now run the same insert statement without error, as there is now a partition in place to house the new data.

ALTER TABLE ran.order_data
ADD PARTITION (
PARTITION p2019 VALUES LESS THAN (2020))

We can run a quick ‘show create table’ to check that this has worked nicely, as you can see, it has through the updated create statement.

However, if you now run a count of records on each partition, you will see that it shows 0, even though we have one record there. So, we can run ‘analyze’ to update the information_schema data again.

Analyze collects stats about your table. You should run the analyze command whenever you expect your overall data distribution to change.

ANALYZE TABLE ran.order_data

I’ve now dumped a few more rows of data into the table. We can check how many rows reside in each partition:

SELECT `PARTITION_DESCRIPTION`, TABLE_ROWS, PARTITION_METHOD
       FROM information_schema.PARTITIONS 
       WHERE TABLE_SCHEMA = 'ran' AND TABLE_NAME = 'order_data'

We can add data from an unpartitioned table into our new table to force partitioning too!

insert into ran.order_data
select * from ran.unpartitioned_table

Now, what would happen if we wanted to partition by both date and month? In the below, we can extract them from the field as YYYYMM. We can then run the same range operations to drop into a year/month partition.

CREATE TABLE ran.multi_field (
    username VARCHAR(20) NOT NULL,
    order_date DATETIME NOT NULL,
    PRIMARY KEY(username, order_date)
)
PARTITION BY RANGE( extract(year_month from `order_date`) )(
    PARTITION p2021 VALUES LESS THAN (202201),
    PARTITION p202201 VALUES LESS THAN (202202),
    PARTITION p202202 VALUES LESS THAN (202203))

We can do by year and week too:

CREATE TABLE ran.withweek(
    username VARCHAR(20) NOT NULL,
    order_date DATETIME NOT NULL,
    PRIMARY KEY(username, order_date)
)
PARTITION BY RANGE( YEARWEEK(order_date) )(
    PARTITION p2021 VALUES LESS THAN (202201),
    PARTITION p202201 VALUES LESS THAN (202202),
    PARTITION p202202 VALUES LESS THAN (202203))

If we insert some data, let’s see what partition it drops into:

INSERT INTO ran.withweek
SELECT 'fff', '2022/01/09'

We can check, using the below statement:

SELECT `PARTITION_DESCRIPTION`, TABLE_ROWS, PARTITION_METHOD, PARTITION_NAME
       FROM information_schema.PARTITIONS 
       WHERE TABLE_SCHEMA = 'ran' AND TABLE_NAME = 'withweek'

This has dropped into week 2, as you can see below.

Upon further testing, it seems to work nicely. It’s worth noting that you can define a ‘mode’ for the week method. This defines the day your week begins, which may be important for some types of analysis.

Total
0
Shares
Previous Article
Approaches to text classification

NLP Crash Course (Part 2): Approaches to text classification

Next Article
MySQL Querying A Partitioned Table

MySQL Querying A Partitioned Table

Related Posts