How to create tonnes of dummy data in MySQL

Are you looking for a guide on how to create dummy data for analysis in MySQL? If so, you’ve come to the right place! In this blog post, we’ll provide an overview of what dummy data is and how it can be used, explain why you should create MySQL dummy data, and provide step-by-step instructions on how to create it. Whether you’re just getting started with database analysis or are an experienced user, you’ll learn how to generate dummy data quickly and efficiently with this guide. Let’s get started creating MySQL dummy data!

Why Generate Dummy Data

Generating dummy data is a useful and necessary process for analysis and development. Dummy data can be used to test software and to perform various types of statistical analysis. It allows developers to simulate real-world conditions, as well as to find potential problems and opportunities for optimization before the application goes live.

Dummy data also helps provide a better understanding of data structures, data types, relationships between objects, and other related matters. With dummy data, developers can explore what works and what doesn’t without any risk to the system or data. Additionally, dummy data allows developers to test how their system performs when subjected to large volumes of data.

In short, generating dummy data is essential in order to properly design and debug applications. Without dummy data, developers would have to rely on real user data which could put the system at risk and make it more difficult to identify areas of improvement. With dummy data, developers can ensure that their applications are working as expected without compromising any real user data.

How to Generate Dummy Data

There are two really good methods to generate mock data in your MySQL tables. The first is in Python, using the Faker library to generate data that is reasonably representative of the data you might see in your live databases. An example is below. I have discussed Faker previously; this expands on that by connecting to the MySQL database too.

from sqlalchemy import create_engine, text
from faker import Faker
from random import randint

fake = Faker('en-GB')

# Source table connection - created once, outside the loop
try:
    engine = create_engine("mysql+pymysql://username:password@localhost/ran", echo=True)
    conn = engine.connect()
except Exception as e:
    print('connect: ' + str(e))
    raise

i = 0
while i <= 1000000000:
    i = i + 1
    customer_name = fake.name()
    address = fake.address()
    place = fake.city()
    favorite_quote = fake.catch_phrase()
    favorite_meal = fake.catch_phrase()
    field = fake.catch_phrase()
    field2 = fake.catch_phrase()
    field3 = randint(1, 30)
    try:
        # Parameterised insert - avoids quoting problems with values like O'Brien
        conn.execute(
            text("INSERT INTO ran.dbmigrate VALUES "
                 "(:i, :name, :addr, :place, :quote, :meal, :f1, :f2, :f3)"),
            {"i": i, "name": customer_name, "addr": address, "place": place,
             "quote": favorite_quote, "meal": favorite_meal,
             "f1": field, "f2": field2, "f3": field3},
        )
    except Exception as e:
        print('insert: ' + str(e))
conn.commit()
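Inserting one row per round trip is slow at volume; batching rows and inserting them with executemany is much faster. Here is a minimal sketch of the batching pattern, using Python's built-in SQLite in-memory database as a stand-in so it runs anywhere - swap the connection and placeholder style for your MySQL engine:

```python
import sqlite3
from random import randint

# Stand-in for the MySQL connection; swap for your own engine/connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dbmigrate (i INTEGER, field3 INTEGER)")

# Build a batch of rows in memory, then insert them in one call.
batch = [(i, randint(1, 30)) for i in range(10000)]
conn.executemany("INSERT INTO dbmigrate VALUES (?, ?)", batch)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM dbmigrate").fetchone()[0]
print(count)  # 10000
```

The same idea applies with Faker-generated rows: accumulate a few thousand rows in a list, then insert the whole list at once.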

The other option is to create a stored procedure, which I’ve discussed previously. In the example below, we create a table called mybigtablex with a number of fields.

Within the procedure, we run a while loop (while i < 1000) and insert a row of random data on each pass. This is less flexible, as you can’t generate really representative data like you can with Faker, but it’s a great option when you just need lots of rows!

DROP PROCEDURE IF EXISTS make_lots_data;

CREATE TABLE `mybigtablex` (
  `id`                bigint(20) NOT NULL      AUTO_INCREMENT,
  `order_date`        timestamp  NULL          DEFAULT CURRENT_TIMESTAMP,
  `sales_category`    int(11)                  DEFAULT NULL,
  `store_id`          varchar(40)              DEFAULT NULL,
  `order_value`       float                    DEFAULT NULL,
  PRIMARY KEY (`id`)
);

DELIMITER //
CREATE PROCEDURE make_lots_data()
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < 1000 DO
    INSERT INTO `mybigtablex` (`order_date`, `sales_category`, `store_id`, `order_value`) VALUES (
      -- a random timestamp within a year of 2022-01-01 (31536000 seconds = 365 days)
      FROM_UNIXTIME(UNIX_TIMESTAMP('2022-01-01 01:00:00') + FLOOR(RAND() * 31536000)),
      -- example random values for the remaining fields; adjust to suit your schema
      FLOOR(RAND() * 10),
      CONCAT('store_', FLOOR(RAND() * 100)),
      ROUND(RAND() * 500, 2)
    );
    SET i = i + 1;
  END WHILE;
END //
DELIMITER ;

CALL make_lots_data();
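The random order_date trick above - a fixed epoch plus up to a year of random seconds - is worth unpacking, since it’s handy whenever you need plausible timestamps. The same logic expressed in Python terms looks like this:

```python
from datetime import datetime, timedelta
from random import random

base = datetime(2022, 1, 1, 1, 0, 0)           # UNIX_TIMESTAMP('2022-01-01 01:00:00')
offset = int(random() * 31536000)              # FLOOR(RAND() * 31536000): up to 365 days in seconds
order_date = base + timedelta(seconds=offset)  # FROM_UNIXTIME(...) equivalent
print(order_date)
```

Changing the base date or the multiplier lets you spread timestamps over whatever window your test data needs.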

Either option will give you plenty of data to play with when testing your MySQL queries and functions.
