Are you looking for a guide on how to create dummy data for analysis in MySQL? If so, you’ve come to the right place! In this blog post, we’ll provide an overview of what dummy data is and how it can be used, explain why you should create MySQL dummy data, and provide step-by-step instructions on how to create it. Whether you’re just getting started with database analysis or are an experienced user, you’ll learn how to generate dummy data quickly and efficiently with this guide. Let’s get started creating MySQL dummy data!
Why Generate Dummy Data
Generating dummy data is a useful and necessary process for analysis and development. Dummy data can be used to test software and to perform various types of statistical analysis. It allows developers to simulate real-world conditions, as well as to find potential problems and opportunities for optimization before the application goes live.
Dummy data also helps in providing better understanding of data structures, data types, relationships between objects, and other related matters. With dummy data, developers can explore what works and what doesn’t without any risk to the system or data. Additionally, dummy data allows developers to test the efficiency of their system’s performance when subjected to large volumes of data.
In short, generating dummy data is essential in order to properly design and debug applications. Without dummy data, developers would have to rely on real user data which could put the system at risk and make it more difficult to identify areas of improvement. With dummy data, developers can ensure that their applications are working as expected without compromising any real user data.
How to Generate Dummy Data
There are two really good methods to generate mock data in your MYSQL tables. The first, is in Python; using the Faker library to generate data which is relatively representative of data you might use in your live databases. An example of doing this is below. I have discussed Faker previously, here, this just expands on that by connecting to the MySQL database too.
from sqlalchemy import create_engine
from faker import Faker
import mysql.connector
import pandas as pd
from faker.providers import internet, address, automotive, bank, barcode, company, credit_card, currency, date_time, file, geo, job, misc, person, phone_number, user_agent
fake = Faker('en-GB')
from random import randint
i=0
while i <= 1000000000:
i = i+1
y = str(i)
customer_name = fake.name()
address = fake.bs()
place = fake.city()
favorite_quote = fake.catch_phrase()
favorite_meal = fake.catch_phrase()
field = fake.catch_phrase()
field2 = fake.catch_phrase()
field3 = randint(1, 30)
field3 = str(field3)
#####SOURCE
try:
#source table conn
engine = create_engine("mysql+pymysql://username:password@localhost/ran",echo = True)
iconn = engine.connect()
except Exception as e:
print('extract: ' + str(e))
try:
query = "INSERT INTO ran.dbmigrate SELECT " + y + " as i, '" + customer_name + "' , '" + address + "' , '" + place + "' , '" + favorite_quote + "' , '" + favorite_meal + "' , '" + field + "' , '" + field2 + "' ,'" + field3 + "'"
data = pd.read_sql(query, iconn)
except Exception as e:
print('extract: ' + str(e))
The other option is to create a stored procedure. I’ve discussed them here. In the below; we create a table called mybigtablex, with a number of fields.
Within the procedure, we run a while loop – while i < 1000, then insert some data. This is less flexible as you can’t generate really representative data like you can with Faker but it’s a great option when you just need lots of data!
DROP PROCEDURE make_lots_data;
CREATE TABLE `mybigtablex`
(
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`order_date` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`sales_category` int(11) DEFAULT NULL,
`store_id` varchar(40) DEFAULT NULL,
`order_value` float DEFAULT NULL,
PRIMARY KEY (`id`)
);
DELIMITER $$
CREATE PROCEDURE make_lots_data()
BEGIN
DECLARE i INT DEFAULT 0;
WHILE i < 1000 DO
INSERT INTO `mybigtablex` (`order_date`,`sales_category`,`store_id`, `order_value`) VALUES (
FROM_UNIXTIME(UNIX_TIMESTAMP('2022-01-01 01:00:00')+FLOOR(RAND()*31536000)),
ROUND(RAND()*10,0),
1,
ROUND(RAND()*100,2)
);
SET i = i + 1;
END WHILE;
END$$
DELIMITER ;
CALL make_lots_data();
Either option will give you lots of data to play with to test your MySQL functions.