How to create tonnes of dummy data in MySQL

Are you looking for a guide on how to create dummy data for analysis in MySQL? If so, you’ve come to the right place! In this blog post, we’ll provide an overview of what dummy data is and how it can be used, explain why you should create MySQL dummy data, and provide step-by-step instructions on how to create it. Whether you’re just getting started with database analysis or are an experienced user, you’ll learn how to generate dummy data quickly and efficiently with this guide. Let’s get started creating MySQL dummy data!

Why Generate Dummy Data

Generating dummy data is a useful and necessary process for analysis and development. Dummy data can be used to test software and to perform various types of statistical analysis. It allows developers to simulate real-world conditions, as well as to find potential problems and opportunities for optimization before the application goes live.

Dummy data also helps provide a better understanding of data structures, data types, relationships between objects, and other related matters. With dummy data, developers can explore what works and what doesn’t without any risk to the system or data. Additionally, dummy data allows developers to test how their system performs when subjected to large volumes of data.

In short, generating dummy data is essential in order to properly design and debug applications. Without dummy data, developers would have to rely on real user data which could put the system at risk and make it more difficult to identify areas of improvement. With dummy data, developers can ensure that their applications are working as expected without compromising any real user data.

How to Generate Dummy Data

There are two really good methods to generate mock data in your MySQL tables. The first is in Python, using the Faker library to generate data that is reasonably representative of the data you might see in your live databases. An example is below. I have discussed Faker previously; this expands on that by connecting to the MySQL database too.

from sqlalchemy import create_engine, text
from faker import Faker
from random import randint

fake = Faker('en-GB')

# Source table connection - created once, outside the loop
try:
    engine = create_engine("mysql+pymysql://username:password@localhost/ran", echo=True)
    conn = engine.connect()
except Exception as e:
    print('connect: ' + str(e))
    raise

i = 0
while i <= 1000000000:
    i = i + 1
    customer_name = fake.name()
    address = fake.address()
    place = fake.city()
    favorite_quote = fake.catch_phrase()
    favorite_meal = fake.catch_phrase()
    field = fake.catch_phrase()
    field2 = fake.catch_phrase()
    field3 = randint(1, 30)
    try:
        # Parameterised insert - avoids quoting problems with values like O'Brien
        conn.execute(
            text("INSERT INTO ran.dbmigrate VALUES "
                 "(:i, :name, :addr, :place, :quote, :meal, :f1, :f2, :f3)"),
            {"i": i, "name": customer_name, "addr": address, "place": place,
             "quote": favorite_quote, "meal": favorite_meal,
             "f1": field, "f2": field2, "f3": field3},
        )
    except Exception as e:
        print('insert: ' + str(e))
conn.commit()
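Inserting one row per round trip is slow at volume; batching rows and inserting them with executemany is much faster. Here is a minimal sketch of the batching pattern, using Python's built-in SQLite in-memory database as a stand-in so it runs anywhere - swap the connection and placeholder style for your MySQL engine:

```python
import sqlite3
from random import randint

# Stand-in for the MySQL connection; swap for your own engine/connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dbmigrate (i INTEGER, field3 INTEGER)")

# Build a batch of rows in memory, then insert them in one call.
batch = [(i, randint(1, 30)) for i in range(10000)]
conn.executemany("INSERT INTO dbmigrate VALUES (?, ?)", batch)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM dbmigrate").fetchone()[0]
print(count)  # 10000
```

The same idea applies with Faker-generated rows: accumulate a few thousand rows in a list, then insert the whole list at once.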

The other option is to create a stored procedure, which I’ve discussed previously. In the example below, we create a table called mybigtablex with a number of fields.

Within the procedure, we run a while loop (while i < 1000) and insert a row of random data on each pass. This is less flexible, as you can’t generate really representative data like you can with Faker, but it’s a great option when you just need lots of rows!

DROP PROCEDURE IF EXISTS make_lots_data;

CREATE TABLE `mybigtablex` (
  `id`                bigint(20) NOT NULL      AUTO_INCREMENT,
  `order_date`        timestamp  NULL          DEFAULT CURRENT_TIMESTAMP,
  `sales_category`    int(11)                  DEFAULT NULL,
  `store_id`          varchar(40)              DEFAULT NULL,
  `order_value`       float                    DEFAULT NULL,
  PRIMARY KEY (`id`)
);

DELIMITER //
CREATE PROCEDURE make_lots_data()
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < 1000 DO
    INSERT INTO `mybigtablex` (`order_date`, `sales_category`, `store_id`, `order_value`) VALUES (
      -- a random timestamp within a year of 2022-01-01 (31536000 seconds = 365 days)
      FROM_UNIXTIME(UNIX_TIMESTAMP('2022-01-01 01:00:00') + FLOOR(RAND() * 31536000)),
      -- example random values for the remaining fields; adjust to suit your schema
      FLOOR(RAND() * 10),
      CONCAT('store_', FLOOR(RAND() * 100)),
      ROUND(RAND() * 500, 2)
    );
    SET i = i + 1;
  END WHILE;
END //
DELIMITER ;

CALL make_lots_data();
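The random order_date trick above - a fixed epoch plus up to a year of random seconds - is worth unpacking, since it’s handy whenever you need plausible timestamps. The same logic expressed in Python terms looks like this:

```python
from datetime import datetime, timedelta
from random import random

base = datetime(2022, 1, 1, 1, 0, 0)           # UNIX_TIMESTAMP('2022-01-01 01:00:00')
offset = int(random() * 31536000)              # FLOOR(RAND() * 31536000): up to 365 days in seconds
order_date = base + timedelta(seconds=offset)  # FROM_UNIXTIME(...) equivalent
print(order_date)
```

Changing the base date or the multiplier lets you spread timestamps over whatever window your test data needs.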

Either option will give you plenty of data to play with when testing your MySQL queries and functions.
