基于卷积神经网络的图片验证码识别

📌

  • 基于卷积神经网络训练模型识别图片验证码,0~9数字和26个字母大写和小写组合识别
  • PIL模块生成验证码训练集和测试集,添加干扰点和干扰线来模拟实际复杂环境,os保存以及读取,通过去噪以及灰度化处理提高识别率,通过卷积神经网络池化后全连接层处理,识别成功率达到约96%
    💨🕥😴

生成训练集和测试集

设置数字和字母

  • random_num随机产生数字,random_upper随机产生大写字母,random_lower随机产生小写字母
  • random.choice()随机选择三种情况
1
2
3
4
5
6
7
8
9
10
def getRandomChar():
"""
chr(i) return the Unicode of i
:return: None
"""
random_num = str(random.randint(0, 9)) # 0~9
random_upper = chr(random.randint(65, 90)) # A~Z
random_lower = chr(random.randint(97, 122)) # a~z
random_char = random.choice([random_num, random_upper, random_lower])
return random_char

设置随机颜色

  • is_light是颜色深浅的标签
  • 总的来说RGB值>127的颜色较深,True代表浅色,False代表深色
1
2
3
4
5
6
7
8
9
10
def getRandomColor(is_light=True):
"""
Generate colors, default light color
:param is_light: Distinguish between light and dark colors
:return: r, g, b
"""
r = random.randint(0, 127) + int(is_light) * 128
g = random.randint(0, 127) + int(is_light) * 128
b = random.randint(0, 127) + int(is_light) * 128
return r, g, b

设置干扰

  • 设置干扰线,随机产生起点和终点的坐标,is_light设置为True,干扰线浅色
  • 设置干扰点,随机产生干扰点的坐标,is_light设置为True,干扰点浅色
  • numLinenunmPoint自由设置,干扰线和干扰点数量越大,后期识别成功率越低
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
def drawLine(draw):
"""
Draw interference lines, numLine feels free
x1,x2,y1,y2: The begin point and the end point of the line
:param draw:
:return: None
"""
numLine = 5
for i in range(numLine):
x1 = random.randint(0, width)
x2 = random.randint(0, width)
y1 = random.randint(0, height)
y2 = random.randint(0, height)
draw.line((x1, y1, x2, y2), fill=getRandomColor(is_light=True))


def drawPoint(draw):
"""
Draw interference points, numPoint feels free
:param draw:
:return: None
"""
numPoint = 50
for i in range(numPoint):
x = random.randint(0, width)
y = random.randint(0, height)
draw.point((x, y), fill=getRandomColor(is_light=True))

保存验证码

  • 使用PILImageImageDrawImageFont的模块(由于PIL不再适配高版本Python,安装需要用pip install pillow
  • 保存验证码label.png,直接打上标签,不用之手手动打标签
  • 背景采用浅色,numOfNum=4验证码长度为4
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
def createImg(folder):
"""
Use drawPoint(), drawLine(), getRandomColor(), getRandomChar() to generate an Img by random
:param folder: train or test folder
:return: None
"""
# Generate background color by random
bg_color = getRandomColor(is_light=True)
# Generate a Img
img = Image.new(mode="RGB", size=(width, height), color=bg_color)
# Get picture brush
draw = ImageDraw.Draw(img)
# Set the font
font = ImageFont.truetype(font="C:/Users/lingz/AppData/Local/Microsoft/Windows/Fonts/Monaco.ttf", size=16)
# Set the Img name
file_name = ''

# Generate number or char, set the numOfNum to change the number of it
numOfNum = 4
for i in range(numOfNum):
random_txt = getRandomChar()
# Default: the bk_color is light, the txt_color is dark
txt_color = getRandomColor(is_light=False)

# Avoid the text color == the background color, re-generate txt_color
while txt_color == bg_color:
txt_color = getRandomColor(is_light=False)
# Fill the txt, the first para is the coordinates of the upper left corner
# If numOfNum = 4, then (15,0),(30,0),(45,0),(60,0), four nums of chars
draw.text((15 + 15 * i, 0), text=random_txt, fill=txt_color, font=font)
file_name += random_txt

# Draw interference lines and points
drawLine(draw)
drawPoint(draw)

# Save the Img
with open("./{}/{}.png".format(folder, file_name), "wb") as train:
img.save(train, format="png")

Fig.1 产生数据集

Fig.1 产生数据集

自动删除文件

  • 用于每次生成数据时自动清空原有数据
1
2
3
4
5
6
def creat_del_file(path_data):
os.path.exists(path_data) or os.makedirs(path_data)
for i in os.listdir(path_data):
file_data = path_data + "\\" + i
os.remove(file_data)
print("Original %s set is deleted." % path_data)

卷积神经网络训练和识别

编码准备工作

  • model_dir存放训练后的模型
  • char_setalphabet存放字符集用于数据读取
  • char_to_int用于获取字典中的下标,用于之后的one_hot编码
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
model_dir = r'model.h5'
char_set = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'] + ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k',
'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v',
'w', 'x', 'y', 'z'] + ['A', 'B', 'C', 'D', 'E', 'F',
'G', 'H', 'I', 'J', 'K', 'L',
'M', 'N', 'O', 'P', 'Q', 'R',
'S', 'T', 'U', 'V', 'W', 'X',
'Y', 'Z']
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k',
'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v',
'w', 'x', 'y', 'z'] + ['A', 'B', 'C', 'D', 'E', 'F',
'G', 'H', 'I', 'J', 'K', 'L',
'M', 'N', 'O', 'P', 'Q', 'R',
'S', 'T', 'U', 'V', 'W', 'X',
'Y', 'Z']
char_to_int = dict((c, i) for i, c in enumerate(char_set))

读取图片处理数据

  • x_data数据,y_data标签
  • 循环导入验证码,经过deNoising()降噪和灰度化处理,转化成np后存入x_data,并且使用按照.切割以获取验证码的标签值存入y_data
  • unicode_to_intunicode_to_int_index来暂存unicode和字典下标的转化,np中读取到的数据是unicode,若要使用one_hot编码处理最好转化成np.int,所以先判断每个字符集是否在alphabet_list中,若在则先暂存,用np.int0全部替换,再用y_data = y_data.astype(np.int)转换类型,最后在进行一次遍历,从暂存列表中取出indexvalue完成转换
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
def generateTrainData(filePath):
"""
Generate data set, 0~9, a~z, A~Z turn to 0~61
:param filePath: The Img in this folder need to be processed
:return: x_data is data, shape=(num, 20, 80)
y_data is label, shape=(num, 4)
"""
# os.listdir(filePath) return list of files and folders
# If the path has Chinese characters, it will be garbled, deal with unicode()
train_file_name_list = os.listdir(filePath)
x_data = []
y_data = []

# Process each Img
for selected_train_file_name in train_file_name_list:
if selected_train_file_name.endswith('.png'):
# Get Img object, os.path.join(PathA, PathB) return PathA\PathB
captcha_image = Image.open(os.path.join(filePath, selected_train_file_name))
# deNoising
captcha_image = deNoising(captcha_image)

captcha_image_np = np.array(captcha_image)

img_np = np.array(captcha_image_np)
x_data.append(img_np)
# split('.')[0] to get label from label.png
y_data.append(np.array(list(selected_train_file_name.split('.')[0])))

x_data = np.array(x_data).astype(np.float)
y_data = np.array(y_data)

# Temporarily store letters into array
unicode_to_int = ['x']
unicode_to_int_index = [0]

for index_0, alphabet_list in enumerate(y_data):
for index_1, value in enumerate(alphabet_list):
if (alphabet_list[index_1]) in alphabet:
unicode_to_int.append(alphabet_list[index_1])
alphabet_list[index_1] = 0
unicode_to_int_index.append(index_0)
unicode_to_int_index.append(index_1)
y_data = y_data.astype(np.int)

del (unicode_to_int[0])
del (unicode_to_int_index[0])

for i in range(0, len(unicode_to_int)):
y_data[unicode_to_int_index[2 * i]][unicode_to_int_index[2 * i + 1]] = char_to_int.get(unicode_to_int[i])

return x_data, y_data

去噪和灰度化处理

  • 阈值threshold设置为128用来区分RGB深浅,和之前的浅色背景、浅色干扰、深色验证码相对应
  • 若大于阈值,直接变成白色(255,255,255),其他变成黑色(0,0,0)
  • convert('L')直接转化为二值图像,每个像素点八个bit
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
def deNoising(image):
"""
image.getpixel((x,y)):get RGB,return (*,*,*)
:param image: RGB Img
:return: Grayscale Img
"""
# Set threshold to filter
threshold = 128
# Change to Grayscale
for i in range(image.width):
for j in range(image.height):
r, g, b = image.getpixel((i, j))
# For each pixel, if rgb>threshold,equals to light color==>bk_pixel,then set white(255,255,255)
if r > threshold or g > threshold or b > threshold:
r = 255
g = 255
b = 255
image.putpixel((i, j), (r, g, b))
# else,equals to dark color==>txt_pixel,set black(0,0,0)
else:
r = 0
g = 0
b = 0
image.putpixel((i, j), (r, g, b))
# Turn to grayscale Img
image = image.convert('L')
return image

卷积模型

  • 使用Tensorflow2版本会简单许多
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
tf.summary.trace_on(graph=True, profiler=True)
model = Sequential([
layers.Conv2D(32, kernel_size=[3, 3], padding="same", activation=tf.nn.relu),
layers.MaxPool2D(pool_size=[2, 2], strides=2, padding='same'),
layers.Dropout(0.5),
layers.Conv2D(64, kernel_size=[3, 3], padding="same", activation=tf.nn.relu),
layers.MaxPool2D(pool_size=[2, 2], strides=2, padding='same'),
layers.Dropout(0.3),
layers.Conv2D(64, kernel_size=[3, 3], padding="same", activation=tf.nn.relu),
layers.MaxPool2D(pool_size=[2, 2], strides=2, padding='same'),
layers.Dropout(0.25),

layers.Flatten(),

layers.Dense(2480),
layers.Dense(248), # 4*62
layers.Reshape([4, 62])
])

model.build(input_shape=[None, 20, 80, 1])
model.summary()
optimizer = optimizers.Adam(lr=0.43 * 1e-3)

训练数据

  • 同时使用Tensorboard可视化数据,要注意tf1和tf2的tensorboard使用不一样,需要手动将数据写入日志保存
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
def IdentifyTrain():
"""
Save model.h5
:return: None
"""
# Repeat training
training_time = 40
loss = 1
for epoch in range(training_time):
for step, (x, y) in enumerate(train_db):
if x.shape == (64, 20, 80, 1):
with tf.GradientTape() as tape:
logits = IdentifyImg.model(x)
y_one_hot = tf.one_hot(y, depth=62)
loss_ce = tf.losses.MSE(y_one_hot, logits)
loss_ce = tf.reduce_mean(loss_ce)
if loss_ce < loss:
IdentifyImg.model.save('model.h5')
loss = loss_ce
print("Save new model.h5 successfully! acc:", float(1 - loss_ce))
grads = tape.gradient(loss_ce, IdentifyImg.model.trainable_variables)
IdentifyImg.optimizer.apply_gradients(zip(grads, IdentifyImg.model.trainable_variables))

if step % 100 == 0:
print(epoch, step)
with IdentifyImg.summary_writer.as_default():
tf.summary.scalar('train_loss', loss, step=epoch)
print("Finally model acc: ", float(1 - loss))


# Generate train set
(x_train, y_train) = IdentifyImg.generateTrainData(train_data_dir)
print(x_train.shape, y_train.shape)

# load_dataset
batch_size = 64
train_db = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_db = train_db.map(IdentifyImg.preprocess).batch(batch_size)

测试数据

  • 调用训练时存下的模型,判断是否四个验证码都相同,返回正确率,通过调整学习率optimizer以及卷积模型,正确识别率达到96%以上。
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
def IdentifyTest():
"""
correct_rate is over 93% now
:return: correct_rate = correct_quantity/total_number
"""
correct_quantity = 0
total_number = x_test.shape[0]
IdentifyImg.model = tf.keras.models.load_model('model.h5', compile=False)
for step, (x, y) in enumerate(test_db):
if x.shape == (1, 20, 80, 1):
logits = IdentifyImg.model(x)
logits = tf.nn.softmax(logits)
pred = tf.cast(tf.argmax(logits, axis=2), dtype=tf.int32)
true_flag = int(tf.reduce_sum(tf.cast(tf.equal(pred, y), dtype=tf.int32))) == 4
print('Pre:', np.array(IdentifyImg.char_set)[pred[0].numpy()], 'Real:',
np.array(IdentifyImg.char_set)[y[0].numpy()], 'Same:', true_flag)
if true_flag:
correct_quantity += 1
correct_rate = correct_quantity / total_number
return correct_rate


# Generate test set
(x_test, y_test) = IdentifyImg.generateTrainData(test_data_dir)
print(x_test.shape, y_test.shape)
# (num of Img, 20:width, 80:height) (num of Img, 4)


# load_dataset
test_db = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_db = test_db.map(IdentifyImg.preprocess).batch(1)

项目结构

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
.
├── README.md
├── __pycache__
├── logs
├── test
│   └── xxx.png
├── train
│   └── xxx.png
├── GenerateImg.py
├── IdentifyImg.py
├── IdentifyTest.py
├── IdentifyTrain.py
└── model.h5

4 directories, 6 files

Download Source File

  • Copyright: Copyright is owned by the author. For commercial reprints, please contact the author for authorization. For non-commercial reprints, please indicate the source.

请我喝杯咖啡吧~