Building a Real-Time Document Scanning and Rectification System with Python and OpenCV

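I. System Overview

The system provides the following features:

Real-time image capture from a camera
Edge detection and contour finding
Document contour recognition
Perspective transform to rectify the document
Binarization to improve readability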

II. Core Code Walkthrough

1. Importing the Required Libraries

import numpy as np
import cv2

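We mainly use NumPy for numerical computation and OpenCV for image processing.

2. Helper Function

First, define a simple image display function to make debugging easier: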

def cv_show(name,img):
    cv2.imshow(name,img)
    cv2.waitKey(10)

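3. Corner Point Ordering Function

The order_points function arranges the four detected document corners in a fixed order (top-left, top-right, bottom-right, bottom-left):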

def order_points(pts):
    rect = np.zeros((4,2),dtype="float32")
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]  # top-left (smallest x+y)
    rect[2] = pts[np.argmax(s)]  # bottom-right (largest x+y)
    diff = np.diff(pts,axis=1)
    rect[1] = pts[np.argmin(diff)]  # top-right (smallest y-x)
    rect[3] = pts[np.argmax(diff)]  # bottom-left (largest y-x)
    return rect

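This function sorts four given 2D points into the order top-left, top-right, bottom-right, bottom-left. This matters in document scanning and image rectification, because the perspective transform is only correct when each corner is matched to the right destination corner.

Detailed walkthrough

(1) Ordering logic

Top-left point (rect[0]): the point with the smallest x+y
- The top-left corner has a small x and a small y, so their sum is the smallest
Bottom-right point (rect[2]): the point with the largest x+y
- The bottom-right corner has a large x and a large y, so their sum is the largest
Top-right point (rect[1]): the point with the smallest y-x
- The top-right corner has a relatively small y and a relatively large x, so y-x is the smallest
Bottom-left point (rect[3]): the point with the largest y-x
- The bottom-left corner has a relatively large y and a relatively small x, so y-x is the largest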

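(2) Example

Suppose we have 4 points:

	A(10, 20)  # assumed top-left
	B(50, 20)  # top-right
	C(50, 60)  # bottom-right
	D(10, 60)  # bottom-left

Calculation:

x+y values: [30, 70, 110, 70]
- Minimum 30 → A (top-left)
- Maximum 110 → C (bottom-right)
y-x values: [10, -30, 10, 50]
- Minimum -30 → B (top-right)
- Maximum 50 → D (bottom-left)

Final order: [A, B, C, D], that is, [top-left, top-right, bottom-right, bottom-left]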
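The worked example can be checked in a few lines. This is a minimal sketch that repeats the order_points function from this section; the shuffled input order and the names `shuffled` and `ordered` are assumptions made for illustration.

```python
import numpy as np

def order_points(pts):
    # Same logic as the order_points function in this section
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)              # x + y for each point
    rect[0] = pts[np.argmin(s)]      # top-left
    rect[2] = pts[np.argmax(s)]      # bottom-right
    diff = np.diff(pts, axis=1)      # y - x for each point
    rect[1] = pts[np.argmin(diff)]   # top-right
    rect[3] = pts[np.argmax(diff)]   # bottom-left
    return rect

# Points C, A, D, B from the example, deliberately out of order
shuffled = np.array([[50, 60], [10, 20], [10, 60], [50, 20]], dtype="float32")
ordered = order_points(shuffled)
print(ordered.tolist())  # A, B, C, D
```

Whatever order the four points arrive in, the result comes back as A, B, C, D.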

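(3) Why this works

The method relies on the geometry of image coordinates, where the origin is at the top-left and y increases downward:

The top-left corner has a small x and a small y
The bottom-right corner has a large x and a large y
The top-right corner has a large x and a small y
The bottom-left corner has a small x and a large y

Simple additions and subtractions are therefore enough to tell the corners apart, with no complex geometry needed. The heuristic assumes the document sits roughly upright in the frame; when it is rotated close to 45°, the sums or differences can tie and corners may be misassigned.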
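It is worth knowing where the sum/difference heuristic stops working. The sketch below feeds order_points a square rotated by 45° (a diamond); the coordinates in `diamond` are an assumption chosen so that the sums and differences tie. In that case one corner is picked twice and another is dropped.

```python
import numpy as np

def order_points(pts):
    # Same logic as the order_points function in this section
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]
    return rect

# Diamond corners: top, right, bottom, left
diamond = np.array([[50, 0], [100, 50], [50, 100], [0, 50]], dtype="float32")
result = order_points(diamond)

# x+y is [50, 150, 150, 50] and y-x is [-50, -50, 50, 50], so argmin/argmax
# break the ties by position and the top corner is selected twice.
distinct = len(np.unique(result, axis=0))
print(result.tolist())
print("distinct corners:", distinct)
```

For a document held roughly upright in front of the camera this situation does not arise, which is why the simple approach is usually sufficient here.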

4. Perspective Transform Function

The four_point_transform function implements the core of the document rectification:

def four_point_transform(image,pts):
    rect = order_points(pts)
    (tl,tr,br,bl) = rect

    # Compute the width and height of the output image
    widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    maxWidth = max(int(widthA),int(widthB))

    heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    maxHeight = max(int(heightA),int(heightB))

    # Define the corner coordinates of the target image
    dst = np.array([[0,0],[maxWidth - 1,0],
                    [maxWidth - 1,maxHeight - 1],[0,maxHeight - 1]],dtype="float32")

    # Compute the perspective transform matrix and apply it
    M = cv2.getPerspectiveTransform(rect,dst)
    warped = cv2.warpPerspective(image,M,(maxWidth,maxHeight))

    return warped

This function performs a perspective transformation: it rectifies an arbitrary quadrilateral region of the image into a rectangle (a "de-perspective" effect).

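Detailed walkthrough

Input parameters

def four_point_transform(image, pts):

image: the original image
pts: an array of 4 points describing the quadrilateral region to transform

Ordering the points

rect = order_points(pts)
(tl, tr, br, bl) = rect  # unpack into top-left, top-right, bottom-right, bottom-left

The order_points function introduced earlier arranges the 4 points in order.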

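Computing the output image width

widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))  # length of the bottom edge
widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))  # length of the top edge
maxWidth = max(int(widthA), int(widthB))  # use the larger value as the output width

Compute the lengths of the bottom and top edges of the quadrilateral and take the longer one as the output width.

Computing the output image height

heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))  # length of the right edge
heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))  # length of the left edge
maxHeight = max(int(heightA), int(heightB))  # use the larger value as the output height

Compute the lengths of the right and left edges of the quadrilateral and take the longer one as the output height.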
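The edge lengths above are plain Euclidean distances, so the same numbers can be obtained more compactly with np.linalg.norm. The sketch below is an equivalent formulation, not the article's code; the trapezoid coordinates and the snake_case variable names are assumptions for illustration.

```python
import numpy as np

# A trapezoid such as a sheet photographed from below: narrow top, wide bottom
tl, tr, br, bl = (np.array(p, dtype="float64")
                  for p in [(10, 10), (90, 10), (100, 70), (0, 70)])

bottom = np.linalg.norm(br - bl)   # same value as widthA
top = np.linalg.norm(tr - tl)      # same value as widthB
right = np.linalg.norm(tr - br)    # same value as heightA
left = np.linalg.norm(tl - bl)     # same value as heightB

max_width = max(int(bottom), int(top))
max_height = max(int(right), int(left))
print(max_width, max_height)  # 100 60
```

Here the bottom edge (100 px) is longer than the top edge (80 px), so the output is 100 pixels wide; both side edges measure about 60.8 px, which truncates to 60.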

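Defining the target rectangle coordinates

dst = np.array([
    [0, 0],  # top-left
    [maxWidth - 1, 0],  # top-right
    [maxWidth - 1, maxHeight - 1],  # bottom-right
    [0, maxHeight - 1]  # bottom-left
], dtype="float32")

Define the corner coordinates of the rectified rectangle (an axis-aligned rectangle starting at (0,0)).

Computing and applying the perspective transform matrix

M = cv2.getPerspectiveTransform(rect, dst)  # compute the transform matrix
warped = cv2.warpPerspective(image, M, (maxWidth, maxHeight))  # apply the transform

getPerspectiveTransform: computes the 3x3 matrix that maps the source quadrilateral onto the target rectangle
warpPerspective: applies that matrix to the original image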

Returning the result

return warped

Returns the rectified rectangular image.

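Perspective transform illustrated

Quadrilateral in the original image     Rectangle after the transform
   tl--------tr                    0--------maxWidth
    \        /                      |        |
     \      /                       |        |
      bl----br                       maxHeight

Why are the width and height computed this way?

Why take the maximum:

The original quadrilateral may be distorted by perspective, so two opposite edges can differ in length
Taking the larger value avoids shrinking the longer edge, so no detail along it is lost in the output image

Why subtract 1:

Pixel coordinates start at 0, so in an image of width maxWidth the largest x coordinate is maxWidth-1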

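5. Main Program Flow

The main program implements the complete real-time document detection and rectification pipeline:

Initializing the camera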

cap = cv2.VideoCapture(0)
if not cap.isOpened():
    print("Cannot open camera")
    exit()

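Real-time processing loop

while True:
    flag = 0  # whether a document was detected in this frame
    ret,image = cap.read()
    if not ret:  # check first: image is None when the read fails
        print("Cannot read from camera")
        break
    orig = image.copy()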

Image preprocessing

gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray,(5,5),0)  # Gaussian blur to reduce noise
edged = cv2.Canny(gray,75,200)  # Canny edge detection

Contour detection and filtering

cnts = cv2.findContours(edged,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)[-2]
cnts = sorted(cnts,key=cv2.contourArea,reverse=True)[:3]  # keep the 3 largest contours by area

for c in cnts:
    peri = cv2.arcLength(c,True)  # contour perimeter
    approx = cv2.approxPolyDP(c,0.05 * peri,True)  # polygon approximation
    area = cv2.contourArea(approx)

    # Keep only quadrilaterals with a large enough area
    if area > 20000 and len(approx) == 4:
        screenCnt = approx
        flag = 1
        break

Document rectification and display

if flag == 1:
    # Draw the contour
    image_contours = cv2.drawContours(image,[screenCnt],0,(0,255,0),2)

    # Perspective transform
    warped = four_point_transform(orig,screenCnt.reshape(4,2))

    # Binarization
    warped = cv2.cvtColor(warped,cv2.COLOR_BGR2GRAY)
    ref = cv2.threshold(warped,0,255,cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

III. Complete Code

# Import the required packages
import numpy as np
import cv2

def cv_show(name,img):
    cv2.imshow(name,img)
    cv2.waitKey(10)

def order_points(pts):
    # There are 4 points in total
    rect = np.zeros((4,2),dtype="float32") # holds the points after ordering
    # Indices 0, 1, 2, 3 are top-left, top-right, bottom-right, bottom-left
    s = pts.sum(axis=1) # sum each row of pts (x+y)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]
    diff = np.diff(pts,axis=1) # difference along each row of pts (y-x)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]
    return rect

def four_point_transform(image,pts):
    # Order the input points
    rect = order_points(pts)
    (tl,tr,br,bl) = rect
    # Compute the width and height of the input quadrilateral
    widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    maxWidth = max(int(widthA),int(widthB))
    heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    maxHeight = max(int(heightA),int(heightB))
    # Corresponding corner positions after the transform
    dst = np.array([[0,0],[maxWidth - 1,0],
                    [maxWidth - 1,maxHeight - 1],[0,maxHeight - 1]],dtype="float32")

    M = cv2.getPerspectiveTransform(rect,dst)
    warped = cv2.warpPerspective(image,M,(maxWidth,maxHeight))
    # Return the transformed result
    return warped


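# Open the input
cap = cv2.VideoCapture(0)  # make sure the camera is available
if not cap.isOpened():   # failed to open
    print("Cannot open camera")
    exit()

while True:
    flag = 0 # marks whether a document was detected in the current frame
    ret,image = cap.read()  # ret is True if the frame was read correctly
    if not ret: # leave the loop if the read failed (image is None in that case)
        print("Cannot read from camera")
        break
    orig = image.copy()
    cv_show("image",image)
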
    gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
    # Preprocessing
    gray = cv2.GaussianBlur(gray,(5,5),0) # Gaussian blur
    edged = cv2.Canny(gray,75,200)
    cv_show('1',edged)

    # Contour detection
    cnts = cv2.findContours(edged,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)[-2]

    cnts = sorted(cnts,key=cv2.contourArea,reverse=True)[:3]
    image_contours = cv2.drawContours(image,cnts,-1,(0,255,0),2)
    cv_show("image_contours",image_contours)
    # Iterate over the contours
    for c in cnts:
        # Compute the contour approximation
        peri = cv2.arcLength(c,True) # contour perimeter
        # c is the input point set
        # epsilon is the maximum distance from the original contour to its approximation; it is an accuracy parameter
        # True means the contour is closed
        approx = cv2.approxPolyDP(c,0.05 * peri,True) # contour approximation
        area = cv2.contourArea(approx)
        # Take the contour when it has 4 points
        if area > 20000 and len(approx) == 4:
            screenCnt = approx
            flag = 1
            print(peri,area)
            print("Document detected")
            break
    if flag == 1:
        # Show the result
        # print("STEP 2: get the contour")
        image_contours = cv2.drawContours(image,[screenCnt],0,(0,255,0),2)
        cv_show("image",image_contours)
        # Perspective transform
        warped = four_point_transform(orig,screenCnt.reshape(4,2))
        cv_show("warped",warped)
        # Binarization
        warped = cv2.cvtColor(warped,cv2.COLOR_BGR2GRAY)
        # ref = cv2.threshold(warped,220,255,cv2.THRESH_BINARY)[1]
        ref = cv2.threshold(warped,0,255,cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
        cv_show("ref",ref)
cap.release() # release the capture device
cv2.destroyAllWindows() # close the image windows