Problem description
Pytesseract or Keras OCR to extract text from image
I am trying to extract text from an image, but currently I get an empty string as output. Below is my pytesseract code, although I am also open to Keras OCR:
from PIL import Image
import pytesseract
path = 'captcha.svg.png'
img = Image.open(path)
captchaText = pytesseract.image_to_string(img, lang='eng', config='--psm 6')
I am not sure how to work with svg images, so I converted them to png. Below are some example images:
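Before switching libraries, it is worth noting that pytesseract often returns an empty string on low-contrast or anti-aliased captchas, and a simple binarization pass can help. A minimal sketch with Pillow (the `binarize` helper and the threshold of 128 are illustrative starting points, not tuned values; the tiny synthetic image stands in for captcha.svg.png):

```python
from PIL import Image

def binarize(img, threshold=128):
    """Flatten to grayscale, then map each pixel to pure black or white."""
    gray = img.convert('L')
    return gray.point(lambda p: 255 if p > threshold else 0, mode='1')

# Demo on a tiny synthetic grayscale+alpha image (stand-in for captcha.svg.png):
demo = Image.new('LA', (4, 1))
demo.putdata([(30, 255), (200, 255), (90, 255), (250, 255)])
binary = binarize(demo)
print(binary.mode, list(binary.getdata()))  # → 1 [0, 255, 0, 255]
# The binarized image can then be passed to pytesseract.image_to_string as above.
```

Dark pixels go to 0 and light pixels to 255, which removes the anti-aliasing gradients that tend to confuse tesseract.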
The reason keras-ocr is not working or returning nothing is the grayscale input image (I found it works otherwise). See below:
from PIL import Image
a = Image.open('/content/gD7vA.png') # returns nothing with keras-ocr
a.mode, a.split() # mode L: one grayscale channel + transparency / alpha layer (LA)
b = Image.open('/content/CYegU.png') # returns a result with keras-ocr
b.mode, b.split() # mode RGB + transparency / alpha layer (RGBA)
In the above, a is the file you mentioned in your question; as shown, it has two channels, i.e. a grayscale channel plus a transparency (alpha) layer, mode LA. And b is the file I converted to RGB or RGBA. The transparency layer was already present in your original file and I did not remove it, but there seems to be no need to keep it unless you want it. In short, to make your input work with keras-ocr, convert your files to RGB (or RGBA) first and save them to disk, then pass them to the OCR.
# Using PIL to convert one mode to another
# and save on disk
c = Image.open('/content/gD7vA.png').convert('RGBA')
c.save('....png')
c.mode, c.split()
('RGBA',
(<PIL.Image.Image image mode=L size=150x50 at 0x7F03E8E7A410>,
<PIL.Image.Image image mode=L size=150x50 at 0x7F03E8E7A590>,
<PIL.Image.Image image mode=L size=150x50 at 0x7F03E8E7A810>,
<PIL.Image.Image image mode=L size=150x50 at 0x7F03E8E7A110>))
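The convert-and-save step can also be wrapped in a small helper that rewrites a file only when its mode actually needs fixing. A sketch (ensure_rgba and the temp-file demo are illustrative, not part of keras-ocr or Pillow):

```python
import os
import tempfile
from PIL import Image

def ensure_rgba(path):
    """Re-save an image as RGBA unless it is already RGB/RGBA; return the final mode."""
    img = Image.open(path)
    if img.mode not in ('RGB', 'RGBA'):
        img.convert('RGBA').save(path)
    return Image.open(path).mode

# Demo: write a grayscale+alpha (LA) file like the captcha, then normalize it in place.
path = os.path.join(tempfile.mkdtemp(), 'demo.png')
Image.new('LA', (150, 50), (128, 255)).save(path)
print(ensure_rgba(path))  # → RGBA
```

Running this over a folder of captcha files leaves RGB/RGBA images untouched and upgrades LA ones, so everything can then be fed to the pipeline uniformly.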
Full code
import matplotlib.pyplot as plt
import keras_ocr

# keras-ocr will automatically download pretrained
# weights for the detector and recognizer.
pipeline = keras_ocr.pipeline.Pipeline()
# Get a set of four example images
images = [
keras_ocr.tools.read(url) for url in [
        '/content/CYegU.png', # mode: RGBA; plain RGB should work too
'/content/bw6Eq.png', # mode: RGBA;
'/content/jH2QS.png', # mode: RGBA
'/content/xbADG.png' # mode: RGBA
]
]
# Each list of predictions in prediction_groups is a list of
# (word, box) tuples.
prediction_groups = pipeline.recognize(images)
Looking for /root/.keras-ocr/craft_mlt_25k.h5
Looking for /root/.keras-ocr/crnn_kurapan.h5
prediction_groups
[[('zum', array([[ 10.658852, 15.11916 ],
[148.90204 , 13.144257],
[149.39563 , 47.694347],
[ 11.152428, 49.66925 ]], dtype=float32))],
[('sresa', array([[ 5., 15.],
[143., 15.],
[143., 48.],
[ 5., 48.]], dtype=float32))],
[('sycw', array([[ 10., 15.],
[149., 15.],
[149., 49.],
[ 10., 49.]], dtype=float32))],
[('vdivize', array([[ 10.407883, 13.685192],
[140.62648 , 16.940662],
[139.82323 , 49.070583],
[ 9.604624, 45.815113]], dtype=float32))]]
Display
# Plot the predictions
fig, axs = plt.subplots(nrows=len(images), figsize=(20, 20))
for ax, image, predictions in zip(axs, images, prediction_groups):
keras_ocr.tools.drawAnnotations(image=image, predictions=predictions, ax=ax)
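If you only need the recognized strings rather than the annotated plots, note that each entry of prediction_groups is a list of (word, box) tuples for one image, so recovering the plain text is just a join over the words. A minimal sketch using stand-in data shaped like the output above:

```python
# Stand-in for pipeline.recognize(images): one list of (word, box) tuples per
# input image; the boxes (4x2 float arrays in keras-ocr) are elided here.
prediction_groups = [
    [('zum', None)],
    [('sresa', None)],
    [('sycw', None)],
    [('vdivize', None)],
]

# One recognized string per image, words joined in prediction order.
texts = [' '.join(word for word, box in group) for group in prediction_groups]
print(texts)  # → ['zum', 'sresa', 'sycw', 'vdivize']
```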
(by Abhash Upadhyaya, M.Innat)