My task is to identify and extract text with strikethrough symbols from an image. I want to select only the words that have this symbol and place each instance in a list.
Image Containing Strikethrough Text
Code I have tried:
    from PIL import Image
    import pytesseract

    # Open the image file
    img_path = 'path/to/image.png'
    img = Image.open(img_path)

    # Use Tesseract to do OCR on the image
    text = pytesseract.image_to_string(img)
    print(text)
The issue is that the output includes every word, with no sign of which ones are struck through. If the string contained an indicator of a strikethrough word or phrase, such as '-', then I could process it further; however, plain pytesseract does not detect the strikethrough in this image.
A better approach will be needed.
Desired output: ['Once upon a time', 'Jack', 'village']
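One direction that might work, sketched here under several assumptions rather than as a tested solution: take the word boxes from `pytesseract.image_to_data` and check whether a nearly unbroken horizontal run of dark pixels crosses the middle of each box. The helper names (`is_struck`, `group_struck`, `struck_phrases`) and the thresholds (`band`, `min_cover`, `dark_below`) are hypothetical and would need tuning for a real image. It also assumes dark text on a light background and that Tesseract still finds a box for each struck word, which is not guaranteed.

```python
def is_struck(rows, box, band=(0.35, 0.65), min_cover=0.9):
    """Return True if some pixel row in the middle band of `box` is almost all ink.

    rows: 2D list of 0/1 values (1 = dark pixel).
    box:  (left, top, width, height) of one word.
    band and min_cover are assumed thresholds, not tuned values.
    """
    left, top, width, height = box
    if width <= 0 or height <= 0:
        return False
    y0 = top + int(height * band[0])
    y1 = top + int(height * band[1])
    for y in range(y0, min(y1 + 1, len(rows))):
        ink = sum(rows[y][left:left + width])
        if ink >= min_cover * width:
            return True
    return False


def group_struck(words, flags):
    """Join runs of consecutive struck words into phrases."""
    phrases, run = [], []
    for word, struck in zip(words, flags):
        if struck:
            run.append(word)
        elif run:
            phrases.append(" ".join(run))
            run = []
    if run:
        phrases.append(" ".join(run))
    return phrases


def struck_phrases(img_path, dark_below=128):
    """Wire the helpers to Tesseract's word boxes (needs Pillow and pytesseract)."""
    from PIL import Image
    import pytesseract

    gray = Image.open(img_path).convert("L")
    w, h = gray.size
    raw = gray.tobytes()  # mode "L": one byte per pixel, row-major
    rows = [[1 if raw[y * w + x] < dark_below else 0 for x in range(w)]
            for y in range(h)]

    data = pytesseract.image_to_data(gray, output_type=pytesseract.Output.DICT)
    words, flags = [], []
    for i, word in enumerate(data["text"]):
        if not word.strip():
            continue  # block- and line-level entries carry empty text
        box = (data["left"][i], data["top"][i],
               data["width"][i], data["height"][i])
        words.append(word)
        flags.append(is_struck(rows, box))
    return group_struck(words, flags)
```

Calling `struck_phrases('path/to/image.png')` would return a list of phrases. Consecutive struck words are merged into one entry, so two separately struck words that happen to be adjacent would come back as a single phrase. The recognized text of a struck word may also be wrong, since the line overlaps the glyphs.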