I'm reaching out to see if anyone knows how I can extract or scrape this data. See the picture below.
The areas with a red box around them are what I need. The text in them is very small and OCR seems to be having a lot of trouble with it. These are the things I have tried:
- Extracting just the text (this does not work because the CAD file is flattened before it is sent out, so the page is treated as an image).
- Extracting text with OCR, trying multiple OCR engines (Google, Microsoft).
- Changing the scale on the OCR scan so the small text gets a little bigger.
- Changing my default app for opening PDFs. I typically do my measuring in Bluebeam, but since it is very advanced software I did not want to wait for the load time each time it opened a new PDF, so I tried opening PDFs with Google and then using a scrape function (this did not work either).
- Most recently, changing my PDF viewer to Adobe, thinking it might help, but it does not seem to have made much of a difference.
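One variation on the scaling attempt above that may be worth trying is to crop to each boxed region and render only that region at a resolution chosen from the text height, rather than scaling the whole page by a fixed factor. The sketch below only does the arithmetic for that. It assumes the box coordinates and text height are known in PDF points (1/72 inch, the default PDF unit), and the 30-pixel target height is an assumed starting value to tune, not a documented requirement of any OCR engine. `Box`, `dpi_for_text`, and `box_to_pixels` are hypothetical helper names.

```python
import math
from dataclasses import dataclass
from typing import Tuple

POINTS_PER_INCH = 72  # default PDF user-space unit is 1/72 inch


@dataclass
class Box:
    """Rectangle in PDF points: (x0, y0) top-left, (x1, y1) bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float


def dpi_for_text(text_height_pt: float, target_px: int = 30) -> int:
    """Render DPI at which text of the given height comes out target_px tall.

    target_px=30 is an assumption to tune against the OCR engine in use.
    """
    if text_height_pt <= 0:
        raise ValueError("text_height_pt must be positive")
    return math.ceil(target_px * POINTS_PER_INCH / text_height_pt)


def box_to_pixels(box: Box, dpi: int) -> Tuple[int, int, int, int]:
    """Convert a box in points to pixel coordinates at the given DPI.

    Rounds outward so the crop never cuts into the boxed text.
    """
    scale = dpi / POINTS_PER_INCH
    return (
        math.floor(box.x0 * scale),
        math.floor(box.y0 * scale),
        math.ceil(box.x1 * scale),
        math.ceil(box.y1 * scale),
    )


# Example with made-up numbers: 3 pt text inside a 72 x 18 pt box.
dpi = dpi_for_text(3.0)
crop = box_to_pixels(Box(100, 200, 172, 218), dpi)
print(dpi, crop)
```

The resulting DPI and crop rectangle could then be passed to whatever renders the PDF page to an image before the OCR step. Rendering only the cropped region keeps the image small even at a high DPI.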
If I could pull the text only, I am not too bad with Regex now and I wouldn’t mind using a matching activity, but I can’t even get that far.
After everything I have tried, I am almost certain a human will have to do the legwork and type in these fields so the bot can continue, but that rather defeats the purpose of automation.
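For the matching step, a minimal sketch of what the Regex stage might look like once some text does come back from OCR. The field format here (an uppercase label, a colon, then a number) and the sample strings are purely hypothetical, since the real contents of the boxed areas are not shown. The character substitutions are a guess at common OCR confusions in numeric fields and would need adjusting to the actual output.

```python
import re

# Hypothetical field format: "LABEL: 123.45". The value class also accepts
# letters that OCR often confuses with digits so they can be repaired below.
FIELD_RE = re.compile(r"(?P<label>[A-Z][A-Z ]*?)\s*:\s*(?P<value>[0-9OolIS.,]+)")

# Assumed OCR confusions inside numeric values (tune to the real output).
CONFUSIONS = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5"})


def parse_fields(ocr_text):
    """Return {label: float} for every 'LABEL: number' pair found in the text."""
    fields = {}
    for match in FIELD_RE.finditer(ocr_text):
        # Repair digit look-alikes, drop thousands separators and a trailing dot.
        raw = match.group("value").translate(CONFUSIONS).replace(",", "").rstrip(".")
        try:
            fields[match.group("label").strip()] = float(raw)
        except ValueError:
            # Still not a valid number: skip it so a human can review it later.
            continue
    return fields


# Made-up OCR output with a letter O and a lowercase l in place of digits.
print(parse_fields("AREA: 125O.5\nLENGTH : 4l.25"))
```

Values that still fail to parse are skipped rather than guessed, which would leave only those fields for manual entry.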
I look forward to hearing from anyone who thinks they have a good answer for this. If you need me to send you a sample version of my PDF for testing on your side, let me know.