The input to my workflow is a folder of PDF documents that do not follow a standard format. I need to extract the order details, which are in tabular format in the PDF. Apart from the tabular data, the PDF also contains paragraphs and customer information. I can identify the line where the tabular data starts by splitting the PDF content on Environment.NewLine, reading it line by line, and using string functions.
My question is: how do we extract the tabular data? If I read it using OCR, the data gets realigned without retaining its actual position, which makes it difficult to split the fields. Since the position of the tabular data varies for each template, I need to pass the clipping region dynamically and extract structured data based on that. I would appreciate your help on this with a simple example.
Oh ok then…
All I can think of is using either the scraper or Read PDF Text, but both return string output.
You then need to use indexing and substring to get each item and pass it to Excel (optional).
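To illustrate the indexing and substring idea, here is a minimal sketch in Python (not UiPath code). It assumes the extracted text keeps the columns aligned with spaces, that the header labels are known, and that a blank line ends the table. The function names and the sample text are made up for illustration.

```python
def find_header_index(lines, labels):
    """Return the index of the first line containing every label, or -1."""
    for i, line in enumerate(lines):
        lowered = line.lower()
        if all(label.lower() in lowered for label in labels):
            return i
    return -1


def column_starts(header, labels):
    """Character offset at which each label begins in the header line."""
    lowered = header.lower()
    return [lowered.index(label.lower()) for label in labels]


def slice_row(line, starts):
    """Cut one line into fields using the column start offsets."""
    bounds = starts + [None]  # None means "up to the end of the line"
    return [line[bounds[i]:bounds[i + 1]].strip() for i in range(len(starts))]


def extract_table(text, labels):
    """Find the header line, then slice every following line until a blank one."""
    lines = text.splitlines()
    head = find_header_index(lines, labels)
    if head < 0:
        return []
    starts = column_starts(lines[head], labels)
    rows = []
    for line in lines[head + 1:]:
        if not line.strip():
            break  # assumption: a blank line marks the end of the table
        rows.append(slice_row(line, starts))
    return rows


# Hypothetical sample text, not taken from the attached PDFs.
sample = (
    "Customer: ACME Ltd\n"
    "Some paragraph text.\n"
    "Code   Description        Qty\n"
    "A100   Steel bolt M8      25\n"
    "B200   Washer, flat       100\n"
    "\n"
    "Thank you for your order.\n"
)
rows = extract_table(sample, ["Code", "Description", "Qty"])
```

Because the column offsets are read from the header line of each document, nothing is hard-coded per template. It only works when the text extraction preserves the column alignment.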
PS: How about the Generate Table activity? It generates a DataTable variable from unstructured data.
In the CE 2017 edition it is integrated with the scraper, where the user can choose the column separator (space/tab/newline) and the newline separator (space/tab/newline), and it returns the data table as the output.
Two options:
1. Use the Read PDF activity, which lets you set the PDF page number. (The extraction process remains the same as mentioned in the previous comment.)
2. Otherwise, use the PDF shortcut keys (Ctrl+Shift+N or Page Down) with the SendHotKey activity and then perform the extraction.
This will not work for my scenario. I am attaching the samples. The position of the tabular data varies for each format. Sample.pdf (178.0 KB) Sample1.pdf (181.6 KB)
I want to extract the tabular data, which has product code, product description, supplier id, Csct, quantity, unit price and extended price. If I replace spaces with a comma separator, descriptions that contain spaces will be difficult to handle.
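One possible way around the spaces in the description, as a rough Python sketch: split on whitespace, take the first token as the product code and the last five tokens as supplier id, Csct, quantity, unit price and extended price, and join whatever is left as the description. This assumes that none of the other fields contain spaces and that no field is empty. The function name and the sample line are hypothetical.

```python
def split_order_line(line, trailing_fields=5):
    """Split one table line into code, description and the trailing fields.

    Assumption: only the description may contain spaces.
    """
    tokens = line.split()
    if len(tokens) < trailing_fields + 2:
        raise ValueError("line has too few fields: " + repr(line))
    code = tokens[0]
    tail = tokens[-trailing_fields:]
    # Everything between the code and the trailing fields is the description.
    description = " ".join(tokens[1:-trailing_fields])
    return [code, description] + tail


# Hypothetical line, not taken from the attached samples.
fields = split_order_line("P1001 Blue ink cartridge 500ml S77 C12 10 4.50 45.00")
```

Here `fields` comes out as the product code, the full description "Blue ink cartridge 500ml", and then the five remaining values. If a supplier id or another column can be blank in some rows, this approach breaks and a position-based split is needed instead.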
Dear Team,
I am not able to see the “Generate Table” button under the “Screen Scraping” wizard; instead, I am getting “Copy to Clipboard”.
I am using “CV 2019.4.2”.
This functionality has been decoupled. You should now use the Generate Data Table activity. You can paste your example input into its wizard, which is why you see an option to copy to clipboard in your Screen Scraping wizard.