The input to my workflow is a folder of PDF documents that do not follow a standard format. I need to extract the order details, which are in tabular format in the PDF. Apart from the tabular data, the PDF also contains paragraphs and customer information. I can already identify the line where the tabular data starts by splitting the PDF text on Environment.NewLine and checking each line with string functions.
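For context, the line-by-line step I am describing looks roughly like the sketch below. It is written in Python purely for illustration, not in the workflow's own activities, and the header keywords, function name, and sample text are all made up; real templates will have different column names.

```python
# Hypothetical header keywords; each real template may use different column names.
HEADER_KEYWORDS = ("item", "qty", "price")

def find_table_start(pdf_text, keywords=HEADER_KEYWORDS):
    """Return the index of the first line that contains every keyword, or -1."""
    lines = pdf_text.splitlines()
    for index, line in enumerate(lines):
        lowered = line.lower()
        if all(keyword in lowered for keyword in keywords):
            return index
    return -1

# Made-up sample standing in for the text read from one PDF.
sample_text = (
    "Customer: ACME Ltd\n"
    "Thank you for your order.\n"
    "Item   Qty   Price\n"
    "Widget   2   9.50\n"
)
print(find_table_start(sample_text))
```

This only tells me which line the table starts on, not where each column begins and ends.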
My question is: how do I extract the tabular data? If I read the PDF using OCR, the text gets realigned and loses its original position, which makes it difficult to split the fields. Since the position of the table varies for each template, I need to pass the clipping region dynamically and extract structured data based on it. I would appreciate your help with a simple example.
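To show the kind of output I am after, here is a minimal Python sketch of splitting table rows into fields. It assumes the extracted text keeps at least two spaces between columns and that a blank line ends the table, which is exactly what the OCR output does not give me. The function name and the sample lines are hypothetical.

```python
import re

def split_table_rows(lines, start_index):
    """Split lines from the header row onward into fields on runs of 2+ spaces."""
    rows = []
    for line in lines[start_index:]:
        if not line.strip():
            break  # assumption: a blank line marks the end of the table
        rows.append(re.split(r"\s{2,}", line.strip()))
    return rows

# Made-up sample lines with column spacing preserved.
sample_lines = [
    "Customer: ACME Ltd",
    "Item          Qty   Price",
    "Blue widget   2     9.50",
    "Red widget    10    4.25",
    "",
    "Thank you for your order.",
]
table = split_table_rows(sample_lines, 1)
print(table)
```

Single spaces inside a value such as "Blue widget" are kept, so this only works when the column gaps are wider than the gaps between words.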