Can anyone tell me how to extract the table from this PDF using string manipulation?

Hi Guys,

Kindly let me know how to extract the table from this PDF using string manipulation. It's kind of urgent!
Sample2.pdf (153.3 KB)

Thanks!

Did you try with this component?

Cheers @Jai_Pande

The interviewer wants only string manipulation. I showed her Document Understanding as well… she is stubborn @Palaniyappan

Then you can try the Regex method @Jai_Pande

I wasn't able to. Can you please help, man?

Hi @Jai_Pande ,

Could you check the workflow below:
PDF_DataExtraction_Regex.zip (153.6 KB)

Extracting tables with string manipulation or Regex is generally not straightforward. You would first need to gather requirements and understand the different layouts or formats the table can take before settling on a reliable logic for reading the cell values.

Some PDFs always follow a specific format, which makes string manipulation/Regex easy. Others do not, which means digging deeper into the business case to understand what each field needs and what should happen when a value is missing from the table.

Hence, the workflow above only follows the formatting of the one sample PDF provided, and the logic has been implemented accordingly.

Several assumptions have been made for extracting the data:

  1. There are always 2 lines for one row of the table.
  2. The first column is a date range field in the format MM/dd/yy.
  3. The end service date in the first column is optional.
  4. All other columns except the last column are mandatory.
  5. All columns from the third column onward are numeric values only.
  6. There are always 2 or more spaces between the values.
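The assumptions above can be sketched outside of UiPath as well. The snippet below is a minimal Python illustration, not the logic of the attached workflow: the sample text, the column meanings, and the names `SAMPLE` and `parse_rows` are all hypothetical, since the real layout of Sample2.pdf is not shown in this thread. It also assumes the text has already been trimmed down to the table body and that the second line of each row is extra description text.

```python
import re

# Hypothetical table body; the real text of Sample2.pdf is not shown in the thread.
SAMPLE = """\
01/02/21 - 01/05/21  Office visit  2  150.00  25.00
  follow-up
01/09/21  Lab work  1  80.00
  blood panel
"""

# Assumptions 2 and 3: MM/dd/yy start date, optional "- MM/dd/yy" end date.
DATE_RANGE = re.compile(r"^(\d{2}/\d{2}/\d{2})(?:\s*-\s*(\d{2}/\d{2}/\d{2}))?$")
# Assumption 5: a plain number, optionally with thousands commas and decimals.
NUMBER = re.compile(r"^-?\d[\d,]*(?:\.\d+)?$")


def parse_rows(text):
    """Return one dict per table row, skipping line pairs that do not fit."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    rows = []
    # Assumption 1: every table row spans exactly two lines.
    for first, second in zip(lines[0::2], lines[1::2]):
        # Assumption 6: values are separated by two or more spaces.
        cells = re.split(r"\s{2,}", first.strip())
        match = DATE_RANGE.match(cells[0])
        if not match or len(cells) < 3:
            continue
        numbers = cells[2:]
        if not all(NUMBER.match(cell) for cell in numbers):
            continue
        rows.append({
            "start": match.group(1),
            "end": match.group(2),      # None when the end date is absent
            "description": cells[1],
            "detail": second.strip(),   # second line of the row (assumed meaning)
            "values": numbers,          # the last column may be missing (assumption 4)
        })
    return rows


for row in parse_rows(SAMPLE):
    print(row)
```

Because the date range only contains single spaces, splitting on two or more spaces keeps it in one cell, which is why assumption 6 matters so much for this approach.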

Let us know if you are able to understand it. The implementation is maybe a bit complex, so I would suggest learning the Regex parts one by one, splitting the workflow into steps, and understanding each step in turn (do not get overwhelmed).
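As one way to practise the "one part at a time" advice, here is a small Python sketch that builds a row pattern from separately testable pieces. It is only an illustration under the assumptions listed earlier; the names `DATE`, `GAP`, `NUMBER`, `ROW`, and `parse_line`, and the sample lines, are hypothetical and not taken from the attached workflow.

```python
import re

# Step 1: the smallest pieces, each easy to test on its own.
DATE = r"\d{2}/\d{2}/\d{2}"            # MM/dd/yy
GAP = r"\s{2,}"                        # two or more spaces between values
NUMBER = r"-?\d[\d,]*(?:\.\d+)?"       # numeric cell

assert re.fullmatch(DATE, "01/02/21")
assert re.fullmatch(NUMBER, "150.00")

# Step 2: combine pieces into a date range with an optional end date.
DATE_RANGE = rf"(?P<start>{DATE})(?:\s*-\s*(?P<end>{DATE}))?"

assert re.fullmatch(DATE_RANGE, "01/02/21 - 01/05/21")
assert re.fullmatch(DATE_RANGE, "01/09/21")

# Step 3: the full first line of a row: date range, text cell, numeric cells.
ROW = re.compile(
    rf"^{DATE_RANGE}{GAP}(?P<text>.+?){GAP}(?P<numbers>{NUMBER}(?:{GAP}{NUMBER})*)$"
)


def parse_line(line):
    """Return the named parts of one row line, or None if it does not match."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    parts = match.groupdict()
    # Split the run of numeric cells back into individual values.
    parts["numbers"] = re.split(GAP, parts["numbers"])
    return parts


print(parse_line("01/02/21 - 01/05/21  Office visit  2  150.00  25.00"))
print(parse_line("01/09/21  Lab work  1  80.00"))
```

Testing each piece before combining them makes it much easier to see which part fails when a new PDF layout does not match.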

For the learnings :

I even extracted the result using Document Understanding, and that too with 100 percent accuracy, and she still didn't select me @supermanPunch

I would politely explain to the interviewer that they don’t know what they’re talking about and shouldn’t demand string manipulation as a solution, because it’s not the right way to do it. If they insist, then I’d probably decide I don’t want to work for them anyway.

Or maybe you could show her this thread :slight_smile:

Oh, I said that, but then she said that "in string manipulation there is logic and in document understanding there isn't". By the way, thanks a lot, man, for vouching for me; it feels better now!! I extracted all the data into one clean Excel file (the confidence score was 97 percent) and I was not selected, while the guy who used string manipulation, and extracted only one row, and that too gibberish, was selected. I guess you are right when you say I don't want to work for them @postwick