Hi Guys,
Kindly let me know how to extract the table from this PDF using string manipulation. It's kind of urgent!
Sample2.pdf (153.3 KB)
Thanks!
The interviewer wants only string manipulation. I showed her a solution with Document Understanding as well…she is stubborn @Palaniyappan
I wasn't able to do it. Can you please help, man?
Hi @Jai_Pande ,
Could you check the workflow below:
PDF_DataExtraction_Regex.zip (153.6 KB)
Extracting tables using string manipulation/regex is generally not straightforward. Requirements would need to be gathered first, and we would need to understand the different layouts or formats of the table before settling on a proper logic for the table cell values.
Some PDFs always follow a specific format, which makes them easy to handle with string manipulation/regex. Others do not, which makes us dig deeper into the business case to understand what each field needs and what should be done when a field is missing from the table.
Hence, the above workflow follows the formatting of the one PDF sample provided, and the logic has been implemented accordingly.
Several assumptions have been made for extracting the data:
Let us know if you are able to understand it. The implementation may be a bit complex, so I would suggest learning the regex parts one by one, then splitting the workflow into steps and understanding each step in turn (do not get overwhelmed).
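To give a rough idea of what a format-specific regex approach looks like, here is a minimal sketch in Python. It is not the logic from the attached workflow. The sample text, the column layout (description, integer quantity, decimal price, separated by two or more spaces) and the names `SAMPLE_TEXT`, `ROW_PATTERN` and `extract_rows` are all assumptions made up for illustration, and a real PDF would need its own pattern.

```python
import re

# Hypothetical text, standing in for whatever a PDF text-extraction step returns.
SAMPLE_TEXT = """\
Invoice No: 12345
Item        Qty   Price
Widget A    2     10.50
Gadget B    10    3.00
Total             51.00
"""

# Assumed row layout: description, integer quantity, price with two decimals,
# with columns separated by at least two spaces.
ROW_PATTERN = re.compile(
    r"^(?P<item>\S.*?)\s{2,}(?P<qty>\d+)\s{2,}(?P<price>\d+\.\d{2})\s*$"
)


def extract_rows(text):
    """Return one dict per line that matches the assumed row layout."""
    rows = []
    for line in text.splitlines():
        match = ROW_PATTERN.match(line)
        if match:  # header, total and free-text lines simply do not match
            rows.append(
                {
                    "item": match.group("item"),
                    "qty": int(match.group("qty")),
                    "price": float(match.group("price")),
                }
            )
    return rows


rows = extract_rows(SAMPLE_TEXT)
```

The pattern only works because every data row in this made-up sample has the same shape. A row with a missing quantity or a wrapped description would be silently skipped, which is why the layout has to be understood before writing the regex.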
For learning:
I even extracted the result using Document Understanding, and that too with 100 percent accuracy, and she still didn't select me @supermanPunch
I would politely explain to the interviewer that they don’t know what they’re talking about and shouldn’t demand string manipulation as a solution, because it’s not the right way to do it. If they insist, then I’d probably decide I don’t want to work for them anyway.
Or maybe you could show her this thread
Oh, I said that, but then she said that "in string manipulation there is logic and in document understanding there isn't". By the way, thanks a lot, man, for vouching for me, it feels better now!! I extracted all the data into one clean Excel file (the confidence score was 97 percent) and I was not selected, while the guy who used string manipulation, and who extracted only one row, and gibberish at that, was selected. I guess you are right when you say I don't want to work for them @postwick