Extracting information from a PDF table and relating it to its columns

Garcia_Hoyos_Daniel · April 14, 2021, 7:59am

Hi,
I’m working on an automation that extracts information from a PDF into an Excel file. I have extracted the ‘easy’ information with Substrings and Regex. My problem is this table: when I convert it to a text file, the information ends up misplaced. I think I could use Document Understanding, but I’ve never worked with it.
What exactly I want is to extract the information inside the table and relate it to its column and row. I’m working with different tables; the number of rows varies, but the number of columns stays the same. For instance, I want to extract “Procedimiento VFR Incumplido” and record that it belongs to the column “AERONAVE”. Another example: extract “Separación inadecuada” and relate it to “CONSECUENCIAS”.
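One way to relate each value to its column, independent of any UiPath activity, is to work from word positions instead of plain text: assign each word to the header whose horizontal span it overlaps most, and group words into rows by their vertical position. The Python sketch below assumes you already have each word's text and bounding box (`x0`, `x1`, `y`) from some PDF extraction step; the `Word` type, the function names and the coordinates in the example are hypothetical, and this is not the Document Understanding API. A cell whose text wraps onto a second line would come out as an extra row in this simple version.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A word (or header) and its position on the page (hypothetical input)."""
    text: str
    x0: float  # left edge
    x1: float  # right edge
    y: float   # vertical position, e.g. the baseline


def overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    """Length of the horizontal overlap between [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def assign_column(word: Word, headers: list[Word]) -> str:
    """Header the word overlaps most; if it overlaps none,
    fall back to the header whose centre is horizontally closest."""
    best = max(headers, key=lambda h: overlap(word.x0, word.x1, h.x0, h.x1))
    if overlap(word.x0, word.x1, best.x0, best.x1) > 0:
        return best.text
    centre = (word.x0 + word.x1) / 2
    return min(headers, key=lambda h: abs((h.x0 + h.x1) / 2 - centre)).text


def group_rows(words: list[Word], row_tolerance: float) -> list[list[Word]]:
    """Put a word in the current row if its y is within row_tolerance
    of that row's topmost word; otherwise start a new row."""
    rows: list[list[Word]] = []
    for w in sorted(words, key=lambda w: w.y):
        if rows and abs(rows[-1][0].y - w.y) <= row_tolerance:
            rows[-1].append(w)
        else:
            rows.append([w])
    return rows


def build_table(words: list[Word], headers: list[Word],
                row_tolerance: float = 3.0) -> list[dict[str, str]]:
    """Return one {column name: cell text} dict per table row."""
    table = []
    for row_words in group_rows(words, row_tolerance):
        cells = {h.text: "" for h in headers}
        # Left-to-right order keeps multi-word cells in reading order.
        for w in sorted(row_words, key=lambda w: w.x0):
            col = assign_column(w, headers)
            cells[col] = (cells[col] + " " + w.text).strip()
        table.append(cells)
    return table


headers = [Word("AERONAVE", 10, 120, 50), Word("CONSECUENCIAS", 140, 240, 50)]
words = [
    Word("Procedimiento", 10, 60, 70), Word("VFR", 62, 75, 70.4),
    Word("Incumplido", 77, 115, 69.8), Word("Separación", 140, 185, 70.2),
    Word("inadecuada", 187, 235, 70),
]
print(build_table(words, headers))
# [{'AERONAVE': 'Procedimiento VFR Incumplido', 'CONSECUENCIAS': 'Separación inadecuada'}]
```

Working from coordinates rather than from a text dump is what keeps “Procedimiento VFR Incumplido” attached to “AERONAVE” even when the text file puts the words in a different order.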
I know this isn’t easy, but I hope someone can help me.

Thanks!

system · April 16, 2021, 4:01pm

Hello @Garcia_Hoyos_Daniel!

It seems that you haven’t received an answer to your question within the first 24 hours.
Let us give you a few hints and helpful links.

First, make sure you have browsed through our Forum FAQ Beginner’s Guide. It explains what your topic should include.

You can check out some of our resources directly, see below:

Always search first. It is the best way to quickly find your answer. Check out the search icon for that.
Clicking the options button lets you set more specific topic search filters, e.g. only topics with a solution.
A topic containing the most common solutions, with example project files, can be found here.
Read our official documentation, where you can find a lot of information and instructions about each of our products.
Watch the videos on our official YouTube channel for more visual tutorials.
Meet us and our users on our Community Slack and ask your question there.

Hopefully this will let you easily find the solution/information you need. Once you have it, we would be happy if you could share your findings here and mark it as a solution. This will help other users find it in the future.

Thank you for helping us build our UiPath Community!

Cheers from your friendly
Forum_Staff

AndresTarazona · May 1, 2021, 3:01am

Hi @Garcia_Hoyos_Daniel!

I’ve written my answer in Spanish in the hope that it’s easier for you. Looking at the columns in the sample document, I see that they can vary depending on the hours you are going to report, and that they are also subdivided into Evento/Factor Descriptivo. I’m not sure whether the Document Understanding Form Extractor activity will work for you, but you can try it.

You can also explore the RegexBasedExtractor activity, whose options include visually aligning the extracted text. That prevents the misplaced/out-of-position text problem you mention.
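If the extracted text keeps the page layout, each value should start at roughly the same character position as its column header, so a row can be cut into cells at the header offsets. The Python sketch below illustrates that idea outside UiPath; it assumes left-aligned values under each header, one text line per table row, and column names given left to right. The function names are hypothetical, and this is not how RegexBasedExtractor works internally.

```python
def column_starts(header_line: str, column_names: list[str]) -> list[int]:
    """Character offset of each column name in the header line.
    Names must be given in left-to-right order."""
    starts = []
    for name in column_names:
        idx = header_line.find(name)
        if idx < 0:
            raise ValueError(f"column {name!r} not found in header line")
        starts.append(idx)
    if starts != sorted(starts):
        raise ValueError("column names are not in left-to-right order")
    return starts


def split_aligned_line(line: str, starts: list[int]) -> list[str]:
    """Cut a layout-preserved line at the column offsets and trim each piece."""
    ends = starts[1:] + [len(line)]
    return [line[s:e].strip() for s, e in zip(starts, ends)]


def parse_aligned_table(text: str, column_names: list[str]) -> list[dict[str, str]]:
    """First non-blank line is the header; every later line is one row."""
    lines = [line for line in text.splitlines() if line.strip()]
    starts = column_starts(lines[0], column_names)
    return [dict(zip(column_names, split_aligned_line(line, starts)))
            for line in lines[1:]]


header = "AERONAVE".ljust(30) + "CONSECUENCIAS"
row = "Procedimiento VFR Incumplido".ljust(30) + "Separación inadecuada"
print(parse_aligned_table(header + "\n" + row, ["AERONAVE", "CONSECUENCIAS"]))
# [{'AERONAVE': 'Procedimiento VFR Incumplido', 'CONSECUENCIAS': 'Separación inadecuada'}]
```

This only works when the alignment is preserved; with centred headers or cells that wrap onto several lines, the position-based approach (assigning words by their coordinates) is more robust.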

Let me know if you have any questions.

Regards,
Andres
