I have more than 1,000 scanned, very unstructured PDFs, and I have put all their content into one massive CSV file so I can analyse everything at the same time. To find out which line in the CSV file belongs to which PDF, I have tried to concatenate the name of each PDF file to its lines, but I haven't been able to.
This is what the final CSV file should look like:
Line 1: Pdf1, line1 of pdf 1
Line 2: Pdf1, line2 of pdf 1
Line 3: Pdf1, line3 of pdf 1
Line 4: Pdf2, line1 of pdf 2
Line 5: Pdf2, line2 of pdf 2
Line 6: Pdf2, line3 of pdf 2
Basically, I just need to know which line belongs to which PDF file so I don't mix up their content when I analyse them. #pleasesavemylife Thanks
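To make the target layout concrete, here is a minimal Python sketch of the idea (not a UiPath workflow). It assumes the text of each PDF has already been extracted into a list of lines; the `sample` dictionary and the function names `tag_lines` and `to_csv` are made up for illustration. The `csv` module is used so that a line containing a comma is quoted rather than split into extra columns.

```python
import csv
import io

def tag_lines(pdf_lines):
    """Yield one (pdf_name, line) pair per extracted line."""
    for pdf_name, lines in pdf_lines.items():
        for line in lines:
            yield pdf_name, line

def to_csv(pdf_lines):
    """Return CSV text with the PDF name in the first column of every row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(tag_lines(pdf_lines))
    return buffer.getvalue()

# Hypothetical sample data standing in for the extracted PDF text.
sample = {
    "Pdf1": ["line1 of pdf 1", "line2 of pdf 1", "line3 of pdf 1"],
    "Pdf2": ["line1 of pdf 2", "line2 of pdf 2", "line3 of pdf 2"],
}
print(to_csv(sample))
```

With the file name in its own column, the rows can later be filtered or grouped per PDF without guessing where one document ends and the next begins.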
Thanks for that, @rkelchuri. This workflow is kind of complicated for me, a 'newbie', so I have a few questions: Which activities do I need to drop into the 'All Values' sequence? 'All Values' is inside an Excel scope, but what I need is a CSV file. Take these attached PDF files, which I randomly picked from the internet, as samples: Sample 1.pdf (44.1 KB) Sample 2.pdf (191.7 KB)
I need the final CSV file to look like this: Final.zip (5.9 KB)
Answer: The 'All Values' sequence is empty; you can delete it or use it however you want. It's just an empty box. Yes, it is inside an Excel scope, so you can read column data with an Assign activity. Use row("ColumnName").toString to read a specific column from Excel.
My script is very simple. If you copy the .xaml file into the folder that holds all 1,000 PDF files, it will automatically detect each PDF file and show you its file name. That way you get all 1,000 PDF file names. Now use the Write Text File activity to write each PDF file name into the CSV file, the same way you are writing to it now.
Next step: use the Read PDF activity to get all the PDF data into a string variable, and write that into each line of the CSV file.
Repeat the same process for each PDF file.
This is how you can design it.
I hope my input is useful.
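For readers who want to see the loop spelled out, the steps above (find every PDF in the folder, take its file name, read its text, emit one row per line) can be sketched in Python roughly as follows. This is only an illustration of the logic, not the attached .xaml workflow. `collect_rows` and its `read_text` parameter are hypothetical names: `read_text` stands in for whatever does the actual PDF reading (for scanned PDFs that would have to be an OCR step) and is assumed to return the whole document as one string.

```python
from pathlib import Path

def collect_rows(folder, read_text):
    """Return (file_name, line) pairs for every PDF found in folder.

    read_text is a caller-supplied function (an assumption of this sketch)
    that takes a Path and returns the PDF's text as a single string.
    """
    rows = []
    # sorted() keeps the output order stable from run to run.
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        text = read_text(pdf_path)
        for line in text.splitlines():
            if line.strip():  # skip blank lines
                rows.append((pdf_path.name, line.strip()))
    return rows
```

Passing the reader in as a parameter keeps the folder loop separate from the extraction step, so the same loop works whether the text comes from a PDF library or from OCR.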
Your input is very helpful; you cannot imagine how much I appreciate it. But I still need a little more help. I think I am very close now. These are the activities I dropped into 'All Values' and the variables I have created:
I haven't been able to add the FileName to every line. I think the Input Text property of the 'Write Text File' activity, 'pdfText +ReadColumn', is not the right expression?
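One possible reason the expression does not give the expected result: joining two strings with `+` attaches the second one once, at the end of the whole text, not to every line. The Python sketch below illustrates the difference. It assumes that `ReadColumn` holds the file name (the thread does not confirm this), and `prefix_each_line` is a hypothetical helper, not a UiPath activity.

```python
def prefix_each_line(file_name, pdf_text):
    """Put file_name in front of every non-blank line of pdf_text."""
    lines = [line for line in pdf_text.splitlines() if line.strip()]
    return "\n".join(f"{file_name}, {line}" for line in lines)

pdf_text = "first line\nsecond line"

# Plain concatenation: the name shows up only once, after the last line.
print(pdf_text + "Sample 1.pdf")

# Split first, then prefix: the name shows up on every line.
print(prefix_each_line("Sample 1.pdf", pdf_text))
```

The same split-then-prefix idea should carry over to the expression used in the workflow: break the PDF text into lines, add the file name to each line, and only then write the result.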