Saving Extracted Text from PDFs in JSON Format Using UiPath

Ray_Shadow · August 15, 2023, 3:00pm

Hi there,

I am currently working on a UiPath project where I need to extract text from multiple PDF/JPG files, process the extracted text, and then save it in JSON format in separate text files. I’ve successfully managed to extract the text from the PDFs and save it as individual text files, but now I’m looking to convert the extracted text into JSON format and save that in the text files. Each line of text should be represented as a separate JSON property in the file.

Could anyone provide guidance on how to achieve this? Specifically, I would like to know how to transform the extracted text into JSON format and how to correctly structure and save the JSON data in separate text files for each PDF. Any insights or sample code would be greatly appreciated!

This is the desired output format per JPG/PDF:

{
  "Location": "Gotham City",
  "Company": "Wayne Enterprises",
  "VAT Number": "123456789",
  "Slip Type": "Type B",
  "Date Time On Slip": "2023-08-16 14:30:00",
  "Applicants Name": "Bruce Wayne",
  "ABC Code": "SS694200",
  "Type Category": "Tourism",
  "Sub Category": "Leisure",
  "AAA Fee": 6.00,
  "ABCD Fee": 9.00,
  "Service Fee": 4.00,
  "SMS Fee": 2.00,
  "VAT": 69.42,
  "Total ABCDE Fees Non Vatable": 69.00,
  "Total ABCDEF Fees Including VAT": 420.69
}

Thanks Again!

postwick · August 15, 2023, 3:08pm

Create a variable jsonObj with the datatype Newtonsoft.Json.Linq.JObject and set its default value to New JObject.

For each value you want to add:

jsonObj.Add("Location", "Gotham City")

Then jsonObj.ToString will give you your desired output.
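The approach above can be sketched as a small standalone VB.NET program. This is only an illustration, assuming the Newtonsoft.Json package is referenced; the module name and the output file name are made up for the example, and inside UiPath only the body of Main would go into an Invoke Code or Assign activity.

```vbnet
Imports System
Imports System.IO
Imports Newtonsoft.Json.Linq

Module BuildReceiptJson
    Sub Main()
        Dim jsonObj As New JObject()

        ' Text values: the String is converted to a JSON string token.
        jsonObj.Add("Location", "Gotham City")
        jsonObj.Add("Company", "Wayne Enterprises")

        ' Numeric values: wrap them in JValue so they are written
        ' without quotes, as in the desired output.
        jsonObj.Add("AAA Fee", New JValue(6D))
        jsonObj.Add("VAT", New JValue(69.42D))

        ' ToString() returns indented JSON text.
        Dim jsonText As String = jsonObj.ToString()
        Console.WriteLine(jsonText)

        ' Hypothetical output path: one text file per source PDF/JPG.
        Dim outputPath As String = Path.Combine(Path.GetTempPath(), "receipt1.txt")
        File.WriteAllText(outputPath, jsonText)
    End Sub
End Module
```

Note that Add throws if the same key is added twice, so each property name should be added only once per object.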

postwick · August 15, 2023, 3:17pm

It’s easiest to do it in Invoke Code:

In the Invoke Code activity, make sure the direction of the jsonObj argument is In/Out:

[image not preserved]

Output:

[image not preserved]

(Actually, I accidentally left jsonObj as an In argument, and it still worked. I suspect this is because the object exists outside the Invoke Code, so the .Add still updates it outside.)
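The behaviour described in that last remark matches how reference types work in VB.NET in general. The sketch below is an analogy rather than a description of UiPath internals: it assumes an In argument behaves like a ByVal parameter, and the helper name AddLocation is made up for the example.

```vbnet
Imports System
Imports Newtonsoft.Json.Linq

Module ArgumentDirectionDemo
    ' ByVal on a reference type copies the reference, not the object,
    ' so the callee and the caller still share the same JObject.
    Sub AddLocation(ByVal target As JObject)
        ' Mutating the shared object is visible to the caller.
        target.Add("Location", "Gotham City")

        ' Reassigning the parameter only changes the local copy of the
        ' reference; the caller's variable still points at the first object.
        target = New JObject()
    End Sub

    Sub Main()
        Dim jsonObj As New JObject()
        AddLocation(jsonObj)

        ' The Location property added inside AddLocation is still present.
        Console.WriteLine(jsonObj.ToString())
    End Sub
End Module
```

Under that assumption, In is enough when the code only calls methods such as .Add on the existing object, while In/Out matters if the code assigns a new object to the argument.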

supermanPunch · August 15, 2023, 3:25pm

Hi @Ray_Shadow ,

We would also need to know the format of the extracted text. Have you already stored it as key-value pairs, or are you yet to extract the individual values?

If already extracted, how is it stored? In a DataTable or a Dictionary?

If you are yet to extract the necessary details, please provide us with a sample of the extracted text so that we can check each value extraction and the steps needed to bring it to the required format.
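For the case where the values are already held in a Dictionary, a minimal sketch of the conversion could look like this. It assumes a Dictionary(Of String, Object) and the Newtonsoft.Json package; the variable names and sample entries are illustrative only.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Newtonsoft.Json.Linq

Module DictionaryToJson
    Sub Main()
        ' Illustrative data; in a workflow this would be the dictionary
        ' that was filled during extraction.
        Dim extracted As New Dictionary(Of String, Object) From {
            {"Location", "Gotham City"},
            {"Company", "Wayne Enterprises"},
            {"VAT", 69.42D}
        }

        ' FromObject serializes the dictionary: each key becomes a JSON
        ' property, and numeric values stay numeric.
        Dim jsonObj As JObject = JObject.FromObject(extracted)

        Console.WriteLine(jsonObj.ToString())
    End Sub
End Module
```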

Ray_Shadow · August 15, 2023, 3:53pm

Here is a dummy Receipt example:

Gotham City
Wayne Enterprises
VAT123456789
Type B
2023/08/16 14:30:00
Mr.Bruce Wayne
SS694200
Category: Tourism
Sub Category: Leisure
Sub Type Category: Leisure
AAA Fee: 6.00
ABCD Fee: 9.00
Service Fee: 4.00
SMS Fee: 2.00
VAT: 69.42
Total ABCDE Fees Non Vatable: 69.00
Total ABCDEF Fees Including VAT: 420.69

Hope that helps
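One possible way to turn text in this shape into a JObject is sketched below in VB.NET. It rests on assumptions that may not hold for real slips: the first seven lines always appear in the same order and carry no label, every later line has the form "Key: Value", and the HeaderKeys list and ParseReceipt function are names invented for the example. It does not strip prefixes such as "VAT" or "Mr." or reformat the date, and it keeps the keys as they appear in the text (for example "Category" rather than "Type Category").

```vbnet
Imports System
Imports System.Globalization
Imports Microsoft.VisualBasic
Imports Newtonsoft.Json.Linq

Module ReceiptTextToJson
    ' Assumed keys for the leading, unlabeled lines, in order of appearance.
    Private ReadOnly HeaderKeys As String() = {
        "Location", "Company", "VAT Number", "Slip Type",
        "Date Time On Slip", "Applicants Name", "ABC Code"
    }

    Function ParseReceipt(text As String) As JObject
        Dim lines As String() = text.Split(New String() {vbCrLf, vbLf},
                                           StringSplitOptions.RemoveEmptyEntries)
        Dim result As New JObject()

        For i As Integer = 0 To lines.Length - 1
            Dim line As String = lines(i).Trim()

            ' Leading lines are mapped to keys purely by position.
            If i < HeaderKeys.Length Then
                result(HeaderKeys(i)) = line
                Continue For
            End If

            ' Remaining lines are split at the first colon.
            Dim sep As Integer = line.IndexOf(":"c)
            If sep < 0 Then Continue For
            Dim key As String = line.Substring(0, sep).Trim()
            Dim raw As String = line.Substring(sep + 1).Trim()

            ' Store numbers as numeric tokens, everything else as text.
            Dim number As Decimal
            If Decimal.TryParse(raw, NumberStyles.Number,
                                CultureInfo.InvariantCulture, number) Then
                result(key) = New JValue(number)
            Else
                result(key) = raw
            End If
        Next

        Return result
    End Function

    Sub Main()
        Dim sample As String = String.Join(vbLf, New String() {
            "Gotham City", "Wayne Enterprises", "VAT123456789", "Type B",
            "2023/08/16 14:30:00", "Mr.Bruce Wayne", "SS694200",
            "Category: Tourism", "AAA Fee: 6.00", "VAT: 69.42"
        })
        Console.WriteLine(ParseReceipt(sample).ToString())
    End Sub
End Module
```

The indexer assignment is used instead of Add so that a repeated key overwrites the earlier value rather than throwing.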

Usha_Jyothi · August 15, 2023, 4:00pm

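Please try this. Note that it reads AcroForm (fillable form) fields with iText, so it only helps if your PDFs are fillable forms rather than scanned slips.

' "PDFFilePath" is a placeholder; replace it with the actual path of the PDF.
Dim pdfDocument As New iText.Kernel.Pdf.PdfDocument(New iText.Kernel.Pdf.PdfReader("PDFFilePath"))
' Get the form of the document (created if it does not exist, because of True).
Dim form As iText.Forms.PdfAcroForm = iText.Forms.PdfAcroForm.GetAcroForm(pdfDocument, True)
Dim fields As IDictionary(Of String, iText.Forms.Fields.PdfFormField) = form.GetFormFields()
' Print every field name together with its current value.
For Each fieldName As String In fields.Keys
    Dim field As iText.Forms.Fields.PdfFormField = fields(fieldName)
    Dim value As String = field.GetValueAsString()
    Console.WriteLine("Field name: " & fieldName & ", value: " & value)
Next
' Release the file handle.
pdfDocument.Close()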

Cheers

Sai_Ganesh2 · March 24, 2024, 1:53pm

Hello sir, I would like to know how to convert a PDF into JSON format. Please enlighten me with the packages and sequences. This is important for me right now. I’ll rephrase my question: how do I convert a PDF (its information) into JSON format?
Thank you!
