Reading PDF AcroForm into JSON

Hi,

I have a PDF that contains an AcroForm, and I want to extract its data into JSON format. Any idea how to go about it?

@May_Prince

To read the data from a PDF AcroForm and convert it into JSON format in UiPath, you can use the following steps:

  1. Use the “Read PDF Text” activity to read the text from the PDF file. This returns the document's text as a single string. Whether the filled-in field values appear in that text depends on the PDF, so check the output before relying on it.

  2. You do not need the “Deserialize JSON” activity to build the output. It parses an existing JSON string into a JSON object, so use it only if you need to read the JSON back later.

  3. Create a dictionary object in UiPath to store the data, for example with an “Assign” activity and New Dictionary(Of String, String). Use the field names as keys and the field values as values.

  4. Use regular expressions or string manipulation methods to extract the data from the text retrieved in step 1. For example, you can use the “Regex.Match” method to find the text that follows a label, or the “String.Split” method to separate the text based on a delimiter.

  5. Add the key-value pairs to the dictionary object created in step 3.

  6. Serialize the dictionary object into a JSON string, for example with the “Serialize JSON” activity if your activity package provides it, or with JsonConvert.SerializeObject from Newtonsoft.Json.

  7. Write the JSON string to a file or send it to another application using the appropriate activity.

Here’s an example of the code you can use within the steps:

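' Read PDF text: pdfText is the String output of the “Read PDF Text” activity.
' pdfText and jsonPath are String variables you define in the workflow.

' Create dictionary object
Dim pdfData As New Dictionary(Of String, String)

' Extract data from text (Regex is System.Text.RegularExpressions.Regex).
' Trim removes the trailing carriage return that (.*) can capture.
pdfData("FirstName") = Regex.Match(pdfText, "First Name: (.*)").Groups(1).Value.Trim()
pdfData("LastName") = Regex.Match(pdfText, "Last Name: (.*)").Groups(1).Value.Trim()
pdfData("Email") = Regex.Match(pdfText, "Email Address: (.*)").Groups(1).Value.Trim()

' Serialize dictionary object to JSON (requires Newtonsoft.Json)
Dim jsonString As String = Newtonsoft.Json.JsonConvert.SerializeObject(pdfData)

' Write JSON string to file
System.IO.File.WriteAllText(jsonPath, jsonString)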

Note: Replace the field names and regular expressions with the appropriate values for your PDF form.

@May_Prince

Please try this VB.NET code. You need iText 7 for it.

Dim pdfDocument As New iText.Kernel.Pdf.PdfDocument(New iText.Kernel.Pdf.PdfReader("PDFFilePath"))
Dim form As iText.Forms.Fields.PdfAcroForm = iText.Forms.Fields.PdfAcroForm.GetAcroForm(pdfDocument, True)
Dim fields As IDictionary(Of String, iText.Forms.Fields.PdfFormField) = form.GetFormFields()
For Each fieldName As String In fields.Keys
    Dim field As iText.Forms.Fields.PdfFormField = fields(fieldName)
    Dim value As String = field.GetValueAsString()
    Console.WriteLine("Field name: " & fieldName & ", value: " & value)
Next
pdfDocument.Close()

I have just printed the values here; you can collect them into a dictionary instead.

cheers

How can I install the iText libraries?

Any idea which namespaces need to be imported?

@May_Prince

Please search for itext7 in Manage Packages.

You should find the required itext7 package there.

Cheers
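
On the namespace question: the snippets in this thread only use types from three iText namespaces. A minimal sketch of the imports, assuming the itext7 package is installed (in UiPath Studio the same names can be added through the Imports panel instead of Imports statements):

```vbnet
' PdfDocument and PdfReader
Imports iText.Kernel.Pdf
' PdfAcroForm
Imports iText.Forms
' PdfFormField
Imports iText.Forms.Fields
```

With these in place the fully qualified names can be shortened, for example PdfAcroForm instead of iText.Forms.PdfAcroForm.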

Hi Anil,

Unfortunately, it is not showing up.

@May_Prince

Please check your filters.

Also, are you using Windows compatibility only?

cheers

Thank you, removing the filter worked. Now I am not able to access PdfAcroForm.

@May_Prince

Please try this. PdfAcroForm is in the iText.Forms namespace, not iText.Forms.Fields:

Dim pdfDocument As New iText.Kernel.Pdf.PdfDocument(New iText.Kernel.Pdf.PdfReader("PDFFilePath"))
Dim form As iText.Forms.PdfAcroForm = iText.Forms.PdfAcroForm.GetAcroForm(pdfDocument, True)
Dim fields As IDictionary(Of String, iText.Forms.Fields.PdfFormField) = form.GetFormFields()
For Each fieldName As String In fields.Keys
    Dim field As iText.Forms.Fields.PdfFormField = fields(fieldName)
    Dim value As String = field.GetValueAsString()
    Console.WriteLine("Field name: " & fieldName & ", value: " & value)
Next
pdfDocument.Close()

Cheers
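
To get from printed output to the JSON the original question asks for, the field values can be collected into a dictionary and serialized. The following is only a sketch under assumptions: it relies on the iText 7 calls already used above (GetAcroForm, GetFormFields, GetValueAsString) and on Newtonsoft.Json being available for serialization. The module and function names (AcroFormToJson, ReadFormFields, FieldsToJson) are made up for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports iText.Forms
Imports iText.Forms.Fields
Imports iText.Kernel.Pdf
Imports Newtonsoft.Json

Module AcroFormToJson

    ' Reads every AcroForm field of the PDF at pdfPath into a
    ' dictionary of field name -> field value (as a string).
    Function ReadFormFields(pdfPath As String) As Dictionary(Of String, String)
        Dim values As New Dictionary(Of String, String)
        Dim pdfDocument As New PdfDocument(New PdfReader(pdfPath))
        Try
            Dim form As PdfAcroForm = PdfAcroForm.GetAcroForm(pdfDocument, True)
            For Each entry As KeyValuePair(Of String, PdfFormField) In form.GetFormFields()
                values(entry.Key) = entry.Value.GetValueAsString()
            Next
        Finally
            ' Close the document even if reading a field fails
            pdfDocument.Close()
        End Try
        Return values
    End Function

    ' Serializes the collected values into an indented JSON object.
    Function FieldsToJson(values As IDictionary(Of String, String)) As String
        Return JsonConvert.SerializeObject(values, Formatting.Indented)
    End Function

    Sub Main()
        ' "PDFFilePath" is the same placeholder used in the code above
        Console.WriteLine(FieldsToJson(ReadFormFields("PDFFilePath")))
    End Sub

End Module
```

Inside UiPath, the body of ReadFormFields could go into an Invoke Code activity with the dictionary as an Out argument. Every value comes back as a string, so checkboxes and numeric fields would need converting afterwards if typed JSON is required.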

Hi Anil,

I am not able to import iText.Forms.PdfAcroForm.

@May_Prince

Is this the package you installed?

I see no errors

(two screenshots attached)

cheers

I am still getting this error.

@May_Prince

Please use the new code I have given. You are still using the old one.

Cheers

Thank you, Anil.
