Hi,
I have a PDF that contains an AcroForm, and I want to extract the form data into JSON format. Any idea how to go about it?
To read the data from a PDF AcroForm and convert it into JSON format in UiPath, you can use the following steps:
1. Use the “Read PDF Text” activity to read the text from the PDF file. This returns the document text as one string, which usually contains the field labels and may contain the filled-in values; it does not return the fields as structured name/value pairs.
2. The “Deserialize JSON” activity is not needed to build the output. It parses an existing JSON string into a JObject, so use it only if you need to read the JSON back later in the workflow.
3. Create a dictionary object in UiPath to store the data. You can create it with an “Assign” activity (New Dictionary(Of String, String)) and add key-value pairs with “Assign” or “Invoke Method”. For example, use the field names as keys and the field values as values.
4. Use regular expressions or string manipulation methods to extract the data from the text retrieved in step 1. For example, you can use the “Regex.Match” method to find the text that follows a label, or the “String.Split” method to separate the text based on a delimiter.
5. Add the key-value pairs to the dictionary object created in step 3.
6. Serialize the dictionary object into a JSON string using the “Serialize JSON” activity, which takes the object to serialize and returns the JSON string.
7. Write the JSON string to a file or send it to another application using the appropriate activity.
Here’s an example of the code you can use within the steps:
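' Read PDF text: "Read PDF Text" activity, FileName = pdfPath, Text output = pdfText (String)
' Create dictionary object (Assign)
pdfData = New Dictionary(Of String, String)
' Extract data from text (Assign); [^\r\n]* stops at the end of the line so no trailing carriage return is captured
pdfData("FirstName") = System.Text.RegularExpressions.Regex.Match(pdfText, "First Name: ([^\r\n]*)").Groups(1).Value.Trim()
pdfData("LastName") = System.Text.RegularExpressions.Regex.Match(pdfText, "Last Name: ([^\r\n]*)").Groups(1).Value.Trim()
pdfData("Email") = System.Text.RegularExpressions.Regex.Match(pdfText, "Email Address: ([^\r\n]*)").Groups(1).Value.Trim()
' Serialize dictionary object to JSON: "Serialize JSON" activity, or this Assign (needs Newtonsoft.Json, typically pulled in by UiPath.WebAPI.Activities)
jsonString = Newtonsoft.Json.JsonConvert.SerializeObject(pdfData)
' Write JSON string to file: "Write Text File" activity, FileName = jsonPath, Text = jsonString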
Note: Replace the field names and regular expressions with the appropriate values for your PDF form.
Please try this VB.NET code. You need iText7 for this.
Dim pdfDocument As New iText.Kernel.Pdf.PdfDocument(New iText.Kernel.Pdf.PdfReader("PDFFilePath"))
' PdfAcroForm lives in iText.Forms; PdfFormField lives in iText.Forms.Fields
Dim form As iText.Forms.PdfAcroForm = iText.Forms.PdfAcroForm.GetAcroForm(pdfDocument, True)
Dim fields As IDictionary(Of String, iText.Forms.Fields.PdfFormField) = form.GetFormFields()
' Print every field name together with its current value
For Each fieldName As String In fields.Keys
    Dim field As iText.Forms.Fields.PdfFormField = fields(fieldName)
    Dim value As String = field.GetValueAsString()
    Console.WriteLine("Field name: " & fieldName & ", value: " & value)
Next
pdfDocument.Close()
You can collect the values into a dictionary as the output instead; I have just printed them here.
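As a rough sketch of that dictionary idea, the loop can fill a Dictionary(Of String, String) and hand it to Newtonsoft.Json to get the JSON the original question asks for. The module name, the function name ExtractFormAsJson, and the paths "form.pdf" and "form.json" are placeholders I made up for illustration. GetFormFields() is the call used in this thread; other iText releases may name it differently, so check it against your version. In UiPath, the body of the function would go into an Invoke Code activity rather than a module.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports iText.Forms
Imports iText.Forms.Fields
Imports iText.Kernel.Pdf
Imports Newtonsoft.Json

Module AcroFormToJson
    ' Hypothetical helper: reads every AcroForm field and returns name/value pairs as JSON.
    Function ExtractFormAsJson(pdfPath As String) As String
        Dim values As New Dictionary(Of String, String)
        Dim pdfDocument As New PdfDocument(New PdfReader(pdfPath))
        Try
            ' False: do not create an AcroForm when the PDF has none (Nothing is returned instead)
            Dim form As PdfAcroForm = PdfAcroForm.GetAcroForm(pdfDocument, False)
            If form IsNot Nothing Then
                For Each entry As KeyValuePair(Of String, PdfFormField) In form.GetFormFields()
                    values(entry.Key) = entry.Value.GetValueAsString()
                Next
            End If
        Finally
            ' Always release the file handle, even if reading a field fails
            pdfDocument.Close()
        End Try
        ' Indented output is only for readability; Formatting.None gives compact JSON
        Return JsonConvert.SerializeObject(values, Formatting.Indented)
    End Function

    Sub Main()
        ' Placeholder paths; replace with your own files
        File.WriteAllText("form.json", ExtractFormAsJson("form.pdf"))
    End Sub
End Module
```

A PDF without an AcroForm yields an empty JSON object here rather than an error, which may or may not be what you want.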
cheers
How can I install the iText libraries?
Any idea which namespaces need to be imported?
Please search for itext7 in Manage Packages…
You should then get the required iText7 package.
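On the namespace question, going only by the classes used in the code in this thread, the imports would look roughly like this (in UiPath they would be added in the Imports panel instead of as statements; treat this as a sketch, not a complete list for every iText feature):

```vbnet
Imports iText.Kernel.Pdf      ' PdfDocument, PdfReader
Imports iText.Forms           ' PdfAcroForm
Imports iText.Forms.Fields    ' PdfFormField
```

With these in place the fully qualified names in the code can be shortened, for example PdfAcroForm.GetAcroForm(pdfDocument, True).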
Cheers
Please try this. PdfAcroForm is in the iText.Forms namespace:
Dim pdfDocument As New iText.Kernel.Pdf.PdfDocument(New iText.Kernel.Pdf.PdfReader("PDFFilePath"))
Dim form As iText.Forms.PdfAcroForm = iText.Forms.PdfAcroForm.GetAcroForm(pdfDocument, True)
Dim fields As IDictionary(Of String, iText.Forms.Fields.PdfFormField) = form.GetFormFields()
For Each fieldName As String In fields.Keys
    Dim field As iText.Forms.Fields.PdfFormField = fields(fieldName)
    Dim value As String = field.GetValueAsString()
    Console.WriteLine("Field name: " & fieldName & ", value: " & value)
Next
pdfDocument.Close()
Cheers
Hi Anil,
I am not able to import iText.Forms.PdfAcroForm.
Thank you, Anil.