Hi Friends,
I need to extract the header of each PDF page. The header text differs from page to page, but its position is fixed. I was trying to use regex but am not sure how to apply it in this case. Can anyone help me with this?
Thanks in advance!
Could you share a sample PDF?
Can you please provide sample data?
This is the top section of one of the PDF pages. I need to extract the name, which in this case is ‘Kiran Malhotra’.
Fine
We can use the Read PDF activity and store the output in a variable of type String named str_input.
Can I see the output we get after reading the file, so that we can decide whether to use Regex or the Split method?
Kindly print it with a Write Line activity and share a screenshot of the Output panel.
Cheers @Rita_Balmukund_Jaisw
Read the PDF file and store the text in a variable, say Var.
Use an Assign activity:
(To) arr_str, (From) Var.Split({" "}, StringSplitOptions.RemoveEmptyEntries)
Use another Assign activity:
(To) Name, (From) arr_str(0) + " " + arr_str(1)
Read the PDF and save the output as one variable, then use:
System.Text.RegularExpression.Regex.Match(OutputPDF, "(?<=(From))(.*?)(?=(To))").Value.Trim
Cheers @Rita_Balmukund_Jaisw
XXX
XX years
The Professional Me The Other Me
• Hobby 1
• Hobby 2
• Hobby 3
Apart from work
• Topic 1
• Topic 2
• Topic 3
Ice breakers
• Friends call me - Answer 1
• Superpower – Answer 2
• If I am AC head – Answer 3
Rapid fire!
• Fact 1
• Fact 2
• Fact 3
Fun facts
Describe yourself in pictures
Bangalore Home town
Formal
Headshot
Personal
Picture
Kiran Malhotra General Management Team Building
• Education 1
• Education 2
Education
• : XX yrs
• Prior Experience: XX yrs
Experience
• Project 1 details
• Project 2 details
The data on each PDF page changes, so I am not able to use the Split function to extract the text in between. That is why I was thinking of using the position to extract the text.
Thanks Shriharsha. However, this gives a compile error: 'RegularExpression' is not a member of 'Text'.
@Rita_Balmukund_Jaisw Share the error Screenshot
System.Text.RegularExpressions.Regex.Match(pdfOut, "(?<=From)(.*?)(?=To)").Value.Trim
Use this syntax:
- pdfOut: the output variable that holds your PDF text
- From: the word after which the capture should start
- To: the word before which the capture should stop
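To try the From/To idea outside Studio, here is a minimal Python sketch of the same lookbehind/lookahead pattern. The helper name text_between and the sample string are made up for illustration; they do not come from the PDF in this thread.

```python
import re

def text_between(text, start_word, end_word):
    """Return the text between the first start_word and the next end_word, or ''."""
    # (?<=start) anchors just after the start word; (?=end) stops just before the end word.
    pattern = "(?<=" + re.escape(start_word) + ")(.*?)(?=" + re.escape(end_word) + ")"
    match = re.search(pattern, text)
    return match.group(0).strip() if match else ""

# Hypothetical sample text, for illustration only.
sample = "Invoice From Acme Ltd To Kiran Malhotra"
print(text_between(sample, "From", "To"))  # prints "Acme Ltd"
```

As in the .NET expression, the dot does not cross a line break by default, so both words have to sit on the same line for this to match.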
Sorry, my bad. But in my case the From and To words are different on each page of the PDF, so I can't give fixed words.
Are there any constants on every page?
How do you know which data needs to be captured?
@Rita_Balmukund_Jaisw highlighted text is the constant in the PDF
System.Text.RegularExpressions.Regex.Match(pdfOut, "(?<=\r\n)(.*?)(?=General Management)").Value.Trim
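The same pattern can be checked against a few lines of the sample page posted above. This is a Python sketch, not UiPath code. It assumes the Read PDF output uses \r\n line endings, and header_before is a hypothetical helper name.

```python
import re

# A few lines copied from the sample page in this thread, joined with \r\n
# (assumption: the Read PDF output uses Windows line endings).
page_text = "\r\n".join([
    "Personal",
    "Picture",
    "Kiran Malhotra General Management Team Building",
    "• Education 1",
])

def header_before(text, constant):
    """Return the text between a line start and the constant on that same line."""
    pattern = r"(?<=\r\n)(.*?)(?=" + re.escape(constant) + ")"
    match = re.search(pattern, text)
    return match.group(0).strip() if match else ""

print(header_before(page_text, "General Management"))  # prints "Kiran Malhotra"
```

Because the lookbehind needs a preceding \r\n, this will not match if the name line is the very first line of the text.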
Hello Rita,
You may use this in an Assign
Name_Variable = Split(Split(YourPDFTextVariable.ToString, "General Management")(0), vbCrLf)(1)
Tell me if it works
Kind regards,
Daniel
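A variant of the split idea, sketched in Python: instead of a fixed line index, it takes the last line before the constant, since in the sample page posted above the name line is not near the top of the extracted text. The helper name header_by_split is hypothetical.

```python
def header_by_split(text, constant):
    """Return what precedes the constant on its own line, or '' if the constant is absent."""
    before, found, _ = text.partition(constant)
    if not found:
        return ""
    # Keep only the part after the last line break; strip() also drops a stray \r.
    return before.rsplit("\n", 1)[-1].strip()

# Lines copied from the sample page in this thread (line endings are an assumption).
page_text = "Personal\r\nPicture\r\nKiran Malhotra General Management Team Building\r\n• Education 1"
print(header_by_split(page_text, "General Management"))  # prints "Kiran Malhotra"
```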
Okay, I am using "Team Building" as the constant here. That returns the name combined with the designation, which I can separate in Excel.
Thanks for the help…
Use iTextSharp in C# to capture position-based data.
Build a DLL in C# that uses the iTextSharp library to fetch position-based data from the PDF, and then use that DLL in UiPath.