Use Read PDF Text, or Read PDF With OCR if the PDF is scanned.
Then use regex or string manipulation to extract the data you want.
Welcome to the Community!!
1. Use the Read PDF Text activity.
2. Use regex to extract those fields.
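Once the PDF text has been read into a string (the PDF-reading step itself is not shown), the regex step could look like the minimal Python sketch below. It assumes each leg sits on its own line in the form `Leg N of M | Origin (CODE) to Destination (CODE) | ...`, a layout inferred from the sample line quoted in this thread; `extract_legs` and `LEG_PATTERN` are hypothetical names.

```python
import re

# Assumed layout, based on the sample line in this thread:
#   "Leg 1 of 2 | Singapore (SIN) to Dubai (DXB) | Operated by ..."
# re.MULTILINE lets ^ match at the start of every line of the PDF text.
LEG_PATTERN = re.compile(
    r"^Leg\s+(\d+)\s+of\s+(\d+)\s*\|\s*(.+?)\s+to\s+(.+?)\s*\|",
    re.MULTILINE,
)


def extract_legs(pdf_text: str) -> list[tuple[int, str, str]]:
    """Hypothetical helper: return (leg number, origin, destination) for every leg line."""
    legs = []
    for match in LEG_PATTERN.finditer(pdf_text):
        leg_no, _total, origin, destination = match.groups()
        legs.append((int(leg_no), origin, destination))
    return legs


text = "Leg 1 of 2 | Singapore (SIN) to Dubai (DXB) | Operated by Emirates (equipment owner - Emirates)"
print(extract_legs(text))  # [(1, 'Singapore (SIN)', 'Dubai (DXB)')]
```

Using `finditer` rather than a single search means a "Leg 2 of 2" line further down the PDF is picked up as well.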
I hope it helps!!
Can you share the PDF file? Then we can help extract those fields.
Hey @HongRui_Zhang! Can you please share the PDF so that we can work on a solution?
To get Singapore (SIN):
(?<=Leg 1 of 2\s*\|\s+).*(?=\s+to\s*[A-Z]+)
To get Dubai (DXB):
(?<=Leg 1 of 2.*\s*\|\s+.*\s+to\s+).*(?=\s+\|)
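These patterns rely on variable-length lookbehind (`\s*` and `.*` inside `(?<=...)`), which the .NET regex engine used by UiPath supports. Python's built-in `re` module only accepts fixed-width lookbehind, so a Python port has to swap the lookbehind for an ordinary match plus a capture group. A sketch with the sample line from this thread hard-coded; the lookaheads are kept unchanged:

```python
import re

text = "Leg 1 of 2 | Singapore (SIN) to Dubai (DXB) | Operated by Emirates (equipment owner - Emirates)"

# Python's re rejects (?<=Leg 1 of 2\s*\|\s+) because the lookbehind is not
# fixed-width, so the prefix is matched normally and the wanted text is
# captured in group 1 instead. The lookaheads work the same as in .NET.
origin_match = re.search(r"Leg 1 of 2\s*\|\s+(.*)(?=\s+to\s*[A-Z]+)", text)
destination_match = re.search(r"Leg 1 of 2.*\s*\|\s+.*\s+to\s+(.*)(?=\s+\|)", text)

origin = origin_match.group(1) if origin_match else None
destination = destination_match.group(1) if destination_match else None
print(origin, "->", destination)  # Singapore (SIN) -> Dubai (DXB)
```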
I hope it helps!!
Hey @HongRui_Zhang
You can try this regex (group 1 is the origin, group 2 the destination):
(\w+(?: \(\w+\))?)\s+to\s+(\w+(?: \(\w+\))?)\s*(?=\|)
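For this kind of pattern to match `(SIN)` and `(DXB)`, the parentheses have to be escaped as `\(` and `\)`; unescaped, they only form a group and the pattern finds nothing in the sample line. The pipe in the lookahead also needs escaping as `\|`, since an unescaped `(?=|)` is an empty alternation that always succeeds. A Python sketch of the escaped pattern (`ROUTE` is a hypothetical name; the same pattern text also works in .NET):

```python
import re

text = "Leg 1 of 2 | Singapore (SIN) to Dubai (DXB) | Operated by Emirates (equipment owner - Emirates)"

# \( and \) match the literal parentheses around the airport code;
# (?=\|) requires a literal pipe after the destination.
ROUTE = re.compile(r"(\w+(?: \(\w+\))?)\s+to\s+(\w+(?: \(\w+\))?)\s*(?=\|)")

match = ROUTE.search(text)
if match:
    origin, destination = match.group(1), match.group(2)
    print(origin)       # Singapore (SIN)
    print(destination)  # Dubai (DXB)
```

The optional `(?: \(\w+\))?` group keeps the pattern working when a place name has no airport code in parentheses.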
Once text holds the regex match (for example "Singapore (SIN) to Dubai (DXB)"), you can extract Singapore (SIN) and Dubai (DXB) with the following Assign statements:
assign origin = text.Split({" to "}, StringSplitOptions.None)(0).Trim
assign destination = text.Split({" to "}, StringSplitOptions.None)(1).Trim
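The split only gives clean values when `text` holds just the route; splitting the whole line would leave the `Leg 1 of 2 |` prefix in the origin. Splitting on `" to "` with the surrounding spaces and trimming the parts avoids stray whitespace and keeps names that merely contain "to" intact. The same idea as a Python sketch (`split_route` is a hypothetical helper):

```python
def split_route(route: str) -> tuple[str, str]:
    """Hypothetical helper: split "Origin to Destination" into two trimmed parts."""
    # Split on " to " (with spaces) so a name such as "Toronto" is not cut,
    # and only at the first occurrence (maxsplit=1).
    # Raises ValueError if " to " does not occur in the route.
    origin, destination = route.split(" to ", 1)
    return origin.strip(), destination.strip()


print(split_route("Singapore (SIN) to Dubai (DXB) "))  # ('Singapore (SIN)', 'Dubai (DXB)')
```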
Output
Hi @HongRui_Zhang ,
You can use the Split function to get this done: first split str on "|", then split the middle part on " to ".
Please follow the steps below:
- Assign the text above to a str variable:
str = "Leg 1 of 2 | Singapore (SIN) to Dubai (DXB) | Operated by Emirates (equipment owner - Emirates)"
- Split on "|" and then on " to " to get "Singapore (SIN)":
Split(str.Split("|"c)(1), " to ")(0).Trim
- Split the same way to get "Dubai (DXB)":
Split(str.Split("|"c)(1), " to ")(1).Trim
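For comparison, the same two-step split in Python, as a sketch (`origin_and_destination` is a hypothetical helper). The field between the first two pipes keeps the spaces around the pipes, which is why both parts are trimmed:

```python
def origin_and_destination(line: str) -> tuple[str, str]:
    """Hypothetical helper: take the field between the first two pipes, then split it on " to "."""
    route = line.split("|")[1]  # " Singapore (SIN) to Dubai (DXB) "
    origin, destination = route.split(" to ", 1)
    return origin.strip(), destination.strip()


line = "Leg 1 of 2 | Singapore (SIN) to Dubai (DXB) | Operated by Emirates (equipment owner - Emirates)"
print(origin_and_destination(line))  # ('Singapore (SIN)', 'Dubai (DXB)')
```

Because it only relies on the pipe positions and the word "to", other place names on a line with the same layout go through unchanged.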
This solution also works if the origin and destination names change, as long as the line keeps the same "Leg ... | origin to destination | ..." layout.
Below is the screenshot for your reference.
Regards,
Ashutosh Gupta