Hi All,
I want to use Regex to extract all the text from "If" up to "Order" out of the full PDF text. I have already extracted the complete PDF text using the Read PDF activity.
To extract anything with RegEx you need a consistent pattern, and I don’t think your document has one.
This may only be possible with UiPath Document Understanding, after some training.
Thanks,
Ashok
Well, if you know that “General.” and “Order.” each appear in only ONE place in the document, then a pattern like this might work:
System.Text.RegularExpressions.Regex.Match(yourInputString, "(?<=General\.\s).*?(?=\sOrder\.)", System.Text.RegularExpressions.RegexOptions.Singleline).Value
Note that this pattern is not very robust: if “General.” or “Order.” appears anywhere else, you WILL get wrong output.
In that case you would need some kind of machine learning or AI, as in the method @ashokkarale describes.
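To make the "only one place" caveat concrete, here is a minimal sketch in Python (not a UiPath expression) of the same lookbehind/lookahead idea. It refuses to extract when either marker is missing or repeated. The function name extract_between and the sample text are made up for illustration. Python's re.DOTALL lets "." match line breaks; in .NET the corresponding option is RegexOptions.Singleline.

```python
import re


def extract_between(text, start_marker, end_marker):
    """Return the text between two markers, each of which must occur exactly once."""
    for marker in (start_marker, end_marker):
        count = text.count(marker)
        if count != 1:
            raise ValueError(f"expected exactly one {marker!r}, found {count}")

    # Same shape as (?<=General\.\s).*?(?=\sOrder\.), with the markers escaped.
    pattern = (
        r"(?<=" + re.escape(start_marker) + r"\s)"
        r".*?"
        r"(?=\s" + re.escape(end_marker) + r")"
    )
    # DOTALL so that "." also matches newlines in multi-line PDF text.
    match = re.search(pattern, text, flags=re.DOTALL)
    if match is None:
        raise ValueError("markers found, but not in the expected order")
    return match.group(0)


# Hypothetical sample, not taken from the real document.
sample = (
    "1. General.\n"
    "If the date differs, the execution date applies.\n"
    "Fees are billed under this\n"
    "Order. Signed"
)
print(extract_between(sample, "General.", "Order."))
```

Raising an error instead of returning a partial match fits a use case where any deviation from the standard layout should stop the process.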
Actually, I managed to extract the text somehow. Now I have the standard text and the text UiPath extracted. Can a bot compare them line by line and find the differences? Below are my standard text and the extracted text:
Standard Text:
If the initial invoice date stated above does not align to the execution date, then the initial invoice date will be the execution date. Databricks reserves the right to accelerate scheduled billing of fees for the Specified Commitment(s) (and associated Support Services, if any) if Customer’s usage exceeds the related amount for which it has been billed. Additionally, if Customer’s usage exceeds the applicable Specified Commitment or Universal Usage Commitment (e.g. Burst Usage), fees for such Burst Usage (and for associated Support Services, if any) shall be billed monthly in arrears. Fees above do not include True-up Payments (if any), which would be invoiced by Databricks after the End Date under the terms of this Order.
UiPath Extracted Text:
If the initial invoice date stated above does not align to the execution date then the initial invoice date will be the execution date.
Databricks reserves the right to accelerate scheduled billing of fees for the Specified Commitments and associated Support Services if any if Customers usage exceeds the related amount for which it has been billed.
Additionally if Customers usage exceeds the applicable Specified Commitment or Universal Usage Commitment e.g.
Burst Usage fees for such Burst Usage and for associated Support Services if any shall be billed monthly in arrears.
Fees above do not include Trueup Payments if any which would be invoiced by Databricks after the End Date under the terms of this Order.
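Because the extracted text breaks lines in different places than the standard text (it even splits after "e.g."), a strict line-by-line comparison would flag almost every line. One possible approach, sketched here in Python under that assumption, is to compare word and punctuation tokens instead and ignore line breaks. The helper names tokenize and diff_tokens are hypothetical. difflib is part of the Python standard library.

```python
import difflib
import re


def tokenize(text):
    """Split text into words and single punctuation characters, dropping whitespace."""
    return re.findall(r"\w+|[^\w\s]", text)


def diff_tokens(standard, extracted):
    """Return (tag, standard_part, extracted_part) for every span that differs."""
    a, b = tokenize(standard), tokenize(extracted)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs


# Shortened excerpts in the style of the texts above, for illustration only.
standard = (
    "fees for the Specified Commitment(s) "
    "(and associated Support Services, if any) are billed."
)
extracted = (
    "fees for the Specified Commitments\n"
    "and associated Support Services if any are billed."
)
for tag, old, new in diff_tokens(standard, extracted):
    print(tag, "| standard:", repr(old), "| extracted:", repr(new))
```

An empty result means the two texts contain the same tokens in the same order. Anything else can be treated as a mismatch and reported.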
Okay,
I edited the RegEx slightly.
But you understand that the RegEx pattern will not work if “General.” or “Order.” appears anywhere else?
The document you have right now may be a best-case scenario. What about other documents?
That's exactly my use case. If a document does not follow the standard pattern, it has to count as a mismatch and throw an error. The text should always be in the same standard order.
One more question: I used Document Understanding to extract the text.
My standard format has some words within brackets, but in the text the bot extracted those words are not in brackets. Can we show that as a mismatch?
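A bracket check like this is doable in principle. As a rough sketch in Python (the function names are hypothetical), one could collect every bracketed phrase from the standard text and report those that do not appear, brackets included, in the extracted text. It assumes round brackets that are not nested and ignores differences in whitespace and line breaks.

```python
import re


def normalize_spaces(text):
    """Collapse all runs of whitespace (including line breaks) into single spaces."""
    return " ".join(text.split())


def missing_bracketed_phrases(standard, extracted):
    """List bracketed phrases from the standard text that are absent from the extracted text."""
    extracted_norm = normalize_spaces(extracted)
    missing = []
    # \([^()]*\) matches one pair of round brackets with no nesting.
    for phrase in re.findall(r"\([^()]*\)", standard):
        if normalize_spaces(phrase) not in extracted_norm:
            missing.append(phrase)
    return missing


# Excerpts in the style of the texts above, for illustration only.
standard = (
    "fees for such Burst Usage (and for associated Support Services, if any) "
    "shall be billed monthly in arrears."
)
extracted = (
    "fees for such Burst Usage and for associated Support Services if any "
    "shall be billed monthly in arrears."
)
print(missing_bracketed_phrases(standard, extracted))
```

A non-empty list can then be raised as a mismatch. Note that this only works if the extraction step is expected to keep punctuation. If it always strips brackets and commas, every document will be flagged.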
Hello @dutta.marina
Please highlight the part from which your data needs to be extracted, and share more sample files. As @ashokkarale said, extracting data from text needs some kind of structure; otherwise you have to use DU or Generative AI.