I’m trying to extract data from this PDF French frequency list (the list begins on page 18). It contains the 5000 most frequently used French words, listed like this: 1 je…2 tu…etc. I want to take all of these words and put them in an Excel spreadsheet, but I don’t know which function to use to extract them. Please help me.
Hi @Trevor_Guo
Welcome to the Community
You can use the following methods to extract data from your PDF:
- Try the Document Understanding Framework.
- If you want to go further, you can use an AI Fabric ML model.
- Otherwise, the older method is to read the document using the PDF activities and identify the entries using Regex.
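To illustrate the Regex option: one pattern applied to the whole extracted text can return every numbered entry at once, so there is no need for one pattern per word. The following is a minimal Python sketch of that idea, not UiPath code. It assumes the PDF text has already been read into a string and that each entry is a rank, then whitespace, then the word; the real layout of the list may differ. The names `ENTRY`, `extract_entries`, and `to_csv` are hypothetical, and the output is CSV, which Excel can open.

```python
import csv
import io
import re

# Assumed entry layout: a rank of 1 to 4 digits, whitespace, then a word made
# of letters, optionally joined by an apostrophe or hyphen (e.g. "peut-être").
# [^\W\d_] matches any Unicode letter, so accented characters are kept.
ENTRY = re.compile(r"(?<!\S)(\d{1,4})\s+([^\W\d_]+(?:['’-][^\W\d_]+)*)")


def extract_entries(text):
    """Return a list of (rank, word) pairs for every entry found in text."""
    return [(int(rank), word) for rank, word in ENTRY.findall(text)]


def to_csv(entries):
    """Render (rank, word) pairs as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["rank", "word"])
    writer.writerows(entries)
    return buf.getvalue()


# Made-up sample text standing in for the text read from the PDF.
sample = "1 je\n2 tu\n3 être\n4 peut-être"
entries = extract_entries(sample)
csv_text = to_csv(entries)
```

The important part is `findall`, which collects all matches in a single pass. In a UiPath workflow, the equivalent would be a pattern that returns all matches rather than only the first one.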
Thank you, Document Understanding Framework with the Regex extractor was exactly what I needed. However, I have a new problem, which is that for each field in my document type, I am only extracting a single word.
I need to extract 5000 words. I hope that I do not need to create 5000 fields. Please advise ASAP.
Hi Trevor
Can you provide a sample and the output you are getting, and tell us about the pattern you are trying to match?
Cheers
Steve