The flow I’m working on extracts data from an invoice and creates an Excel file with that data, but as soon as I give it a 30-page invoice, everything breaks. Can you give me a hand?
Hey @Dev5, you are working with the Community version, and the document you are using exceeds the allowed size. Try working with a smaller document; if you need to process that much data, you will have to use the Enterprise version.
As you can see, the error message says the same thing: The document exceeds the maximum allowed size for a request
Cheers
Since the Machine Learning Extractor has a 30-page limit for public ML models (custom models may support more), you need to split your PDF before processing. Use Extract PDF Page Range (from UiPath.PDF.Activities) and process the parts in batches. If there are junk pages that you already know about, remove them as well.
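To illustrate the batching idea, here is a minimal Python sketch that only computes the page ranges for each batch. It assumes the range strings would then be passed one at a time to the Range property of Extract PDF Page Range; the function name `page_batches` and the default batch size of 10 are my own choices for the example, not anything from UiPath.

```python
def page_batches(total_pages, max_pages=10):
    """Return page-range strings such as "1-10" that together cover the document.

    max_pages is an assumed batch size; pick any value under the extractor's limit.
    """
    if total_pages < 1 or max_pages < 1:
        raise ValueError("total_pages and max_pages must be positive")
    ranges = []
    for start in range(1, total_pages + 1, max_pages):
        end = min(start + max_pages - 1, total_pages)
        # A batch holding a single page is written as "31" rather than "31-31".
        ranges.append(str(start) if start == end else f"{start}-{end}")
    return ranges


# Example: the 52-page invoice, split into batches of at most 25 pages.
print(page_batches(52, 25))  # ['1-25', '26-50', '51-52']
```

In a workflow, the same logic would probably be a For Each over these ranges, with Digitize Document and Data Extraction Scope running once per batch.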
So my process is:
Digitize Document (with OCR; it outputs the extracted text into the text variable that you see in the Data Extraction Scope)
||
Classify Document Scope (to verify that it is an invoice)
||
Data Extraction Scope (that you see in the screenshot).
\\\\\\\\\\
So in the Data Extraction Scope it analyzes the text variable, which holds the text extracted from the 52-page invoice, but it gives me the same problem.
@Dev5 before running it with 52 pages, first try a smaller chunk, like 10 pages, because the large amount of data is what causes the issue. Also, before writing the data to Excel, use
Document Validation so you can see the result and verify whether the extraction is correct.
Doing these two things will make two points clear:
1- whether the data extraction is correct
2- if the extraction is correct, that the large data set is causing the issue
With invoices of several pages it works perfectly, and Excel is filled in with the correct invoice data. The problem is that if I process an invoice of more than 30 pages, it gives me the error. In that case I could split the text string as you said, but when I split the PDF or the text string I get 2 parts that are each missing data. How do I join them afterwards?
@Dev5 if the data is split into two parts, like var1 and var2, you can use string manipulation to concatenate them: create a new variable and assign var1 + var2 to it, like this:
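If the parts are extraction results rather than plain text, simple concatenation may not be enough, because each part can be missing fields the other one has. Below is a rough Python sketch of one way to join them: keep the first non-empty value for each header field and append the line items in order. The field names (`InvoiceNumber`, `Total`, `LineItems`) and the dictionary layout are invented for the example; they are not the actual structure returned by Data Extraction Scope.

```python
def merge_batches(batches, table_field="LineItems"):
    """Merge per-batch results: first non-empty header value wins, table rows are appended."""
    merged = {table_field: []}
    for fields in batches:
        for name, value in fields.items():
            if name == table_field:
                # Rows from later batches follow rows from earlier ones.
                merged[table_field].extend(value)
            elif value not in (None, "") and merged.get(name) in (None, ""):
                # Fill a header field only if no earlier batch provided it.
                merged[name] = value
    return merged


# Hypothetical results for the two halves of one invoice.
part1 = {"InvoiceNumber": "INV-001", "Total": "",
         "LineItems": [{"Description": "Item A", "Amount": 10.0}]}
part2 = {"InvoiceNumber": "", "Total": "25.00",
         "LineItems": [{"Description": "Item B", "Amount": 15.0}]}

invoice = merge_batches([part1, part2])
print(invoice["InvoiceNumber"], invoice["Total"], len(invoice["LineItems"]))
```

The idea is that header fields such as the invoice number usually appear on the first pages and totals on the last ones, so each field is taken from whichever part actually contains it.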