Problem with Data Extraction Scope

The flow I’m working on extracts data from an invoice and creates an Excel file with that data, but as soon as I give it a 30-page invoice it breaks everything. Can you give me a hand?


this is the error:
Data Extraction Scope: Request CorrelationId: 0e8175fd-a004-46d5-86fb-1ef89771ead6
Request PredictionId: EIAgrs13frvKoVX4qdTldQGRVFEtLGKekqqdvkMihw8=_7eb7aa7d-0a37-49e9-ba5c-68497bd4eae8
The document exceeds the maximum allowed size for a request
Http Response Code: 413
Http Response Content: {"info": "Prediction Failed", "reason": "Service Exception 413: InvalidRequest - Payload exceeds page limit of 30 pages"}
Stack trace from the response:
  File "/workspace/model/microservice/main.py", line 53, in predict_with_metadata
    response, telemetry = service_internal.predict(json_data, doctype_name=model_name or self.doctype_name)
  File "/workspace/model/microservice/document_understanding/t5/service_internal.py", line 112, in predict
    response, telemetry = _predict(request_json, correlation_id, recorder, correlation_logger, doctype_name)
  File "/workspace/model/microservice/document_understanding/t5/service_internal.py", line 179, in _predict
    pages = ss_serviceutil.validate_pages(opt, pages, telemetry)
  File "/workspace/model/microservice/document_understanding/semistructured/serviceutil.py", line 180, in validate_pages
    raise ServiceException(error_msg, error_code)
  util.misc.ServiceException: Service Exception 413: InvalidRequest - Payload exceeds page limit of 30 pages
Cloudflare CF-RAY:
AppId:

Hey @Dev5, you are working with the Community version, and the document you are using exceeds the page limit. Try working with a smaller document, and if you want to process data at that scale you have to work with the Enterprise version.

As you can also see in the error message, it says: "The document exceeds the maximum allowed size for a request".
cheers


I have this version. Do you think this is the problem?

Hi @Dev5

Since the Machine Learning Extractor has a 30-page limit for public ML models (custom models may support more), you need to split your PDF before processing. Use Extract PDF Page Range (from UiPath.PDF.Activities) and process the parts in batches, or, if there are any junk pages you already know about, remove them first. A minimal sketch of the splitting step follows this post.

Hope this helps!
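
A minimal sketch of that splitting step, assuming the pypdf library and a hypothetical 52-page input file (inside a UiPath workflow the equivalent is Extract PDF Page Range, not this code):

```python
# Sketch only: split a long invoice into parts that stay under the
# 30-page request limit. The file names here are hypothetical.
from pypdf import PdfReader, PdfWriter

PAGE_LIMIT = 30  # the limit reported in the 413 error

reader = PdfReader("invoice_52_pages.pdf")
total = len(reader.pages)
for start in range(0, total, PAGE_LIMIT):
    writer = PdfWriter()
    for i in range(start, min(start + PAGE_LIMIT, total)):
        writer.add_page(reader.pages[i])
    part_name = f"invoice_part_{start // PAGE_LIMIT + 1}.pdf"
    with open(part_name, "wb") as out:
        writer.write(out)  # each part can now be sent as its own request
```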

@Dev5 can you split the document into fewer pages, i.e. reduce the page count, and then try to extract the data?

cheers

so my process is:
1. Digitize Document (with OCR, which outputs the extracted text and puts it in the text variable you see in the Data Extraction Scope)
2. Classify Document Scope (to verify that it is an invoice)
3. Data Extraction Scope (the one you see in the screenshot)

So in the Data Extraction Scope it analyzes the text variable, i.e. the text extracted from the 52-page invoice, but it gives me the same problem.

@Dev5 first, before running it with 52 pages, try running it with a smaller chunk, like 10 pages, because large-scale data causes the issue. And before writing the data into Excel, use Document Validation so you can see the result and verify whether the extraction is correct (a rough mock of this loop follows this post).

By doing those two things you get two points clear:
1- the data extraction is correct
2- if the extraction is correct, then the large data set is causing the issue

cheers
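
A rough, runnable mock of that batch-and-validate loop; every function below is a hypothetical stand-in for the matching UiPath activity, not a real API:

```python
# Hedged sketch: process the invoice in small chunks and check each
# result before anything is written to Excel.

def digitize(pages):                # stand-in for Digitize Document + OCR
    return " ".join(f"text of page {p}" for p in pages)

def extract(text):                  # stand-in for Data Extraction Scope
    return {"chars": len(text)}     # dummy "extracted" field

def validate(fields):               # stand-in for Document Validation
    assert fields["chars"] > 0      # a human would review the result here
    return fields

CHUNK = 10                          # start small, as suggested above
pages = list(range(1, 53))          # the 52-page invoice

results = []
for i in range(0, len(pages), CHUNK):
    chunk = pages[i:i + CHUNK]      # never send more than CHUNK pages at once
    results.append(validate(extract(digitize(chunk))))

print(results)                      # write to Excel only after all chunks pass
```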

Then with invoices that have several pages it works perfectly, and Excel fills in with the correct invoice data; the problem is that if I process an invoice of more than 30 pages it gives me the error. In that case I could split the text string as you said, but when I split the PDF or the text string I get two parts, each with some of the data missing. How do I join them afterwards?

@Dev5 if the data is split into two parts, like var1 and var2, you can do string manipulation and concatenate them: make a new variable and assign it var1 + var2, like this:
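
A minimal sketch of that merge, where var1/var2 come from the post above and everything else is an assumed example (UiPath’s real extraction result is a richer object than a plain string):

```python
# Simple case: the two halves are plain text, so var1 + var2 is enough.
var1 = "invoice text extracted from pages 1-30 "
var2 = "invoice text extracted from pages 31-52"
merged_text = var1 + var2           # one variable holding both halves

# If each half instead yields field -> value pairs (assumed shape),
# merge per field so neither part's data is lost:
part1 = {"InvoiceNumber": "INV-001", "VendorName": "ACME"}
part2 = {"TotalAmount": "1,250.00", "DueDate": "2024-05-01"}
merged_fields = {**part1, **part2}  # part2 values win on duplicate keys
```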