Problem with Data Extraction Scope

The flow I’m working on extracts data from an invoice and creates an Excel file with that data, but as soon as I give it a 30-page invoice it breaks everything. Can you give me a hand?


this is the error:
Data Extraction Scope: Request CorrelationId: 0e8175fd-a004-46d5-86fb-1ef89771ead6
Request PredictionId: EIAgrs13frvKoVX4qdTldQGRVFEtLGKekqqdvkMihw8=_7eb7aa7d-0a37-49e9-ba5c-68497bd4eae8
The document exceeds the maximum allowed size for a request
Http Response Code: 413
Http Response Content: {"info": "Prediction Failed", "reason": "Service Exception 413: InvalidRequest - Payload exceeds page limit of 30 pages"}
Stack trace from the response:
  File "/workspace/model/microservice/main.py", line 53, in predict_with_metadata
    response, telemetry = service_internal.predict(json_data, doctype_name=model_name or self.doctype_name)
  File "/workspace/model/microservice/document_understanding/t5/service_internal.py", line 112, in predict
    response, telemetry = _predict(request_json, correlation_id, recorder, correlation_logger, doctype_name)
  File "/workspace/model/microservice/document_understanding/t5/service_internal.py", line 179, in _predict
    pages = ss_serviceutil.validate_pages(opt, pages, telemetry)
  File "/workspace/model/microservice/document_understanding/semistructured/serviceutil.py", line 180, in validate_pages
    raise ServiceException(error_msg, error_code)
  util.misc.ServiceException: Service Exception 413: InvalidRequest - Payload exceeds page limit of 30 pages
Cloudflare CF-RAY:
AppId:

Hey @Dev5, you are working with the Community version, and the document you are using exceeds the page limit. Try working with a smaller document, and if you want to process data at that scale you have to work with the Enterprise version.

As you can also see in the error message, it says: "The document exceeds the maximum allowed size for a request".
cheers


I have this version. Do you think this is the problem?

Hi @Dev5

Since the Machine Learning Extractor has a 30-page limit for public ML models (custom models may support more), you need to split your PDF before processing. Use Extract PDF Page Range (from UiPath.PDF.Activities) and process the parts in batches, or, if there are any junk pages you already know about, remove them first. A minimal sketch of the splitting step follows this post.

Hope this helps!
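
A minimal sketch of that splitting step, assuming the pypdf library and a hypothetical 52-page input file (inside a UiPath workflow the equivalent is Extract PDF Page Range, not this code):

```python
# Sketch only: split a long invoice into parts that stay under the
# 30-page request limit. The file names here are hypothetical.
from pypdf import PdfReader, PdfWriter

PAGE_LIMIT = 30  # the limit reported in the 413 error

reader = PdfReader("invoice_52_pages.pdf")
total = len(reader.pages)
for start in range(0, total, PAGE_LIMIT):
    writer = PdfWriter()
    for i in range(start, min(start + PAGE_LIMIT, total)):
        writer.add_page(reader.pages[i])
    part_name = f"invoice_part_{start // PAGE_LIMIT + 1}.pdf"
    with open(part_name, "wb") as out:
        writer.write(out)  # each part can now be sent as its own request
```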

@Dev5 can you split the document into fewer pages, i.e. reduce the page count, and then try to extract the data?

cheers

so my process is:
1. Digitize Document (with OCR, which outputs the extracted text and puts it in the text variable you see in the Data Extraction Scope)
2. Classify Document Scope (to verify that it is an invoice)
3. Data Extraction Scope (the one you see in the screenshot)

So in the Data Extraction Scope it analyzes the text variable, i.e. the text extracted from the 52-page invoice, but it gives me the same problem.

@Dev5 first, before running it with 52 pages, try running it with a smaller chunk, like 10 pages, because large-scale data causes the issue. And before writing the data into Excel, use Document Validation so you can see the result and verify whether the extraction is correct (a rough mock of this loop follows this post).

By doing those two things you get two points clear:
1- the data extraction is correct
2- if the extraction is correct, then the large data set is causing the issue

cheers
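
A rough, runnable mock of that batch-and-validate loop; every function below is a hypothetical stand-in for the matching UiPath activity, not a real API:

```python
# Hedged sketch: process the invoice in small chunks and check each
# result before anything is written to Excel.

def digitize(pages):                # stand-in for Digitize Document + OCR
    return " ".join(f"text of page {p}" for p in pages)

def extract(text):                  # stand-in for Data Extraction Scope
    return {"chars": len(text)}     # dummy "extracted" field

def validate(fields):               # stand-in for Document Validation
    assert fields["chars"] > 0      # a human would review the result here
    return fields

CHUNK = 10                          # start small, as suggested above
pages = list(range(1, 53))          # the 52-page invoice

results = []
for i in range(0, len(pages), CHUNK):
    chunk = pages[i:i + CHUNK]      # never send more than CHUNK pages at once
    results.append(validate(extract(digitize(chunk))))

print(results)                      # write to Excel only after all chunks pass
```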

Then with invoices that have several pages it works perfectly, and Excel fills in with the correct invoice data; the problem is that if I process an invoice of more than 30 pages it gives me the error. In that case I could split the text string as you said, but when I split the PDF or the text string I get two parts, each with some of the data missing. How do I join them afterwards?

@Dev5 if the data is split into two parts, like var1 and var2, you can do string manipulation and concatenate them: make a new variable and assign it var1 + var2, like this:
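
A minimal sketch of that merge, where var1/var2 come from the post above and everything else is an assumed example (UiPath’s real extraction result is a richer object than a plain string):

```python
# Simple case: the two halves are plain text, so var1 + var2 is enough.
var1 = "invoice text extracted from pages 1-30 "
var2 = "invoice text extracted from pages 31-52"
merged_text = var1 + var2           # one variable holding both halves

# If each half instead yields field -> value pairs (assumed shape),
# merge per field so neither part's data is lost:
part1 = {"InvoiceNumber": "INV-001", "VendorName": "ACME"}
part2 = {"TotalAmount": "1,250.00", "DueDate": "2024-05-01"}
merged_fields = {**part1, **part2}  # part2 values win on duplicate keys
```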