Parsing an unknown file type

iamthejuan · May 4, 2020, 9:19am

Hi,

Is there a better way to parse this sample string to get display_name and content? To be honest, I’m not sure what text file format this is. Thank you.

payload {
  annotation_spec_id: "4174790675084083200"
  display_name: "VIN"
  text_extraction {
    score: 1.0
    text_segment {
      start_offset: 94
      end_offset: 118
      content: "2402-0161C-E2777JLP10000"
    }
  }
}
payload {
  annotation_spec_id: "8786476693511471104"
  display_name: "Last_Name"
  text_extraction {
    score: 1.0
    text_segment {
      start_offset: 138
      end_offset: 143
      content: "DELA CRUZ"
    }
  }
}
payload {
  annotation_spec_id: "5622803508399964160"
  display_name: "First_Name"
  text_extraction {
    score: 0.9790445566177368
    text_segment {
      start_offset: 144
      end_offset: 152
      content: "JUAN"
    }
  }
}
payload {
  annotation_spec_id: "310104060474687488"
  display_name: "Middle_Name"
  text_extraction {
    score: 1.0
    text_segment {
      start_offset: 153
      end_offset: 159
      content: "CRUZ"
    }
  }
}
payload {
  annotation_spec_id: "3015078586664091648"
  display_name: "DOB"
  text_extraction {
    score: 1.0
    text_segment {
      start_offset: 174
      end_offset: 186
      content: "January 01, 1981"
    }
  }
}
payload {
  annotation_spec_id: "6338594374175162368"
  display_name: "Status"
  text_extraction {
    score: 1.0
    text_segment {
      start_offset: 200
      end_offset: 206
      content: "Single"
    }
  }
}
payload {
  annotation_spec_id: "427795785111830528"
  display_name: "Citizenship"
  text_extraction {
    score: 1.0
    text_segment {
      start_offset: 219
      end_offset: 227
      content: "Spanish"
    }
  }
}
payload {
  annotation_spec_id: "153076207842230272"
  display_name: "Address"
  text_extraction {
    score: 0.9999884963035583
    text_segment {
      start_offset: 236
      end_offset: 276
      content: "INIGO BLUE EXT, BO, OBRRO ST,"
    }
  }
}
payload {
  annotation_spec_id: "9144618417003692032"
  display_name: "Precinct_Code"
  text_extraction {
    score: 1.0
    text_segment {
      start_offset: 297
      end_offset: 302
      content: "0123C"
    }
  }
}

supermanPunch · May 4, 2020, 9:29am

@iamthejuan It looks like a JSON string, but it is not in valid JSON format. However, you can treat it as a text file and use Regex methods to extract the required data, as in the link below:

ppr · May 4, 2020, 12:27pm

@iamthejuan
A Regex anchored on the property name and “:” may be sufficient.
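
A rough sketch of that anchoring idea, written in Python purely for illustration. The function name `extract_pairs` and the shortened sample are assumptions made for this example and do not come from the thread. It also assumes that every payload holds exactly one display_name followed by one content, and that the values contain no escaped quotes.

```python
import re

# Shortened copy of the sample from the question (two payloads only).
sample = '''payload {
  annotation_spec_id: "4174790675084083200"
  display_name: "VIN"
  text_extraction {
    score: 1.0
    text_segment {
      start_offset: 94
      end_offset: 118
      content: "2402-0161C-E2777JLP10000"
    }
  }
}
payload {
  annotation_spec_id: "8786476693511471104"
  display_name: "Last_Name"
  text_extraction {
    score: 1.0
    text_segment {
      start_offset: 138
      end_offset: 143
      content: "DELA CRUZ"
    }
  }
}
'''

# Anchor on the property name plus the colon, then capture the quoted value.
# The lazy ".*?" (with DOTALL) skips ahead to the next "content:" line.
PAIR = re.compile(r'display_name:\s*"([^"]*)".*?content:\s*"([^"]*)"', re.DOTALL)


def extract_pairs(text):
    """Return a list of (display_name, content) tuples (hypothetical helper)."""
    return PAIR.findall(text)


for name, content in extract_pairs(sample):
    print(name, "=", content)
```

If a payload could be missing its content field, the lazy match would pair the name with the next payload's content, so splitting on payload first would be the safer variant.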

Another approach is to repair it with RegEx replaces:

split on payload
fix the quoting (“”) and the commas to reach a target like this:

so you can process it as standard JSON
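
The split-and-repair steps above could be sketched like this in Python. This is only an illustration under assumptions: the helper name `payload_text_to_records` is made up, every scalar value is assumed to be a quoted string or a plain number, and no field name repeats inside a block (a repeated name would silently overwrite the earlier one after the JSON conversion).

```python
import json
import re

# Shortened copy of the sample from the question (two payloads only).
sample = '''payload {
  annotation_spec_id: "4174790675084083200"
  display_name: "VIN"
  text_extraction {
    score: 1.0
    text_segment {
      start_offset: 94
      end_offset: 118
      content: "2402-0161C-E2777JLP10000"
    }
  }
}
payload {
  annotation_spec_id: "8786476693511471104"
  display_name: "Last_Name"
  text_extraction {
    score: 1.0
    text_segment {
      start_offset: 138
      end_offset: 143
      content: "DELA CRUZ"
    }
  }
}
'''


def payload_text_to_records(text):
    """Turn each 'payload { ... }' block into a dict via regex repairs."""
    # Step 1: split on the "payload" keyword; each chunk keeps its "{ ... }" body.
    chunks = re.split(r'^payload\s*', text, flags=re.MULTILINE)
    records = []
    for chunk in chunks:
        chunk = chunk.strip()
        if not chunk:
            continue
        # Step 2a: nested block names, e.g.  text_extraction {  ->  "text_extraction": {
        fixed = re.sub(r'^([ \t]*)(\w+)\s*\{', r'\1"\2": {', chunk, flags=re.MULTILINE)
        # Step 2b: scalar field names, e.g.  score: 1.0  ->  "score": 1.0
        fixed = re.sub(r'^([ \t]*)(\w+):', r'\1"\2":', fixed, flags=re.MULTILINE)
        # Step 2c: add a comma when the next line starts another quoted key.
        fixed = re.sub(r'([^\s{])[ \t]*\n(?=[ \t]*")', r'\1,\n', fixed)
        # Step 3: the chunk is now standard JSON.
        records.append(json.loads(fixed))
    return records


for record in payload_text_to_records(sample):
    segment = record["text_extraction"]["text_segment"]
    print(record["display_name"], "=", segment["content"])
```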

iamthejuan · May 7, 2020, 1:22am

Thank you, I will try it.

manson_ew2 · July 25, 2021, 1:52pm

It looks like protobuf text format.
If you don’t have a schema definition, you can create your own based on the output and use the TextFormat parser from the protobuf library (or convert it to JSON, etc.).
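
For comparison, here is a small hand-rolled Python sketch that reads only the subset of the text format visible in the sample, without any schema. It is not the protobuf library's parser; with the real library you would normally pass the text and an instance of a schema-generated message class to `google.protobuf.text_format.Parse`. All names below (`parse_text_format`, `parse_fields`, `parse_scalar`) are assumptions for this example. Since there is no schema to say which fields repeat, every field name maps to a list of values.

```python
import re

# Shortened copy of the sample from the question (two payloads only).
sample = '''payload {
  display_name: "VIN"
  text_extraction {
    score: 1.0
    text_segment {
      start_offset: 94
      end_offset: 118
      content: "2402-0161C-E2777JLP10000"
    }
  }
}
payload {
  display_name: "Last_Name"
  text_extraction {
    score: 1.0
    text_segment {
      start_offset: 138
      end_offset: 143
      content: "DELA CRUZ"
    }
  }
}
'''

# Tokens: quoted strings, single punctuation marks, or bare words/numbers.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|[{}:]|[^\s{}:"]+')


def parse_scalar(token):
    """Convert one token to str, int or float; bare words stay strings."""
    if token.startswith('"'):
        return token[1:-1]  # escape sequences are left untouched in this sketch
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def parse_fields(tokens, pos):
    """Parse fields until a closing brace or the end; return (dict, next_pos)."""
    fields = {}
    while pos < len(tokens) and tokens[pos] != '}':
        name = tokens[pos]
        pos += 1
        if tokens[pos] == ':':  # the colon is optional before a nested block
            pos += 1
        if tokens[pos] == '{':
            value, pos = parse_fields(tokens, pos + 1)
            pos += 1  # skip the closing brace
        else:
            value = parse_scalar(tokens[pos])
            pos += 1
        fields.setdefault(name, []).append(value)
    return fields, pos


def parse_text_format(text):
    fields, _ = parse_fields(TOKEN.findall(text), 0)
    return fields


for payload in parse_text_format(sample)["payload"]:
    segment = payload["text_extraction"][0]["text_segment"][0]
    print(payload["display_name"][0], "=", segment["content"][0])
```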

system · July 30, 2021, 2:14am

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.
