Extract Filename from text

Good day,

Can someone assist me with code to get the filename from the following text:
“1980/01/02 02:00 4783104 AB GBB Meal Tool 19 Feb 2014 CH (2).ppt” Filename should be 4783104 AB GBB Meal Tool 19 Feb 2014 CH (2).ppt

“1980/01/02 02:00 66684416 Testing One proposal Two three Nov 2013 V7 APM .doc” Filename should be Testing One proposal Two three Nov 2013 V7 APM .doc

“2016/06/08 09:01 786927 ASSR ABCD12ER IASB IFRIC FINAL 1July15.pdf” Filename should be ASSR ABCD12ER IASB IFRIC FINAL 1July15.pdf

“1980/01/02 02:00 22528 KZN02_ Hello World.xls” Filename should be KZN02_ Hello World.xls

So the date and time (the first two columns) should be ignored, as well as the number that follows them.

Hi @martin.park

How about this Regular expression?

System.Text.RegularExpressions.Regex.Match(YourString,"(?<=\d{4}.\d{2}.\d{2}\s\d{2}.\d{2}\s\d.*)\s\S.*")

Regards
Gokul

Hi,

Can you share why the first example includes 4783104 but the others don’t include 66684416, 786927, etc.?

Regards,

This number is the file size, which is extracted when running a CMD prompt.

Hi,

All right. I think it should be removed. How about the following expression?

System.Text.RegularExpressions.Regex.Replace(yourString,"^[\d/]+\s+[\d:]+\s+\d+\s+","")
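A minimal sketch of the same pattern, ported to Python's `re` module so it can be run outside a workflow. The port, the helper name `strip_listing_prefix`, and the sample list are assumptions for illustration and are not part of the reply above. The pattern itself is unchanged: a run of digits and slashes (date), a run of digits and colons (time), and a run of digits (file size), each followed by whitespace.

```python
import re

# Same pattern as in the Regex.Replace call above:
# date column, time column, size column, each followed by whitespace.
PREFIX = re.compile(r"^[\d/]+\s+[\d:]+\s+\d+\s+")

def strip_listing_prefix(line: str) -> str:
    """Remove the leading date, time and file-size columns (hypothetical helper)."""
    return PREFIX.sub("", line, count=1)

samples = [
    "1980/01/02 02:00 66684416 Testing One proposal Two three Nov 2013 V7 APM .doc",
    "2016/06/08 09:01 786927 ASSR ABCD12ER IASB IFRIC FINAL 1July15.pdf",
    "1980/01/02 02:00 22528 KZN02_ Hello World.xls",
]

for sample in samples:
    print(strip_listing_prefix(sample))
```

Because each separator is `\s+`, this version should also tolerate more than one space between the columns.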

Regards,

Thank you. I can only test a bit later. Will this also work if the filename starts with a number, for example:
“1980/01/02 02:00 22528 1234 KZN02_ Hello World.xls” should return 1234 KZN02_ Hello World.xls

“1980/01/02 02:00 22528 5678KZN02_ Hello World.xls” should return 5678KZN02_ Hello World.xls

Hi,

That’s no problem. The expression above handles both cases.
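As a quick check, the two lines from the question can be run through the `^[\d/]+\s+[\d:]+\s+\d+\s+` pattern. This is a sketch using Python's `re` module rather than the .NET call (an assumption made so the check is easy to run). The pattern is anchored at the start and contains only one `\d+\s+` group after the time, so it stops after the file size and leaves any digits that belong to the file name.

```python
import re

# Pattern from the earlier reply, unchanged.
pattern = r"^[\d/]+\s+[\d:]+\s+\d+\s+"

# The two lines from the question, whose file names begin with digits.
lines = [
    "1980/01/02 02:00 22528 1234 KZN02_ Hello World.xls",
    "1980/01/02 02:00 22528 5678KZN02_ Hello World.xls",
]

# Only the date, time and size columns are removed from each line.
results = [re.sub(pattern, "", line) for line in lines]

for result in results:
    print(result)
```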

Regards,

Hi Martin,

Try using this expression:

System.Text.RegularExpressions.Regex.Replace(outputText, "^(\S+\s){2}\S+\s", String.Empty)

This removes everything up to and including the third whitespace character, which leaves the file name. It assumes the columns are separated by single spaces.
Try it out and let me know.
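A small sketch of how this token-based pattern behaves, again ported to Python's `re` module as an assumption (the helper name `drop_first_three_columns` is hypothetical). It shows one line with single-space separators and one with a doubled space, since `\s` without a quantifier matches exactly one whitespace character.

```python
import re

# Three non-space tokens, each followed by exactly one whitespace character.
TOKENS = re.compile(r"^(\S+\s){2}\S+\s")

def drop_first_three_columns(line: str) -> str:
    """Remove the first three space-separated columns (hypothetical helper)."""
    return TOKENS.sub("", line, count=1)

single_spaced = "1980/01/02 02:00 22528 1234 KZN02_ Hello World.xls"
double_spaced = "1980/01/02  02:00 22528 KZN02_ Hello World.xls"

print(drop_first_three_columns(single_spaced))
# With two spaces after the date the pattern does not match,
# so the line comes back unchanged.
print(drop_first_three_columns(double_spaced))
```

If the listing may contain runs of spaces between columns, changing each `\s` to `\s+` is one way to make the pattern tolerate them.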
