Correct CSV File with RegEX

Hello,

I have an extracted CSV file from a system (which I have no control over)

The CS data extracted has commas (,) as delimiters, however, they aren’t enclosed in quotations like “,”.

The data I extract has a free text column, and the sentences there also have commas (,) in them. Because of this, when I open it in Excel, the data and column headers are mostly misaligned.

The other columns do not have commas (,) it is only this one column.

However, thankfully the two columns between the ‘free text’ column, has fixed structures (one is a datetime format, the other is a custom ID structure where it has 2 alphabets, and a 3 numbers - eg:AC123).

So the structure looks like this:
ID, Text, Date
AC123, Text, dd/mm/yyyy hh:mm:ss
(repeat)

I am trying to figure out a workflow to get UI Path to read the csv file, determine the two columns between the ‘free text’ column, and place quotation marks around the commas, and then write it over the csv file (clearing the old data, and appending the newly corrected data). And I want my workflow to do this for every data row.

The data structure should look like this after correctioin:
ID, Text, Date
AC123"," Text"," dd/mm/yyyy hh:mm:ss
(repeat)

This is so that it can be passed on to the next workflow through Excel (fixed process)

Can anyone help write the specific activities I should use? I know RegEX would help with identifying the column datas between the text data, but how do I make it so it finds the position based on the RegEx word, and get the commas (,), either before, and one for after?

(I’m quite new to RPA so a detailed explanation of how this is performed will help greatly)

EXAMPLES:
Status,ID,Title,Open Date,Close Date
Open,PO102,THIS IS WHERE THE,TEXT WILL BE,04/25/2020 01:25:10 AM,04/25/2020 03:43:50 PM
Open,PO293,THIS IS,WHERE THE,TEXT WILL BE,04/25/2020 12:22:40 AM,04/25/2020 04:43:22 PM
Open,PO029,SOMETIMES HAS NO COMMAS,04/25/2020 03:08:00 AM,04/25/2020 05:08:23 PM
Open,PO502,SOMETIMES,HAS,COMMAS,04/25/2020 03:00:00 PM,04/25/2020 05:19:25 PM

@williammm, Hi and welcome to the community

can you share a sample .csv file?

Hello SenzoD, I have put up more samples in the post, but I can’t seem to upload csv file. I copied and pasted example datas from Notepad

@williammm
(?<=\d,)([a-zA-Z\s,?]*)(?=,)

Try this Regex Pattern on the input.

1 Like

@williammm,

just you can read .csv file using read CSV activity
image
and send that String outcome in the following Regex
Generate final output data table
use build Data Table activity and build data table contain the following columns
Status , ID, Title, OpenDate, CloseDate,

Remove all the delimiters
Regex.Replace(str_scvString,","," ")

Identify the status column and ID
Regex.Replce(str_scvString,"(?<=PO\d{0,})(\b \b)",",")
Regex.Replce(str_scvString,"(\b \b)(?=PO\d{0,})",",")

Identify Open and Close date columns
REgex.Replace(str_scvString,"\d{2}/\d{2}/\d{4}\s{0,}\d{2}:\d{2}:\d{2}\s{0,}(A|P)M",",")
then use the Generate data table and parse the “str_scvString” and config the delimiter that’s it
Happy Automation

2 Likes

Thank you all. The explanations were very helpful

1 Like

This topic was automatically closed 3 days after the last reply. New replies are no longer allowed.