Correct CSV File with RegEX

williammm · May 21, 2020, 1:50pm

Hello,

I have an extracted CSV file from a system (which I have no control over)

The CS data extracted has commas (,) as delimiters, however, they aren’t enclosed in quotations like “,”.

The data I extract has a free text column, and the sentences there also have commas (,) in them. Because of this, when I open it in Excel, the data and column headers are mostly misaligned.

The other columns do not have commas (,) it is only this one column.

However, thankfully the two columns between the ‘free text’ column, has fixed structures (one is a datetime format, the other is a custom ID structure where it has 2 alphabets, and a 3 numbers - eg:AC123).

So the structure looks like this:
ID, Text, Date
AC123, Text, dd/mm/yyyy hh:mm:ss
(repeat)

I am trying to figure out a workflow to get UI Path to read the csv file, determine the two columns between the ‘free text’ column, and place quotation marks around the commas, and then write it over the csv file (clearing the old data, and appending the newly corrected data). And I want my workflow to do this for every data row.

The data structure should look like this after correctioin:
ID, Text, Date
AC123"," Text"," dd/mm/yyyy hh:mm:ss
(repeat)

This is so that it can be passed on to the next workflow through Excel (fixed process)

Can anyone help write the specific activities I should use? I know RegEX would help with identifying the column datas between the text data, but how do I make it so it finds the position based on the RegEx word, and get the commas (,), either before, and one for after?

(I’m quite new to RPA so a detailed explanation of how this is performed will help greatly)

EXAMPLES:
Status,ID,Title,Open Date,Close Date
Open,PO102,THIS IS WHERE THE,TEXT WILL BE,04/25/2020 01:25:10 AM,04/25/2020 03:43:50 PM
Open,PO293,THIS IS,WHERE THE,TEXT WILL BE,04/25/2020 12:22:40 AM,04/25/2020 04:43:22 PM
Open,PO029,SOMETIMES HAS NO COMMAS,04/25/2020 03:08:00 AM,04/25/2020 05:08:23 PM
Open,PO502,SOMETIMES,HAS,COMMAS,04/25/2020 03:00:00 PM,04/25/2020 05:19:25 PM

SenzoD · May 21, 2020, 1:56pm

@williammm, Hi and welcome to the community

can you share a sample .csv file?

williammm · May 22, 2020, 2:04am

Hello SenzoD, I have put up more samples in the post, but I can’t seem to upload csv file. I copied and pasted example datas from Notepad

iVishalNayak · May 22, 2020, 8:29am

@williammm
(?<=\d,)([a-zA-Z\s,?]*)(?=,)

Try this Regex Pattern on the input.

SamanGuruge · May 22, 2020, 9:03am

@williammm,

just you can read .csv file using read CSV activity

and send that String outcome in the following Regex
Generate final output data table
use build Data Table activity and build data table contain the following columns
Status , ID, Title, OpenDate, CloseDate,

Remove all the delimiters
Regex.Replace(str_scvString,“,”," ")

Identify the status column and ID
Regex.Replce(str_scvString,“(?<=PO\d{0,})(\b \b)”,“,”)
Regex.Replce(str_scvString,“(\b \b)(?=PO\d{0,})”,“,”)

Identify Open and Close date columns
REgex.Replace(str_scvString,“\d{2}/\d{2}/\d{4}\s{0,}\d{2}:\d{2}:\d{2}\s{0,}(A|P)M”,“,”)
then use the Generate data table and parse the “str_scvString” and config the delimiter that’s it
Happy Automation

williammm · May 24, 2020, 2:54am

Thank you all. The explanations were very helpful

Topic		Replies	Views
Read CSV doesn't work as it should Activities excel , csv , activities , studio	3	121	December 13, 2024
How should I update few things on .csv File? Activities excel , csv , activities , question , update-csv	24	1471	August 30, 2022
Remving commas in my csv file Studio studio , question , activities_panel	2	828	August 31, 2022
Format csv file Help csv , activities , question	4	2188	January 3, 2020
Replace commas with dots Studio studio , question , activities_panel	5	84	July 29, 2024

Correct CSV File with RegEX

Related topics