BM0031 PRINCIPLES OF ACCOUNTING
PRINCIPLES OF FINANCIAL MANAGEMENT/BUSINESS FINANCE
The two lines above are the different formats that I want to extract from different Word documents. How do I write the regex, and what should the input look like? It would be great to have an example to use or refer to.
Hi, my requirement is to extract module syllabi. For example, I want to extract the module name and module code shown in my previous post, which are in different formats. The name should be stored as the module name and the code as the module code; if there is no module code, it should return an empty value.
The regex must handle the different formatting of the same syllabus fields, e.g. module name and module code. The Word documents start like this:
BM0031 PRINCIPLES OF ACCOUNTING
PRINCIPLES OF FINANCIAL MANAGEMENT/BUSINESS FINANCE
These two lines are from two different Word documents. I want to write a regex that reads both patterns and stores BM0031 as the module code and the rest as the module name.
I have inputs like BM0031 PRINCIPLES OF ACCOUNTING, PRINCIPLES OF FINANCIAL MANAGEMENT/BUSINESS FINANCE, BM0523 SERVICES MARKETING MANAGEMENT, IT1528 CYBER SECURITY TECHNOLOGY, LAW AND ETHICS, IT3526 Cyber Security Attack & Defense, etc. I want to write a regex that can read all of these types.
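For header lines like the ones listed above, a minimal sketch in Python could look like the following. It assumes a module code is always two uppercase letters followed by four digits (inferred only from BM0031, BM0523, IT1528 and IT3526) and that each header sits on its own line. The names `MODULE_RE` and `parse_module_line` are made up for illustration.

```python
import re

# Assumption: a module code is two uppercase letters plus four digits.
# The code (and the whitespace after it) is optional; the rest is the name.
MODULE_RE = re.compile(r"^\s*(?:(?P<code>[A-Z]{2}\d{4})\s+)?(?P<name>\S.*?)\s*$")

def parse_module_line(line):
    """Return (module_code, module_name); the code is '' when absent."""
    m = MODULE_RE.match(line)
    if m is None:  # blank line: nothing to extract
        return "", ""
    return m.group("code") or "", m.group("name")

for header in [
    "BM0031 PRINCIPLES OF ACCOUNTING",
    "PRINCIPLES OF FINANCIAL MANAGEMENT/BUSINESS FINANCE",
    "IT3526 Cyber Security Attack & Defense",
]:
    print(parse_module_line(header))
```

Because the code group is optional, a line without a code simply comes back with an empty string as the module code. If the real codes follow a different pattern, only the `[A-Z]{2}\d{4}` part should need to change.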
“The start of the word documents have this:
BM0031 PRINCIPLES OF ACCOUNTING
PRINCIPLES OF FINANCIAL MANAGEMENT/BUSINESS FINANCE”
I see that in the second case, the module name is not preceded by a module code. If this were all the input we had, creating a regex for it would not be a problem. But your documents will contain much more text than just these names, so a generic regex will not work on its own: we first have to figure out a way to separate the module name and code from the rest of the text.
Please share a sample document which has your input.
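In the meantime, here is a rough Python sketch of one way to separate the header from the rest of the text. It assumes the module header is always the first non-empty line of the document, as described in the question, and that the document text has already been extracted into a plain string. The two-letters-plus-four-digits code pattern is also an assumption, and `HEADER_RE` and `extract_module` are hypothetical names.

```python
import re

# Assumption: optional code (two uppercase letters + four digits), then the name.
HEADER_RE = re.compile(r"^(?:([A-Z]{2}\d{4})\s+)?(.+)$")

def extract_module(document_text):
    """Parse only the first non-empty line; ignore everything after it."""
    for line in document_text.splitlines():
        line = line.strip()
        if line:
            # groups(default="") turns a missing code into an empty string
            code, name = HEADER_RE.match(line).groups(default="")
            return {"module_code": code, "module_name": name}
    return {"module_code": "", "module_name": ""}
```

If the header turns out not to be the first line in some documents, this approach will not work as is, and the sample document will be needed to find a more reliable marker.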