How can I copy the following text from Word into my Excel file?
I need to copy the numbers indicated in the image into the “ID” section of my Excel file, and their respective texts into the “I WANT” section.
Attached screenshots:
The Excel file into which I want to add that information, in both “ID” and “I WANT”.
I’m going to draft the concept here. It can be fine-tuned depending on how far these steps can be simplified.
If your input is an image, I would scrape the entire text into a single paragraph.
The challenge will be the quality of the text read from the image.
If your input is a bulleted text list, this should be much less of an issue.
Assuming you have the text in a paragraph:
Split it into single lines, or read the text line by line.
If it is not already split into lines, split it into an array using the line break as the splitting character.
Next, loop through the collection and read each line within an Excel Application Scope:
Use RegEx to find the position of the first letter; this is where the text starts after the bullet number. For example, in "1.1. Login" the first letter is L.
The reason I suggest RegEx is that your bullet numbering varies with the nesting level of your text.
If you prefer not to use RegEx, the other way is to find the last occurrence of the period “.” character, which usually comes just before the bullet text begins. This may not be reliable if your bullet text contains periods.
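To make the two options concrete, here is a minimal sketch in Python rather than in the workflow itself. The function names are hypothetical and only illustrate the idea: one splits at the first letter found by a regular expression, the other splits after the last period.

```python
import re

def split_at_first_letter(line):
    # Position of the first alphabetic character; None if the line has no letter.
    m = re.search(r"[A-Za-z]", line)
    if m is None:
        return None
    return line[:m.start()].strip(), line[m.start():].strip()

def split_at_last_period(line):
    # Alternative: split right after the last "." in the line.
    pos = line.rfind(".")
    if pos == -1:
        return None
    return line[:pos + 1].strip(), line[pos + 1:].strip()

print(split_at_first_letter("1.1. Login"))
print(split_at_last_period("1.1. Login"))
# A period inside the bullet text moves the split point to the wrong place.
print(split_at_last_period("1.2. Enter e.g. a name"))
```

Both functions agree on "1.1. Login", but the last-period version cuts "1.2. Enter e.g. a name" after "e.g.", which is the reliability problem mentioned above. The first-letter version has its own limit: it would misplace the split if a bullet text started with a digit.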
Once you have the position:
Read anything before this position as the bullet number
Read anything from this position onward as the bullet text
Use Write Cell to write the bullet number to cell A{X}
Use Write Cell to write the bullet text to cell E{X}
Increment counter {X} by 1 to move to the next line
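The loop above can be sketched roughly as follows, again in Python for illustration only. A plain dictionary stands in for the Write Cell activity, and the starting row of 2 is an assumption (a header in row 1); build_cells is a hypothetical name.

```python
import re

def build_cells(lines, start_row=2):
    # Maps a cell address such as "A2" to the value that would be written there.
    cells = {}
    row = start_row  # plays the role of the counter {X}
    for line in lines:
        line = line.strip()
        m = re.search(r"[A-Za-z]", line)
        if not line or m is None:
            continue  # skip blank lines and lines without any text
        cells[f"A{row}"] = line[:m.start()].strip()  # bullet number
        cells[f"E{row}"] = line[m.start():].strip()  # bullet text
        row += 1  # move to the next row
    return cells

sample = "1. Access\n1.1. Login\n1.1.1. Reset password"
cells = build_cells(sample.splitlines())
for address, value in cells.items():
    print(address, "=", value)
```

In the real workflow each dictionary entry would correspond to one Write Cell call inside the Excel Application Scope.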
I set something up quickly. It picks up from the point where you have the text in a string. In my example, I mocked up the text by writing it to a Notepad file.
You can see the output is as you expect. The bullets are in column A and the text goes into column E. The solution covers different kinds of bullets, provided they use periods as separators. Other kinds of bullets will need modifications to the code.
File.ReadAllLines() is a way to read multi-line text into an array. Each line of text becomes one element of the array.
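For readers who want to try the idea outside the workflow, a rough Python equivalent of reading a text file into an array of lines might look like this. The file name bullets.txt and the helper read_all_lines are made up for the example, and a temporary folder is used so the sketch runs on its own.

```python
from pathlib import Path
import tempfile

def read_all_lines(path):
    # Read the whole file and return one list element per line of text.
    return Path(path).read_text(encoding="utf-8").splitlines()

with tempfile.TemporaryDirectory() as folder:
    # Mock up the input the same way: write the bullets to a text file first.
    sample_file = Path(folder) / "bullets.txt"
    sample_file.write_text("1. Access\n1.1. Login\n", encoding="utf-8")
    lines = read_all_lines(sample_file)

print(lines)
```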
matchBullet is a regular expression string that identifies numbers separated by periods in each bulleted line. The details are beyond the scope of this conversation.
Simply put, if your line is “1.2.1. This is my Sample Bullet”, then the expression in matchBullet identifies just the "1.2.1. " portion of your bulleted text.
regexBullet prepares that expression for use in code, because matchBullet by itself is simply a string and can’t do much.
In the Body, you pick each line and run it through the regular expression. The function regexBullet.Match returns only the first match it encounters in the line. From the above example it returns "1.2.1. " into the bulletMatch variable.
Next, you pull that bullet number into a string variable, sBullet.
Lastly, a convenient way to get the bullet text is to use the bullet number. The text is the portion of the string (the substring) that follows the bullet. In other words, counting from zero, the starting position of the bullet text equals the length of the bullet number.
Consequently, the substring function pulls the “This is my Sample Bullet” portion into a variable named sBulletText.
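Here is the same match-then-substring idea as a small Python sketch. The actual matchBullet expression is not shown in this thread, so the pattern below is only an assumption of what such an expression could look like; the variable names mirror the ones described above.

```python
import re

# Assumed pattern (not the original matchBullet): one or more
# "digits followed by a period" groups at the start, then optional spaces.
match_bullet = r"^(\d+\.)+\s*"
regex_bullet = re.compile(match_bullet)  # the string alone cannot match anything

def split_bullet(line):
    bullet_match = regex_bullet.match(line)
    if bullet_match is None:
        return "", line  # no bullet number found; treat the whole line as text
    s_bullet = bullet_match.group(0)
    # The text starts where the bullet ends: at index len(s_bullet).
    s_bullet_text = line[len(s_bullet):]
    return s_bullet, s_bullet_text

print(split_bullet("1.2.1. This is my Sample Bullet"))
```

With this assumed pattern the trailing space is part of the bullet, so you may want to trim it before writing it to the cell.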
I hope this helps you get a bit more clarity on how the code works.