Regular expressions (regex) are a powerful text search tool allowing very detailed yet fast text searches for what they do. They’re implemented in many languages, but this tutorial will cover setting up a regex search in VBA.
Given our usual context of helping authors edit and write faster, our eventual goal will include extending and customizing natural language voice commands in Dragon Professional advanced scripting. While it may seem like taking a hammer to a fly at first, regular expressions “easily” allow us to implement natural variation in voice command phrasing.
While examples are given, this is not a tutorial on creating general regex search patterns since that is a much larger topic than can be presented in a single introductory article, but the basic steps below should be enough to get started with applying regular expressions in other topics when using VBA in Microsoft Word, Microsoft Excel, or Dragon Professional advanced scripting.
Thanks for your interest
This content is part of a paid plan.
Using Regular Expressions in VBA
Have you ever had some nitpicky search criteria when trying to sift through some text or data—
Why?
Huh? Why’s that matter to you as a writer?
Natural language voice commands. Nuff said.
Well, I still got more to say, but you get the point. Authors aren’t big data analysts, but making our editing and writing easier will help speed things up, probably substantially. More words on the page or more content edited in the same amount of time.
So just stick with me. Imagine all the VBA editing macros on this website or in your own library at your literal call.
But let’s address several obvious questions first (skip ahead if you just want to get to the point).
Uhhh … what about Word Find?
We can do a lot with the native Word Find capability in VBA, and Find is probably faster since it’s native to Microsoft Word … but Word’s Find applies to document text ranges not text variables unless you’re doing some sort work around which this article does not cover.
There really is little reason to use regex with VBA on a Word document specifically. The advanced Find features in Word gives us a large fraction of the extra capabilities that regular expressions provide. Basically, they’re similar tools for different tasks, different targets.
Can’t we already do with Dragon Professional scripting?
Why bother? Dragon Professional already allows custom voice commands and even some natural language commands out-of-the-box, right?
Well, yeah … but not as detailed as we’ll implement in upcoming articles. So yes, Dragon Professional advanced scripting allows both custom and some natural language voice commands. They’re nice, but they won’t provide nearly as much flexibility as designing our own utilizing regular expressions.
I don’t need that …
Well, there are many things we don’t need.
Ice cream, steaks, iced tea, shrimp scampi, stuffed mushrooms—
Huh? Oh, sorry.
But I sure like it when I got ‘em.
Verbal expression is quite a bit different than our typed output. Even for seemingly simple voice commands, the words sometimes come out our mouths differently even when we mean the same thing.
Redeemed honor … an author’s story (partial sarcasm alert!)
For example, I tried to show my wife an early natural language command I had implemented with Dragon’s advanced scripting editor since I had just recently repurchased the program after many years. I had worked on it for a while and probably said the command more than a dozen times while getting it to work.
It was ready! My wife would be amazed. The crowd. The fanfare. The sun on my face.
But as soon she arrived, the command came out of my mouth differently. I can’t say “wrong” because I still meant the same thing, something like “Move that down three paragraphs,” but it didn’t work.
Silence overtook the room. Shadows descended. Shame engulfed my confidence. Would she reveal my lack of programming prowess to the world?
My honor had to be redeemed.
Maintaining a flow state
I want to be able to say the command in different ways, so I don’t sabotage my workflow by having to remember a precise way to say a voice command. Okay, a little exaggeration maybe, but no one likes speed bumps either, and I’m not a robot. It’s all about making things easier and more fluid, so it “just works.”
Why VBA?
If you’re going to start parsing text, VBA probably isn’t your first language of choice other than the fact you’re already working on macros in Microsoft Word. Other languages are better suited to that task but are well beyond the scope of this blog.
However, Dragon Professional advanced scripting uses VBA like Word which is convenient for us as authors who may already be familiar with creating VBA macros. There are a few rough edges in Dragon’s advanced scripting, but the two applications use essentially the same language.
Why natural language voice commands?
Here are just a few natural language voice command examples I’ve implemented.
No select, cut, move, and paste. Not even a keyboard shortcut tapped. Just done.
The second command automatically selects the relevant content and moves it into an Extras section the end of the document while allowing me to be “sloppy” with my initial selection. That way, the text is still there if I decide I need it later (we rarely do). The macro even time stamps the moved content as a reminder of when I removed it from the main text.
You can’t do that out of the box with Dragon Professional advanced scripting. Dragon Professional is nice, but [cracks knuckles] we can make it better.
Details, details … Dragon Professional’s way?
Dragon comes with some natural language processing, but if you’re like me you just want … more and better. Plus I get annoyed with the details of how Dragon implements the command.
Ughhh.
One solution is just to create duplicates of the advanced scripts in the command browser with varied wording. That’s nice, but it doesn’t work well for some variations, and the command browser fills up faster than you realize with small variations of carbon copied voice commands.
Enter regular expressions …
I wanted to stretch the boundaries and be able to say whatever comes out of my mouth (hey! don’t go there; within reason) and have the computer “understand” me.
It’s not perfect since the script won’t actually understand human speech, but there is a lot of variability in the exact words out of our mouths. We just have to put in some up front work to create good options for our voice command.
Too bad advanced scripting can’t bring you an iced tea, but we’ll have to settle for just making our editing and writing faster.
Keyboard shortcuts are still king …
Before you think I’m doing away with macros and keyboard shortcuts in Microsoft Word. They’re still extremely well-suited for the most used commands, but many others are more suited to invoking via voice commands, especially the more obscure ones.
What are regular expressions?
So regular expressions are a general text search feature across many programming languages. They allow us to sift through text with wonderfully detailed and specific conditions. Although, the supported capabilities can vary some across implementations like in VBA.
Of course, “wonderfully detailed” can also mean “awfully annoying” to setup. I’m not trying to scare you away because it’s great overall, but it can be a bear to get running if your search criteria are uber specific or overly vague.
Setting up the regex variable
We first need to tell VBA we’re using VBScript regex for our search.
Accessing the Microsoft RegEx library
We’ll need a separate library called Microsoft VBScript Regular Expressions 5.5.
Open the relevant Dragon advanced script (see the previous article for more explanation) and press Alt+Enter. Then scroll down and click the checkbox beside the library name. Alternatively, in the Word VBA Editor select Tools → References in the menu (Windows). It will look something like this in whether accessed in the Word VBA Editor or a Dragon Professional script.
Unfortunately, In Dragon Profession you have to enable this for each script where you use regex in Dragon’s advanced scripting, but I usually copy my scripts over when creating a new command, so that isn’t much of a problem. Or you can create your regex searches in the Word VBA editor. It's not quite a 50/50 choice as far as ease of use, but it is available in both applications.
Declaring the RegEx variable
We need to tell VBA we’re using a regex search variable. Recall we declare variables in VBA using “Dim” followed by a variable name then “As” with a data type.
This is called “early binding” because we’re telling VBA ahead of time what data type we need. It is my preferred approach because it is clear and specific.
We have to use Set because a RegExp variable is an object not just a value like 0 or True. Similarly, the “New” keyword is required to create the regex object variable which we’ll use below. Otherwise, our MySearch variable would literally refer to Nothing until it was created.
Using an Object
However, you can also be vague with the variable type if you prefer.
This is called “late binding”. An “Object” is a generic data type which can refer to any type of object in VBA. You're basically telling VBA you don’t know the type but figure it out when I tell you which object to create.
Then you use CreateObject to create a specific RegEx object from the VBScript library. At this point, VBA can determine what type it should use for the MySearch variable.
The advantage with this approach is we don’t have to know specifically what data type to use. Disadvantages include we won’t get any help from the VBA editor on available variable data (properties) or actions (methods) as we create the script.
I don’t like being vague with my variable types unless there is a good reason.
Setting the search criteria
There are three basic parameters we need to set up.
Global peace (or searches)
Do we want to find only the first match or all of them? That is controlled by a “Global” True-False (Boolean) property of the search variable.
We reference the Global property with a dot . (but no spaces) after the variable name. Essentially, if Global is assigned a value of True, the following search should find all possible matches in the text, but False directs the search to find just the first match.
For custom voice commands, I often use non-global search because the text is so short, and I don’t expect a bunch of matches. For other uses with batches of text, global searches are probably more common.
Case sensitive
Do you want Donald and donald to be the same thing as far as the search results are concerned? If so, case insensitive searches are set by the IgnoreCase property.
Case insensitive is a little slower because it has to match either case (no pun intended).
For voice commands in Dragon Professional advanced scripting, most commands are recognized and given by Dragon in lower case text, so I usually use case sensitive just to make it slightly faster.
Exceptions are rare but might occur, for example, if you want to say “next paragraph” anywhere inside a dictation list variable for a custom command (more on that in an upcoming article).
Make a search pattern
This is where it gets “fun.”
Yeah, there be quotes there, so tighten your seat belt (a little), but it’s also where we begin to access the power of regex searches. Even though VBA regex is slightly less functional than some implementations in other languages, we can still do so much with it.
The search criteria is stored in a Pattern property which is just some plain text in double quotes. For now lets assume a case-insensitive search, and we’re looking at something simple like a name.
Note this article is not a full tutorial on regex search patterns because that’s a large topic. There are whole web pages dedicated to helping people understand, create, and debug regex search patterns. Some of them look more like ascii hieroglyphics than search-related text.
When the search is performed, the pattern first looks for the specific text “donald” without the parentheses possibly following by the last name "duck". Let's break it down.
Groups
The (parentheses) identify the text as a group (called submatches in VBA), specifically the first name here.
We can refer to identified groups later when we’re processing the search results. VBA regex doesn’t allow named groups like some other languages, but you can still refer to groups in the order included or generically using a For Each loop (see below).
What about the space?
Because of the literal space “ ” between the first and last names, the search pattern also requires a single space between them. Whole search patterns can fail to find a match if off by even a single space.
Last name
Finally, the search pattern looks for the following specific text “duck”. Again, the parentheses indicate a second group as part of a potential match, but the ? following it means this group is optional (0 or 1 matches of duck are allowed).
Thus, we can match with “donald ” or “donald Duck” or other case variations since our search is not case sensitive, but note the extra space with donald.
Optional space?
It’s a little awkward to have to match “donald ” with a space after it if there is no "duck" after it, so let’s add our first tweak.
The space should be optional if the last name is optional. There are a couple ways to do this, but the most straight forward is to explicitly identify a space.
We include a “?” to say the space is optional. That is, we allow 0 or 1 spaces in between the names. If you wish to allow tabs or line breaks between the names, any whitespace character is indicated by “\s” (a backslash followed by an s) instead like "\s?".
Another tweak …
Sorry another tweak … I can’t help myself.
Sometimes people accidentally hit the space bar more than once when typing, so what if you wanted to allow more spaces in between the names. Instead of the “?” we use an “*”.
The asterisk “*” says to allow 0 up to any number of spaces in between the names, and it will still count as a successful match for the overall search pattern. There is another way to be more specific with how many spaces you allow, but the above are easier and clearer to use.
But ooops …
This will now match “donald” (without a trailing space) or “Donald Duck” as expected, but unfortunately in this context, it also matches “Donaldduck” or “dOnAld DucK” because it allows zero or more than one space between names. Also, the search is case insensitive, so we unintentionally allow some weird capitalization combinations.
See how this can get messy real quick if you want to take care of these variations? But it’s up to you how detailed and specific you want your search pattern to be.
How many groups?
The previous name pattern included two possible groups, so a single match can contain several groups of identified text. This will be important when using the results in practice, but the number of groups depends on how many groups you specify in the pattern using parentheses.
You can also designate uncaptured groups that are matched but not tracked, but we won’t worry about that here.
A voice command example pattern ...
Here is a more complicated sample search pattern to match a natural language custom voice command in Dragon Professional advanced scripting. Don’t choke or quit after this. I’m merely showing a possibility to prove it’s more useful than the above trivialized examples imply.
I’m not explaining all the details here, but this pattern is relatively simple believe it or not. An example text match might be:
The above pattern would match the last five words of the voice command.
It also allows some natural variation in the spoken command while still identifying several relevant groups. For example, this command would also make a match with the pattern.
See the variability with a single search pattern? And they sound like regular English.
What do I do with it?
We would need some additional VBA machinery to interpret the groups of text, but the pattern is the starting point. We can then use this information to identify and run an appropriate VBA macro in Microsoft Word (see previous article for the latter part of that).
There are many other things we can add to regex search patterns like look ahead matches (VBA doesn’t support look behind), subgroups, optional characters, alternative words or phrases, excluded text, numbers, and more.
Consolidate the settings
Combine these RegEx search settings into a With statement to make them easier to read (it’s also an itsy bitsy slightly tiny bit faster).
Set up the regex search
But alas, time fails us here, so let’s set up the basics of the actual regex search in VBA.
Match variables
Now we have to do the search. We need several variables to perform the search and store the results.
Match Collection variable
The matches are stored in a match collection data type.
This variable will store all the matches found. If Global is False, there will only be one match stored in the MyMatches collection.
If you don’t like remembering what data type is necessary, you can just use “Object” and let VBA figure it out, but you know how I feel about that.
Match variable
Each match has a type Match (how profound is that?).
The variable aMatch corresponds to a single match that contains a collection of submatches.
For most of my own searches, I’m only looking for a single match, meaning the first one. I wish there was a pretty way to reference the first match of the MyMatches variable, but here goes:
Matches(0) is the first match since the Collection increments members starting from zero (called zero-indexed which is common among programming languages).
SubMatches variable (not really required)
Each match will contain zero, one, or more SubMatch(es) which include all groups identified within the single match as a collection.
In practice, we probably don’t need to declare this variable explicitly since it’s only used once in a loop (see below).
Do the regex search
Now that the regex variables are set up, we’ll do the search. We need another pair of variables to store the results.
What search text?
We need some text to search. Just to be extra clear, we’ll assume it’s stored in a string variable called SomeText meaning it is plain text without any Microsoft Word-specific or even rich-text formatting.
In practice, you’ll already have a text variable you want to search such as a voice command phrase in Dragon Professional advanced scripting.
Execute the search …
The search variable MySearch has an Execute method (action) we can use.
Notice MySearch.Execute(…) performs the search, and we’re storing the results in the MyMatches variable. Set is required because the search results include more information than a plain value like 7, True, or “abc.”
Was the search successful?
If any matches were found, the Count property of MyMatches will be positive. We check this with a conditional statement.
I wish a “Found” property existed to be consistent with the Find.Found property (which gives a True or False value) when using Microsoft Word’s Find operation with a range variable, but the above is how we check for results in VBA regex.
See this previous article if you want a brief review of conditional statements in VBA.
Loop over all matches
To my knowledge, there is no way to get the number of groups (submatches) for a match in a general VBA regex search, and we don’t want to “hard code” the number based on a specific search because it will vary.
With that in mind, it’s easier to use a For Each loop with the MySubMatches variable from earlier. Recall VBA regex refers to the groups as SubMatches.
This loops over all groups found in the MySubMatches variable. The For Each loop allows us to not know how many groups exist ahead of time.
wGroup is a Variant type because For Each loops in VBA require a generic loop variable type even if you know the specific data type of the items it loops over. As the loop iterates through each member, the wGroup variable will refer to the current submatch. It corresponds to the current matched group plain text as a string.
I like prepending my loop variables with a “w” just to imply “which” text group, but it’s not required.
Or just skip the SubMatches variable …
We could skip declaring a submatch variable and just refer to the Collection directly in the For Each loop.
This is a little easier, but both versions work.
Summary
Summarizing the above regex setup and search execution:
Using a For Each loop is an easy way to access all matches and/or all submatches (groups) within a single match.
Other possibilities
In addition to using the Execute method, once the search is set up, we could also replace any found text with other text. This example stores the replaced text back into the same variable SomeText.
We could simply validate whether a match exists using:
This results in a True value if a match exists in the SomeText variable and False if not, but it doesn’t do anything else.
Works with Word
The regex searches work inside Microsoft Word VBA, but Word’s native Find and replace operations are actually quite robust if you dig a little bit into how it works, so you probably won’t need regex for any Word document text.
Also works with Excel
All of the above should also work in Excel VBA, but Excel just recently implemented a native (in the main application) RegEx feature, so it’s probably not necessary unless you just like the freedom of working with your own VBA code rather than through the Excel interface.
Gotchas?
If you’ve read any of my articles, you might be waiting for the “what could go wrong” section.
The main issue with regex is creating and validating the search patterns before you put them into use in a real script, but let’s talk about some other potential questions.
Will regex be slow?
Even with the more advanced features of regex, surprisingly most voice command searches will take much less than a millisecond in my testing on a maxed out Surface Pro (probably equivalent to about a mid-range laptop).
I could barely time it racing through dozens of patterns each with many variations, so you’ll never, ever notice it when creating better natural language voice commands. The bulk of any noticeable command delay is in Dragon Professional’s and/or Microsoft Word’s processing.
So, this is definitely a win for us.
Dragon professional advanced scripting issues
Since the main use case in this article is setting the stage to improve natural language voice commands in Dragon Professional advanced scripting, I’ll comment on that some.
The Dragon professional advanced scripting editor is functional enough to create nice voice commands with some flexibility in their usage, but it is clunky. This is exacerbated by the fact that they do not support advanced scripting (at all). All commentary on the website refers users to contact value-added resellers for any help.
While I understand the motivation, since supporting advanced scripting would be costly given the technical aspect of it, the online documentation should be updated and modestly expanded at a minimum. Bits of the existing documentation are even contradictory. It’s probably not the best approach for software that starts at $700 and is marketed as a “professional” product as stated in the name.
While I like the capability to implement custom commands, Dragon Professional advanced scripting is awkward and needs improvement for a professional product. With that said, it’s workable, and I’ve created a stack of voice command scripts to do all sorts of editing tasks by invocation.
Working alongside Dragon’s natural language command recognition?
Dragon professional comes with some natural language command interpretation which is good if you don’t plan on adding anything, but I ended up turning it off because it conflicted with some of my own commands. Granted, Dragon Professional’s versions of some commands would do the task adequately, but I almost always want to tweak things like capitalize a word here or adjust the final position a little there.
As far as I can tell, any Dragon Professional natural language command that overlaps your own custom voice command phrasing will supersede your version. That’s completely backwards, but it’s what we have to deal with right now.
I ended up turning off Dragon Professional’s natural language commands in favor of my own, more detailed and comprehensive implementations.
Verdict?
As an author, I say buy Dragon Professional and get started. It is a powerful program overall, and the investment will easily pay for itself many times over, in my opinion. Then over time, implement some of your own natural language processing using regex (keep coming back here for help; hint, hint).
Dragon Professional’s advanced scripting is useful even if awkward in places. Quite a bit of free help will be available here, if you wish, but I also offer more advanced content for members or with paid individual consulting.
Even if you prefer to stick with the out-of-the-box natural language features in Dragon Professional. It’s still powerful and will let you work faster both in actual words on the page and editing via stock voice commands. Just be aware that your writing voice is a little different when you dictate rather than type content.