Tinderbox User's Manual: Appendix 2: Regular Expressions
When analyzing a large number of notes, you will likely find many occasions to search for notes, or to have your assistant search on your behalf. While simple searches often suffice, Tinderbox can also search for complex textual patterns called regular expressions.
Regular expression search is available in Agent queries, in the Find dialog, and in selected template codes and actions. The agent query:
Text(pattern)
is true if the attribute Text contains a string that matches the pattern. Regular expression search may be applied to any attribute:
Name(pattern)
searches only the note titles for a textual pattern.
Sometimes you may want to avoid regular expressions and search for an exact match to a specific string. To do so, use
Name="this string"
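The difference between a pattern search and an exact match can be illustrated outside Tinderbox. The following is a minimal Python sketch, not Tinderbox code: Python's re module stands in for Tinderbox's regular expression engine, and the note names are invented for the example.

```python
import re

# Hypothetical note names, invented for this illustration.
names = ["this string", "this string, extended", "another note"]

# Pattern search: true when the name contains a match anywhere.
# The "." in the pattern matches any single character.
pattern_hits = [n for n in names if re.search(r"this str.ng", n)]

# Exact match: true only when the whole name equals the string.
exact_hits = [n for n in names if n == "this string"]

print(pattern_hits)  # ['this string', 'this string, extended']
print(exact_hits)    # ['this string']
```

The pattern search finds both names that contain the text, while the exact match accepts only the name that is identical to the string.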
A number of special characters represent “wild cards” and other classes of text patterns. Complete information on Tinderbox’s regular expression engine may be found at:
http://www.boost.org/libs/regex/doc/syntax_perl.html
The most common and useful examples are described here.
The period character, “.”, matches any single character.
The plus sign, “+”, matches one or more occurrences of the expression that precedes it. The pattern
!+
will match one or more exclamation points, and the pattern
...+
will match any string with at least three characters.
An asterisk, “*”, matches zero or more occurrences of whatever precedes it;
10*
matches 1, 10, 100, 1000, and so on.
The question mark “?” matches zero or one occurrence of whatever precedes it.
You can also specify the minimum and maximum number of repetitions:
Xa{2,4}Y
will match XaaY, XaaaY, or XaaaaY, but won’t match XaaaaaaY.
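The repetition operators above can be tried out in a short Python sketch. This is an illustration only: it uses Python's re module rather than Tinderbox's engine, the helper name matches_whole is made up for the example, and the pattern colou?r is an added illustration of “?”. The helper tests whether a pattern matches an entire string, whereas a Tinderbox query succeeds when any part of the attribute matches.

```python
import re

def matches_whole(pattern: str, text: str) -> bool:
    """Return True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# "+" needs at least one occurrence of the preceding expression.
print(matches_whole(r"!+", "!!!"))             # True
print(matches_whole(r"!+", ""))                # False

# "...+" is two single characters followed by one or more characters.
print(matches_whole(r"...+", "abc"))           # True
print(matches_whole(r"...+", "ab"))            # False

# "*" allows zero occurrences, so "10*" accepts a bare "1".
print(matches_whole(r"10*", "1"))              # True
print(matches_whole(r"10*", "1000"))           # True

# "?" makes the preceding expression optional.
print(matches_whole(r"colou?r", "color"))      # True
print(matches_whole(r"colou?r", "colour"))     # True

# "{2,4}" bounds the number of repetitions.
print(matches_whole(r"Xa{2,4}Y", "XaaaY"))     # True
print(matches_whole(r"Xa{2,4}Y", "XaaaaaaY"))  # False
```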
A set of characters to be matched may be enclosed in square brackets. For example,
[0123456789]
will match any digit. Ranges of consecutive characters can be written more concisely:
[0-9]
will match any digit, and
[A-Z][a-z]*
will match any capitalized word. Beginning a set with the character “^” matches everything except the set;
[A-Z][^0-9]
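will match any capital letter that is followed by a character other than a digit. Note that the negated set still has to match one character, so a capital letter at the very end of the text does not match.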
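As a rough illustration of character sets, the sketch below runs the same three patterns with Python's re module, which is an assumption made for the example and not Tinderbox's own engine. The sample text is invented.

```python
import re

# Invented sample text for this illustration.
text = "Order A7 shipped to Boston on day 12"

# Every single digit in the text.
digits = re.findall(r"[0-9]", text)

# A capital letter followed by zero or more lowercase letters.
capitalized = re.findall(r"[A-Z][a-z]*", text)

# A capital letter followed by one character that is not a digit.
# "A7" is skipped because "7" is a digit.
not_before_digit = re.findall(r"[A-Z][^0-9]", text)

# The negated set must still consume a character, so a lone
# capital letter at the end of the text does not match.
lone_capital = re.search(r"[A-Z][^0-9]", "A")

print(digits)            # ['7', '1', '2']
print(capitalized)       # ['Order', 'A', 'Boston']
print(not_before_digit)  # ['Or', 'Bo']
print(lone_capital)      # None
```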
Several special sequences represent common sets of characters:
\w - any word character (including underscore)
\W - any non-word character
\< - the start of a word
\> - the end of a word
\s - any white space character
\d - any digit
\l - any lowercase letter
\u - any uppercase letter
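Several of these sequences also exist in other regular expression engines. The sketch below uses Python's re module as a stand-in, which is an assumption for illustration only. Python supports \w, \W, \s, and \d, but it does not support \l, \u, \<, or \>; the example approximates them with [a-z], [A-Z], and the word boundary \b, and that approximation covers unaccented ASCII letters only. The sample text is invented.

```python
import re

# Invented sample text for this illustration.
text = "Meet Eve_2 at 10 pm"

# \w+ : runs of word characters (letters, digits, underscore).
words = re.findall(r"\w+", text)

# \d+ : runs of digits.
numbers = re.findall(r"\d+", text)

# \W : single non-word characters (here, the four spaces).
non_word = re.findall(r"\W", text)

# \s+ : split the text on runs of white space.
pieces = re.split(r"\s+", text)

# Stand-in for "\<\u\l*": word start, one uppercase letter,
# then zero or more lowercase letters (ASCII only).
capitalized = re.findall(r"\b[A-Z][a-z]*", text)

print(words)        # ['Meet', 'Eve_2', 'at', '10', 'pm']
print(numbers)      # ['2', '10']
print(non_word)     # [' ', ' ', ' ', ' ']
print(pieces)       # ['Meet', 'Eve_2', 'at', '10', 'pm']
print(capitalized)  # ['Meet', 'Eve']
```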
ANCHORS
The special character "^" matches the beginning of the text or attribute being searched. When searching the text of a note, ^ matches the beginning of any paragraph in the note.
The special character “$” matches the end of the text or attribute being searched. When searching the text of a note, $ matches the end of any paragraph in the note.
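The paragraph-by-paragraph behavior of the anchors can be approximated in Python, assuming that paragraphs are separated by newline characters. In this sketch Python's re.MULTILINE flag plays the role of Tinderbox's handling of note text; the note text itself is invented.

```python
import re

# Invented note text with three paragraphs separated by newlines.
note_text = "First paragraph\nTODO: call Ann\nLast paragraph"

# Without re.MULTILINE, "^" matches only at the very start of the text.
start_only = re.search(r"^TODO", note_text)

# With re.MULTILINE, "^" and "$" also match at each paragraph boundary.
first_words = re.findall(r"^\w+", note_text, flags=re.MULTILINE)
last_words = re.findall(r"\w+$", note_text, flags=re.MULTILINE)

print(start_only)   # None
print(first_words)  # ['First', 'TODO', 'Last']
print(last_words)   # ['paragraph', 'Ann', 'paragraph']
```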
The backslash character "\" removes the special meaning from the character that follows it. Use "\\" to search for the backslash character itself.
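The effect of the backslash can be seen in a short sketch. Python's re module is used here as a stand-in for Tinderbox's engine, and the sample strings are invented. The last lines use Python's re.escape helper, which is a Python convenience and not a Tinderbox feature, to escape every special character in a literal string.

```python
import re

# An unescaped "." matches any character, so "3.14" also matches "3x14".
loose = re.search(r"3.14", "3x14") is not None

# Escaping the period makes it match only a literal ".".
strict_miss = re.search(r"3\.14", "3x14") is not None
strict_hit = re.search(r"3\.14", "pi is 3.14") is not None

# "\\" in a pattern matches one literal backslash.
has_backslash = re.search(r"\\", r"C:\notes") is not None

# re.escape builds a pattern that matches the given text literally.
literal = re.fullmatch(re.escape("1+1=2"), "1+1=2") is not None

print(loose)          # True
print(strict_miss)    # False
print(strict_hit)     # True
print(has_backslash)  # True
print(literal)        # True
```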
Grouping expressions in parentheses determines the scope of wildcards. For example,
Name((\u\l+)+)
would match “Rochester” and “SmallTalk”.
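To see how parentheses change what a repetition applies to, consider the following Python sketch. It is an approximation under stated assumptions: Python's re module has no \u or \l, so [A-Z] and [a-z] stand in for them (ASCII letters only), and the helper name matches_whole is made up for the example.

```python
import re

def matches_whole(pattern: str, text: str) -> bool:
    """Return True when the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# Stand-in for (\u\l+)+ : one or more runs of an uppercase letter
# followed by at least one lowercase letter.
camel = r"([A-Z][a-z]+)+"

print(matches_whole(camel, "Rochester"))  # True
print(matches_whole(camel, "SmallTalk"))  # True
print(matches_whole(camel, "rochester"))  # False
print(matches_whole(camel, "NASA"))       # False

# Without parentheses, "+" applies only to the character before it.
print(matches_whole(r"ab+", "abbb"))      # True
print(matches_whole(r"ab+", "abab"))      # False

# With parentheses, "+" repeats the whole group.
print(matches_whole(r"(ab)+", "abab"))    # True
print(matches_whole(r"(ab)+", "abbb"))    # False
```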
In addition, when Tinderbox sees a parenthetical expression, it remembers the substring(s) that matched it and can use those substrings in actions. For example, the agent
Query: Text(^Color: (\w+)\b$)
Action: Color=$1
scans the document for any notes that contain paragraphs like this:
Color: red
If it finds any matching notes, the agent extracts the word that follows the string “Color: ” and changes the note’s color to match. Here, $1 stands for “whatever matched the first set of parentheses”, $2 for the second set, and so forth. $0 stands for the entire matched string.
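The way a parenthesized group captures a substring can be sketched in Python. This is not how Tinderbox runs agents; it only mirrors the idea, using a pattern written to match the sample paragraph. Python's match.group(0) and match.group(1) correspond roughly to $0 and $1, the re.MULTILINE flag is assumed to play the role of paragraph anchoring, and the note text is invented.

```python
import re

# Invented note text; the second paragraph is the one of interest.
note_text = "Meeting notes\nColor: red\nFollow up on Friday"

# "^" and "$" anchor to a paragraph; (\w+) captures the color word.
match = re.search(r"^Color: (\w+)\b$", note_text, flags=re.MULTILINE)

whole = match.group(0) if match else None  # like $0: the entire match
color = match.group(1) if match else None  # like $1: the first group

print(whole)  # Color: red
print(color)  # red
```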