Using parentheses within a regular expression (regex), it is possible to set up to nine back-references within the overall regex. These discrete sub-matches can then be used in the action associated with the query. The most common way of setting a regex with back-references is with the operators String.contains() or String.icontains().
Back-references can be used in actions in several contexts:
- Most obvious is in an agent's action (referencing the agent's query).
- Within general action code:
  - with the if(query){actions} conditional action, back-references for the conditional query can be used within any action(s) enclosed within the operator's {} brackets, in both the 'if' and 'else' branches.
  - although not formally a query, the action String.replace(regex,replacement) can set back-references in its first (regex) input argument. The regex can be a literal string or a regular expression and, if the latter, the back-references it creates can then be used within the operator's second (replacement) argument.
Exceptions:
- Although Macros use the same back-reference style of notation for inserting content, in that case the values are drawn from the macro's input arguments rather than from a regex.
- Although the find() action operator uses a query, it does so only to return the paths of matching notes and therefore does not support regex back-references.
From v9.5.0, if(){…} and if(){…} else {…} restore regular expression back-references ($1, $2…) to their state prior to the if() statement.
IMPORTANT: The examples below are not intended to teach how regular expressions work, but simply to illustrate how back-references are used once created.
Referring to a back-reference
A back-reference is referred to via a $-prefixed number, $0 through $9. The back-reference $0 always refers to the whole matched string (or sub-string) for the stated query regex, i.e. it may match all or part of the target string. $1 to $9 refer to any back-references defined within the overall regex, as discussed in the examples below.
Back-references are returned (i.e. numbered) in the order created. The order is usually left to right, in the order the opening parentheses occur (note this allows for nesting), but to understand that process better, read up on regular expression back-references.
Do back-references need quoting? No. If $MyString is "This or that", all of the following result in a value of "This and that":
$MyString=$MyString.replace("(^.+)or(.+$)","$1and$2");
$MyString=$MyString.replace("(^.+)or(.+$)","$1"+"and"+"$2");
$MyString=$MyString.replace("(^.+)or(.+$)", $1+"and"+$2);
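Tinderbox action code only runs inside Tinderbox, so as a rough analogue the same substitution is sketched below with Python's standard re module. This is Python's syntax, not Tinderbox's: Python writes a back-reference in the replacement string as \1 or \g<1> rather than $1, and reads captured groups from a match object rather than from $1, $2.

```python
import re

my_string = "This or that"
pattern = r"(^.+)or(.+$)"

# Back-references inside the replacement string (Python uses \g<1>, not $1).
result = re.sub(pattern, r"\g<1>and\g<2>", my_string)

# The same value built by concatenating the captured groups,
# loosely mirroring the $1+"and"+$2 form above.
m = re.search(pattern, my_string)
concatenated = m.group(1) + "and" + m.group(2)

print(result)        # This and that
print(concatenated)  # This and that
```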
Back-references 1: in an agent context
This is an example of an agent query designed to create back-references that can then be used in the agent's action:
query: $Text.contains("email: (\w+([,| |-]*\w*)*)\<([^>]+)\>, on (\d+/\d+/\d+)")
action: $FullName=$1; $Email=$3
The action sets the values of attributes $FullName and $Email using the back-references from the regex match in the currently evaluated note's $Text (strictly, the note's alias, as this is an agent). So, as a worked example, if the $Text was:
Project X
Brief discussion to finalise resources allocation
Source email: John Doe<johndoe@example.com>, on 24/03/2010
Follow up actions: Bob, Mary.
…then the above query would give the following back-references:
- $0: email: John Doe<johndoe@example.com>, on 24/03/2010
  (i.e. the full matched sub-string within the source text, not necessarily all of that text).
- $1: John Doe
  i.e. the contents of the first parentheses-delimited section of code, \w+([,| |-]*\w*)* (note the nested parentheses, which deal with names of two or more words).
- $2: Empty. Nested inside $1, it serves to capture the second and subsequent words in $1. See below for more on nesting back-reference groups.
- $3: johndoe@example.com
  i.e. the contents of the third parentheses-delimited section of code, [^>]+
- $4: 24/03/2010
  i.e. the contents of the fourth parentheses-delimited section of code, \d+/\d+/\d+
- ($5 through $9 return nothing, as they have no source match defined.)
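As a rough cross-check of the numbering above, the same regex can be run against the sample text with Python's re module, used here purely as a stand-in for Tinderbox's regex engine. Group 2 is deliberately not printed, because what a repeated group retains after its last iteration can differ between regex engines.

```python
import re

text = """Project X
Brief discussion to finalise resources allocation
Source email: John Doe<johndoe@example.com>, on 24/03/2010
Follow up actions: Bob, Mary."""

pattern = r"email: (\w+([,| |-]*\w*)*)<([^>]+)>, on (\d+/\d+/\d+)"

m = re.search(pattern, text)
if m:
    print("$0:", m.group(0))  # matched sub-string only, not the whole text
    print("$1:", m.group(1))  # John Doe
    print("$3:", m.group(3))  # johndoe@example.com
    print("$4:", m.group(4))  # 24/03/2010
    print("groups defined:", len(m.groups()))  # 4, so nothing for $5-$9
```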
Back-references 2: using if(){}
Using the same example as above, an if() usage might look like this (the line breaks are not significant and are only for clarity of reading here):
if($Text.contains("email: (\w+([,| |-]*\w*)*)<([^>]+)>, on (\d+/\d+/\d+)")){
$MyString=$0;
$FullName=$1;
$Email=$3;
$StartDate=date($4);
};
In this method the if() operator holds the query and generates the back-references. These can be used anywhere within, but only within, the curly braces { } enclosing the action code. The back-references could be used in an else { } branch, but the nature of the overall usage (i.e. for back-reference generation) means this is unlikely.
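The scoping idea (read the captured values only in the branch where the query matched) can be sketched in Python as a loose analogue. The function name extract_fields is hypothetical, and parsing the date as day/month/year is an assumption based on the sample text; Tinderbox's date() follows its own locale rules.

```python
import re
from datetime import datetime

PATTERN = r"email: (\w+([,| |-]*\w*)*)<([^>]+)>, on (\d+/\d+/\d+)"

def extract_fields(text):
    """Hypothetical analogue of the if(){...} example above."""
    m = re.search(PATTERN, text)
    if m:
        # Only here, where the match succeeded, are the groups meaningful.
        return {
            "MyString": m.group(0),
            "FullName": m.group(1),
            "Email": m.group(3),
            # Assumption: day/month/year order, as in the sample text.
            "StartDate": datetime.strptime(m.group(4), "%d/%m/%Y").date(),
        }
    else:
        # No match, so there are no captured groups to use.
        return None

print(extract_fields("Source email: John Doe<johndoe@example.com>, on 24/03/2010"))
```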
See if() for further back-reference usage examples.
Back-references 3: using String.replace()
String.replace() replaces part of an existing string attribute value. The operator can be thought of in terms of $SourceDataString.replace("query","return string"), where the "return string" might be one or more back-references from the query and may include string literals.
For example, assume $MyString has the value "AABBCC" and the desired value is "BB". Essentially this means deleting all the non-'B' characters. This can be done by capturing the 'B's in a back-reference and using that to replace the original value. Thus, to replace the original $MyString value:
$MyString = $MyString.replace(".*(BB).*","$1");
Here the $1 back-reference is quoted, although, as noted above, quoting is not required. Alternatively, the altered string can be saved to a different attribute, leaving $MyString unchanged:
$AnotherString = $MyString.replace(".*(BB).*","$1");
The back-references created here can only be used in the second ('replacement') input argument, so the applications of String.replace() are far more limited than those of an if() statement.
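The same "keep only the captured part" replacement is sketched below with Python's re.sub, as an analogue only; Python writes the back-reference in the replacement as \1 where Tinderbox uses $1.

```python
import re

my_string = "AABBCC"

# The whole string matches, and is replaced by just the captured "BB".
another_string = re.sub(r".*(BB).*", r"\1", my_string)

print(another_string)  # BB
print(my_string)       # AABBCC (the source value is left unchanged)
```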
See String.replace() for further examples of use of back-references within an action context.
Nesting back-references
Back-references may be nested inside one another (as seen in the opening example above):
Query: $Name.contains("(a(ard))v(ark)")
Action: $MyString =$1; $MyStringA = $2; $MyStringB = $3;
For the matched note the 3 attributes set by the action will hold, in order, "aard", "ard" and "ark". This shows back-references are numbered in the order encountered running left to right and not by some other system such as the level of nesting.
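Python's re module numbers groups the same way, by the position of each opening parenthesis, so it offers a quick way to check the numbering of a nested pattern before using it in Tinderbox (assuming, as is usual, that both engines follow this convention):

```python
import re

m = re.search(r"(a(ard))v(ark)", "aardvark")

# Group 1 is the outer "(a(ard))", group 2 the inner "(ard)", group 3 "(ark)".
print(m.groups())  # ('aard', 'ard', 'ark')
```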
Literal parentheses
Literal parentheses in regexes must be escaped by a backslash. To match "this (that) other", use:
$Text.contains("this \(that\) other")
To capture "(that)" as back-reference $1:
$Text.contains("this (\(that\)) other")
Sometimes parentheses are needed to achieve the right match even though they capture nothing meaningful for back-reference use, as with the nested group ($2) in the agent example above. This does not matter: you do not need to use every back-reference created.
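The difference between escaped (literal) and unescaped (capturing) parentheses can be checked with Python's re module, used here as a stand-in for Tinderbox's regex engine:

```python
import re

text = "this (that) other"

# Escaped parentheses match literally and create no back-reference.
plain = re.search(r"this \(that\) other", text)

# An unescaped pair around the escaped ones captures "(that)" as group 1.
captured = re.search(r"this (\(that\)) other", text)

print(plain.groups())     # () : no back-references created
print(captured.group(1))  # (that)
```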
What is the role of $0?
$0 is always the whole matched (sub-)string for the stated attribute value, but if the regex creates additional back-references within the query, then $1 through $9 may be used to access those additional matched sub-strings.
In the agent example above, $0 is not all of the current note's $Text (the overall source for the query) but rather all the text matched within $Text by the regex in the '.contains("regex")' operator.
Often, the regex matches the entire source, so $0 returns the whole source text. The structure of the example above is deliberate, so as to show that $0 attaches to the regex's match rather than simply being the entire text passed to the regular expression.
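The distinction between $0 and the whole source text can be illustrated with Python's re module as an analogue, where group(0) plays the role of $0:

```python
import re

text = "Source email: John Doe<johndoe@example.com>, on 24/03/2010"

m = re.search(r"on (\d+/\d+/\d+)", text)
print(m.group(0))  # on 24/03/2010  (like $0: only the matched part)
print(m.group(1))  # 24/03/2010     (like $1)

# When the regex spans the whole source, $0 equals the whole text.
whole = re.search(r"^.*$", text)
print(whole.group(0) == text)  # True
```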
Do not worry too much about getting the numbering right first time. If you are new to this sort of work and are using a regex with several back-references, you are strongly advised to try it in a small test file first. This makes it easier to check:
- that the overall regex matches the right notes
- that the back-references return the right content
- which $-number refers to which extracted content
Fetching all back-references
From v9.6.0, the %matches operator returns a list of all populated references in order $0 to $9. Thus if a query populates 3 back-references within the overall match, then %matches returns a List of $0, $1, $2, and $3.
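The shape of the list described for %matches can be mimicked in Python. The helper all_matches is hypothetical and only a loose analogue: it returns [$0, $1, …] for the first match, or an empty list when nothing matches. Unlike the description of %matches, Python reports a group that took no part in the match as None rather than omitting it.

```python
import re

def all_matches(pattern, text):
    """Hypothetical helper: [$0, $1, ...] for the first match, else []."""
    m = re.search(pattern, text)
    if m is None:
        return []
    # group(0) is the whole match; groups() holds $1 onwards in order.
    return [m.group(0)] + list(m.groups())

print(all_matches(r"(a(ard))v(ark)", "aardvark"))
# ['aardvark', 'aard', 'ard', 'ark']
```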
Returning the match offset position (dot operators only)
If the regular expression (regex) used with the contains() family of dot-operators (e.g. String.contains()) is found, the function returns the match's offset+1, where offset is the distance from the start of the string to the start of the match. Formerly, .contains() returned true if the regex was found. The '+1' modifier ensures that a match at position zero returns a number greater than zero; a zero would otherwise coerce to false. Since offset+1 is always true, no changes are required in existing documents, but the function also gives usable offset information, albeit requiring adjustment for use with zero-based indices such as List.at() or String.substr().