Using regular expression back-references

Using parentheses within a regular expression (regex), it is possible to set up to nine back-references from within the overall regex. These discrete sub-matches can then be used in the action connected with the query. The most common means of creating back-references is to use the operators String.contains() or String.icontains().

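Back-references can be used in actions in several contexts, each covered in its own section below: in an agent's action, within an if() statement, and within String.replace().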


From v9.5.0, if(){…} and if(){…} else {…} now restore regular expression back-references ($1, $2…) to their state prior to the next if() statement.

IMPORTANT: The examples below are not intended to teach how regular expressions work but simply to illustrate how back-references are used once created.

Referring to a back-reference

A back-reference is referred to via a $-prefixed number, $0 through $9. The back-reference $0 always refers to the whole matched string (or sub-string) for the stated query regex, i.e. it may match all or part of the target string. $1 to $9 refer to any defined back-references within the overall regex, as discussed in the examples below.

Back-references are numbered in the order created. The order is normally left to right, following the position at which each opening parenthesis occurs (note this allows for nesting). To understand that process better, read up on regular expression back-references.

Do back-references need quoting? No. If $MyString is "This or that", the following results in a value of "This and that":



$MyString=$MyString.replace("(^.+)or(.+$)", $1+"and"+$2); 

Back-references 1: in an agent context

This is an example of an agent query designed to create back-references that can then be used in the agent's action:

query: $Text.contains("email: (\w+([,| |-]*\w*)*)\<([^>]+)\>, on (\d+/\d+/\d+)")

action: $FullName=$1; $Email=$3

The action will set the value of attributes $FullName and $Email using the back-references to regex matches found in the currently focused note's $Text (strictly, the $Text of the note's alias, as this is an agent). So, for a worked example, if the $Text was:

	Project X
	Brief discussion to finalise resources allocation
	Source email: John Doe<johndoe@example.com>, on 24/03/2010
	Follow up actions: Bob, Mary.

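…then the above query would give the following back-references:

	$0: email: John Doe<johndoe@example.com>, on 24/03/2010
	$1: John Doe
	$2: a sub-match from the nested parentheses inside $1 (not used in the action)
	$3: johndoe@example.com
	$4: 24/03/2010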
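The back-references for this worked example can be cross-checked with a short Python sketch. This is an illustration only: Python's re module is not Tinderbox, its regex engine may differ in details from the one Tinderbox uses, and the variable names are made up for the example. Here match.group(n) plays the role of $n.

```python
import re

# Illustrative stand-in for the note's $Text in the worked example.
text = (
    "Project X\n"
    "Brief discussion to finalise resources allocation\n"
    "Source email: John Doe<johndoe@example.com>, on 24/03/2010\n"
    "Follow up actions: Bob, Mary.\n"
)

# The same pattern as the agent query above.
pattern = r"email: (\w+([,| |-]*\w*)*)\<([^>]+)\>, on (\d+/\d+/\d+)"

match = re.search(pattern, text)

whole = match.group(0)      # like $0: only the matched part of the text
full_name = match.group(1)  # like $1, which the action stores in $FullName
email = match.group(3)      # like $3, which the action stores in $Email
date = match.group(4)       # like $4, created but not used by the action

print(whole)
print(full_name, email, date)
```

Group 2 is deliberately left out: it comes from a repeated, nested pair of parentheses, and what such a group finally holds can vary between regex engines.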

Back-references 2: using if(){}

Using the same examples as above, an if() usage might look like this (the line breaks are not significant and only for clarity of reading here):

	if($Text.contains("Emailed by: (\w+([,| |-]*\w*)*)<([^>]+)>, on (\d+/\d+/\d+)")){
		$FullName=$1; $Email=$3
	}

In this method the if() operator holds the query and generates the back-references. These can be used anywhere within, but only within, the operator's { } curly braces enclosing the action code. The back-references could be used in the else { } branch, but the nature of the overall usage (i.e. for back-reference generation) means this is unlikely.

See if() for further back-reference usage examples.

Back-references 3: using string.replace()

The purpose of string.replace() is to replace part of an existing string attribute value. The operator can be thought of in terms of $SourceDataString.replace("query","return string"), where the "return string" might be one or more back-references from the query and may include string literals.

For example, assume $MyString has the value of "AABBCC", from which it is desired to make a value of "BB". Essentially this means deleting all the non-'B' characters. This can be done by capturing the 'B's in a back-reference and using that to replace the original value. Thus to replace the original $MyString value:

$MyString = $MyString.replace(".*(BB).*","$1"); 

Note the $1 back-reference must be inside quotes for the second argument to work. Alternatively, the altered string can be saved to a different attribute, leaving $MyString unchanged:

$AnotherString = $MyString.replace(".*(BB).*","$1"); 

The back-references created here cannot be used except in the second input ('replacement') argument. Clearly, the applications for using string.replace() are far more limited than when using an if() statement.
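For comparison, the same capture-and-replace idea can be sketched in Python. This is an analogy rather than Tinderbox code: Python's re.sub() writes the back-reference in the replacement as \1 where the examples above write "$1", and the variable names simply mirror the attributes used above.

```python
import re

my_string = "AABBCC"

# The pattern matches the whole string but captures only "BB" in group 1;
# the replacement then keeps just that captured group.
another_string = re.sub(r".*(BB).*", r"\1", my_string)

print(another_string)  # BB
print(my_string)       # AABBCC (the source value is left unchanged)
```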

See String.replace() for further examples of use of back-references within an action context.

Nesting back-references

Back-references may be nested inside one another (as seen in the opening example above):

Query: $Name.contains("(a(ard))v(ark)")

Action: $MyString =$1; $MyStringA = $2; $MyStringB = $3;

For the matched note the 3 attributes set by the action will hold, in order, "aard", "ard" and "ark". This shows back-references are numbered in the order encountered running left to right and not by some other system such as the level of nesting.
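The numbering rule can be checked with a small Python sketch, assuming a note whose $Name is "aardvark". Python is used here only as a convenient way to run the pattern, and the variable names are illustrative counterparts of the attributes in the action above.

```python
import re

match = re.search(r"(a(ard))v(ark)", "aardvark")

# Groups are numbered by the position of their opening parenthesis,
# so the outer group is 1 and the group nested inside it is 2.
my_string, my_string_a, my_string_b = match.groups()

print(my_string, my_string_a, my_string_b)  # aard ard ark
```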

Literal parentheses

Literal parentheses in regexes must be escaped by a backslash. To match "this (that) other", use:

$Text.contains("this \(that\) other") 

To capture "(that)" as back-reference $1:

$Text.contains("this (\(that\)) other") 
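A brief Python sketch of the difference between escaped and unescaped parentheses, on the assumption that the text being searched is exactly "this (that) other". Python's re module stands in for Tinderbox here, and the variable names are hypothetical.

```python
import re

text = "this (that) other"

# Unescaped parentheses only group; this pattern looks for "this that other".
unescaped = re.search(r"this (that) other", text)

# Escaped parentheses match literally; the outer pair captures them.
escaped = re.search(r"this (\(that\)) other", text)
captured = escaped.group(1)

print(unescaped)  # None
print(captured)   # (that)
```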

Sometimes parentheses are needed to achieve the right match, e.g. in the agent example shown earlier, but the back-reference they create captures nothing of use. Do not worry about that: there is no need to use every back-reference created.

What is the role of $0?

$0 is always the whole matched (sub-)string for the stated attribute value, but if the regex creates additional back-references within the query then $1 through $9 may be used to access those additional match sub-strings.

In the case above, $0 is not all of the current note's $Text, the overall source for the query. Rather, it is all the text matched within $Text by the regex in the '.contains("regex")' operator.

Often, the regex matches the entire source so $0 returns the whole source text. The structure of the example above is deliberate, so as to show that $0 attaches to the regex's match rather than simply being the entire text passed to the regular expression.

Do not worry too much about getting the right number. If new to this sort of work and using a regex with several back references, you are strongly advised to try it in a small test file first. This makes it easier to make sure:

Fetching all back-references

From v9.6.0, the %matches operator returns a list of all populated references in order $0 to $9. Thus if a query populates 3 back-references within the overall match, then %matches returns a List of $0, $1, $2, and $3.
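The shape of such a list can be sketched in Python. The helper all_matches() is hypothetical and only imitates the behaviour described above, and returning an empty list when nothing matches is an assumption of this sketch, not documented Tinderbox behaviour.

```python
import re

def all_matches(pattern, text):
    """Hypothetical helper: list the whole match, then each group in order."""
    match = re.search(pattern, text)
    if match is None:
        return []  # assumption for this sketch only
    return [match.group(0), *match.groups()]

# Three back-references inside the overall match give four list items,
# corresponding to $0, $1, $2 and $3.
result = all_matches(r"(a(ard))v(ark)", "aardvark")

print(result)  # ['aardvark', 'aard', 'ard', 'ark']
```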

Returning the match offset position (dot operators only)

If the regex used with the contains() family of dot-operators (e.g. String.contains()) is found, the function returns the match's offset+1, where offset is the distance from the start of the string to the start of the match. Formerly, .contains() returned true if the regex was found. The '+1' modifier ensures that a match at position zero returns a number higher than zero, as zero would otherwise coerce to false. Since offset+1 always coerces to true, no changes are required in existing documents, but the function now also gives usable offset information, albeit requiring adjustment for use with zero-based indices such as List.at() or String.substr().
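The reasoning behind the '+1' can be illustrated with a Python sketch. The contains() function below is a hypothetical stand-in written for this example, not the Tinderbox operator, and returning 0 for a failed match is an assumption chosen so that failure reads as false.

```python
import re

def contains(text, pattern):
    """Hypothetical stand-in: offset + 1 if the pattern is found, else 0."""
    match = re.search(pattern, text)
    return match.start() + 1 if match else 0

at_start = contains("aardvark", "aard")  # match begins at offset 0
later = contains("aardvark", "v(ark)")   # match begins at offset 4
missing = contains("aardvark", "zebra")  # no match

# A match at offset 0 still tests as true because of the +1.
print(at_start, bool(at_start))  # 1 True
print(missing, bool(missing))    # 0 False

# Subtract 1 again before using the value as a zero-based index.
tail = "aardvark"[later - 1:]
print(tail)  # vark
```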