Stream processing and parsing

Tinderbox's string processing operators are intended to help extract information from structured and semi-structured text. Such text may be hand-typed or copied from sources like email. Often, it may be imported from other programs or downloaded from web services into a Tinderbox attribute. The aim is to extract the needed information from this text.

Tinderbox does not handle true streams, i.e. continuously reading an API output in real time. But in its 'stream processing' Tinderbox behaves in the same manner. A key point is that the stream is read by moving forward, never backwards: the process essentially forgets the already processed parts of the stream. While the overall process can be run again on the same source, e.g. a note's $Text, a running stream parse cannot move or look backwards in the source ('stream').

'Stream' parsing uses 'lines' of text. Importantly, this isn't a line as seen on screen but the content between the start, each line break, and the end. In lay terms, a paragraph of text is a 'line'. Understanding this is important to following how stream parsing goes about its task.
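
The notion of a 'line' can be modelled outside Tinderbox. The following Python sketch is an illustration only, not Tinderbox action code; the sample text and variable names are assumptions made for the example. It shows that a long paragraph which wraps on screen still counts as one 'line'.

```python
# Illustrative model only: how a 'line' is delimited in stream parsing.
# The sample text is hypothetical.
text = (
    "First paragraph, long enough that it would wrap on screen.\n"
    "Second paragraph.\n"
    "Third."
)

# A 'line' is the content between the start, each line break, and the end,
# so splitting on the line break yields one item per paragraph.
lines = text.split("\n")

for number, line in enumerate(lines, start=1):
    print(number, line)
```

Three 'lines' result, however many rows the first paragraph occupies on screen.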

Regular expressions. Be aware that stream processing operators do not use regular expressions (regex), except in clearly marked exceptions. If regex is needed to complete the task, either use ordinary String processing operators or insert appropriate delimiters into the text before processing.
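
The 'insert delimiters first' approach can be sketched in general terms. The Python below is a model of the idea only, not Tinderbox action code; the sample text, the date pattern, and the choice of '|' as a delimiter are all assumptions for illustration. A regex pass marks the item of interest, after which a purely literal, forward-moving search is enough to extract it.

```python
import re

# Hypothetical source text containing a date in a variable position.
text = "Invoice dated 2024-03-15 for 12 items"

# Pre-processing step: wrap anything shaped like an ISO date in '|' delimiters.
marked = re.sub(r"(\d{4}-\d{2}-\d{2})", r"|\1|", text)

# Literal-only step: no regex is needed once the delimiters are in place.
start = marked.index("|") + 1
end = marked.index("|", start)
date = marked[start:end]

print(marked)  # Invoice dated |2024-03-15| for 12 items
print(date)    # 2024-03-15
```

The delimiter should be a character that cannot otherwise occur in the text being processed.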

Processing a text as if it were a stream

Broadly speaking, the parsing approach is to begin at the start of the string and proceed, step by step, following a recipe (of chained dot-operators). For example, such a 'recipe' might say:

Read until you find a line that begins with "To:", "From:", or "Subject:"
If you find a "To:", copy every character following that up to the first space character encountered and save the copy in the current note's $Email.
If you find a "From:", copy every character following that up to the first space character encountered and save the copy in the current note's $EmailFrom.
If you find a "Subject:", get the rest of the current line and use that for the $Name of the current note.
Having found a "Subject:", delete all the headers you have processed and leave the rest of the text.
If you never find a "Subject:", do not delete anything.
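
To make the recipe above concrete, here is a minimal Python model of its logic. It is a sketch under stated assumptions, not Tinderbox action code and not how Tinderbox implements these steps: the function names are hypothetical, a dictionary stands in for the note's attributes, any space directly after the colon is skipped before copying, and 'delete all the headers' is simplified to dropping everything up to and including the "Subject:" line.

```python
def first_token(rest):
    """Skip leading spaces (an assumption), then copy up to the first space."""
    return rest.lstrip().split(" ", 1)[0]


def parse_headers(text):
    """Model of the recipe: returns (attributes, remaining text)."""
    attrs = {}
    lines = text.split("\n")
    # Move forward through the 'lines' once, never looking back.
    for index, line in enumerate(lines):
        if line.startswith("To:"):
            attrs["Email"] = first_token(line[len("To:"):])
        elif line.startswith("From:"):
            attrs["EmailFrom"] = first_token(line[len("From:"):])
        elif line.startswith("Subject:"):
            # The rest of the current line becomes the note's name.
            attrs["Name"] = line[len("Subject:"):].strip()
            # Subject found: discard what has been read, keep the rest.
            return attrs, "\n".join(lines[index + 1:])
    # No "Subject:" was ever found, so nothing is deleted.
    return attrs, text


sample = (
    "To: a@example.com\n"
    "From: b@example.com (Bee)\n"
    "Subject: Hello there\n"
    "Body line 1\n"
    "Body line 2"
)
attrs, remaining = parse_headers(sample)
print(attrs)
print(remaining)
```

Note that a text with no "Subject:" line comes back unchanged, even if a "To:" or "From:" value was captured along the way.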

All functional string processing operators accept a string of text to be processed, called the stream in this documentation. In the majority of cases, but not all, this is likely to be a note's $Text, or an attribute/variable value based on some $Text.

Stream processors act in some way on the stream, possibly saving some data into an attribute or simply moving further forward (left-to-right), and return the unprocessed remainder (the right-most portion of the stream). That remainder may be passed to other operators, such as further chained dot-operators. For example:

$MyString.skip(22).captureNumber("MyNumber");

takes the value of $MyString, skips exactly 22 characters, and extracts a number to be stored in $MyNumber. For instance, suppose $MyString holds the string "We think there may be 1234 items":

$MyString.skip(22).captureNumber("MyNumber");

$MyNumber is 1234. But if $MyString holds the string "We think there may be 1,234 items" then $MyNumber is 1, because a comma follows the first digit after the skip operator has consumed the first 22 characters.
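
The behaviour just described can be checked with a small model. The Python below is an illustrative sketch only: the functions skip and capture_number are hypothetical stand-ins for the operators, and the model assumes a number is simply a run of the digits 0-9, which may be narrower than what Tinderbox actually accepts.

```python
def skip(stream, count):
    """Consume exactly `count` characters and return the remainder."""
    return stream[count:]


def capture_number(stream):
    """Read leading digits only; return (number or None, remainder)."""
    end = 0
    while end < len(stream) and stream[end] in "0123456789":
        end += 1
    number = int(stream[:end]) if end else None
    return number, stream[end:]


plain = "We think there may be 1234 items"
with_comma = "We think there may be 1,234 items"

print(capture_number(skip(plain, 22)))       # (1234, ' items')
print(capture_number(skip(with_comma, 22)))  # (1, ',234 items')
```

In the second case the comma ends the run of digits, so only the 1 is captured and the comma begins the unprocessed remainder.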

The parsing operators can best be understood as a series of discrete roles: