Operator Type: Function
Operator Scope of Action: Item
Operator Purpose: Data manipulation
Operator First Added: 5.7.1
Operator Altered:
String.split("pattern")
This operator splits a string into a list, as divided by instances of pattern in the original string. Source characters forming part of pattern are not passed to the list. The source string itself is not affected.
pattern is one of:
- an action code expression (which includes just referencing a single attribute name)
- a quoted string; quoted strings may be either:
  - a literal string (i.e. actual text)
  - a regular expression
Useful pattern values are:
- "\W+". This splits the source at word boundaries, removing spaces and punctuation.
- "\n". This divides the string into discrete paragraphs, ignoring blank lines and/or lines/paragraphs with only spaces but no textual content.
- "\.". This divides on sentences ending with a period; it will strip the terminating punctuation.
- "[\.\?\!]". As above, but the sentence may end with any of full stop, question mark or exclamation mark.
The result of the operator is a List-type attribute value, i.e. the data should be passed to a list. Passing the output to a Set-type attribute will de-dupe any list values in the output with the first instance of any duplicates forming its set entry.
For example:
$MyList = "ant bee ant cow".split(" ")
gives "ant;bee;ant;cow"
$MySet = "ant bee ant cow".split(" ")
gives "ant;bee;cow"
$MyList = "ant, bee, cow".split("\W+ ")
gives "ant;bee;cow"
$MyList = "ant, bee, cow".split(" ")
gives "ant,;bee,;cow"
$MyList = $MyString.split($MyString(agent))
$MyList = $MyString(parent).split("and")
If the string, stored in $MyString, is multi-line:
ant
bee
cow
…then:
$MyList = $MyString.split("\n")
gives "ant;bee;cow".
This approach can be useful if trying to retrieve a specific paragraph of $Text, perhaps from notes exploded from a larger consistently formatted text source. To get a string holding just paragraph #3 of the source $Text (or other multi-line string data):
$MyString = $Text.split("\n").at(2)
Don't overlook the fact that List.at() is zero-based. That means the first list item is .at(0), so the third list item is '2' and not '3' as might otherwise be assumed. The last item is '-1':
$MyString = $Text.split("\n").at(-1)
There is one limitation of this approach to working with $Text or multi-line strings. The issue is that blank lines, or lines with only spaces, are ignored: lists don't hold 'empty' items. So if the string $MyString is multi-line and contains blank lines, like so:
ant

bee
cow
…then:
$MyList = $MyString.split("\n")
still gives "ant;bee;cow".
It doesn't matter if the blank is just two successive line returns or actually contains some white space - no list item is created for it.
Luckily there is a simple workaround: seed empty lines with a single hyphen (or whatever placeholder you prefer, e.g. "N/A" or such). Thus:
$MyList = $Text.replace("\n\n","\n-\n").split("\n");
…now gives $MyList "ant;-;bee;cow" such that "bee" is still paragraph #3 of the new list, as in the original text. If you wanted to make a deliberate review of such data you might use a more distinctive marker string:
$MyList = $Text.replace("\n\n","\n#####\n").split("\n");
You could then query for $MyList.contains("#####").
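As a sketch of how such a review might run, the lines below seed blank lines with the marker, split the text, and then flag the note if the marker appears in the resulting list. $MyBoolean is a hypothetical user Boolean attribute. The "\n\s*\n" pattern is an assumption: it relies on .replace() treating its first argument as a regular expression, so that lines holding only white space are seeded as well. A run of several blank lines would collapse to a single marker, so paragraph positions are only preserved where blank lines occur singly. Test against your own data before relying on it:
$MyList = $Text.replace("\n\s*\n","\n#####\n").split("\n");
if($MyList.contains("#####")){$MyBoolean = true;}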
Dealing with inline quote characters
Because pattern is parsed for regular expressions, it may be possible to use the '\dnn' form to work around the lack of escaping for single or double quotes within strings.
Dealing with inline semi-colons
As this function outputs a list, in which values are semicolon-delimited, any semicolons in the source string (such as $Text) act as extra (unexpected!) splits when viewing the outcome. To get around this, escape the semicolons on the fly:
$MySet = $Text.replace(";","\\;").split("\n")
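The semicolon escape and the blank-line placeholder described earlier might be combined in a single expression. This is an untested sketch that assumes chained .replace() calls are applied left to right; the order shown (escape semicolons first, then seed blank lines) is a choice rather than a requirement, since neither replacement introduces text that the other would match:
$MyList = $Text.replace(";","\\;").replace("\n\n","\n-\n").split("\n");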