Tinderbox containers can sort their contents, as can agents. In addition, action code offers methods for sorting lists. Sorting generally occurs in one of two forms, lexical or numerical.
Lexical sort. A lexical sort orders characters in broadly alphanumeric order, for unaccented Roman alphabet languages like English. In fact, such sorts look at the underlying ASCII/Unicode character number and sort from lowest to highest, taking each character of a word or string in turn. Thus numbers always sort before upper case letters, and upper case before lower case letters. Accented characters come after that. This unusual order reflects the numerical sequence of the codes used to indicate different letters, symbols and numbers. This order has several odd effects:
- numbers sort out of arithmetical sequence:
1, 10, 11, 120, 13, 2 instead of
1, 2, 10, 11, 13, 120.
- instances of the same word in different letter case do not sort together:
Ant, Bee, ant not
Ant, ant, Bee.
- words with accents may sort out of sequence.
- sequential numbers in word strings do not sort sequentially:
Chapter 1, Chapter 11, Chapter 2 not
Chapter 1, Chapter 2, Chapter 11.
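The effects listed above can be reproduced in any language that compares strings by character code. The following is a minimal Python sketch, not Tinderbox action code; the function name `lexical_sort` and the sample values are hypothetical and used only for illustration.

```python
def lexical_sort(values):
    # Python compares strings one code point at a time, lowest code first,
    # which mirrors the lexical sort described in this section.
    return sorted(values)

numbers = ["2", "13", "1", "120", "10", "11"]
words = ["ant", "Bee", "Ant"]
chapters = ["Chapter 2", "Chapter 11", "Chapter 1"]

print(lexical_sort(numbers))   # ['1', '10', '11', '120', '13', '2']
print(lexical_sort(words))     # ['Ant', 'Bee', 'ant']
print(lexical_sort(chapters))  # ['Chapter 1', 'Chapter 11', 'Chapter 2']
```

In each case the result follows the character codes rather than arithmetical or dictionary order.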
Numerical sort. Only used for number sequences. Here the numerical value of each whole number string is computed and these values are sorted in ascending numerical order. Thus the order is
1, 2, 10, 11, 13, 120 not the lexical order of
1, 10, 11, 120, 13, 2.
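To make the contrast concrete, here is a small Python sketch (again an illustration, not Tinderbox action code) that sorts the same number strings by their computed value. The function name `numerical_sort` is hypothetical, and the sketch assumes every value is a whole number.

```python
def numerical_sort(values):
    # Convert each string to its integer value for comparison only;
    # the original strings are what get returned.
    return sorted(values, key=int)

numbers = ["1", "10", "11", "120", "13", "2"]

print(sorted(numbers))          # lexical:   ['1', '10', '11', '120', '13', '2']
print(numerical_sort(numbers))  # numerical: ['1', '2', '10', '11', '13', '120']
```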
Dates. Dates sort in date order, naturally. The exact form is neither strictly lexical nor numerical, but Tinderbox takes care of sorting dates correctly.
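As a rough illustration of why date strings need their own handling, the Python sketch below compares a plain lexical sort with a sort on the parsed date. It says nothing about how Tinderbox stores or compares dates internally; the function name `date_sort` and the day/month/year format are assumptions made for the example.

```python
from datetime import datetime

def date_sort(values, fmt="%d/%m/%Y"):
    # Parse each string into a real date and compare those,
    # rather than comparing the characters of the text.
    return sorted(values, key=lambda text: datetime.strptime(text, fmt))

dates = ["10/03/2021", "02/01/2022", "15/12/2020"]

print(sorted(dates))     # lexical: ['02/01/2022', '10/03/2021', '15/12/2020']
print(date_sort(dates))  # by date: ['15/12/2020', '10/03/2021', '02/01/2022']
```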
To work around some of the limitations of basic lexical sorts, as seen from a human perspective, Tinderbox also offers some 'transforms' which tweak the way sorting occurs:
- case-sensitive. Sorts upper and lower case in separate sequences, not alphabetically (a 'computer' sort). This is the default.
- case-insensitive. Converts elements to lower case before comparing them, resulting in a more normal human sort, where similar letters sort together alphabetically regardless of letter case.
- last word. Sorts on the last word of the value, using a case-insensitive sort (see above). This is useful when the target attribute values are personal names, as often occurs in bibliographies and blog rolls. The last 'word' is the substring of characters between the last space in the string and the end of the string.
- original note. Instructs the sort routine to sort any alias based on the properties of the original note, not on the intrinsic properties of the alias. When applied to attributes that are not intrinsic, the transform has no effect. This setting is useful in agents when the source of the alias is itself an alias as opposed to an original note, where sorting on $OutlineOrder would not necessarily give the expected outcome (see the Release Notes for fine detail as to why).
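The first three transforms can be thought of as different comparison keys applied to the same values. The Python sketch below illustrates that idea only; it is not how Tinderbox implements them, and the function names and sample names are hypothetical. The 'original note' transform concerns aliases and has no equivalent here.

```python
def case_insensitive_key(value):
    # Compare on a lower-cased copy; the stored value is unchanged.
    return value.lower()

def last_word_key(value):
    # Take the text after the last space (or the whole value if there
    # is no space) and compare it case-insensitively.
    return value.rsplit(" ", 1)[-1].lower()

names = ["Zoe Adams", "adam Young", "Bob clark"]

print(sorted(names))                            # ['Bob clark', 'Zoe Adams', 'adam Young']
print(sorted(names, key=case_insensitive_key))  # ['adam Young', 'Bob clark', 'Zoe Adams']
print(sorted(names, key=last_word_key))         # ['Zoe Adams', 'Bob clark', 'adam Young']
```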
Sorting in accented/non-roman text languages
Sorting in these languages may not be as expected, due to the limitations of lexical sorts, which are not aware of per-language sorting nuances without further manipulation ('collation'). This area of the application is noted as having scope for improvement, and more locale-specific collation will likely become available in due course.
So for other characters, accents, etc., sorts may not meet linguistic expectations, as the values will be sorted in Unicode order. Thus:
"dog" > "cat"
"dog" > "Dog"
"dogs" > "dog"
"dogs" > "dogma"
"dogs" < "døg" <-- NOTE!
From v6.5.0, sorting uses the prevailing locale's sorting rules for handling diacritics and accents.
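The pure Unicode-order comparisons listed above (that is, without any locale-aware collation) can be checked with a short Python sketch, since Python also compares strings by code point. This is an illustration only, not Tinderbox action code; the variable names are hypothetical.

```python
# Each pair is (greater, lesser) under plain code point comparison.
pairs = [
    ("dog", "cat"),
    ("dog", "Dog"),
    ("dogs", "dog"),
    ("dogs", "dogma"),
    ("døg", "dogs"),  # 'ø' has a higher code than any unaccented letter
]

results = [greater > lesser for greater, lesser in pairs]
print(results)

# The code numbers behind the last comparison.
print(ord("o"), ord("s"), ord("ø"))
```

Every comparison holds because 'ø' (code 248) sits above the whole unaccented lower case range, which is why "døg" lands after "dogs" rather than next to "dog".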
Sorting and Lists/Sets
The discrete values are sorted so that they are listed in lexical sort order.