Lexical vs. numeric sorting

Tinderbox containers can sort their contents, as can agents. In addition, action code offers methods for sorting lists. Sorting generally takes one of two forms: lexical or numerical.

Sorting can be set via action code or, more normally, via the Sort tab of the Action Inspector.

Lexical sort. A lexical sort orders characters in broadly alphanumeric order for unaccented Roman-alphabet languages like English. In fact, such sorts look at the underlying ASCII/Unicode character number and sort from lowest to highest, comparing each character of a word or string in turn. Thus digits always sort before uppercase letters, and uppercase letters before lowercase letters. Accented characters come after that. This unusual order reflects the numerical sequence of the codes used to indicate different letters, symbols and numbers. It has several odd effects:

numbers sort out of arithmetical sequence: 1, 10, 11, 120, 13, 2 instead of 1, 2, 10, 11, 13, 120.
instances of the same word in different letter case do not sort together: Ant, Bee, ant, not Ant, ant, Bee.
words with accents may sort out of sequence.
sequential numbers in word strings do not sort sequentially: Chapter 1, Chapter 11, Chapter 2, not Chapter 1, Chapter 2, Chapter 11.
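
The effects listed above can be reproduced outside Tinderbox. The following is a minimal sketch in Python, chosen purely for illustration (it is not Tinderbox action code); Python's default string sort also compares character code numbers, so it shows the same ordering. The variable names are hypothetical.

```python
# Illustration only: Python's default string sort compares code points,
# which mirrors the lexical behaviour described above.
numbers = ["1", "2", "10", "11", "13", "120"]
words = ["ant", "Bee", "Ant"]
chapters = ["Chapter 2", "Chapter 11", "Chapter 1"]

# Strings are compared character by character, lowest code number first.
lexical_numbers = sorted(numbers)
lexical_words = sorted(words)
lexical_chapters = sorted(chapters)

print(lexical_numbers)   # ['1', '10', '11', '120', '13', '2']
print(lexical_words)     # ['Ant', 'Bee', 'ant']
print(lexical_chapters)  # ['Chapter 1', 'Chapter 11', 'Chapter 2']
```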

Numerical sort. Used only for number sequences. Here the numerical value of the whole number string is computed and these values are sorted in ascending numerical order. Thus the order is 1, 2, 10, 11, 13, 120, not the lexical order of 1, 10, 11, 120, 13, 2.
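
As a rough illustration of the difference, the sketch below (Python, for illustration only, not Tinderbox action code) sorts the same number strings twice: once by character codes and once by the computed value of the whole string. The variable names are hypothetical.

```python
# Illustration only: the same values sorted lexically and numerically.
values = ["1", "10", "11", "120", "13", "2"]

# Lexical: compare the strings character by character.
lexical_order = sorted(values)

# Numerical: compute each string's numeric value, then compare those values.
numeric_order = sorted(values, key=int)

print(lexical_order)  # ['1', '10', '11', '120', '13', '2']
print(numeric_order)  # ['1', '2', '10', '11', '13', '120']
```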

Dates. Dates sort in date order, naturally. The exact form is neither strictly lexical nor numerical, but Tinderbox handles date sorting correctly.

Transforms

To work around some of the limitations of basic lexical sorts, as seen from a human perspective, Tinderbox also offers some 'transforms' which tweak the way sorting occurs:

case-sensitive. Sorts upper and lower case in separate sequences, not alphabetically (a 'computer' sort). This is the default.
case-insensitive. Converts elements to lower case before comparing them, resulting in a more normal 'human sort' where similar letters sort together alphabetically regardless of letter case.
last word. Sorts on the last word of the value, using a case-insensitive sort (see above). This is useful when the specified attribute values are personal names, as often occurs in bibliographies and blog rolls. The last 'word' is the substring of characters between the last space in the string and the end of the string.
original note. Instructs the sort routine to sort any (alias) note based on the properties of the original note, not on the intrinsic property of the alias. When applied to attributes that are not intrinsic, the transform has no effect. This setting is useful in agents when the source of the alias is itself an alias, as opposed to an original note, where sorting on $OutlineOrder would not necessarily give the expected outcome (see the Release Notes for fine detail as to why).
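
The case-insensitive and last word transforms can be thought of as sort keys applied before comparison. The sketch below is written in Python for illustration only; it is not Tinderbox action code and does not claim to match Tinderbox's internal implementation. The function names and sample values are hypothetical, and the last-word rule assumes the word runs from the last space to the end of the string.

```python
# Illustration only: sort 'transforms' modelled as key functions.

def case_insensitive_key(value: str) -> str:
    """Lower-case the value so letter case does not affect ordering."""
    return value.lower()

def last_word_key(value: str) -> str:
    """Take the text after the last space (or the whole string if there
    is no space) and compare it case-insensitively."""
    return value.rsplit(" ", 1)[-1].lower()

words = ["ant", "Bee", "Ant"]
names = ["Jane Smith", "john adams", "Ann Lee"]

case_sensitive = sorted(words)                              # the default
case_insensitive = sorted(words, key=case_insensitive_key)  # 'human sort'
by_last_word = sorted(names, key=last_word_key)

print(case_sensitive)    # ['Ant', 'Bee', 'ant']
print(case_insensitive)  # ['ant', 'Ant', 'Bee']
print(by_last_word)      # ['john adams', 'Ann Lee', 'Jane Smith']
```

In this sketch, items whose keys are equal (such as 'ant' and 'Ant') keep their original relative order; how Tinderbox orders such ties is not stated here.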

Sorting in accented/non-Roman text languages

Results may not be as expected, due to the limitations of lexical sorts, which are not, without further manipulation ('collation'), aware of per-language sorting nuances. This area of the application is noted as having scope for improvement, and more locale-specific collation will likely become available in due course.

So for other characters, accents, etc., sorts may not meet linguistic expectations, as the comparison is based on Unicode sort order. Thus:

"dog" > "cat"

"dog" > "Dog"

"dogs" > "dog"

"dogs" > "dogma"

"dogs" < "døg" <-- NOTE!
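
The comparisons above follow directly from the Unicode code numbers of the characters involved. The sketch below checks them in Python, used for illustration only (it is not Tinderbox action code); Python compares strings by code point, without locale-aware collation. The helper name `code_points` is hypothetical.

```python
# Illustration only: plain code-point comparison, no locale collation.

def code_points(text: str) -> list[int]:
    """Return the Unicode code number of each character in the string."""
    return [ord(ch) for ch in text]

# "\u00f8" is the single character 'ø' (code number 248).
dog_slashed = "d\u00f8g"

results = {
    '"dog" > "cat"': "dog" > "cat",
    '"dog" > "Dog"': "dog" > "Dog",
    '"dogs" > "dog"': "dogs" > "dog",
    '"dogs" > "dogma"': "dogs" > "dogma",
    '"dogs" < "døg"': "dogs" < dog_slashed,
}

for label, outcome in results.items():
    print(label, outcome)  # every comparison prints True

print(code_points("dogs"))       # [100, 111, 103, 115]
print(code_points(dog_slashed))  # [100, 248, 103]
```

The last comparison is decided at the second character: 'o' has code number 111 and 'ø' has 248, so "døg" sorts after any word beginning "do", whatever follows.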

Sorting follows the prevailing locale's rules for handling diacritics and accents.

Tinderbox will use the OS's localisation settings to determine what rules apply for the sorting of accented and other characters such as ß. If it is desirable to sort using a different localisation, consider using locale() to alter the local Tinderbox environment.

Sorting and Lists/Sets

The discrete values are sorted such that they are listed in lexical sort order.
