Very occasionally, on opening a file you may get a dialog with a message like:
Tinderbox was unable to parse this file. It may be damaged, or you may need a newer version of Tinderbox. The XML parser said : not-well formed (invalid token) (line 1232).
Whilst the exact line number stated at the end will vary, Tinderbox is telling you that the TBX file's data includes something it cannot understand at the stated line in the source XML of the document. When Tinderbox opens a file, it has to read in all the XML data stored in the TBX document file. If this data has been corrupted in some way such that it is no longer valid XML, Tinderbox cannot read past that point and gives this error message.
How might such a thing occur? The error is unusual but causes include copying/pasting text from web pages that mis-declare their encoding, such as with web pages where quote marks show as question marks (e.g. a character) or less common accented characters as pairs of random characters. It is not always possible for the Tinderbox to detect that the data it is passed is not what it declares itself to be. This can cause the data to be stored inappropriately in Tinderbox - though the effect does not tend to surface until the document is next opened - which for some users can be hours or days after the triggering event. More technically, data is saved in a form that's not intelligible to the XML parser used to read the data when opening a TBX file. Similarly, AutoFetch of data from badly-encoded pages/feeds/sources can ingest data that has the same effect as above.
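To illustrate the mechanism, here is a minimal Python sketch. It assumes the file declares itself as UTF-8 and uses Python's bundled expat parser as a stand-in; whether Tinderbox uses the same parser internally is an assumption, and the `<text>` element and the `try_parse` helper are purely illustrative. The same curly quotes parse cleanly when saved as real UTF-8, but fail when their bytes come from a different encoding (here Windows-1252) than the one declared.

```python
import xml.parsers.expat


def try_parse(data: bytes) -> str:
    """Return 'ok', or the parser's error message, for an XML byte string."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        return str(err)
    return "ok"


# Hypothetical fragment: the header claims UTF-8 in both cases.
header = b'<?xml version="1.0" encoding="UTF-8"?>\n'
body = "<text>\u201cquoted\u201d</text>"

good = header + body.encode("utf-8")    # bytes match the declaration
bad = header + body.encode("cp1252")    # bytes do NOT match the declaration

print(try_parse(good))
print(try_parse(bad))
```

The second call reports an error of the same general kind as the dialog above, because the stray bytes are not valid UTF-8 sequences.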
The solution is to send the file (or better, a zip of it) to Tinderbox support (firstname.lastname@example.org) and they should be able to fix it and return the document. At times the fix may result in the loss of all or part of $Text (or of the affected attribute(s) of the affected note(s)), but generally just the offending characters can be excised.
If you are doing some task where this happens more than once (perhaps you have very 'dirty' source material) then ensure you have reviewed your settings for back-ups and autosave. You can always change back to your defaults once the problem is resolved.
For the more technically minded…
If you are confident using a text editor and looking at XML source code you can have a go at fixing this yourself. If you do not have a text (code) editor - do not use TextEdit - a good free option is BBEdit. Now:
- Make a copy of your broken TBX and give it a new name indicating it is (will be) the 'fixed' version.
- In your text editor turn on line numbering (see your editor's manual if unsure how). It is also a good idea to turn on 'invisibles' so you see an on screen character for spaces, tabs, control characters, etc. Turn line wrap 'off', at least until you have located the problem.
- Open the copied TBX in the text editor app.
- Scroll to the line mentioned in the original Tinderbox error message.
- Examine that line in the code for anything untoward (use the window's line numbering to find the right place). If your editor has syntax colouring it should stop working around where the error occurs.
- If the exact error is not self-evident, try removing the whole attribute value containing the bad line. Most commonly these errors occur in $Text. In such a case you might delete the value of that $Text, i.e. all code between the last <text> tag preceding the error position and the first </text> tag following it. You can always cut/paste the excised data to a text file for temporary safe keeping.
- Save the file, close it, and re-open it in Tinderbox. You may get another error (likely caused by the same thing in a different note). Rinse and repeat the previous steps until the file opens. You can then review what has been lost and the probable cause. If you think you know the latter, it is probably worth letting Tinderbox support know in case it is a cause they can predict and guard against in the future.
- If you have no joy at all, delete your edited file and send a copy of the original TBX to support.
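If scrolling to the reported line by hand is awkward, the search can be sketched in Python. This is only a rough aid, not a Tinderbox tool: it assumes Python's expat parser flags the same position that Tinderbox reports, and `find_xml_error` is a hypothetical helper name. It only reads the file, so it is safe to run against the copy made in the first step.

```python
import sys
import xml.parsers.expat
from pathlib import Path


def find_xml_error(path):
    """Return (line, column, source_line) for the first XML error,
    or None if the file parses cleanly."""
    data = Path(path).read_bytes()
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        # splitlines() on bytes breaks on LF, CR and CRLF only.
        lines = data.splitlines()
        raw = lines[err.lineno - 1] if err.lineno <= len(lines) else b""
        # Undecodable bytes are shown as U+FFFD so the line can be printed.
        return err.lineno, err.offset, raw.decode("utf-8", errors="replace")
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    result = find_xml_error(sys.argv[1])
    if result is None:
        print("No XML errors found.")
    else:
        line, column, source = result
        print(f"Error at line {line}, column {column}:")
        print(source)
```

Run it with the path to the copied TBX as the only argument. The printed line is where to start looking in the text editor; the actual fix is still made by hand as described above.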
For TextWrangler and BBEdit users a slightly more hands-off approach is offered via those apps' "zap gremlins" option. It should prune non-XML-safe characters from the file, though you will not know exactly what was removed. You will just have to look at the resulting TBX file and guess.