BASIC file formats
BASIC programs written for TI BASIC and Extended BASIC are not stored as plain text in memory. This differs from assembler programs, which are edited as text files and then assembled into a Tagged Object Code file.
Plain text would not be appropriate for BASIC. If the program were stored as plain text, the BASIC interpreter would have to parse each line while the program runs, identify the commands and their arguments, and only then execute it. This is typical for today's scripting languages, but it would be too slow here; TI BASIC and Extended BASIC are already quite slow compared with other platforms.
Instead, BASIC lines are tokenized. Each command, special character, or character sequence that has a meaning in BASIC is represented by a one-byte code, the token. Example:
|"..." (quoted string)||c7|
You can find a complete table here.
So let us take a simple BASIC line like
There will not be a string like "PRINT" in memory, because the parser recognizes this word as a command and replaces it with its token. The command is followed by a string enclosed in quotes. Its contents can be anything, so the parser must copy them into memory as they are.
Finally, the line is converted to the following byte sequence:
|line length||"..."||string length||H||E||L||L||O||end|
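The encoding of the quoted string in the table above can be sketched in a few lines of Python. This is only an illustration: the function name `encode_quoted_string` is hypothetical, and the sketch covers just the string part (token c7, length byte, characters), not the command token or the framing of the whole line.

```python
QUOTED_STRING = 0xC7  # token for a quoted string, as listed in the token example


def encode_quoted_string(text: str) -> bytes:
    """Return token, length byte, and the characters copied as they are."""
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("string does not fit a one-byte length")
    return bytes([QUOTED_STRING, len(data)]) + data


if __name__ == "__main__":
    print(encode_quoted_string("HELLO").hex(" "))
```

For "HELLO" this yields the token c7, the length 05, and the five character codes.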
000000: 00 3f 37 a7 37 98 37 d7 00 28 37 a9 00 1e 37 ac .?7.7.7..(7...7.
000010: 00 14 37 b2 00 0a 37 ca 02 8b 00 05 96 52 4f 57 ..7...7......ROW
000020: 00 17 a2 f0 b7 52 4f 57 b3 c8 01 31 b6 b5 c7 04 .....ROW...1....
000030: 54 45 53 54 b4 52 4f 57 00 0e 8c 52 4f 57 be c8 TEST.ROW...ROW..
000040: 01 31 b1 c8 02 32 30 00 .1...20.
The numbers on the left (xxxxx:) are the offset from the beginning of the file. At the right side we see the ASCII representation of the bytes, where unprintable characters are shown by a dot. The offsets and the ASCII column are not part of the file but added for better readability.
No commands are visible in the dump, which is what we should expect after the explanation above.
First, we strip the offsets and the ASCII column and add some line breaks to reveal the file structure. We also join bytes that belong to the same 16-bit word.
003f 37a7 3798 37d7
0028 37a9 001e 37ac 0014 37b2 000a 37ca
02 8b 00
05 96 52 4f 57 00
17 a2 f0 b7 52 4f 57 b3 c8 01 31 b6 b5 c7 04 54 45 53 54 b4 52 4f 57 00
0e 8c 52 4f 57 be c8 01 31 b1 c8 02 32 30 00
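Stripping the offsets and the ASCII column can also be done mechanically. The following Python sketch is one possible way, not a tool that belongs to the original article; `parse_hexdump` is a hypothetical name, and the sketch assumes the dump layout shown above (offset, colon, at most 16 two-digit hex bytes, then the ASCII column).

```python
import re

HEX_BYTE = re.compile(r"^[0-9a-fA-F]{2}$")

DUMP = """\
000000: 00 3f 37 a7 37 98 37 d7 00 28 37 a9 00 1e 37 ac .?7.7.7..(7...7.
000010: 00 14 37 b2 00 0a 37 ca 02 8b 00 05 96 52 4f 57 ..7...7......ROW
000020: 00 17 a2 f0 b7 52 4f 57 b3 c8 01 31 b6 b5 c7 04 .....ROW...1....
000030: 54 45 53 54 b4 52 4f 57 00 0e 8c 52 4f 57 be c8 TEST.ROW...ROW..
000040: 01 31 b1 c8 02 32 30 00 .1...20.
"""


def parse_hexdump(dump: str, width: int = 16) -> bytes:
    """Collect the hex bytes of each line, ignoring offset and ASCII column."""
    out = bytearray()
    for line in dump.splitlines():
        if ":" not in line:
            continue  # not a dump line
        _, rest = line.split(":", 1)  # drop the offset
        for count, token in enumerate(rest.split()):
            # Stop at the ASCII column or after `width` bytes.
            # Caveat: on a short last line, an ASCII column that happens to
            # be exactly two hex digits would be misread as a byte.
            if count >= width or not HEX_BYTE.match(token):
                break
            out.append(int(token, 16))
    return bytes(out)


if __name__ == "__main__":
    data = parse_hexdump(DUMP)
    print(len(data), data[:8].hex(" "))
```

The result is the raw file contents as a byte string, which is a convenient starting point for the analysis that follows.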
The contents are unchanged; only the layout differs. We can now analyse the contents of the file. The memory locations in the table below are the addresses where the portions of the program will reside when we load it into memory with OLD.
|Header||003f 37a7 3798 37d7|
|Line Number Table||3798 - 379b||0028 37a9|
|379c - 379f||001e 37ac|
|37a0 - 37a3||0014 37b2|
|37a4 - 37a7||000a 37ca|
|Program lines||37a8 - 37aa||02 8b 00|
|37ab - 37b0||05 96 52 4f 57 00|
|37b1 - 37c8||17 a2 f0 b7 52 4f 57 b3 c8 01 31 b6 b5 c7 04 54 45 53 54 b4 52 4f 57 00|
|37c9 - 37d7||0e 8c 52 4f 57 be c8 01 31 b1 c8 02 32 30 00|
The file starts with a header, containing four 16-bit words. When you carefully look at the table above, you can already deduce how those numbers are calculated.
- The second word (37a7) is the address of the last byte of the line number table, its high end.
- The third word (3798) is the address of the first byte of the line number table, its low end.
- The fourth word is the end of available memory. Programs are always loaded so that their last byte falls on the highest available address.
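As a rough illustration, the header can be unpacked as four big-endian 16-bit words. The names below (`parse_header`, `lnt_high`, `lnt_low`, `mem_top`, `file_offset_to_address`) are hypothetical and chosen for this sketch only. In the example, the second word (37a7) is the higher and the third word (3798) the lower address of the line number table. The address mapping relies on the statement that the last byte of the file falls on the highest available address.

```python
import struct

HEADER = bytes.fromhex("003f 37a7 3798 37d7")  # first 8 bytes of the example
FILE_SIZE = 72                                  # size of the example file


def parse_header(data: bytes) -> dict:
    """Split the first 8 bytes into the four 16-bit header words."""
    first, lnt_high, lnt_low, mem_top = struct.unpack(">4H", data[:8])
    return {
        "first": first,        # XOR word, discussed separately
        "lnt_high": lnt_high,  # second word: high end of the LNT
        "lnt_low": lnt_low,    # third word: low end of the LNT
        "mem_top": mem_top,    # fourth word: end of available memory
    }


def file_offset_to_address(offset: int, file_size: int, mem_top: int) -> int:
    """Map a file offset to its memory address after loading.

    Assumes the last byte of the file is placed at mem_top.
    """
    return mem_top - (file_size - 1 - offset)


if __name__ == "__main__":
    header = parse_header(HEADER)
    print({name: f"{value:04x}" for name, value in header.items()})
    # Offset 8 is the first byte after the header.
    print(f"{file_offset_to_address(8, FILE_SIZE, header['mem_top']):04x}")
```

With the example values, the byte at file offset 8 (directly after the header) maps to 3798, which matches the third header word.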
The first word is calculated as the XOR of the addresses of the start and end of the line number table:
    37a7 = 0011011110100111
XOR 3798 = 0011011110011000
    -----------------------
    003f = 0000000000111111
If this word is negated, the BASIC program is protected and cannot be listed. This is only available in Extended BASIC.
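The check of the first word can be sketched as follows. Note the assumption: the text only says the word is "negated" for protected programs; this sketch interprets that as two's complement negation modulo 2^16, which is not confirmed by the text. The function names are hypothetical.

```python
def lnt_checksum(second_word: int, third_word: int) -> int:
    """XOR of the two line number table addresses from the header."""
    return (second_word ^ third_word) & 0xFFFF


def classify_first_word(first: int, second_word: int, third_word: int) -> str:
    """Compare the first header word with the expected XOR value."""
    check = lnt_checksum(second_word, third_word)
    if first == check:
        return "unprotected"
    # Assumption: "negated" means two's complement negation (16 bit).
    if first == (-check) & 0xFFFF:
        return "protected"
    return "invalid"


if __name__ == "__main__":
    print(f"{lnt_checksum(0x37A7, 0x3798):04x}")
    print(classify_first_word(0x003F, 0x37A7, 0x3798))
```

A loader could use such a check to reject files that are not BASIC programs at all, since an arbitrary file is unlikely to satisfy the XOR relation.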
Line number table
The next block is the line number table (LNT). Again, when you look carefully you see that we have a list of entries, each of which contains two 16-bit words:
- The first word is the line number.
- The second word is the location of the BASIC line in memory.
To be precise, the memory location points to the second byte of a BASIC line, that is, the byte following the length byte. For example, the fourth entry (000a 37ca) tells us that the line starting in memory at address 37c9 is BASIC line 10.
Another interesting point is that the LNT is sorted, with the highest line number appearing at the low end, and the lowest number at the high end. In our example, the line numbers are 000a (10), 0014 (20), 001e (30), and 0028 (40). Moreover, the BASIC lines seem to be sorted in the same way, with the contents of line 10 near the memory end, and later lines growing towards lower memory.
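Reading the line number table is straightforward, as this Python sketch suggests. It takes the 16 table bytes from the example; `parse_lnt` is a hypothetical name. Each entry is a pair of big-endian 16-bit words, and the length byte of a line sits one byte below the stored location.

```python
import struct

# The four LNT entries of the example, from low to high memory.
LNT = bytes.fromhex("0028 37a9 001e 37ac 0014 37b2 000a 37ca")


def parse_lnt(table: bytes) -> list:
    """Return (line number, pointer) pairs in memory order."""
    return list(struct.iter_unpack(">HH", table))


if __name__ == "__main__":
    entries = parse_lnt(LNT)
    # Walk from the high end to the low end to get ascending line numbers.
    for number, pointer in reversed(entries):
        print(f"line {number}: pointer {pointer:04x}, "
              f"length byte at {pointer - 1:04x}")
```

Iterating in reverse gives the lines in the order in which LIST would show them, because the table is sorted with the highest line number at the low end.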
What remains are the tokenized lines. Again, here are two things we can quickly find out.
- The last byte of each line is 00.
- The first byte is a length byte. The length byte does not count itself, but includes the 00 byte at the end.
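These two observations are enough to split the program area into lines. The sketch below walks the example bytes sequentially; this works here because the lines are stored back to back, but in general the pointers in the line number table are what locate a line. `split_lines` is a hypothetical name.

```python
# Program area of the example (memory 37a8 - 37d7).
PROGRAM = bytes.fromhex(
    "02 8b 00"
    "05 96 52 4f 57 00"
    "17 a2 f0 b7 52 4f 57 b3 c8 01 31 b6 b5 c7 04 54 45 53 54 b4 52 4f 57 00"
    "0e 8c 52 4f 57 be c8 01 31 b1 c8 02 32 30 00"
)


def split_lines(block: bytes) -> list:
    """Split a block into line bodies, without length byte and terminator."""
    lines = []
    pos = 0
    while pos < len(block):
        length = block[pos]  # counts the 00 terminator, but not itself
        body = block[pos + 1:pos + 1 + length]
        if length == 0 or len(body) != length or body[-1] != 0x00:
            raise ValueError(f"malformed line at offset {pos}")
        lines.append(body[:-1])
        pos += 1 + length
    return lines


if __name__ == "__main__":
    for body in split_lines(PROGRAM):
        print(body.hex(" "))
```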
Now it is time to find out what the program lines consist of. We replace the tokens with their respective texts, which is what the computer does when we execute the LIST command. We also assign the line numbers from the LNT.
|37a8 - 37aa||40||02||8b||00|
|37ab - 37b0||30||05||96 52 4f 57||00|
|37b1 - 37c8||20||17||a2 f0 b7 52 4f 57 b3 c8 01 31 b6 b5 c7 04 54 45 53 54 b4 52 4f 57||00|
|37c9 - 37d7||10||0e||8c 52 4f 57 be c8 01 31 b1 c8 02 32 30||00|
Next, we replace all tokens in the lines as we find them in the table.
|37a8 - 37aa||40||02||END||00|
|37ab - 37b0||30||05||NEXT 52 4f 57||00|
|37b1 - 37c8||20||17||DISPLAY AT ( 52 4f 57 , c8 01 31 ) : c7 04 54 45 53 54 ; 52 4f 57||00|
|37c9 - 37d7||10||0e||FOR 52 4f 57 = c8 01 31 TO c8 02 32 30||00|
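The replacement step, including the names and constants that are still shown as hex bytes in the table above, can be sketched in Python. The token table contains only the tokens that occur in this example, read off by comparing the two tables; it is not the complete table. Two assumptions are made: bytes below 80 are plain ASCII characters (such as the variable name ROW), and c8 introduces an unquoted string with a length byte in the same way c7 does for a quoted string. `detokenize` is a hypothetical name.

```python
# Tokens seen in the example only; the full table is much larger.
TOKENS = {
    0x8B: "END", 0x8C: "FOR", 0x96: "NEXT", 0xA2: "DISPLAY", 0xF0: "AT",
    0xB1: "TO", 0xB3: ",", 0xB4: ";", 0xB5: ":", 0xB6: ")", 0xB7: "(",
    0xBE: "=",
}
QUOTED = 0xC7    # quoted string: token, length, characters
UNQUOTED = 0xC8  # assumed: unquoted string (e.g. a number), same layout


def detokenize(body: bytes) -> str:
    """Turn a line body (without length byte and 00 terminator) into text."""
    parts = []
    pos = 0
    while pos < len(body):
        byte = body[pos]
        if byte < 0x80:
            # Assumption: a run of plain ASCII bytes is a name.
            start = pos
            while pos < len(body) and body[pos] < 0x80:
                pos += 1
            parts.append(body[start:pos].decode("ascii"))
        elif byte in (QUOTED, UNQUOTED):
            length = body[pos + 1]
            text = body[pos + 2:pos + 2 + length].decode("ascii")
            parts.append(f'"{text}"' if byte == QUOTED else text)
            pos += 2 + length
        else:
            parts.append(TOKENS[byte])  # KeyError for tokens not in the sample
            pos += 1
    return " ".join(parts)


if __name__ == "__main__":
    line10 = bytes.fromhex("8c 52 4f 57 be c8 01 31 b1 c8 02 32 30")
    print(10, detokenize(line10))
```

The output separates all elements with single spaces, as in the table above, rather than reproducing the exact spacing of the LIST command.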