BASIC file formats

From Ninerpedia
Revision as of 13:00, 29 March 2015 by Mizapf (talk | contribs)

BASIC programs written for TI BASIC and Extended BASIC are not stored as plain text in memory. This differs from assembler programs, which are edited as text files and then assembled into a Tagged Object Code file.

That approach would not suit BASIC. If the program were stored as plain text, the BASIC interpreter would have to parse every line each time it runs: identify the commands and their arguments, and only then execute them. This is typical of today's scripting languages, but it would be far too slow here; TI BASIC and Extended BASIC are already quite slow compared with other platforms.

BASIC lines are therefore tokenized. Each command, special character, or character sequence that has a meaning in BASIC is assigned a one-byte code, the token. Some examples:

Command                  Token (hex)
NEW                      00
SAVE                     07
EDIT                     09
PRINT                    9c
&                        b8
"..." (quoted string)    c7
SEG$                     d8
VALIDATE                 fe

You can find a complete table here.

So let us take a simple BASIC line like

PRINT "HELLO"

The string "PRINT" does not appear in memory, because the parser recognizes the word as a command and replaces it with its token. The command is followed by a string enclosed in quotes. Its contents can be anything, so the parser must copy it into memory as is.

Finally, the line is converted to the following byte sequence:

09           9c     c7     05             48 45 4c 4c 4f  00
line length  PRINT  "..."  string length  H  E  L  L  O   end
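
The conversion above can be sketched in a few lines of Python. This is only an illustration, not the interpreter's actual routine: the function name tokenize_print is made up, it handles nothing but a PRINT with a single quoted string, and it assumes that the length byte counts every byte that follows it, including the end marker (which is what the example bytes suggest).

```python
# Token values taken from the token table in this article.
TOKEN_PRINT = 0x9C
TOKEN_QUOTED_STRING = 0xC7
END_OF_LINE = 0x00


def tokenize_print(text: str) -> bytes:
    """Build the stored form of PRINT "<text>" (hypothetical helper)."""
    payload = text.encode("ascii")
    if len(payload) > 255:
        raise ValueError("string does not fit a one-byte length")
    body = (
        bytes([TOKEN_PRINT, TOKEN_QUOTED_STRING, len(payload)])
        + payload
        + bytes([END_OF_LINE])
    )
    if len(body) > 255:
        raise ValueError("line does not fit a one-byte length")
    # Assumption: the length byte counts everything after it,
    # including the end marker.
    return bytes([len(body)]) + body


line = tokenize_print("HELLO")
print(line.hex(" "))  # prints: 09 9c c7 05 48 45 4c 4c 4f 00
```

Note that the string itself is stored untouched; only the keyword and the quotes are replaced by tokens.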

Sample program

Let's have a look at a real Extended BASIC program. The following is output from TIImageTool, showing the contents of a PROGRAM file.

000000: 00 3f 37 a7 37 98 37 d7 00 28 37 a9 00 1e 37 ac     .?7.7.7..(7...7.
000010: 00 14 37 b2 00 0a 37 ca 02 8b 00 05 96 52 4f 57     ..7...7......ROW
000020: 00 17 a2 f0 b7 52 4f 57 b3 c8 01 31 b6 b5 c7 04     .....ROW...1....
000030: 54 45 53 54 b4 52 4f 57 00 0e 8c 52 4f 57 be c8     TEST.ROW...ROW..
000040: 01 31 b1 c8 02 32 30 00                             .1...20.

The numbers on the left (xxxxxx:) are the offsets from the beginning of the file. On the right we see the ASCII representation of the bytes, where unprintable characters are shown as dots. The offsets and the ASCII column are not part of the file; they are added for better readability.
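
To get back to the raw file contents, the offsets and the ASCII column have to be removed again. The following Python sketch does that for the dump shown above. The function name dump_to_bytes is hypothetical, and the code assumes this particular layout: an offset ending in a colon, hex bytes separated by single spaces, and the ASCII column set off by at least two spaces.

```python
import re

DUMP = """\
000000: 00 3f 37 a7 37 98 37 d7 00 28 37 a9 00 1e 37 ac     .?7.7.7..(7...7.
000010: 00 14 37 b2 00 0a 37 ca 02 8b 00 05 96 52 4f 57     ..7...7......ROW
000020: 00 17 a2 f0 b7 52 4f 57 b3 c8 01 31 b6 b5 c7 04     .....ROW...1....
000030: 54 45 53 54 b4 52 4f 57 00 0e 8c 52 4f 57 be c8     TEST.ROW...ROW..
000040: 01 31 b1 c8 02 32 30 00                             .1...20.
"""


def dump_to_bytes(dump: str) -> bytes:
    """Recover the file bytes from a dump in the layout shown above."""
    data = bytearray()
    for row in dump.splitlines():
        if ":" not in row:
            continue  # skip blank or unrelated lines
        # Drop the offset in front of the first colon.
        rest = row.split(":", 1)[1].strip()
        # Assumption: the ASCII column starts after a gap of 2+ spaces.
        hex_part = re.split(r" {2,}", rest, maxsplit=1)[0]
        data += bytes.fromhex(hex_part)
    return bytes(data)


raw = dump_to_bytes(DUMP)
print(len(raw))  # prints: 72
```

The result is 72 bytes long (hex 48), which matches the last offset in the dump plus the eight bytes on the final row.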

No BASIC commands are visible in the dump, which is exactly what we should expect after the explanation above: the commands have been replaced by their tokens.

First we strip the offsets and the ASCII column, and add some line breaks so that the file structure becomes visible. We also join pairs of bytes that together form a 16-bit word.

003f 37a7 3798 37d7 
0028 37a9 
001e 37ac
0014 37b2 
000a 37ca 
02 8b 00
05 96 52 4f 57 00 
17 a2 f0 b7 52 4f 57 b3 c8 01 31 b6 b5 c7 04 54 45 53 54 b4 52 4f 57 00 
0e 8c 52 4f 57 be c8 01 31 b1 c8 02 32 30 00

The contents are unchanged; only the layout differs. We can now analyse the contents of the file.

Meaning             Contents
Header              003f 37a7 3798 37d7
Line Number Table   0028 37a9
                    001e 37ac
                    0014 37b2
                    000a 37ca
Program lines       02 8b 00
                    05 96 52 4f 57 00
                    17 a2 f0 b7 52 4f 57 b3 c8 01 31 b6 b5 c7 04 54 45 53 54 b4 52 4f 57 00
                    0e 8c 52 4f 57 be c8 01 31 b1 c8 02 32 30 00
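
The same split can be done programmatically. The Python sketch below is a reading of this one sample file, not a verified description of the format. It rests on assumptions that are not stated above: all words are big-endian; the second and third header words are taken to be the last and first memory address of the line number table (their difference plus one is 16 here, which fits the four 4-byte entries); each table entry is a line number followed by a pointer to the byte after the line's length byte. The name split_program is made up.

```python
import struct

RAW = bytes.fromhex(
    "00 3f 37 a7 37 98 37 d7"                          # header
    "00 28 37 a9 00 1e 37 ac 00 14 37 b2 00 0a 37 ca"  # line number table
    "02 8b 00"
    "05 96 52 4f 57 00"
    "17 a2 f0 b7 52 4f 57 b3 c8 01 31 b6 b5 c7 04"
    "54 45 53 54 b4 52 4f 57 00"
    "0e 8c 52 4f 57 be c8 01 31 b1 c8 02 32 30 00"
)

HEADER_SIZE = 8


def split_program(raw: bytes):
    """Split a PROGRAM file into header, table entries and lines (sketch)."""
    header = struct.unpack(">4H", raw[:HEADER_SIZE])
    # Assumption: header[1] is the last and header[2] the first
    # memory address of the line number table.
    table_size = header[1] - header[2] + 1
    entries = [
        struct.unpack(">2H", raw[off:off + 4])
        for off in range(HEADER_SIZE, HEADER_SIZE + table_size, 4)
    ]
    # Assumption: the program lines follow the table directly in memory,
    # so memory address header[1] + 1 maps to this file offset.
    lines_offset = HEADER_SIZE + table_size
    lines_address = header[1] + 1
    lines = {}
    for number, pointer in entries:
        start = lines_offset + (pointer - lines_address)
        length = raw[start - 1]  # length byte sits just before the pointer
        lines[number] = raw[start:start + length]
    return header, entries, lines


header, entries, lines = split_program(RAW)
for number, body in lines.items():
    print(f"{number:3d}: {body.hex(' ')}")
```

Under these assumptions the four table entries resolve to the four program lines listed above, with line numbers 40, 30, 20 and 10 (hex 28, 1e, 14 and 0a), and every line ends with the 00 end marker.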

TODO: continue