(This covers sections 70 to 76; see below, after section 76, for comments.)
To summarize the tokens from part 5 and this one: sequences of tokens are placed into the (sub-)arrays of tok_mem (more specifically, into the arrays tok_mem[0] to tok_mem[2], where the “2” here is zz - 1) as follows:
printable chars (33 to 126) are represented as themselves,
some other special tokens are represented as special codes (0 represents param, etc.),
the identifier n (i.e. the one whose name starts at byte_start[n] in byte_mem) is represented as two bytes 128 + (n // 256) and n % 256,
the module name n (i.e. the one whose name starts at byte_start[n] in byte_mem) is represented as two bytes 168 + (n // 256) and n % 256,
the module number n is represented as two bytes 208 + (n // 256) and n % 256.
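The encoding above can be read back mechanically. Here is a minimal sketch of a decoder in Python, written only from the rules just listed; the function name decode_tokens and the tuple labels are made up for illustration and do not appear in TANGLE. Bytes below 128 that are not printable are reported as raw codes, with no attempt to name them or to treat quoted strings specially.

```python
def decode_tokens(data: bytes):
    """Yield (kind, value) pairs for a run of bytes taken from tok_mem.

    Hypothetical helper: it only mirrors the encoding rules listed above.
    """
    i = 0
    while i < len(data):
        a = data[i]
        if a >= 128:
            # Two-byte token: the first byte selects the kind and the high
            # part of n, the second byte is n % 256.
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte token")
            b = data[i + 1]
            if a < 168:
                yield ("identifier", (a - 128) * 256 + b)
            elif a < 208:
                yield ("module_name", (a - 168) * 256 + b)
            else:
                yield ("module_number", (a - 208) * 256 + b)
            i += 2
        elif 33 <= a <= 126:
            # Printable characters stand for themselves.
            yield ("char", chr(a))
            i += 1
        else:
            # Some other special code (for example 0 for param).
            yield ("code", a)
            i += 1


# The bytes \320 \v \200 # = (octal escapes as gdb prints them):
sample = b"\xd0\x0b\x80#="
print(list(decode_tokens(sample)))
# [('module_number', 11), ('identifier', 35), ('char', '=')]
```

Here \320 is 208, so the first pair of bytes is module number 11, and \200 followed by # (35) is identifier 35.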
For example, after reading 2000 lines of tex.web, these are the contents of the (first 3 of 5) tok_mem arrays:
gdb /path/to/tangle
(gdb) break zinputln
Breakpoint 1 at 0x2ba7: file tangle.c, line 438.
(gdb) run /path/to/tex.web
(gdb) continue 2000
(gdb) p line
$1 = 2000
(gdb) p tokmem[0]
$2 = "\t\320\v\200#=30000;\200$=0;\200%=500;\200&=72;\200'=42;\200(=79;\200)=200;\200*=6;\200+=75;\200,=20000;\200-=60;\200.=40;\200/=3000;\200\060=8000;\200\061=32000;\200\062=600;\200\063=8000;\200\064=500;\200\065=800;\200\066=40;\200\067='TeXformats:TEX.POOL ';\000\030\000-1\320\022\200`=0 255;\320\027\200li\030\060\200m\f37\200[\200h[i]\030' ';\200li\030\f177\200m\f377\200[\200h[i]\030' ';\200w(\000)=0\320 \200\227:\200p;\200\230:\200p;\200\\\000\200\020\200\260[\200\262]\030\200\253(\000);\200U(\200\262);\200\022\320-\200y\200\277(s:\200\256;k:\200?):\200{;\200\006\200S;\200\vj:\200\255;\200\300:\200{;\200\020j\030\200\261[s];\200Yj<\200\261[s+1]\200[\200\020\200A\200\254(\200\260[j])\032\200\210[k]\200C\200\020\200\300\030\200\222;\200^\200S;\200\022;\200U(j);\200U(k);\200\022;\200\300\030\200Z;\200S:\200\277\030\200\300;\200\022;\320\061(k<\200\226)\200B(k>\200\313)\320\065\200\020a\030\060;k\030\061;\200X\200\020\200A(\200e[n]<\200\306)\200B(\200e[n]>\200\323)\200C\200\315('! TEX.POOL check sum doesn''t have nine digits.');a\030\061\060*a+\200e[n]-\200\306;\200Ak=9\200C\200^\200I;\200U(k);\200\321(\200\314,n);\200\022;\200I:\200Aa\032}\200C\200\315('! 
TEX.POOL doesn''t match; TANGLE me again.');c\030\200Z;\200\022\200\241(\200\230)\320:\200\r\200\357(s:\200`);\200\006\200E;\200\020\200A\250\360\200C\200A\200\335<\200ـC\200\020\200\354;\200];\200\022;\200\355\200݀g\200\330:\200\020\200\346(\200h[s]);\200\351(\200h[s]);\200U(\200\340);\200U(\200\341);\200A\200\340=\200(\200C\200\020\200\350;\200\340\030\060;\200\022;\200A\200\341=\200(\200C\200\020\200\353;\200\341\030\060;\200\022;\200\022;\200\327:\200\020\200\351(\200h[s]);\200U(\200\341);\200A\200\341=\200(\200C\200\354;\200\022;\200\326:\200\020\200\346(\200h[s]);\200U(\200\340);\200A\200\340=\200(\200C\200\354;\200\022;\200\325:\200\\;\200\331:\200A\200\337<\200\343\200C\200\342[\200߀D\200&]\030s;\200\332:\200\020\200A\200\262<\200\061\200C\200\270(s);\200\022;\200 \200\251(\200\356[\200\335],\200h[s])\200\";\200U(\200\337);\200E:\200\022;\320?\200\r\200\370(s:\200\256);\200\vc:\200?;\200\020\250\371;\200Ac\035\060\200C\200Ac<256\200C\200\361(c);\200\365(s);\200\022;\200\361\200\020\200A\201\021=\201\017\200C\200\237;\200\366(\201\022);\200\361(\000);\200\022\320M\201\030\030\200Z;\201\031\030\200Z;\201\033\030\060;\201'[3]\030\000;\201)\200\020\201.\030\062;\201(\320O\201':\200f[0 5]\200g\200\256;\201.:0 6;\201\065:\200{;\320T\200\355c\200g\200\306,\201B,\201\006,\201C,\201D,\201E,\201F,\201G,\201H,\200\323:\200A\201\030\200C\251I;\200\030\201J:\200\020\201%;\200^\200H;\200\022;\200\031\201K:\200A\201L>0\200C\200\020\200\366(\201M);\200\365(\201N[\201L].\201O);\200\361(\201P);\200\374(\201Q);\201\021\030\201\016;\201\066;\200\022;\201R:\251S;\201T:\251U;\201V,\201W,\201X:\251Y;\201Z:\200\020\201\021\030\201\016;\201\066;\200\022;\200 \200\\\200\";\251[", '\000' <repeats 64240 times>
(gdb) p tokmem[1]
$3 = "'This is TeX, Version 3.14159265'\n\320\b\250\036\200\034\250\037\200\035\063\060\060\060\060\000\030-\000\200b\320\030\200li\030\200c\200m\200d\200[\200e[\200n(i)]\030\200k;\200li\030\f200\200m\f377\200[\200e[\200h[i]]\030i;\200li\030\060\200m\f176\200[\200e[\200h[i]]\030i;\320\033\200y\200z(\200\vf:\200p):\200{;\200\020\200|(f,\200t,'/O');\200z\030\200v(f);\200\022;\200y\200}(\200\vf:\200p):\200{;\200\020\200~(f,\200t,'/O');\200}\030\200x(f);\200\022;\200y\200\177(\200\vf:\200s):\200{;\200\020\200|(f,\200t,'/O');\200\177\030\200v(f);\200\022;\200y\200\200(\200\vf:\200s):\200{;\200\020\200~(f,\200t,'/O');\200\200\030\200x(f);\200\022;\200y\200\201(\200\vf:\200\202):\200{;\200\020\200|(f,\200t,'/O');\200\201\030\200v(f);\200\022;\200y\200\203(\200\vf:\200\202):\200{;\200\020\200~(f,\200t,'/O');\200\203\030\200x(f);\200\022;\200|(\200\227,'TTY:','/O/I')\320#\200A\200\240=0\200C\200\020\200\241(\200\230,'Buffer size exceeded!');\200^\200\027;\200\022\200\223\200\020\200\242.\200\243\030\200\211;\200\242.\200\244\030\200\212-1;\200\245(\200\246,\200%);\200\022\320&\200\255=0 \200\061;\200\256=0 \200/;\200\257=0 255;\200V(\200\262)\320.\200y\200\301(s,t:\200\256):\200{;\200\006\200S;\200\vj,k:\200\255;\200\300:\200{;\200\020\200\300\030\200\222;\200A\200\266(s)\032\200\266(t)\200C\200^\200S;j\030\200\261[s];k\030\200\261[t];\200Yj<\200\261[s+1]\200[\200\020\200A\200\260[j]\032\200\260[k]\200C\200^\200S;\200U(j);\200U(k);\200\022;\200\300\030\200Z;\200S:\200\301\030\200\300;\200\022;\320\062\200\034\200\314:\200p;\200\035\320\066\200\334:\200p;\200\335:0 \200\333;\200\336:\200f[0 22]\200g0 15;\200\337:\200?;\200\340:0 \200(;\200\341:0 \200(;\200\342:\200f[0 
\200&]\200g\200`;\200\343:\200?;\200\344:\200?;\200\251(\200\334,\000)\320;\200\r\200\361(s:\200?);\200\006\200E;\200\vj:\200\255;\200\362:\200?;\200\020\200As\035\200\263\200Cs\030\200\363\200\223\200As<256\200C\200As<0\200Cs\030\200\363\200\223\200\020\200A\200\335>\200ـC\200\020\200\357(s);\200];\200\022;\200A(\250\360)\200C\200A\200\335<\200ـC\200\020\200\354;\200];\200\022;\200\362\030\200\364;\200\364\030-1;j\030\200\261[s];\200Yj<\200\261[s+1]\200[\200\020\200\357(\200\254(\200\260[j]));\200U(j);\200\022;\200\364\030\200\362;\200];\200\022;j\030\200\261[s];\200Yj<\200\261[s+1]\200[\200\020\200\357(\200\254(\200\260[j]));\200U(j);\200\022;\200E:\200\022;\320@\200\r\200\372(k:\200o);\200\020\200Yk>0\200[\200\020\200V(k);\200A\200\336[k]<10\200C\200\357(\200\306+\200\336[k])\200\223\200\357(\200\373-10+\200\336[k]);\200\022;\200\022;\320E\200\r\201\003(n:\200?);\200\006\200E;\200\vj,k:\200\255;u,v:\201\004;\200\020j\030\200\261[\201\005];v\030\061\060\060\060;\200X\200\020\200Yn\035v\200[\200\020\200\357(\200\254(\200\260[j]));n\030n-v;\200\022;\200An\034\060\200C\200];k\030j+2;u\030v\200\312(\200\254(\200\260[k-1])-\200\306);\200A\200\260[k-1]=\200\253(\201\006)\200C\200\020k\030k+2;u\030u\200\312(\200\254(\200\260[k-1])-\200\306);\200\022;\200An+u\035v\200C\200\020\200\357(\200\254(\200\260[k]));n\030n+u;\200\022\200\223\200\020j\030j+2;v\030v\200\312(\200\254(\200\260[j-1])-\200\306);\200\022;\200\022;\200E:\200\022;\320I\201\021:\201\f \201\017;\320N\200\r\201\034;\201\035;\200\r\201\036;\201\035;\200\r\201\t;\201\035;\200\r\201\037;\201\035;\200\r\201 ;\201\035;\200\r\201!;\201\035;\200\r\201\";\201\035;\200\r\201#;\201\035;\200\r\201$;\201\035;\200\030\200\r\201%;\201\035;\200\031\201'[4]\030\000;\201*\200\020\201.\030\063;\201)\320P\201.\030\060;\201\065\030\200\222;\320U\200\020\200\361(\201\\);\200\366(\201]);\200\366(\201^);\200A\201L>0\200C\200\361(\201_);\200A\201\030\200C\200\366(\201`);\200\366(\201a);\200\022", '\000' <repeats 63974 times>
(gdb) p tokmem[2]
$4 = "t\177y\177p\177e\t\320\t\t\177$C-,A+,D-\n\200\030\t\177$C+,D+\n\200\031\320\r\200>:\200?;\200Y\200Z\200[\320\023i:\200?;\320\031\200o=0 255;\200p=\200q\200r\200g\200a;\200s=\200q\200r\200g\200o;\320\034\200\r\200\204(\200\vf:\200p);\200\020\200\205(f);\200\022;\200\r\200\206(\200\vf:\200s);\200\020\200\205(f);\200\022;\200\r\200\207(\200\vf:\200\202);\200\020\200\205(f);\200\022;\200~(\200\230,'TTY:','/O')\200\242.\200\243\320'\200\260:\200q\200f[\200\255]\200g\200\257;\200\261:\200f[\200\256]\200g\200\255;\200\262:\200\255;\200\263:\200\256;\200\264:\200\255;\200\265:\200\256;\200\020\200A\200\262+\000>\200\061\200C\200\245(\200\273,\200\061-\200\264);\200\022\320/\200\034\200y\200\302:\200{;\200\006\200I,\200E;\200\vk,l:0 255;m,n:\200a;g:\200\256;a:\200?;c:\200{;\200\020\200\262\030\060;\200\263\030\060;\200\261[0]\030\060;\250\303;\250\304;\200E:\200\022;\200\035\200\020\200\237;\200\241(\200\230,\000);\200\204(\200\314);\200\302\030\200\222;\200];\200\022\320\067\200\335\030\200\326;\200\337\030\060;\200\340\030\060;\200\341\030\060;\200\241(\200\334,\000)\320<\200\r\200\365(s:\200?);\200\vj:\200\255;\200\020\200A(s\035\200\263)\200B(s<256)\200C\200\361(s)\200\223\200\020j\030\200\261[s];\200Yj<\200\261[s+1]\200[\200\020\200\361(\200\254(\200\260[j]));\200U(j);\200\022;\200\022;\200\022;\320A\200\r\200\374(n:\200?);\200\vk:0 
23;m:\200?;\200\020k\030\060;\200An<0\200C\200\020\200\357(\200\375);\200An>-100000000\200C\200W(n)\200\223\200\020m\030-1-n;n\030m\200\312\061\060;m\030(m\200D10)+1;k\030\061;\200Am<10\200C\200\336[0]\030m\200\223\200\020\200\336[0]\030\060;\200U(n);\200\022;\200\022;\200\022;\200\316\200\336[k]\030n\200D10;n\030n\200\312\061\060;\200U(k);\200\320n=0;\200\372(k);\200\022;\320F\200\r\201\a;\200\vj:\200\255;\200\020j\030\200\261[\200\263];\200Yj<\200\262\200[\200\020\200\357(\200\254(\200\260[j]));\200U(j);\200\022;\200\022;\320J\201\021\030\201\017;\201'[0]\030\000;\200\022\201'[5]\030\000;\201+\200\020\201.\030\064;\201*\320Q\200\r\201\066;\200\020\200^\200\026;\200\022;\320V\200\020\201\033\030\060;\201\021\030\201\f+c-\201V;\200\361(\201b);\200\355c\200g\201V:\200\020\200\370(\201c);\200V(\200\335);\200\022;\201W:\200\370(\201d);\201X:\200\370(\201e);\200\022;\200\361(\201f);\200\354;\200\233;\200];\200\022", '\000' <repeats 64610 times>
(gdb) p tokstart
$6 = {0, 0, 0, 0, 0, 0, 33, 7, 45, 11, 1, 34, 8, 46, 11, 1, 44, 35, 49, 13, 187, 49, 43, 180, 18, 192, 53, 49, 180, 22, 203, 55, 56, 210, 1228, 253, 135, 91, 238, 1235, 260, 401, 165,
280, 1440, 274, 421, 183, 286, 1449, 276, 507, 188, 530, 1450, 277, 534, 244, 545, 1461, 301, 540, 272, 607, 1482, 433, 690, 360, 640, 1608, 449, 702, 392, 722, 1931, 658, 786, 415,
730, 1939, 664, 794, 423, 736, 2040, 964, 1038, 525, 801, 2108, 1020, 1116, 715, 862, 2186, 1022, 1370, 776, 878, 2287, 1051, 1381, 784, 901, 2320, 1070, 1467, 794, 911, 2330, 1080,
1477, 804, 915, 2339, 1089, 1486, 813, 924, 2348, 1119, 1499, 830, 1047, 2411, 1295, 1561, 925, 1129, 0 <repeats 9872 times>}
Here is the data in a more symbolic form. Hover over an index in the tok_start array (last row) to see the string it represents in the tok_mem arrays (first five rows).
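To see how an index in tok_start picks out a string, here is a small Python sketch that uses only the beginnings of the arrays printed by gdb above. It assumes that text t lives in tok_mem[t % zz], from tok_start[t] up to (but not including) tok_start[t + zz], with zz = 5 in this build; that rule is an assumption, though it is consistent with the dump. The names text_bytes, tok_mem_prefix and tok_start_prefix are invented for this example.

```python
ZZ = 5  # number of tok_mem sub-arrays in this build (assumption, see above)

# Only the first few bytes of tok_mem[0..2], copied from the gdb output.
tok_mem_prefix = [
    b"\t",
    b"'This is TeX, Version 3.14159265'\n",
    b"t\x7fy\x7fp\x7fe\t",
]

# The first 15 entries of tok_start, copied from the gdb output.
tok_start_prefix = [0, 0, 0, 0, 0, 0, 33, 7, 45, 11, 1, 34, 8, 46, 11]


def text_bytes(t, tok_mem, tok_start, zz=ZZ):
    """Return the bytes of text t under the assumed layout."""
    z = t % zz                    # which sub-array holds text t
    start = tok_start[t]          # where text t begins in that sub-array
    end = tok_start[t + zz]       # where the next text in the same sub-array begins
    return tok_mem[z][start:end]


print(text_bytes(1, tok_mem_prefix, tok_start_prefix))
print(text_bytes(2, tok_mem_prefix, tok_start_prefix))
```

With these prefixes, text 1 comes out as the 33-byte string constant at the start of tok_mem[1], and text 2 as the first 7 bytes of tok_mem[2].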
And here are just the texts (text i for each i), without the complications of splitting into five arrays and keeping track of start indices:
To make it even more readable, we can pair with the byte memory (the identifiers aka names), so that the “N@” and “M@” references above can be resolved. The “next” below refers to the text_link array.
If we recall that the name 0 refers to the unnamed module, we can now finally put together the “name” and “text” arrays, along with the pointers between them, namely: equiv points from names to (sometimes) text equivalents, and text_link points from one text to another (its sequel). This will make sense of most of the data structures introduced in Part 5 (section 38), except that we don’t need to care about the link and ilk arrays anymore, because they are either an internal implementation detail for finding the number from the name, or contain only trivial information.
(gdb) p equiv
$1 = {0, 0, 0, 0, 0, 0, 0, 5, 0, 8, 0, 0, 7, 0, 9, 0, 2, 3, 4, 0, 6, 0, 1073741824, 1073742079, 0, 0, 0, 0, 0, 1073741824, 1073741837, 1073741951, 0, 0, 0, 0, 0, 15, 0, 0, 18, 21, 17, 0, 0, 1073741872, 0, 1073741921, 0, 19, 1073741918, 0, 0, 1073741858, 1073741856, 0, 1073741950, 0, 0, 0, 0, 0, 0, 51, 0, 22, 0, 0, 0, 1073741881, 0 <repeats 9931 times>}
(gdb) p textlink
$2 = {1, 16, 0, 0, 0, 10000, 0, 10000, 12, 10, 11, 14, 13, 20, 10000, 0, 10000, 0, 10000, 10000, 10000, 10000, 10000, 0 <repeats 9978 times>}
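As a rough illustration of following text_link from a text to its sequels, here is a Python sketch over the non-zero prefix of the array printed above. It treats the value 10000 as the end-of-chain marker; that is an assumption read off the dump, not something stated here. The names chain, END_OF_CHAIN and text_link_prefix are invented for the example.

```python
END_OF_CHAIN = 10000  # assumed sentinel: the value the chains in the dump end on

# The first 23 entries of text_link, copied from the gdb output
# (the remaining entries are all 0).
text_link_prefix = [1, 16, 0, 0, 0, 10000, 0, 10000, 12, 10, 11, 14, 13, 20,
                    10000, 0, 10000, 0, 10000, 10000, 10000, 10000, 10000]


def chain(first, text_link, end=END_OF_CHAIN):
    """Return the list of text numbers reached from `first` via text_link."""
    texts = []
    t = first
    while t != end:
        if t in texts:
            # Guard against looping forever on unexpected data.
            raise ValueError(f"cycle at text {t}")
        texts.append(t)
        t = text_link[t]
    return texts


print(chain(8, text_link_prefix))  # [8, 12, 13, 20]
print(chain(9, text_link_prefix))  # [9, 10, 11, 14]
```

So, under that assumption, text 8 is continued by texts 12, 13 and 20, and text 9 by texts 10, 11 and 14.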
Let’s summarize what we have so far:
The “bytes” of the program (names of modules, names of macros, names of identifiers, and double-quoted strings) are written into byte_mem, with indexes in byte_start.
The “tokens” of the program (the replacement text for simple and parametric macros, the replacement text for modules) are written into tok_mem, with indexes in tok_start.
We have ways of looking up either modules or identifiers (any of the others) by name.
We have the numeric equivalents (for numeric macros and double-quoted strings) and text equivalents (for simple macros, parametric macros, and modules); for the latter, we can follow “to be continued” links using text_link.
Given a name (or when looking at the replacement text for a macro / module), we can say what type it is.
Here is a visualization of some of this, for the memory at the end of phase one of reading POOLTYPE.web.