Tangle -- Tokens

(This covers sections 70 to 76; see below, after section 76, for comments.)

To summarize the tokens from part 5 and this one: sequences of tokens are placed into the (sub-)arrays of tok_mem (more specifically, into the arrays tok_mem[0] to tok_mem[2] where the “2” here is zz - 1) as follows:

For example, after reading 2000 lines of tex.web, these are the contents of the first 3 of the 5 tok_mem arrays:

gdb /path/to/tangle

(gdb) break zinputln
Breakpoint 1 at 0x2ba7: file tangle.c, line 438.

(gdb) run /path/to/tex.web

(gdb) continue 2000

(gdb) p line
$1 = 2000

(gdb) p tokmem[0]
$2 = "\t\320\v\200#=30000;\200$=0;\200%=500;\200&=72;\200'=42;\200(=79;\200)=200;\200*=6;\200+=75;\200,=20000;\200-=60;\200.=40;\200/=3000;\200\060=8000;\200\061=32000;\200\062=600;\200\063=8000;\200\064=500;\200\065=800;\200\066=40;\200\067='TeXformats:TEX.POOL                     ';\000\030\000-1\320\022\200`=0 255;\320\027\200li\030\060\200m\f37\200[\200h[i]\030' ';\200li\030\f177\200m\f377\200[\200h[i]\030' ';\200w(\000)=0\320 \200\227:\200p;\200\230:\200p;\200\\\000\200\020\200\260[\200\262]\030\200\253(\000);\200U(\200\262);\200\022\320-\200y\200\277(s:\200\256;k:\200?):\200{;\200\006\200S;\200\vj:\200\255;\200\300:\200{;\200\020j\030\200\261[s];\200Yj<\200\261[s+1]\200[\200\020\200A\200\254(\200\260[j])\032\200\210[k]\200C\200\020\200\300\030\200\222;\200^\200S;\200\022;\200U(j);\200U(k);\200\022;\200\300\030\200Z;\200S:\200\277\030\200\300;\200\022;\320\061(k<\200\226)\200B(k>\200\313)\320\065\200\020a\030\060;k\030\061;\200X\200\020\200A(\200e[n]<\200\306)\200B(\200e[n]>\200\323)\200C\200\315('! TEX.POOL check sum doesn''t have nine digits.');a\030\061\060*a+\200e[n]-\200\306;\200Ak=9\200C\200^\200I;\200U(k);\200\321(\200\314,n);\200\022;\200I:\200Aa\032}\200C\200\315('! 
TEX.POOL doesn''t match; TANGLE me again.');c\030\200Z;\200\022\200\241(\200\230)\320:\200\r\200\357(s:\200`);\200\006\200E;\200\020\200A\250\360\200C\200A\200\335<\200ـC\200\020\200\354;\200];\200\022;\200\355\200݀g\200\330:\200\020\200\346(\200h[s]);\200\351(\200h[s]);\200U(\200\340);\200U(\200\341);\200A\200\340=\200(\200C\200\020\200\350;\200\340\030\060;\200\022;\200A\200\341=\200(\200C\200\020\200\353;\200\341\030\060;\200\022;\200\022;\200\327:\200\020\200\351(\200h[s]);\200U(\200\341);\200A\200\341=\200(\200C\200\354;\200\022;\200\326:\200\020\200\346(\200h[s]);\200U(\200\340);\200A\200\340=\200(\200C\200\354;\200\022;\200\325:\200\\;\200\331:\200A\200\337<\200\343\200C\200\342[\200߀D\200&]\030s;\200\332:\200\020\200A\200\262<\200\061\200C\200\270(s);\200\022;\200 \200\251(\200\356[\200\335],\200h[s])\200\";\200U(\200\337);\200E:\200\022;\320?\200\r\200\370(s:\200\256);\200\vc:\200?;\200\020\250\371;\200Ac\035\060\200C\200Ac<256\200C\200\361(c);\200\365(s);\200\022;\200\361\200\020\200A\201\021=\201\017\200C\200\237;\200\366(\201\022);\200\361(\000);\200\022\320M\201\030\030\200Z;\201\031\030\200Z;\201\033\030\060;\201'[3]\030\000;\201)\200\020\201.\030\062;\201(\320O\201':\200f[0 5]\200g\200\256;\201.:0 6;\201\065:\200{;\320T\200\355c\200g\200\306,\201B,\201\006,\201C,\201D,\201E,\201F,\201G,\201H,\200\323:\200A\201\030\200C\251I;\200\030\201J:\200\020\201%;\200^\200H;\200\022;\200\031\201K:\200A\201L>0\200C\200\020\200\366(\201M);\200\365(\201N[\201L].\201O);\200\361(\201P);\200\374(\201Q);\201\021\030\201\016;\201\066;\200\022;\201R:\251S;\201T:\251U;\201V,\201W,\201X:\251Y;\201Z:\200\020\201\021\030\201\016;\201\066;\200\022;\200 \200\\\200\";\251[", '\000' <repeats 64240 times>

(gdb) p tokmem[1]
$3 = "'This is TeX, Version 3.14159265'\n\320\b\250\036\200\034\250\037\200\035\063\060\060\060\060\000\030-\000\200b\320\030\200li\030\200c\200m\200d\200[\200e[\200n(i)]\030\200k;\200li\030\f200\200m\f377\200[\200e[\200h[i]]\030i;\200li\030\060\200m\f176\200[\200e[\200h[i]]\030i;\320\033\200y\200z(\200\vf:\200p):\200{;\200\020\200|(f,\200t,'/O');\200z\030\200v(f);\200\022;\200y\200}(\200\vf:\200p):\200{;\200\020\200~(f,\200t,'/O');\200}\030\200x(f);\200\022;\200y\200\177(\200\vf:\200s):\200{;\200\020\200|(f,\200t,'/O');\200\177\030\200v(f);\200\022;\200y\200\200(\200\vf:\200s):\200{;\200\020\200~(f,\200t,'/O');\200\200\030\200x(f);\200\022;\200y\200\201(\200\vf:\200\202):\200{;\200\020\200|(f,\200t,'/O');\200\201\030\200v(f);\200\022;\200y\200\203(\200\vf:\200\202):\200{;\200\020\200~(f,\200t,'/O');\200\203\030\200x(f);\200\022;\200|(\200\227,'TTY:','/O/I')\320#\200A\200\240=0\200C\200\020\200\241(\200\230,'Buffer size exceeded!');\200^\200\027;\200\022\200\223\200\020\200\242.\200\243\030\200\211;\200\242.\200\244\030\200\212-1;\200\245(\200\246,\200%);\200\022\320&\200\255=0 \200\061;\200\256=0 \200/;\200\257=0 255;\200V(\200\262)\320.\200y\200\301(s,t:\200\256):\200{;\200\006\200S;\200\vj,k:\200\255;\200\300:\200{;\200\020\200\300\030\200\222;\200A\200\266(s)\032\200\266(t)\200C\200^\200S;j\030\200\261[s];k\030\200\261[t];\200Yj<\200\261[s+1]\200[\200\020\200A\200\260[j]\032\200\260[k]\200C\200^\200S;\200U(j);\200U(k);\200\022;\200\300\030\200Z;\200S:\200\301\030\200\300;\200\022;\320\062\200\034\200\314:\200p;\200\035\320\066\200\334:\200p;\200\335:0 \200\333;\200\336:\200f[0 22]\200g0 15;\200\337:\200?;\200\340:0 \200(;\200\341:0 \200(;\200\342:\200f[0 
\200&]\200g\200`;\200\343:\200?;\200\344:\200?;\200\251(\200\334,\000)\320;\200\r\200\361(s:\200?);\200\006\200E;\200\vj:\200\255;\200\362:\200?;\200\020\200As\035\200\263\200Cs\030\200\363\200\223\200As<256\200C\200As<0\200Cs\030\200\363\200\223\200\020\200A\200\335>\200ـC\200\020\200\357(s);\200];\200\022;\200A(\250\360)\200C\200A\200\335<\200ـC\200\020\200\354;\200];\200\022;\200\362\030\200\364;\200\364\030-1;j\030\200\261[s];\200Yj<\200\261[s+1]\200[\200\020\200\357(\200\254(\200\260[j]));\200U(j);\200\022;\200\364\030\200\362;\200];\200\022;j\030\200\261[s];\200Yj<\200\261[s+1]\200[\200\020\200\357(\200\254(\200\260[j]));\200U(j);\200\022;\200E:\200\022;\320@\200\r\200\372(k:\200o);\200\020\200Yk>0\200[\200\020\200V(k);\200A\200\336[k]<10\200C\200\357(\200\306+\200\336[k])\200\223\200\357(\200\373-10+\200\336[k]);\200\022;\200\022;\320E\200\r\201\003(n:\200?);\200\006\200E;\200\vj,k:\200\255;u,v:\201\004;\200\020j\030\200\261[\201\005];v\030\061\060\060\060;\200X\200\020\200Yn\035v\200[\200\020\200\357(\200\254(\200\260[j]));n\030n-v;\200\022;\200An\034\060\200C\200];k\030j+2;u\030v\200\312(\200\254(\200\260[k-1])-\200\306);\200A\200\260[k-1]=\200\253(\201\006)\200C\200\020k\030k+2;u\030u\200\312(\200\254(\200\260[k-1])-\200\306);\200\022;\200An+u\035v\200C\200\020\200\357(\200\254(\200\260[k]));n\030n+u;\200\022\200\223\200\020j\030j+2;v\030v\200\312(\200\254(\200\260[j-1])-\200\306);\200\022;\200\022;\200E:\200\022;\320I\201\021:\201\f \201\017;\320N\200\r\201\034;\201\035;\200\r\201\036;\201\035;\200\r\201\t;\201\035;\200\r\201\037;\201\035;\200\r\201 ;\201\035;\200\r\201!;\201\035;\200\r\201\";\201\035;\200\r\201#;\201\035;\200\r\201$;\201\035;\200\030\200\r\201%;\201\035;\200\031\201'[4]\030\000;\201*\200\020\201.\030\063;\201)\320P\201.\030\060;\201\065\030\200\222;\320U\200\020\200\361(\201\\);\200\366(\201]);\200\366(\201^);\200A\201L>0\200C\200\361(\201_);\200A\201\030\200C\200\366(\201`);\200\366(\201a);\200\022", '\000' <repeats 63974 times>

(gdb) p tokmem[2]
$4 = "t\177y\177p\177e\t\320\t\t\177$C-,A+,D-\n\200\030\t\177$C+,D+\n\200\031\320\r\200>:\200?;\200Y\200Z\200[\320\023i:\200?;\320\031\200o=0 255;\200p=\200q\200r\200g\200a;\200s=\200q\200r\200g\200o;\320\034\200\r\200\204(\200\vf:\200p);\200\020\200\205(f);\200\022;\200\r\200\206(\200\vf:\200s);\200\020\200\205(f);\200\022;\200\r\200\207(\200\vf:\200\202);\200\020\200\205(f);\200\022;\200~(\200\230,'TTY:','/O')\200\242.\200\243\320'\200\260:\200q\200f[\200\255]\200g\200\257;\200\261:\200f[\200\256]\200g\200\255;\200\262:\200\255;\200\263:\200\256;\200\264:\200\255;\200\265:\200\256;\200\020\200A\200\262+\000>\200\061\200C\200\245(\200\273,\200\061-\200\264);\200\022\320/\200\034\200y\200\302:\200{;\200\006\200I,\200E;\200\vk,l:0 255;m,n:\200a;g:\200\256;a:\200?;c:\200{;\200\020\200\262\030\060;\200\263\030\060;\200\261[0]\030\060;\250\303;\250\304;\200E:\200\022;\200\035\200\020\200\237;\200\241(\200\230,\000);\200\204(\200\314);\200\302\030\200\222;\200];\200\022\320\067\200\335\030\200\326;\200\337\030\060;\200\340\030\060;\200\341\030\060;\200\241(\200\334,\000)\320<\200\r\200\365(s:\200?);\200\vj:\200\255;\200\020\200A(s\035\200\263)\200B(s<256)\200C\200\361(s)\200\223\200\020j\030\200\261[s];\200Yj<\200\261[s+1]\200[\200\020\200\361(\200\254(\200\260[j]));\200U(j);\200\022;\200\022;\200\022;\320A\200\r\200\374(n:\200?);\200\vk:0 
23;m:\200?;\200\020k\030\060;\200An<0\200C\200\020\200\357(\200\375);\200An>-100000000\200C\200W(n)\200\223\200\020m\030-1-n;n\030m\200\312\061\060;m\030(m\200D10)+1;k\030\061;\200Am<10\200C\200\336[0]\030m\200\223\200\020\200\336[0]\030\060;\200U(n);\200\022;\200\022;\200\022;\200\316\200\336[k]\030n\200D10;n\030n\200\312\061\060;\200U(k);\200\320n=0;\200\372(k);\200\022;\320F\200\r\201\a;\200\vj:\200\255;\200\020j\030\200\261[\200\263];\200Yj<\200\262\200[\200\020\200\357(\200\254(\200\260[j]));\200U(j);\200\022;\200\022;\320J\201\021\030\201\017;\201'[0]\030\000;\200\022\201'[5]\030\000;\201+\200\020\201.\030\064;\201*\320Q\200\r\201\066;\200\020\200^\200\026;\200\022;\320V\200\020\201\033\030\060;\201\021\030\201\f+c-\201V;\200\361(\201b);\200\355c\200g\201V:\200\020\200\370(\201c);\200V(\200\335);\200\022;\201W:\200\370(\201d);\201X:\200\370(\201e);\200\022;\200\361(\201f);\200\354;\200\233;\200];\200\022", '\000' <repeats 64610 times>

(gdb) p tokstart
$6 = {0, 0, 0, 0, 0, 0, 33, 7, 45, 11, 1, 34, 8, 46, 11, 1, 44, 35, 49, 13, 187, 49, 43, 180, 18, 192, 53, 49, 180, 22, 203, 55, 56, 210, 1228, 253, 135, 91, 238, 1235, 260, 401, 165,
  280, 1440, 274, 421, 183, 286, 1449, 276, 507, 188, 530, 1450, 277, 534, 244, 545, 1461, 301, 540, 272, 607, 1482, 433, 690, 360, 640, 1608, 449, 702, 392, 722, 1931, 658, 786, 415,
  730, 1939, 664, 794, 423, 736, 2040, 964, 1038, 525, 801, 2108, 1020, 1116, 715, 862, 2186, 1022, 1370, 776, 878, 2287, 1051, 1381, 784, 901, 2320, 1070, 1467, 794, 911, 2330, 1080,
  1477, 804, 915, 2339, 1089, 1486, 813, 924, 2348, 1119, 1499, 830, 1047, 2411, 1295, 1561, 925, 1129, 0 <repeats 9872 times>}
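To see how tok_start carves these arrays into texts, here is a minimal Python sketch. It relies on the rule from tangle.web that text p lives in tok_mem[p mod zz], starting at tok_start[p] and ending just before tok_start[p + zz]. The value zz = 5 is an inference from the dump (tok_start opens with six zeros, and there are five arrays), not something printed by gdb. Only the opening bytes of tokmem[1] and tokmem[2] are copied in, and the name `text_bytes` is made up for this sketch.

```python
# Assumption: this build of tangle uses zz = 5 (inferred from the dump above).
ZZ = 5

# The first sixteen entries of the tok_start dump.
tok_start = [0, 0, 0, 0, 0, 0, 33, 7, 45, 11, 1, 34, 8, 46, 11, 1]

# Only the opening bytes of tokmem[1] and tokmem[2], copied from the dump
# (gdb prints octal escapes: \320 is 0xd0, \177 is 0x7f).
tok_mem = {
    1: b"'This is TeX, Version 3.14159265'\n\xd0\x08",
    2: b"t\x7fy\x7fp\x7fe\t\xd0\x09",
}


def text_bytes(p):
    """Return the raw token bytes of text p (hypothetical helper)."""
    z = p % ZZ                       # which sub-array holds text p
    start = tok_start[p]             # where text p begins in that sub-array
    end = tok_start[p + ZZ]          # where the next text in the same sub-array begins
    return tok_mem[z][start:end]


print(text_bytes(1))  # the banner string, 33 bytes long
print(text_bytes(2))  # the first text stored in tokmem[2]
print(text_bytes(6))  # the text that follows text 1 inside tokmem[1]
```

Texts 1 and 6 share tokmem[1] and texts 2 and 7 share tokmem[2], which is why consecutive text numbers never sit next to each other in memory.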

Here is the data in a more symbolic form: the first five rows are the tok_mem arrays, and the last row is the tok_start array, where each index marks the string it represents in tok_mem.
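A symbolic form like this can be produced by decoding the bytes. The sketch below follows the two-byte token convention described in tangle.web: a first byte a of at least 0200 (octal) starts a two-byte token (a, b), which refers to an identifier name when a < 0250, to a module name when a < 0320, and otherwise to a module number. The function names and the bracketed labels (`[N@…]`, `[M@…]`, `[#…]`) are this sketch's own choices, and it makes no attempt to treat the contents of strings or the one-byte control codes specially.

```python
def decode_tokens(data):
    """Split raw tok_mem bytes into (kind, value) pairs."""
    tokens = []
    i = 0
    while i < len(data):
        a = data[i]
        if a < 0o200:
            # One-byte token: an ordinary character or a control code.
            tokens.append(("byte", a))
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("two-byte token cut off at end of data")
        b = data[i + 1]
        if a < 0o250:
            tokens.append(("name", (a - 0o200) * 256 + b))
        elif a < 0o320:
            tokens.append(("module_name", (a - 0o250) * 256 + b))
        else:
            tokens.append(("module_number", (a - 0o320) * 256 + b))
        i += 2
    return tokens


def render(tokens):
    """Render tokens as one readable string (labels are made up here)."""
    parts = []
    for kind, value in tokens:
        if kind == "byte":
            parts.append(chr(value) if 32 <= value < 127 else f"<{value}>")
        elif kind == "name":
            parts.append(f"[N@{value}]")
        elif kind == "module_name":
            parts.append(f"[M@{value}]")
        else:
            parts.append(f"[#{value}]")
    return "".join(parts)


# The first bytes of tokmem[0] from the dump above.
sample = b"\t\xd0\x0b\x80#=30000;"
print(render(decode_tokens(sample)))  # <9>[#11][N@35]=30000;
```

Read this way, the start of tokmem[0] says: module number 11, then name 35 defined as 30000, which looks like the list of constants near the start of tex.web.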

And here are just the texts (text i for each i), without the complications of splitting them into five arrays and keeping track of start indices:

To make it even more readable, we can pair with the byte memory (the identifiers aka names), so that the “N@” and “M@” references above can be resolved. The “next” below refers to the text_link array.
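Resolving a name number works the same way as extracting a text, because tangle.web splits byte_mem into ww sub-arrays indexed through byte_start: name p lives in byte_mem[p mod ww], from byte_start[p] up to (not including) byte_start[p + ww]. The sketch below uses toy data only. The value ww = 2, the three identifiers, and the helper `name_of` are invented for illustration and are not taken from any dump in this post.

```python
# Toy data, invented for illustration (not from a real tangle run).
WW = 2  # assumption: number of byte_mem sub-arrays in this toy setup

# Name 0 is the empty name; names 1 and 3 go to sub-array 1, name 2 to sub-array 0.
byte_mem = {
    0: b"mem_min",
    1: b"mem_max" + b"buf_size",
}
byte_start = [0, 0, 0, 7, 7, 15]


def name_of(p):
    """Return name number p as a string (hypothetical helper)."""
    w = p % WW
    return byte_mem[w][byte_start[p]:byte_start[p + WW]].decode("ascii")


for p in range(4):
    print(p, repr(name_of(p)))
```

With a lookup like this, a name reference in a decoded text can be replaced by the identifier or module name it stands for.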

If we recall that the name 0 refers to the unnamed module, we can now finally put together the “name” and “text” arrays, along with the pointers between them: equiv points from names to (sometimes) text equivalents, and text_link points from one text to another (its sequel). This makes sense of most of the data structures introduced in Part 5 (section 38). We don’t need to care about the link and ilk arrays anymore, because they are either an internal implementation detail for finding the number from the name, or contain only trivial information.

(gdb) p equiv
$1 = {0, 0, 0, 0, 0, 0, 0, 5, 0, 8, 0, 0, 7, 0, 9, 0, 2, 3, 4, 0, 6, 0, 1073741824, 1073742079, 0, 0, 0, 0, 0, 1073741824, 1073741837, 1073741951, 0, 0, 0, 0, 0, 15, 0, 0, 18, 21, 17, 0, 0, 1073741872, 0, 1073741921, 0, 19, 1073741918, 0, 0, 1073741858, 1073741856, 0, 1073741950, 0, 0, 0, 0, 0, 0, 51, 0, 22, 0, 0, 0, 1073741881, 0 <repeats 9931 times>}
(gdb) p textlink
$2 = {1, 16, 0, 0, 0, 10000, 0, 10000, 12, 10, 11, 14, 13, 20, 10000, 0, 10000, 0, 10000, 10000, 10000, 10000, 10000, 0 <repeats 9978 times>}
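These two dumps can be followed by hand, or with a small Python sketch like the one below. It makes several assumptions: that the value 10000 in text_link marks the end of a module's chain, that a link of 0 means the text has no sequel (as for a macro), and that the very large equiv values are small integers offset by 2^30 (1073741824 is exactly 2^30). All three are read off the data rather than confirmed from the program, and the helper names are made up.

```python
END_OF_CHAIN = 10000      # assumption: marks the last text of a module
NUMERIC_BASE = 1 << 30    # assumption: offset added to numeric values in equiv

# The first entries of the two dumps above.
equiv = [0, 0, 0, 0, 0, 0, 0, 5, 0, 8, 0, 0, 7, 0, 9, 0, 2, 3, 4, 0, 6, 0,
         1073741824, 1073742079]
text_link = [1, 16, 0, 0, 0, 10000, 0, 10000, 12, 10, 11, 14, 13, 20,
             10000, 0, 10000, 0, 10000, 10000, 10000, 10000, 10000]


def chain(first_text):
    """Follow text_link from first_text and return the texts visited."""
    texts = []
    p = first_text
    while p not in (0, END_OF_CHAIN):
        texts.append(p)
        p = text_link[p]
    return texts


def numeric_value(name):
    """Return the small integer behind a large equiv entry, else None."""
    e = equiv[name]
    return e - NUMERIC_BASE if e >= NUMERIC_BASE else None


print(chain(text_link[0]))   # texts of the unnamed module (name 0)
print(chain(equiv[9]))       # texts reachable from name 9
print(chain(equiv[16]))      # a single text with no sequel
print(numeric_value(23))     # the integer hidden in equiv[23]
```

Note that the unnamed module is special: its chain starts at text_link[0] rather than at equiv[0].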

Let’s summarize what we have so far:

Here is a visualization of some of this, for the memory at the end of phase one of reading POOLTYPE.web.