Digital encoding of APL symbols

Code pages used specifically to write programs in the APL programming language. From Wikipedia, the free encyclopedia.

The programming language APL uses a number of symbols, rather than words from natural language, to identify operations, much as mathematical notation does. Prior to the wide adoption of Unicode, a number of special-purpose EBCDIC and non-EBCDIC code pages were used to represent the symbols required for writing APL.

Character sets

Because APL originated on IBM Selectric-based teleprinters, its symbols have traditionally been represented on the wire using a unique, non-standard character set. In the 1960s and 1970s, few terminal devices could reproduce them, the most popular being the IBM 2741 and IBM 1050 fitted with a specific APL print head. Over time, with the universal use of high-quality graphic displays, printing devices and Unicode support, the APL character font problem has largely been eliminated.

Character repertoire

IBM assigns the following character IDs (GCGIDs) to APL syntax, which are used in the definitions of its code pages.[1][2][3]

[Table of GCGIDs and IBM names not reproduced here.]

EBCDIC code pages

Code page 293

Code page 293 (CCSID 293),[20] called "APL (USA)", is an EBCDIC code page which includes APL symbols, in addition to preserving the basic Latin letters and Western Arabic numerals at their usual EBCDIC locations.[17][18]

Code page 293[21][17][18]
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x NUL SOH STX ETX SEL  HT  RNL DEL  GE  SPS RPT  VT   FF   CR   SO   SI  
1x DLE DC1 DC2 DC3 RES/ENP NL BS POC CAN EM UBS CU1 IFS IGS IRS IUS/ITB
2x DS SOS FS WUS BYP/INP LF ETB ESC SA SFE SM/SW CSP MFA ENQ ACK BEL
3x SYN   IR   PP  TRN NBS EOT SBS   IT  RFF CU3 DC4 NAK SUB
4x  SP  𝐴̲ 𝐵̲ 𝐶̲ 𝐷̲ 𝐸̲ 𝐹̲ 𝐺̲ 𝐻̲ 𝐼̲ ¢ . < ( + |
5x & 𝐽̲ 𝐾̲ 𝐿̲ 𝑀̲ 𝑁̲ 𝑂̲ 𝑃̲ 𝑄̲ 𝑅̲ ! $ ⋆/* ) ; ¬
6x -/− / 𝑆̲ 𝑇̲ 𝑈̲ 𝑉̲ 𝑊̲ 𝑋̲ 𝑌̲ 𝑍̲ ¦ , % _ > ?
7x ⋄/◊/◆ ∧/⋀ ¨ ` :/∶ # @ ' = "
8x ∼/~ a b c d e f g h i
9x j k l m n o p q r
Ax ~ s t u v w x y z ∩/⋂ ∪/⋃ [
Bx ⍺/α ∊/ε/∈ ⍳/ι ⍴/ρ ⍵/ω × \/∖ ÷ ] ∣/│
Cx { A B C D E F G H I
Dx } J K L M N O P Q R !/ǃ
Ex \ S T U V W X Y Z
Fx 0 1 2 3 4 5 6 7 8 9  EO 
  Differences from Code page 37
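
The "usual EBCDIC locations" of the letters and digits can be spot-checked with a short sketch. It assumes that Python's built-in `cp037` codec is an acceptable stand-in for those shared positions (the standard library has no codec for code page 293 itself); the helper name `ebcdic_positions` is made up for this illustration.

```python
# Assumption: cp037 stands in for the letter and digit positions that
# code page 293 is described as preserving; it is NOT code page 293.
SHARED = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"


def ebcdic_positions(text: str) -> dict[str, int]:
    """Map each character to its single-byte cp037 code."""
    return {ch: ch.encode("cp037")[0] for ch in text}


positions = ebcdic_positions(SHARED)

# Print the first entry of each letter run plus the digit range ends.
for ch in "AJSajs09":
    print(f"{ch} -> 0x{positions[ch]:02X}")
```

The printed values can be compared against the row and column labels of the table above, for example the `Cx` row for `A` and the `Fx` row for the digits.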

Code page 310

Code page 310 ("Graphic Escape APL/TN") includes a larger gamut of symbols, but does not itself include the basic Latin letters or the basic digits.[22][4] It is used alongside Code page 37-2,[23] with the Code page 310 codes being prefixed by the Graphic Escape (EBCDIC 0x08)[24] control character.[6][25]
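
The prefixing scheme can be sketched as a small scanner that labels each byte with the code page it should be looked up in. This is only an illustration of the mechanism described above: the function name `tag_code_pages` and the labels `"cp37"` and `"cp310"` are hypothetical, and no character mapping is attempted.

```python
GRAPHIC_ESCAPE = 0x08  # EBCDIC Graphic Escape (GE) control code


def tag_code_pages(data: bytes) -> list[tuple[str, int]]:
    """Label each data byte with the code page it belongs to.

    A byte that follows GE is tagged "cp310"; every other byte is
    tagged "cp37". The GE bytes themselves are consumed.
    """
    tagged = []
    stream = iter(data)
    for byte in stream:
        if byte == GRAPHIC_ESCAPE:
            escaped = next(stream, None)
            if escaped is None:
                raise ValueError("Graphic Escape at end of input")
            tagged.append(("cp310", escaped))
        else:
            tagged.append(("cp37", byte))
    return tagged


sample = bytes([0xC1, GRAPHIC_ESCAPE, 0xB0, 0xC2])
for page, code in tag_code_pages(sample):
    print(f"{page}: 0x{code:02X}")
```

Because the escape applies to a single following byte, the scanner needs no shift state; a real decoder would then look each tagged byte up in the corresponding table.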

Code page 310 (prefixed with 0x08)[26][22][4][6][c]
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x
1x
2x
3x
4x  SP  𝐴̲ 𝐵̲ 𝐶̲ 𝐷̲ 𝐸̲ 𝐹̲ 𝐺̲ 𝐻̲ 𝐼̲
5x 𝐽̲ 𝐾̲ 𝐿̲ 𝑀̲ 𝑁̲ 𝑂̲ 𝑃̲ 𝑄̲ 𝑅̲
6x 𝑆̲ 𝑇̲ 𝑈̲ 𝑉̲ 𝑊̲ 𝑋̲ 𝑌̲ 𝑍̲
7x ◊/⋄/◆ ∧/⋀ ¨
8x ∼/~ │/⎥
9x █/■ ⌑/¤ ±
Ax ¯/‾ ° ∙/• ∩/⋂ ∪/⋃ [
Bx ⍺/α ∊/∈/ε ⍳/ι ⍴/ρ ⍵/ω × ∖/\ ÷ ] ∣/│
Cx { ⁺/+ ■/∎ §
Dx } ⁻/- ǃ/!
Ex [d] [d] [d] [d]
Fx ¹ ² ³

Code page 351

Code page 351 ("GDDM Default (USA)")[27] contains most of the characters of Code page 293 and Code page 310 (except epsilon with underline) in addition to the letters and digits; it makes room for them by replacing several control characters with symbols.

Code page 351[27]
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x NUL {  HT   FF   CR 
1x  NL    BS 
2x }  LF  §
3x ¹ ² ³
4x  SP  𝐴̲ 𝐵̲ 𝐶̲ 𝐷̲ 𝐸̲ 𝐹̲ 𝐺̲ 𝐻̲ 𝐼̲ ¢ . < ( + |
5x & 𝐽̲ 𝐾̲ 𝐿̲ 𝑀̲ 𝑁̲ 𝑂̲ 𝑃̲ 𝑄̲ 𝑅̲ ! $ * ) ; ¬
6x - / 𝑆̲ 𝑇̲ 𝑈̲ 𝑉̲ 𝑊̲ 𝑋̲ 𝑌̲ 𝑍̲ ¦ , % _ > ?
7x ¨ ° ` : # @ ' = "
8x a b c d e f g h i
9x j k l m n o p q r ±
Ax ¯ ~ s t u v w x y z [
Bx ∈/∊ × ∖/\ ÷ ]
Cx { A B C D E F G H I
Dx } J K L M N O P Q R ǃ/!
Ex \ S T U V W X Y Z
Fx 0 1 2 3 4 5 6 7 8 9

7-bit modified ASCII

Code page 371 (IR-68)

Code page 371,[28] registered for use with ISO/IEC 2022 as ISO-IR-68,[29][5] is a heavily modified 7-bit ASCII variant designed by the APL Working Group of the Canadian Standards Association. It is intended for use with APL in an environment that allows overstriking of characters using the BS (backspace, 0x08) control code.[29][5]

8-bit modified and/or extended ASCII

Code page 907

Code page 907 is used by the IBM 3812, like code page 906.

Code page 907[9]
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x
1x §
2x  SP  !/ǃ " # $ % & ' ( ) ⋆/* + , -/− . /
3x 0 1 2 3 4 5 6 7 8 9 :/∶ ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ \/∖ ] ∧/⋀ _
6x ` a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { ∣/│ } ∼/~
8x 𝐴̲ 𝐵̲ 𝐶̲ 𝐷̲ 𝐸̲ 𝐹̲ 𝐺̲ 𝐻̲ 𝐼̲ 𝐽̲ 𝐾̲ 𝐿̲ 𝑀̲ 𝑁̲ 𝑂̲ 𝑃̲
9x 𝑄̲ 𝑅̲ 𝑆̲ 𝑇̲ 𝑈̲ 𝑉̲ 𝑊̲ ¢ 𝑋̲
Ax 𝑌̲ 𝑍̲ ¬ ∪/⋃
Bx
Cx
Dx
Ex ⍺/α ß ⍴/ρ ⍳/ι ∊/ε/∈ ∩/⋂
Fx × ÷ ⍵/ω ¨ NBSP
  Differences from code page 437

Code page 909

Code page 909 is another encoding for APL, differing from code page 907 in not including the underlined characters, assigning different codes to the APL characters which fall in the 0xB0–DF range, and replacing some of the C0 replacement graphics from code page 437 with alternative encodings for certain APL symbols.

Code page 909[10]
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x
1x §
2x  SP  !/ǃ " # $ % & ' ( ) ⋆/* + , -/− . /
3x 0 1 2 3 4 5 6 7 8 9 :/∶ ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ \/∖ ] ∧/⋀ _
6x ` a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { ∣/│ } ∼/~
8x Ç ü é â ä à å ç ê ë è ï î ì Ä Å
9x ô ö ò û ù Ö Ü £
Ax á í ó ú ñ Ñ ª º ¿ ¬ ∪/⋃ ¡
Bx
Cx
Dx ⋄/◊/◆
Ex ⍺/α ß ⍴/ρ ⍳/ι ∊/ε/∈ ∩/⋂
Fx × ÷ ⍵/ω ¨ NBSP
  Differences from code page 437

Code page 910

Code page 910 is similar to code page 909, but with fewer duplicate horizontal arrows, using the same C0 graphics as code page 437, and including some additional characters.

Code page 910[11]
0 1 2 3 4 5 6 7 8 9 A B C D E F
0x
1x §
2x  SP  !/ǃ " # $ % & ' ( ) ⋆/* + , -/− . /
3x 0 1 2 3 4 5 6 7 8 9 :/∶ ; < = > ?
4x @ A B C D E F G H I J K L M N O
5x P Q R S T U V W X Y Z [ \/∖ ] ∧/⋀ _
6x ` a b c d e f g h i j k l m n o
7x p q r s t u v w x y z { ∣/│ } ∼/~
8x Ç ü é â ä à å ç ê ë è ï î ì Ä Å
9x ô ö ò û ù Ö Ü ø £
Ax á í ó ú ñ Ñ ª º ¿ ¬ ½ ∪/⋃ ¡
Bx
Cx
Dx ⋄/◊/◆ ¦ Ì
Ex ⍺/α ß ⍴/ρ ⍳/ι ∊/ε/∈ ∩/⋂
Fx × ÷ ⍵/ω ¨ NBSP
  Differences from code page 437

Unicode

Most APL symbols are present in Unicode, in the Miscellaneous Technical range,[30] although some APL products may not yet feature Unicode, and some APL symbols may be unused or unavailable in a given vendor's implementation.
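
As a rough illustration, the APL-specific characters in that range can be listed from Python's `unicodedata` module by filtering on their character names. The helper name `apl_functional_symbols` is invented here, and the result depends on the Unicode version bundled with the interpreter.

```python
import unicodedata


def apl_functional_symbols() -> list[str]:
    """Collect Miscellaneous Technical characters named as APL symbols."""
    found = []
    for cp in range(0x2300, 0x2400):  # Miscellaneous Technical block
        ch = chr(cp)
        # name() returns the given default for unassigned code points.
        if unicodedata.name(ch, "").startswith("APL FUNCTIONAL SYMBOL"):
            found.append(ch)
    return found


symbols = apl_functional_symbols()
print(len(symbols), "".join(symbols))
```

Symbols that APL shares with general mathematics or Latin-1, such as × (U+00D7) and ÷ (U+00F7), sit outside this block and are not picked up by the name filter.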

As of 2010, Unicode allows APL to be stored in text files, published in print and on the web, and shared through email and instant messaging. Entering APL characters still requires either a specific input method editor or keyboard mapping, or a specific touch interface. APL keyboard mappings are available for free for the most common operating systems, or can be obtained by adding the Unicode APL symbols to an existing keyboard map.

Underscored alphabetic characters

Missing from Unicode are the traditional underscored alphabetic characters included in some of the APL code pages; their usage has been eliminated or deprecated in most APL implementations. These were produced on APL printing terminals by over-striking a straight capital letter with an underscore character. Some tables show them simulated with underlined and italic markup, not listing Unicode mappings.[4]

IBM assigns them GCGIDs as "LA480000" (which they name "A Line Below Capital/A Underscore (APL)"), "LB480000" ("B Line Below Capital/B Underscore (APL)") and so forth, under the "L" series used for Latin letters.[1] The use of an even number (48) rather than an odd number (47) is due to being uppercase: compare the use of SD110000 for a lone acute accent ´, LA110000 for the lowercase á, and LA120000 for the uppercase Á.[31] They are included in IBM's private use area scheme, encoded in reverse‑alphabetical order in the odd-numbered code points from U+F8BF to U+F8F1.[12]
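
The private use area layout described above reduces to simple arithmetic. The sketch below assumes that "reverse-alphabetical" means Z sits at the lowest code point (U+F8BF) and A at the highest (U+F8F1); the function name `ibm_pua_underscored` is hypothetical.

```python
def ibm_pua_underscored(letter: str) -> int:
    """Return the assumed IBM private-use code point for an underscored capital.

    Assumption: A maps to U+F8F1 and each later letter sits two code
    points lower, so Z lands on U+F8BF and every value is odd.
    """
    if len(letter) != 1 or not ("A" <= letter <= "Z"):
        raise ValueError("expected a single capital letter A-Z")
    return 0xF8F1 - 2 * (ord(letter) - ord("A"))


for letter in "ABZ":
    print(f"{letter} underscore -> U+{ibm_pua_underscored(letter):04X}")
```

The 26 odd code points between U+F8BF and U+F8F1 inclusive match the 26 letters exactly, which is a useful sanity check on the stated range.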

Homologous uses of 47 include the "SD" (diacritic) series GCGID SD470000 for "Line Below/Discontinuous Underscore"[32]—i.e. macron below, distinct from the ASCII underscore which is SP090000 ("Underline/Continuous Underscore")[31]—and the "A" (Arabic letter) series GCGID AD470009 for the ḏāl,[33] for example. Unicode's Latin Extended Additional block includes the following capital "Line Below" characters with the macron below diacritic, for Semitic transcription (it includes a pre-composed ẖ only in lowercase):

  • U+1E06 LATIN CAPITAL LETTER B WITH LINE BELOW
  • U+1E0E LATIN CAPITAL LETTER D WITH LINE BELOW
  • U+1E34 LATIN CAPITAL LETTER K WITH LINE BELOW
  • U+1E3A LATIN CAPITAL LETTER L WITH LINE BELOW
  • U+1E48 LATIN CAPITAL LETTER N WITH LINE BELOW
  • U+1E5E LATIN CAPITAL LETTER R WITH LINE BELOW
  • U+1E6E LATIN CAPITAL LETTER T WITH LINE BELOW
  • U+1E94 LATIN CAPITAL LETTER Z WITH LINE BELOW

However, this does not cover the entire ISO basic Latin alphabet. IBM's reference glyphs for the APL characters show them both underlined and oblique,[2] and tables simulating them with markup may follow suit.[4] Unicode's Mathematical Alphanumeric Symbols block includes italic characters for use in notations where they are contrastive with non-italic characters. Unicode also includes combining forms of the macron below and underscore in the Combining Diacritical Marks block; the "Line Below" characters listed above canonically decompose into a base letter followed by the former:

  • U+0331 ̱ COMBINING MACRON BELOW
  • U+0332 ̲ COMBINING LOW LINE
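
Both points can be tried out with Python's `unicodedata` module. The first helper checks the canonical decomposition of a "Line Below" letter; the second builds one possible plain-text simulation of an underscored APL letter from a mathematical italic capital plus COMBINING LOW LINE. That pairing is an assumption made for this sketch, not a standard mapping, and both function names are invented.

```python
import unicodedata

MACRON_BELOW = "\u0331"   # COMBINING MACRON BELOW
LOW_LINE = "\u0332"       # COMBINING LOW LINE
ITALIC_CAPITAL_A = 0x1D434  # MATHEMATICAL ITALIC CAPITAL A


def decomposes_to_macron_below(ch: str) -> bool:
    """True if ch canonically decomposes to a base letter + U+0331."""
    decomposed = unicodedata.normalize("NFD", ch)
    return len(decomposed) == 2 and decomposed[1] == MACRON_BELOW


def simulate_underscored(letter: str) -> str:
    """Approximate an underscored APL capital (illustrative only)."""
    if len(letter) != 1 or not ("A" <= letter <= "Z"):
        raise ValueError("expected a single capital letter A-Z")
    italic = chr(ITALIC_CAPITAL_A + ord(letter) - ord("A"))
    return italic + LOW_LINE


print(decomposes_to_macron_below("\u1E06"))  # B WITH LINE BELOW
print(simulate_underscored("A"))
```

The simulated form is two code points long and relies on font support for both the italic letter and the combining mark, so it is a display approximation rather than an encoding of the original character.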

Keyboard layout

Note the mnemonics associating an APL character with a letter: ? (question mark) on Q, ⋆ (power) on P, ρ (rho) on R, ⊥ (base value) on B, ⊤ (eNcode) on N, ∣ (modulus) on M and so on. This makes it easier for an English-language speaker to type APL on a non-APL keyboard, provided one has visual feedback on one's screen. Decals have also been produced for attachment to standard keyboards, either on the front of the keys or on the top of them.

[Figure: APL keyboard layout.[34]]

Later IBM terminals, notably the IBM 3270 display stations, had an alternate keyboard arrangement which is the basis for some of the APL keyboard layouts in use today.

Further APL characters were available by overstriking one character with another. For example, the log symbol (⍟) was formed by overstriking ⇧ Shift+P with ⇧ Shift+O. This extended the graphic abilities of the earlier teleprinters, but made it more complex to correct errors and edit program lines.
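
A minimal sketch of how overstrike sequences might be folded into single Unicode characters follows. It works on already-decoded text in which a backspace (U+0008) separates the two struck characters, which is an assumption of this example rather than any historical wire format. The table is illustrative and far from complete, and the ⋆ and ○ pairing for ⍟ assumes those are the characters produced by ⇧ Shift+P and ⇧ Shift+O.

```python
BACKSPACE = "\b"

# Illustrative pairs only; real systems defined many more overstrikes.
# frozenset makes the lookup independent of striking order.
OVERSTRIKES = {
    frozenset("○⋆"): "⍟",  # circle + star  -> log
    frozenset("○|"): "⌽",  # circle + stile -> circle stile
    frozenset("⎕÷"): "⌹",  # quad + divide  -> quad divide
}


def resolve_overstrikes(text: str) -> str:
    """Replace known 'X BS Y' sequences with one composite character."""
    out = []
    i = 0
    while i < len(text):
        if i + 2 < len(text) and text[i + 1] == BACKSPACE:
            composite = OVERSTRIKES.get(frozenset((text[i], text[i + 2])))
            if composite is not None:
                out.append(composite)
                i += 3
                continue
        # Unknown pairs and ordinary characters pass through unchanged.
        out.append(text[i])
        i += 1
    return "".join(out)


print(resolve_overstrikes("⋆\b○ 100"))
```

Leaving unknown pairs untouched mirrors the editing difficulty noted above: without a table entry, the program cannot tell a deliberate overstrike from a correction.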

New overstrikes were introduced by vendors as they produced versions of APL tailored to specific hardware, system features, file systems, and so on. Printing terminals and early APL cathode-ray terminals were able to display arbitrary overstrikes, but as personal computers rapidly replaced terminals as data-entry devices, APL character support came to be provided as an APL Character Generator ROM or a soft character set rendered by the display device. With the advent of the modern PC, APL characters were defined in specific fonts, eliminating the distinction between overstruck characters and standard characters.

Finally, the symbols were ratified in Unicode and given specific code points, with unambiguous interpretations, independently of the graphic font.

Footnotes

  a. There are two naming conventions (which way around "up" and "down" are, and which way around "left" and "right" are) for tack characters, the "London" and "Bosworth" conventions.[13] Which convention is used differs between IBM and Unicode. Naming also differs between composite Unicode characters intended solely for APL (which match IBM naming and use the Bosworth convention) versus plain tacks also intended for other applications (which use the London convention).[13][14] APL specifications subsequently adopted the London convention.[14] The documentation for Dyalog APL notes that the Unicode naming for composite tacks (and thus the IBM naming for all tacks), which follows the lesser-used "Bosworth" convention,[13] runs contrary to convention in the APL community.[7]
  b. Unicode 1.0 had the "APL out" character at U+2301, but it was removed in Unicode 1.0.1.[15]
  c. Documented mappings vary.[4][6][26][16]
  d. Sharp extension.[6]
