Fascicle 1 of the ARIB STD-B62 standard, published in 2014, defines Unicode mappings for a selection of the B24 extended characters (excluding, for example, those duplicated by JIS X 0213), as well as a few extended Kanji.[5] It also includes a mapping of utilised characters outside the Basic Multilingual Plane to the BMP's private use area.
The ARIB STD-B24 standard defines multiple character sets and a method of switching between them. These include a Kanji set (an extension of JIS X 0208), an Alphanumeric set, a Hiragana set, Katakana sets in two distinct layouts, and four mosaic sets.[6] The sets are selected using ISO 2022 mechanisms for 94-sets, with the following codes (each proportional set uses the same layout as its non-proportional counterpart):[7]
| Set | Type | Code (column/line) | Code (hexadecimal) | Code (ASCII character) | Comments |
|---|---|---|---|---|---|
| Kanji | 2-byte | 4/2 | 42 | B | The escape code B used for the ARIB Kanji set[7] is used for the 1983 version of JIS C 6226 (JIS X 0208, of which the ARIB Kanji set is an extension) in ISO-2022-JP.[8][9] |
| Alphanumeric | 1-byte | 4/10 | 4A | J | JIS_C6220-ro (ISO646-JP, JIS X 0201 Roman set). Similar to ASCII, with two assignments differing. Escape code J matches usage in ISO-2022-JP.[9] |
| Proportional alphanumeric | 1-byte | 3/6 | 36 | 6 | |
| Hiragana | 1-byte | 3/0 | 30 | 0 | Hiragana themselves follow the same layout as row 4 of JIS X 0208, but without a lead byte. Also adds several additional assignments for punctuation. |
| Proportional Hiragana | 1-byte | 3/7 | 37 | 7 | |
| Katakana | 1-byte | 3/1 | 31 | 1 | Katakana themselves follow the same layout as row 5 of JIS X 0208, but without a lead byte. Also adds several additional assignments for punctuation. |
| | | | | | Non-spacing pseudographics (ISO-IR-71 subset with separated mosaic blocks) |
| Mosaic D | 1-byte | 3/5 | 35 | 5 | Non-spacing pseudographics |
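The relationship between the three code columns can be sketched in a few lines of Python. This is an illustrative sketch only: the helper name `column_line_to_byte` and the dictionary `FINAL_BYTES` are hypothetical, and the dictionary lists only the sets whose codes appear in the table above.

```python
def column_line_to_byte(code: str) -> int:
    """Convert ISO 2022 column/line notation such as '4/2' to a byte value."""
    column, line = (int(part) for part in code.split("/"))
    if not (0 <= column <= 15 and 0 <= line <= 15):
        raise ValueError(f"column and line must each be in 0..15: {code!r}")
    # The column is the high nibble and the line is the low nibble.
    return column * 16 + line


# Codes copied from the table above (hypothetical name, illustrative only).
FINAL_BYTES = {
    "Kanji": "4/2",
    "Alphanumeric": "4/10",
    "Proportional alphanumeric": "3/6",
    "Hiragana": "3/0",
    "Proportional Hiragana": "3/7",
    "Katakana": "3/1",
    "Mosaic D": "3/5",
}

for name, code in FINAL_BYTES.items():
    value = column_line_to_byte(code)
    print(f"{name}: {code} = 0x{value:02X} = {chr(value)!r}")
```

For example, 4/10 is column 4, line 10, which gives 0x4A, the ASCII character J.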
Code charts
Kanji (double-byte) set
This is a double-byte character set extending JIS X 0208.
Lead byte
Each encoding byte is the corresponding row or cell number plus 0x20 (32 in decimal; see below). Hence, row 1 has a lead byte of 0x21, and its cell 1 has a continuation byte of 0x21 (33), and so forth. Most of the code space corresponds to JIS X 0208.
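The offset described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the 1 to 94 bounds are an assumption based on the set being a 94-set with 94 rows of 94 cells.

```python
ROW_CELL_OFFSET = 0x20  # offset stated in the text (32 in decimal)


def row_cell_to_bytes(row: int, cell: int) -> bytes:
    """Return the lead and continuation bytes for a row/cell pair."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in 1..94")
    return bytes([row + ROW_CELL_OFFSET, cell + ROW_CELL_OFFSET])


def bytes_to_row_cell(pair: bytes) -> tuple[int, int]:
    """Inverse of row_cell_to_bytes: recover (row, cell) from two bytes."""
    lead, trail = pair
    if not (0x21 <= lead <= 0x7E and 0x21 <= trail <= 0x7E):
        raise ValueError("both bytes must be in 0x21..0x7E")
    return lead - ROW_CELL_OFFSET, trail - ROW_CELL_OFFSET


print(row_cell_to_bytes(1, 1).hex())   # row 1, cell 1
print(row_cell_to_bytes(85, 1).hex())  # first cell of row 85 (lead byte 0x75)
print(bytes_to_row_cell(b"\x7a\x21"))  # lead byte 0x7A is row 90
```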
Character set 0x75–0x76 (row numbers 85–86, additional kanji)
This part is the source standard for a small number of CJK Unified Ideographs in Unicode, where it is designated with the JARIB- source prefix in the Unihan database.[10]
ARIB STD-B24 Kanji (double-byte) set (prefixed with 0x75)[11]
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2x | | 㐂 3402 | 𠅘 20158 | 份 4EFD | 仿 4EFF | 侚 4F9A | 俉 4FC9 | 傜 509C | 儞 511E | 冼 51BC | 㔟 351F | 匇 5307 | 卡 5361 | 卬 536C | 詹 8A79 | 𠮷 20BB7 |
| 3x | 呍 544D | 咖 5496 | 咜 549C | 咩 54A9 | 唎 550E | 啊 554A | 噲 5672 | 囤 56E4 | 圳 5733 | 圴 5734 | 塚 FA10 | 墀 5880 | 姤 59E4 | 娣 5A23 | 婕 5A55 | 寬 5BEC |
| 4x | 﨑 FA11 | 㟢 37E2 | 庬 5EAC | 弴 5F34 | 彅 5F45 | 德 5FB7 | 怗 6017 | 恵 FA6B | 愰 6130 | 昤 6624 | 曈 66C8 | 曙 66D9 | 曺 66FA | 曻 66FB | 桒 6852 | 鿄 9FC4 |
| 5x | 椑 6911 | 椻 693B | 橅 6A45 | 檑 6A91 | 櫛 6ADB | 𣏌 233CC | 𣏾 233FE | 𣗄 235C4 | 毱 6BF1 | 泠 6CE0 | 洮 6D2E | 海 FA45 | 涿 6DBF | 淊 6DCA | 淸 6DF8 | 渚 FA46 |
| 6x | 潞 6F5E | 濹 6FF9 | 灤 7064 | 𤋮 FA6C | 𤋮 242EE | 煇 7147 | 燁 71C1 | 爀 7200 | 玟 739F | 玨 73A8 | 珉 73C9 | 珖 73D6 | 琛 741B | 琡 7421 | 琢 FA4A | 琦 7426 |
| 7x | 琪 742A | 琬 742C | 琹 7439 | 瑋 744B | 㻚 3EDA | 畵 7575 | 疁 7581 | 睲 7772 | 䂓 4093 | 磈 78C8 | 磠 78E0 | 祇 7947 | 禮 79AE | 鿆 9FC6 | 䄃 4103 | |
ARIB STD-B24 Kanji (double-byte) set (prefixed with 0x76)[11]
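| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2x | | 鿅 9FC5 | 秚 79DA | 稞 7A1E | 筿 7B7F | 簱 7C31 | 䉤 4264 | 綋 7D8B | 羡 7FA1 | 脘 8118 | 脺 813A | 舘 FA6D | 芮 82AE | 葛 845B | 蓜 84DC | 蓬 84EC |
| 3x | 蕙 8559 | 藎 85CE | 蝕 8755 | 蟬 87EC | 蠋 880B | 裵 88F5 | 角 89D2 | 諶 8AF6 | 跎 8DCE | 辻 8FBB | 迶 8FF6 | 郝 90DD | 鄧 9127 | 鄭 912D | 醲 91B2 | 鈳 9233 |
| 4x | 銈 9288 | 錡 9321 | 鍈 9348 | 閒 9592 | 雞 96DE | 餃 9903 | 饀 9940 | 髙 9AD9 | 鯖 9BD6 | 鷗 9DD7 | 麴 9EB4 | 麵 9EB5 | | | | |
| 5x | | | | | | | | | | | | | | | | |
| 6x | | | | | | | | | | | | | | | | |
| 7x | | | | | | | | | | | | | | | | |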
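A decoder for these two rows amounts to a table lookup keyed on the lead and continuation bytes. The sketch below is illustrative only: `EXTRA_KANJI_SAMPLE` and `decode_pair` are hypothetical names, and the dictionary holds just five entries copied from the charts above rather than the full rows. It also shows that some entries lie outside the Basic Multilingual Plane and therefore need a surrogate pair in UTF-16.

```python
# (lead byte, continuation byte) -> Unicode code point.
# A small sample copied from the charts above, not the complete rows.
EXTRA_KANJI_SAMPLE = {
    (0x75, 0x21): 0x3402,
    (0x75, 0x22): 0x20158,
    (0x75, 0x2F): 0x20BB7,
    (0x76, 0x21): 0x9FC5,
    (0x76, 0x4B): 0x9EB5,
}


def decode_pair(lead: int, trail: int):
    """Return the mapped character, or None if the pair is not in the sample."""
    code_point = EXTRA_KANJI_SAMPLE.get((lead, trail))
    return None if code_point is None else chr(code_point)


def is_outside_bmp(char: str) -> bool:
    """True if the character lies above U+FFFF (outside the BMP)."""
    return ord(char) > 0xFFFF


for (lead, trail), code_point in EXTRA_KANJI_SAMPLE.items():
    char = decode_pair(lead, trail)
    utf16_units = len(char.encode("utf-16-be")) // 2
    print(f"{lead:02X} {trail:02X} -> U+{code_point:04X} "
          f"(outside BMP: {is_outside_bmp(char)}, UTF-16 units: {utf16_units})")
```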
Character set 0x7A (row number 90, traffic symbols)
Characters 90-45 through 90-63 and 90-66 through 90-84 (shown below shaded) are listed in the B24 standard only in table 7-10 (the list of extension characters), and are also the only characters in rows 90 through 91 which are not transport-related symbols; this is noted in the B24 standard in an endnote to table 7-10.[12] The remainder of the extensions are listed in both table 7-4 (the double-byte code chart) and table 7-10.[12]
ARIB STD-B24 Kanji (double-byte) set (prefixed with 0x7A)[5][13]
Most of ARIB STD-B24 Mosaic Set D does not exist in Unicode.
Shift_JIS variant
In addition to the modified ISO 2022 encoding, the B24 standard also specifies a Shift JIS encoding following JIS X 0208:1997, but with the addition of the extended characters in the kanji set.[1]
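The usual arithmetic for turning a pair of JIS X 0208 bytes into Shift_JIS bytes can be sketched as follows. This is a sketch of the conventional Shift_JIS transformation, under the assumption (not confirmed by the text above) that the ARIB extended rows are transformed with the same arithmetic as the standard rows; the function name is hypothetical.

```python
def jis_to_shift_jis(j1: int, j2: int) -> bytes:
    """Convert two 7-bit JIS bytes (each 0x21..0x7E) to Shift_JIS bytes."""
    if not (0x21 <= j1 <= 0x7E and 0x21 <= j2 <= 0x7E):
        raise ValueError("both JIS bytes must be in 0x21..0x7E")
    # Two JIS rows share one Shift_JIS lead byte; the lead-byte range is
    # split in two to skip the single-byte half-width katakana area.
    s1 = ((j1 + 1) >> 1) + (0x70 if j1 <= 0x5E else 0xB0)
    if j1 & 1:
        # Odd first byte: trail bytes 0x40..0x9E, skipping 0x7F.
        s2 = j2 + 0x1F + (1 if j2 >= 0x60 else 0)
    else:
        # Even first byte: trail bytes 0x9F..0xFC.
        s2 = j2 + 0x7E
    return bytes([s1, s2])


print(jis_to_shift_jis(0x21, 0x21).hex())  # first cell of row 1
print(jis_to_shift_jis(0x75, 0x21).hex())  # first cell of row 85 (assumed arithmetic)
print(jis_to_shift_jis(0x76, 0x21).hex())  # first cell of row 86 (assumed arithmetic)
```

For the standard JIS X 0208 rows, the result can be cross-checked against Python's built-in `shift_jis` codec, for example `jis_to_shift_jis(0x30, 0x21).decode("shift_jis")`.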
First byte
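| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ␀ | ␁ | ␂ | ␃ | ␄ | ␅ | ␆ | ␇ | ␈ | ␉ | ␊ | ␋ | ␌ | ␍ | ␎ | ␏ |
| 1 | ␐ | ␑ | ␒ | ␓ | ␔ | ␕ | ␖ | ␗ | ␘ | ␙ | ␚ | ␛ | ␜ | ␝ | ␞ | ␟ |
| 2 | ␠ | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4 | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5 | P | Q | R | S | T | U | V | W | X | Y | Z | [ | ¥ | ] | ^ | _ |
| 6 | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7 | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ‾ | ␡ |
| 8 | | | | | | | | | | | | | | | | |
| 9 | | | | | | | | | | | | | | | | |
| A | | 。 | 「 | 」 | 、 | ・ | ヲ | ァ | ィ | ゥ | ェ | ォ | ャ | ュ | ョ | ッ |
| B | ー | ア | イ | ウ | エ | オ | カ | キ | ク | ケ | コ | サ | シ | ス | セ | ソ |
| C | タ | チ | ツ | テ | ト | ナ | ニ | ヌ | ネ | ノ | ハ | ヒ | フ | ヘ | ホ | マ |
| D | ミ | ム | メ | モ | ヤ | ユ | ヨ | ラ | リ | ル | レ | ロ | ワ | ン | ゙ | ゚ |
| E | | | | | | | | | | | | | | | | |
| F | | | | | | | | | | | | | | | | |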
Second byte
Non-printable ASCII character
Unaltered ASCII character
Modified ASCII character
Single-byte half-width katakana
First byte of a double-byte character, used by JIS X 0208
First byte of an ARIB extended character
Not used as first byte, unallocated space in JIS X 0208
Not used as first byte
Second byte of a double-byte character whose first half of the JIS sequence was odd
Second byte of a double-byte character whose first half of the JIS sequence was even
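The odd/even distinction in the last two legend entries reflects how a Shift_JIS second byte encodes the parity of the original first JIS byte. The sketch below inverts the conventional Shift_JIS arithmetic to recover the JIS byte pair. The function name is hypothetical, and the accepted lead-byte ranges (0x81 to 0x9F and 0xE0 to 0xEF) are an assumption derived from applying the standard arithmetic to rows 1 through 94, not a statement of which bytes the B24 standard actually assigns.

```python
def shift_jis_to_jis(s1: int, s2: int) -> tuple[int, int]:
    """Recover the two 7-bit JIS bytes from a Shift_JIS double-byte pair."""
    if not (0x81 <= s1 <= 0x9F or 0xE0 <= s1 <= 0xEF):
        raise ValueError("not a lead byte in the ranges assumed here")
    if not (0x40 <= s2 <= 0xFC) or s2 == 0x7F:
        raise ValueError("not a valid second byte")
    # Each lead byte covers two JIS rows; 'even_row' is the even one.
    even_row = (s1 - (0x70 if s1 <= 0x9F else 0xB0)) * 2
    if s2 >= 0x9F:
        # Second byte 0x9F..0xFC: the first JIS byte was even.
        return even_row, s2 - 0x7E
    # Second byte 0x40..0x9E (except 0x7F): the first JIS byte was odd.
    return even_row - 1, s2 - 0x1F - (1 if s2 >= 0x80 else 0)


for pair in (b"\x81\x40", b"\x88\x9f", b"\xeb\x40", b"\xeb\x9f"):
    j1, j2 = shift_jis_to_jis(*pair)
    parity = "odd" if j1 & 1 else "even"
    print(f"{pair.hex()} -> {j1:02X} {j2:02X} (first JIS byte is {parity})")
```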
^ Glossed as "temple" (i.e. Buddhist temple) in B24 table 7-10 (the list of extension characters).
^ Small form (70% size per code chart / table 7-10) of a kanji character. Shown here simulated. Private Use Area code points shown are those used by the Nishiki-teki font.[15]
^ Musical abbreviation (or half thereof) not present in Unicode, simulated here with multiple characters. Private Use Area code points shown are those used by the Nishiki-teki font.