# Language

This directory contains files that provide an abstracted interface with the
database for looking up sentences ~and words~.

## Tags

All dictionary entries have tags. Tags are combined from term info, dictionary
info, and glossary info. Tags can have subcategories separated by `:`. A
separate tags table handles displaying tags for different display languages,
including abbreviated versions.
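
As a rough illustration of the two ideas above (combining tags from several sources, and `:`-separated subcategories), here is a minimal sketch. The helper names `splitTag` and `combineTags`, the `aux:common` tag, and the choice to drop duplicates are assumptions made for this example and are not taken from Yomikun's source.

```typescript
// Hypothetical helpers for illustration only; not Yomikun's actual API.
type Tag = string;

/** Split a tag such as "class:verb:suru" into its subcategories. */
function splitTag(tag: Tag): string[] {
  return tag.split(":");
}

/**
 * Combine the tags from term info, dictionary info, and glossary info.
 * Assumption: duplicates are dropped and first-seen order is kept.
 */
function combineTags(term: Tag[], dictionary: Tag[], glossary: Tag[]): Tag[] {
  const combined: Tag[] = [];
  for (const tag of term.concat(dictionary, glossary)) {
    if (combined.indexOf(tag) === -1) {
      combined.push(tag);
    }
  }
  return combined;
}

// "aux:common" is a made-up tag name used only in this example.
const tags = combineTags(
  ["class:verb:suru", "aux:common"], // term info
  ["dict:jmdict_dutch"],             // dictionary info
  ["aux:common"],                    // glossary info
);
const parts = splitTag("class:verb:suru"); // ["class", "verb", "suru"]
```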

Tags that may alter behavior are stored as constants in [tags.ts](./tags.ts).
Dictionary importers should map the dictionary-specific version of these tags
to Yomikun's tags for compatibility. Other tags include:

|tag|description|
|-|-|
|`series:*`|Abbreviated series name, e.g. "The Legend of Zelda" is `series:zelda` and "Tears of the Kingdom" is `series:totk`. Series with multiple entries should split the series and entry into separate tags, e.g. `series:zelda series:totk` instead of `series:zelda_totk`.|
|`dict:*`|Dictionary tag, e.g. `dict:jmdict_dutch` or `dict:daijisen`.|
|`pitch:*`|Pitch accent, e.g. `pitch:0` for 平板, `pitch:1` for 頭高, etc.|
|`aux:*`|Used for other tags (joyo kanji, commonly used term, usually kana, etc.).|
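
One reason to split the series and the entry into separate tags is that a lookup for the whole series then also finds terms from a single entry. The sketch below shows this with a small wildcard matcher; `matchesTag` is a hypothetical helper written for this example, and treating a trailing `:*` as a prefix wildcard is an assumption, not documented Yomikun behavior.

```typescript
/**
 * Hypothetical matcher: a pattern ending in ":*" matches every tag under
 * that prefix; any other pattern must match the tag exactly.
 */
function matchesTag(tag: string, pattern: string): boolean {
  if (pattern.slice(-2) !== ":*") {
    return tag === pattern;
  }
  const prefix = pattern.slice(0, -1); // "series:*" -> "series:"
  return tag.indexOf(prefix) === 0;
}

// A term from "Tears of the Kingdom", tagged as the table recommends.
const entryTags = ["series:zelda", "series:totk", "dict:jmdict_dutch"];

// All series tags on the term.
const seriesTags = entryTags.filter((tag) => matchesTag(tag, "series:*"));

// Searching for the whole series finds the term because the tags are split.
const isZelda = entryTags.some((tag) => matchesTag(tag, "series:zelda"));

// A combined tag would not be found by the same search.
const combinedFound = matchesTag("series:zelda_totk", "series:zelda");
```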

### Behavior-altering tags

Some tag classes impact the parser's behavior. For example, the input text
「完了しました」 will be parsed as just 「完了」, but with the
`class:verb:suru-included` tag added by the parser. This is because the word
「完了」 has the tag `class:verb:suru` in the database, which allows the parser
to deconjugate a noun with the verb 「する」 back into the stem.
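
The following toy sketch illustrates the idea, not the real parser. The `dictionary` array, the `lookup` and `parseSuru` functions, and the short list of 「する」 forms are all assumptions made for this example; the actual deconjugation logic is likely far more general.

```typescript
interface Entry {
  word: string;
  tags: string[];
}

// Toy database with the single entry from the example above.
const dictionary: Entry[] = [{ word: "完了", tags: ["class:verb:suru"] }];

// Deliberately tiny, hypothetical list of conjugated forms of する.
const suruForms = ["する", "した", "して", "します", "しました"];

function lookup(word: string): Entry | undefined {
  for (const entry of dictionary) {
    if (entry.word === word) return entry;
  }
  return undefined;
}

/** Try every prefix of the input (longest first) as a dictionary word. */
function parseSuru(input: string): Entry | null {
  for (let end = input.length; end > 0; end--) {
    const entry = lookup(input.slice(0, end));
    if (entry === undefined) continue;
    const rest = input.slice(end);
    if (rest === "") {
      return { word: entry.word, tags: entry.tags.slice() };
    }
    const allowsSuru = entry.tags.indexOf("class:verb:suru") !== -1;
    if (allowsSuru && suruForms.indexOf(rest) !== -1) {
      // The する part is folded into the stem and recorded as a tag.
      return {
        word: entry.word,
        tags: entry.tags.concat("class:verb:suru-included"),
      };
    }
  }
  return null;
}

const parsed = parseSuru("完了しました");
```

In this sketch, `parsed` holds the stem 「完了」 with `class:verb:suru-included` appended to its tags, while parsing 「完了」 on its own leaves the tags unchanged.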

Other uses of this behavior include more accurate automatic kanji reading
generation, for example 「城」 being read as 「じょう」 in 「ハイラル城」
because 「ハイラル」 has the tag `name:place` in the database, and
「城(じょう)」 has `class:suffix`, while 「城(しろ)」 has `class:noun`.
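
A minimal sketch of how such a tag-based choice could work is shown below. The `Candidate` type, the `pickReading` function, and the rule itself (prefer the `class:suffix` reading directly after a word tagged `name:place`, otherwise prefer a non-suffix reading) are assumptions for illustration; the real reading generator may weigh tags differently.

```typescript
interface Candidate {
  reading: string;
  tags: string[];
}

// The two readings of 「城」 with the tags from the example above.
const castle: Candidate[] = [
  { reading: "しろ", tags: ["class:noun"] },
  { reading: "じょう", tags: ["class:suffix"] },
];

/**
 * Hypothetical rule: after a place name pick the suffix reading,
 * otherwise pick the first non-suffix reading. Falls back to the
 * first candidate when nothing fits.
 */
function pickReading(
  candidates: Candidate[],
  previousTags: string[],
): string | undefined {
  const afterPlace = previousTags.indexOf("name:place") !== -1;
  let fallback: string | undefined;
  for (const candidate of candidates) {
    const isSuffix = candidate.tags.indexOf("class:suffix") !== -1;
    if (isSuffix === afterPlace) return candidate.reading;
    if (fallback === undefined) fallback = candidate.reading;
  }
  return fallback;
}

// 「ハイラル」 is tagged name:place, so 「城」 is read as a suffix.
const afterHyrule = pickReading(castle, ["name:place"]);
// Without a preceding place name, the noun reading is chosen.
const standalone = pickReading(castle, []);
```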

Yomikun encourages sharing homebrew dictionaries, and encourages using
behavior-altering tags to fix readings in cases like the examples above. For
example, a dictionary for Zelda should add 「トト」 as a term with the tags
`class:noun` and `name:place`, rather than adding 「トト湖(こ)」 as an
expression just to override the default reading 「湖(みずうみ)」.

If Yomikun doesn't generate the correct reading, and the reading doesn't depend
on natural language context (i.e. a computer *could* accurately decide which
reading is correct based on other words/tags in the sentence), please submit a
pull request with the sentence and its (expected) reading. An example of a
non-deterministic reading is 「何」 in the sentence 「何できた?」, which can be
read either as 「なん」, in which case 「何で」 turns into a single word, or as
「なに」, where 「何」 is a regular word and 「で」 is a particle.
