# Search

This directory contains files that provide an abstracted interface with the
database for looking up sentences and words.

## Tags

All dictionary entries have tags. Tags are combined from term info, dictionary
info, and glossary info. Tags can have subcategories separated by `:`. A
separate tags table handles displaying tags for different display languages,
including abbreviated versions.
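
The `:`-separated subcategories can be treated as a simple hierarchy. The following is a minimal sketch of that idea, not Yomikun's actual implementation; `splitTag` and `tagMatches` are hypothetical helper names that do not come from this repository.

```typescript
// Hypothetical helpers for illustration only; the real tag handling may differ.

/** Split a tag such as "class:verb:suru" into its subcategory segments. */
function splitTag(tag: string): string[] {
  return tag.split(":");
}

/** True if `tag` is equal to `prefix` or is a subcategory of it. */
function tagMatches(tag: string, prefix: string): boolean {
  const tagParts = splitTag(tag);
  const prefixParts = splitTag(prefix);
  // A prefix with more segments than the tag can never match.
  if (prefixParts.length > tagParts.length) return false;
  // Compare whole segments, so "classic:x" does not match "class".
  return prefixParts.every((part, i) => part === tagParts[i]);
}
```

Comparing whole segments instead of using a plain string prefix check avoids accidental matches between unrelated tags that happen to share leading characters.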

Tags that may alter behavior are stored as constants in [tags.ts](./tags.ts).
Dictionary importers should map the dictionary-specific version of these tags
to Yomikun's tags for compatibility. Other tags include:

|tag|description|
|-|-|
|`series:*`|Abbreviated series name, e.g. "The Legend of Zelda" is `series:zelda`, and "Tears of the Kingdom" is `series:totk`. Series with multiple entries should split the series and entry into separate tags, e.g. `series:zelda series:totk` instead of `series:zelda_totk`.|
|`dict:*`|Dictionary tag, e.g. `dict:jmdict_dutch` or `dict:daijisen`.|
|`pitch:*`|Pitch accent pattern: `pitch:0` for 平板, `pitch:1` for 頭高, etc.|
|`aux:*`|Used for other tags (joyo kanji, commonly used term, usually kana, etc.).|

### Behavior-altering tags

Some tag classes impact the parser's behavior. For example, the input text
「完了しました」 will be parsed as just 「完了」, but with the
`class:verb:suru-included` tag added by the parser. This is because the word
「完了」 has the tag `class:verb:suru` in the database, which allows the parser
to deconjugate a noun with the verb 「する」 back into the stem.
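
As a rough illustration of the 「完了しました」 example, the sketch below absorbs a trailing する conjugation into a stem that carries `class:verb:suru`. Everything here is an assumption made for illustration: `parseSuruNoun`, `ParsedWord`, and the tiny `SURU_FORMS` list are hypothetical, and the real parser's deconjugation logic is not shown in this document.

```typescript
// Tag values taken from the text above; the constant names are hypothetical.
const SURU_TAG = "class:verb:suru";
const SURU_INCLUDED_TAG = "class:verb:suru-included";

// A deliberately incomplete list of する conjugations, for illustration only.
const SURU_FORMS = ["しました", "します", "して", "した", "する"];

interface ParsedWord {
  text: string;     // the stem as found in the database
  tags: string[];   // tags after parsing
  consumed: number; // number of input characters covered by this word
}

/** Match `stem` at the start of `input`, absorbing a following する form if allowed. */
function parseSuruNoun(input: string, stem: string, stemTags: string[]): ParsedWord | null {
  if (!input.startsWith(stem)) return null;
  const result: ParsedWord = { text: stem, tags: [...stemTags], consumed: stem.length };
  // Only stems tagged as する-verbs may absorb the conjugated verb.
  if (!stemTags.includes(SURU_TAG)) return result;
  const rest = input.slice(stem.length);
  const form = SURU_FORMS.find((f) => rest.startsWith(f));
  if (form !== undefined) {
    result.tags.push(SURU_INCLUDED_TAG);
    result.consumed += form.length;
  }
  return result;
}
```

In this sketch, `parseSuruNoun("完了しました", "完了", ["class:verb:suru"])` returns the stem 「完了」 covering the whole input, with `class:verb:suru-included` appended to its tags.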

Other uses of this behavior include more accurate automatic kanji reading
generation, for example 「城」 being read as 「じょう」 in 「ハイラル城」
because 「ハイラル」 has the tag `name:place` in the database, and
「城(じょう)」 has `class:suffix`, while 「城(しろ)」 has `class:noun`.
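
The 「ハイラル城」 case can be sketched as picking between candidate readings based on the tags of the preceding word. This is a simplified assumption about how such a rule might look, not the parser's actual algorithm; `Candidate` and `pickReading` are hypothetical names.

```typescript
interface Candidate {
  reading: string;
  tags: string[];
}

/**
 * Pick a reading for a word based on the tags of the word before it.
 * Assumed rule: after a place name, prefer the suffix reading; otherwise
 * prefer the noun reading. Falls back to the first candidate.
 */
function pickReading(previousTags: string[], candidates: Candidate[]): Candidate | undefined {
  const afterPlaceName = previousTags.includes("name:place");
  const wanted = afterPlaceName ? "class:suffix" : "class:noun";
  return candidates.find((c) => c.tags.includes(wanted)) ?? candidates[0];
}

// Candidate readings for 「城」, using the tags given in the text above.
const castle: Candidate[] = [
  { reading: "しろ", tags: ["class:noun"] },
  { reading: "じょう", tags: ["class:suffix"] },
];
```

With these candidates, `pickReading(["name:place"], castle)` selects 「じょう」, while `pickReading([], castle)` selects 「しろ」.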

Yomikun encourages sharing homebrew dictionaries, and encourages using
behavior-altering tags to fix readings in cases like the examples above. For
instance, a dictionary for Zelda should add 「トト」 as a term with the tags
`class:noun` and `name:place`, instead of adding 「トト湖(こ)」 as an
expression to override the default reading 「湖(みずうみ)」.

If Yomikun doesn't generate the correct reading, and the correct reading
doesn't depend on natural language context (i.e. a computer *could* accurately
decide which reading is correct based on other words or tags in the sentence),
please submit a pull request with the sentence and its expected reading. An
example of a reading that cannot be decided this way is 「何」 in the sentence
「何できた?」, which can be read either as 「なん」, in which case 「何で」
becomes a single word, or as 「なに」, where 「何」 is a regular word and
「で」 is a particle.
