# Search

This directory contains files that provide an abstracted interface to the
database for looking up sentences and words.

## Tags

All dictionary entries have tags. An entry's tags are combined from its term
info, dictionary info, and glossary info. Tags can have subcategories
separated by `:`, e.g. `class:verb:suru` is a subcategory of `class:verb`. A
separate tags table is used to display tags in different display languages,
including abbreviated versions.

Tags that may alter behavior are stored as constants in [tags.ts](./tags.ts).
Dictionary importers should map their dictionary-specific versions of these
tags to Yomikun's tags for compatibility. Other tags include:

|tag|description|
|-|-|
|`series:*`|Abbreviated series name, e.g. "The Legend of Zelda" is `series:zelda` and "Tears of the Kingdom" is `series:totk`. For series with multiple entries, tag the series and the entry separately, e.g. `series:zelda series:totk` instead of `series:zelda_totk`.|
|`dict:*`|Dictionary tag, e.g. `dict:jmdict_dutch` or `dict:daijisen`.|
|`pitch:*`|Pitch accent, e.g. `pitch:0` for 平板 (heiban, flat), `pitch:1` for 頭高 (atamadaka), etc.|
|`aux:*`|Other tags (joyo kanji, commonly used term, usually written in kana, etc.).|

### Behavior-altering tags

Some tag classes affect the parser's behavior. For example, the input text
「完了しました」 is parsed as just 「完了」, with the parser adding the
`class:verb:suru-included` tag. This works because 「完了」 has the tag
`class:verb:suru` in the database, which allows the parser to deconjugate a
noun followed by the verb 「する」 back into the noun stem.

Behavior-altering tags also make automatic kanji reading generation more
accurate. For example, 「城」 is read as 「じょう」 in 「ハイラル城」 because
「ハイラル」 has the tag `name:place` in the database, and 「城(じょう)」 has
`class:suffix`, while 「城(しろ)」 has `class:noun`; after a place name, the
suffix reading is the one that fits.
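The following is a minimal TypeScript sketch of two ideas from this section: matching tags by their `:`-separated category, and folding a trailing 「する」 into a noun tagged `class:verb:suru`. The `Term` and `ParseResult` shapes, `combineTags`, `hasCategory`, `parseWord`, and the short list of する forms are hypothetical illustrations. They are not the constants in `tags.ts` or the actual parser, and a real deconjugator would handle far more forms.

```typescript
// Minimal sketch. Term, ParseResult, combineTags, hasCategory, parseWord and
// SURU_FORMS are hypothetical names for illustration, not Yomikun's actual API.

interface Term {
  expression: string;
  tags: string[];
}

interface ParseResult {
  term: Term;
  tags: string[]; // database tags plus any tags the parser adds
  length: number; // number of input characters consumed
}

/** Merges tag lists (e.g. term, dictionary and glossary tags), removing duplicates. */
function combineTags(...sources: string[][]): string[] {
  const all = ([] as string[]).concat(...sources);
  return all.filter((tag, i) => all.indexOf(tag) === i);
}

/** True if `tag` is `category` itself or a `:`-separated subcategory of it. */
function hasCategory(tag: string, category: string): boolean {
  return tag === category || tag.startsWith(category + ":");
}

// A few conjugated forms of する that can follow a suru-verb noun (not exhaustive).
const SURU_FORMS = ["しました", "します", "した", "して", "する", "しない"];

/**
 * Matches the longest dictionary term at the start of `input`. If that term is
 * tagged `class:verb:suru` and is followed by a form of する, the する part is
 * consumed too and `class:verb:suru-included` is added to the result tags.
 */
function parseWord(input: string, dictionary: Term[]): ParseResult | undefined {
  for (let end = input.length; end > 0; end--) {
    const term = dictionary.find((t) => t.expression === input.slice(0, end));
    if (term === undefined) continue;

    const rest = input.slice(end);
    const suru = SURU_FORMS.find((form) => rest.startsWith(form));
    if (suru !== undefined && term.tags.includes("class:verb:suru")) {
      return {
        term,
        tags: combineTags(term.tags, ["class:verb:suru-included"]),
        length: end + suru.length,
      };
    }
    return { term, tags: term.tags.slice(), length: end };
  }
  return undefined; // no dictionary term matches the start of the input
}
```

With a dictionary containing only 「完了」 tagged `class:verb:suru`, `parseWord("完了しました", dictionary)` matches 「完了」, consumes all six characters, and adds `class:verb:suru-included`. Because `hasCategory` only matches on whole `:` segments, `class:verb:suru-included` counts as a `class:verb` tag, but `class:verbose` would not.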

Yomikun encourages sharing homebrew dictionaries, and encourages using
behavior-altering tags to fix readings in cases like the examples above. For
instance, a Zelda dictionary is encouraged to add 「トト」 as a term with the
tags `class:noun` and `name:place`, rather than adding 「トト湖(こ)」 as an
expression, to keep 「湖」 from being read as 「みずうみ」.

If Yomikun doesn't generate the correct reading, and the correct reading does
not depend on natural-language context (i.e. a computer *could* reliably pick
the right reading from the other words and tags in the sentence), please
submit a pull request with the sentence and its expected reading. An example
of a non-deterministic reading is 「何」 in the sentence 「何できた?」, which can
be read either as 「なん」, in which case 「何で」 becomes a single word ("why"),
or as 「なに」, in which case 「何」 is a regular word and 「で」 is a particle.
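The following is a minimal sketch of how `name:place` and `class:suffix` could drive reading selection, using the 「ハイラル城」 example and the homebrew 「トト」 entry recommended above. The `Reading` shape, the `dictionary` contents, and the `chooseReading` rule are assumptions for illustration. The rule prefers a `class:suffix` reading after a place name, and otherwise a `class:noun` reading. This is not Yomikun's actual reading generator, which is not shown in this README.

```typescript
// Hypothetical, simplified entry shape: one row per (spelling, reading) pair.
interface Reading {
  expression: string;
  reading: string;
  tags: string[];
}

// Example data taken from the text above; 「トト」 is the recommended homebrew entry.
const dictionary: Reading[] = [
  { expression: "ハイラル", reading: "ハイラル", tags: ["name:place"] },
  { expression: "城", reading: "しろ", tags: ["class:noun"] },
  { expression: "城", reading: "じょう", tags: ["class:suffix"] },
  { expression: "トト", reading: "トト", tags: ["class:noun", "name:place"] },
  { expression: "湖", reading: "みずうみ", tags: ["class:noun"] },
  { expression: "湖", reading: "こ", tags: ["class:suffix"] },
];

/**
 * Hypothetical rule: after a word tagged `name:place`, prefer a reading tagged
 * `class:suffix`; otherwise prefer `class:noun`. Falls back to the first
 * candidate, or returns undefined if the expression is unknown.
 */
function chooseReading(expression: string, previous?: Reading): string | undefined {
  const candidates = dictionary.filter((r) => r.expression === expression);
  const wanted =
    previous !== undefined && previous.tags.includes("name:place")
      ? "class:suffix"
      : "class:noun";
  const preferred = candidates.find((r) => r.tags.includes(wanted));
  const chosen = preferred !== undefined ? preferred : candidates[0];
  return chosen !== undefined ? chosen.reading : undefined;
}
```

With this data, 「城」 after 「ハイラル」 yields 「じょう」 and 「湖」 after 「トト」 yields 「こ」. With no preceding place name, the same function returns 「しろ」 and 「みずうみ」. This is why tagging 「トト」 as `name:place` is enough to fix 「トト湖」 without a dedicated expression entry.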