Diffstat (limited to 'search/readme.md')
-rw-r--r-- search/readme.md 53
1 files changed, 53 insertions, 0 deletions
diff --git a/search/readme.md b/search/readme.md
new file mode 100644
index 0000000..400c8ce
--- /dev/null
+++ b/search/readme.md
@@ -0,0 +1,53 @@
+# Search
+
+This directory contains files that provide an abstract interface to the
+database for looking up sentences and words.
+
+## Tags
+
+All dictionary entries have tags. Tags are combined from term info, dictionary
+info, and glossary info. Tags can have subcategories separated by `:`. A
+separate tags table handles displaying tags for different display languages,
+including abbreviated versions.
+
+Tags that may alter behavior are stored as constants in [tags.ts](./tags.ts).
+Dictionary importers should map the dictionary-specific version of these tags
+to Yomikun's tags for compatibility. Other tags include:
+
+|tag|description|
+|-|-|
+|`series:*`|abbreviated series name, e.g. "The Legend of Zelda" is `series:zelda`, and "Tears of the Kingdom" is `series:totk`. Series with multiple entries should split the series and entry into separate tags, e.g. `series:zelda series:totk` instead of `series:zelda_totk`.|
+|`dict:*`|dictionary tag. e.g. `dict:jmdict_dutch` or `dict:daijisen`|
+|`pitch:*`|`pitch:0` for 平板, `pitch:1` for 頭高, etc.|
+|`aux:*`|used for other tags (joyo kanji, commonly used term, usually kana, etc.)|
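
The `:`-separated subcategories and the `*` patterns in the table above can be illustrated with a small matcher. This is only a sketch: `splitTag` and `matchesTag` are hypothetical names, and treating a trailing `*` as "one or more remaining segments" is an assumption, not something the Yomikun source confirms.

```typescript
// Hypothetical helpers; none of these names come from Yomikun's source.

/** Split a tag such as "class:verb:suru" into its subcategory segments. */
function splitTag(tag: string): string[] {
  return tag.split(":");
}

/**
 * Check a tag against a pattern. A trailing "*" segment is assumed to match
 * one or more remaining segments; all other segments must match exactly.
 */
function matchesTag(tag: string, pattern: string): boolean {
  const tagParts = splitTag(tag);
  const patternParts = splitTag(pattern);
  for (let i = 0; i < patternParts.length; i++) {
    if (patternParts[i] === "*") {
      // The wildcard must be the last segment and needs something to match.
      return i === patternParts.length - 1 && tagParts.length > i;
    }
    if (tagParts[i] !== patternParts[i]) return false;
  }
  return tagParts.length === patternParts.length;
}

const entryTags = ["series:zelda", "series:totk", "dict:daijisen", "pitch:0"];
const seriesTags = entryTags.filter((tag) => matchesTag(tag, "series:*"));
```

Here `seriesTags` keeps only the two `series:` tags, which is how split tags such as `series:zelda series:totk` could be queried independently.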
+
+### Behavior-altering tags
+
+Some tag classes impact the parser's behavior. For example, the input text
+「完了しました」 will be parsed as just 「完了」, but with the
+`class:verb:suru-included` tag added by the parser. This is because the word
+「完了」 has the tag `class:verb:suru` in the database, which allows the parser
+to deconjugate a noun with the verb 「する」 back into the stem.
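
The 「完了しました」 example above could be sketched roughly as follows. Everything here is an assumption made for illustration: the in-memory `termTags` map stands in for the database, `suruForms` is a deliberately tiny list of 「する」 conjugations, and `parseSuruVerb` is a hypothetical name. The actual parser may work quite differently.

```typescript
// Illustrative only: the database stand-in, the conjugation list, and all
// names below are assumptions for this sketch, not Yomikun's actual parser.

interface ParsedWord {
  word: string;
  tags: string[];
}

// Stand-in for the database: term -> tags.
const termTags = new Map<string, string[]>([
  ["完了", ["class:noun", "class:verb:suru"]],
  ["猫", ["class:noun"]],
]);

// A few conjugated forms of 「する」; a real deconjugator would cover far more.
const suruForms = ["しました", "します", "した", "して", "する"];

function parseSuruVerb(input: string): ParsedWord | null {
  for (const form of suruForms) {
    if (!input.endsWith(form)) continue;
    const stem = input.slice(0, input.length - form.length);
    const tags = termTags.get(stem);
    // Only fold 「する」 into the stem when the database marks it as a suru verb.
    if (tags?.includes("class:verb:suru")) {
      return { word: stem, tags: [...tags, "class:verb:suru-included"] };
    }
  }
  return null;
}

const parsed = parseSuruVerb("完了しました");
```

In this sketch `parsed` holds the stem 「完了」 with `class:verb:suru-included` appended, while a noun without the `class:verb:suru` tag (such as 「猫」 here) is left alone and the function returns `null`.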
+
+Other uses of this behavior include more accurate automatic kanji reading
+generation, for example 「城」 being read as 「じょう」 in 「ハイラル城」
+because 「ハイラル」 has the tag `name:place` in the database, and
+「城(じょう)」 has `class:suffix`, while 「城(しろ)」 has `class:noun`.
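
As a rough illustration of the 「ハイラル城」 case, the sketch below picks a reading from the tags of the preceding word. The `Candidate` shape, `pickReading`, and the single rule it applies are hypothetical; only the tags and readings are taken from the text above.

```typescript
// Hypothetical data model and rule; only the tags and readings come from the
// surrounding text.

interface Candidate {
  reading: string;
  tags: string[];
}

const castleCandidates: Candidate[] = [
  { reading: "しろ", tags: ["class:noun"] },
  { reading: "じょう", tags: ["class:suffix"] },
];

/** Pick a reading for a word based on the tags of the word before it. */
function pickReading(
  candidates: Candidate[],
  previousTags: string[],
): string | undefined {
  // Assumed rule: after a place name, prefer the suffix entry.
  const wanted = previousTags.includes("name:place")
    ? "class:suffix"
    : "class:noun";
  const match = candidates.find((c) => c.tags.includes(wanted));
  // Fall back to the first candidate if nothing carries the wanted tag.
  return (match ?? candidates[0])?.reading;
}

const afterHyrule = pickReading(castleCandidates, ["name:place"]);
const standalone = pickReading(castleCandidates, []);
```

With these inputs, `afterHyrule` resolves to the suffix reading and `standalone` to the noun reading.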
+
+Yomikun encourages sharing homebrew dictionaries, and encourages using
+behavior-altering tags to fix readings in cases like the examples above. For
+example, a dictionary for Zelda is encouraged to add 「トト」 as a term with
+the tags `class:noun` and `name:place`, rather than adding 「トト湖(こ)」 as
+an expression, so that the kanji 「湖」 is read as 「こ」 instead of
+「みずうみ」.
+
+If Yomikun doesn't generate the correct reading, and the correct reading does
+not depend on natural language context (i.e. a computer *could* accurately
+decide which reading is correct based on other words/tags in the sentence),
+please submit a pull request with the sentence and its expected reading. An
+example of a non-deterministic reading is 「何」 in the sentence 「何できた?」:
+it can be read as 「なん」, in which case 「何で」 becomes a single word, or as
+「なに」, in which case 「何」 is a regular word and 「で」 is a particle.
+
+[taekim]: https://guidetojapanese.org/learn/
+