-rw-r--r-- | README.md | 3 |
-rw-r--r-- | docs/making-yomitan-dictionaries.md | 97 |
2 files changed, 99 insertions, 1 deletions
@@ -33,9 +33,10 @@ Yomitan provides powerful features not available in other browser-based dictiona
 ## Helpful information
 
 - ✨ [Migrating from Yomichan (legacy)](./docs/yomichan-migration.md)
-- Installation and Setup
+- Dictionaries
   - 📚 [Recommended Dictionaries](./docs/dictionaries.md#recommended-dictionaries)
   - 🗂️ [Importing and exporting dictionaries](./docs/dictionaries.md#importing-a-dictionary-collection)
+  - 🛠️ [Making Yomitan Dictionaries](./docs/making-yomitan-dictionaries.md)
 - Anki Integration
   - 📇 [Anki integration and flashcard creation guide](./docs/anki-integration.md)
   - 🔧 [Anki handlebar templates](./docs/templates.md)
diff --git a/docs/making-yomitan-dictionaries.md b/docs/making-yomitan-dictionaries.md
new file mode 100644
index 00000000..79769f4f
--- /dev/null
+++ b/docs/making-yomitan-dictionaries.md
@@ -0,0 +1,97 @@
+# Making Yomitan Dictionaries <!-- omit in toc -->
+
+This document provides an overview on how to create your own Yomitan dictionary.
+
+- [Tools](#tools)
+- [Read the Schemas](#read-the-schemas)
+- [Packaging A Dictionary](#packaging-a-dictionary)
+- [Examples](#examples)
+- [Schema Validation](#schema-validation)
+- [Conjugation](#conjugation)
+- [Tag Categories](#tag-categories)
+
+## Tools
+
+- [Yomichan Dictionary Builder](https://github.com/MarvNC/yomichan-dict-builder/): A node package that simplifies the process of making dictionaries, particularly useful for those using TypeScript or JavaScript.
+- [hasUTF16SurrogatePairAt](https://www.npmjs.com/package/@stdlib/assert-has-utf16-surrogate-pair-at): Important for checking if a kanji/hanzi is a surrogate pair, which affects string operations in JavaScript.
+- [japanese-furigana-normalize](https://github.com/MarvNC/japanese-furigana-normalize): A utility function to normalize Japanese readings containing furigana, ensuring proper alignment with kanji characters.
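To illustrate why surrogate pairs matter for the Tools list above: some kanji/hanzi lie outside the Basic Multilingual Plane and occupy two UTF-16 code units, so `length` and index-based slicing can split them. The following is a minimal sketch using only built-in string methods; `isSurrogatePairAt` and `splitCharacters` are hypothetical helper names for illustration, not the API of the linked package.

```typescript
// Hypothetical helpers for illustration; not the API of @stdlib/assert-has-utf16-surrogate-pair-at.

// True when the UTF-16 code units at index and index + 1 form a high/low surrogate pair.
function isSurrogatePairAt(text: string, index: number): boolean {
    const high = text.charCodeAt(index);
    const low = text.charCodeAt(index + 1); // NaN past the end, so the checks below fail safely
    return high >= 0xd800 && high <= 0xdbff && low >= 0xdc00 && low <= 0xdfff;
}

// Split a string into whole characters, keeping surrogate pairs together.
function splitCharacters(text: string): string[] {
    const characters: string[] = [];
    for (let i = 0; i < text.length; i++) {
        if (isSurrogatePairAt(text, i)) {
            characters.push(text.slice(i, i + 2));
            i++; // skip the low surrogate that was just consumed
        } else {
            characters.push(text.charAt(i));
        }
    }
    return characters;
}

// U+20B9F (a kanji outside the BMP) followed by U+308B (hiragana "ru").
const word = "\uD842\uDF9F\u308B";
console.log(word.length); // 3 code units
console.log(splitCharacters(word).length); // 2 characters
```

When generating kanji bank entries, iterating by whole characters like this avoids emitting half of a surrogate pair as its own entry.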
+
+## Read the Schemas
+
+Familiarity with the [Yomitan schemas](https://github.com/themoeway/yomitan/tree/master/ext/data/schemas) is essential. These schemas define the structure of Yomitan dictionaries. Helpful resources for interpreting JSON schemas include [codebeautify](https://codebeautify.org/jsonviewer/), [json-schema-viewer](https://json-schema-viewer.vercel.app/), and [jsonhero](https://jsonhero.io/).
+
+Below is a list of Yomitan dictionary schemas, their expected filenames, and their usage:
+
+| Schema | Expected Filename | Usage |
+| ---------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------- | -------------------------------------------------------------------------------------- |
+| [`dictionary-index-schema.json`](https://github.com/themoeway/yomitan/tree/master/ext/data/schemas/dictionary-index-schema.json) | `index.json` | Metadata about the dictionary. Please include as much detail as possible. |
+| [`dictionary-kanji-bank-v3-schema.json`](https://github.com/themoeway/yomitan/tree/master/ext/data/schemas/dictionary-kanji-bank-v3-schema.json) | `kanji_bank_${number}.json` | Information used in the kanji viewer - meanings, readings, statistics, and codepoints. |
+| [`dictionary-kanji-meta-bank-v3-schema.json`](https://github.com/themoeway/yomitan/tree/master/ext/data/schemas/dictionary-kanji-meta-bank-v3-schema.json) | `kanji_meta_bank_${number}.json` | Stores kanji frequency data. |
+| [`dictionary-tag-bank-v3-schema.json`](https://github.com/themoeway/yomitan/tree/master/ext/data/schemas/dictionary-tag-bank-v3-schema.json) | `tag_bank_${number}.json` | Defines tags for kanji and term dictionaries, like parts of speech or kanken level. |
+| [`dictionary-term-bank-v3-schema.json`](https://github.com/themoeway/yomitan/tree/master/ext/data/schemas/dictionary-term-bank-v3-schema.json) | `term_bank_${number}.json` | Stores dictionary readings, definitions, etc. |
+| [`dictionary-term-meta-bank-v3-schema.json`](https://github.com/themoeway/yomitan/tree/master/ext/data/schemas/dictionary-term-meta-bank-v3-schema.json) | `term_meta_bank_${number}.json` | Stores meta information about terms, such as frequency data and pitch accent data. |
+
+## Packaging A Dictionary
+
+A dictionary can contain various types of information within the zip file. After creating an `index.json` and the relevant data files, zip them with all data `.json` files in the root directory of the zip, not in subfolders. Use the highest compression level possible to reduce the size.
+
+## Examples
+
+- The [official test dictionary](https://github.com/themoeway/yomitan/tree/master/test/data/dictionaries/valid-dictionary1) showcases the full range of features available in Yomitan dictionaries.
+
+## Schema Validation
+
+To validate schemas, configure [VSCode](https://code.visualstudio.com/docs/languages/json#_json-schemas-and-settings) to validate schemas or use a website such as [jsonschemavalidator](https://www.jsonschemavalidator.net/).
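Before running full schema validation, it can help to check that every file in the archive has an expected name and sits in the zip root. The sketch below maps an archive entry path to the schema it should be validated against. It is illustrative only: `schemaForEntry` is a hypothetical helper, `${number}` is assumed to mean one or more digits, and the kanji meta pattern follows the `kanji_meta_bank_*.json` spelling used in the VSCode settings.

```typescript
// Hypothetical pre-check; patterns are derived from the expected filenames in this document.
// Assumption: `${number}` means one or more decimal digits.
const BANK_PATTERNS: Array<[RegExp, string]> = [
    [/^index\.json$/, "dictionary-index-schema.json"],
    [/^kanji_bank_\d+\.json$/, "dictionary-kanji-bank-v3-schema.json"],
    [/^kanji_meta_bank_\d+\.json$/, "dictionary-kanji-meta-bank-v3-schema.json"],
    [/^tag_bank_\d+\.json$/, "dictionary-tag-bank-v3-schema.json"],
    [/^term_bank_\d+\.json$/, "dictionary-term-bank-v3-schema.json"],
    [/^term_meta_bank_\d+\.json$/, "dictionary-term-meta-bank-v3-schema.json"],
];

// Returns the schema filename for a zip entry, or null when the entry is in a
// subfolder or does not match any expected filename.
function schemaForEntry(entryPath: string): string | null {
    if (entryPath.indexOf("/") !== -1) {
        return null; // data files belong in the root of the zip, not in subfolders
    }
    for (const [pattern, schema] of BANK_PATTERNS) {
        if (pattern.test(entryPath)) {
            return schema;
        }
    }
    return null;
}

console.log(schemaForEntry("term_bank_1.json"));
console.log(schemaForEntry("data/term_bank_1.json"));
```

Entries that return `null` are worth inspecting by hand before zipping, since they are either misnamed or misplaced under the assumptions above.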
+
+For VSCode validation, use the following settings JSON:
+
+```json
+  "json.schemas": [
+    {
+      "fileMatch": ["kanji_bank_*.json"],
+      "url": "https://github.com/themoeway/yomitan/raw/master/ext/data/schemas/dictionary-kanji-bank-v3-schema.json"
+    },
+    {
+      "fileMatch": ["kanji_meta_bank_*.json"],
+      "url": "https://github.com/themoeway/yomitan/raw/master/ext/data/schemas/dictionary-kanji-meta-bank-v3-schema.json"
+    },
+    {
+      "fileMatch": ["tag_bank_*.json"],
+      "url": "https://github.com/themoeway/yomitan/raw/master/ext/data/schemas/dictionary-tag-bank-v3-schema.json"
+    },
+    {
+      "fileMatch": ["term_bank_*.json"],
+      "url": "https://github.com/themoeway/yomitan/raw/master/ext/data/schemas/dictionary-term-bank-v3-schema.json"
+    },
+    {
+      "fileMatch": ["term_meta_bank_*.json"],
+      "url": "https://github.com/themoeway/yomitan/raw/master/ext/data/schemas/dictionary-term-meta-bank-v3-schema.json"
+    }
+  ],
+```
+
+## Conjugation
+
+For Yomitan to conjugate Japanese terms, they need the appropriate part of speech tag. The part of speech labels are documented on the [official JMDict page](http://www.edrdg.org/jmdictdb/cgi-bin/edhelp.py?svc=jmdict&sid=#kw_pos). For other languages, find the part of speech tags in `ext/js/language/{language}/{language}-transforms.js` under the `conditions` label, for labels that aren't prefixed with "Intermediate".
+
+## Tag Categories
+
+The second item in the array of the tag bank schema determines the tag category, affecting the tag color in the user interface. The categories include:
+
+- name
+- expression
+- popular
+- frequent
+- archaism
+- dictionary
+- frequency
+- partOfSpeech
+- search
+- pronunciation-dictionary
+- search
+
+You can view the tag colors [here](https://github.com/themoeway/yomitan/blob/48f1d012ad5045319d4e492dfbefa39da92817b2/ext/css/display.css#L136-L149).
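As a rough illustration of the tag category rule, the sketch below models a tag bank entry as a tuple and checks that its second item is one of the categories listed above. Only the position of the category comes from this document; the meaning of the other tuple positions (name, sort order, notes, score) is an assumption that should be checked against `dictionary-tag-bank-v3-schema.json`, and `hasKnownCategory` and the sample entries are hypothetical.

```typescript
// Categories as listed in this document (the repeated "search" is included once).
const TAG_CATEGORIES: string[] = [
    "name",
    "expression",
    "popular",
    "frequent",
    "archaism",
    "dictionary",
    "frequency",
    "partOfSpeech",
    "search",
    "pronunciation-dictionary",
];

// Assumed layout: [name, category, sort order, notes, score].
// Only index 1 (the category) is stated in this document; verify the rest against the schema.
type TagBankEntry = [string, string, number, string, number];

// True when the entry's second item is one of the listed categories (case-sensitive).
function hasKnownCategory(entry: TagBankEntry): boolean {
    return TAG_CATEGORIES.indexOf(entry[1]) !== -1;
}

// Hypothetical sample entries for illustration.
const sampleEntries: TagBankEntry[] = [
    ["v1", "partOfSpeech", 0, "ichidan verb", 0],
    ["arch", "archaism", 0, "archaic term", 0],
    ["misc", "partofspeech", 0, "wrong capitalization", 0],
];

for (const entry of sampleEntries) {
    console.log(entry[0], hasKnownCategory(entry));
}
```

A check like this only catches misspelled categories; it says nothing about how an unlisted category is rendered.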
+
+# Community Contributions
+
+If you have any questions, need help, or want to share a new dictionary, feel free to pop in the [Yomitan Discord server](/README.md#yomitan). We're happy to help you get started!