1 files changed, 150 insertions, 0 deletions
diff --git a/todo.md b/todo.md
new file mode 100644
index 0000000..6840877
--- /dev/null
+++ b/todo.md
@@ -0,0 +1,150 @@
+# generic list of concrete todo items that don't need further consideration
+
+## 0.0.1 (standalone API)
+
+- [x] working proof of concept sentence lookup using deno/sqlite3
+- [ ] port dictionaries for more advanced testing
+    - [x] JMdict (WIP)
+    - [ ] JMNedict
+- [x] add more deinflections to db/deinflections.sql
+- [x] set up unit tests for sentence reading generation
+- [x] port server-internal API to simple HTTP JSON API
+- [ ] [improve DB schema](#how-to-store-multiple-readingswritings-in-db)
+- [ ] finish [API examples](examples/readme.md)
+- [ ] remove makefiles for database initialization
+- [ ] add separate kanji readings/info table
+- [ ] add separate frequency dictionary
+- [ ] complete documentation
+- [ ] add code formatter config
+- [ ] ~replace .sql script files with typescript sql query generation library~ ([the problem](https://www.reddit.com/r/Deno/comments/ss6568/alternative_to_knexjs_on_deno/))
+
+## 0.1.0 (front-end UI)
+
+- [ ] create primitive search page ui
+
+## always
+
+- [ ] improve sentence parser accuracy
+    - [ ] have the parser recursively explore N shorter terms at each word
+      found and rank resulting possible sentences (by frequency?)
+    - [ ] use domain-specific tags in reading tests (create domain-specific
+      dictionaries first)
+    - [ ] normalize dictionary before import
+        - [ ] remove "baked" combinations of word + suffix
+        - [ ] automatically create combinations of kanji replaced by kana as
+          alternate writings
+    - [ ] add more deinflections for casual speech and other colloquialisms
+
+# how to store multiple readings/writings in DB
+
+## idea 1
+
+positives:
+- allows multiple alternate readings/writings for terms
+- easy to find primary reading or writing for a term
+- efficiently stores kana-only words
+- allows parser to parse alternatively written words (currently requires manual
+  TypeScript intervention to resolve the `alt` field back to the actual term to
+  get its tags)
+
+negatives:
+- ~creates duplicates in `text` column for readings of terms with different
+  kanji but the same reading~
+  
+  I consider this a non-issue because this simplifies the sentence lookup
+  query. The alternative (a term\<-\>reading/writing reference table) would
+  save disk space in exchange for processing time and complexity.
+- ~unclear how to reference a specific word without using its `term_id` (which
+  can vary from user to user when different dictionaries are installed), or
+  *what identifies a unique term in this case?*~
+  
+  `user.sort_overlay` needs to be able to uniquely identify a `term_id`, but
+  also needs to be in-/exportable by users with different imported dictionaries
+  (ideally with minimal failure).
+  
+  things to consider:
+  
+  options:
+  - ~just use (primary) writing only~
+    
+    this doesn't work for terms with multiple readings to distinguish between
+    meanings, e.g.
+    <ruby>人気<rt>ひとけ</rt></ruby>/<ruby>人気<rt>にんき</rt></ruby>
+  - ~identify as "term with text X and another text Y"~
+    
+    this feels like a janky solution but is what is currently being used, where
+    X is always the default way of writing and Y the default reading
+  - directly reference `term_id` in `user.sort_overlay`, and at export/import
+    time convert it to a match on all known readings/writings of the term
+    
+    good:
+    
+    - faster `user.sort_overlay` lookups
+    - still allows importing/exporting user preferences
+    
+    bad:
+    
+    - ~all data in `user.db` becomes useless when `dict.db` is lost or corrupted~
+      
+      `user.sort_overlay` will be moved to `dict.db`, and `user.db` will only
+      be used for storing (mostly portable) user preferences and identifiers
+      (username, id, etc.).
+    - importing/exporting will take longer and require more complicated sql code
+  
+
+### example tables
+
+#### readwritings (should have better name)
+
+`flags` is a bitfield (bit indexes counted from the LSB)  
+`flags[0]` = primary writing  
+`flags[1]` = primary reading
+
+|`id`|`term_id`|`text`|`flags`|
+|-|-|-|-|
+|1|1|繰り返す|1|
+|2|1|くり返す|0|
+|3|1|繰返す|0|
+|4|1|繰りかえす|0|
+|5|1|くりかえす|2|
+|6|2|変える|1|
+|7|2|かえる|2|
+|8|3|帰る|1|
+|9|3|かえる|2|
+|10|4|にやにや|3|
+
+# how/where to deal with irregular readings
+
+WIP
+
+ideally one way of storing reading exceptions for:
+
+- 来る + する (conjugation-dependent)
+- 入る as (はいる) or (いる) (not sure about this one?)
+- counters (counter type + amount specific)
+- numbers (exceptions for certain powers of 10)
+
+# way to link expressions to a list of conjugated terms
+
+WIP
+
+this may be useful for dictionaries that provide meanings for expressions but
+don't provide readings for those expressions? (新和英大辞典 has some of these)
+
+examples:
+- 村長選 -> 村長 + 選[suf]
+- 花より団子 -> 花 + より[grammar] + 団子
+
+# random thoughts
+
+this project has 0 planning so here's a list of things that may eventually need
+some thought
+
+- how can a series-specific dictionary also 'encourage' the use of another
+  domain-specific category? (e.g. anime about programming makes computer domain
+  specific terms rank slightly higher or something?)
+- maybe have a mode in the front-end that captures preedit text from a user
+  typing Japanese text to infer readings of kanji, or rank different terms
+  slightly higher? (using [compositionupdate
+  events](https://developer.mozilla.org/en-US/docs/Web/API/Element/compositionupdate_event))
+