Skip to content

Commit

Permalink
Add files via upload
Browse files Browse the repository at this point in the history
  • Loading branch information
vitto4 authored Nov 29, 2024
0 parents commit 5da748d
Show file tree
Hide file tree
Showing 4 changed files with 27,449 additions and 0 deletions.
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
ignore/*
.vscode/*
*.code-workspace
67 changes: 67 additions & 0 deletions OPINIONS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,67 @@

# 🗳️ Opinions

- In each lesson, words are ordered like they are in the second edition. <br>
As such, words that are only present in the first edition are placed at the very end of the lesson, marked by the comment :
```yaml
# ------------------------------------ ED1 ----------------------------------- #
```
In such sections, words appear in the same order as they do in the first edition.
- In some cases, the same word is present in both the first and second editions, but with different translations. <br>
In such cases, I kept the version from the second edition. However, when it felt like both versions were helpful in understanding the word, I may have :
- Combined the two. This is indicated with comments using the syntax. <br>
```yaml
# !MOD KEPT <short description here>
```
Where `!MOD` stands for « modification ».
- Went « i'll just deal with it later 👍 » (spoiler : I didn't) <br>
```yaml
# ?ADD `<meaning present in edition one, that may be added to the one from edition two, or may sit in this comment forever>`
```
- Occasionally the books contain spelling or grammar mistakes. They may be rectified when they can be definitively categorized as mistakes, and there is an obvious correction to be made. In that case, it is indicated by a comment :
```yaml
# !MOD REPLACED : `<word with typo>` --> `<corrected version>`
```
- `Minna no Nihongo`-specific vocabulary (like `IMC/パワー電気/ブラジルエアー`) is included.
- Formatting rules were decided arbitrarily, I chose what looked or sounded best to me. If you find one of these rules is broken somewhere, don't hesitate to report it through an issue.
<div align="center">
<details>
<summary>The rules <sub><sup><sub><sup><sub><sup><sub><sup><sub><sup>are simple</sup></sub></sup></sub></sup></sub></sup></sub></sup></sub></summary>
<div align="left">

- When a word has no `kanji`, the field is set to null using a `~`.
- In fields `kanji` and `kana` :
- Regular spaces ` ` are used over ideographic spaces ` `. <br>
Example : `~から 来ました。`.
- Fullwidth parentheses `` and `` are used. No space should be inserted before nor after. <br>
Example : `だれ(どなた)`.
- Fullwidth square brackets `` and `` are used. No space should be inserted before nor after. <br>
Example : `[どうぞ]よろしく[お願いします]。`.
- Fullwidth tildes `` are also invited to the party, and they may be used in combination with spaces. <br>
Example : `この ~`, `撮ります[写真を~]`.
- Ideographic full stops `` and ideographic commas `` shall also be used ; as usual with no trailing space. <br>
Example : `じゃ、また[あした]。`.
- Fullwidth numbers are used over *basic latin* ones. <br>
Example : `1日` instead of `1 日`.
- Use the fullwidth solidus `` instead of the solidus `/`. <br>
Example : `さくら大学/富士大学`.
- In `meaning` :
- Prefer regular parentheses `(` and `)` as well as regular square brackets `[` and `]`, except when a string in Japanese is inserted. <br>
Example : `put on [glasses]` or `Pleased to meet you, too. (response to[どうぞ]よろしく[おねがいします]。)`. Notice the regular parentheses enclosing the Japanese part, with fullwidth square brackets inside.
- Use fullwidth tildes `` and fullwidth hyphen-minuses `` when necessary. <br>
Example : `I'm from ~ (country)`, `- years old`.
- Regular colons `:` (may be enclosed in spaces) are used instead of fullwidth colons ``. <br>
Example : `meeting, conference (~を します : hold a meeting)`.
- Regular solidi `/` are preferred. <br>
Example : `You're welcome./Don't mention it.`.
- Regular apostrophes `'` are preferred. <br>
Example : `That's good.` and `'The Seven Samurai', a classic movie by Akira Kurosawa`.
- Add spaces around French guillemets. <br>
Example : `froid (le temps «il fait froid»)` --> `froid (le temps « il fait froid »)`.
- For languages such as French that require a space before punctuation, replace `*:` by <code>&nbsp;:&nbsp;</code> but keep `*?` and `*!`. <br>
Example : `Oui?` is kept this way ; but edit `football (~を します:jouer au football)` into `football (~を します : jouer au football)`.
- When a meaning is missing from the database, it is set to `null`. <br>
Example : `fr: null,`.
</div>
</details>
</div>
145 changes: 145 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,145 @@
# MinnaNoDB
<p align="center">
<a href="https://yaml.org/">
<img alt="YouTube" src="https://img.shields.io/badge/YAML-CB171E?logo=yaml&logoColor=fff&style=flat-square"
/></a>
<a href="https://en.wikipedia.org/wiki/Japanese_language">
<img alt="Python" src="https://img.shields.io/badge/lang-%20%E6%97%A5%E6%9C%AC%E8%AA%9E-forestgreen?style=flat-square"
/></a>
<a href="https://github.com/vitto4/MinnaNoDB/releases">
<img alt="GitHub Release" src="https://img.shields.io/github/v/release/vitto4/MinnaNoDB?style=flat-square"
/></a>


</p>

<p align="center">Lightly-opinionated compendium of vocabulary from <code>Minna no Nihongo Shokyū Ⅰ & Ⅱ</code> books.</p>

<br>

<div id="figure-1"></div>

```yaml
- id: [2, 10]
edition: [1, 2]
kanji: "新聞"
kana: "しんぶん"
romaji: "shinbun"
meaning: {
en: "newspaper",
fr: "journal",
}
```
<p align="center"><sup><ins><i>Figure</i></ins> – A word straight from the database. More information on the format <a href="https://github.com/vitto4/MinnaNoDB/blob/main/minna-no-db.yaml#L92-L106">this way</a>.</sup></p>



## 🧭 Table of contents
1. [Overview](#-overview)
2. [Usage](#️-usage)
3. [Bibliography](#-bibliography)
4. [Opinions](#️-opinions)
5. [Notes](#-notes)
6. [Warning](#-warning)

## ☁ Overview

This project aims to serve as a comprehensive database of vocabulary for the `Minna no Nihongo Shokyū` series (`みんなの日本語 初級 Ⅰ & Ⅱ`). <br>
To be more specific, it intends to be as close as possible to the source material, in an effort to (hopefully) provide a foundation anyone can use or expand on.


The database currently targets two languages for `meaning` :
```yaml
languages:
en: "English"
fr: "Français"
```
<p align="center"><sup> Further information <a href="https://github.com/vitto4/MinnaNoDB/blob/main/minna-no-db.yaml#L25-L30">here</a>.</sup></p>
**Rōmaji** are provided solely for convenience, and do not correspond to those of the *rōmaji edition* of the books (`みんなの日本語 初級 ローマ字版`).

They were made using a mix of [`pykakasi`](https://pypi.org/project/pykakasi/) and readings supplied by [Google Translate](https://translate.google.com/). As a result, they more or less follow standards set by the [*Modified Hepburn*](https://en.wikipedia.org/wiki/Hepburn_romanization#Variants) system (yes, mācrōns inclūdēd !).

## ⚙️ Usage

Here is a basic example in [`python`](https://www.python.org/).

```py
import yaml
# Load the database
with open("minna-no-db.yaml", "r", encoding="utf-8") as f:
db = yaml.load(f, Loader=yaml.FullLoader)
# Extract the keys for all available lessons
lessons: list = [lesson["key"] for lesson in db["lessons"]] # ['lesson-01', 'lesson-02', ...]
# Go through each lesson and print out its contents
for key in lessons:
print(f"Contents of {key}") # Outputs : Contents of lesson-01
print(db[key]) # Outputs : [{'id': [1, 1], 'edition': [1, 2], 'kanji': None, 'kana': 'わたし', 'romaji': 'watashi', 'meaning': {'en': 'I', 'fr': 'je, moi'}}, ...]
```

## 📚 Bibliography

As you may know, `みんなの日本語 初級` comes in two books of twenty-five lessons each ; both available in two editions (the second of which is an updated version of the original).

Presented bellow is a table showing the books used in the making of the database.

| 📗📘📙 | First Edition | Second Edition |
|:-----:|:-------------:|:--------------:|
| **Book 1**<br>*English Version* | みんなの日本語初級Ⅰ 翻訳・文法解説英語版<br>[`ISBN : 9784883191079`](https://web.archive.org/web/20040820203739/http://www.3anet.co.jp/english/text_e_m_trans.html) | みんなの日本語初級Ⅰ 第2版 翻訳・文法解説 英語版<br>[`ISBN : 9784883196043`](https://www.3anet.co.jp/np/en/books/2302/) |
| **Book 1**<br>*French Version* | みんなの日本語初級Ⅰ 翻訳・文法解説フランス語版<br>[`ISBN : 9784883191338`](https://web.archive.org/web/20040820203739/http://www.3anet.co.jp/english/text_e_m_trans.html)| みんなの日本語初級Ⅰ 第2版 翻訳・文法解説 フランス語版<br>[`ISBN : 9784883196456`](https://www.3anet.co.jp/np/en/books/2312/) |
| **Book 2**<br>*English Version* | みんなの日本語初級Ⅱ 翻訳・文法解説英語版<br>[`ISBN : 9784883191086`](https://web.archive.org/web/20040820203739/http://www.3anet.co.jp/english/text_e_m_trans.html) | みんなの日本語初級Ⅱ 第2版 翻訳・文法解説 英語版<br>[`ISBN : 9784883196647`](https://www.3anet.co.jp/np/en/books/2402/) |
| **Book 2**<br>*French Version* | みんなの日本語初級Ⅱ 翻訳・文法解説フランス語版<br>[`ISBN : 9784883191383`](https://web.archive.org/web/20040820203739/http://www.3anet.co.jp/english/text_e_m_trans.html) | みんなの日本語初級Ⅱ 第2版 翻訳・文法解説 フランス語版<br>[`ISBN : 9784883197057`](https://www.3anet.co.jp/np/en/books/2412/) |


## 🗳️ Opinions

What I call an *opinion* is any rule I set while creating the database that is not directly derived from the source material.

<p align="center">
See
<a href="https://github.com/vitto4/MinnaNoDB/blob/main/OPINIONS.md">
<code>OPINIONS.md</code>
</a>
</p>

This file also includes general information about the structure of the database.

## 🔖 Notes

- When starting out with this project, I used [Paul Denisowski's vocabulary lists](http://www.denisowski.org/Japanese/Japanese.html) to generate a blank template for me to fill in. Serious time-saver right there !
- As strings in `romaji` do not need to be spellchecked, you may use the following config with [`CSpell`](https://cspell.org/).
```json
"cSpell.ignoreRegExpList": [
"/romaji:\\s*\"[^\"]*\"/gi"
]
```
- This should have shipped with the set of scripts I used to lint and validate the database. <br>
It didn't, but who knows, I may get to it when (if) I stop being obsessed with that one space dwarves simulator ¯\\\_(ツ)_/¯
- Adding or removing a word will alter the `id` of all subsequent words in the same lesson. <br>
Therefore, any time this has to be done, the version number will have to be bumped to the next major release as this could be considered breaking change for anyone using `id` as a [primary key](## "Which it intends to be, when it is in fact more of something I believed to be called a `natural key` and may thus be unstable.").
- « This must have taken quite the amount of time to make » well you don't say ! (笑) <br>
Though I think I'm happy with how it turned out c:
- I haven't yet managed to get my hands on a French version of the first edition of book 1, so words found exclusively in `Book 1, edition 1` have no French `meaning` for now.



## 🚧 Warning

```yaml
# * The selection of words and their respective translations are the sole property of 3A Corporation.
# This database and subsequent projects that depend on it shall only be used *in conjunction with* – and not *as a substitute for* – of the books ; so as to not cause any financial harm to the IP owners.
# * As per previous remarks, no commercial use of this file shall be admissible.
```
<p align="center"><sup> More <a href="https://github.com/vitto4/MinnaNoDB/blob/main/minna-no-db.yaml#L11-L13">here</a>.</sup></p>

The lack of license is deliberate, as I am uncertain about the appropriate licensing options for this project.
Content isn't mine, only the database structure and the actual work of filling it in. <br>
If you know a suitable option, feel free to open an issue !

Hopefully that doesn't stop anyone from using the database though.
Loading

0 comments on commit 5da748d

Please sign in to comment.