update README
miau1 committed Feb 3, 2025
1 parent 531357f commit c3ef1f8
Showing 3 changed files with 37 additions and 36 deletions.
65 changes: 33 additions & 32 deletions opustools_pkg/README.md
@@ -397,41 +397,42 @@ All aboard the OPUS Express! Create test/dev/train sets from OPUS data.
 ### Usage

 ```
-usage: opus_cat [-h] -d DIRECTORY -l LANGUAGE [-i] [-m MAXIMUM] [-p]
-[-f FILE_NAME] [-r RELEASE] [-pa]
-[-sa SET_ATTRIBUTE [SET_ATTRIBUTE ...]]
+usage: opus_cat [-h] -d DIRECTORY -l LANGUAGE [-i] [-m MAXIMUM]
+[-pp {raw,xml}] [-p] [-f FILE_NAME]
+[-r RELEASE] [-pa] [-sa SET_ATTRIBUTE [SET_ATTRIBUTE ...]]
 [-ca CHANGE_ANNOTATION_DELIMITER] [-rd path_to_dir]
 [-dl DOWNLOAD_DIR]
 ```

 arguments:

 ```
--h, --help show this help message and exit
--d DIRECTORY, --directory DIRECTORY
-Corpus name
--l LANGUAGE, --language LANGUAGE
-Language
--i, --no_ids Print without ids when using -p
--m MAXIMUM, --maximum MAXIMUM
-Maximum number of sentences
--p, --plain Print in plain txt
--f FILE_NAME, --file_name FILE_NAME
-File name (if not given, prints all files)
--r RELEASE, --release RELEASE
-Release (default=latest)
--pa, --print_annotations
-Print annotations, if they exist
--sa SET_ATTRIBUTE [SET_ATTRIBUTE ...], --set_attribute SET_ATTRIBUTE [SET_ATTRIBUTE ...]
-Set sentence annotation attributes to be printed, e.g.
--sa pos lem. To print all available attributes use -sa
-all_attrs (default=pos,lem)
--ca CHANGE_ANNOTATION_DELIMITER, --change_annotation_delimiter CHANGE_ANNOTATION_DELIMITER
-Change annotation delimiter (default=|)
--rd path_to_dir, --root_directory path_to_dir
-Change root directory (default=/proj/nlpl/data/OPUS)
--dl DOWNLOAD_DIR, --download_dir DOWNLOAD_DIR
-Set download directory (default=current directory)
+-h, --help show this help message and exit
+-d DIRECTORY, --directory DIRECTORY
+Corpus name
+-l LANGUAGE, --language LANGUAGE
+Language
+-i, --no_ids Print without ids when using -p
+-m MAXIMUM, --maximum MAXIMUM
+Maximum number of sentences
+-pp {raw,xml}, --preprocess {raw,xml}
+Preprocess-type (raw, xml, default=xml)
+-p, --plain Print in plain txt
+-f FILE_NAME, --file_name FILE_NAME
+File name (if not given, prints all files)
+-r RELEASE, --release RELEASE
+Release (default=latest)
+-pa, --print_annotations
+Print annotations, if they exist
+-sa SET_ATTRIBUTE [SET_ATTRIBUTE ...], --set_attribute SET_ATTRIBUTE [SET_ATTRIBUTE ...]
+Set sentence annotation attributes to be printed, e.g. -sa pos lem.
+To print all available attributes use -sa all_attrs (default=pos,lem)
+-ca CHANGE_ANNOTATION_DELIMITER, --change_annotation_delimiter CHANGE_ANNOTATION_DELIMITER
+Change annotation delimiter (default=|)
+-rd path_to_dir, --root_directory path_to_dir
+Change root directory (default=/projappl/nlpl/data/OPUS)
+-dl DOWNLOAD_DIR, --download_dir DOWNLOAD_DIR
+Set download directory (default=current directory)
 ```

 ### Description
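The updated `opus_cat` help text is standard argparse output: `-d DIRECTORY -l LANGUAGE` appear without brackets because they are required, `{raw,xml}` lists the `choices` of `--preprocess`, and `SET_ATTRIBUTE [SET_ATTRIBUTE ...]` is how argparse renders an option declared with `nargs='+'`. A minimal sketch that rebuilds a subset of the documented interface from the help text alone; it is not the actual `opus_cat` source, and storing the `-sa` default as the list `['pos', 'lem']` is an assumption based on "default=pos,lem":

```python
import argparse

# Sketch rebuilt from the README help text only; the real definitions live in
# opustools_pkg/bin/opus_cat and may differ in details.
parser = argparse.ArgumentParser(prog='opus_cat')
# Shown without brackets in the usage line, so these are required options.
parser.add_argument('-d', '--directory', required=True, help='Corpus name')
parser.add_argument('-l', '--language', required=True, help='Language')
# choices are rendered as {raw,xml} in the usage and help output.
parser.add_argument('-pp', '--preprocess', default='xml', choices=['raw', 'xml'],
                    help='Preprocess-type (raw, xml, default=xml)')
parser.add_argument('-p', '--plain', action='store_true', help='Print in plain txt')
# nargs='+' is rendered as "SET_ATTRIBUTE [SET_ATTRIBUTE ...]".
# Assumption: the default is a list matching "default=pos,lem".
parser.add_argument('-sa', '--set_attribute', nargs='+', default=['pos', 'lem'],
                    help='Set sentence annotation attributes to be printed')
parser.add_argument('-ca', '--change_annotation_delimiter', default='|',
                    help='Change annotation delimiter (default=|)')

# Equivalent to: opus_cat -d RF -l en -p -sa pos lem
args = parser.parse_args(['-d', 'RF', '-l', 'en', '-p', '-sa', 'pos', 'lem'])
print(args.directory, args.language, args.preprocess, args.set_attribute)
# RF en xml ['pos', 'lem']
```

Calling `parser.print_help()` on this sketch yields help entries in the same shape as the README block, which is a quick way to keep documentation and parser in sync.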
@@ -519,7 +520,7 @@ Download files from OPUS
 List available files in RF corpus for en-sv language pair:

 ```
-opus_get --directory RF --source en --target sv --list
+opus_get --directory RF --source en --target sv --list_resources
 ```

 Download RF corpus for en-sv:
@@ -537,19 +538,19 @@ opus_get --directory RF --source en --target sv --download_dir RF_files
 List all files in RF that include English:

 ```
-opus_get --directory RF --source en --list
+opus_get --directory RF --source en --list_resources
 ```

 List all files for all language pairs in RF:

 ```
-opus_get --directory RF --list
+opus_get --directory RF --list_resources
 ```

 List all en-sv files in the whole OPUS:

 ```
-opus_get --source en --target sv --list
+opus_get --source en --target sv --list_resources
 ```

 Find available target languages for English in RF:
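Every `opus_get` listing example in the README now uses `--list_resources` instead of `--list`, while the `--directory`, `--source` and `--target` filters stay the same. Scripts that assemble these commands can keep the flag in one place. A small sketch, where `opus_get_list_command` is a hypothetical helper (not part of opustools) that uses only the flags appearing in the README examples:

```python
import shlex
from typing import List, Optional


def opus_get_list_command(directory: Optional[str] = None,
                          source: Optional[str] = None,
                          target: Optional[str] = None) -> List[str]:
    """Build an argv list for listing OPUS resources (hypothetical helper)."""
    cmd = ['opus_get']
    # Only add the filters that were given, in the order the README uses.
    if directory is not None:
        cmd += ['--directory', directory]
    if source is not None:
        cmd += ['--source', source]
    if target is not None:
        cmd += ['--target', target]
    # The README documents --list_resources for listing.
    cmd.append('--list_resources')
    return cmd


# Reproduce the four listing examples from the README.
for kwargs in ({'directory': 'RF', 'source': 'en', 'target': 'sv'},
               {'directory': 'RF', 'source': 'en'},
               {'directory': 'RF'},
               {'source': 'en', 'target': 'sv'}):
    print(shlex.join(opus_get_list_command(**kwargs)))
```

With opustools installed, a returned list can be passed directly to `subprocess.run(..., check=True)`; building an argv list rather than a shell string avoids quoting problems with unusual corpus names.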
6 changes: 3 additions & 3 deletions opustools_pkg/bin/opus_cat
@@ -15,7 +15,7 @@ parser.add_argument('-m', '--maximum',
 help='Maximum number of sentences', default=-2, type=int)
 parser.add_argument('-pp', '--preprocess',
 help='Preprocess-type (raw, xml, default=xml)',
-default='xml', choices=['raw', 'xml', 'parsed', 'moses'])
+default='xml', choices=['raw', 'xml'])
 parser.add_argument('-p', '--plain', help='Print in plain txt',
 action='store_true')
 parser.add_argument('-f', '--file_name',
@@ -33,9 +33,9 @@ parser.add_argument('-ca', '--change_annotation_delimiter',
 help='Change annotation delimiter (default=|)',
 default='|')
 parser.add_argument('-rd', '--root_directory',
-help='Change root directory (default=/proj/nlpl/data/OPUS)',
+help='Change root directory (default=/projappl/nlpl/data/OPUS)',
 metavar='path_to_dir',
-default='/proj/nlpl/data/OPUS')
+default='/projappl/nlpl/data/OPUS')
 parser.add_argument('-dl', '--download_dir',
 help='Set download directory (default=current directory)',
 default='.')
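The `opustools_pkg/bin/opus_cat` changes narrow `--preprocess` to `raw` and `xml` (dropping `parsed` and `moses`) and move the default root directory from `/proj/nlpl/data/OPUS` to `/projappl/nlpl/data/OPUS`. Because argparse checks `choices` at parse time, an invocation with `-pp moses`, which the old parser accepted, now fails with a usage error before any corpus is read. A sketch that re-declares only the two changed options, copied from the new version (not the full script), to show both effects; `build_parser` and `accepts_preprocess` are illustrative names:

```python
import argparse


def build_parser():
    # Only the two options touched by this commit, copied from the new version.
    parser = argparse.ArgumentParser(prog='opus_cat')
    parser.add_argument('-pp', '--preprocess',
                        help='Preprocess-type (raw, xml, default=xml)',
                        default='xml', choices=['raw', 'xml'])
    parser.add_argument('-rd', '--root_directory',
                        help='Change root directory (default=/projappl/nlpl/data/OPUS)',
                        metavar='path_to_dir',
                        default='/projappl/nlpl/data/OPUS')
    return parser


def accepts_preprocess(value):
    """Return True if argparse accepts VALUE for -pp, False if it rejects it."""
    try:
        build_parser().parse_args(['-pp', value])
    except SystemExit:
        # argparse prints "invalid choice" to stderr and exits with status 2.
        return False
    return True


defaults = build_parser().parse_args([])
print(defaults.preprocess, defaults.root_directory)  # xml /projappl/nlpl/data/OPUS
for value in ('raw', 'xml', 'parsed', 'moses'):
    print(value, accepts_preprocess(value))
```

Users whose OPUS copy still lives under the old path can keep the previous behaviour by passing `-rd /proj/nlpl/data/OPUS` explicitly.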
2 changes: 1 addition & 1 deletion opustools_pkg/setup.py
@@ -11,7 +11,7 @@

 setuptools.setup(
 name="opustools",
-version="1.6.2",
+version="1.7.0",
 author="Mikko Aulamo",
 author_email="[email protected]",
 description="Tools to read OPUS",
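The `setup.py` hunk bumps the package version from 1.6.2 to 1.7.0 in the same commit as the interface changes above. A script that depends on the new behaviour (for example the `/projappl` root default) could check the installed version first. A minimal sketch using the standard-library `importlib.metadata` (Python 3.8+), assuming the distribution name `opustools` from `setup.py` and plain `MAJOR.MINOR.PATCH` version strings; for pre-release or other suffixes, a full parser such as `packaging.version` is the safer choice:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def release_tuple(v: str) -> Tuple[int, ...]:
    # Assumes plain numeric versions such as "1.7.0"; suffixes like "rc1" raise ValueError.
    return tuple(int(part) for part in v.split('.'))


def at_least(installed: str, minimum: str = '1.7.0') -> bool:
    # Compare numerically: as plain strings, "1.10.0" would sort before "1.7.0".
    return release_tuple(installed) >= release_tuple(minimum)


def installed_opustools() -> Optional[str]:
    """Return the installed opustools version, or None if it is not installed."""
    try:
        return version('opustools')
    except PackageNotFoundError:
        return None


current = installed_opustools()
if current is None:
    print('opustools is not installed')
elif at_least(current):
    print(f'opustools {current}: 1.7.0 or newer')
else:
    print(f'opustools {current}: older than 1.7.0')
```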
