TextLoadUnify buggy without "--unify" #115

phikoehn · 2019-12-20T00:09:58Z

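When the `--unify` option is not used, TextLoadUnify still builds an index over unique lines. As a result, `inds` uses a deduplicated numbering while `sents` holds every line, so the two no longer line up whenever the input contains duplicate lines. This breaks (at least) the `--score` option, which scores sentence pairs.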
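To make the mismatch concrete, here is a minimal standalone sketch of the current loop. It is a simplification, not the actual TextLoadUnify function: the hypothetical `load_current` takes a list of newline-terminated lines instead of a file, and a boolean `unify` instead of `args.unify`. With duplicate input lines and `unify=False`, `sents[inds[i]]` is no longer line `i`.

```python
def load_current(lines, unify):
    """Hypothetical standalone copy of the current TextLoadUnify loop.

    lines: list of strings, each ending in '\n' (stands in for the file).
    unify: bool (stands in for args.unify).
    """
    inds, sents, sent2ind = [], [], {}
    for line in lines:
        new_ind = len(sent2ind)
        # The unique-line index is built even when unify is False.
        inds.append(sent2ind.setdefault(line, new_ind))
        if unify:
            if inds[-1] == new_ind:
                sents.append(line[:-1])
        else:
            # Every line is kept, but inds still uses unique-line numbering.
            sents.append(line[:-1])
    return inds, sents


lines = ["A\n", "B\n", "A\n", "C\n"]
inds, sents = load_current(lines, unify=False)
print(inds)            # [0, 1, 0, 2]
print(sents)           # ['A', 'B', 'A', 'C']
print(sents[inds[3]])  # A  (the fourth line is "C")
```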

So, this:

```python
for line in fin:
    # Runs even without --unify, so inds always uses unique-line numbering
    new_ind = len(sent2ind)
    inds.append(sent2ind.setdefault(line, new_ind))
    if args.unify:
        if inds[-1] == new_ind:
            sents.append(line[:-1])
            nu += 1
    else:
        sents.append(line[:-1])
        nu += 1
```

should be changed to:

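```python
for line in fin:
    if args.unify:
        # Deduplicate: identical lines share one index into sents
        new_ind = len(sent2ind)
        inds.append(sent2ind.setdefault(line, new_ind))
        if inds[-1] == new_ind:
            sents.append(line[:-1])
            nu += 1
    else:
        # No deduplication: line i is stored at position i of sents
        sents.append(line[:-1])
        inds.append(nu)
        nu += 1
```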
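The property the change restores is that each input line can be recovered as `sents[inds[i]]`, with or without `--unify`. Below is a minimal standalone sketch of the proposed loop that checks this. As above, `load_fixed` and `is_consistent` are hypothetical helpers that take a list of lines and a boolean in place of the file handle and `args`; they are not part of the original code.

```python
def load_fixed(lines, unify):
    """Hypothetical standalone copy of the proposed TextLoadUnify loop."""
    inds, sents, sent2ind = [], [], {}
    nu = 0  # number of entries appended to sents so far
    for line in lines:
        if unify:
            # Deduplicate: identical lines share one index into sents.
            new_ind = len(sent2ind)
            inds.append(sent2ind.setdefault(line, new_ind))
            if inds[-1] == new_ind:
                sents.append(line[:-1])
                nu += 1
        else:
            # No deduplication: line i maps to position i in sents.
            sents.append(line[:-1])
            inds.append(nu)
            nu += 1
    return inds, sents


def is_consistent(lines, inds, sents):
    """True if every input line equals sents[inds[i]] (newline stripped)."""
    return all(sents[j] == line[:-1] for line, j in zip(lines, inds))


lines = ["A\n", "B\n", "A\n", "C\n"]
for unify in (False, True):
    inds, sents = load_fixed(lines, unify)
    print(unify, inds, sents, is_consistent(lines, inds, sents))
# False [0, 1, 2, 3] ['A', 'B', 'A', 'C'] True
# True [0, 1, 0, 2] ['A', 'B', 'C'] True
```

Without `--unify`, `inds` simply becomes `0..n-1`; with it, the behavior is unchanged from the current code.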
