Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--passC:"-O3" with ARC and ORC miscompile slice on nested seqs #19597

Closed
tersec opened this issue Mar 8, 2022 · 3 comments
Closed

--passC:"-O3" with ARC and ORC miscompile slice on nested seqs #19597

tersec opened this issue Mar 8, 2022 · 3 comments

Comments

@tersec
Copy link
Contributor

tersec commented Mar 8, 2022

The function echo outputs the wrong string. Well, at least, that's one manifestation.

Example

This comes in some related varieties:

doAssert @[@["."]][0 .. ^1] == @[@["."]]
doAssert @[@["."]][0 .. 0] == @[@["."]]
doAssert @[@["."]][0 .. 0][0] == @["."]
let foo = @[@["."]]
let bar = foo[0 .. 0]
let baz = bar[0]
doAssert baz == @["."]

et cetera.
The key parts are (a) the nested sequences, and (b) so far, I've needed the inner scalar element to be a string to trigger this. If that element's there, and -O3 is used with:

$ gcc --version
gcc (Debian 11.2.0-18) 11.2.0
Copyright (C) 2021 Free Software Foundation, Inc.

This bug triggers when compiled using nim c -r --passC:"-O3" --mm:orc nested_seq.nim.

Current Output

Tested with:

Nim Compiler Version 1.6.4 [Linux: amd64]
Compiled at 2022-02-10
Copyright (c) 2006-2021 by Andreas Rumpf

active boot switches: -d:release

and

Nim Compiler Version 1.7.1 [Linux: amd64]
Compiled at 2022-03-08
Copyright (c) 2006-2022 by Andreas Rumpf

git hash: 645447293851749fcc3394cd387d7070d8a9c735
active boot switches: -d:release

One gets (depending on exactly which assertion statement variation one chooses, but they're fundamentally the same):

`@[@["."]][0 .. 0][0] == @["."]`  [AssertionDefect]

One can alternatively see it this way, surveying the cartesian product of (debug, release) x (O2, O3) x (refc, arc, orc):

$ Nim/bin/nim c -r --hints:off -d:debug   --passC:"-O2" --mm:refc --eval:'echo @[@["."]][0 .. 0]'
@[@["."]]
$ Nim/bin/nim c -r --hints:off -d:debug   --passC:"-O2" --mm:arc  --eval:'echo @[@["."]][0 .. 0]'
@[@["."]]
$ Nim/bin/nim c -r --hints:off -d:debug   --passC:"-O2" --mm:orc  --eval:'echo @[@["."]][0 .. 0]'
@[@["."]]
$ Nim/bin/nim c -r --hints:off -d:debug   --passC:"-O3" --mm:refc --eval:'echo @[@["."]][0 .. 0]'
@[@["."]]
$ Nim/bin/nim c -r --hints:off -d:debug   --passC:"-O3" --mm:arc  --eval:'echo @[@["."]][0 .. 0]'
@[@[""]]
$ Nim/bin/nim c -r --hints:off -d:debug   --passC:"-O3" --mm:orc  --eval:'echo @[@["."]][0 .. 0]'
@[@[""]]
$ Nim/bin/nim c -r --hints:off -d:release --passC:"-O2" --mm:refc --eval:'echo @[@["."]][0 .. 0]'
@[@["."]]
$ Nim/bin/nim c -r --hints:off -d:release --passC:"-O2" --mm:arc  --eval:'echo @[@["."]][0 .. 0]'
@[@["."]]
$ Nim/bin/nim c -r --hints:off -d:release --passC:"-O2" --mm:orc  --eval:'echo @[@["."]][0 .. 0]'
@[@["."]]
$ Nim/bin/nim c -r --hints:off -d:release --passC:"-O3" --mm:refc --eval:'echo @[@["."]][0 .. 0]'
@[@["."]]
$ Nim/bin/nim c -r --hints:off -d:release --passC:"-O3" --mm:arc  --eval:'echo @[@["."]][0 .. 0]'
@[@["."]]
$ Nim/bin/nim c -r --hints:off -d:release --passC:"-O3" --mm:orc  --eval:'echo @[@["."]][0 .. 0]'
@[@["."]]

The necessary conditions within that space are -O3, -d:debug, and ARC or ORC.

Expected Output

Running successfully, without asserting.

Possible Solution

Don't use ARC or ORC, or don't use --passC:"-O3" (even -O2 avoids triggering it, and -O2 with all the options from https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-O3 which supposedly transform -O2 into -O3 don't trigger it; so far, I've needed to specify literally -O3 to trigger this) without doing so implicitly via -d:release. The default refc memory manager also does not exhibit this problem.

Debugging this, the relevant difference in behavior between -O2 and -O3 seems to come from nimAsgnStrV2 which gets embedded into stdlib_system.nim not correctly copying the string of length 1, but rather not copying the string at all. This appears to be the key differentiator between memory managers which trigger this bug and which don't, that the ones which use nimAsgnStrV2 to effectively move the string, as the

		(*a).len = b.len;
		(*a).p = b.p;

from

N_LIB_PRIVATE N_NIMCALL(void, nimAsgnStrV2)(NimStringV2* a, NimStringV2 b) {
{	{
		if (!((*a).p == b.p)) goto LA3_;
		goto BeforeRet_;
	}
	LA3_: ;
	{
		NIM_BOOL T7_;
		T7_ = (NIM_BOOL)0;
		T7_ = (b.p == ((NimStrPayload*) NIM_NIL));
		if (T7_) goto LA8_;
		T7_ = ((NI)((*b.p).cap & ((NI) IL64(4611686018427387904))) == ((NI) IL64(4611686018427387904)));
		LA8_: ;
		if (!T7_) goto LA9_;
		{
			NIM_BOOL T13_;
			T13_ = (NIM_BOOL)0;
			T13_ = ((*a).p == ((NimStrPayload*) NIM_NIL));
			if (T13_) goto LA14_;
			T13_ = ((NI)((*(*a).p).cap & ((NI) IL64(4611686018427387904))) == ((NI) IL64(4611686018427387904)));
			LA14_: ;
			if (!!(T13_)) goto LA15_;
			dealloc(((void*) ((*a).p)));
		}
		LA15_: ;
		(*a).len = b.len;
		(*a).p = b.p;
	}
	goto LA5_;
	LA9_: ;
	{
		NI TM__Q5wkpxktOdTGvlSRo9bzt9aw_156;
		{
			NIM_BOOL T20_;
			NIM_BOOL T21_;
			NI TM__Q5wkpxktOdTGvlSRo9bzt9aw_154;
			NI TM__Q5wkpxktOdTGvlSRo9bzt9aw_155;
			void* T32_;
			T20_ = (NIM_BOOL)0;
			T21_ = (NIM_BOOL)0;
			T21_ = ((*a).p == ((NimStrPayload*) NIM_NIL));
			if (T21_) goto LA22_;
			T21_ = ((NI)((*(*a).p).cap & ((NI) IL64(4611686018427387904))) == ((NI) IL64(4611686018427387904)));
			LA22_: ;
			T20_ = T21_;
			if (T20_) goto LA23_;
			T20_ = ((NI)((*(*a).p).cap & ((NI) IL64(-4611686018427387905))) < b.len);
			LA23_: ;
			if (!T20_) goto LA24_;
			{
				NIM_BOOL T28_;
				T28_ = (NIM_BOOL)0;
				T28_ = ((*a).p == ((NimStrPayload*) NIM_NIL));
				if (T28_) goto LA29_;
				T28_ = ((NI)((*(*a).p).cap & ((NI) IL64(4611686018427387904))) == ((NI) IL64(4611686018427387904)));
				LA29_: ;
				if (!!(T28_)) goto LA30_;
				dealloc(((void*) ((*a).p)));
			}
			LA30_: ;
			if (nimAddInt(b.len, ((NI) 1), &TM__Q5wkpxktOdTGvlSRo9bzt9aw_154)) { raiseOverflow(); goto BeforeRet_;
};
			if (nimAddInt((NI)(TM__Q5wkpxktOdTGvlSRo9bzt9aw_154), ((NI) 8), &TM__Q5wkpxktOdTGvlSRo9bzt9aw_155)) { raiseOverflow(); goto BeforeRet_;
};
			if (((NI)(TM__Q5wkpxktOdTGvlSRo9bzt9aw_155)) < ((NI) 0) || ((NI)(TM__Q5wkpxktOdTGvlSRo9bzt9aw_155)) > ((NI) IL64(9223372036854775807))){ raiseRangeErrorI((NI)(TM__Q5wkpxktOdTGvlSRo9bzt9aw_155), ((NI) 0), ((NI) IL64(9223372036854775807))); goto BeforeRet_;
}
			T32_ = (void*)0;
			T32_ = alloc0Impl__system_1727(((NI) ((NI)(TM__Q5wkpxktOdTGvlSRo9bzt9aw_155))));
			(*a).p = ((NimStrPayload*) (T32_));
			(*(*a).p).cap = b.len;
		}
		LA24_: ;
		(*a).len = b.len;
		if (nimAddInt(b.len, ((NI) 1), &TM__Q5wkpxktOdTGvlSRo9bzt9aw_156)) { raiseOverflow(); goto BeforeRet_;
};
		if (((NI)(TM__Q5wkpxktOdTGvlSRo9bzt9aw_156)) < ((NI) 0) || ((NI)(TM__Q5wkpxktOdTGvlSRo9bzt9aw_156)) > ((NI) IL64(9223372036854775807))){ raiseRangeErrorI((NI)(TM__Q5wkpxktOdTGvlSRo9bzt9aw_156), ((NI) 0), ((NI) IL64(9223372036854775807))); goto BeforeRet_;
}
		copyMem__system_1709(((void*) ((&(*(*a).p).data[((NI) 0)]))), ((void*) ((&(*b.p).data[((NI) 0)]))), ((NI) ((NI)(TM__Q5wkpxktOdTGvlSRo9bzt9aw_156))));
	}
	LA5_: ;
	}BeforeRet_: ;
}

does not execute properly, leaving the string length at 0.

Additional Information

Nim Compiler Version 1.2.16 [Linux: amd64]
Compiled at 2022-03-07
Copyright (c) 2006-2020 by Andreas Rumpf

git hash: c6a9f27b3e36bae9aacec1bd6c37893fb98fd33f
active boot switches: -d:release

has a different problem for both --gc:arc and --gc:orc:

Error: internal error: '=copy' operator not found for type array[0..0, seq[string]]

This provides a lower bound on relevant versions. It needs Nim 1.4 to even compile enough to play with this.

@tersec
Copy link
Contributor Author

tersec commented Mar 20, 2022

The strings were a red herring:

$ nim c -r --hints:off -d:debug --mm:orc  --passC:'-O3'  --eval:'if [@[5]][0..0][0][0] == 5: echo "no bug" else: echo "bug"'
bug
$ nim c -r --hints:off -d:debug --mm:orc  --passC:'-O3 -fno-strict-aliasing'  --eval:'if [@[5]][0..0][0][0] == 5: echo "no bug" else: echo "bug"'
no bug

Rather, it it's collateral damage from setLen(seq), combined with -d:debug --passC:-O3 not also setting -fno-strict-aliasing, and:

N_LIB_PRIVATE N_NIMCALL(void, setLen__cmdfile_71)(tySequence__PSP8snSsRoFs9cDiya9bd7UQ* s, NI newlen) {
{	{
		NI T3_;
		T3_ = (*s).len;
		if (!(((NI) (newlen)) < T3_)) goto LA4_;
		shrink__cmdfile_67(s, newlen);
	}
	goto LA1_;
	LA4_: ;
	{
		NI oldLen;
		NI T7_;
		tyObject_NimSeqV2__k6A3SDA9bxp56RSL9bm9aXxvg* xu;
		T7_ = (*s).len;
		oldLen = T7_;
		{
			if (!(((NI) (newlen)) <= oldLen)) goto LA10_;
			goto BeforeRet_;
		}
		LA10_: ;
		xu = ((tyObject_NimSeqV2__k6A3SDA9bxp56RSL9bm9aXxvg*) (s));
		{
			NIM_BOOL T14_;
			NI TM__Q5wkpxktOdTGvlSRo9bzt9aw_155;
			void* T18_;
			T14_ = (NIM_BOOL)0;
			T14_ = ((*xu).p == ((tyObject_NimSeqPayload__0rETQpdPeHWsgle1jKnljg*) NIM_NIL));
			if (T14_) goto LA15_;
			T14_ = ((*(*xu).p).cap < ((NI) (newlen)));
			LA15_: ;
			if (!T14_) goto LA16_;
			if (nimSubInt(((NI) (newlen)), oldLen, &TM__Q5wkpxktOdTGvlSRo9bzt9aw_155)) { raiseOverflow(); goto BeforeRet_;
};
			T18_ = (void*)0;
			T18_ = prepareSeqAdd(oldLen, ((void*) ((*xu).p)), (NI)(TM__Q5wkpxktOdTGvlSRo9bzt9aw_155), ((NI) 16), ((NI) 8));
			(*xu).p = ((tyObject_NimSeqPayload__0rETQpdPeHWsgle1jKnljg*) (T18_));
		}
		LA16_: ;
		(*xu).len = ((NI) (newlen));
	}
	LA1_: ;
	}BeforeRet_: ;
}

and in particular

xu = ((tyObject_NimSeqV2__k6A3SDA9bxp56RSL9bm9aXxvg*) (s));

Which appear to be code generated from

Nim/lib/system/seqs_v2.nim

Lines 117 to 127 in 731eabc

proc setLen[T](s: var seq[T], newlen: Natural) =
{.noSideEffect.}:
if newlen < s.len:
shrink(s, newlen)
else:
let oldLen = s.len
if newlen <= oldLen: return
var xu = cast[ptr NimSeqV2[T]](addr s)
if xu.p == nil or xu.p.cap < newlen:
xu.p = cast[typeof(xu.p)](prepareSeqAdd(oldLen, xu.p, newlen - oldLen, sizeof(T), alignof(T)))
xu.len = newlen

Working around the type-punning/aliasing cast in the generated C code fixes this. It also doesn't seem to trigger for me so far on older gcc versions (e.g., 10.x) or any clang version.

It needs some compilers settings which will generate that seqs_v2.nim setLen rather than setLengthSeqV2 (as refc appears to do). It's not obvious that this specific issue relates per se to the garbage collector or memory model, but the non-ARC/ORC codegen generates the sequence types differently enough to apparently avoid triggering this:

Nim/lib/system/sysstr.nim

Lines 331 to 369 in 8ccde68

proc setLengthSeqV2(s: PGenericSeq, typ: PNimType, newLen: int): PGenericSeq {.
compilerRtl.} =
sysAssert typ.kind == tySequence, "setLengthSeqV2: type is not a seq"
if s == nil:
result = cast[PGenericSeq](newSeq(typ, newLen))
else:
when defined(nimIncrSeqV3):
let elemSize = typ.base.size
let elemAlign = typ.base.align
if s.space < newLen:
let r = max(resize(s.space), newLen)
result = cast[PGenericSeq](newSeq(typ, r))
copyMem(dataPointer(result, elemAlign), dataPointer(s, elemAlign), s.len * elemSize)
# since we steal the content from 's', it's crucial to set s's len to 0.
s.len = 0
elif newLen < s.len:
result = s
# we need to decref here, otherwise the GC leaks!
when not defined(boehmGC) and not defined(nogc) and
not defined(gcMarkAndSweep) and not defined(gogc) and
not defined(gcRegions):
if ntfNoRefs notin typ.base.flags:
for i in newLen..result.len-1:
forAllChildrenAux(dataPointer(result, elemAlign, elemSize, i),
extGetCellType(result).base, waZctDecRef)
# XXX: zeroing out the memory can still result in crashes if a wiped-out
# cell is aliased by another pointer (ie proc parameter or a let variable).
# This is a tough problem, because even if we don't zeroMem here, in the
# presence of user defined destructors, the user will expect the cell to be
# "destroyed" thus creating the same problem. We can destroy the cell in the
# finalizer of the sequence, but this makes destruction non-deterministic.
zeroMem(dataPointer(result, elemAlign, elemSize, newLen), (result.len-%newLen) *% elemSize)
else:
result = s
zeroMem(dataPointer(result, elemAlign, elemSize, result.len), (newLen-%result.len) *% elemSize)
result.len = newLen
else:
result = setLengthSeq(s, typ.base.size, newLen)

@bung87
Copy link
Collaborator

bung87 commented Nov 9, 2022

can't reproduce in d8d0832, neither --mm:arc nor --mm:orc
gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0

@tersec
Copy link
Contributor Author

tersec commented Nov 9, 2022

Agree, can reproduce with commit 645447293851749fcc3394cd387d7070d8a9c735 (as documented here) and gcc (Debian 12.2.0-9) 12.2.0, but not current devel. Bisecting, the last devel commit exhibiting this for me with this gcc version (Debian clang version 14.0.6-7 doesn't exhibit this in any commit I tested it with) is 6cf0727, while f89ba2c does not seem to exhibit this, so #20480 appears to have fixed this issue.

@tersec tersec closed this as completed Nov 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants