Felix (latest) build error on Linux #175
Comments
So: the bootstrap build has worked successfully. However I cannot see the exact commands, which are in the file
I think. I also cannot see the commands for the second build; these can be exposed by
Both build processes just provide a summary of each step, rather than the exact shell command. So in the second build we get two different errors when trying to link:
As far as I can tell this is a bug in the linker. The linker thinks it should be linking a shared library OR an executable that depends on a shared library. But it's SUPPOSED to be doing a full static link and all the object files SHOULD have been compiled for that. Every one of them ends in the suffix
which means precisely: compiled WITHOUT -fPIC for static linkage. Now the thing is, the fbuild (bootstrap) build successfully linked bootflx (which is renamed
Now, plugin support just loads shared libraries. This uses
So at the moment it looks like gcc, ld, or something in Linux is broken. It can be easily fixed by simply building even the static objects with -fPIC, but the whole point of having static objects at all is to avoid precisely that.
It's POSSIBLE that the error is in the Felix build code, but very unlikely, because it has worked before and the compiles are done with a wildcarded Felix script that builds everything the same way.
The difference in error messages is almost certainly because of this: some modules have no dependencies, others do. So the reloc 11 error is when there are no dependencies, and the reloc X86 error is when there are. The fact that EVERY object file gives an error suggests it is the linker command that is wrong: either the wrong switches are given by the Felix build script, OR the linker itself is broken.
Linkage by default should be static. However .. there is one issue here: the C library on most platforms is a shared library END OF STORY. There is no static link C library any more. Linux is one of the archaic holdouts on this one. It's possible modern Linux has removed static link C libraries (because they prevent upgrading the system!) MacOS doesn't support static link system libraries: not just the C library either, almost everything in a Framework on a Mac is dynamic link, and dynamic link ONLY. Similarly Windows. Everything's a DLL.
So the bottom line is this:
Step 1: set
Step 2: delete the build artefacts
Step 3: rebuild it (sorry!) You need to somehow save the output. Both stderr and stdout.
Step 4: Find the linker commands used to build bootflx and flx. The two commands may be different. The first one works, the second doesn't. I need to see both linker commands to see if I got the switches right.
Based on the advice of the error message it is possible to patch it so that the compiles are done with -fPIC even when they should not need to be. The actual compiler toolchains are in this file: https://github.com/felix-lang/felix/blob/master/src/packages/toolchain.fdoc Around line 935 you will see the Felix code that is used to launch g++ for compilation and a bit later for linkage. |
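Steps 3 and 4 above could be automated along these lines. This is only a rough sketch in Python, not part of the Felix build: the function names and the log file name are made up for illustration, and the rule used to spot a link step (a compiler driver invoked with `-o` but without `-c`) is an assumption about what the echoed commands look like.

```python
import subprocess
import sys

# Assumption: the compiler/linker driver is one of these executables.
COMPILERS = ("g++", "clang++", "gcc", "clang")

def run_and_log(cmd, log_path):
    """Run cmd, echo merged stdout+stderr to the console and save it to log_path."""
    with open(log_path, "w") as log:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, text=True)
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
        return proc.wait()

def find_link_commands(lines):
    """Return lines that look like link steps: a compiler driver with -o and no -c."""
    found = []
    for line in lines:
        words = line.split()
        # Compare only the basename so /usr/bin/g++ also matches.
        is_driver = any(w.rsplit("/", 1)[-1] in COMPILERS for w in words)
        if is_driver and "-o" in words and "-c" not in words:
            found.append(line.rstrip("\n"))
    return found
```

For example, `run_and_log(["make"], "build.log")` followed by `find_link_commands(open("build.log"))` would list the candidate linker commands, assuming a plain `make` starts the build and the shell commands are being echoed.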
So I'm now building Felix on Ubuntu using GitHub Workflows! Hopefully this will help identify the problem. |
OK, everything builds with clang++. I've finally fixed the script to use g++ instead. |
https://github.com/felix-lang/felix/actions/runs/3337443837/jobs/5523792114 This is using g++ version 9.4 and it builds just fine. I wonder why that works and your build does not? |
I am not sure why, but I do have the result of a fresh build from scratch.
|
Maybe build/release/fbuild.log. This is always produced. It is unrelated to FLX_SHELL_ECHO. |
It seems to have worked! |
Oh, that's great. It must have been the artefacts that hurt the build. Thanks for the support. |
Just to explain the build process: 1a. A program written in Python, called fbuild, is used to build a bootstrap version of Felix.
This contains more detailed information than shown on the console. Looking at the file you posted, fbuild decided to use clang++, not g++, as the compiler for this phase. The main result of this process is to build 1b. Now, the 1c. Now bootflx is renamed to flx, and is used to build
Now if the repository is changed you can try
and it rebuilds the system starting at phase 2. If the ONLY changes are to test code, or most Occasionally a clean rebuild from phase 1 is required. There are other targets in the GNUmakefile:
is one that can be run every now and then. There will be a
at some stage to rebuild and check all the Rosetta tests. Finally if you say
you will get a list of switches and environment variables. This one:
causes ANY Felix program that calls the shell to display the text of the shell call. Of course
You can see By default, Felix generates shared libraries. If you want an executable:
You may notice above
See? The -MM steps calculate dependencies but there is no Felix compile, there is no C++ compile, and there is no link! Felix just runs the executable. Everything is cached. In fact notice WHERE the executable is .. it's in the cache too. |
When I attempt to run programs with
|
AH. I think I know what that is. Actually all of the test cases you had failed. I didn't notice. Hmm. I think the problem is that the linker needs |
|
Hmmm:
dlibs are for shared lib builds and slibs for static link. But -lpthread is there in both. It doesn't on MacOS because it's not required on MacOS. |
yeah that error is on EVERY test case in your build log. Same error. Here's the build of the Felix pthread library:
Note there is no -lpthread! I mean if anything needed the C pthread library it would be the Felix pthread library ! |
Just by the by .. Linux linker is TOTALLY SCREWED. The above is the proof. When linking shared libraries it silently ignores unresolved symbols. You don't get an error until you actually load the library and it finds there is a missing dependency. That's so utterly stupid it's unbelievable. If you're linking an executable, it tells you if there's a missing symbol. The argument is, when you link against a shared library, at load time it could be a different library so why bother reporting an error which hasn't happened yet? ARRRGGGG. |
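Since the link step for a shared library does not complain, one way to see what a library still expects from elsewhere is to list its undefined dynamic symbols with GNU `nm`. The sketch below is illustrative only: the helper names are made up, and it assumes GNU binutils `nm` is on the PATH. Note that a symbol showing up in this list is not an error in itself; it only becomes one if no library linked as a dependency provides it. GNU ld can also be asked to reject unresolved symbols at link time with `-Wl,--no-undefined` (equivalently `-Wl,-z,defs`).

```python
import subprocess

def undefined_symbols(nm_output):
    """Extract names marked 'U' (undefined) from `nm` text output."""
    names = []
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined entries have no address: the line is just "U name".
        if len(fields) >= 2 and fields[-2] == "U":
            # Drop a symbol-version suffix such as @GLIBC_2.2.5 if present.
            names.append(fields[-1].split("@")[0])
    return names

def library_imports(path):
    """Run GNU nm on a shared library and list the symbols it expects from elsewhere."""
    out = subprocess.run(["nm", "-D", "--undefined-only", path],
                         check=True, capture_output=True, text=True).stdout
    return undefined_symbols(out)
```

Checking whether `"pthread_create"` appears in `library_imports(...)` for a given runtime library would show whether that library needs the C pthread library to be linked in.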
Yep .. missing switch:
Now the question is how did the CI build work .. perhaps it didn't lol ... |
But here is a test case as an example:
See? That one has -lpthread. |
Can you try this:
That's on MacOS. On Linux it should be this file:
If it's NOT, try this:
Also you should be set up to use the Felix in build/release so DELETE THE INSTALLED FELIX
and check
I'm not sure if AND you may need this too:
with obvious changes ... |
I modified LD_LIBRARY_PATH to add those directories, but there was no change. Should I rebuild again after that change? Here are the command outputs, they all seem to be in order: (I have no Felix installed in /usr)
|
You should probably build again from scratch. I just don't understand what's happening considering it builds on Ubuntu with g++ version 9 on the GitHub CI server. You could try:
which avoids the bootstrap. Note PATH needs to include build/release/host/bin, and LD_LIBRARY_PATH has to include build/release/host/lib/rtl. The Felix binaries are in the first directory, and the shared libraries in the second. When you run Now the thing is, when you run say
it uses dynamic linkage (generates a shared library) which is loaded and run by
and the linkage machinery SHOULD be setting The way dynamic linkage works in Felix is using two level namespaces. What this means is that if a shared library A depends on B, then A is linked to B. Now if a shared library X depends on A, it is linked to A but it is NOT linked to B. In other words, So what's happening is some library .. and I don't know which one .. which depends on the |
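The PATH and LD_LIBRARY_PATH requirement mentioned above can be sketched as a small helper. This is a hypothetical illustration, not something shipped with Felix: the function name `felix_env` and its `felix_root` parameter are invented, and only the two directory names come from this thread.

```python
import os

def felix_env(felix_root, base_env=None):
    """Return a copy of the environment with the build/release Felix first on both paths."""
    env = dict(os.environ if base_env is None else base_env)
    # Felix binaries (flx, flx_run, ...) live here.
    bin_dir = os.path.join(felix_root, "build", "release", "host", "bin")
    # Felix run time shared libraries live here.
    rtl_dir = os.path.join(felix_root, "build", "release", "host", "lib", "rtl")
    for var, new_dir in (("PATH", bin_dir), ("LD_LIBRARY_PATH", rtl_dir)):
        old = env.get(var, "")
        # Prepend the directory, dropping empty entries and any earlier copy of it.
        parts = [p for p in old.split(os.pathsep) if p and p != new_dir]
        env[var] = os.pathsep.join([new_dir] + parts)
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` so that a child process finds the freshly built `flx` and its libraries ahead of any installed copy.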
well, both rebuild and a fresh build go into errors.
|
What do you mean "go into errors"? The repl requires a Felix program. For example
It reloads definitions every time. It does not work very well. This is the complete source code of the repl:
|
ah, makes sense.
$ rlwrap flx --repl
print$ (1,2,3);
/home/razetime/Software/felix/build/release/host/bin/flx_run: symbol
lookup error: /home/razetime/Software/felix/build/release/host/lib/rtl/libflx_gc_dynamic.so:
undefined symbol: pthread_create
same error.
…On 11/1/22, John Skaller ***@***.***> wrote:
What do you mean "go into errors"?
The repl requires a Felix program. For example
```
~/felix>flx --repl
> println$ 1,2,3;
(1, 2, 3)
```
It reloads definitions every time. It does not work very well. This is the
complete source code of the repl:
```
fun startlib (x:string) =
{
return x in RE2(" *(fun|proc|var|val|gen|union|struct|typedef).*\n");
}
// MOVE LATER!
proc repl()
{
nextline:>
print "> "; fflush stdout;
var text = readln stdin;
if feof(stdin) return;
if startlib(text) goto morelibrary;
goto executable;
morelibrary:>
print ".. "; fflush stdout;
var more = readln stdin;
if feof(stdin) return;
if more == "\n" goto saveit;
text += more;
goto morelibrary;
saveit:>
var dlibrary = load("library.flx");
dlibrary += text;
save("library.flx",dlibrary);
goto nextline;
executable:>
var session = load("session.flx");
session += text;
save ("session.flx", session);
dlibrary = load("library.flx");
var torun = dlibrary + text;
save ("cmd.flx", torun);
}
...
end
elif control*.REPL_MODE do
begin
again:>
repl();
if not feof (stdin) do
var dvars =
FlxDepvars::cal_depvars(toolchain_maker,c_compiler_executable,
cxx_compiler_executable, *config,control, *loopctl);
var pe = processing_env(toolchain_maker,c_compiler_executable,
cxx_compiler_executable, *config,*control,dvars);
result = pe.runit(ehandler);
goto again;
else
println$ "Bye!";
// TOP LEVEL REPL, OK
System::exit 0;
done
end
```
|
GOT IT I think! It's a bug in C++ thread library construction.
This is part of the GC. It is creating a thread using the C++ std::thread class. In any case the fix is trivial, just link the GC with -lpthread to work around the bug. I'll have a go at a patch. |
Reopening. Stupid github decided a reference to this in a message should close it. Assigning to razetime. |
Felix (as of the latest commit on 26th October) throws a build error when installed.
OS: Ubuntu 20.04.5 LTS x86_64
Python version: 3.8.10
OCaml version: 4.14.0
g++ version: 9.4.0
Full command output: gist
Previous discussion has been done in the Felix mail group, and in #174.