<?php include 'header.php';?>
<h1>The Compiler (3/5)</h1>
<a class="arrow" href="cpp.php">←</a> <a class="arrow" href="ld.php">→</a>
<hr/>
<div style="width:30%;float: right; margin-left: 2ch; margin-bottom: 2ch;">
<table class="lined" style="width: 100%; text-align: center; margin-top: 0;">
<tr>
<td colspan="3">driver
</td><td style="border-top-style: hidden; border-right-style: hidden;"></td>
</tr>
<tr>
<td>cpp</td>
<td>cc<span class="r">*</span></td>
<td>ld</td>
<td>loader</td>
</tr>
</table>
</div>
<p style="margin-top: 0;">
The compiler stage is the most complicated element in the pipeline. Given the purpose of these articles, however, it is also the part we cover in the least depth. If you want to learn how compilers work inside out, refer to the <a href="https://amzn.to/3JlbOZx">Dragon book</a>.
</p>
<h2>Compiler big picture</h2>
<p>The goal of the compiler is to open a translation unit, parse it, optimize it, and output an object file (except in the case of LTO, which is discussed later). Object files are also sometimes called relocatables.
</p>
<p>All compilers are structured the same way. They have a Frontend which ingests the source text and transforms it into an Intermediate Representation (IR). The IR is usually modified by optimizers before being consumed by the Backend, which is in charge of generating machine-specific instructions and packaging them into an object file format container.
</p>
<img class="center" style="width:75%; margin-bottom: 2ch; border:0;" src="illu/SimpleCompiler.svg"/>
<div class="t"> Clang is a frontend which generates an IR consumed by the LLVM backend. LLVM's well-documented and kinda human-readable IR format has opened the door to many tools. Among them is Rust's compiler, <code>rustc</code>, which is an LLVM frontend in charge of generating LLVM IR.<br/><br/>
<img class="center" style="border:0;width:75%;" src="illu/LLVMCompiler1.svg"/>
</div>
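<p>The Frontend / optimizer / Backend split described above can be sketched with a toy example. Everything here is hypothetical (the <code>Ir</code> type, <code>ir_fold</code>, <code>ir_emit</code>, and the made-up stack-machine output); it is not how Clang or GCC are implemented. The caller builds the tree by hand, standing in for a frontend.</p>

```c
#include <stdio.h>

/* Toy IR: a binary expression tree. All names are made up for illustration. */
typedef enum { IR_CONST, IR_ARG, IR_ADD, IR_MUL } IrKind;

typedef struct Ir {
  IrKind kind;
  int value;             /* only meaningful for IR_CONST */
  struct Ir *lhs, *rhs;  /* only meaningful for IR_ADD and IR_MUL */
} Ir;

/* "Optimizer": fold ADD/MUL nodes whose two operands are constants. */
void ir_fold(Ir *node) {
  if (node->kind != IR_ADD && node->kind != IR_MUL) return;
  ir_fold(node->lhs);
  ir_fold(node->rhs);
  if (node->lhs->kind == IR_CONST && node->rhs->kind == IR_CONST) {
    int a = node->lhs->value, b = node->rhs->value;
    node->value = (node->kind == IR_ADD) ? a + b : a * b;
    node->kind = IR_CONST;
  }
}

/* "Backend": print made-up stack-machine instructions.
   Returns the number of instructions emitted. */
int ir_emit(const Ir *node, FILE *out) {
  switch (node->kind) {
  case IR_CONST:
    fprintf(out, "push %d\n", node->value);
    return 1;
  case IR_ARG:
    fprintf(out, "push arg\n");
    return 1;
  default: {
    int count = ir_emit(node->lhs, out) + ir_emit(node->rhs, out);
    fprintf(out, "%s\n", node->kind == IR_ADD ? "add" : "mul");
    return count + 1;
  }
  }
}
```

<p>For the expression <code>arg * (2 + 3)</code>, folding turns the <code>2 + 3</code> node into a constant before the backend runs, so fewer instructions are emitted.</p>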
<h2>Output format</h2>
<p>The input format, the translation unit, was studied in the previous section about the preprocessor. Let's focus on what the compiler has to output. The format is given to us via the tool <code>file</code> after requesting the driver to output a relocatable file instead of an executable.</p>
<pre>// mult.c
int mul(int x, int y);
int pow(int x) { return mul(x, x) ; }
</pre>
<p>Note how using the <code>-c</code> flag simply made the driver call itself in compiler mode (<code>-cc1</code>) and skip the linker stage.
</p>
<pre><b>$</b> clang -v <span class="r">-c</span> mult.c -o mult.o
clang -cc1 mult.c -o mult.o
<b>$</b> file mult.o
<span class="r">mult.o: ELF 64-bit LSB relocatable, ARM aarch64, version 1 (SYSV), not stripped</span>
</pre>
<p>Relocatable files are commonly called "object" files and use a <code>.o</code> extension. Let's use <code>binutils</code>'s <code>readelf</code> to peek inside one.
</p>
<pre><b>$</b> readelf <span class="r">-S</span> -W mult.o
There are 9 section headers, starting at offset 0x1d8:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .strtab STRTAB 0000000000000000 0001b1 000071 00 0 0 1
[ 2] .text PROGBITS 0000000000000000 000040 000028 00 AX 0 0 4
[ 3] .rela.text RELA 0000000000000000 000180 000018 18 I 9 2 8
[ 4] .comment PROGBITS 0000000000000000 000068 000026 01 MS 0 0 1
[ 5] .note.GNU-stack PROGBITS 0000000000000000 00008e 000000 00 0 0 1
[ 6] .eh_frame PROGBITS 0000000000000000 000090 000030 00 A 0 0 8
[ 7] .rela.eh_frame RELA 0000000000000000 000198 000018 18 I 9 6 8
[ 8] .llvm_addrsig LOOS+0xfff4c03 0000000000000000 0001b0 000001 00 E 9 0 1
[ 9] .symtab SYMTAB 0000000000000000 0000c0 0000c0 18 1 6 8
</pre>
<p>The output is organized in named sections. The most important one to know is <code>.text</code>, where the functions' instructions are stored. We can experiment with the source code to see the two other most common sections.
</p>
<pre>// manySymbols.c
int myInitializedVar = 1;
int myUninitializedVar;
int add(int x, int y);
int mult(int x) { return add(x, x) ; }
</pre>
<p>Let's compile to a relocatable object and peek inside again.</p>
<pre><b>$</b> clang -c -o manySymbols.o manySymbols.c
<b>$</b> readelf -S -W manySymbols.o
There are 12 section headers, starting at offset 0x2c0:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .strtab STRTAB 0000000000000000 000219 0000a5 00 0 0 1
[ 2] .text PROGBITS 0000000000000000 000040 000028 00 AX 0 0 4
[ 3] .rela.text RELA 0000000000000000 0001e8 000018 18 I 11 2 8
[ 4] <span class="r">.data</span> PROGBITS 0000000000000000 000068 000004 00 WA 0 0 4
[ 5] <span class="r">.bss</span> NOBITS 0000000000000000 00006c 000004 00 WA 0 0 4
[ 6] .comment PROGBITS 0000000000000000 00006c 000026 01 MS 0 0 1
[ 7] .note.GNU-stack PROGBITS 0000000000000000 000092 000000 00 0 0 1
[ 8] .eh_frame PROGBITS 0000000000000000 000098 000030 00 A 0 0 8
[ 9] .rela.eh_frame RELA 0000000000000000 000200 000018 18 I 11 8 8
[10] .llvm_addrsig LOOS+0xfff4c03 0000000000000000 000218 000001 00 E 11 0 1
[11] .symtab SYMTAB 0000000000000000 0000c8 000120 18 1 8 8</pre>
<p>The addition of an initialized variable made the compiler use a <code>.data</code> section. The addition of an uninitialized variable made the compiler use a <code>.bss</code> section.</p>
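<p>As a minimal sketch of where each kind of variable typically lands (the variable and function names are made up, and the exact section can vary with compiler and flags):</p>

```c
/* Sketch only: names are hypothetical. */
int initialized_counter = 42;   /* has an initializer: its bytes are stored in .data */
int uninitialized_counter;      /* no initializer: typically .bss, which only records a size */
const int read_only_limit = 7;  /* constant: typically placed in a read-only data section */

/* C guarantees that uninitialized_counter starts at zero, which is why .bss
   does not need to store any bytes in the object file. */
int sum_of_globals(void) {
  return initialized_counter + uninitialized_counter + read_only_limit;
}
```

<p>This matches the <code>NOBITS</code> type shown by <code>readelf</code> for <code>.bss</code> above: the section has a size but occupies no space in the file.</p>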
<h2>Symbols</h2>
<p>A relocatable lists both exported and imported symbols. These lists are in the <code>.symtab</code> section, which refers to strings in the <code>.strtab</code> section.
</p>
<pre>// importExport.c
extern const int myConstant;
extern void foo(int x);
int myVar1;
int myVar2;
void bar() {
foo(myConstant);
}
</pre>
<p>Let's look at the exported and imported symbols with <code>nm</code>.</p>
<pre><b>$</b> clang -c importExport.c -o importExport.o
<b>$</b> <span class="r">nm</span> importExport.o
0000000000000000 T bar
U foo
U myConstant
0000000000000000 B myVar1
0000000000000004 B myVar2
</pre>
<p>As expected we find three symbols exported, a function <code>bar</code> (with an offset in <code>.text</code> of <code>0x0</code>) and two uninitialized variables in the <code>bss</code> section. Variable <code>myVar1</code> is at offset <code>0x0</code> and <code>myVar2</code> is four bytes further at offset <code>0x4</code>.
</p>
<p>
We also see two undefined (a.k.a. imported) symbols, <code>foo</code> and <code>myConstant</code>, with the <code>U</code> type. These obviously don't have an offset. The most common <code>nm</code> letter codes and their meanings are as follows.
</p>
<pre>A A global, absolute symbol.
B A global "bss" (uninitialized data) symbol.
C A "common" symbol, representing uninitialized data.
D A global symbol naming initialized data.
N A debugger symbol.
R A read-only data symbol.
T A global text symbol.
U An undefined symbol.
V A weak object.
W A weak reference.
a A local absolute symbol.
b A local "bss" (uninitialized data) symbol.
d A local data symbol.
r A local read-only data symbol.
t A local text symbol.
v A weak object that is undefined.
w A weak symbol that is undefined.
? None of the above.</pre>
<p>We can write a rainbow source file which hits as many types of symbols as possible when compiled to object.</p>
<pre>extern int undVar; // Should be U
int defVar; // Should be B
extern const int undConst; // Should be U
const int defConst = 1; // Should be R
extern int undInitVar; // Should be U
int defInitVar = 1; // Should be D
static int staticVar; // Should be b
static int staticInitVar=1; // Should be d
static const int staticConstVar=1; // Should be r
static void staticFun(int x) {} // Should be t
extern void foo(int x); // Should be U
void bar(int x) { // Should be T
foo(undVar);
staticFun(undConst);
}</pre>
<p>Since we are using an OS with two great compilers available, we can compile with both <code>gcc</code> and <code>clang</code> to see the differences.</p>
<pre><b>$</b> <span class="r">clang</span> -c rainbow.c -o rainbow.o && nm rainbow.o
0000000000000000 T bar
0000000000000000 R defConst
0000000000000000 D defInitVar
0000000000000000 B defVar
U foo
000000000000003c t staticFun
U undConst
U undVar</pre>
<pre><b>$</b> <span class="r">gcc</span> -c rainbow.c -o rainbow.o && nm rainbow.o
0000000000000014 T bar
0000000000000000 R defConst
0000000000000000 D defInitVar
0000000000000000 B defVar
U foo
0000000000000004 r staticConstVar
0000000000000000 t staticFun
0000000000000004 d staticInitVar
0000000000000004 b staticVar
U undConst
U undVar
</pre>
<p>The two listings differ: <code>clang</code> left out the three unused <code>static</code> variables, while <code>gcc</code> kept them as local symbols (<code>b</code>, <code>d</code>, and <code>r</code>). The two compilers also placed <code>bar</code> and <code>staticFun</code> in a different order within <code>.text</code>.</p>
<h2>Global symbol / Local symbol</h2>
<p><code>nm</code> output differentiates between local and global symbols. A local symbol is only visible within a relocatable unit. In C, this is achieved with the <code>static</code> storage class specifier.
</p>
<p>Global symbols are visible to all relocatable units. This is revisited in the linker article.</p>
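<p>A minimal sketch of the distinction, with made-up function names. Compiled without optimization, <code>nm</code> would be expected to list <code>square</code> with a lowercase <code>t</code> and <code>sum_of_squares</code> with an uppercase <code>T</code>; with optimization enabled, the local function may be inlined and disappear from the symbol table.</p>

```c
/* Local symbol: `static` gives internal linkage, so other object files
   cannot reference this function. */
static int square(int x) {
  return x * x;
}

/* Global symbol: external linkage, visible to every object file at link time. */
int sum_of_squares(int a, int b) {
  return square(a) + square(b);
}
```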
<h2>Weak and strong symbols</h2>
<p><code>nm</code> output also differentiates between "strong" symbols (the default) and weak symbols.</p>
<p>A weak symbol can be overridden by a strong symbol.
</p>
<table>
<tr>
<td>
<pre> // weak.c
#include "stdio.h"
extern int getNumber();
int main() {
printf("%d\n", <span class="r">getNumber()</span>);
}
</pre>
</td>
<td>
<pre>// number1.c
int <span class="r">getNumber</span>() {
return 1;
}
</pre>
</td>
<td>
<pre>// number2.c
int <span class="r">getNumber</span>() {
return 2;
}
</pre>
</td>
</tr>
</table>
<p>By default all symbols are strong. In this example, the linker fails because it does not know which <code>getNumber</code> to pick when it is used in <code>weak.c</code>.</p>
<pre><b>$</b> clang -o weak weak.c number1.c number2.c
/usr/bin/ld: number2.o: in function `getNumber':
number2.c:(.text+0x0): <span class="r">multiple definition of `getNumber'</span>; number1.o:number1.c:(.text+0x0): first defined here
clang: error: linker command failed with exit code 1 (use -v to see invocation)
</pre>
<p>If we declare one of the duplicate functions as <code>weak</code>, the program compiles and runs normally, regardless of the compilation and linking order.</p>
<table>
<tr>
<td>
<pre> // weak.c
#include "stdio.h"
extern int getNumber();
int main() {
printf("%d\n", getNumber());
}
</pre>
</td>
<td>
<pre>// number1.c
<span class="r">__attribute__((weak))</span> int getNumber() {
return 1;
}
</pre>
</td>
<td>
<pre>// number2.c
int getNumber() {
return <span class="g">2</span>;
}
</pre>
</td>
</tr>
</table>
<pre><b>$</b> clang -o weak weak.c number1.c number2.c
<b>$</b>./weak
<span class="g">2</span>
<b>$</b> clang -o weak weak.c number2.c number1.c
<b>$</b>./weak
<span class="g">2</span>
</pre>
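<p>A common use of this behavior is to ship an overridable default. The sketch below assumes GCC or Clang (<code>__attribute__((weak))</code> is a compiler extension), and <code>describe_number</code> is a made-up helper. With no strong <code>getNumber</code> linked in, the weak definition is the one that gets called.</p>

```c
/* Weak default: used only when no strong definition of getNumber is linked in. */
__attribute__((weak)) int getNumber(void) {
  return 1;
}

/* Hypothetical caller: it does not know, or care, which definition the linker picked. */
int describe_number(void) {
  return getNumber() * 10;
}
```

<p>Linking an additional object file that defines a regular (strong) <code>getNumber</code> would replace the default without any change to this file.</p>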
<h2>Weak symbols and libc</h2>
<p>Most <code>libc</code> implementations declare their functions "weak" so users can intercept them. This is not always as convenient as it seems. Let's look at how to intercept <code>malloc</code>.</p>
<pre>// mymalloc.c
#define _GNU_SOURCE // Could have been defined with -D on command-line
#include "stddef.h"
#include "dlfcn.h"
#include "stdio.h"
#include "stdlib.h"
void* malloc(size_t sz) {
void *(*libc_malloc)(size_t) = dlsym(RTLD_NEXT, "malloc");
printf("malloced %zu bytes\n", sz);
return libc_malloc(sz);
}
int main() {
char* x = malloc(100);
return 0;
}
</pre>
<p>This program recurses endlessly until it segfaults. This is because <code>dlsym</code> calls <code>malloc</code>.</p>
<pre><b>$</b> clang mymalloc.c
<b>$</b> ./a.out
<span class="r">Segmentation fault (core dumped)</span>
</pre>
<p>For such cases, GNU's <code>libc</code> used to provide special hooks such as <code>__malloc_hook</code>... but they have been deprecated. Now the best way is to MITM via the loader and <code>LD_PRELOAD</code>.</p>
<pre> // mtrace.c
#include <stdio.h>
#include <dlfcn.h>
static void* (*real_malloc)(size_t) = NULL;
void *malloc(size_t size) {
  // Look up the next "malloc" in the search order (libc's) on first use.
  if(!real_malloc) {
    real_malloc = dlsym(RTLD_NEXT, "malloc");
  }
  void *ptr = real_malloc(size);
  printf("malloc(%zu) = %p\n", size, ptr);
  return ptr;
}
</pre>
<pre><b>$</b> clang -shared -fPIC -D_GNU_SOURCE -o mtrace.so mtrace.c
<b>$</b> LD_PRELOAD=./mtrace.so ls
malloc(472) = 0xaaab24e4b2a0
malloc(120) = 0xaaab24e4b480
malloc(1024) = 0xaaab24e4b500
malloc(5) = 0xaaab24e4b910
...
<b>$</b></pre>
<div class="t"> Weak symbols are also paramount for C++ and especially the STL (see below).</div>
<h2>How C++ templates leverage weak symbols</h2>
<p>There is one further usage of weak symbols. When using STL templates, each relocatable receives a copy of instructions and symbols when instantiation is involved. As a result, two translation units using <code>vector<int></code> end up with the same symbols.
</p>
<table>
<tr>
<td>
<pre>// c++foo.cc
#include <vector>
void foo() {
auto v = std::vector<int>();
}
</pre>
</td>
<td>
<pre>// c++bar.cc
#include <vector>
void bar() {
auto v = std::vector<int>();
}
</pre>
</td>
</tr>
</table>
<p>
<code>nm</code> confirms the duplicates in both object files.
</p>
<pre><b>$</b> clang -c -o c++foo.o c++foo.cc
<b>$</b> nm c++foo.o | grep -E 'vector|bar|foo'
0000000000000000 T foo()
0000000000000000 <span class="r">W</span> std::vector<int, std::allocator<int> >::vector()
0000000000000000 <span class="r">W</span> std::vector<int, std::allocator<int> >::~vector()</pre>
<pre><b>$</b> clang -c -o c++bar.o c++bar.cc
<b>$</b> nm c++bar.o | grep -E 'vector|bar|foo'
0000000000000000 T bar()
0000000000000000 <span class="r">W</span> std::vector<int, std::allocator<int> >::vector()
0000000000000000 <span class="r">W</span> std::vector<int, std::allocator<int> >::~vector()
</pre>
<p>When the linker sees several symbols with the same name, it favors the "strong" one. However, if only "weak" ones are available, it picks one of them without throwing an error. This behavior can be exposed in an example using a template and a <code>#define</code>.</p>
<table>
<tr>
<td>
<pre>// weak_main.cc
const char* foo();
const char* bar();
#include "stdio.h"
int main() {
printf("%s\n", foo());
printf("%s\n", bar());
}
</pre>
</td>
<td>
<pre>// <span class="b">c++foo.cc</span>
#define NAME "foo"
#include "template.h"
const char* foo() {
Name<const char*> name;
return name.get();
}
</pre>
</td>
<td>
<pre>// <span class="g">c++bar.cc</span>
#define NAME "bar"
#include "template.h"
const char* bar() {
Name<const char*> name;
return name.get();
}
</pre>
</td>
<td>
<pre> // template.h
template<typename T> struct Name {
T get() const {
return T{NAME};
}
};
</pre>
</td>
</tr>
</table>
<p>At first sight, the program above should print <code>"foo"</code> and then <code>"bar"</code> to the console, but it doesn't. Because of C++'s One Definition Rule (ODR), all these symbols are marked as weak, so a single one is picked, depending on the order in which the linker sees them.</p>
<pre><b>$</b> clang++ -o main weak_main.cc <span class="b">c++foo.cc</span> <span class="g">c++bar.cc</span>
<b>$</b> ./main
<span class="r">foo
foo</span>
<b>$</b> clang++ -o main weak_main.cc <span class="g">c++bar.cc</span> <span class="b">c++foo.cc</span>
<b>$</b> ./main
<span class="r">bar
bar</span>
</pre>
<p>The original illustration of this process was found <a href="https://stackoverflow.com/questions/44335046/how-does-the-linker-handle-identical-template-instantiations-across-translation"> here</a>.
</p>
<h2>Relocation</h2>
<p>The symbol list shows imported and exported names. That is enough for the linker to understand what an object provides and needs, but it is not enough to relocate the relocatables. The linker also needs the exact location of each use of a symbol in an object. These are stored in relocation tables, which <code>readelf</code> can show us.</p>
<pre><b>$</b> readelf <span class="r">-r</span> mult.o
Relocation section '.rela.text' at offset 0x1d8 contains 5 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000000010 000800000137 R_AARCH64_ADR_GOT 0000000000000000 myConstant + 0
000000000014 000800000138 R_AARCH64_LD64_GO 0000000000000000 myConstant + 0
00000000001c 000900000113 R_AARCH64_ADR_PRE 0000000000000000 myVariable + 0
000000000020 00090000011d R_AARCH64_LDST32_ 0000000000000000 myVariable + 0
000000000024 000a0000011b R_AARCH64_CALL26 0000000000000000 add + 0
Relocation section '.rela.eh_frame' at offset 0x250 contains 1 entry:
Offset Info Type Sym. Value Sym. Name + Addend
00000000001c 000200000105 R_AARCH64_PREL32 0000000000000000 .text + 0
</pre>
<p>Every single usage of an imported variable/function is present in the relocation table. It provides everything the linker needs like the section to patch, the offset, the type of usage, and of course the symbol name.</p>
<h2>Mangling</h2>
<p>So far our examples used the C language, where each function or variable results in a symbol of the same name. Things get more complicated when a language allows function overloading.</p>
<p>
To illustrate mangling, instead of letting the driver detect the language, we can declare it ourselves and see what happens with the symbols table.</p>
<pre>// sample.c
void foo() {};
</pre>
<p>Let's first compile <code>sample.c</code> as a C file (with <code>-x c</code>) and then as a C++ file (with <code>-x c++</code>).</p>
<pre><b>$</b> clang -c <span class="r">-x c</span> sample.c -o sample.o
<b>$</b> nm sample.o
0000000000000000 T <span class="r">foo</span>
<b>$</b> clang -c <span class="r">-x c++</span> sample.c -o sample.o
<b>$</b> nm sample.o
0000000000000000 T <span class="r">_Z3foov</span>
</pre>
<p>Thanks to mangling, C++ allows functions to have the same name. They get different symbol names thanks to the parameter types. This special encoding avoids function name collisions, but name mangling can lead to linking issues.
</p>
<table style="width:100%;">
<tr>
<td>
</td>
<td>
<pre>// bar.h
void bar();
</pre>
</td>
</tr>
<tr>
<td>
<pre>// main.cpp
#include "bar.h"
int main() {
bar();
return 0;
}
</pre>
</td>
<td>
<pre>// bar.c
void bar() {};
</pre>
</td>
</tr>
</table>
<pre><b>$</b> clang main.cpp bar.c -o main <span class="r">
/usr/bin/ld: /tmp/m-7f361c.o: in function `main':
main.cpp:(.text+0x18): undefined reference to `bar()'
clang: error: linker command failed with exit code 1 (use -v to see invocation)</span>
</pre>
<p>The project won't link properly because the symbols for the function <code>bar</code> do not match (<code>main.cpp</code> was compiled as C++ and refers to the mangled name, but <code>bar.c</code> was compiled as C, which does not mangle).</p>
<pre><b>$</b> nm main.o
0000000000000000 T main
U <span class="r">_Z3barv</span>
<b>$</b> nm bar.o
0000000000000000 T <span class="r">bar</span>
</pre>
<p>There is a simple solution. Just use the mangled names C++ expects when naming your functions and variables in your C code.</p>
<table style="width:100%;">
<tr>
<td>
</td>
<td>
<pre>// bar.h
void bar();
</pre>
</td>
</tr>
<tr>
<td>
<pre>// main.cpp
#include "bar.h"
int main() {
bar();
return 0;
}
</pre>
</td>
<td>
<pre>// bar.c
void <span class="r">_Z3barv</span>() {};
</pre>
</td>
</tr>
</table>
<p>It works, problem solved!</p>
<pre><b>$</b> clang main.cpp bar.c -o main
<b>$</b></pre>
<p>A more serious and realistic solution is to use a linkage specification to let the C++ compiler know that it should generate import symbol names without mangling them. This is done via <code>extern "C"</code>.</p>
<table style="width:100%;">
<tr>
<td>
</td>
<td>
<pre>// bar.h
<span class="r">extern "C" {</span>
void bar();
<span class="r">}</span>
</pre>
</td>
</tr>
<tr>
<td>
<pre>// main.cpp
#include "bar.h"
int main() {
bar();
return 0;
}
</pre>
</td>
<td>
<pre>// bar.c
void bar() {};
</pre>
</td>
</tr>
</table>
<p>Compilation works, the export/import symbol tables have no mismatch.</p>
<pre><b>$</b> clang main.cpp bar.c -o main
<b>$</b> nm main.o
0000000000000000 T main
U <span class="r">bar</span>
<b>$</b> nm bar.o
0000000000000000 T <span class="r">bar</span>
</pre>
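<p>As written, <code>bar.h</code> can only be included from C++, since a C compiler does not understand <code>extern "C"</code>. A common idiom, sketched below, wraps the linkage specification in a check for the <code>__cplusplus</code> macro so the same header works in both languages. The header and its implementation are shown in one listing for brevity, and <code>bar_call_count</code> is a hypothetical helper added only to make calls observable.</p>

```c
/* ---- bar.h (sketch) ---- */
#ifndef BAR_H
#define BAR_H

#ifdef __cplusplus
extern "C" {               /* only C++ compilers see the linkage specification */
#endif

void bar(void);
int bar_call_count(void);  /* hypothetical helper, not part of the original example */

#ifdef __cplusplus
}
#endif

#endif /* BAR_H */

/* ---- bar.c ---- */
static int calls = 0;      /* local symbol: invisible to other object files */

void bar(void) { calls++; }
int bar_call_count(void) { return calls; }
```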
<h2>Section management</h2>
<p>We have seen earlier how variables, constants, and functions end up in three sections, <code>text</code>, <code>data</code>, and <code>bss</code>, but the compiler can operate at a finer granularity.
</p>
<p>Instead of generating huge sections, the compiler can generate one section per symbol. This later allows the linker to pick only what is useful and reduce the size of the executable.</p>
<pre>// sections.c
int a = 0;
int b = 0;
int funcA() { return a;}
int funcB() { return b;}
</pre>
<table>
<tr>
<td><pre><b>$</b> clang -c -o sections.o sections.c
<b>$</b> readelf -S -W sections.o
There are <span class="r">11</span> section headers:
Section Headers:
[Nr] Name Type
[ 0] NULL
[ 1] .strtab STRTAB
[ 2] .text PROGBITS
[ 3] .rela.text RELA
[ 4] .bss NOBITS
[ 5] .comment PROGBITS
[ 6] .note.GNU-stack PROGBITS
[ 7] .eh_frame PROGBITS
[ 8] .rela.eh_frame RELA
[ 9] .llvm_addrsig LOOS+0xfff4c03
[10] .symtab SYMTAB
</pre>
</td>
<td><pre><b>$</b> clang -c -o sections.o sections.c \
<span class="r">-ffunction-sections -fdata-sections</span>
<b>$</b> readelf -S -W sections.o
There are <span class="r">15</span> section headers:
Section Headers:
[Nr] Name Type
[ 0] NULL
[ 1] .strtab STRTAB
[ 2] .text PROGBITS
[ 3] <span class="r">.text.funcA</span> PROGBITS
[ 4] <span class="r">.rela.text.funcA</span> RELA
[ 5] <span class="r">.text.funcB</span> PROGBITS
[ 6] <span class="r">.rela.text.funcB</span> RELA
[ 7] .bss.a NOBITS
[ 8] .bss.b NOBITS
[ 9] .comment PROGBITS
[10] .note.GNU-stack PROGBITS
[11] .eh_frame PROGBITS
[12] .rela.eh_frame RELA
[13] .llvm_addrsig LOOS+0xfff4c03
[14] .symtab SYMTAB
</pre>
</td>
</tr>
</table>
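<p>To actually benefit from per-symbol sections, the linker has to be asked to discard the unreferenced ones. With GNU <code>ld</code> or <code>lld</code> this is usually done by adding <code>-Wl,--gc-sections</code> to the link command. The sketch below is a hypothetical variation of <code>sections.c</code> in which only <code>funcA</code> is reachable; <code>use_only_a</code> is a made-up name standing in for the rest of the program.</p>

```c
/* Sketch: compile with -ffunction-sections -fdata-sections and link with
   -Wl,--gc-sections (assumes GNU ld or lld on an ELF target). */
int a = 0;
int b = 0;

int funcA(void) { return a; }  /* referenced below: its section is kept */
int funcB(void) { return b; }  /* never referenced: its section is a candidate for removal */

/* Hypothetical caller standing in for the rest of the program. */
int use_only_a(void) {
  return funcA();
}
```

<p>If <code>use_only_a</code> is the only thing reachable from <code>main</code>, the linker is then free to drop <code>.text.funcB</code> and <code>.bss.b</code> from the executable.</p>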
<h2>Optimization level</h2>
<p>By far the most important flag to pass to the compiler is the level of optimization to apply to the IR before generating the instructions. By default, no optimizations are performed. It shows, even with a program doing almost nothing.
</p>
<pre>// do_nothing.c
void do_nothing() {
}
int main() {
for(int i= 0 ; i < 1000000000 ; i++)
do_nothing();
return 0;
}
</pre>
<p>Let's build and measure how long it takes to do nothing.
</p>
<pre><b>$</b> clang do_nothing.c
<b>$</b> time ./a.out
<span class="r">real 0m2.374s</span>
user 0m2.104s
sys 0m0.015s
</pre>
<p>This program should have completed near instantly but because of the function call overhead, it took two seconds. Let's try again but this time, allowing optimization to occur.
</p>
<pre><b>$</b> clang do_nothing.c <span class="r">-O3</span>
<b>$</b> time ./a.out
<span class="r">real 0m0.224s</span>
user 0m0.011s
sys 0m0.014s
</pre>
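<p>The optimizer can remove the loop because nothing it does is observable. A minimal sketch of the contrast, using made-up function names: the first loop has no visible effect and may be deleted entirely, while the second writes to a <code>volatile</code> variable, which the C standard treats as observable behavior, so each iteration has to happen.</p>

```c
static void do_nothing(void) {}

/* No observable effect: an optimizer is free to delete the whole loop. */
unsigned long spin_unobserved(unsigned long iterations) {
  for (unsigned long i = 0; i < iterations; i++)
    do_nothing();
  return iterations;
}

/* Every volatile access is observable, so the loop must really execute. */
static volatile unsigned long tick_count;

unsigned long spin_observed(unsigned long iterations) {
  for (unsigned long i = 0; i < iterations; i++)
    tick_count = tick_count + 1;
  return tick_count;
}
```

<p>Both functions return the same value for a first call, but only the second is expected to take time proportional to <code>iterations</code> at <code>-O3</code>.</p>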
<p>While some optimizations focus on runtime, others focus on code size. They are listed <a href="https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html">here</a>.
</p>
<div class="t">If you have a few hours to spare, treat yourself to <a href="https://binary.ninja/">Binary Ninja</a> and take a look at the marvels optimizers come up with.
</div>
<h2>The translation unit barrier</h2>
<p>
Let's keep iterating with the previous program that does nothing. Compiler optimization <code>-O3</code> is awesome but it has its limitations because it only operates at the translation unit level. Let's see what happens when the <code>do_nothing</code> function is in a different source file.
</p>
<table style="width:100%;table-layout:fixed">
<tr>
<td>
<pre>// opt_main.c
extern void do_nothing();
int main() {
for(int i= 0 ; i <1000000000 ;i++)
do_nothing();
}
</pre>
</td>
<td style="width:45%;">
<pre>// do_nothing_tu.c
void do_nothing() {
}
</pre>
</td>
</tr>
</table>
<pre><b>$</b> clang <span class="r">-O3</span> opt_main.c do_nothing_tu.c
<b>$</b> time ./a.out
<span class="r">real 0m2.056s</span>
user 0m1.824s
sys 0m0.018s
</pre>
<p>Even with optimization enabled, we are back to the poor performance of an un-optimized executable. Due to the siloed nature of translation unit processing, the compiler could not decide whether calls to <code>do_nothing</code> should be pruned and generated a callsite anyway.
</p>
<h3>Breaking the barrier the old-school way</h3>
<p>
The solution to this problem would be to perform optimization not at the translation unit level but at the program level. Since only the linker has a vision of all components (and it can only see sections and symbols), this is seemingly not possible.
</p>
<p>
The trick to make it work is called "artisanal LTO". It consists of creating a super translation unit containing all the source code of the program. We can do that with the preprocessor.
</p>
<table style="width:100%;">
<tr>
<td>
<pre>// all.c
#include "do_nothing.c"
#include "opt_main.c"
</pre>
</td>
<td>
<pre>// opt_main.c
extern void do_nothing();
int main() {
for(int i= 0 ; i <1000000000 ;i++)
do_nothing();
}
</pre>
</td>
<td style="width:45%;">
<pre>// do_nothing.c
void do_nothing() {
}
</pre>
</td>
</tr>
</table>
<p>
Now able to see that <code>do_nothing</code> is a no-op, the compiler is able to optimize it away.
</p>
<pre><b>$</b> clang all.c <span class="r">-O3</span>
<b>$</b> time ./a.out
<span class="r">real 0m0.163s</span>
user 0m0.012s
sys 0m0.014s
</pre>
<p>Of course, the bigger and more complex the program, the less practical this trick becomes, which led to LTO.</p>
<h2>LTO</h2>
<p>
Thankfully, the "artisanal LTO" trick is no longer needed. Compilers can output extra information in the relocatables for the linker to use. Both <code>GNU</code>'s <code>GCC</code> and <code>LLVM</code> implement Link-Time Optimizations via the <code>-flto</code> flag, but they do it differently.
</p>
<h3>GCC's LTO</h3>
<p>
GCC implements LTO in a way that lets the linker fail gracefully if it does not support it. The program will still be linked but without link-time optimizations.
To this effect, GCC generates fat objects which contain not only everything a <code>.o</code> should have but also GCC's intermediate representation (<code>GIMPLE</code> bytecode).
</p>
<table style="width:100%;table-layout:fixed">
<tr>
<td>
<pre>// opt_main.c
extern void do_nothing();
int main() {
for(int i= 0 ; i <1000000000 ;i++)
do_nothing();
}
</pre>
</td>
<td style="width:45%;">
<pre>// do_nothing.c
void do_nothing() {
}
</pre>
</td>