forked from aaronbloomfield/pdr
-
Notifications
You must be signed in to change notification settings - Fork 228
/
Copy pathindex.html
327 lines (325 loc) · 15.5 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
<title>PDR: Laboratory 9: x86 Assembly Language, part 2 (64 bit)</title>
<style>
code{white-space: pre-wrap;}
span.smallcaps{font-variant: small-caps;}
span.underline{text-decoration: underline;}
div.column{display: inline-block; vertical-align: top; width: 50%;}
div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
ul.task-list{list-style: none;}
.display.math{display: block; text-align: center; margin: 0.5rem auto;}
</style>
<link rel="stylesheet" href="../../markdown.css" />
<!--[if lt IE 9]>
<script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
<![endif]-->
</head>
<body>
<h1 id="pdr-laboratory-9-x86-assembly-language-part-2-64-bit">PDR:
Laboratory 9: x86 Assembly Language, part 2 (64 bit)</h1>
<p><a href="../index.html">Go up to the Labs table of contents
page</a></p>
<h3 id="objective">Objective</h3>
<p>This lab is one of two labs meant to familiarize you with the process
of writing, assembling, and linking assembly language code. The purposes
of the in-lab and in-lab activities are to investigate how various C++
language features are implemented at the assembly level.</p>
<p>There are both <a href="../lab09-32bit/index.html">32 bit</a> (<a
href="../lab09-32bit/index.md">md</a>) and <a
href="../lab09-64bit/index.html">64 bit</a> (<a
href="../lab09-64bit/index.md">md</a>) versions of this lab. This is the
<strong><em>64 bit version</em></strong>.</p>
<h3 id="background">Background</h3>
<p>The Intel x86 assembly language is currently one of the most popular
assembly languages and runs on many architectures from the x86 line
through the Pentium 4. It is a CISC instruction set that has been
extended multiple times (e.g. MMX) into a larger instruction set.</p>
<h3 id="tutorial">Tutorial</h3>
<p>Read the <a href="../../tutorials/09-c/index.html">C tutorial</a> for
the in-lab.</p>
<h3 id="recommended-readings">Recommended Readings</h3>
<ol type="1">
<li>Read the <a href="../../slides/08-x86.html">slides on x86</a></li>
<li>The two book chapters on x86: <a
href="../../book/x86-64bit-asm-chapter.pdf">x86 Assembly</a> and <a
href="../../book/x86-64bit-ccc-chapter.pdf">The x86 C Calling
Convention</a>.</li>
<li>The <a
href="http://www.cs.virginia.edu/~evans/cs216/guides/x86.html">x86
Assembly Guide</a> from an older course at UVA (ignore the section on
Calling Convention)</li>
</ol>
<h2 id="lab-procedure">Lab Procedure</h2>
<h3 id="pre-lab">Pre-lab</h3>
<ol type="1">
<li>Create an assembly program to demonstrate the Collatz
conjecture</li>
<li>Time your program using a helper C++ file</li>
<li>Optimize your program and list all optimizations used</li>
<li>Files to download: <a
href="../lab06/code/timer.cpp.html">timer.cpp</a> (<a
href="../lab06/code/timer.cpp">src</a>), <a
href="../lab06/code/timer.h.html">timer.h</a> (<a
href="../lab06/code/timer.h">src</a>) (from the lab 6 directory)</li>
<li>Files to submit: threexplusone.s, threexinput.cpp, timer.cpp,
timer.h, Makefile</li>
</ol>
<h3 id="in-lab">In-lab</h3>
<ol type="1">
<li>Complete the provided linkedlist.c program by going through <a
href="../../tutorials/09-c/index.html">Tutorial 9: C</a> (<a
href="../../tutorials/09-c/index.md">md</a>)</li>
<li>Files to download: <a href="linkedlist.c.html">linkedlist.c</a> (<a
href="linkedlist.c">src</a>)</li>
<li>Files to submit: linkedlist.c, Makefile</li>
</ol>
<h3 id="post-lab">Post-lab</h3>
<ol type="1">
<li>Write an x86 function to perform a binary search</li>
<li>Files to download: <a
href="testBinarySearch.cpp.html">testBinarySearch.cpp</a> (<a
href="testBinarySearch.cpp">src</a>), <a
href="timer.cpp.html">timer.cpp</a> (<a href="timer.cpp">src</a>), and
<a href="timer.h">timer.h</a> (<a href="timer.h">timer.h</a>)</li>
<li>Files to submit: testBinarySearch.cpp, binarySearch.s, timer.cpp,
timer.h, Makefile</li>
</ol>
<hr />
<h2 id="pre-lab-1">Pre-lab</h2>
<p>You may want to reference the “Compiling Assembly With C++” and
“Vecsum” sections from the <a href="../lab08-64bit/index.html">previous
x86 lab</a>.</p>
<h3 id="the-collatz-conjecture">The Collatz Conjecture</h3>
<p>The 3x+1 conjecture (also called the Collatz conjecture) is an open
problem in mathematics, meaning that it has not yet been proven to be
true. The conjecture states that if you take any positive integer, you
can repeatedly apply the following function to it:</p>
<p><img src="formula.png" /></p>
<p>The conjecture is that eventually, the result will reach 1. For
example, consider <em>x</em> = 13:</p>
<ol type="1">
<li><em>f</em>(13) = 3 * 13 + 1 = 40</li>
<li><em>f</em>(40) = 40 / 2 = 20</li>
<li><em>f</em>(20) = 20 / 2 = 10</li>
<li><em>f</em>(10) = 10 / 2 = 5</li>
<li><em>f</em>(5) = 3 * 5 + 1 = 16</li>
<li><em>f</em>(16) = 16 / 2 = 8</li>
<li><em>f</em>(8) = 8 / 2 = 4</li>
<li><em>f</em>(4) = 4 / 2 = 2</li>
<li><em>f</em>(2) = 2 / 2 = 1</li>
</ol>
<p>Note that this took 9 steps to reach the value 1. And it also shows
that this conjecture is true for a number of other values (2, 4, 5, 8,
10, 16, 20, and 40).</p>
<p><a href="Collatz-graph-all-30-no27.png">An image</a> (from Wikipedia)
shows how paths of most integers less than 50 converge to 1.</p>
<p>While this conjecture has been proven only up to at least 5.6 *
10^13, it is widely believed to be true for all positive integers. If
you are interested, more information on this conjecture can be found <a
href="http://en.wikipedia.org/wiki/Collatz_conjecture">here</a>.</p>
<p>We won’t be testing with any values above 10^6, so you can safely
assume that the Collatz conjecture holds true for all of the input
values that we will use. That is, you won’t have to worry about integer
overflow.</p>
<p>Your task is to write a routine, called <code>threexplusone</code>,
that takes in a positive integer and returns the number of steps
required for that integer to reach 1 by following the Collatz
conjecture. An input of 13 takes 9 steps, as shown above. The Wikipedia
page shows a few other input sizes and the number of steps: an input of
6 takes 8 steps; an input of 14 takes 17 steps; an input of 27 takes 111
steps. If the input is 1, the output should be zero.</p>
<p>This routine <strong><em>MUST</em></strong> call itself recursively
using the proper C-style calling convention. The assembly code should be
in a threexplusone.s file. <strong>If you write your function so that it
is an iterative solution, you will not receive credit for this
pre-lab.</strong></p>
<h3 id="testing-and-timing">Testing and timing</h3>
<p>You will need to write a C++ file called threexinput.cpp that allows
you to test your subroutine. This input file should:</p>
<ol type="1">
<li>Ask for an input value, <em>x</em>, which is the positive integer to
pass to the subroutine</li>
<li>Ask for an input value, <em>n</em>, which is the number of times to
call the subroutine</li>
<li>Run the subroutine once and store the result</li>
<li>Run the subroutine <em>n</em> times with the parameter <em>x</em> as
the input</li>
<li>Print out the number of iterations it took for the integer to
converge to 1.</li>
</ol>
<p>You can assume that both <em>x</em> and <em>n</em> are positive
integers.</p>
<p>We can now time the subroutine to see how fast it runs. Download the
timer code from the hash table lab (lab 6: <a
href="../lab06/code/timer.cpp.html">timer.cpp</a> (<a
href="../lab06/code/timer.cpp">src</a>) and <a
href="../lab06/code/timer.h.html">timer.h</a> (<a
href="../lab06/code/timer.h">src</a>)) and use it to guage the average
time it took to run each subroutine call in step 4.</p>
<p>You should use an appropriate precision number to make sure you don’t
report zero when you divide the total time by the number of runs. Your
timer code should only include the loop of <em>n</em> times that calls
the routine with <em>x</em> as the parameter. Nothing else (including
the print statement) should be inside the timer code.</p>
<h3 id="optimization">Optimization</h3>
<p>Now that you can time your program, you will need to optimize it as
much as possible. Any optimization is valid, as long as it computes the
correct result, is still a recursive subroutine, and follows the C
calling convention. The grade on this pre-lab will be based both on the
correctness of the subroutine and the optimizations included.</p>
<p>What optimizations do you use?</p>
<ul>
<li>First, try to figure out how you can write the same routine using
fewer x86 instructions. x86 has lots of complex instructions that can be
used for this purpose – Google is your friend, here.</li>
<li><code>lea</code> can quickly add and multiply numbers in one
instruction.</li>
<li>Multiplication and division are expensive. Try to use bit shifts
whenever possible – <code>4*x+x</code> is likely to be faster than
<code>5*x</code>, as the former can be optimized to <code>x << 2 +
x</code>.</li>
<li>Branching slows down the execution speed of a program as the branch
condition must be checked every iteration. As much as possible,
eliminate branching (if/else statements, loops, etc.). For loops,
consider <a href="http://en.wikipedia.org/wiki/Loop_unrolling">loop
unrolling</a>.</li>
<li>Consider the <a
href="http://en.wikipedia.org/wiki/Memory_hierarchy">memory
hierarchy</a> and try to reduce memory accesses (this includes
<code>push</code> and <code>pop</code>).</li>
<li>Reduce the number of instructions used to create (and remove) the
activation record; this was done in a few x86 examples we studied: <a
href="../../slides/08-assembly-64bit.html#/max">max</a> and <a
href="../../slides/08-assembly-64bit.html#/fact">fact</a></li>
<li>Reduce the registers that are backed up to the stack in the calling
convention</li>
<li>Many optimizations are listed <a
href="http://en.wikipedia.org/wiki/Category:Compiler_optimizations">here</a>,
but most would not apply to this one program.</li>
</ul>
<p>You will need to include at least one optimization beyond just
figuring out how to write your subroutine with fewer instructions. You
should put the optimizations used as a comment in the beginning of your
assembly file.</p>
<p>Note that we, too, can write the function in C++ and compile it with
<code>clang++ -O2 -S -mllvm --x86-asm-syntax=intel</code>. And we will
be looking at that assembly code when we grade the pre-lab. If you write
your program this way, it constitutes an honor violation, so please
hand-code the assembly yourself.</p>
<p><strong>You must list, as comments in your assembly file, the
optimizations that you used!</strong> Just a brief description of what
optimizations you used is fine.</p>
<h3 id="sample-execution-run">Sample Execution Run</h3>
<p>Below is a sample execution run to show you the input and output
format we are looking for.</p>
<p>Input</p>
<pre><code>Enter a number: 100
Enter iterations of subroutine: 30</code></pre>
<p>Output</p>
<pre><code>Steps: 25</code></pre>
<h3 id="different-architectures">Different Architectures</h3>
<p>See the <a href="../lab08/index.html">last lab</a> for details, but
all code must be submitted to run under Linux, which is the platform
that does the compilation on the submission system.</p>
<hr />
<h2 id="in-lab-1">In-lab</h2>
<h3 id="complete-the-c-tutorials-exercise-program">Complete the C
tutorial’s exercise program</h3>
<p>Download the <a href="linkedlist.c.html">linkedlist.c</a> (<a
href="linkedlist.c">src</a>) test harness, your task is to implement a
simple doubly-linked list in C. Read the <a
href="../../tutorials/09-c/index.html">C tutorial</a> to help
familiarize yourself with the C language. This linked list is meant to
be simple, and can be implemented however you’d like in linkedlist.c. We
don’t really care how you implement the linked list, whether it be with
structs, arrays, etc. so long as it works with the provided harness.
However, the assignment is considerably easier to do by using an actual
linked list solution rather than an array-based solution. Your linked
list will need to be able to do the following:</p>
<ul>
<li>Insert at head – Push an integer onto the front of the linked
list</li>
<li>Insert at tail – Push an integer onto the back of the linked
list</li>
<li>Remove at head – Remove an integer from the front of the linked
list</li>
<li>Remove at tail – Remove an integer from the back of the linked
list</li>
<li>Print forward – Print the elements in forward order, delimited by a
space</li>
</ul>
<h3 id="sample-execution-run-1">Sample Execution Run</h3>
<p>Below is a sample execution run to show you the input and output
format we are looking for.</p>
<pre><code>----------------------------------------------------
This test harness operates with one linked list.
Use any of the following options to manipulate it:
* af <num> --- add integer to front
* ab <num> --- add integer to back
* rf --- remove front element
* rb --- remove back element
* p --- print list forward
* help --- print off this list
* -1 --- exit harness
Enter command: af 1
Enter command: af
Invalid command or operand, please input a valid command or help to see the list again.
Enter command: af 2
Enter command: ab 3
Enter command: p
Elements: 2 1 3
Enter command: rf
Enter command: p
Elements: 1 3
Enter command: rb
Enter command: p
Elements: 1
Enter command: -1</code></pre>
<hr />
<h2 id="post-lab-1">Post-Lab</h2>
<p>Download <a href="testBinarySearch.cpp.html">testBinarySearch.cpp</a>
(<a href="testBinarySearch.cpp">src</a>), <a
href="timer.cpp.html">timer.cpp</a> (<a href="timer.cpp">src</a>), and
<a href="timer.h">timer.h</a> (<a href="timer.h">timer.h</a>), which you
will use to test your code. Make sure you do not alter any of these
files, as your must include the original files in your submission.</p>
<p>Your task for the post-lab is to implement the
<code>binarySearch</code> function in assembly. This function will begin
at the middle of a sorted array, and continuously split the search space
in half until it finds the target element or reaches a point where it
knows the target is not in the array. The function takes in four
parameters. The first parameter is a pointer to an int array. The second
parameter is an integer representing the beginning of the array. The
third parameter is an integer representing the end of the array. The
fourth parameter is an integer representing the target element to find
in the array. The return type of this function is int, and will be the
index into the array that the target was found, or -1 if it wasn’t. Just
like with the <code>linearSearch</code> from Post-Lab 8,
<code>testBinarySearch</code> will take input from std and pass it on to
your <code>binarySearch</code> routine. Make sure you test your function
on various sized arrays.</p>
<p>To make sure your function is efficient enough, the
testBinarySearch.cpp file will call your function 1000 times and record
the overall time taken via timer.cpp.</p>
<h3 id="sample-execution-run-2">Sample Execution Run</h3>
<p>Below is a sample execution run to show you the input and output
format we are looking for.</p>
<pre><code>Enter the array size: 5
Enter value 0: -30
Enter value 1: -15
Enter value 2: 0
Enter value 3: 15
Enter value 4: 30
Enter target to search for: 30
Found 30 at index 4
Took 0.007ms</code></pre>
<p>The following resource explains the binary search algorithm. This is
what you need to implement in x86 assembly: <a
href="https://www.hackerearth.com/practice/algorithms/searching/binary-search/tutorial/">https://www.hackerearth.com/practice/algorithms/searching/binary-search/tutorial/</a></p>
</body>
</html>