-
Notifications
You must be signed in to change notification settings - Fork 41
/
Copy pathstar_discrepancy.html
346 lines (309 loc) · 9.65 KB
/
star_discrepancy.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
<html>
<head>
<title>
STAR_DISCREPANCY - The Star Discrepancy of a Pointset
</title>
</head>
<body bgcolor="#EEEEEE" link="#CC0000" alink="#FF3300" vlink="#000055">
<h1 align = "center">
STAR_DISCREPANCY <br> The Star Discrepancy of a Pointset
</h1>
<hr>
<p>
<b>STAR_DISCREPANCY</b>
is a C++ program which
computes bounds on the star discrepancy of an M-dimensional set of N points contained
in the unit hypercube,
by Eric Thiemard.
</p>
<p>
The pointset is stored in a file, in the TABLE format.
</p>
<p>
The star discrepancy is a commonly cited statistic for
determining how uniformly a pointset is distributed over a
region. For convenience, this region is usually taken as
the unit hypercube; <b>STAR_DISCREPANCY</b> will assume that
datasets under investigation are meant to fill up the unit
hypercube.
</p>
<p>
If the pointset to be investigated actually
lies in some other hypercube, a simply translation and rescaling
may be enough to transform the data. This will probably NOT
be satisfactory if the original region is rectangular, but
has sides of different length, or if the region is not rectangular.
</p>
<p>
The discrepancy measures the worst error
that would be made in estimating the area of a subregion of
the hypercube by simply noting the fraction of the pointset
contained in the subregion. If arbitrary subregions were allowed,
then it would always be possible to make this error equal to 1
(just take the region consisting of the hypercube minus the
pointset.) Since any "reasonable" area can be arbitrarily well
approximated by rectangles, the star discrepancy calculation
uses only rectangular subregions, whose sides are aligned with
coordinates directions, and one of whose corners is at the origin.
</p>
<p>
Formally, the star discrepancy of a pointset of <b>n</b> points
is symbolized by <b>D<sub>n</sub><sup>*</sup></b> and defined as
<blockquote><b>
D<sub>n</sub><sup>*</sup> = supremum ( P in I* )
| ( A(P,x) / n ) - lambda ( P ) |
</b></blockquote>
Here, <b>I*</b> is the set of all M-dimensional subintervals of the
form [0,p1] x [0,p2] x ... x [0,ps] where every <b>p</b> is between
0 and 1; <b>P</b> is any such subinterval; <b>lambda(P)</b> is the
volume of the subinterval, <b>A(P,x)</b> is the number of points of
the point set <b>x</b> that occur in <b>P</b>, and <b>n</b>
is the number of points in <b>x</b>.
</p>
<p>
Clearly, the star discrepancy is measuring how badly the pointset
estimates the volume of a subinterval. This worst error is somewhere
between 0 (absolutely no error ever) and 1 (totally missing the
volume of the unit hypercube). A value of 0.25, for instance,
means that there is a subinterval in the unit hypercube for which
the difference between its true and estimated volumes is 0.25.
(It might have a volume of 0.80, and be estimated at 0.55, for
instance, or a volume of 0.05 that is estimated at 0.30.)
</p>
<p>
The original program is by Eric Thiemard. Changes have been
made to the program so that it compiles under C++, uses a simpler
datafile format, and infers the dimensionality of the data from
the information in the datafile.
</p>
<h3 align = "center">
Usage:
</h3>
<p>
<dl>
<dt>
<b>star_discrepancy</b> <i>epsilon n table_file</i>
</dt>
<dd>
<ul>
<li>
<i>epsilon</i> is an error tolerance between 0 and 1,
and indicates the allowable error in the estimate;
</li>
<li>
<i>n</i> is the number of points to be read from the file;
</li>
<li>
<i>table_file</i> is a file in
<a href =
"../../data/table/table.html">
table format</a> containing at least <i>n</i> points.
The dimensionality of the pointset is inferred from the
file.
</li>
</ul>
</dd>
<dt>
<b>star_discrepancy</b> <i>epsilon n table_file num den</i>
</dt>
<dd>
where the two extra arguments are:
<ul>
<li>
<i>num</i> the optional balance numerator;
</li>
<li>
<i>den</i> the optional balance denominator;
</li>
</ul>
</dd>
</dl>
</p>
<h3 align = "center">
Languages:
</h3>
<p>
<b>STAR_DISCREPANCY</b> is available in
<a href = "../../c_src/star_discrepancy/star_discrepancy.html">a C version</a> and
<a href = "../../cpp_src/star_discrepancy/star_discrepancy.html">a C++ version</a>.
</p>
<h3 align = "center">
Related Data and Programs:
</h3>
<p>
<a href = "../../cpp_src/diaphony/diaphony.html">
DIAPHONY</a>,
a C++ program which
reads a file of N points in M dimensions and computes its diaphony, a measure
of point dispersion.
</p>
<p>
<a href = "../../cpp_src/table_latinize/table_latinize.html">
TABLE_LATINIZE</a>,
a C++ program which
can read a TABLE file
and write out a "latinized" version.
</p>
<p>
<a href = "../../cpp_src/table_quality/table_quality.html">
TABLE_QUALITY</a>,
a C++ program which
can read a TABLE file
and print out measures of the quality of dispersion of the points.
</p>
<h3 align = "center">
Author:
</h3>
<p>
Eric Thiemard
</p>
<h3 align = "center">
Reference:
</h3>
<p>
<ol>
<li>
<a href = "http://rosowww.epfl.ch/papers/discrbound2/">
http://rosowww.epfl.ch/papers/discrbound2/</a>,
the source code web site.
</li>
<li>
Eric Thiemard,<br>
An Algorithm to Compute Bounds for the Star Discrepancy,<br>
Journal of Complexity,<br>
Volume 17, pages 850-880, 2001.
</li>
</ol>
</p>
<h3 align = "center">
Source Code:
</h3>
<p>
<ul>
<li>
<a href = "star_discrepancy.cpp">star_discrepancy.cpp</a>,
the source code.
</li>
<li>
<a href = "star_discrepancy.sh">star_discrepancy.sh</a>,
commands to compile the source code.
</li>
</ul>
</p>
<h3 align = "center">
Examples and Tests:
</h3>
<p>
<ul>
<li>
<a href = "halton_02_00010.txt">
halton_02_00010.txt</a>,
10 Halton points in 2D.
</li>
<li>
<a href = "halton_02_00010_output.txt">
halton_02_00010_output.txt</a>,
the result of the command
<blockquote><b>
star_discrepancy 0.001 10 halton_02_00010.txt
</b></blockquote>
</li>
</ul>
</p>
<h3 align = "center">
List of Routines:
</h3>
<p>
<ul>
<li>
<b>MAIN</b> is the main program for the star discrepancy bound computation.
</li>
<li>
<b>CH_EQI</b> is true if two characters are equal, disregarding case.
</li>
<li>
<b>CH_TO_DIGIT</b> returns the integer value of a base 10 digit.
</li>
<li>
<b>DATA_READ</b> reads coordinate data from a file.
</li>
<li>
<b>DECOMPOSITION</b> carries out the decomposition of a subinterval.
</li>
<li>
<b>EXPLORE</b> ???
</li>
<li>
<b>FASTEXPLORE</b> ???
</li>
<li>
<b>FILE_COLUMN_COUNT</b> counts the number of columns in the first line of a file.
</li>
<li>
<b>FILE_ROW_COUNT</b> counts the number of row records in a file.
</li>
<li>
<b>FILE_USAGE</b> lists the legal input file format.
</li>
<li>
<b>FREETREE</b> frees storage associated with a tree.
</li>
<li>
<b>INITLEX</b> ???
</li>
<li>
<b>INITPARAMETERS</b> sets program parameters based on user input and defaults.
</li>
<li>
<b>INSERTLEX</b> ???
</li>
<li>
<b>LOWBOUND</b> ???
</li>
<li>
<b>MEMORY</b> prints a message and terminates on memory allocation errors.
</li>
<li>
<b>QUICKSORT</b> uses Quicksort to sort an array.
</li>
<li>
<b>S_LEN_TRIM</b> returns the length of a string to the last nonblank.
</li>
<li>
<b>S_TO_R8</b> reads an R8 from a string.
</li>
<li>
<b>S_TO_R8VEC</b> reads an R8VEC from a string.
</li>
<li>
<b>S_WORD_COUNT</b> counts the number of "words" in a string.
</li>
<li>
<b>SUBTREE</b> ???
</li>
<li>
<b>SUPERTREE</b> ???
</li>
<li>
<b>TIMESTAMP</b> prints the current YMDHMS date as a time stamp.
</li>
<li>
<b>TRAITER</b> ???
</li>
<li>
<b>USAGE</b> prints usage information for the program.
</li>
</ul>
</p>
<p>
You can go up one level to <a href = "../cpp_src.html">
the C++ source codes</a>.
</p>
<hr>
<i>
Last revised on 12 November 2006.
</i>
<!-- John Burkardt -->
</body>
<!-- Initial HTML skeleton created by HTMLINDEX. -->
</html>