Your task is to create a Java/Python program which reads a text file where each line contains the coordinates of a multidimensional point, and then looks for the closest pair of points in the file. If the program has found the closest pair of points, it should output the line numbers of the two closest points.
The text file contains one point per line. The coordinate values are separated by a tabulator character. The coordinate values are not necessarily integers. In case of a floating point coordinate value the decimal separator is a period.
Git, Maven 3, JDK 1.8.
After fetching the source (git clone https://github.com/torzsmokus/closest-pair.git
), run mvn clean install
for building, mvn test
for testing and java -jar target/closest-pair-*.jar
to run the program.
If no input file is specified as an argument, the program looks for a file named points.tsv
, first in the working directory, then on the classpath (there is a default file in the jar with that name).
The specified output is printed to stdout while some debugging information appears on stderr.
The current implementation is a naïve one (compares each point once with every other). It takes O(dn²) time (for d dimensions and n sized input) asimptotically. Nevertheless, it still runs pretty fast (under a second) with the given test data.
The theoretical minimum is Ω(n log n) because the Element Uniqueness problem can be reduced to this, which is known to have that limit. (The reduction is quite trivial: if the closest pair has δ=0, the elements aren’t unique.)
It can be shown that an O(n log n) solution is possible but far from trivial.
I am currently working on a divide-and-conquer implementation that takes O( n (log n)^(d–1) ) time.