Add new instructions for spatial join evaluation #10

Open
wants to merge 16 commits into
base: master
12 changes: 6 additions & 6 deletions .github/workflows/build.yml
Expand Up @@ -12,7 +12,7 @@ jobs:
- name: update apt
run: sudo apt update
- name: install dependencies
run: sudo apt install -y cmake gcc g++
run: sudo apt install -y cmake gcc g++ libbz2-dev
- name: cmake
run: mkdir build && cd build && cmake ..
- name: make
Expand All @@ -29,7 +29,7 @@ jobs:
- name: update apt
run: sudo apt update
- name: install dependencies
run: sudo apt install -y cmake gcc g++
run: sudo apt install -y cmake gcc g++ libbz2-dev
- name: cmake
run: mkdir build && cd build && cmake ..
- name: make
Expand All @@ -46,7 +46,7 @@ jobs:
- name: update apt
run: sudo apt update
- name: install dependencies
run: sudo apt install -y cmake clang
run: sudo apt install -y cmake clang libbz2-dev
- name: cmake
run: mkdir build && cd build && cmake ..
shell: bash
Expand All @@ -67,7 +67,7 @@ jobs:
- name: update apt
run: sudo apt update
- name: install dependencies
run: sudo apt install -y cmake clang
run: sudo apt install -y cmake clang libbz2-dev
- name: cmake
run: mkdir build && cd build && cmake ..
shell: bash
Expand All @@ -86,7 +86,7 @@ jobs:
- name: Checkout submodules
run: git submodule update --init --recursive
- name: install dependencies
run: brew install cmake
run: brew install cmake lbzip2
- name: cmake
run: mkdir build && cd build && cmake ..
- name: make
Expand All @@ -101,7 +101,7 @@ jobs:
- name: Checkout submodules
run: git submodule update --init --recursive
- name: install dependencies
run: brew install cmake
run: brew install cmake lbzip2
- name: cmake
run: mkdir build && cd build && cmake ..
- name: make
Expand Down
206 changes: 206 additions & 0 deletions evaluation/Makefile

Large diffs are not rendered by default.

87 changes: 87 additions & 0 deletions evaluation/README.md
@@ -0,0 +1,87 @@
# Evaluation instructions and results

We evaluated the performance of our spatial join and compared it against
PostgreSQL+PostGIS. In the following sections, we provide instructions and
results for the evaluation.

## Set up PostgreSQL and PostGIS on Ubuntu 24.04

Install the required packages:


```
sudo apt update
sudo apt install postgresql postgresql-contrib postgis postgresql-16-postgis-3 gdal-bin
```

Next, initialize a new database storage directory at a location of your choice:
```
export POSTGIS_DIR=/local/data-ssd/postgis/spatialjoin
sudo mkdir -p ${POSTGIS_DIR} && sudo chown postgres:postgres ${POSTGIS_DIR}
sudo -u postgres /usr/lib/postgresql/16/bin/initdb -D ${POSTGIS_DIR}
sudo vim ${POSTGIS_DIR}/postgresql.conf
```
In the file `${POSTGIS_DIR}/postgresql.conf`, set the following:
```
work_mem = 4MB
max_worker_processes = 8
max_parallel_workers_per_gather = 2
max_parallel_workers = 8
```
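
Before starting the server, it can help to confirm that the edited file really contains these values. The following is a minimal sketch and not part of this repository: `EXPECTED`, `parse_conf`, and `mismatches` are hypothetical names, and the parser only handles simple `name = value` lines (it does not follow `include` directives and would mis-handle a `#` inside a quoted value).

```python
import re

# Settings requested above; values are compared as plain strings.
EXPECTED = {
    "work_mem": "4MB",
    "max_worker_processes": "8",
    "max_parallel_workers_per_gather": "2",
    "max_parallel_workers": "8",
}


def parse_conf(text):
    """Parse 'name = value' lines; '#' starts a comment; later lines win."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        # The '=' between name and value is optional in postgresql.conf.
        match = re.match(r"^([A-Za-z_][A-Za-z0-9_.]*)\s*=?\s*(.*)$", line)
        if match:
            value = match.group(2).strip().strip("'")
            settings[match.group(1).lower()] = value
    return settings


def mismatches(text, expected=EXPECTED):
    """Return {name: (found, wanted)} for settings that differ or are missing."""
    found = parse_conf(text)
    return {name: (found.get(name), wanted)
            for name, wanted in expected.items()
            if found.get(name) != wanted}
```

Calling `mismatches(open(path).read())` on the configuration file returns an empty dictionary when all four settings are present with the values listed above.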
Afterwards, start PostgreSQL with the selected database storage directory:

```
sudo su - postgres -c "/usr/lib/postgresql/16/bin/pg_ctl -D ${POSTGIS_DIR} -l logfile start"
```

## Create a database

Create a database `spatialjoin_db` and enable PostGIS.
```
sudo su - postgres -c "createdb spatialjoin_db"
psql -U postgres -d spatialjoin_db -c "CREATE EXTENSION postgis;"
```

## Install spatialjoin

Build the `spatialjoin` executable in this repository and include it in the `PATH`:

```
mkdir build && cd build
cmake ..
make -j
cd ..
export PATH=$PATH:$(pwd)/build
```
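
To confirm that the shell will actually pick up the freshly built binary, a small lookup can be run afterwards. This is a sketch using only the Python standard library; `find_executable` is a hypothetical helper name, not something provided by this repository.

```python
import os
import shutil


def find_executable(name="spatialjoin", path=None):
    """Return the location of `name` on the search path, or None if absent.

    With path=None, shutil.which searches the PATH environment variable.
    """
    return shutil.which(name, path=path)


if __name__ == "__main__":
    location = find_executable()
    if location is None:
        print("spatialjoin not found; PATH is", os.environ.get("PATH", ""))
    else:
        print("using", location)
```

If the lookup returns `None`, the `PATH` export above most likely did not take effect in the current shell.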

## Run full evaluation using the provided Makefile

First, check that the PostgreSQL, PostGIS, and spatialjoin installations work as expected:

```
make check
```

Afterwards, create the tables required for the evaluation. This will take a while. Note that this will completely rebuild *all* tables every time.

```
make tables
```

Finally, run the complete evaluation:

```
make eval
```

You can change individual configuration parameters (listed at the top of the Makefile) by setting them explicitly, e.g. `make POSTGRES_USER=patrick POSTGRES_DB=eval tables`.
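
If the three stages are to be scripted with the same overrides each time, the command lines can be assembled programmatically. The sketch below is an illustration only: `make_command` and `run_stages` are hypothetical helpers, and it assumes the script is started from the directory that contains the Makefile.

```python
import subprocess


def make_command(target, **overrides):
    """Build the argv for `make VAR=value ... target`."""
    return ["make"] + [f"{k}={v}" for k, v in overrides.items()] + [target]


def run_stages(targets=("check", "tables", "eval"),
               runner=subprocess.run, **overrides):
    """Run each target in order; check=True aborts on the first failure."""
    for target in targets:
        runner(make_command(target, **overrides), check=True)
```

For example, `run_stages(POSTGRES_USER="patrick", POSTGRES_DB="eval")` would invoke `make check`, `make tables`, and `make eval` in sequence with both variables set. The `runner` parameter exists only so that the command lines can be inspected without executing `make`.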

## Run individual evaluations using the provided Makefile

Run

```
make help
```

to get a list of available targets.
Original file line number Diff line number Diff line change
Expand Up @@ -80,7 +80,7 @@ def compute(args: argparse.Namespace):

# The command line for this combination.
cmd = (f"cat {args.basename}.spatialjoin-input.tsv |"
f" spatialjoin{sweep_mode} {combination}")
f" {args.spatialjoin}{sweep_mode} {combination}")

# Optionally, generate RDF output.
if args.rdf_output:
Expand Down Expand Up @@ -132,12 +132,13 @@ def compute(args: argparse.Namespace):
parse_time = "[not found]"
sweep_time = "[not found]"
for line in result.stderr.decode().split("\n"):
match = re.match(".*INFO : done \\(([0-9.]+)s\\)\\.", line)
match = re.match(".*INFO : Done parsing \\(([0-9.]+)s\\)\\.", line)
if match:
if parse_time == "[not found]":
parse_time = f"{float(match.group(1)):.3f}"
elif sweep_time == "[not found]":
sweep_time = f"{float(match.group(1)):.3f}"
parse_time = f"{float(match.group(1)):.3f}"

match = re.match(".*INFO : Done sweeping \\(([0-9.]+)s\\)\\.", line)
if match:
sweep_time = f"{float(match.group(1)):.3f}"

print(f"{name}\t{total_time}\t{parse_time}\t{sweep_time}", flush=True)

Expand Down Expand Up @@ -340,6 +341,9 @@ def sort_key(pair):
parser.add_argument("--minutes",
action="store_true", default=False,
help="Show times in minutes instead of seconds")
parser.add_argument("--spatialjoin",
default="spatialjoin",
help="spatialjoin executable")
argcomplete.autocomplete(parser, always_complete_options="long")
args = parser.parse_args()

Expand Down
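
The reworked timing extraction above can be exercised in isolation. The following is a self-contained sketch of the same two-pattern approach; `extract_times` is a hypothetical wrapper, and the log prefix in the usage example is an assumption, since only the `INFO : Done ... (<seconds>s).` part is confirmed by the patterns in the diff.

```python
import re

# Same patterns as in the diff, precompiled as raw strings.
PARSE_RE = re.compile(r".*INFO : Done parsing \(([0-9.]+)s\)\.")
SWEEP_RE = re.compile(r".*INFO : Done sweeping \(([0-9.]+)s\)\.")


def extract_times(stderr_text):
    """Return (parse_time, sweep_time) formatted with three decimals.

    Each value stays "[not found]" if its log line never appears.
    """
    parse_time = "[not found]"
    sweep_time = "[not found]"
    for line in stderr_text.split("\n"):
        match = PARSE_RE.match(line)
        if match:
            parse_time = f"{float(match.group(1)):.3f}"
        match = SWEEP_RE.match(line)
        if match:
            sweep_time = f"{float(match.group(1)):.3f}"
    return parse_time, sweep_time
```

Because each phase now has its own message, the result no longer depends on the order in which the `done` lines appear, and intermediate messages such as `Done sorting sweep events` are ignored by both patterns.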
7 changes: 7 additions & 0 deletions src/spatialjoin/CMakeLists.txt
Expand Up @@ -10,6 +10,13 @@ include_directories(
${ZLIB_INCLUDE_DIRS}
)


configure_file (
"_config.h.in"
"_config.h"
)


add_executable(spatialjoin ${spatialjoin_main})
add_library(spatialjoin-dev ${SPATIALJOIN_SRC})

Expand Down
37 changes: 30 additions & 7 deletions src/spatialjoin/SpatialJoinMain.cpp
Expand Up @@ -4,9 +4,10 @@

#include <iostream>

#include "BoxIds.h"
#include "Sweeper.h"
#include "WKTParse.h"
#include "spatialjoin/BoxIds.h"
#include "spatialjoin/Sweeper.h"
#include "spatialjoin/WKTParse.h"
#include "spatialjoin/_config.h"
#include "util/Misc.h"
#include "util/geo/Geo.h"
#include "util/log/Log.h"
Expand All @@ -32,6 +33,8 @@ void printHelp(int argc, char** argv) {
UNUSED(argc);
std::cout
<< "\n"
<< VERSION_FULL << "\n(built " << __DATE__ << " " << __TIME__
<< ")\n\n"
<< "(C) 2023-" << YEAR << " " << COPY << "\n"
<< "Authors: " << AUTHORS << "\n\n"
<< "Usage: " << argv[0] << " [--help] [-h] <input>\n\n"
Expand Down Expand Up @@ -127,6 +130,7 @@ int main(int argc, char** argv) {
bool noGeometryChecks = false;

bool preSortCache = false;
bool printStats = false;

size_t numThreads = NUM_THREADS;
size_t numCaches = NUM_THREADS;
Expand Down Expand Up @@ -183,6 +187,13 @@ int main(int argc, char** argv) {
useInnerOuter = true;
} else if (cur == "--pre-sort-cache") {
preSortCache = true;
} else if (cur == "--print-stats") {
printStats = true;
} else if (cur == "--version") {
std::cout
<< "spatialjoin " << VERSION_FULL << " (built " << __DATE__ << " " << __TIME__
<< ")\n";
exit(0);
} else {
std::cerr << "Unknown option '" << cur << "', see -h" << std::endl;
exit(1);
Expand Down Expand Up @@ -249,6 +260,10 @@ int main(int argc, char** argv) {
std::string dangling;
size_t gid = 1;

std::function<void(const std::string&)> statsCb;

if (printStats) statsCb = [](const std::string& s) { std::cerr << s; };

Sweeper sweeper({numThreads,
numCaches,
prefix,
Expand All @@ -270,7 +285,7 @@ int main(int argc, char** argv) {
noGeometryChecks,
{},
[](const std::string& s) { LOGTO(INFO, std::cerr) << s; },
[](const std::string& s) { std::cerr << s; },
statsCb,
{}},
cache, output);

Expand All @@ -288,25 +303,33 @@ int main(int argc, char** argv) {

// end event
jobs.add({});

LOGTO(INFO, std::cerr) << "Done parsing (" << TOOK(ts) / 1000000000.0 << "s).";

// wait for all workers to finish
for (auto& thr : thrds) thr.join();

auto genTs = TIME();

LOGTO(INFO, std::cerr) << "Sorting sweep events...";

sweeper.flush();

LOGTO(INFO, std::cerr) << "done (" << TOOK(ts) / 1000000000.0 << "s).";
LOGTO(INFO, std::cerr) << "Done sorting sweep events (" << TOOK(ts) / 1000000000.0 << "s).";

if (preSortCache) {
ts = TIME();
LOGTO(INFO, std::cerr) << "Pre-sorting cache...";
sweeper.sortCache();
sweeper.flush();
LOGTO(INFO, std::cerr) << "done (" << TOOK(ts) / 1000000000.0 << "s).";
LOGTO(INFO, std::cerr) << "Done pre-sorting cache (" << TOOK(ts) / 1000000000.0 << "s).";
}

LOGTO(INFO, std::cerr) << "Sweeping...";
ts = TIME();
sweeper.sweep();
LOGTO(INFO, std::cerr) << "done (" << TOOK(ts) / 1000000000.0 << "s).";
LOGTO(INFO, std::cerr) << "Done sweeping (" << TOOK(ts) / 1000000000.0 << "s).";
LOGTO(INFO, std::cerr) << "Total predicate generation time (without parsing): " << TOOK(genTs) / 1000000000.0 << "s";

delete[] buf;
}
12 changes: 4 additions & 8 deletions src/spatialjoin/Sweeper.cpp
Expand Up @@ -12,10 +12,10 @@
#include <set>
#include <sstream>

#include "BoxIds.h"
#include "InnerOuter.h"
#include "IntervalIdx.h"
#include "Sweeper.h"
#include "spatialjoin/BoxIds.h"
#include "spatialjoin/InnerOuter.h"
#include "spatialjoin/IntervalIdx.h"
#include "spatialjoin/Sweeper.h"
#include "util/Misc.h"
#include "util/log/Log.h"

Expand Down Expand Up @@ -851,8 +851,6 @@ void Sweeper::flush() {
_lineCache.flush();
_simpleLineCache.flush();

log("Sorting events...");

std::string newFName = util::getTmpFName(_cache, ".spatialjoin", "sorttmp");
int newFile = open(newFName.c_str(), O_RDWR | O_CREAT, 0666);
unlink(newFName.c_str());
Expand Down Expand Up @@ -883,8 +881,6 @@ void Sweeper::flush() {
#ifdef __unix__
posix_fadvise(_file, 0, 0, POSIX_FADV_SEQUENTIAL);
#endif

log("...done");
}

// _____________________________________________________________________________
Expand Down
6 changes: 3 additions & 3 deletions src/spatialjoin/Sweeper.h
Expand Up @@ -21,9 +21,9 @@
#include <unordered_map>
#include <unordered_set>

#include "GeometryCache.h"
#include "IntervalIdx.h"
#include "Stats.h"
#include "spatialjoin/GeometryCache.h"
#include "spatialjoin/IntervalIdx.h"
#include "spatialjoin/Stats.h"
#include "util/JobQueue.h"
#include "util/geo/Geo.h"

Expand Down
13 changes: 13 additions & 0 deletions src/spatialjoin/_config.h.in
@@ -0,0 +1,13 @@
// Copyright 2025
// Author: Patrick Brosi

#ifndef SRC_SPATIALJOIN_CONFIG_H_
#define SRC_SPATIALJOIN_CONFIG_H_

// version number from cmake version module
#define VERSION_FULL "@VERSION_GIT_FULL@"

// install prefix from cmake
#define INSTALL_PREFIX "@CMAKE_INSTALL_PREFIX@"

#endif  // SRC_SPATIALJOIN_CONFIG_H_
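
For readers unfamiliar with `configure_file`, the `@NAME@` placeholders in this template are replaced with the values of the corresponding CMake variables when the build is configured. The sketch below imitates that substitution in Python purely for illustration; it is not how CMake is implemented, `configure` is a hypothetical name, the variable values are made up, and only letters, digits, and underscores are accepted in placeholder names.

```python
import re


def configure(template, variables):
    """Replace each @NAME@ with variables[NAME]; unknown names become ''."""
    return re.sub(r"@([A-Za-z0-9_]+)@",
                  lambda m: variables.get(m.group(1), ""),
                  template)


template = (
    '#define VERSION_FULL "@VERSION_GIT_FULL@"\n'
    '#define INSTALL_PREFIX "@CMAKE_INSTALL_PREFIX@"\n'
)

# Example values only; the real ones come from the CMake configuration.
print(configure(template, {
    "VERSION_GIT_FULL": "v0.0.0-example",
    "CMAKE_INSTALL_PREFIX": "/usr/local",
}))
```

As in CMake, a placeholder whose variable is not defined expands to an empty string, so a missing `VERSION_GIT_FULL` would yield an empty version string in `--version` output rather than a build error.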