Merge pull request #21 from hoytech/master
[pull] master from hoytech:master
kroese authored Sep 14, 2024
2 parents af59400 + 980725c commit b77f1e3
Showing 49 changed files with 1,495 additions and 710 deletions.
54 changes: 54 additions & 0 deletions CHANGES
@@ -1,3 +1,57 @@
1.0.1
* Prevent exporting a v2 DB using `--fried`, since the packed representation would be
corrupted. Instead, downgrade to 0.9.7 to do the export, or do it without `--fried`.
* Fix build error on some platforms that don't include stdint.h in another
header. Reported by fiatjaf.

1.0.0
* Refactored database format to use a custom PackedEvent encoding, removed some
unnecessary indices. This reduces the size of the DB and should help improve
query performance in some cases.
* Because of the above, the DB version has been increased to 3, meaning events
will need to be exported and reimported into a new DB.
* Added a special `--fried` mode for import and export that greatly (10x or more)
speeds up the process.
* Removed prefix matching on ids and authors fields. This was removed from
NIP-01. Now, you must use exactly 32 byte (64 hex character) values.
* Upgraded negentropy to protocol version 1.
* Use the C++ negentropy implementation's BTree support to cache the results
of negentropy fingerprints over arbitrary nostr queries. By default the
query {} (the full DB) is tracked, but additional queries can be added
using the new `strfry negentropy` command.
* Advertises NIP-77 support (negentropy syncing). The negentropy version is now
also indicated in the NIP-11 relay information and the HTML landing page.
* Parsing-related error messages were greatly improved. Instead of just getting
"bad msg: std::get: wrong index for variant", in most situations you will
now get a more useful message such as "first element not a command like REQ".
* Refactored some critical areas like ActiveMonitors to use a precise 32-byte
type instead of std::string. This will reduce pointer chasing and memory
usage and, more importantly, improve CPU caching.
* The perl libraries needed at compile-time are now bundled in golpe, so they
do not need to be separately installed to build strfry.
* Bugfix: When querying for 2 or more tags where one of the tags was a prefix
of the other, matching events would not be returned. Reported by mrkvon.
* Updated and re-organised the README.md docs

0.9.7
* `--fried` support from 1.0.0 was back-ported, allowing export of DBs
in a fried format for efficient import by 1.0.0+ relays.
* Bugfix: The cron thread was incorrectly removing expireable events prior
to their expiry. Reported and fixed by KoalaSat.
* A `limitation` entry is now included in the NIP-11 output. This exposes
configured relay limits such as max message size. Added by Alex Gleason.
* Node info support added: The relay now replies to requests to /nodeinfo/2.1
and /.well-known/nodeinfo . Added by Alex Gleason.
* NIP-70 support: Events with a "-" tag are considered protected and relays
should only allow them to be posted by their author. Since strfry does
not yet support AUTH, these events are always rejected. Added by fiatjaf.
* NIP-11 icon support added by zappityzap.
* Preliminary FreeBSD support added by cosmicpsyop.
* Switch import to use WriterPipeline, allowing `strfry import` to be used as a
general-purpose non-relay event ingester. To do so, users must ensure that
the stdout of the process they pipe into import is line-buffered.

0.9.6
* Bugfix: Sometimes malformed or old-format negentropy messages would throw
uncaught exceptions which would cause the relay to crash. Now it properly
279 changes: 218 additions & 61 deletions README.md

Large diffs are not rendered by default.

22 changes: 4 additions & 18 deletions TODO
@@ -1,23 +1,10 @@
set argv in plugin processes, propagate environment
remove lookbehind, receivedAt index
get rid of "too many tags in filter" error

1.0 release
split plugins for relay/stream
test negentropy queries stored in events
more config params in negentropy
? limit for total number of events, not just per filter

features
in sync/stream, log bytes up/down and compression ratios
"router" app, where multiple stream/sync connections handled in one process/config (the "nginx of nostr")
NIP-42 AUTH
archival mode (no deleting of events)
asynchronous plugins (multiple in flight at once)
slow-websocket connection detection and back-pressure
pre-calculated negentropy XOR trees to support full-db scans (optionally limited by since/until)
? maybe just use daily/fixed-size bucketing
improve delete command
* delete by receivedAt, IP addrs, etc
* inverted filter: delete events that *don't* match the provided filter
in sync/stream, log bytes up/down and compression ratios
? NIP-45 COUNT
? less verbose default logging
? kill plugin if it times out

@@ -33,4 +20,3 @@ rate limits (maybe not needed now that we have plugins?)
misc
? periodic reaping of disconnected sockets (maybe autoping is doing this already)
? warn when run as root
docs: mention you have to run `make update-submodules` after a pull
36 changes: 36 additions & 0 deletions docs/fried.md
@@ -0,0 +1,36 @@
# Fried Exports

When importing events with `strfry import`, most of the CPU time is spent on JSON parsing (assuming you use `--no-verify` to disable signature verification).

In order to speed this up, the `strfry export` and `strfry import` commands accept a `--fried` parameter. This causes the exported JSON events to have a `fried` field that contains a hex-encoded dump of the corresponding `PackedEvent`. This field must always be the *last* entry in each JSON line. Other than this extra field, these exports are just regular JSONL dumps.

When importing in fried mode, no JSON parsing will be performed. Instead, the packed data will be extracted directly from the JSON and installed into the DB. The fried field will be removed so it isn't stored or sent to clients. No signature verification or other validity checks are performed.
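
Because the `fried` field is guaranteed to come last, an importer can locate and strip it with plain string scanning instead of parsing JSON. The following is a minimal sketch of that idea (not strfry's actual implementation; it assumes the field is serialized compactly as `,"fried":"…"` at the end of each line):

```cpp
#include <optional>
#include <string>
#include <string_view>

struct FriedLine {
    std::string eventJson;  // the line with the fried entry removed
    std::string packedHex;  // hex-encoded PackedEvent
};

// Assumes the field is serialized compactly as ,"fried":"<hex>" and is the last entry.
std::optional<FriedLine> splitFriedLine(std::string_view line) {
    static const std::string_view marker = ",\"fried\":\"";
    auto pos = line.rfind(marker);
    if (pos == std::string_view::npos) return std::nullopt;

    auto hexStart = pos + marker.size();
    auto hexEnd = line.find('"', hexStart);
    if (hexEnd == std::string_view::npos) return std::nullopt;

    FriedLine out;
    out.packedHex = std::string(line.substr(hexStart, hexEnd - hexStart));
    // Reassemble the JSON object without the fried entry.
    out.eventJson = std::string(line.substr(0, pos)) + std::string(line.substr(hexEnd + 1));
    return out;
}
```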

This optimisation speeds up import by about 10x, and the bottleneck becomes building the LMDB indices. Because the fried data is hex-encoded, it compresses quite well alongside the (mostly also hex-encoded) JSON fields. After compression, the overhead of `--fried` is ~7%.

Fried *export* functionality has been back-ported to the 0.9 strfry releases, but import only exists for the 1.0 series (since the packed data format has changed).

## PackedEvent format

PackedEvent contains the minimal set of data required for indexing a nostr event:

    // <offset>: <fieldName> (<size>)
    //
    // PackedEvent
    // 0: id (32)
    // 32: pubkey (32)
    // 64: created_at (8)
    // 72: kind (8)
    // 80: expiration (8)
    // 88: tags[] (variable)
    //
    // each tag:
    // 0: tag char (1)
    // 1: length (1)
    // 2: value (variable)

* Only indexable (single character) tags are included
* Tag values cannot be longer than 255 octets
* `e` and `p` tags are unpacked as raw 32 bytes (so they are not double hex-encoded in fried output)
* Integers are encoded in little-endian
* An expiration of `0` means no expiration
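
As a concrete illustration of the layout above, here is a self-contained decoder sketch. The names (`DecodedPackedEvent`, `decodePackedEvent`) are made up for this example; strfry's own accessor for this data is the `PackedEventView` class.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string_view>
#include <vector>

struct PackedTag { char name; std::string_view value; };

struct DecodedPackedEvent {
    std::string_view id, pubkey;            // 32 raw bytes each
    uint64_t created_at, kind, expiration;  // stored little-endian; expiration == 0 means none
    std::vector<PackedTag> tags;
};

static uint64_t readLE64(std::string_view buf, size_t off) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) v = (v << 8) | static_cast<unsigned char>(buf[off + i]);
    return v;
}

// Views returned point into `buf`, which must outlive the result.
DecodedPackedEvent decodePackedEvent(std::string_view buf) {
    if (buf.size() < 88) throw std::runtime_error("packed event too short");

    DecodedPackedEvent ev;
    ev.id         = buf.substr(0, 32);
    ev.pubkey     = buf.substr(32, 32);
    ev.created_at = readLE64(buf, 64);
    ev.kind       = readLE64(buf, 72);
    ev.expiration = readLE64(buf, 80);

    size_t off = 88;
    while (off < buf.size()) {  // each tag: name (1), length (1), value (length)
        if (off + 2 > buf.size()) throw std::runtime_error("truncated tag header");
        char name = buf[off];
        auto len = static_cast<uint8_t>(buf[off + 1]);
        if (off + 2 + len > buf.size()) throw std::runtime_error("truncated tag value");
        ev.tags.push_back({name, buf.substr(off + 2, len)});
        off += 2 + len;
    }
    return ev;
}
```

Note that `e` and `p` tag values come back as raw 32-byte strings (per the list above), so they need hex-encoding before being displayed.
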
12 changes: 3 additions & 9 deletions docs/negentropy.md
@@ -16,7 +16,7 @@ We're going to call the two sides engaged in the sync the client and the relay (
* (4) Relay calls `reconcile()` on its `Negentropy` object, and returns the results as a `NEG-MSG` answer to the client.
* (5) Client calls `reconcile()` on its `Negentropy` object using the value sent by the relay.
* If the empty string is returned, the sync is complete.
* This call will return `have` and `need` arrays, which correspond to nostr IDs (or ID prefixes, if `idSize < 32`) that should be uploaded and downloaded, respectively.
* This call will return `have` and `need` arrays, which correspond to nostr IDs that should be uploaded and downloaded, respectively.
* Otherwise, the result is sent back to the relay in another `NEG-MSG`. Goto step 4.
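
A rough client-side sketch of this loop is shown below. The `NegentropyClient` interface and the send/receive helpers are placeholders for illustration, not the actual negentropy library or relay API, and hex encoding/decoding of the payloads is omitted for brevity:

```cpp
#include <functional>
#include <string>
#include <vector>

// Placeholder interface mirroring the calls described above (names assumed).
struct NegentropyClient {
    std::string initiate();
    std::string reconcile(const std::string &msg,
                          std::vector<std::string> &have,
                          std::vector<std::string> &need);
};

// Drives steps 1-5. sendMsg/recvMsg wrap the websocket connection.
void syncWithRelay(NegentropyClient &ne,
                   const std::string &subId,
                   const std::string &filterJson,
                   const std::function<void(const std::string &)> &sendMsg,
                   const std::function<std::string()> &recvMsg)
{
    // Steps 1-2: initiate() and send NEG-OPEN (payload should be lowercase hex in practice).
    sendMsg("[\"NEG-OPEN\",\"" + subId + "\"," + filterJson + ",\"" + ne.initiate() + "\"]");

    while (true) {
        // Steps 4-5: feed the relay's NEG-MSG payload back into reconcile().
        std::vector<std::string> have, need;
        std::string next = ne.reconcile(recvMsg(), have, need);

        // `have`: IDs to upload to the relay; `need`: IDs to download from it.

        if (next.empty()) break;  // empty string: sync is complete
        sendMsg("[\"NEG-MSG\",\"" + subId + "\",\"" + next + "\"]");
    }
}
```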

## Nostr Messages
@@ -27,15 +27,13 @@ We're going to call the two sides engaged in the sync the client and the relay (
```
[
"NEG-OPEN",
<subscription ID string>,
<nostr filter or event ID>,
<idSize>,
<nostr filter>,
<initialMessage, lowercase hex-encoded>
]
```

* The subscription ID is used by each side to identify which query a message refers to. It only needs to be long enough to distinguish it from any other concurrent NEG requests on this websocket connection (an integer that increments once per `NEG-OPEN` is fine). If a `NEG-OPEN` is issued for a currently open subscription ID, the existing subscription is first closed.
* The nostr filter is as described in [NIP-01](https://github.com/nostr-protocol/nips/blob/master/01.md), or is an event ID whose `content` contains the JSON-encoded filter/array of filters.
* `idSize` indicates the truncation byte size for IDs. It should be an integer between 8 and 32, inclusive. Smaller values will reduce the amount of bandwidth used, but increase the chance of a collision. 16 is a good default.
* The nostr filter is as described in [NIP-01](https://github.com/nostr-protocol/nips/blob/master/01.md).
* `initialMessage` is the string returned by `initiate()`, hex-encoded.

### Error message (relay to client):
@@ -57,10 +55,6 @@ Current reason codes are:
* The maximum number of records that can be processed can optionally be returned as the 4th element in the response
* `CLOSED`
* Because the `NEG-OPEN` queries are stateful, relays may choose to time-out inactive queries to recover memory resources
* `FILTER_NOT_FOUND`
* If an event ID is used as the filter, this error will be returned if the relay does not have this event. The client should retry with the full filter, or upload the event to the relay.
* `FILTER_INVALID`
* The event's `content` was not valid JSON, or the filter was invalid for some other reason.

After a `NEG-ERR` is issued, the subscription is considered to be closed.

6 changes: 4 additions & 2 deletions docs/plugins.md
@@ -8,6 +8,8 @@ In order to reduce complexity, strfry's design attempts to keep policy logic out

A plugin can be implemented in any programming language that supports reading lines from stdin, decoding JSON, and printing JSON to stdout. If a plugin is installed, strfry will send the event (along with some other information like IP address) to the plugin over stdin. The plugin should then decide what to do with it and print out a JSON object containing this decision.

Currently strfry always waits until it receives a response from a plugin before sending another request. In the future, multiple requests may be sent concurrently, which is why output messages must include the event ID.

The plugin command can be any shell command, which lets you set environment variables, command-line switches, etc. If the plugin command contains no spaces, it is assumed to be a path to a script. In this case, whenever the script's modification-time changes, the plugin will be reloaded upon the next write attempt.

If the plugin's command in `strfry.conf` (or a router config file) changes, the plugin will also be reloaded.
@@ -20,8 +22,8 @@ Input messages contain the following keys:
* `type`: Currently always `new`
* `event`: The event posted by the client, with all the required fields such as `id`, `pubkey`, etc
* `receivedAt`: Unix timestamp of when this event was received by the relay
* `sourceType`: The channel where this event came from: `IP4`, `IP6`, `Import`, `Stream`, or `Sync`.
* `sourceInfo`: Specifics of the event's source. Either an IP address or a relay URL (for stream/sync)
* `sourceType`: The channel where this event came from: `IP4`, `IP6`, `Import`, `Stream`, `Sync`, or `Stored`.
* `sourceInfo`: Specifics of the event's source. Usually an IP address.
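
For illustration, a minimal plugin that consumes these input messages might look like the following sketch. It is written in C++ with nlohmann/json (an arbitrary choice; any language and JSON library works) and simply accepts every event. The output fields shown (`id`, `action`, `msg`) follow strfry's plugin output convention; see the Output messages section below for the authoritative description.

```cpp
#include <iostream>
#include <string>
#include <nlohmann/json.hpp>

int main() {
    std::string line;
    while (std::getline(std::cin, line)) {   // one input message per line
        auto req = nlohmann::json::parse(line);

        nlohmann::json res = {
            {"id", req["event"]["id"]},      // must echo the event ID (see note above)
            {"action", "accept"},            // the policy decision
            {"msg", ""}
        };
        std::cout << res.dump() << "\n" << std::flush;  // one JSON object per line
    }
    return 0;
}
```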


## Output messages
28 changes: 0 additions & 28 deletions fbs/nostr-index.fbs

This file was deleted.

2 changes: 1 addition & 1 deletion golpe
Submodule golpe updated 2 files
+1 −1 external/rasgueadb
+2 −1 golpe.h.tt
60 changes: 22 additions & 38 deletions golpe.yaml
@@ -5,42 +5,28 @@ features:
onAppStartup: true
db: true
customLMDBSetup: true
flatbuffers: true
websockets: true
templar: true

flatBuffers: |
include "../fbs/nostr-index.fbs";
includes: |
inline std::string_view sv(const NostrIndex::Fixed32Bytes *f) {
return std::string_view((const char *)f->val()->data(), 32);
}
#include "PackedEvent.h"
tables:
## DB meta-data. Single entry, with id = 1
Meta:
fields:
- name: dbVersion
- name: endianness
- name: negentropyModificationCounter

## Meta-info of nostr events, suitable for indexing
## Primary key is auto-incremented, called "levId" for Local EVent ID
Event:
fields:
- name: receivedAt # microseconds
- name: flat
type: ubytes
nestedFlat: NostrIndex.Event
- name: sourceType
- name: sourceInfo
type: ubytes
opaque: true

indices:
created_at:
integer: true
receivedAt:
integer: true
id:
comparator: StringUint64
pubkey:
@@ -61,43 +47,41 @@ tables:
multi: true

indexPrelude: |
auto *flat = v.flat_nested();
created_at = flat->created_at();
PackedEventView packed(v.buf);
created_at = packed.created_at();
uint64_t indexTime = *created_at;
receivedAt = v.receivedAt();
id = makeKey_StringUint64(sv(flat->id()), indexTime);
pubkey = makeKey_StringUint64(sv(flat->pubkey()), indexTime);
kind = makeKey_Uint64Uint64(flat->kind(), indexTime);
pubkeyKind = makeKey_StringUint64Uint64(sv(flat->pubkey()), flat->kind(), indexTime);
for (const auto &tagPair : *(flat->tagsGeneral())) {
auto tagName = (char)tagPair->key();
auto tagVal = sv(tagPair->val());
id = makeKey_StringUint64(packed.id(), indexTime);
pubkey = makeKey_StringUint64(packed.pubkey(), indexTime);
kind = makeKey_Uint64Uint64(packed.kind(), indexTime);
pubkeyKind = makeKey_StringUint64Uint64(packed.pubkey(), packed.kind(), indexTime);
packed.foreachTag([&](char tagName, std::string_view tagVal){
tag.push_back(makeKey_StringUint64(std::string(1, tagName) + std::string(tagVal), indexTime));
if (tagName == 'd' && replace.size() == 0) {
replace.push_back(makeKey_StringUint64(std::string(sv(flat->pubkey())) + std::string(tagVal), flat->kind()));
replace.push_back(makeKey_StringUint64(std::string(packed.pubkey()) + std::string(tagVal), packed.kind()));
} else if (tagName == 'e' && packed.kind() == 5) {
deletion.push_back(std::string(tagVal) + std::string(packed.pubkey()));
}
}
for (const auto &tagPair : *(flat->tagsFixed32())) {
auto tagName = (char)tagPair->key();
auto tagVal = sv(tagPair->val());
tag.push_back(makeKey_StringUint64(std::string(1, tagName) + std::string(tagVal), indexTime));
if (flat->kind() == 5 && tagName == 'e') deletion.push_back(std::string(tagVal) + std::string(sv(flat->pubkey())));
}
return true;
});
if (flat->expiration() != 0) {
expiration.push_back(flat->expiration());
if (packed.expiration() != 0) {
expiration.push_back(packed.expiration());
}
CompressionDictionary:
fields:
- name: dict
type: ubytes

NegentropyFilter:
fields:
- name: filter
type: string

tablesRaw:
## Raw nostr event JSON, possibly compressed
## keys are levIds