Merge pull request #21 from hoytech/master
[pull] master from hoytech:master
kroese authored Sep 14, 2024
2 parents af59400 + 980725c commit b77f1e3
Showing 49 changed files with 1,495 additions and 710 deletions.
54 changes: 54 additions & 0 deletions CHANGES
@@ -1,3 +1,57 @@
1.0.1
* Prevent exporting a v2 DB using `--fried`, since the packed representation would be
corrupted. Instead, downgrade to 0.9.7 to do the export, or do it without `--fried`.
* Fix build error on some platforms that don't include stdint.h in another
header. Reported by fiatjaf.

1.0.0
* Refactored database format to use a custom PackedEvent encoding, removed some
unnecessary indices. This reduces the size of the DB and should help improve
query performance in some cases.
* Because of the above, the DB version has been increased to 3, meaning events
will need to be exported and reimported into a new DB.
* Added a special `--fried` mode for import and export that greatly (10x or more)
speeds up the process.
* Removed prefix matching on ids and authors fields. This was removed from
NIP-01. Now, you must use exactly 32 byte (64 hex character) values.
* Upgraded negentropy to protocol version 1.
* Use the C++ negentropy implementation's BTree support to cache the results
of negentropy fingerprints over arbitrary nostr queries. By default the
query {} (the full DB) is tracked, but additional queries can be added
using the new `strfry negentropy` command.
* Advertises NIP-77 support (negentropy syncing). The negentropy version is now
also indicated in the NIP-11 relay information and the HTML landing page.
* Parsing-related error messages were greatly improved. Instead of just getting
"bad msg: std::get: wrong index for variant", in most situations you will
now get a more useful message such as "first element not a command like REQ".
* Refactored some critical areas like ActiveMonitors to use a precise 32-byte
type instead of std::string. This will reduce pointer chasing and memory
usage and, more importantly, improve CPU caching.
* The perl libraries needed at compile-time are now bundled in golpe, so they
do not need to be separately installed to build strfry.
* Bugfix: When querying for 2 or more tags where one of the tags was a prefix
of the other, matching events would not be returned. Reported by mrkvon.
* Updated and re-organised the README.md docs

0.9.7
* `--fried` support from 1.0.0 was back-ported, allowing export of DBs
in a fried format for efficient import by 1.0.0+ relays.
* Bugfix: The cron thread was incorrectly removing expireable events prior
to their expiry. Reported and fixed by KoalaSat.
* A `limitation` entry is now included in the NIP-11 output. This exposes
configured relay limits such as max message size. Added by Alex Gleason.
* Node info support added: The relay now replies to requests to /nodeinfo/2.1
and /.well-known/nodeinfo . Added by Alex Gleason.
* NIP-70 support: Events with a "-" tag are considered protected and relays
should only allow them to be posted by their author. Since strfry does
not yet support AUTH, these events are always rejected. Added by fiatjaf.
* NIP-11 icon support added by zappityzap.
* Preliminary FreeBSD support added by cosmicpsyop.
* Switch import to use WriterPipeline, allowing `strfry import` to be used as a
general-purpose non-relay event ingester. To do so, users must ensure that
the stdout of the process they pipe into import is line-buffered.

0.9.6
* Bugfix: Sometimes malformed or old-format negentropy messages would throw
uncaught exceptions which would cause the relay to crash. Now it properly
279 changes: 218 additions & 61 deletions README.md

Large diffs are not rendered by default.

22 changes: 4 additions & 18 deletions TODO
@@ -1,23 +1,10 @@
set argv in plugin processes, propagate environment
remove lookbehind, receivedAt index
get rid of "too many tags in filter" error

1.0 release
split plugins for relay/stream
test negentropy queries stored in events
more config params in negentropy
? limit for total number of events, not just per filter

features
in sync/stream, log bytes up/down and compression ratios
"router" app, where multiple stream/sync connections handled in one process/config (the "nginx of nostr")
NIP-42 AUTH
archival mode (no deleting of events)
asynchronous plugins (multiple in flight at once)
slow-websocket connection detection and back-pressure
pre-calculated negentropy XOR trees to support full-db scans (optionally limited by since/until)
? maybe just use daily/fixed-size bucketing
improve delete command
* delete by receivedAt, IP addrs, etc
* inverted filter: delete events that *don't* match the provided filter
in sync/stream, log bytes up/down and compression ratios
? NIP-45 COUNT
? less verbose default logging
? kill plugin if it times out

@@ -33,4 +20,3 @@ rate limits (maybe not needed now that we have plugins?)
misc
? periodic reaping of disconnected sockets (maybe autoping is doing this already)
? warn when run as root
docs: mention you have to run `make update-submodules` after a pull
36 changes: 36 additions & 0 deletions docs/fried.md
@@ -0,0 +1,36 @@
# Fried Exports

When importing events with `strfry import`, most of the CPU time is spent on JSON parsing (assuming you use `--no-verify` to disable signature verification).

In order to speed this up, the `strfry export` and `strfry import` commands accept a `--fried` parameter. This causes the exported JSON events to have a `fried` field that contains a hex-encoded dump of the corresponding `PackedEvent`. This field must always be the *last* entry in each JSON line. Other than this extra field, these exports are just regular JSONL dumps.

When importing in fried mode, no JSON parsing will be performed. Instead, the packed data will be extracted directly from the JSON and installed into the DB. The fried field will be removed so it isn't stored or sent to clients. No signature verification or other validity checks are performed.
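
Because the `fried` field is guaranteed to come last, an importer can locate and strip it with plain string scanning instead of parsing JSON. The following is a minimal sketch of that idea (not strfry's actual implementation; it assumes the field is serialized compactly as `,"fried":"…"` at the end of each line):

```cpp
#include <optional>
#include <string>
#include <string_view>

struct FriedLine {
    std::string eventJson;  // the line with the fried entry removed
    std::string packedHex;  // hex-encoded PackedEvent
};

// Assumes the field is serialized compactly as ,"fried":"<hex>" and is the last entry.
std::optional<FriedLine> splitFriedLine(std::string_view line) {
    static const std::string_view marker = ",\"fried\":\"";
    auto pos = line.rfind(marker);
    if (pos == std::string_view::npos) return std::nullopt;

    auto hexStart = pos + marker.size();
    auto hexEnd = line.find('"', hexStart);
    if (hexEnd == std::string_view::npos) return std::nullopt;

    FriedLine out;
    out.packedHex = std::string(line.substr(hexStart, hexEnd - hexStart));
    // Reassemble the JSON object without the fried entry.
    out.eventJson = std::string(line.substr(0, pos)) + std::string(line.substr(hexEnd + 1));
    return out;
}
```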

This optimisation speeds up import by about 10x, and the bottleneck becomes building the LMDB indices. Because the fried data is hex-encoded, it compresses quite well alongside the (mostly also hex-encoded) JSON fields. After compression, the overhead of `--fried` is ~7%.

Fried *export* functionality has been back-ported to the 0.9 strfry releases, but import only exists for the 1.0 series (since the packed data format has changed).

## PackedEvent format

PackedEvent contains the minimal set of data required for indexing a nostr event:

    // <offset>: <fieldName> (<size>)
    //
    // PackedEvent
    // 0: id (32)
    // 32: pubkey (32)
    // 64: created_at (8)
    // 72: kind (8)
    // 80: expiration (8)
    // 88: tags[] (variable)
    //
    // each tag:
    // 0: tag char (1)
    // 1: length (1)
    // 2: value (variable)

* Only indexable (single character) tags are included
* Tag values cannot be longer than 255 octets
* `e` and `p` tags are unpacked as raw 32 bytes (so they are not double hex-encoded in fried output)
* Integers are encoded in little-endian
* An expiration of `0` means no expiration
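
As a concrete illustration of the layout above, here is a self-contained decoder sketch. The names (`DecodedPackedEvent`, `decodePackedEvent`) are made up for this example; strfry's own accessor for this data is the `PackedEventView` class.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string_view>
#include <vector>

struct PackedTag { char name; std::string_view value; };

struct DecodedPackedEvent {
    std::string_view id, pubkey;            // 32 raw bytes each
    uint64_t created_at, kind, expiration;  // stored little-endian; expiration == 0 means none
    std::vector<PackedTag> tags;
};

static uint64_t readLE64(std::string_view buf, size_t off) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) v = (v << 8) | static_cast<unsigned char>(buf[off + i]);
    return v;
}

// Views returned point into `buf`, which must outlive the result.
DecodedPackedEvent decodePackedEvent(std::string_view buf) {
    if (buf.size() < 88) throw std::runtime_error("packed event too short");

    DecodedPackedEvent ev;
    ev.id         = buf.substr(0, 32);
    ev.pubkey     = buf.substr(32, 32);
    ev.created_at = readLE64(buf, 64);
    ev.kind       = readLE64(buf, 72);
    ev.expiration = readLE64(buf, 80);

    size_t off = 88;
    while (off < buf.size()) {  // each tag: name (1), length (1), value (length)
        if (off + 2 > buf.size()) throw std::runtime_error("truncated tag header");
        char name = buf[off];
        auto len = static_cast<uint8_t>(buf[off + 1]);
        if (off + 2 + len > buf.size()) throw std::runtime_error("truncated tag value");
        ev.tags.push_back({name, buf.substr(off + 2, len)});
        off += 2 + len;
    }
    return ev;
}
```

Note that `e` and `p` tag values come back as raw 32-byte strings (per the list above), so they need hex-encoding before being displayed.
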
12 changes: 3 additions & 9 deletions docs/negentropy.md
@@ -16,7 +16,7 @@ We're going to call the two sides engaged in the sync the client and the relay (
* (4) Relay calls `reconcile()` on its `Negentropy` object, and returns the results as a `NEG-MSG` answer to the client.
* (5) Client calls `reconcile()` on its `Negentropy` object using the value sent by the relay.
* If the empty string is returned, the sync is complete.
* This call will return `have` and `need` arrays, which correspond to nostr IDs (or ID prefixes, if `idSize < 32`) that should be uploaded and downloaded, respectively.
* This call will return `have` and `need` arrays, which correspond to nostr IDs that should be uploaded and downloaded, respectively.
* Otherwise, the result is sent back to the relay in another `NEG-MSG`. Goto step 4.
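
A rough client-side sketch of this loop is shown below. The `NegentropyClient` interface and the send/receive helpers are placeholders for illustration, not the actual negentropy library or relay API, and hex encoding/decoding of the payloads is omitted for brevity:

```cpp
#include <functional>
#include <string>
#include <vector>

// Placeholder interface mirroring the calls described above (names assumed).
struct NegentropyClient {
    std::string initiate();
    std::string reconcile(const std::string &msg,
                          std::vector<std::string> &have,
                          std::vector<std::string> &need);
};

// Drives steps 1-5. sendMsg/recvMsg wrap the websocket connection.
void syncWithRelay(NegentropyClient &ne,
                   const std::string &subId,
                   const std::string &filterJson,
                   const std::function<void(const std::string &)> &sendMsg,
                   const std::function<std::string()> &recvMsg)
{
    // Steps 1-2: initiate() and send NEG-OPEN (payload should be lowercase hex in practice).
    sendMsg("[\"NEG-OPEN\",\"" + subId + "\"," + filterJson + ",\"" + ne.initiate() + "\"]");

    while (true) {
        // Steps 4-5: feed the relay's NEG-MSG payload back into reconcile().
        std::vector<std::string> have, need;
        std::string next = ne.reconcile(recvMsg(), have, need);

        // `have`: IDs to upload to the relay; `need`: IDs to download from it.

        if (next.empty()) break;  // empty string: sync is complete
        sendMsg("[\"NEG-MSG\",\"" + subId + "\",\"" + next + "\"]");
    }
}
```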

## Nostr Messages
@@ -27,15 +27,13 @@ We're going to call the two sides engaged in the sync the client and the relay (
```
[
"NEG-OPEN",
<subscription ID string>,
<nostr filter or event ID>,
<idSize>,
<nostr filter>,
<initialMessage, lowercase hex-encoded>
]
```

* The subscription ID is used by each side to identify which query a message refers to. It only needs to be long enough to distinguish it from any other concurrent NEG requests on this websocket connection (an integer that increments once per `NEG-OPEN` is fine). If a `NEG-OPEN` is issued for a currently open subscription ID, the existing subscription is first closed.
* The nostr filter is as described in [NIP-01](https://github.com/nostr-protocol/nips/blob/master/01.md), or is an event ID whose `content` contains the JSON-encoded filter/array of filters.
* `idSize` indicates the truncation byte size for IDs. It should be an integer between 8 and 32, inclusive. Smaller values will reduce the amount of bandwidth used, but increase the chance of a collision. 16 is a good default.
* The nostr filter is as described in [NIP-01](https://github.com/nostr-protocol/nips/blob/master/01.md).
* `initialMessage` is the string returned by `initiate()`, hex-encoded.

### Error message (relay to client):
@@ -57,10 +55,6 @@ Current reason codes are:
* The maximum number of records that can be processed can optionally be returned as the 4th element in the response
* `CLOSED`
* Because the `NEG-OPEN` queries are stateful, relays may choose to time-out inactive queries to recover memory resources
* `FILTER_NOT_FOUND`
* If an event ID is used as the filter, this error will be returned if the relay does not have this event. The client should retry with the full filter, or upload the event to the relay.
* `FILTER_INVALID`
* The event's `content` was not valid JSON, or the filter was invalid for some other reason.

After a `NEG-ERR` is issued, the subscription is considered to be closed.

6 changes: 4 additions & 2 deletions docs/plugins.md
@@ -8,6 +8,8 @@ In order to reduce complexity, strfry's design attempts to keep policy logic out

A plugin can be implemented in any programming language that supports reading lines from stdin, decoding JSON, and printing JSON to stdout. If a plugin is installed, strfry will send the event (along with some other information like IP address) to the plugin over stdin. The plugin should then decide what to do with it and print out a JSON object containing this decision.

Currently strfry always waits until it receives a response from a plugin before sending another request. In the future, multiple requests may be sent concurrently, which is why output messages must include the event ID.

The plugin command can be any shell command, which lets you set environment variables, command-line switches, etc. If the plugin command contains no spaces, it is assumed to be a path to a script. In this case, whenever the script's modification-time changes, the plugin will be reloaded upon the next write attempt.

If the plugin's command in `strfry.conf` (or a router config file) changes, the plugin will also be reloaded.
@@ -20,8 +22,8 @@ Input messages contain the following keys:
* `type`: Currently always `new`
* `event`: The event posted by the client, with all the required fields such as `id`, `pubkey`, etc
* `receivedAt`: Unix timestamp of when this event was received by the relay
* `sourceType`: The channel where this event came from: `IP4`, `IP6`, `Import`, `Stream`, or `Sync`.
* `sourceInfo`: Specifics of the event's source. Either an IP address or a relay URL (for stream/sync)
* `sourceType`: The channel where this event came from: `IP4`, `IP6`, `Import`, `Stream`, `Sync`, or `Stored`.
* `sourceInfo`: Specifics of the event's source. Usually an IP address.
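
For illustration, a minimal plugin that consumes these input messages might look like the following sketch. It is written in C++ with nlohmann/json (an arbitrary choice; any language and JSON library works) and simply accepts every event. The output fields shown (`id`, `action`, `msg`) follow strfry's plugin output convention; see the Output messages section below for the authoritative description.

```cpp
#include <iostream>
#include <string>
#include <nlohmann/json.hpp>

int main() {
    std::string line;
    while (std::getline(std::cin, line)) {   // one input message per line
        auto req = nlohmann::json::parse(line);

        nlohmann::json res = {
            {"id", req["event"]["id"]},      // must echo the event ID (see note above)
            {"action", "accept"},            // the policy decision
            {"msg", ""}
        };
        std::cout << res.dump() << "\n" << std::flush;  // one JSON object per line
    }
    return 0;
}
```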


## Output messages
28 changes: 0 additions & 28 deletions fbs/nostr-index.fbs

This file was deleted.

2 changes: 1 addition & 1 deletion golpe
Submodule golpe updated 2 files
+1 −1 external/rasgueadb
+2 −1 golpe.h.tt
60 changes: 22 additions & 38 deletions golpe.yaml
@@ -5,42 +5,28 @@ features:
onAppStartup: true
db: true
customLMDBSetup: true
flatbuffers: true
websockets: true
templar: true

flatBuffers: |
include "../fbs/nostr-index.fbs";
includes: |
inline std::string_view sv(const NostrIndex::Fixed32Bytes *f) {
return std::string_view((const char *)f->val()->data(), 32);
}
#include "PackedEvent.h"
tables:
## DB meta-data. Single entry, with id = 1
Meta:
fields:
- name: dbVersion
- name: endianness
- name: negentropyModificationCounter

## Meta-info of nostr events, suitable for indexing
## Primary key is auto-incremented, called "levId" for Local EVent ID
Event:
fields:
- name: receivedAt # microseconds
- name: flat
type: ubytes
nestedFlat: NostrIndex.Event
- name: sourceType
- name: sourceInfo
type: ubytes
opaque: true

indices:
created_at:
integer: true
receivedAt:
integer: true
id:
comparator: StringUint64
pubkey:
@@ -61,43 +47,41 @@ tables:
multi: true

indexPrelude: |
auto *flat = v.flat_nested();
created_at = flat->created_at();
PackedEventView packed(v.buf);
created_at = packed.created_at();
uint64_t indexTime = *created_at;
receivedAt = v.receivedAt();
id = makeKey_StringUint64(sv(flat->id()), indexTime);
pubkey = makeKey_StringUint64(sv(flat->pubkey()), indexTime);
kind = makeKey_Uint64Uint64(flat->kind(), indexTime);
pubkeyKind = makeKey_StringUint64Uint64(sv(flat->pubkey()), flat->kind(), indexTime);
for (const auto &tagPair : *(flat->tagsGeneral())) {
auto tagName = (char)tagPair->key();
auto tagVal = sv(tagPair->val());
id = makeKey_StringUint64(packed.id(), indexTime);
pubkey = makeKey_StringUint64(packed.pubkey(), indexTime);
kind = makeKey_Uint64Uint64(packed.kind(), indexTime);
pubkeyKind = makeKey_StringUint64Uint64(packed.pubkey(), packed.kind(), indexTime);
packed.foreachTag([&](char tagName, std::string_view tagVal){
tag.push_back(makeKey_StringUint64(std::string(1, tagName) + std::string(tagVal), indexTime));
if (tagName == 'd' && replace.size() == 0) {
replace.push_back(makeKey_StringUint64(std::string(sv(flat->pubkey())) + std::string(tagVal), flat->kind()));
replace.push_back(makeKey_StringUint64(std::string(packed.pubkey()) + std::string(tagVal), packed.kind()));
} else if (tagName == 'e' && packed.kind() == 5) {
deletion.push_back(std::string(tagVal) + std::string(packed.pubkey()));
}
}
for (const auto &tagPair : *(flat->tagsFixed32())) {
auto tagName = (char)tagPair->key();
auto tagVal = sv(tagPair->val());
tag.push_back(makeKey_StringUint64(std::string(1, tagName) + std::string(tagVal), indexTime));
if (flat->kind() == 5 && tagName == 'e') deletion.push_back(std::string(tagVal) + std::string(sv(flat->pubkey())));
}
return true;
});
if (flat->expiration() != 0) {
expiration.push_back(flat->expiration());
if (packed.expiration() != 0) {
expiration.push_back(packed.expiration());
}
CompressionDictionary:
fields:
- name: dict
type: ubytes

NegentropyFilter:
fields:
- name: filter
type: string

tablesRaw:
## Raw nostr event JSON, possibly compressed
## keys are levIds