From 071ca3b2d24fffdd843fc04d64a61b949d7c832c Mon Sep 17 00:00:00 2001
From: John Adler
Date: Wed, 17 Mar 2021 08:43:45 -0400
Subject: [PATCH 1/7] Add namespace ID to NMT leaf hash.

---
 specs/data_structures.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/specs/data_structures.md b/specs/data_structures.md
index ebb2ef4..de7fa04 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -330,7 +330,7 @@ For leaf node `node` of [share](#share) data `d`:
 ```C++
 node.n_min = d.namespaceID
 node.n_max = d.namespaceID
-node.v = h(0x00, d.rawData)
+node.v = h(0x00, d.namespaceID, d.rawData)
 ```

 The `namespaceID` message field here is the namespace ID of the leaf, which is a [`NAMESPACE_ID_BYTES`](consensus.md#system-parameters)-long byte array.

From f229ebb299a4c2bcf54e05e155ac322794609678 Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 18 Mar 2021 11:40:58 -0400
Subject: [PATCH 2/7] Revert NMT hashing nid into leaf hash.

---
 specs/data_structures.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/specs/data_structures.md b/specs/data_structures.md
index de7fa04..ebb2ef4 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -330,7 +330,7 @@ For leaf node `node` of [share](#share) data `d`:
 ```C++
 node.n_min = d.namespaceID
 node.n_max = d.namespaceID
-node.v = h(0x00, d.namespaceID, d.rawData)
+node.v = h(0x00, d.rawData)
 ```

 The `namespaceID` message field here is the namespace ID of the leaf, which is a [`NAMESPACE_ID_BYTES`](consensus.md#system-parameters)-long byte array.

From a46400122ffa4c39e107d590311d1f64128702ed Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 18 Mar 2021 11:45:24 -0400
Subject: [PATCH 3/7] Update share figure.
---
 specs/figures/share.dot |  6 ++++--
 specs/figures/share.svg | 39 +++++++++++++++++++++------------------
 2 files changed, 25 insertions(+), 20 deletions(-)

diff --git a/specs/figures/share.dot b/specs/figures/share.dot
index 37aab89..f8ae292 100644
--- a/specs/figures/share.dot
+++ b/specs/figures/share.dot
@@ -4,14 +4,16 @@ digraph G {

 share [label=<
-
-
+
+
+
+
diff --git a/specs/figures/share.svg b/specs/figures/share.svg
index 18bd54c..02885cc 100644
--- a/specs/figures/share.svg
+++ b/specs/figures/share.svg
@@ -4,30 +4,33 @@
-
+
 G
-
+
 share
-
+
 0
-1
-80
-200
-256
-
-*
-
-end of tx2
-
-len(tx3)
-
-tx3
-
-zero-padding
+8
+9
+80
+200
+256
+
+NID
+
+*
+
+end of tx2
+
+len(tx3)
+
+tx3
+
+zero-padding

From 20b4b59448b54cb305c56e27b0e75c7c4412b7a6 Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 18 Mar 2021 11:54:15 -0400
Subject: [PATCH 4/7] Update share layout.

---
 specs/data_structures.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/specs/data_structures.md b/specs/data_structures.md
index ebb2ef4..8adefc0 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -496,13 +496,15 @@ If a malicious block producer incorrectly computes the 2D Reed-Solomon code for

 A share is a fixed-size data chunk associated with a namespace ID, whose data will be erasure-coded and committed to in [Namespace Merkle trees](#namespace-merkle-tree).

-A share's raw data (`rawData`) is interpreted differently depending on the namespace ID.
+The first [`NAMESPACE_ID_BYTES`](./consensus.md#constants) of a share's raw data `rawData` for non-parity shares is the namespace ID of that share, `namespaceID`.

-For shares **with a reserved namespace ID through [`NAMESPACE_ID_MAX_RESERVED`](./consensus.md#constants)**, the first [`SHARE_RESERVED_BYTES`](./consensus.md#constants) bytes (the `*` in the example layout figure below) is the starting byte of the length of the [canonically serialized](#serialization) first request that starts in the share, or `0` if there is none, as a [canonically serialized](#serialization) big-endian unsigned integer. In this example, with a share size of `256` the first byte would be `80` (or `0x50` in hex).
+Subsequent bytes of a share's raw data `rawData` are interpreted differently depending on the namespace ID.
+
+For shares **with a reserved namespace ID through [`NAMESPACE_ID_MAX_RESERVED`](./consensus.md#constants)**, the first [`SHARE_RESERVED_BYTES`](./consensus.md#constants) bytes after [`NAMESPACE_ID_BYTES`](./consensus.md#constants) (the `*` in the example layout figure below) is the starting byte of the length of the [canonically serialized](#serialization) first request that starts in the share, or `0` if there is none, as a [canonically serialized](#serialization) big-endian unsigned integer. In this example, with a share size of `256` the first byte would be `80` (or `0x50` in hex).

 ![fig: Reserved share.](./figures/share.svg)

-For shares **with a namespace ID above [`NAMESPACE_ID_MAX_RESERVED`](./consensus.md#constants)**, the first [`SHARE_RESERVED_BYTES`](./consensus.md#constants) bytes have no special meaning and are simply used to store data like all the other bytes in the share.
+For shares **with a namespace ID above [`NAMESPACE_ID_MAX_RESERVED`](./consensus.md#constants)**, the first [`SHARE_RESERVED_BYTES`](./consensus.md#constants) bytes after [`NAMESPACE_ID_BYTES`](./consensus.md#constants) have no special meaning and are simply used to store data like all the other bytes in the share.

 For non-parity shares, if there is insufficient request data to fill the share, the remaining bytes are padded with `0`.

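Reviewer note on the layout introduced by PATCH 4/7: the following is a minimal sketch of how a non-parity share could be split into its fields, not the reference implementation. The constant values are assumptions read off the example figure (`NID` spans bytes 0 to 8, `*` spans bytes 8 to 9, share size `256`); `NAMESPACE_ID_MAX_RESERVED = 255` and the big-endian comparison of namespace IDs are purely illustrative. The reserved byte is read as a raw unsigned byte, which PATCH 6/7 later makes explicit. `parse_share` is a hypothetical helper name.

```python
# Assumed values, taken from the example figure; the real ones live in consensus.md.
NAMESPACE_ID_BYTES = 8
SHARE_RESERVED_BYTES = 1
SHARE_SIZE = 256
NAMESPACE_ID_MAX_RESERVED = 255  # hypothetical, for illustration only


def parse_share(raw_data: bytes) -> dict:
    """Split a non-parity share into namespace ID, start index, and data."""
    if len(raw_data) != SHARE_SIZE:
        raise ValueError("a share must be exactly SHARE_SIZE bytes")

    namespace_id = raw_data[:NAMESPACE_ID_BYTES]
    rest = raw_data[NAMESPACE_ID_BYTES:]

    # Assumption: namespace IDs are compared as big-endian unsigned integers.
    if int.from_bytes(namespace_id, "big") <= NAMESPACE_ID_MAX_RESERVED:
        # Reserved namespace: the next SHARE_RESERVED_BYTES hold the start index.
        start = int.from_bytes(rest[:SHARE_RESERVED_BYTES], "big")
        return {"namespace_id": namespace_id, "start": start,
                "data": rest[SHARE_RESERVED_BYTES:]}

    # Non-reserved namespace: every byte after the namespace ID is data.
    return {"namespace_id": namespace_id, "start": None, "data": rest}


# The share from the figure: reserved namespace, first request starts at byte 80.
reserved_share = (1).to_bytes(8, "big") + bytes([80]) + bytes(SHARE_SIZE - 9)
message_share = (256).to_bytes(8, "big") + bytes(SHARE_SIZE - 8)
parsed_reserved = parse_share(reserved_share)
parsed_message = parse_share(message_share)
```

Parity shares are deliberately left out of the sketch, since the text only assigns meaning to the bytes of non-parity shares.
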
From 4a3715d68f5aec001d5f81c874f990a69a065f09 Mon Sep 17 00:00:00 2001
From: John Adler
Date: Thu, 18 Mar 2021 12:01:09 -0400
Subject: [PATCH 5/7] Update logic for laying out messages into shares.

---
 specs/data_structures.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/specs/data_structures.md b/specs/data_structures.md
index 8adefc0..01dcd50 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -518,7 +518,7 @@ Then,
 1. For each request in the list:
    1. [Serialize](#serialization) the request (individually).
    1. Compute the length of each serialized request, [serialize the length](#share), and pre-pend the serialized request with its serialized length.
-   1. Split up the length/request pairs into [`SHARE_SIZE`](./consensus.md#constants)`-`[`SHARE_RESERVED_BYTES`](./consensus.md#constants)-byte [shares](#share) and assign [the appropriate namespace ID](./consensus.md#reserved-namespace-ids). This data has a _reserved_ namespace ID, so the first [`SHARE_RESERVED_BYTES`](./consensus.md#constants) bytes for these shares must be [set specially](#share).
+   1. Split up the length/request pairs into [`SHARE_SIZE`](./consensus.md#constants)`-`[`NAMESPACE_ID_BYTES`](./consensus.md#constants)`-`[`SHARE_RESERVED_BYTES`](./consensus.md#constants)-byte [shares](#share) and assign [the appropriate namespace ID](./consensus.md#reserved-namespace-ids). This data has a _reserved_ namespace ID, so the first [`NAMESPACE_ID_BYTES`](./consensus.md#constants)`+`[`SHARE_RESERVED_BYTES`](./consensus.md#constants) bytes for these shares must be [set specially](#share).
 1. Concatenate the lists of shares in the order: transactions, intermediate state roots, evidence.

 Note that by construction, each share only has a single namespace, and that the list of concatenated shares is [lexicographically ordered by namespace ID](consensus.md#reserved-namespace-ids).
@@ -527,7 +527,7 @@ These shares are arranged in the [first quadrant](#2d-reed-solomon-encoding-sche

 ![fig: Original data: reserved.](./figures/rs2d_originaldata_reserved.svg)

-Each message in the list `messageData` is _independently_ serialized and split into `SHARE_SIZE`-byte shares. For each message, it is placed in the available data matrix, with row-major order, as follows:
+Each message in the list `messageData` is _independently_ serialized and split into [`SHARE_SIZE`](./consensus.md#constants)`-`[`NAMESPACE_ID_BYTES`](./consensus.md#constants)-byte shares, with the first [`NAMESPACE_ID_BYTES`](./consensus.md#constants) [set to the namespace ID](#share). For each message, it is placed in the available data matrix, with row-major order, as follows:

 1. Place the first share of the message at the next unused location in the matrix, then place the remaining shares in the following locations.

From 5b010a75805d9ddcf39dfb48b9209a5c3499402c Mon Sep 17 00:00:00 2001
From: John Adler
Date: Fri, 19 Mar 2021 09:32:46 -0400
Subject: [PATCH 6/7] Fix start index to use raw byte.

---
 specs/data_structures.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/specs/data_structures.md b/specs/data_structures.md
index 01dcd50..24742b7 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -500,7 +500,7 @@ The first [`NAMESPACE_ID_BYTES`](./consensus.md#constants) of a share's raw data

 Subsequent bytes of a share's raw data `rawData` are interpreted differently depending on the namespace ID.

-For shares **with a reserved namespace ID through [`NAMESPACE_ID_MAX_RESERVED`](./consensus.md#constants)**, the first [`SHARE_RESERVED_BYTES`](./consensus.md#constants) bytes after [`NAMESPACE_ID_BYTES`](./consensus.md#constants) (the `*` in the example layout figure below) is the starting byte of the length of the [canonically serialized](#serialization) first request that starts in the share, or `0` if there is none, as a [canonically serialized](#serialization) big-endian unsigned integer. In this example, with a share size of `256` the first byte would be `80` (or `0x50` in hex).
+For shares **with a reserved namespace ID through [`NAMESPACE_ID_MAX_RESERVED`](./consensus.md#constants)**, the first [`SHARE_RESERVED_BYTES`](./consensus.md#constants) bytes after [`NAMESPACE_ID_BYTES`](./consensus.md#constants) (the `*` in the example layout figure below) is the starting byte of the length of the [canonically serialized](#serialization) first request that starts in the share, or `0` if there is none, as a one-byte big-endian unsigned integer (i.e. canonical serialization is not used). In this example, with a share size of `256` the first byte would be `80` (or `0x50` in hex).

 ![fig: Reserved share.](./figures/share.svg)

From 42d8b4fb92dfdf912d2843548cf95b461ac3bf3e Mon Sep 17 00:00:00 2001
From: John Adler
Date: Fri, 19 Mar 2021 09:43:39 -0400
Subject: [PATCH 7/7] Clarify wording.

---
 specs/data_structures.md | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/specs/data_structures.md b/specs/data_structures.md
index 24742b7..3adc114 100644
--- a/specs/data_structures.md
+++ b/specs/data_structures.md
@@ -496,17 +496,26 @@ If a malicious block producer incorrectly computes the 2D Reed-Solomon code for

 A share is a fixed-size data chunk associated with a namespace ID, whose data will be erasure-coded and committed to in [Namespace Merkle trees](#namespace-merkle-tree).

-The first [`NAMESPACE_ID_BYTES`](./consensus.md#constants) of a share's raw data `rawData` for non-parity shares is the namespace ID of that share, `namespaceID`.
+A share's raw data `rawData` is interpreted differently depending on the namespace ID.

-Subsequent bytes of a share's raw data `rawData` are interpreted differently depending on the namespace ID.
+For shares **with a reserved namespace ID through [`NAMESPACE_ID_MAX_RESERVED`](./consensus.md#constants)**:

-For shares **with a reserved namespace ID through [`NAMESPACE_ID_MAX_RESERVED`](./consensus.md#constants)**, the first [`SHARE_RESERVED_BYTES`](./consensus.md#constants) bytes after [`NAMESPACE_ID_BYTES`](./consensus.md#constants) (the `*` in the example layout figure below) is the starting byte of the length of the [canonically serialized](#serialization) first request that starts in the share, or `0` if there is none, as a one-byte big-endian unsigned integer (i.e. canonical serialization is not used). In this example, with a share size of `256` the first byte would be `80` (or `0x50` in hex).
+- The first [`NAMESPACE_ID_BYTES`](./consensus.md#constants) of a share's raw data `rawData` is the namespace ID of that share, `namespaceID`.
+- The next [`SHARE_RESERVED_BYTES`](./consensus.md#constants) bytes (the `*` in the example layout figure below) is the starting byte of the length of the [canonically serialized](#serialization) first request that starts in the share, or `0` if there is none, as a one-byte big-endian unsigned integer (i.e. canonical serialization is not used). In this example, with a share size of `256` the first byte would be `80` (or `0x50` in hex).
+- The remaining [`SHARE_SIZE`](./consensus.md#constants)`-`[`NAMESPACE_ID_BYTES`](./consensus.md#constants)`-`[`SHARE_RESERVED_BYTES`](./consensus.md#constants) bytes are request data.

 ![fig: Reserved share.](./figures/share.svg)

-For shares **with a namespace ID above [`NAMESPACE_ID_MAX_RESERVED`](./consensus.md#constants)**, the first [`SHARE_RESERVED_BYTES`](./consensus.md#constants) bytes after [`NAMESPACE_ID_BYTES`](./consensus.md#constants) have no special meaning and are simply used to store data like all the other bytes in the share.
+For shares **with a namespace ID above [`NAMESPACE_ID_MAX_RESERVED`](./consensus.md#constants) but below [`PARITY_SHARE_NAMESPACE_ID`](./consensus.md#constants)**:

-For non-parity shares, if there is insufficient request data to fill the share, the remaining bytes are padded with `0`.
+- The first [`NAMESPACE_ID_BYTES`](./consensus.md#constants) of a share's raw data `rawData` is the namespace ID of that share, `namespaceID`.
+- The remaining [`SHARE_SIZE`](./consensus.md#constants)`-`[`NAMESPACE_ID_BYTES`](./consensus.md#constants) bytes are request data. In other words, the remaining bytes have no special meaning and are simply used to store data.
+
+For shares **with a namespace ID equal to [`PARITY_SHARE_NAMESPACE_ID`](./consensus.md#constants)** (i.e. parity shares):
+
+- Bytes carry no special meaning.
+
+For non-parity shares, if there is insufficient request data to fill the share, the remaining bytes are filled with `0`.

 ### Arranging Available Data Into Shares

@@ -518,7 +527,8 @@ Then,
 1. For each request in the list:
    1. [Serialize](#serialization) the request (individually).
    1. Compute the length of each serialized request, [serialize the length](#share), and pre-pend the serialized request with its serialized length.
-   1. Split up the length/request pairs into [`SHARE_SIZE`](./consensus.md#constants)`-`[`NAMESPACE_ID_BYTES`](./consensus.md#constants)`-`[`SHARE_RESERVED_BYTES`](./consensus.md#constants)-byte [shares](#share) and assign [the appropriate namespace ID](./consensus.md#reserved-namespace-ids). This data has a _reserved_ namespace ID, so the first [`NAMESPACE_ID_BYTES`](./consensus.md#constants)`+`[`SHARE_RESERVED_BYTES`](./consensus.md#constants) bytes for these shares must be [set specially](#share).
+   1. Split up the length/request pairs into [`SHARE_SIZE`](./consensus.md#constants)`-`[`NAMESPACE_ID_BYTES`](./consensus.md#constants)`-`[`SHARE_RESERVED_BYTES`](./consensus.md#constants)-byte chunks.
+   1. Create a [share](#share) out of each chunk. This data has a _reserved_ namespace ID, so the first [`NAMESPACE_ID_BYTES`](./consensus.md#constants)`+`[`SHARE_RESERVED_BYTES`](./consensus.md#constants) bytes for these shares must be [set specially](#share).
 1. Concatenate the lists of shares in the order: transactions, intermediate state roots, evidence.

 Note that by construction, each share only has a single namespace, and that the list of concatenated shares is [lexicographically ordered by namespace ID](consensus.md#reserved-namespace-ids).
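
Reviewer note on the final wording of PATCH 7/7: the steps for reserved-namespace requests (length-prefix, split into chunks, create a share out of each chunk) can be sketched roughly as below. This is an illustration under stated assumptions, not the spec's algorithm. The constants are the values suggested by the example figure. `serialize_length` is a hypothetical stand-in (an unsigned LEB128 varint) for the spec's canonical length serialization. The start index is assumed to be counted from the first byte of the share, which is consistent with the `80` in the figure.

```python
from typing import List

# Assumed values, taken from the example figure; the real ones live in consensus.md.
NAMESPACE_ID_BYTES = 8
SHARE_RESERVED_BYTES = 1
SHARE_SIZE = 256
CHUNK_SIZE = SHARE_SIZE - NAMESPACE_ID_BYTES - SHARE_RESERVED_BYTES


def serialize_length(n: int) -> bytes:
    """Hypothetical length encoding (unsigned LEB128 varint)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def requests_to_shares(namespace_id: bytes, requests: List[bytes]) -> List[bytes]:
    """Length-prefix the serialized requests, chunk them, and build reserved shares."""
    stream = bytearray()
    starts = []  # stream offsets at which each length prefix begins
    for request in requests:
        starts.append(len(stream))
        stream += serialize_length(len(request)) + request

    shares = []
    for begin in range(0, len(stream), CHUNK_SIZE):
        chunk = bytes(stream[begin:begin + CHUNK_SIZE])
        # First request whose length prefix starts inside this chunk, if any.
        first = next((s for s in starts if begin <= s < begin + CHUNK_SIZE), None)
        if first is None:
            start_index = 0
        else:
            start_index = NAMESPACE_ID_BYTES + SHARE_RESERVED_BYTES + (first - begin)
        share = namespace_id + start_index.to_bytes(SHARE_RESERVED_BYTES, "big") + chunk
        shares.append(share.ljust(SHARE_SIZE, b"\x00"))  # fill the remainder with 0
    return shares


nid = (1).to_bytes(NAMESPACE_ID_BYTES, "big")  # hypothetical reserved namespace ID
shares = requests_to_shares(nid, [b"a" * 100, b"b" * 200, b"c" * 50])
long_shares = requests_to_shares(nid, [b"d" * 600])
```

With these inputs, the second share of `shares` starts mid-request, so its reserved byte points at the length prefix of the third request. The middle share of `long_shares` contains no request start, so its reserved byte is `0`.
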