Adds the Finite-State Transducer algorithm (#11242)
This PR adds a parallel _Finite-State Transducer_ (FST) algorithm. The FST is a key component of the nested JSON parser.

# Background
**An example of a Finite-State Transducer (FST), i.e., the algorithm that we try to mimic**:
[Slides from the JSON parser presentation, Slides 11-17](https://docs.google.com/presentation/d/1NTQdUMM44NzzHxLNnvcGLQk6pI-fdoM3cXqNqushMbU/edit?usp=sharing)

## Our GPU-based implementation
**The GPU-based algorithm builds on the following work:**
[ParPaRaw: Massively Parallel Parsing of Delimiter-Separated Raw Data](https://arxiv.org/pdf/1905.13415.pdf)

**The following sections are of relevance:**
- Section 3.1
- Section 4.5 (i.e., the Multi-fragment in-register array)

**How the algorithm works is illustrated in the following presentation:**
[ParPaRaw @VLDB'20](https://eliasstehle.com/media/parparaw_vldb_2020.pdf#page=21)

## Relevant Data Structures
**A word about the motivation and need for the _Multi-fragment in-register array_:**

The composition of two state-transition vectors is a key operation (in the prefix scan). Basically, this is what it does for two state-transition vectors `lhs` and `rhs`, both comprising `N` items:
```
for (int32_t i = 0; i < N; ++i) {
  result[i] = rhs[lhs[i]];
}
return result;
```

The relevant part is the indexing into `rhs`: `rhs[lhs[i]]`, i.e., the index is `lhs[i]`, a runtime value that isn't known at compile time. It's important to understand that in CUB's prefix scan both `rhs` and `lhs` are thread-local variables. As such, they either live in the fast register file or in (slow off-chip) local memory.

The register file has a shortcoming: it cannot be indexed dynamically. And here, we are dynamically indexing into `rhs`. So `rhs` will need to be spilled to local memory (backed by device memory) to allow for dynamic indexing. This would usually make the algorithm very slow. That's why we have the _Multi-fragment in-register array_.
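
To make the composition concrete, here is a minimal host-side C++ sketch. It is not the code from this PR. The three-state toy automaton (outside a string, inside a string, after an escape), the helper names `compose`, `pack`, `get`, `compose_packed`, `transition_vector`, and `reduce`, and the 2-bits-per-state packing are all assumptions made for illustration. The packed variant only hints at the idea behind the _Multi-fragment in-register array_: it replaces the dynamic array index with a dynamic shift and mask on a single word. It is a heavily simplified stand-in, not the multi-fragment layout described in Section 4.5.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

constexpr int N = 3;  // number of states of a hypothetical toy DFA
using TransitionVector = std::array<std::uint8_t, N>;

// result[i] is the state reached from state i after applying lhs, then rhs.
TransitionVector compose(const TransitionVector& lhs, const TransitionVector& rhs)
{
  TransitionVector result{};
  for (int i = 0; i < N; ++i) {
    assert(lhs[i] < N);
    result[i] = rhs[lhs[i]];  // dynamic index into rhs
  }
  return result;
}

// Assumption: 2 bits per state, all N states packed into one 32-bit word.
constexpr std::uint32_t BITS = 2;

std::uint32_t get(std::uint32_t packed, std::uint32_t i)
{
  return (packed >> (i * BITS)) & 0x3u;  // shift and mask instead of an array index
}

std::uint32_t pack(const TransitionVector& v)
{
  std::uint32_t packed = 0;
  for (int i = 0; i < N; ++i) { packed |= std::uint32_t{v[i]} << (i * BITS); }
  return packed;
}

// Same composition as compose(), but operating on packed words only.
std::uint32_t compose_packed(std::uint32_t lhs, std::uint32_t rhs)
{
  std::uint32_t result = 0;
  for (std::uint32_t i = 0; i < N; ++i) { result |= get(rhs, get(lhs, i)) << (i * BITS); }
  return result;
}

// Toy DFA: state 0 = outside string, 1 = inside string, 2 = after a backslash.
TransitionVector transition_vector(char c)
{
  if (c == '"') { return {1, 0, 1}; }
  if (c == '\\') { return {0, 2, 1}; }
  return {0, 1, 1};
}

// Sequential fold over the input; a prefix scan would compute all prefixes of it.
TransitionVector reduce(const std::string& input)
{
  TransitionVector acc{0, 1, 2};  // identity: every state maps to itself
  for (char c : input) { acc = compose(acc, transition_vector(c)); }
  return acc;
}
```

In this sketch, `reduce(input)[s]` is the state the toy automaton ends in when it starts in state `s`. Because `compose` is associative, the fold can be regrouped arbitrarily, which is what allows a parallel prefix scan to evaluate it.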
For the implementation details of the _Multi-fragment in-register array_, I'd suggest reading [Section 4.5](https://arxiv.org/pdf/1905.13415.pdf).

In contrast, the following example is fine and `foo` will be mapped to registers, because the loop can be fully unrolled, provided `N` is known at compile time and sufficiently small (at most tens of items).
```
// this is fine, if N is a compile-time constant
for (int32_t i = 1; i < N; ++i) {
  foo[i] = foo[i-1];
}
```

# Style & CUB Integration
The following may be considered for being integrated into CUB at a later point, hence the deviation in style from cuDF.
- `in_reg_array.cuh`
- `agent_dfa.cuh`
- `device_dfa.cuh`
- `dispatch_dfa.cuh`

Authors:
- Elias Stehle (https://github.com/elstehle)

Approvers:
- Vukasin Milovanovic (https://github.com/vuule)
- Tobias Ribizel (https://github.com/upsj)
- Karthikeyan (https://github.com/karthikeyann)

URL: #11242