Running HSE over a Replicated Network Disk #394
-
I'm currently evaluating how best to decouple the storage capacity of my database, which is built over HSE, from the capacity of any individual disk. Right now I run the instances as mirrored clones of each other, replicating all writes to all other database instances at the application level. This gives read capacity proportional to the number of instances, but write and storage capacity equivalent to a single instance, plus a lot of network chatter. The other issue is that to boot a new database instance I have to serialize the entire database to disk and then transfer the whole thing from one machine to another, meaning the capacity of the entire fleet is only equal to half the capacity of one drive.

So my options are basically either sharding at the application level (with parent / children for replication), where each database shard only has visibility over a subset of the dataset, or continuing to run each instance with a full view of the dataset but over a networked drive that handles replication.

Sharding would obviously be an incredible amount of work and complexity, so I much prefer the networked drive option, where I'd simply eliminate application-level replication and be able to scale with no added complexity. Sharding also still limits the size of any shard to half of its disk capacity, given the need to serialize and then transfer its entire subset of the dataset to boot a sibling, and for the sibling to deserialize it.

The networked drive option looks basically perfect from my side. My ordered list datatype currently fills up a value to HSE_KVS_VALUE_LEN_MAX and then overflows into a new subkey. As long as I either ensure single-writer access to that datatype, or change it to an unordered list and map individual items to distinct keys, I see no issues running over a networked drive, because there's no merging required to reconcile competing writes.

But I'm sure there might be complications at the HSE level. Are there?
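To make the two list layouts concrete, here is a minimal sketch. It does not call HSE: a plain Python `dict` stands in for a KVS, `VALUE_LEN_MAX` is a deliberately tiny hypothetical stand-in for `HSE_KVS_VALUE_LEN_MAX`, and the key naming scheme (`/count`, numbered chunk subkeys, `/item/`) is an assumption made up for illustration, not my actual schema.

```python
# Hypothetical stand-in for HSE_KVS_VALUE_LEN_MAX (tiny, to force overflow).
VALUE_LEN_MAX = 16


def _chunk_key(list_key: bytes, index: int) -> bytes:
    # Zero-padded index so chunk keys sort in append order.
    return list_key + b"/" + f"{index:08d}".encode()


def append_chunked(kvs: dict, list_key: bytes, item: bytes) -> None:
    """Ordered layout: pack length-prefixed items into the newest chunk and
    overflow into a new subkey when the chunk would exceed the limit.
    This is a read-modify-write, so it needs a single writer per list."""
    record = len(item).to_bytes(2, "big") + item
    if len(record) > VALUE_LEN_MAX:
        raise ValueError("item does not fit in a single value")
    count_key = list_key + b"/count"
    count = int(kvs.get(count_key, b"0"))
    if count > 0:
        last = _chunk_key(list_key, count - 1)
        if len(kvs[last]) + len(record) <= VALUE_LEN_MAX:
            kvs[last] = kvs[last] + record
            return
    kvs[_chunk_key(list_key, count)] = record
    kvs[count_key] = str(count + 1).encode()


def read_chunked(kvs: dict, list_key: bytes) -> list:
    """Decode every chunk in order and return the items as a list."""
    count = int(kvs.get(list_key + b"/count", b"0"))
    items = []
    for i in range(count):
        chunk = kvs[_chunk_key(list_key, i)]
        pos = 0
        while pos < len(chunk):
            n = int.from_bytes(chunk[pos:pos + 2], "big")
            items.append(chunk[pos + 2:pos + 2 + n])
            pos += 2 + n
    return items


def put_item(kvs: dict, list_key: bytes, item_id: bytes, item: bytes) -> None:
    """Unordered layout: one key per item, so a write never rewrites
    a value that already holds other items."""
    kvs[list_key + b"/item/" + item_id] = item


def read_items(kvs: dict, list_key: bytes) -> dict:
    """Collect all items stored under the list's per-item prefix."""
    prefix = list_key + b"/item/"
    return {k[len(prefix):]: v for k, v in kvs.items() if k.startswith(prefix)}
```

In the chunked layout every append touches a shared value, which is why it needs single-writer access; in the per-item layout writes to different items never touch the same key.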
Replies: 1 comment 3 replies
-
You can run HSE on a file system configured on a SAN volume -- we do it all the time. However, you cannot share that SAN volume among machines, except in a failover scenario. I don't see an obvious way around sharding the data to get what you want.
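For reference, a minimal sketch of what application-level sharding could look like, assuming simple hash-based routing. This is an illustration only, not a scheme described in this thread: the `ShardedStore` class and `shard_for_key` function are hypothetical names, and each shard is modeled as a plain `dict` rather than a separate HSE instance on its own volume.

```python
import hashlib


def shard_for_key(key: bytes, num_shards: int) -> int:
    """Map a key to a shard index with a hash that is stable across
    processes and machines (unlike Python's built-in hash())."""
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Routes each key to exactly one shard. Here a shard is a dict;
    in a real deployment it would be one database instance per volume."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: bytes, value: bytes) -> None:
        self.shards[shard_for_key(key, len(self.shards))][key] = value

    def get(self, key: bytes):
        return self.shards[shard_for_key(key, len(self.shards))].get(key)
```

Note that plain modulo routing remaps most keys when the shard count changes, so growing the fleet would need a migration step or a different placement scheme.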