Flexible buffer block layout in the save state format #1242

rndmcnlly · 2025-02-02T01:51:03Z


I want to propose a (possibly) breaking change to the way buffers are serialized in the save state format.

Problem: We are missing opportunities to produce radically smaller save state files because the current design assumes blocks are sequential and disjoint (i.e. non-overlapping). Suppose there is a block of data that appears multiple times on disk as well as multiple times in memory. This situation is common! The current format can't express the idea that a single instance of this data in the save state file should get reused multiple times when restoring state.

Currently, our buffer info objects have offset and length fields that could allow different buffer info objects to refer back to a shared range of data in the big buffer. However, the zstd path of CPU.prototype.restore_state makes some assumptions about sequential layout (it ignores offset and instead computes the offset itself using front_padding).
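
To make the shared-range idea concrete, here is a minimal sketch of how an encoder could store identical blocks once and let several buffer info objects point at the same range. The helpers pack_deduplicated and unpack are hypothetical names for illustration and are not part of v86; joining bytes into a string key stands in for a real content hash, and alignment padding is ignored.

```javascript
// Hypothetical sketch: none of these helpers exist in v86.
// Packs buffers into one big buffer, storing identical content only once.
function pack_deduplicated(buffers) {
    const seen = new Map();   // content key -> offset in the packed buffer
    const infos = [];         // one { offset, length } entry per input buffer
    const chunks = [];        // unique buffers, in the order they are stored
    let total = 0;

    for (const buf of buffers) {
        // Stand-in for a real content hash; fine for a small illustration.
        const key = buf.length + ":" + buf.join(",");
        let offset = seen.get(key);
        if (offset === undefined) {
            offset = total;
            seen.set(key, offset);
            chunks.push(buf);
            total += buf.length;
        }
        infos.push({ offset, length: buf.length });
    }

    const packed = new Uint8Array(total);
    let pos = 0;
    for (const chunk of chunks) {
        packed.set(chunk, pos);
        pos += chunk.length;
    }
    return { infos, packed };
}

// Restores each buffer by honoring its own offset instead of assuming that
// buffers follow one another sequentially.
function unpack(infos, packed) {
    return infos.map(({ offset, length }) => packed.slice(offset, offset + length));
}

const a = new Uint8Array([1, 2, 3, 4]);
const b = new Uint8Array([9, 9]);
const result = pack_deduplicated([a, b, a]);
const restored = unpack(result.infos, result.packed);
```

In this sketch the first and third buffer info entries share one offset, so a decoder that recomputes offsets sequentially would read the wrong bytes for the third buffer.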

Proposals:

A: Remain at STATE_VERSION=6 and update the zstd path of CPU.prototype.restore_state to match what seem to be the intended semantics of the offset and length fields of buffer info objects (i.e. treat the current implementation as buggy and just fix it). You might think this is totally unacceptable because it means that a state saved in nominally version 6 format can't be decoded by another nominally version 6 (but older) implementation. However, consider that the current codebase (to my understanding) doesn't ever actually output zstd-compressed save states. We'd be breaking support for decoding a format that we don't actually support for encoding anyway.
B: Bump to STATE_VERSION=7, fix the zstd path, and make some other forward-looking improvements to the save format while we are in there. Here are a few things that would give encoders more flexibility:
- Reserve more space in the header (the thing that includes STATE_MAGIC but before the info block starts). We currently have 16 bytes (4 uint32s) and we use all of them. If we move this to 64 bytes (16 uint32s), we'll gain some more slots that could be used/abused for application specific flags.
- Use one of the new header slots to explicitly code the start of the buffer block. Currently, we assume the buffer block starts after the info block ends, plus some padding that is easy to overlook. If the header explicitly declared the start point, the amount of padding could be varied without needing to change the state restoration path. (I have an application where I'd like the buffer block to be at least 256-byte aligned, but I don't want to force this on others.)
- Use another new header slot to explicitly code the length of the buffer block. Currently, we assume the buffer block extends all the way out to TOTAL_LEN. If the base and range of the buffer block are explicitly delimited, then people are allowed to put application-specific data (e.g. a screenshot thumbnail or other metadata) after it in a forward-compatible way. We'd be wasting very few bits to buy a lot of flexibility.
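
The header changes in proposal B could look roughly like the sketch below. It assumes a 16-slot little-endian uint32 header; the slot numbers, the placeholder magic value, and the write_state and read_buffer_block helpers are all hypothetical and chosen only for illustration, not taken from state.js.

```javascript
// Hypothetical version 7 header layout; slot numbers are illustrative only.
const STATE_MAGIC = 0x12345678;     // placeholder, not necessarily v86's value
const HEADER_SLOTS = 16;            // 64 bytes, as proposed
const INDEX_MAGIC = 0;
const INDEX_VERSION = 1;
const INDEX_TOTAL_LEN = 2;
const INDEX_INFO_LEN = 3;
const INDEX_BUFFER_START = 4;       // hypothetical new slot
const INDEX_BUFFER_LEN = 5;         // hypothetical new slot

function align_up(n, alignment) {
    return Math.ceil(n / alignment) * alignment;
}

// Lays out header, info block, aligned buffer block, then optional trailer.
function write_state(info_bytes, buffer_bytes, trailer_bytes, alignment) {
    const header_len = HEADER_SLOTS * 4;
    const buffer_start = align_up(header_len + info_bytes.length, alignment);
    const total_len = buffer_start + buffer_bytes.length + trailer_bytes.length;

    const out = new Uint8Array(total_len);
    const view = new DataView(out.buffer);
    view.setUint32(INDEX_MAGIC * 4, STATE_MAGIC, true);
    view.setUint32(INDEX_VERSION * 4, 7, true);
    view.setUint32(INDEX_TOTAL_LEN * 4, total_len, true);
    view.setUint32(INDEX_INFO_LEN * 4, info_bytes.length, true);
    view.setUint32(INDEX_BUFFER_START * 4, buffer_start, true);
    view.setUint32(INDEX_BUFFER_LEN * 4, buffer_bytes.length, true);

    out.set(info_bytes, header_len);
    out.set(buffer_bytes, buffer_start);
    out.set(trailer_bytes, buffer_start + buffer_bytes.length);
    return out;
}

// Finds the buffer block from the header alone, whatever padding was used.
function read_buffer_block(state) {
    const view = new DataView(state.buffer, state.byteOffset, state.byteLength);
    if (view.getUint32(INDEX_MAGIC * 4, true) !== STATE_MAGIC) {
        throw new Error("Bad magic number");
    }
    const start = view.getUint32(INDEX_BUFFER_START * 4, true);
    const length = view.getUint32(INDEX_BUFFER_LEN * 4, true);
    return state.subarray(start, start + length);
}

const state = write_state(
    new Uint8Array(10),             // stand-in for the info block
    new Uint8Array([1, 2, 3]),      // buffer block
    new Uint8Array([7, 7]),         // application-specific trailer
    256                             // encoder-chosen alignment
);
const buffer_block = read_buffer_block(state);
```

Because the reader takes the start and length from the header, an encoder can pick 256-byte alignment or append a thumbnail without any change to the restore path.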

Proposal A risks upsetting people, but do these people exist? Who relies on version 6 states with internal zstd compression?

Proposal B is very safe and forward-looking, but I also note that we haven't bumped the version number in several years. So, there's a risk of a different kind of disruption that people didn't expect.

Context: I'm making a generalization of savestates that I call savestreams (paralleling the relationship between screenshots and screen recording videos). One thing I'd really like to have is the ability to do block-based deduplication of V86 savestates. This requires more flexibility in block alignment and allowing blocks to be reused and accessed out of order.

copy · 2025-02-02T03:24:48Z

Maintainer

I haven't looked at state.js in detail yet, but here are some notes relevant to this topic:

Proposal A risks upsetting people, but do these people exist? Who relies on version 6 states with internal zstd compression?

Yes, that would be me :-)

First, I sometimes get emails of people asking me if there's a way to upgrade their state file from STATE_VERSION 3 or 4 to the latest. Nowadays v86 has many more users than when those versions were in use.

Secondly, zstd compression is significant for the state images. For reference:

  36M 9front_state-v2.bin
 5.0M 9front_state-v2.bin.zst
  46M alpine-state.bin
 8.4M alpine-state.bin.zst
  84M freebsd_state.bin
  16M freebsd_state.bin.zst
 267M haiku_state-v3.bin
  39M haiku_state-v3.bin.zst
  60M openbsd_state.bin
  11M openbsd_state.bin.zst
 158M redox_state.bin
  16M redox_state.bin.zst
 114M serenity_state-v4.bin
  17M serenity_state-v4.bin.zst
 120M windows2k_state-v3.bin
  22M windows2k_state-v3.bin.zst
  34M windows95_state.bin
 6.7M windows95_state.bin.zst
  53M windows98_state.bin
  12M windows98_state.bin.zst
  60M windows-me_state-v2.bin
  15M windows-me_state-v2.bin.zst

I use v86 with nodejs to automatically build and compress the state images, e.g.:

      const s = await emulator.save_state();
      await fs.writeFile(OUTPUT_FILE, new Uint8Array(s));
      child_process.execFileSync("make", ["-C", path.dirname(OUTPUT_FILE), path.basename(OUTPUT_FILE) + ".zst"], { stdio: "inherit" });

Where the makefile does zstd -f -19 file. That way, v86 itself only needs to include the zstd decompressor. I should document this.

Bump to STATE_VERSION=7

That's fine, iff you keep the old state loading code around and fork based on the version in the state file. That way, old state images can still be used in new versions. Old versions of v86 won't be able to load the new state images, but I don't think that's a problem.
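
Forking on the version stored in the state file might look like the following sketch. It assumes the version is a little-endian uint32 in the second header slot; restore_state_by_version and the loader table are hypothetical stand-ins, not the actual v86 loading code.

```javascript
// Hypothetical dispatcher; the loaders stand in for the real restore paths.
const VERSION_SLOT = 1;   // assumed position of the version field

function read_version(state) {
    const view = new DataView(state.buffer, state.byteOffset, state.byteLength);
    return view.getUint32(VERSION_SLOT * 4, true);
}

// Picks the loader matching the version recorded in the file, so old state
// images keep working after the format changes.
function restore_state_by_version(state, loaders) {
    const version = read_version(state);
    const loader = loaders[version];
    if (!loader) {
        throw new Error("Unsupported state version: " + version);
    }
    return loader(state);
}

// Builds a tiny fake state that only carries a version number.
function make_fake_state(version) {
    const state = new Uint8Array(16);
    new DataView(state.buffer).setUint32(VERSION_SLOT * 4, version, true);
    return state;
}

const loaders = {
    6: () => "restored with the old sequential-layout path",
    7: () => "restored with the new explicit-offset path",
};
const old_result = restore_state_by_version(make_fake_state(6), loaders);
const new_result = restore_state_by_version(make_fake_state(7), loaders);
```

Keeping the version 6 loader untouched and adding a separate version 7 loader means the new path cannot accidentally change how existing images are decoded.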

All the other suggestions in proposal B sound very reasonable, and I would probably merge them. Just try to keep the format simple, please :-)
