Salvage

This library handles the use case where you have React components (or any system where you want to use the === operator to determine equality) but your data is coming from a non-in-memory source. The use case I'm most concerned with is when data is being deserialized from web socket messages or HTTP responses. Such data comes into the JavaScript environment as a completely new instance parsed from JSON. But in many cases only a fraction of that data is actually new (as compared to some previously deserialized value).

This library attempts to efficiently salvage as much as possible of the "current" value in memory rather than completely replacing it with a newly known value. I could imagine this library could also be useful as an operator for an Observable in order to salvage parts of previously reported values.

Getting started

npm install salvage

In TypeScript, you can import the function with:

import { salvage } from 'salvage';

From node, the simplest thing is probably to do:

var salvage = require('salvage').salvage;

Examples

Consider that your current application state is represented by something like this:

{
    catalog: [{
        product: "Settlers of Catan",
        price: "$35.27",
    }, {
        product: "Blood Rage",
        price: "$50.60",
    }, ..., {
        product: "Star Wars: Rebellion",
        price: "$76.44",
    }]
}

Then imagine that you query your server after a price drop for one item and get back this response:

{
    catalog: [{
        product: "Settlers of Catan",
        price: "$35.27",
    }, {
        product: "Blood Rage",
        price: "$38.78",  // <-- Only change
    }, ..., {
        product: "Star Wars: Rebellion",
        price: "$76.44",
    }]
}

You wouldn't want to trigger unnecessary redraws. So, an ideal situation would be to preserve the items in the catalog that didn't change. By preserve, I mean that they are equal (as in ===) to the previous values in the catalog. In the case of the example above, the value of catalog would change (since the array itself is "new"), but the only element that would have a different (!==) value would be the one for "Blood Rage". All others would be === to their predecessor (even if elements of the array are inserted or removed).

The API consists of just a single function, salvage, which is invoked as follows:

let newState = salvage(oldState, deserializedValue);

The salvage function will preserve as many values from within arrays and objects as possible so as to minimize the number of new values present in the hierarchy of newState.
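To make "preserving values" concrete, here is a minimal sketch of the idea. It is not the library's implementation: `salvageSketch`, `Json` and `isObject` are hypothetical names, only plain JSON values are handled, and arrays are matched index by index for brevity (so it does not cope with insertions or removals).

```typescript
// Illustrative sketch only: NOT the salvage library's implementation.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function isObject(v: Json): v is { [key: string]: Json } {
    return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Returns a value deep-equal to newVal that reuses as much of oldVal as possible.
function salvageSketch(oldVal: Json, newVal: Json): Json {
    if (oldVal === newVal) return oldVal; // same primitive or same reference
    if (Array.isArray(oldVal) && Array.isArray(newVal)) {
        const merged: Json[] = [];
        let same = oldVal.length === newVal.length;
        for (let i = 0; i < newVal.length; i++) {
            // Simplification: only compare elements at the same index.
            const item = i < oldVal.length ? salvageSketch(oldVal[i], newVal[i]) : newVal[i];
            if (i >= oldVal.length || item !== oldVal[i]) same = false;
            merged.push(item);
        }
        return same ? oldVal : merged; // keep the old array if nothing changed
    }
    if (isObject(oldVal) && isObject(newVal)) {
        const merged: { [key: string]: Json } = {};
        const newKeys = Object.keys(newVal);
        let same = newKeys.length === Object.keys(oldVal).length;
        for (const key of newKeys) {
            const hasOld = Object.prototype.hasOwnProperty.call(oldVal, key);
            merged[key] = hasOld ? salvageSketch(oldVal[key], newVal[key]) : newVal[key];
            if (!hasOld || merged[key] !== oldVal[key]) same = false;
        }
        return same ? oldVal : merged; // keep the old object if nothing changed
    }
    return newVal; // different primitives or mismatched shapes
}

// Usage: simulate a server response where only one price changed.
const oldState = {
    catalog: [
        { product: "Settlers of Catan", price: "$35.27" },
        { product: "Blood Rage", price: "$50.60" },
    ],
};
const deserialized = JSON.parse(JSON.stringify(oldState));
deserialized.catalog[1].price = "$38.78";

const newState = salvageSketch(oldState, deserialized) as typeof oldState;
console.log(newState.catalog[0] === oldState.catalog[0]); // unchanged item is reused
console.log(newState.catalog[1] === oldState.catalog[1]); // changed item is new
```

In this sketch the root and the catalog array are new values (something below them changed), while the untouched "Settlers of Catan" entry keeps its old identity.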

Options

When calling salvage, you can include an options object that provides hints about how salvage should function. The main options are related to how array comparisons are done (see the discussion of Performance below). However, at the moment the default behavior seems to be nearly as good as any of the optimized settings I've tried. As such, I suggest using the default settings until I see cases "in the wild" where tweaking these settings makes a big improvement.

Why?

I work with React and Angular 2 UIs that depend heavily on === equality but, at the same time, with data sources that generate data from "out of memory". I wanted to preserve the ability to use === for efficiency.

My guess is that something like this has been done before. However, I couldn't find such a thing. This is probably due to the fact that I'm not sure what Google search terms to use. Perhaps this kind of thing has a widely accepted name and I simply don't know what it is. By all means, if there are libraries out there that do this kind of thing...please let me know.

Performance

Introduction

In order to talk about performance, we need to consider what to compare the performance of this library to. The way I look at this, I'm trying to allow frameworks to use object identity as a fast equality check. This library is paying an "upfront" cost to allow that by processing successive values to preserve object identity to the extent possible. So the point is that running salvage can be seen as an investment to make other operations faster later.

That being said, I see the purpose of salvage as a way to avoid having to perform deep equality checks in other parts of the code. The worst case scenario for a deep equality check is comparing an object to another object that is (deeply) equal because it is, therefore, forced to check the entire hierarchy. So the time for that full traversal of the object hierarchy provides a kind of baseline to compare against.

So then the question becomes, how does salvage compare (in CPU time) to this reference point? I've added a performance test to the library test suite. I'm not sure that it can be considered representative of anything, but it is at least a non-trivial JSON representation.

Arrays

The primary issue with performance at this point is in array handling. This is because handling of object properties is relatively straightforward (you compare against the value associated with the same property in the previous object). But array handling is different, because changes in arrays might cause values to be inserted or deleted. This means that two values that are equal may shift around in the array (i.e., have different indices).

So how do we check to see if we can preserve object/value identity from one array value to the next? The safest and most general approach is to compare each element in the new array value to all values in the previous array value. For two "large" arrays, this can be computationally intensive and my benchmarks to date indicate that you definitely don't want to do that. For this reason, I've settled on an approach that uses a "key function" to generate a string "key" value (conceptually you can think of this as a hash value) for each element in the arrays. In this way, we only bother comparing elements with matching keys. This follows very closely how a key attribute can be used in React or the trackBy function can be used in Angular 2 to accomplish similar things.
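The key-function idea can be sketched roughly as follows. This is an illustration under assumptions, not the library's code: `salvageArrayByKey`, `KeyFn` and the `same` comparison callback are hypothetical names, and the product name is used as the key purely for the example.

```typescript
// Illustrative sketch only: NOT the salvage library's implementation.
type KeyFn<T> = (item: T) => string;

// For each new element, look only at old elements with the same key and
// reuse the first one that the `same` callback considers equal.
function salvageArrayByKey<T>(
    oldArr: T[],
    newArr: T[],
    keyOf: KeyFn<T>,
    same: (a: T, b: T) => boolean,
): T[] {
    // Bucket the old elements by key so lookups avoid an all-pairs scan.
    const byKey = new Map<string, T[]>();
    for (const item of oldArr) {
        const key = keyOf(item);
        const bucket = byKey.get(key);
        if (bucket) {
            bucket.push(item);
        } else {
            byKey.set(key, [item]);
        }
    }
    return newArr.map((item) => {
        const candidates = byKey.get(keyOf(item)) || [];
        for (const candidate of candidates) {
            if (same(candidate, item)) return candidate; // reuse old identity
        }
        return item; // nothing equal found: keep the new value
    });
}

// Usage: an element is inserted at the front, so every index shifts.
interface Product { product: string; price: string; }

const oldCatalog: Product[] = [
    { product: "Settlers of Catan", price: "$35.27" },
    { product: "Blood Rage", price: "$50.60" },
    { product: "Star Wars: Rebellion", price: "$76.44" },
];
const freshCatalog: Product[] = JSON.parse(JSON.stringify([
    { product: "Pandemic", price: "$24.99" }, // made-up entry for the example
    { product: "Settlers of Catan", price: "$35.27" },
    { product: "Blood Rage", price: "$38.78" },
    { product: "Star Wars: Rebellion", price: "$76.44" },
]));

const salvagedCatalog = salvageArrayByKey(
    oldCatalog,
    freshCatalog,
    (p) => p.product,
    (a, b) => a.product === b.product && a.price === b.price,
);
console.log(salvagedCatalog[1] === oldCatalog[0]); // Catan reused despite new index
console.log(salvagedCatalog[2] === oldCatalog[1]); // Blood Rage changed, so not reused
```

In this sketch, a key function that returns the same string for every element puts everything in one bucket and degrades to the all-pairs comparison described above.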

Results

I have only one large data set in my test cases. Again, I cannot say this is representative. But I think it is probably reasonable. As I mentioned before, I feel that the performance of the salvage function should be compared relative to a deep equality check. I established this baseline by using lodash to perform a deep equality check on my one sample data set. The following table shows how invocations of salvage (with a few different options) compared to this deep equality check...

| Operation | Time (for equal objects) | Time (for different objects) |
| --- | --- | --- |
| Deep Equality Check (_.isEqual) | 13ms | |
| salvage (default) | 22ms | 18ms |
| salvage (jsonKey) | 24ms | 15ms |
| salvage (use _id) | 30ms | 13ms |
| salvage (sameKey) | 490ms | 492ms |

The main thing to focus on in this table is that running salvage with its default options (and therefore the default key function) took no more than 70% longer than a deep equality check (22ms vs. 13ms). You can also see that a couple of other key functions were tried and that the default key function performs well against the more case-specific ones.

You might think...oh, deep object equality checks are faster. So what do I need salvage for?

Yes, a simple deep equality check is going to be faster. But this only tells you whether two values are the same at the root level. On the other hand, by running salvage, you'll find out all (nested) values that are deep equal.

To understand why this is important, consider a typical React hierarchy of components (the same applies to Angular 2). It is quite common to have a rich hierarchy of components that cascade some top level value down through the hierarchy, delegating nested values to child components along the way. These components decide whether they need to redraw based on whether the identity of their values has changed. What salvage is helping you with here is preserving those identities whenever possible. As such, it is potentially eliminating the need for multiple, nested deep equality checks and the associated work of rerendering those components.
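The effect of identity on redraws can be illustrated without any framework. The sketch below uses a hypothetical `memoByIdentity` wrapper that skips work when it receives the very same object as last time, which is the kind of check an identity-based component makes. It is an analogy, not React or Angular code.

```typescript
// Illustrative sketch only: a stand-in for an identity-based "should I redraw?" check.
function memoByIdentity<P, R>(render: (props: P) => R): (props: P) => R {
    let hasLast = false;
    let lastProps: P | undefined;
    let lastResult: R | undefined;
    return (props: P): R => {
        if (hasLast && props === lastProps) {
            return lastResult as R; // same identity: skip the render
        }
        const result = render(props);
        hasLast = true;
        lastProps = props;
        lastResult = result;
        return result;
    };
}

interface Row { product: string; price: string; }

let renderCount = 0;
const renderRow = memoByIdentity((row: Row) => {
    renderCount++;
    return row.product + ": " + row.price;
});

const row: Row = { product: "Blood Rage", price: "$50.60" };
renderRow(row);
renderRow(row); // same object, so no second render
const countAfterSameIdentity = renderCount;

// A freshly deserialized copy is deep-equal but has a new identity,
// so an identity check cannot tell that nothing changed.
const reparsedRow: Row = JSON.parse(JSON.stringify(row));
renderRow(reparsedRow);
const countAfterReparse = renderCount;

console.log(countAfterSameIdentity, countAfterReparse);
```

The extra render for the reparsed copy is the work that preserving identities is meant to avoid: if the old `row` object had been salvaged, the identity check would have succeeded.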

Types

The main goal of this library is to allow in-memory values to be updated by values that originated outside of memory. This means values that have been reconstituted via some deserialization process, which in practice means JSON. As such, the focus here is on the JSON types: numbers, strings, null, booleans, objects and arrays. As a practical matter, I included support for both Date and function types (even though those are outside the scope of the JSON spec). Dates in particular could be pretty important and could easily be produced as part of a deserialization process.

This still leaves other types of values like Buffer, Uint8Array, etc. unhandled. If salvage encounters a value that it doesn't know how to handle, it will throw an exception.
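As a rough picture of this kind of type handling, the sketch below classifies a value into one of the supported kinds and throws for anything else. The names `kindOf`, `Kind` and `sameDate`, the use of `TypeError` and the error message are all assumptions made for illustration. The library's actual checks and exception may differ.

```typescript
// Illustrative sketch only: NOT the salvage library's implementation.
type Kind = "null" | "boolean" | "number" | "string" | "function" | "array" | "date" | "object";

function kindOf(value: unknown): Kind {
    if (value === null) return "null";
    switch (typeof value) {
        case "boolean": return "boolean";
        case "number": return "number";
        case "string": return "string";
        case "function": return "function";
    }
    if (Array.isArray(value)) return "array";
    if (value instanceof Date) return "date";
    // Only plain objects (as produced by JSON.parse or object literals) are accepted.
    if (typeof value === "object" && Object.getPrototypeOf(value) === Object.prototype) {
        return "object";
    }
    // Buffer, Uint8Array, Map, undefined, etc. end up here in this sketch.
    throw new TypeError("unsupported value: " + Object.prototype.toString.call(value));
}

// Dates have no structure to recurse into, so compare their timestamps.
function sameDate(a: Date, b: Date): boolean {
    return a.getTime() === b.getTime();
}

// Usage
const kinds = [kindOf(JSON.parse('{"a": 1}')), kindOf([1, 2]), kindOf(new Date(0))];
console.log(kinds.join(", "));

let threwOnTypedArray = false;
try {
    kindOf(new Uint8Array(2));
} catch (err) {
    threwOnTypedArray = err instanceof TypeError;
}
console.log(threwOnTypedArray);
```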

To Do

I've tried to create a representative set of test cases, and the existing tests give 100% code coverage. But I'd really like to have more test cases to make sure the code is robust to as many weird corner cases as I can find.

Another potential future enhancement would be to be able to use salvage in the context of Observables to preserve identity for next values in a stream.