The demo below uses dE76 to incorporate a poor man's chroma key: replacing
green pixels with any arbitrary user input (in this case, red&black pixel noise).
Click play, and be amazed*!
*Claim is void if you are on mobile, a slow connection, or archaic browser.
At a broad level, the demo works by calculating the dE76 value for each pixel
on the HTML5 video, and then decides if that pixel is green enough or not. If that
sounds intensive, that's because it is. Let's look at the fun problems encountered.
The code for this demo runs on a single thread on your CPU. That sucks.
That means we are greatly limited in the size of arrays and operations we can
work with.
Had we access to a GPU, or the diligence to implement web workers, we wouldn't
be having this discussion. We lack both; let's get to business.
Canvas Size
The most obvious (and easiest!) performance improvement is to modify the
canvas dimensions. Had we increased the demo to twice the size, the operations
required per second would increase by an order of magnitude.
Even on desktop, that probably means you'll get jitter. This parlor trick
won't scale.
Goodbye, Standards. Hello, Performance!
What's the best way to get performance out of a database? Do you add some indexes?
Spend an afternoon profiling?
Nope. You Denormalize it*. To hell with technical debt, remove all your
schema integrity and make a mess of things.
*Please don't take this as an opportunity to be pedantic.
The demo does not use the Delta E library. Instead, it uses a version of
dE76 that is optimized for performance.
Optimizations
Cut Cruft: The function above doesn't include the excess
functions that the Delta-E library includes; it has a single purpose.
There is also no object instantiation.
Limit Object/Prototype Usage: The function uses more parameters
in favor of passing in two color objects. The two color objects are easier
to use and manage, but that comes at a performance hit: accessing properties
in an object means you're accessing the prototype chain. This is a bigger
performance hit than you might think.
Localize Object Properties: Similar to above, we
assigned pow to Math.pow. Although the strategy is a
good practice for high performance
situations, I will note I didn't see a performance difference here.
That would suggest the JiT compiler is already optimizing for this.
You can think of requestAnimationFrame as UDP, and setTimeout as TCP. With the former,
we don't necessarily care that every calculation is done: we just want something
fast. If the calculation function gets behind, requestAnimationFrame will simply
request the most current frame instead of the next one in queue.
Among other benefits, requestAnimationFrame also doesn't eat your CPU if you
switch tabs.
Smaller Arrays With Uint32Array
It's typical to use the UInt8ClampedArray when working with Canvas.
It's very easy to work with RGB values this way. Unfortunately, we are also
working with a much larger array.
With Uint32Array
we drastically reduce array access time by working with larger integers. Any time
you can reduce your array size, you should - it's a huge performance bottleneck.
Okay, But Now Make it Scale
Making this scale involves getting GPU access. Luckily, that's something we can
do with WebGL. If you write a pixel shader to implement the same logic in the
demo, you could even have real-time calculation of Delta E 00. (And that thing
is a performance hog!)
Shoutout to SeriouslyJS,
which has created a pixel shader to do chroma key on the GPU.
That's all I got. View the annotated source below for some more context plus
a few extra notes. Enjoy!