Few 2D Graphics Optimization Tips

Here are few optimization techniques that were handy to me recently; they mostly concern 2D graphics programming, where these patterns often emerge, but are helpful anywhere where you have to iterate and update data.

In short:

  • iterate
  • don’t do if‘s
  • map all you can
  • don’t float
  • cache your reads

Notation

Further on we’ll work with a W times H RGB bitmap, stored as a 1D array image of length W*H*3 (one component per channel), stored as “r g b r g b r g b …”. To generalize, we’ll use type itype, which is typically char or short.

Don’t search, iterate!

First tip is trivial, yet it can get you a great boost in performance. It’s all about pointers.

Lets say you want to set all pixels to given fixed value – (R, G, B).

If you wrapped the image array in a bitmap object, you’d have access function such as

inline void setRGB(int x, int y, itype r, itype g, itype b)
{
    image[y * W + x    ] = r; 
    image[y * W + x + 1] = g; 
    image[y * W + x + 2] = b;
}

and you’d do something like

for (int row = 0; row < H; row++)
{
    for (int col = 0; col < W; col++)
        bitmap.setRGB(col, row, R, G, B);
}

Even if you did precompute and store the y*W+x value in bitmap::setRGB(), it’s still one int multiplication per pixel access.

On the other hand:

itype* p = image;
for (int i = 0; i < W * H; i++)
{
    *p = R; p++;
    *p = G; p++;
    *p = B; p++;
}

No multiplication, just increments, swift as hell, works every time.

Extending this to just parts of image, and other use-cases (copying bitmap to bitmap, etc.), is up to you, and not the point of this post.

Avoid if‘s, they’re evil!

The if statement is what separates trivial from advanced… yet, it’s your enemy! Especially today, when CPU’s caches get bigger and bigger, flushing them gets more and more expensive, even with the smartest execution prediction in the CPU.

Consider the simple case of averaging two neighboring samples to get a new sample in-between (such as when converting Bayer interlaced image to full RGB image).

As you need every other sample, you could use a function such as this:

inline int safe2index(int x)
{
    if (x < 0) x += 2;
    else if (x >= W) x -= 2;
    return x;
}

that clamps the x into the [0, W) range in steps of 2. Then, when computing samples in even positions on a single line, you get (in “pseudo code”):

for (i = 0; i < W/2; i++)
    bitmap[i * 2, row] = (bitmap[safe2index(i * 2 - 1), row] +
                          bitmap[safe2index(i * 2 + 1), row]) / 2;

When does any of the if‘s in the safe2index() function kick in? The only 2 places are i=0 and i=W/2-1. That’s a whole lot of if‘s where condition is not met.

You could rewrite the above example like this, by splitting the code into 3 cases:

bitmap[0, row] = bitmap[1, row];
for (i = 1; i < W/2 - 1; i++)
    bitmap[i * 2, row] = (bitmap[safe2index(i * 2 - 1), row] +
                          bitmap[safe2index(i * 2 + 1), row]) / 2;
bitmap[(W / 2 - 1) * 2, row] = bitmap[(W / 2 - 1) * 2 - 1, row];

That’s W less of if‘s, for the price of 2 rows of extra code! This piece of code runs about 1.5-3 times faster than the if‘ed version (depends heavily on the processor).

Note: I wrote out here the (W / 2 - 1) * 2 in full so that it’s obvious what the last line is about; rewrite to your liking.

Map what you can!

Lookup tables are probably one of the oldest tricks in the book, but they’re as needed as ever. Todays images have 5, 10 or even 30+ megapixels; which means 15, 30 or 100+ million values. Compare this to the number of individual values the pixel channels can take – 256 for 8-bit precision, or 4,096 or 16,384 for 12- and 14-bit precision raw images.

Thus, creating a lookup table for all possible channel values, and then assigning this value to the pixels means way less work, than computing the new value for each pixel.

Imagine the trivial task of image normalization – find a minimum and maximum of the channel values in the picture, and then stretch the channels so that they cover the range of the image [0, max_itype) (that’s why this technique is often called “histogram stretching”).

Now, you easily find the minimum and maximum, store it away in min and max variables, and you go one by one, each image component, stretching it to (value - min) * max_itype / max. No big deal, simple computation… 15M times? Wow.

Why not create a map – itype map[max_itype], fill this in with corresponding values, and then just assign the new values from the map? It takes 3 more lines of code, and gives you incredible speed-up.

Of course, you can use partial maps (just parts of the expressions are in the map, the rest is still computed), or combine several maps to get the result, etc. It all depends on the application at hand; I once created the full 16M records map/lookup table, for each possible value of the 24bpp RGB image – this map was used for a batch of 100+ images, so the time it took to compute the table was negligible compared to the time it would take doing the same computation on each image.

Fix it, don’t float!

For many years, floating point computation was expensive (most of you probably don’t even remember those days, but it’s true); then the FPU’s came, and things got bit better. Today, FPU’s are claimed to be as fast as CPU’s. Why would any sane person use fixed point arithmetic then?

Well, it’s true that you can do floating point as fast as integer arithmetic today. What you’re not told is that if you need only integer results, using float’s along the way is damn expensive.

Take for instance the function to compute luminance of the pixel using it’s components R, G and B:

inline itype lum(itype r, itype g, itype b)
{
    return itype(0.30 * r + 0.59 * g + 0.11 * b);
}

Do you see something wrong with the function? [except of maybe sticking +0.5 at the end for round off reasons]

Nope… all’s OK, FPU’s fast, it should work good. Then you time some function (e.g. computation of the luminance histogram), and you find out that this is by far the slowest part of the function.

The reason is, that the above function has to convert r, g, and b to double (using float doesn’t make any significant difference), then compute the result on FPU, and then convert the result to integer again.

How’bout we keep it all on CPU? Approximating 0.3 with 77/256, 0.59 with 151/256, and 0.11 by 28/256.

inline itype lum(int r, int g, int b)
{
    return (77 * r + 151 * g + 28 * b) >> 8;
}

This function is about 3-5 times faster than it’s floating point counter-part!

Conclusion

The above tips are very simple, but very powerful; using the combination of the above you can get many times faster programs; they are in the order of most obvious trick to the most subtle, but typically they are applied in the reversed order – first get rid of float‘s, then map, then get rid of if‘s and loops, etc.

And remember:

  • Premature optimization is root of all evil! Forget about optimization when designing.
  • Optimize only what has to be optimized, not what can be optimized. Better profile and measure twice, than optimize twice. Also, optimizing simple function, that is called once a day, is pointless.
  • (Almost) every optimization is also an obfuscation. [not the other way around! :)]
  • Check your algorithms first, then optimize if nothing better can be done. Getting a better algorithm is the best and the cleanest optimization ever.
  • Test and benchmark a lot along the way; optimized code is often harder to (write and) read, thus after every update you should verify that all works; also make sure you actually are making things faster.
  • Optimization often violates OOP rules (or any rules, for that matter). Optimize wisely and when your design is stable, and only if your design can withstand it.

Appendix

One more tip is pretty much universal – don’t forget that it’s the reads that are cached, not the writes.

In other words, it doesn’t matter if you’re writing sequentially or random (only to memory, of course, not for disk), but reading sequentially is much faster than reading from random positions.

3 responses to “Few 2D Graphics Optimization Tips

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s