Fast Gaussian filter for greyscale images (with SSE) – part 1

A few months ago I was working on speeding up some image processing code. The most interesting part turned out to be the Gaussian filter, and I think it makes a good read for anyone interested in C++ code optimization.

So what does this Gaussian filter do?

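The original code implements the filter as a separable (two-pass) convolution: a horizontal 1D pass writes into a temporary float image, and a vertical 1D pass writes the rounded result back into the source image. It’s quite simple: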

template<int width, int height>
void gauss( Img<unsigned char, width, height>& img, 
            const std::vector<float>& coeff)
{
  // kernel radius; coeff is expected to hold an odd number of weights
  const int halfWindow = static_cast<int>(coeff.size()) / 2;
  Img<float, width, height> tmp;

  // horizontal pass: img -> tmp
  for (int y=0; y<height; y++)
    for (int x=0; x<width; x++)
    {
      float val = 0;
      for (int i=-halfWindow; i<=halfWindow; i++)
        val += img.getPixel(x+i, y) * coeff[i+halfWindow];
      tmp.setPixel(x, y, val);
    }

  // vertical pass: tmp -> img
  for (int y=0; y<height; y++)
    for (int x=0; x<width; x++)
    {
      float val = 0;
      for (int i=-halfWindow; i<=halfWindow; i++)
        val += tmp.getPixel(x, y+i) * coeff[i+halfWindow];
      img.setPixel(x, y, std::round(val));
    }
}
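
The function above takes its weights in coeff but the listing does not show where they come from. As a minimal sketch, assuming the usual sampled and normalized Gaussian, they could be generated like this. The helper name makeGaussianCoeff and its radius/sigma parameters are made up for illustration and are not part of the original code:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not from the original code): builds 2*radius+1
// Gaussian weights for the given standard deviation sigma.
std::vector<float> makeGaussianCoeff(int radius, float sigma)
{
  assert(radius >= 0 && sigma > 0.0f);

  std::vector<float> coeff(2 * radius + 1);
  float sum = 0.0f;

  // Sample exp(-i^2 / (2*sigma^2)) at integer offsets -radius..radius.
  for (int i = -radius; i <= radius; i++)
  {
    const float w = std::exp(-(i * i) / (2.0f * sigma * sigma));
    coeff[i + radius] = w;
    sum += w;
  }

  // Normalize so the weights sum to 1; otherwise the filter would
  // brighten or darken the image.
  for (float& c : coeff)
    c /= sum;

  return coeff;
}
```

With such a helper, a call could look like gauss(img, makeGaussianCoeff(3, 1.5f)), which gives a 7-tap window and therefore halfWindow == 3 inside gauss.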
