--- /dev/null
+Short README, perhaps not too useful.
+
+qscale (the "q" is for "quick" -- the confusion with "quantization scale" is
+unfortunate, but there are only so many short names in the world) is a fast
+JPEG-to-JPEG-only up- and downscaler. On my 1.2GHz Core Duo laptop (using
+only one core), qscale is 3-4 times as fast as ImageMagick for downscaling
+large JPEGs (~10Mpix from my digital camera) to more moderate-sized JPEGs
+for web use etc. (like 640x480) without sacrificing quality. (Benchmarking must
+be done with a bit of care, though, in particular due to different subsampling
+options possible etc.) Most of the time in qscale's case is used on reading in
+the original image using libjpeg, which is shared among the two. However, it
+would probably not be realistic to exclude the libjpeg time, as most (although
+not all) real-world scaling tasks would indeed need to read and decode the
+source JPEG.
+
+Note: This is not to denounce ImageMagick in any way. It is a fine library,
+capable of doing much more than qscale can ever hope to do. Comparison between
+the two are mainly to provide a well-known reference, and to demonstrate that
+more specific tools than usually be made faster than generic tools.
+
+qscale is not novel in any way, nor is it perfect (far from it; it's more of a
+proof of concept than anything else) -- it is mainly a piece of engineering.
+However, the following techniques deserve some kind of mention:
+
+ - qscale recognizes that JPEGs are usually stored in the YCbCr colorspace and
+ not RGB. (ImageMagick can, too, if you give it the right flags, but not all
+ its operations are well-defined in YCbCr.) Although conversion between the
+ two is cheap, it is not free, and it is not needed for scaling. Thus, qscale
+ does not do it.
+ - qscale recognizes that JPEGs are stored with the color channels mostly
+ separate (planar) and not chunked. Scaling does not need to be done on
+ chunked data -- in fact, mostly, scaling is easier to do on planar data.
+ Thus, no conversion to chunked before scaling (and no need to convert back
+ to planar afterwards). (Note: Some SIMD properties might be easier to
+ exploit on a chunked representation. It's usually not worth it in total,
+ though.)
+ - qscale can utilize the SSE instruction set found in almost all modern
+ x86-compatible processors to do more work in the same amount of instructions
+ (It can also use the SSE3 instruction set, although the extra boost on top
+ of SSE is smaller. In time, it will utilize the SSE extensions known as
+ SSE4.1, which can help even more.) It takes care to align the image data and
+ memory accesses on the right boundaries wherever it makes sense.
+ - qscale recognizes (like almost any decent scaling program) that most
+ practical filter kernels are separable, so scaling can be done in two
+ sequential simpler passes (horizontal and vertical) instead of one. The
+ order does matter, though -- I've found doing the vertical pass (in
+ cache-friendly order, doing multiple neighboring pixels at a time to
+ exploit that the processor reads in entire cache lines and not individual
+ bytes at a time) before the horizontal to be superior, in particular
+ because this case is easier to SIMD-wise.
+ - qscale understands that JPEGs are typically subsampled; ie., that the
+ different color components are not stored at the same resolution. On
+ the web, this is typically because the eye is less sensitive to color
+ (chroma) information and as such much of it can safely be stored in
+ a lower resolution to reduce file size without much visible quality
+ degradation; in the JPEGs stored by a digital camera, it is simply
+ because much of the color information is interpolated anyway (since
+ the individual CCD dots are sensitive to either red, green or blue,
+ not all at the same time), so it would not make much sense to pretend
+ there is full color information. qscale does not ask libjpeg to
+ interpolate the "missing" color information nor to downscale the
+ already-downscaled color channels as ImageMagick does, but instead
+ does a single scaling pass from the original resolution to the final
+ subsampled resolution. (This is impossible for any program working
+ in RGB mode, or chunked YCbCr.) This increases both speed and quality,
+ although the effect on the latter is not particularly large.
+
+The following optimizations are possible but not done (yet?):
+
+ - qscale does not do the IDCT itself, even though there is improvement
+ potential over libjpeg's IDCT. (There is an unmaintained and little-used fork
+ of libjpeg called libjpeg-mmx that demonstrates this quite well.) In fact,
+ since the DCT can be viewed as just another (separable, but not
+ time-invariant) FIR filter, the quantization scaling and IDCT could probably
+ be folded into the scaling in many cases, in particular those where the
+ filter kernel is large (ie. large amounts of scaling).
+ - qscale does not use multiple processors or cores (although different cores
+ can of course work on different images at the same time).
+ - qscale does not make very good use of the extra eight SSE registers found
+ on 64-bit x86-compatible (usually called amd64 or x86-64) machines. In
+ fact, out of the box it might not even compile on such machines.
+
+qscale is Copyright 2008 Steinar H. Gunderson <sgunderson@bigfoot.com>, and
+licensed under the GNU General Public License, version 2. The full text of
+the GPLv2 can be found in the included COPYING file.