Thursday, July 14th, 2005, 1:17 pm
Excessive Use of Compression
Compression is often used unnecessarily. Two advantages can be gained by compressing files:
- Centralising a collection of files in one single file
- Reducing the overall volume of data
Tarring the files handles the former task on its own, while the latter sometimes brings no benefit at all (e.g. when compressing JPEG files, which are already condensed). In some circumstances, the data being compressed is very light to begin with. Why compress it and invite complications (indexing and searching, latency in opening files, re-ordering, etc.)? Why compress Web pages in Apache when browser support is still incomplete? As regards databases, anybody whose textual content exceeds 10 MB is serious about Web development and will thus have plenty of server space to spare.
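To make the tar-versus-compression distinction concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the file names are made up, the helper names `bundle` and `compression_ratio` are hypothetical, and random bytes stand in for a JPEG (both have essentially no redundancy left to squeeze out). It bundles files into a plain, uncompressed tar archive and then measures how little zlib gains on data that is already dense.

```python
import io
import os
import tarfile
import zlib


def bundle(files):
    """Pack a {name: bytes} mapping into one uncompressed tar archive."""
    buf = io.BytesIO()
    # Mode "w" writes a plain tar: bundling only, no compression.
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def compression_ratio(data):
    """Compressed size divided by original size (1.0 means no gain)."""
    return len(zlib.compress(data, 9)) / len(data)


# Assumption: random bytes are a stand-in for an already-compressed JPEG.
already_dense = os.urandom(200_000)
archive = bundle({"a.jpg": already_dense, "notes.txt": b"hello\n"})

# The single archive still lists every file it centralises.
with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
    names = tar.getnames()

print(names)
print(round(compression_ratio(already_dense), 3))
```

The ratio printed for the dense data should sit at or slightly above 1.0, which means the compression step costs time without saving any space, while the tar step alone already delivers the "one single file" advantage.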
There are a few cases where the use of compression can be justified. For example, genome sequencing benefits greatly from compression (DNA data can be immense). By contrast, text-only content, which excludes graphics and other gratuitous media, will often be small in volume. Old log files on Web servers, as yet another example, are hardly ever accessed (so the latency of uncompressing them rarely matters) and are easily reduced in size (recurring patterns make them over 80% compressible). Lastly, backups of large data volumes (e.g. mirrors) might be worth the complications associated with compression. That is exactly when size becomes the major issue.
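The log file case is easy to try out. The sketch below is a rough illustration, not a measurement of any real server: the log lines are synthetic, the field values are invented, and the helper names `fake_log` and `space_saving` are hypothetical. It builds a repetitive access-log-like text, gzips it, and reports the fraction of space saved. The actual figure for a real log depends on its contents.

```python
import gzip


def fake_log(n_lines):
    """Build a synthetic, repetitive access log (assumed format, invented values)."""
    lines = []
    for i in range(n_lines):
        minute, second = (i // 60) % 60, i % 60
        lines.append(
            f'192.168.0.{i % 50} - - [14/Jul/2005:13:{minute:02d}:{second:02d} +0000] '
            f'"GET /page{i % 20}.html HTTP/1.1" 200 {1000 + (i * 37) % 5000}\n'
        )
    return "".join(lines).encode("ascii")


def space_saving(data):
    """Fraction of the original size removed by gzip (0.0 means no gain)."""
    return 1 - len(gzip.compress(data)) / len(data)


raw = fake_log(5000)
packed = gzip.compress(raw)

print(len(raw), len(packed))
print(round(space_saving(raw), 2))
```

Because almost every line repeats the same skeleton, the compressor only has to encode the few characters that differ, which is why old logs are such a good candidate while the dense data in the earlier cases is not.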