Re: More "news" Roy mysteriously doesn't "publish"

Subject: Re: More "news" Roy mysteriously doesn't "publish"
From: Rex Ballard <rex.ballard@xxxxxxxxx>
Date: Thu, 26 Mar 2009 16:01:13 -0700 (PDT)
Bytes: 6430
Complaints-to: groups-abuse@xxxxxxxxxx
Injection-info: w34g2000yqm.googlegroups.com; posting-host=75.220.69.87; posting-account=-EkKmgkAAAAxynpkobsxB1sKy9YeqcqI
Newsgroups: comp.os.linux.advocacy
Organization: http://groups.google.com
References: <f9tqucqmk3w5$.dlg@xxxxxxxxxxxxxxx>
User-agent: G2/1.0
Xref: ellandroad.demon.co.uk comp.os.linux.advocacy:751369

On Mar 25, 12:00 pm, Erik Funkenbusch <e...@xxxxxxxxxxxxxxxxxxxxxx>
wrote:
> Linus Torvalds calls ext3 "moronic".

> http://lkml.org/lkml/2009/3/24/460

> "This is why I absolutely _detest_ the idiotic ext3 writeback behavior. It
> literally does everything the wrong way around - writing data later than
> the metadata that points to it. Whoever came up with that solution was a
> moron. No ifs, buts, or maybes about it."

Linus is right in the purist sense.  The reality is that many SAN
storage systems maintain enough power for long enough to write the
data to a "panic area" if the power is about to fail.  This means that
the metadata only needs to know where in the panic area the ext3
driver stored the data.

Writing both metadata (index, directory, inode) and physical data at
the same time means you have to make two separate disk write calls
concurrently.  If you cache the cylinder and write the metadata as it
increments, you can wait until you've filled the cylinder and then
write a big buffer all at once.
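
For what it's worth, the ordering Linus is complaining about is the
reason careful applications force the order themselves: get the data
onto the disk first, and only then publish the name that points to it.
Here is a minimal Python sketch of that pattern, assuming a POSIX
system.  The function name atomic_write is made up for illustration;
it is not part of ext3 or the kernel, and it says nothing about what
the filesystem does internally.

```python
import contextlib
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` so a crash leaves either the old or the new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Ask the kernel to push the data blocks out *before* the
            # rename makes them reachable under the final name.
            os.fsync(tmp.fileno())
        # Only now update the directory entry (the metadata).
        os.replace(tmp_path, path)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
    # Flush the directory itself so the rename is durable too
    # (assumes a POSIX system where a directory can be opened read-only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The cost is exactly the one described above: the fsync gives up the
chance to batch the data into one big write later.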

The purist, for example the Sun architect, would tell you that there
is still the possibility that you could lose power before you had a
chance to write the cylinder (about 10 ms).  This is theoretically
true, but very often pragmatic choices are made to accelerate the
system based on secondary functionality.

If you are using a SAS or SATA-II drive, writing the entire buffer in
a single rotation of the hard drive is trivial and you really don't
have the ability to force the write earlier anyway.  If you have a 16
megabyte buffer in your hard disk, and you make repeated writes to the
metafile of only 1-2 sectors and you also make 5 writes of 1 megabyte,
there is a very good chance that the larger write will be queued later
anyway.

The irony is that Linus was the one who found dozens of ways to
accelerate *nix by using the MMU mapping rather than physically
copying memory and by allocating optimal chunks of memory (big and
small blocks in separate pools) using memory mapping.

I remember the days when we had direct access to the drive stepper
motors on those old ST-506 drives, and we didn't have cache and MMU
sitting between the ALU and the memory bus, but these days, many
things that don't seem "optimal" actually are because you eliminate
latency.

Modern processors have several layers of cache, and the disk drives
have layers of cache depending on the drive type (SATA or SAS), the
storage system used (SAN or NAS), and the nature of the application
(random access read-write like databases, or sequential files).

Linux has always done well precisely because it doesn't have to be
intricately tuned for performance.  Remember the days when you had to
explicitly allocate shared memory, semaphores, disk buffers, and all
the other "fine tuning" on BSD, SysV, Solaris 7, or HP-UX?  If you
tuned it right, you got awesome performance, but if you got even one
parameter wrong, your system could slow to a crawl due to resource
bottlenecks.

Microsoft has made huge improvements in this area, especially in the
transition from Windows NT 4.0 to Windows 2000.  In terms of pure
performance, Windows 2000 was probably the best system Microsoft ever
released.  Too bad Gates and Ballmer decided they had to knock 3rd
party firewalls and anti-malware off the desktop to keep their back
doors open.

Vista would probably be much faster if they didn't have to index the
hell out of the ENTIRE hard drive every time somebody puts ANYTHING on
the hard drive (even temporary browser images and html files).

If Ballmer really wants to kill Linux, he'll take all that extra junk
out of there, close the back doors that let the viruses in in the
first place, and let third party vendors compete for the prize of
guarding the desktop.

He might even consider making the bulk of the system read-only to
users and only allow them to write to documents and settings and My
Documents - preferably more like /home on Linux.  Another hint: long
file names loaded with spaces are very hostile to pretty much any 3rd
party scripting language, since variable substitution breaks.
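
To make the variable substitution problem concrete, here is a small
Python sketch.  The file name and the "ls -l" command are made-up
examples; the plain split() stands in for what happens to an unquoted
variable when a script splits a command line on whitespace.

```python
import shlex

filename = "My Documents/annual report.txt"

# Naive substitution: paste the name into a command string, then split
# the string on whitespace the way an unquoted variable would be split.
naive = ("ls -l " + filename).split()

# Quoting the name before substitution keeps it as a single argument.
quoted = shlex.split("ls -l " + shlex.quote(filename))

print(naive)   # ['ls', '-l', 'My', 'Documents/annual', 'report.txt']
print(quoted)  # ['ls', '-l', 'My Documents/annual report.txt']
```

One file name turns into three arguments unless every script that
touches it remembers to quote.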

Back to the original topic.

Linus is ironically taking the same position that Tanenbaum used to
take against Linux and its "fat kernel" as opposed to the Mach
microkernel.  Ironically, it was Linus' ability to exploit the
processor's cache and MMU, and to eliminate buffering, that gave Linux
the technological advantage over traditional BSD, UNIX, and Mach-based
kernels.

Many vendors now use many of the techniques used by Linus and his team
in their own kernels.

Some of those ideas, such as passing interrupt events in message
queues, even go back to mainframes.

Quibbling over whether you should write out a 4k disk buffer every
time you update a 512 byte metadata sector in the days of quad core
cpus with 3 layers of cache, intelligent drive controllers such as
SATA or SAS, and intelligent cached drives such as SATA or SAS arrays,
is a bit like Henry Ford still insisting that manual spark advance and
mixture adjustments were better than the automated techniques used on
the Model A and its successors.

Follow-Ups:
- Re: More "news" Roy mysteriously doesn't "publish"
  - From: Chris Ahlstrom