Wednesday, February 07, 2007

over 60% performance boost for "tar xf"

"tar" is a commonly used single threaded utility for generating and extracting archives. With recent advances in filesystem implementations like e.g. Sun Solaris' ZFS, I read several time the wish that tar really should be multi-threaded to get better performance.

Well, I gave it a shot and wrote a preloadable binary patch that hands off write and close requests to worker threads. Unfortunately, I don't have a ZFS with many spindles, but I wanted to know whether there might be a performance boost. So I tried it on a memory-based TMPFS on a 4-processor Sun Fire V440 with 16GB of RAM running Solaris. I extracted a 1GB tar file, first with the patched version of tar and then in single-threaded mode. Result: over 60% performance improvement for the multi-threaded version.

I'd say it was worth the effort writing the patch. If you'd like to use it, too, you can get it here.

Thursday, July 20, 2006

Scalability prediction with sysstat

"How many users can run their applications on this machine?"

This kind of question comes up often, but determining a precise value is hard. A key metric for getting this number is the amount of memory needed by each user. Unfortunately, it's difficult to find out how much memory this is, because certain pages (e.g. code, read-only data) are shared among different processes on Solaris, which is of course a Good Thing(TM).

In most cases, stack and heap have the highest impact on exclusively used memory. So if you are curious how much heap and stack your users are using, try sysstat. You can get it here.

I'd be happy to hear from you if you simply like sysstat, want to report a bug, or have a proposal for an enhancement.

Thursday, May 11, 2006

The network is the system... - the computer, says Sun.

Well, I think so, too. So I gave my sysstat utility a major overhaul and added multicast support. Now one can monitor several machines on the network from a single terminal - or, as marketing might soon say, the whole computer. Really convenient!

Just start sysstat with option -d on all machines you want to monitor. Then start it without any option. All hosts on which a sysstat was started will pop up - even the ones where a colleague started it (with or without option -d).

Get it here.

Monday, May 08, 2006

What drills, scales, and OSs have in common...

In the past week I had an interesting discussion in a German Usenet group. The person who started the thread actually had a question about broken core files of a multithreaded application on Linux. I pointed him to this LWN article (section: "write(), thread safety, and POSIX") to emphasize that Linux has several shortcomings in this area, and told him that he might be hitting one of them.

In the following discussion (never say that Linux has a weakness), someone pointed out that writing to one and the same file from multiple threads without explicit mutual exclusion is a 'fringe case'. His reasoning was that, although Linux violates the SUS spec here, support for atomic writes wouldn't be worth the effort. I reminded him that SUS requires write not only to be thread-safe, but also to be async-signal-safe, and that one cannot ensure atomicity of writes within signal handlers using mutexes. His response was that calling write from a signal handler is bad design and therefore need not be supported either. After I pointed out that writing log messages or transactions to a file from multiple threads is exactly the kind of application that needs this support, especially when file descriptors are shared among independent processes, I didn't get another answer.

Obviously one can argue about fixing this issue, because most applications can work around it. So, while I believe any violation of the SUS spec is a bug that should be fixed, there are people, especially among the Linux kernel developers, who disagree. So it really seems to be a matter of taste.

Thinking about this for some time, the situation strongly reminded me of people who believe that their 10 Euro bathroom scale from the discounter around the corner really has 100g precision, or that the one in the kitchen even has 1g. Yes, they show weight with 1g or 100g granularity, but the precision is usually (unless you pay a premium) much worse. Even if you had a kilogram of diamonds or gold, you wouldn't weigh it with such a scale, would you? ;-)

The same goes for drills: you can get a percussion drill for as little as 30 Euro. At the other end of the range is Hilti, who manage to charge two to three thousand Euro. But the professional ones are built to run continuously 10 hours a day, 365 days a year, their specified output power really goes into drilling and not into creating noise, and more.

OSs nowadays come for free, so they all seem to have the same price. But just because Linux has a system call named write doesn't mean it conforms to POSIX, even though it has the same API. Remember, write can be implemented differently, just as weighing masses can be. So why not use the right tool for the problem at hand? Have you ever heard of a craftsman who is fond of working with a 50 Euro drill from a discounter? Some might have tried, and they stay really silent, because they found out that it becomes much more expensive once you take the shortcomings into account.

So work like a professional craftsman and employ an OS that is built for multithreaded applications, makes guarantees about write and every other system call, and comes with detailed documentation and source code. Use Solaris.

P.S.: Click here to take a look at the write system call of Solaris, which visibly implements mutual exclusion for multiple threads.

P.P.S.: Concerning precision - high resolution time stamp counters are synchronized on SPARC, but not on x86.

Tuesday, April 25, 2006

How missing features sell stuff...

Developing software on and for Solaris can be real fun, considering all the tools available. OTOH, one is permanently confronted with the large installed base of Solaris 8 and 9. So people keep asking: "Does this run on Solaris 8?"

Well, how could I answer this question without testing or guessing? And I really don't want to guess! So what can I do? Should I install Solaris 8 on a separate disk or get another machine? What about installing Solaris 8 in a zone on Solaris 10? Sounds great and convenient, doesn't it? But unfortunately, I got the answer that this won't work. Too sad!

To come back to the topic: I just bought another SPARC, because Solaris is _missing_ this feature. What a weird world!

Let's look at this situation once more. Could it be that Sun has already made Solaris too good to be able to sell new machines and to get customers to install the new Solaris 10, which is actually free of charge? A look at the software vendor with the biggest market share seems to confirm my theory. They can easily sell their new products, as people keep hoping that everything gets better in the next release. How sad.

BTW: the same seems to be true for all kinds of products. When did you last buy new HiFi equipment? I'm not thinking of MP3 players or the like, but of real quality! Was it 10 years ago? 20? Today you even have a hard time buying really good stereo amplifiers without a tuner and lots of features that most people don't use and that certainly do the sound quality no good.

Now I sit and wait for Christmas - erh - for my Ultra 60 to arrive. Somehow I am happy, although I spent money on something that could have been achieved in a more economic way. But what did I buy? A machine that is about five years old - you wouldn't dare do that with a PeeCee. Again, probably too good quality...

To Sun: find out at regular intervals what people are willing to pay for, and sell it to them. Lower the quality if needed. It would be too sad if you went bankrupt because the quality is too good. Think of HP's printers. They were once too good (my father still has an HP 5MP that is about eight years old).

Just my $0.02...

Tuesday, April 18, 2006

all key metrics at a glance

When one wants to know what is happening on a system, most people start top. Solaris users have prstat, which has some advantages, but misses some values listed by top. So what do you do? I'll tell you what I have been doing all the time: starting each of the following in a separate xterm: vmstat, iostat, prstat.

This is very informative, but it presents values I don't really care about and misses some I would like to have. So I wrote sysstat, which presents all the metrics I want at a single glance. Look here:

If you like this output, you can get the sources of the software here.

Thursday, April 13, 2006

xjobs & DTrace

After having integrated all the features that I considered most important for xjobs, I thought about giving it some tuning. Some things came right to mind (object caching, more efficient token passing, and so on). But then I wanted to see what DTrace could tell me.

Unfortunately, I forgot to log my tests directly, so I reran the tests against the previous release for this blog. The old release misses the object-cache and token tunings that I added before turning to DTrace, but the numbers should be roughly the same, as the D script only looks at syscall times.

I used the following, fairly simple script that sums up the times spent in syscalls:

#!/usr/sbin/dtrace -s

syscall:::entry
/ execname == $$1 /
{
	self->ts = timestamp;
}

syscall:::return
/ self->ts /
{
	@[probefunc] = sum(timestamp - self->ts);
	self->ts = 0;
}

Then I did the following:

maierkom@aquila:~$ systime xjobs
dtrace: script '/home/maierkom/bin/sun4/systime' matched 460 probes
maierkom@aquila:~$ bg
[1] systime xjobs &
maierkom@aquila:~$ /usr/bin/ls -1 /usr/sbin | src/xjobs-20060412/xjobs -v2 echo >> & /tmp/12
maierkom@aquila:~$ fg
systime xjobs

issetugid 8500
fstat64 11800
setcontext 14700
sysconfig 18400
ioctl 21400
read 30300
getcwd 43400
brk 62600
stat 95700
resolvepath 110900
memcntl 158400
schedctl 2193000
lwp_self 2504800
getrlimit 3253400
fstat 4133900
getpid 4888400
lwp_sigmask 4929800
setpgrp 5193200
unlink 6338700
fcntl 7303600
lstat64 10207400
mmap 12543500
munmap 12776900
write 15068100
open 19000800
exece 432717100
access 787848500
fork1 1721425000
close 87198287700
waitsys 103746109400

OK, waitsys is big. I guess that's OK. But WTF is the matter with close? xjobs has to close all open file descriptors that are not needed by the jobs it forks. So I changed the file descriptor handling and set all descriptors to be closed on exec. Additionally, access uses a lot of time. When the utility to execute is given on the command line of xjobs, one only needs to search the PATH once. OK, then let's do this caching.

After adding these changes and the user-code tunings mentioned above, I get the following results:

dtrace: script '/home/maierkom/bin/sun4/systime' matched 460 probes
maierkom@aquila:~$ bg
[1] systime xjobs &
maierkom@aquila:~$ /usr/bin/ls -1 /usr/sbin | src/xjobs-20060413/xjobs -v2 echo > & /tmp/13
maierkom@aquila:~$ fg
systime xjobs

getrlimit 7100
issetugid 8700
fstat64 13900
setcontext 17400
sysconfig 19800
ioctl 23700
read 36500
getcwd 57200
brk 73300
resolvepath 106500
stat 110700
memcntl 185800
schedctl 1784700
lwp_self 2381700
fstat 2850800
setpgrp 3608200
lwp_sigmask 4380400
getpid 5376800
unlink 5641200
access 5886200
munmap 6634300
mmap 7548900
lstat64 7577200
fcntl 12572000
write 16043800
open 17121400
close 19644800
waitsys 32769800
fork1 119443300
exece 368008400

That looks much better. Now, let's take a look at the overall numbers:

maierkom@aquila:~$ /usr/bin/ls -1 /usr/sbin | timex src/xjobs-20060412/xjobs -v2 echo > /tmp/12

real 6.86
user 17.18
sys 4.81

maierkom@aquila:~$ /usr/bin/ls -1 /usr/sbin | timex src/xjobs-20060413/xjobs -v2 echo > /tmp/13

real 0.68
user 0.13
sys 0.48

WOW! DTrace, you made my day! A factor of ten for real and system time. xjobs is now really fast.

Car stereo requirements

Having a new car stereo that plays MP3 CDs is really a Nice Thing(TM). But what's odd about it are the new requirements that come with it.

It started when I began encoding some of my favorite CDs so I could listen to them while driving. First I did everything by hand using cdda2wav and lame. Thinking I would probably be doing it again and again, I wrote a shell script to make it a little more convenient and to automate the process of looking up the song title, artist, and album name and encoding them into the file name and directory.

Fine, but then I thought: my machine has two processors, so I could run two lame processes at once and get the songs encoded to MP3 in half the time. Looking at the utilities Solaris ships with, I couldn't find anything appropriate. There was my odd requirement. So I wrote xjobs and published it under GPLv2. Get it here.

Now my shell script pipes its output directly to xjobs, and xjobs takes care that there are always two lame processes running. Convenient and fast.