Fred Zwarts
Posted: Wed Aug 01, 2007 4:17 pm
Post subject: Problem with NFS caching
Archived from groups: comp.os.linux.development.system

We have a data-acquisition system that needs to acquire data at a steady continuous rate
(of a few MB/s) and write the data to a file on an NFS server. We have measured that
NFS is able to sustain a long-term average rate of more than 20 MB/s, so it should not
be a problem.
However, the problem we see is that our system runs for some time (of the order of
1 minute) and then it stops completely for a few seconds, then it runs again for some
time, stops again for a few seconds, etc. During the time that the system stops,
some measurements are missed by the acquisition part of the system.
If the system writes to a file on a local disk, this problem is not seen, but the idea is
that the acquisition system should run on a small embedded diskless system.
If we monitor the Linux system, we see that during the seconds that the acquisition
stops there is a lot of network activity. The system then sends a lot of data.
We think that this is due to the way that the NFS client writes the cached data.
It seems that during the cache flush from the NFS client to the NFS-server,
any write to the file is blocked.
This hypothesis is supported by the fact that we see that the sync command
(or a call to the sync function) also stops the acquisition for a few seconds
and causes a peak in the network traffic.
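
One way to check this hypothesis from the application side is to time every write call and look for outliers. What follows is only a minimal Python sketch, not the acquisition code itself; the names `timed_writes` and `stalls` are hypothetical, and the chunk size, count and threshold are whatever suits the measurement.

```python
import os
import time


def timed_writes(path, chunk, count):
    """Write `count` copies of `chunk` to `path`; return per-write latencies in seconds."""
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for _ in range(count):
            start = time.monotonic()
            view = memoryview(chunk)
            while len(view) > 0:  # os.write may write fewer bytes than requested
                written = os.write(fd, view)
                view = view[written:]
            latencies.append(time.monotonic() - start)
    finally:
        os.close(fd)
    return latencies


def stalls(latencies, threshold):
    """Return (index, latency) pairs for writes that took longer than `threshold` seconds."""
    return [(i, t) for i, t in enumerate(latencies) if t > threshold]
```

Running `timed_writes` once against a file on the NFS mount and once against a local file, then comparing `stalls(latencies, 1.0)` for the two runs, would show whether individual writes really block for seconds at a time.
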
I have been reading some documents about the NFS protocol. I understand
that there is a good reason why the NFS client keeps as much data as
possible in its cache before updating the NFS server. This reduces the network
traffic. What I do not understand is why writing to a file is blocked during the
cache flush. Why doesn't it use e.g. a double buffering scheme? This would improve
the performance a lot (because it allows an overlap of flushing one buffer and
filling the next one) and would also allow for a steady continuous data write rate.
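
A double-buffering scheme of this kind can at least be approximated in the application, independent of what the kernel does. The following is a minimal Python sketch, assuming a dedicated writer thread fed through a bounded queue; the class name `BackgroundWriter` and its parameters are hypothetical, and it does not change how the NFS client flushes its cache.

```python
import os
import queue
import threading


class BackgroundWriter:
    """Hand filled buffers to a writer thread so the producer never calls write() itself."""

    def __init__(self, path, depth=2):
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        # depth=2 is classic double buffering: one buffer being written, one being filled.
        self._queue = queue.Queue(maxsize=depth)
        self._error = None
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, data):
        """Queue one filled buffer; blocks only when `depth` buffers are already waiting."""
        self._queue.put(bytes(data))

    def close(self):
        """Drain the queue, flush the file, and re-raise any write error."""
        self._queue.put(None)  # sentinel: tells the writer thread to stop
        self._thread.join()
        try:
            if self._error is None:
                os.fsync(self._fd)
        finally:
            os.close(self._fd)
        if self._error is not None:
            raise self._error

    def _run(self):
        while True:
            data = self._queue.get()
            if data is None:
                return
            if self._error is not None:
                continue  # keep draining so submit() cannot deadlock after a failure
            try:
                view = memoryview(data)
                while len(view) > 0:  # handle partial writes
                    written = os.write(self._fd, view)
                    view = view[written:]
            except OSError as exc:
                self._error = exc
```

The acquisition loop would call `submit()` for each filled buffer and `close()` at the end. The queue only hides a stall if it can hold everything produced while the writer is blocked: for a stall of a few seconds at a few MB/s, `depth` times the buffer size has to cover that much data.
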
I have seen this behavior on two very different Linux NFS clients:
An ELinOS V4.0 system (kernel version 2.6.15) running on a PowerPC processor
(embedded in a VME system) and a Suse Enterprise V10 system (kernel version
2.6.16.27-0.9-smp) on an AMD-64 dual-core processor (in a desktop PC system).
I cannot reproduce it on a Fedora core 3 system (kernel version 2.6.9-1.667smp)
on a Pentium processor (in a desktop PC system).
I wonder whether this problem is present only in certain Linux versions.
Is this a bug or is it designed as such?
I further wonder whether there are parameters in the NFS client that can change this
behavior.
I would appreciate suggestions for solving this problem.
(Now that I found this problem, I think that I also understand why compiling and
linking large systems takes so much time on some systems. There I also see periods
of several seconds in which there is almost no CPU time consumption and a lot of
network traffic, probably caused by the same problem.)
Regards,
Fred.Zwarts.

phil-news-nospam
Posted: Wed Aug 01, 2007 4:31 pm
Post subject: Re: Problem with NFS caching

On Wed, 1 Aug 2007 16:17:26 +0200 Fred Zwarts <F.Zwarts.RemoveThis@kvi.nl> wrote:
| We have a data-acquisition system that needs to acquire data at a steady continuous rate
| (of a few MB/s) and write the data to a file on an NFS server. We have measured that
| NFS is able to sustain a long-term average rate of more than 20 MB/s, so it should not
| be a problem.
| However, the problem we see is that our system runs for some time (of the order of
| 1 minute) and then it stops completely for a few seconds, then it runs again for some
| time, stops again for a few seconds, etc. During the time that the system stops,
| some measurements are missed by the acquisition part of the system.
| If the system writes to a file on a local disk, this problem is not seen, but the idea is
| that the acquisition system should run on a small embedded diskless system.
|
| If we monitor the Linux system, we see that during the seconds that the acquisition
| stops there is a lot of network activity. The system then sends a lot of data.
|
| We think that this is due to the way that the NFS client writes the cached data.
| It seems that during the cache flush from the NFS client to the NFS-server,
| any write to the file is blocked.
| This hypothesis is supported by the fact that we see that the sync command
| (or a call to the sync function) also stops the acquisition for a few seconds
| and causes a peak in the network traffic.
|
| I have been reading some documents about the NFS protocol. I understand
| that there is a good reason why the NFS client keeps as much data as
| possible in its cache before updating the NFS server. This reduces the network
| traffic. What I do not understand is why writing to a file is blocked during the
| cache flush. Why doesn't it use e.g. a double buffering scheme? This would improve
| the performance a lot (because it allows an overlap of flushing one buffer and
| filling the next one) and would also allow for a steady continuous data write rate.
|
| I have seen this behavior on two very different Linux NFS clients:
| An ELinOS V4.0 system (kernel version 2.6.15) running on a PowerPC processor
| (embedded in a VME system) and a Suse Enterprise V10 system (kernel version
| 2.6.16.27-0.9-smp) on an AMD-64 dual-core processor (in a desktop PC system).
| I cannot reproduce it on a Fedora core 3 system (kernel version 2.6.9-1.667smp)
| on a Pentium processor (in a desktop PC system).
| I wonder whether this problem is present only in certain Linux versions.
| Is this a bug or is it designed as such?
| I further wonder whether there are parameters in the NFS client that can change this
| behavior.
| I would appreciate suggestions for solving this problem.
|
| (Now that I found this problem, I think that I also understand why compiling and
| linking large systems takes so much time on some systems. There I also see periods
| of several seconds in which there is almost no CPU time consumption and a lot of
| network traffic, probably caused by the same problem.)
This is (one reason) why I avoid using NFS altogether. I would rather have
a specific protocol and a daemon on a server handling it. In some cases it
can be a file transfer protocol and daemon (HTTP/POST, HTTP/PUT, FTP, RSYNC).
In others it might be custom.
--
|---------------------------------------/----------------------------------|
| Phil Howard KA9WGN (ka9wgn.ham.org) / Do not send to the address below |
| first name lower case at ipal.net / spamtrap-2007-08-01-1129.RemoveThis@ipal.net |
|------------------------------------/-------------------------------------|

David Schwartz
Posted: Fri Aug 03, 2007 4:00 pm
Post subject: Re: Problem with NFS caching

On Aug 1, 9:31 am, phil-news-nos... DeleteThis @ipal.net wrote:
> This is (one reason) why I avoid using NFS altogether. I would rather have
> a specific protocol and a daemon on a server handling it. In some cases it
> can be a file transfer protocol and daemon (HTTP/POST, HTTP/PUT, FTP, RSYNC).
> In others it might be custom.
Yeah. I think he has four choices:
1) Use an application protocol (ftp, scp, rsync, or custom) rather
than NFS.
2) Try other remote network protocols like CIFS.
3) Use a custom or semi-custom NFS client in your application.
4) Try to tweak/tune NFS to use less/no kernel buffering and buffer
the writes in your application.
DS
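
As an illustration of choice 4, the application can limit how much unwritten data the kernel is allowed to accumulate by forcing a flush after a fixed number of bytes. This is a minimal Python sketch under that assumption; the function name `write_with_bounded_backlog` and the `flush_bytes` parameter are hypothetical, and whether many small flushes actually avoid the multi-second stall on the affected kernels has not been verified here.

```python
import os

# fdatasync is not available on every platform; fall back to fsync where it is missing.
_flush = getattr(os, "fdatasync", os.fsync)


def write_with_bounded_backlog(fd, chunks, flush_bytes):
    """Write each chunk to `fd`, forcing a flush once `flush_bytes` bytes are pending.

    Returns the number of explicit flushes that were issued.
    """
    pending = 0
    flushes = 0
    for chunk in chunks:
        view = memoryview(chunk)
        while len(view) > 0:  # handle partial writes
            written = os.write(fd, view)
            view = view[written:]
        pending += len(chunk)
        if pending >= flush_bytes:
            _flush(fd)  # blocks until the pending data has been written out
            pending = 0
            flushes += 1
    return flushes
```

Each flush blocks the caller, so in an acquisition system this loop would belong in a separate writer thread rather than in the acquisition path. Opening the file with `os.O_SYNC` is the extreme form of the same idea, where every write waits for the data to be written out, at the cost of throughput.
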