• NFS - only one client at a time can read files

    From David Brown@1:0/0 to All on Fri Sep 20 14:07:05 2013
    I have a strange problem with NFS.

    I have NFS serving enabled on my Fedora 14 workstation, exporting a
    directory with these options:

    (ro,no_root_squash,sync,no_subtree_check)


    I have an embedded Linux card that is getting its kernel and rootfs from
    this export over NFS. The card is then copying a subdirectory of this
    export onto a flash-mounted file system using rsync (roughly as "rsync
    -av /unpacked/ /mnt/" ).
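    For reference, the server side looks roughly like this - the directory
    path and client subnet here are placeholders, not my real ones:

    ```shell
    # /etc/exports entry on the server:
    #
    #   /srv/unpacked  192.168.1.0/24(ro,no_root_squash,sync,no_subtree_check)
    #
    # After editing, re-export with:
    #   exportfs -ra
    ```
    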

    When I run with one card, this works fine.

    When I have multiple cards connected, each one gets its kernel and
    mounts its rootfs fine, but it seems that only one client card can read
    at a time. If I watch the progress of the rsyncs, I can see one card
    will run for a bit, then stop and complain about nfs timeouts. Another
    card will run for a bit, before it too stops with a timeout. This goes
    back and forth - at any given time, only one client is successfully reading.


    Any ideas as to what might be wrong, or what I can check, would be
    appreciated.


    David

    --- MBSE BBS v1.0.0 (GNU/Linux-i386)
    * Origin: The Kofo System II BBS telnet://fido2.kofobb
  • From unruh@1:0/0 to All on Fri Sep 20 15:15:01 2013
    On 2013-09-20, David Brown <david@westcontrol.removethisbit.com> wrote:
    > I have a strange problem with NFS.
    > [...]
    > When I have multiple cards connected, each one gets its kernel and
    > mounts its rootfs fine, but it seems that only one client card can
    > read at a time. [...] at any given time, only one client is
    > successfully reading.

    Disks have only one read head. It cannot be in two places at once.




    --- MBSE BBS v1.0.0 (GNU/Linux-i386)
    * Origin: The Kofo System II BBS telnet://fido2.kofobb
  • From Chris Davies@110:110/2002 to All on Fri Sep 20 15:54:20 2013
    Reply-To: chris@roaima.co.uk

    David Brown <david@westcontrol.removethisbit.com> wrote:
    > I have a strange problem with NFS.
    > [...]
    > If I watch the progress of the rsyncs, I can see one card will run for
    > a bit, then stop and complain about nfs timeouts. Another card will
    > run for a bit, before it too stops with a timeout. This goes back and
    > forth - at any given time, only one client is successfully reading.

    This sounds like you don't have anywhere near enough rpc/nfsd daemons
    on your NFS server.

    On my Debian box there's a file /etc/default/nfs-kernel-server that
    defines the number of kernel nfsd "processes" to start at boot time. (I
    don't know where your equivalent configuration file will live.) The
    default on my system is 8, but you probably want to increase it to 32
    or even 64.

    To test the theory, count the number of nfsd processes already running:

        ps -ef | grep -w '[n]fsd' | wc -l

    And then increase it, for example from 8 to 32:

        rpc.nfsd 32

    If this works, you can configure it for boot-time.
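    A quick sketch of the check-then-raise procedure (the rpc.nfsd
    invocation and the RPCNFSDCOUNT variable name are from memory, so
    verify them against your own distro's packaging):

    ```shell
    # Count the kernel nfsd threads currently running.
    count=$(ps -ef | grep -w '[n]fsd' | wc -l | tr -d ' ')
    echo "nfsd threads: $count"

    # Raising the count at runtime needs root; on most distros the tool
    # is rpc.nfsd (from nfs-utils / nfs-kernel-server):
    #   rpc.nfsd 32
    #
    # To make it stick across reboots, set the count in the distro's
    # config file, e.g. RPCNFSDCOUNT=32 in /etc/default/nfs-kernel-server
    # on Debian.
    ```
    
    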
    Chris

    --- MBSE BBS v1.0.0 (GNU/Linux-i386)
    * Origin: Roaima. Harrogate, North Yorkshire, UK (110:110/2002@linuxnet)
  • From Tauno Voipio@110:110/2002 to All on Fri Sep 20 17:37:13 2013
    On 20.9.13 5:07 , David Brown wrote:
    > I have a strange problem with NFS.
    > [...]
    > When I have multiple cards connected, each one gets its kernel and
    > mounts its rootfs fine, but it seems that only one client card can
    > read at a time. [...]


    This feels like rsync is doing exclusive access to the concerned files,
    to prevent shooting at a moving target.

    For boot disk copying, an image file and dd may be better.

    --

    Tauno Voipio


    --- MBSE BBS v1.0.0 (GNU/Linux-i386)
    * Origin: A noiseless patient Spider (110:110/2002@linuxnet)
  • From Chris Davies@110:110/2002 to All on Sat Sep 21 13:00:57 2013
    Reply-To: chris@roaima.co.uk

    Tauno Voipio <tauno.voipio@notused.fi.invalid> wrote:
    > This feels like rsync is doing exclusive access to the concerned files,
    > to prevent shooting at a moving target.

    I've never seen rsync grab exclusive access to files. It could more
    likely occur over SMB/CIFS, which provides file locking by default,
    but not over NFS.

    Chris

    --- MBSE BBS v1.0.0 (GNU/Linux-i386)
    * Origin: Roaima. Harrogate, North Yorkshire, UK (110:110/2002@linuxnet)
  • From Tauno Voipio@110:110/2002 to All on Sat Sep 21 15:18:52 2013
    On 21.9.13 4:00 , Chris Davies wrote:
    > Tauno Voipio <tauno.voipio@notused.fi.invalid> wrote:
    >> This feels like rsync is doing exclusive access to the concerned files,
    >> to prevent shooting at a moving target.
    >
    > I've never seen rsync grab exclusive access to files. It could more
    > likely occur over SMB/CIFS, which provides file locking by default,
    > but not over NFS.

    Thanks for correcting. I was too lazy to wade through the sources.

    --

    -Tauno


    --- MBSE BBS v1.0.0 (GNU/Linux-i386)
    * Origin: A noiseless patient Spider (110:110/2002@linuxnet)
  • From David Brown@1:0/0 to All on Mon Sep 23 07:28:35 2013
    On 20/09/13 17:15, unruh wrote:
    > On 2013-09-20, David Brown <david@westcontrol.removethisbit.com> wrote:
    >> I have a strange problem with NFS.
    >> [...]
    >
    > Disks have only one read head. It cannot be in two places at once.

    Disks are not the bottleneck. The whole shared area is small enough to
    sit in the server's RAM cache.



    --- MBSE BBS v1.0.0 (GNU/Linux-i386)
    * Origin: The Kofo System II BBS telnet://fido2.kofobb
  • From David Brown@1:0/0 to All on Mon Sep 23 07:34:16 2013
    On 21/09/13 15:00, Chris Davies wrote:
    > Tauno Voipio <tauno.voipio@notused.fi.invalid> wrote:
    >> This feels like rsync is doing exclusive access to the concerned files,
    >> to prevent shooting at a moving target.

    That would sound right - except that I agree with Chris' point below
    that rsync does not lock files in any way. (I've often seen large
    rsyncs end with a message saying that some files changed during the
    rsync run.)

    > I've never seen rsync grab exclusive access to files. It could more
    > likely occur over SMB/CIFS, which provides file locking by default,
    > but not over NFS.



    --- MBSE BBS v1.0.0 (GNU/Linux-i386)
    * Origin: The Kofo System II BBS telnet://fido2.kofobb
  • From David Brown@1:0/0 to All on Mon Sep 23 07:44:24 2013
    On 20/09/13 17:54, Chris Davies wrote:
    > This sounds like you don't have anywhere near enough rpc/nfsd daemons
    > on your NFS server.
    > [...]
    > To test the theory, count the number of nfsd processes already running
    >     ps -ef | grep -w '[n]fsd' | wc -l
    > [...]

    This sounds like a possible explanation. A quick check shows that I
    have 8 nfsd threads running. Rsync almost certainly needs several
    connections while it is working, as it runs through the source tree to
    see what it should be copying - contention for the nfs connection
    threads could be the cause.

    I'll have to translate your commands here from "Debian" into "Fedora
    14", but now that I know what I am looking for, Google can help with
    the translation. Later on, this whole thing will run on a Debian
    server - but at the moment it is prototyping on my (outdated) Fedora
    desktop.

    Additionally, the mere act of talking about the problem has suggested
    an alternative solution. I am copying a bunch of data from one
    computer to another using rsync - why not just use an rsync server?
    (The historical answer is that the copy was originally a "cp -a"
    rather than an "rsync -a".)

    Thanks for the help,

    David





    --- MBSE BBS v1.0.0 (GNU/Linux-i386)
    * Origin: The Kofo System II BBS telnet://fido2.kofobb
  • From David Brown@1:0/0 to All on Mon Sep 23 09:51:45 2013
    On 23/09/13 09:44, David Brown wrote:
    > On 20/09/13 17:54, Chris Davies wrote:
    >> This sounds like you don't have anywhere near enough rpc/nfsd daemons
    >> on your NFS server.
    >> [...]
    >
    > This sounds like a possible explanation. A quick check shows that I
    > have 8 nfsd threads running. [...]
    >
    > Additionally, the mere act of talking about the problem has suggested
    > an alternative solution. [...] Why not just use an rsync server?

    I've now changed the thread count in /etc/sysconfig/nfs to 64 and
    restarted the nfs server - it made no difference that I could see, but
    my testing was done with a copy to tmpfs on the clients rather than to
    the NAND filesystem (since that takes 20 seconds rather than 12
    minutes). So I am not convinced that the nfs threads are the whole
    answer, but can't yet rule them out. And it should certainly do no harm
    to leave them at 64.

    In the end, I am copying a compressed tarball from the server onto the
    client's tmpfs with a simple "cp" on NFS - this takes about 3 seconds.
    It will not matter if it takes x * 3 seconds for "x" cards in parallel.
    Unpacking these tarballs into the NAND is now an entirely local
    operation on the cards, and will therefore be free from any issues with
    the server or network. It is also faster even for one card.
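    In case it helps anyone, the whole scheme boils down to something like
    the following sketch - all the paths are made up, and local temp
    directories stand in here for the NFS export, the card's tmpfs and the
    NAND mount:

    ```shell
    set -e
    export_dir=$(mktemp -d)   # stands in for the NFS export on the server
    tmpfs_dir=$(mktemp -d)    # stands in for the card's tmpfs
    nand_dir=$(mktemp -d)     # stands in for the NAND mount point

    # Server side: pack the rootfs subdirectory into one tarball, once.
    mkdir -p "$export_dir/unpacked/etc"
    echo hello > "$export_dir/unpacked/etc/motd"
    tar -czf "$export_dir/rootfs.tar.gz" -C "$export_dir/unpacked" .

    # Card side: one sequential read over NFS, then a purely local unpack.
    cp "$export_dir/rootfs.tar.gz" "$tmpfs_dir/"
    tar -xzf "$tmpfs_dir/rootfs.tar.gz" -C "$nand_dir"
    cat "$nand_dir/etc/motd"    # prints "hello"
    ```
    
    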

    Other than that, I will also do testing with the network setup here.
    The cards are currently running across our main LAN, which is well
    over-due for a re-organisation after many years of "organic" growth.

    But I am happy for now with the tarball copying solution.


    --- MBSE BBS v1.0.0 (GNU/Linux-i386)
    * Origin: The Kofo System II BBS telnet://fido2.kofobb
  • From Chris Davies@110:110/2002 to All on Fri Sep 27 09:40:32 2013
    Reply-To: chris@roaima.co.uk

    David Brown <david@westcontrol.removethisbit.com> wrote:
    > Additionally, the mere act of talking about the problem has suggested
    > an alternative solution. I am copying a bunch of data from one
    > computer to another computer using rsync. Why not just use an rsync
    > server?

    If your file transfer is network bound then rsync as two separate
    processes (client & server) should run faster than a single process
    accessing a remote filesystem. If the bottleneck is elsewhere it won't
    help, as single-process rsync falls back to a basic copy.
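    A minimal sketch of the two-process setup - the module name and paths
    here are invented, so adjust to taste:

    ```shell
    # /etc/rsyncd.conf on the server - a minimal read-only module:
    #
    #   [rootfs]
    #       path = /srv/unpacked
    #       read only = yes
    #
    # Start the daemon on the server:
    #   rsync --daemon
    #
    # On each card, pull over the rsync protocol instead of NFS:
    #   rsync -av rsync://server/rootfs/ /mnt/
    ```
    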

    Chris

    --- MBSE BBS v1.0.0 (GNU/Linux-i386)
    * Origin: Roaima. Harrogate, North Yorkshire, UK (110:110/2002@linuxnet)
  • From David Brown@1:0/0 to All on Thu Oct 10 14:28:04 2013
    On 20/09/13 16:07, David Brown wrote:
    > I have a strange problem with NFS.
    > [...]
    > When I have multiple cards connected, each one gets its kernel and
    > mounts its rootfs fine, but it seems that only one client card can
    > read at a time. [...]

    I figured out my problem - I'm noting it here in case anyone ever reads
    these as archives.

    It turned out that there was a configuration fault in the rootfs I had
    mounted, leading to all the cards getting the same fixed IP address
    shortly after root was mounted. Which card could talk to the server at
    any moment therefore depended on which one had answered the ARP
    requests first - I'm surprised everything worked in the end. The fix
    was quite simple once I had found the problem (thanks to Wireshark and
    a managed switch with port mirroring).
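    If anyone hits the same thing without a managed switch to hand, a
    duplicate address can also be spotted from any Linux box on the
    segment - the interface name and address below are placeholders, and
    the arping here is the iputils one:

    ```shell
    # Watch ARP traffic for competing replies to the same IP (needs root):
    #   tcpdump -n -i eth0 arp
    #
    # Or probe explicitly - iputils arping has a duplicate address
    # detection mode that reports if anyone else claims the address:
    #   arping -D -I eth0 192.168.1.50
    ```
    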


    --- MBSE BBS v1.0.0 (GNU/Linux-i386)
    * Origin: The Kofo System II BBS telnet://fido2.kofobb