• SVDM - outbound calling issue

    From Deon George@1:103/705 to GitLab issue in main/sbbs on Wed Dec 21 20:50:08 2022
    open https://gitlab.synchro.net/main/sbbs/-/issues/468

    As per the discussion in Synchronet Discussion.SVDM is stops outbound calling after some time - where the mailer using it thinks the other end is "sleeping". It looks like outbound data is sent, but incoming data is not seen by SVDM and passed onto the mailer.However, inbound calling works fine, and if the inbound call causes the mailer to exit (and start the BBS), upon return, outbound calling works again for a period of time.Naturally stopping the mailer (and thus exiting SVDM) and restarting works fine.Attached is a log from dbgview, it shows:* After a fresh start of SVDM, outbound polling works successfully. I made several outbound polls after each other and they were all OK. See time stamp 08:33:57 for the first call, which was successful and completed 08:34:21 (not mail was actually transferred - so it was all EMSI chat). There were a couple of other polls up to 08:37:28 which were successful.* At 15:21:44 and outbound call failed. (From the other end, I can see the incoming connection and the start of the EMSI setup, but no response to it - and the calling mailer didnt see the chatter.What's obvious for me:* On the successful calls, there are two "recv returned 107" (probably as part of the EMSI setup), on the failed call there is only 1.* On the failed call SVDM is "TX: 2 bytes" twice a second, where on the successful calls this is normally in done twice, followed by a "TX: 15 bytes".Hope this helps[SVDM-DEON.zip](/uploads/2ee3fe679ac85bce0e2244b0b7c847cb/SVDM-DEON.zip)
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Fri Dec 23 11:22:51 2022
    https://gitlab.synchro.net/main/sbbs/-/issues/468#note_2959

    Are you using the receive data rate option (-r on the command-line, or "Rate" in the svdm.ini file) or the -R command-line option?It appears you are running with the debug (-d) option, so I would expect the "rx_delay" setting value to be output to the debug log when SVDM terminates. Normally, this value would be 0.I just committed a change that adds the clock values to several of the debug messages that will help me to eliminate the rx_delay setting/checking as the root cause of the issue (or not). Please run again with a build using that commit, or later.This smells like an issue sending or receiving over the socket, but I can't tell if that 2-byte TX's are actually getting through to the receiving app or not. A TCP packet dump (e.g. using Wireshark) could be helpful to understand whether this is a transmit or a receive problem.
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Deon George@1:103/705 to GitLab note in main/sbbs on Fri Dec 23 16:04:54 2022
    https://gitlab.synchro.net/main/sbbs/-/issues/468#note_2961

    OK, I'll get your debug file after I've submitted this.To answer your questions, no, dont use -r (or Rate) in the SVDM.INI file. And -d is only used since I started debugging this. (Not sure where the `debug log` is, but it outputs to the screen on exit: `rx_delay: 0`.)I start SVDM with "-raw -d -l -p2021 pop %1 %2 %3 %4" (but generally only start the bat file with 1 argument)Attached, I've created more debugging with dbgview - "a good session" and a "bad session", together with tcpdump on the linux side. (The times in the timestamps are in sync.)I'm guessing (and my memory of how this starts), on it, the mailer sends 0d 0a to start the EMSI session (the two bytes), where the server responds with two EMSI_REQ, the client should respond with an EMSI_DAT.In the good logs, you can see this starting.In the bad logs, I'm thinking that SVDM doesnt receive (or pass through) the EMSI_REQs as received, so the mailer is still sending the 0d 0a to start the session...[W7-SVDM-DEON.zip](/uploads/78432bfe07a1c8e4212a5f5222829848/W7-SVDM-DEON.zip)
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Fri Dec 23 18:21:42 2022
    https://gitlab.synchro.net/main/sbbs/-/issues/468#note_2962

    The Windows debug log is that thing that's viewable with the "dbgview" utility I referred you to. That said, the rx_delay value and other debug values that are output upon termination of SVDM are only logged to stdout (the console), not the debug log, and they're only output when the '-d' option is used. So I misspoke. In any case, an rx_delay value of 0 is good/normal.The SVDM debug log output in your attached W7*.TXT files is not from the current revision of svdm.exe in Git (the clock values I recently added and was looking for in the log messages are not there). Do you need me to build a new version and put it up for download on Vertrauen or can you grab an artifact from gitlab.synchro.net?Also, please attach the resulting cap/pcap files from Ethernet packet captures, not the plain text/hexadecimal output of tcpdump, so I can use a visual packet dissector (e.g. Wireshark), and not have to try and make sense of the layers of protocol headers in hexadecimal, myself.Thank you!
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Deon George@1:103/705 to GitLab note in main/sbbs on Sun Dec 25 15:31:15 2022
    https://gitlab.synchro.net/main/sbbs/-/issues/468#note_2965

    Right, I wrote the message, and captures with tcpdump *before* I got your new debug one (to answer the question that you couldnt tell if the two byte TX's were getting through - from the tcpdump, those two bytes are 0x20 0x0d) - and to help determine if it was a sending or receiving issue. (I'm betting the latter, since those two bytes made it to the server's tcpdump capture...). It was failing at the time, so it was easy to quickly grab that tcpdump.Anyway, since running the debug svdm (I got it from the build artifact from the pipeline linked to the commit 19545587 linked to this issue), it hasnt failed (two days so far). Nightly polling has worked, as well as the occasional forced poll...I'll leave it running as is for a few days and see if it does eventually fail... If it does, I'll get you the pcaps from tcpdump as well... Thanks and Merry Christmas :)
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Deon George@1:103/705 to GitLab note in main/sbbs on Mon Dec 26 14:32:03 2022
    https://gitlab.synchro.net/main/sbbs/-/issues/468#note_2967

    OK, I think I have it this time: The zip file has 3 files:* `W7-bad.log` is the log from DBGVIEW, running for a few days...* `w7-bad.pcap` is from 8:53:23am (line 1117). Just before 8:53 it failed, but I didnt have tcpdump running. When I started tcpdump, the next two sessions worked OK (and mail transferred - so a capture of a "good session"). I tried twice 15 mins later and it failed (9:10:33am), so this has the first failed sesssion and the below log file has the second.* `w7-bad-1.pcap` is just a single session showing receipt of the two bytes, but not response passed back to the mailer. (9:11:23am - from line 1340).Hope this helps...[SVDM-TAKE2.ZIP](/uploads/92a8fc2488b55a93fbcb464f1891857f/SVDM-TAKE2.ZIP)
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Mon Dec 26 16:24:37 2022
    https://gitlab.synchro.net/main/sbbs/-/issues/468#note_2969

    Okay, I added some more debug log output around select() for receive and incremented the version to 0.3.Try that and please supply the debug log output again around a failure scenario (the packet captures aren't necessary right now).I also did the drop file clean up thing you suggested on DOVE-Net.
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Deon George@1:103/705 to GitLab note in main/sbbs on Tue Dec 27 03:12:48 2022
    https://gitlab.synchro.net/main/sbbs/-/issues/468#note_2970

    Here you go:```00000212 10:10:25 PM [3768] SBBSVDM: RX command: ' ATD010003001 ' at 45592593500000213 10:10:25 PM [3768] SBBSVDM: dial(010003001)00000214 10:10:25 PM [3768] SBBSVDM: Connecting to port 11179 at host 'd-11-1.leenooks.lan' via Raw00000215 10:10:25 PM [3768] SBBSVDM: C:\GitLab-Runner\builds\4S9KMgUy\2\main\sbbs\src\vdmodem\vdmodem.c 640 calling getaddrinfo00000216 10:10:25 PM [3768] SBBSVDM: C:\GitLab-Runner\builds\4S9KMgUy\2\main\sbbs\src\vdmodem\vdmodem.c 662 calling connect00000217 10:10:25 PM [3768] SBBSVDM: Setting socket options00000218 10:10:25 PM [3768] SBBSVDM: Socket send buffer length: 8192 bytes00000219 10:10:25 PM [3768] SBBSVDM: Socket receive buffer length: 8192 bytes00000220 10:10:25 PM [3768] SBBSVDM: Socket TCP_NODELAY: 000000221 10:10:25 PM [3768] SBBSVDM: C:\GitLab-Runner\builds\4S9KMgUy\2\main\sbbs\src\vdmodem\vdmodem.c 714 connected!00000222 10:10:25 PM [3768] SBBSVDM: Modem response: CONNECT 960000000223 10:10:25 PM [3768] SBBSVDM: select returned 1 at 45592617200000224 10:10:25 PM [3768] SBBSVDM: recv returned 10700000225 10:10:27 PM [3884] SBBS: !VDD_OUTBUF_PURGE: NOT IMPLEMENTED00000226 10:10:28 PM [3768] SBBSVDM: New esc count = 0 at 45592922200000227 10:10:28 PM [3768] SBBSVDM: TX: 2 bytes00000228 10:10:28 PM [3768] SBBSVDM: TX: 2 bytes00000229 10:10:29 PM [3768] SBBSVDM: TX: 2 bytes00000230 10:10:30 PM [3768] SBBSVDM: TX: 2 bytes00000231 10:10:30 PM [3768] SBBSVDM: TX: 2 bytes00000232 10:10:31 PM [3768] SBBSVDM: TX: 2 bytes00000233 10:10:32 PM [3768] SBBSVDM: TX: 2 bytes00000234 10:10:32 PM [3768] SBBSVDM: TX: 2 bytes00000235 10:10:33 PM [3768] SBBSVDM: TX: 2 bytes00000236 10:10:33 PM [3768] SBBSVDM: TX: 2 bytes00000237 10:10:34 PM [3768] SBBSVDM: TX: 2 bytes00000238 10:10:35 PM [3768] SBBSVDM: TX: 2 bytes00000239 10:10:35 PM [3768] SBBSVDM: TX: 2 bytes00000240 10:10:36 PM [3768] SBBSVDM: TX: 2 bytes00000241 10:10:36 PM [3768] SBBSVDM: TX: 2 bytes00000242 10:10:36 PM [3768] SBBSVDM: hangup_event signaled at 45593763600000243 10:10:36 PM [3884] SBBS: !VDD_OUTBUF_PURGE: NOT IMPLEMENTED00000244 10:10:36 PM [3884] SBBS: !VDD_OUTBUF_PURGE: NOT IMPLEMENTED00000245 10:10:36 PM [3768] SBBSVDM: RX command: ' ATZ ' at 45593780900000246 10:10:37 PM [3768] SBBSVDM: Modem response: OK00000247 10:10:37 PM [3768] SBBSVDM: RX command: 'ATI ' at 45593806200000248 10:10:37 PM [3768] SBBSVDM: Modem response: Synchronet Virtual DOS Modem for Windows v0.3 Copyright 2022 Rob Swindell00000249 10:10:37 PM [3768] HEAD/2a6d998300000250 10:10:37 PM [3768]00000251 10:10:37 PM [3768] OK00000252 10:10:38 PM [3884] SBBS: !VDD_OUTBUF_PURGE: NOT IMPLEMENTED```
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Deon George@1:103/705 to GitLab note in main/sbbs on Tue Dec 27 16:04:26 2022
    https://gitlab.synchro.net/main/sbbs/-/issues/468#note_2972

    I neglected to give you this when svdm exited:```rx_delay: 0rx_buflen: 10000largest recv: 1030lasttx: 502008272lastrx: 501996714lastring: 0timer: 502013604```
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Fri Dec 30 17:54:45 2022
    https://gitlab.synchro.net/main/sbbs/-/issues/468#note_2983

    I'm not really seeing any clues as to the cause yet - nothing to point the finger at SVDM, specifically.So far the packet captures have been from the server-side (your "linux mailer"). Can you provide a packet capture for the Windows client side (running SVDM)? Hopefully this will prove if the missing 107 bytes (that POP is looking for as part of the EMSI exchange) were actually received by the network stack on the client.
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Deon George@1:103/705 to GitLab note in main/sbbs on Fri Dec 30 18:33:44 2022
    https://gitlab.synchro.net/main/sbbs/-/issues/468#note_2984

    SVDM confirms receipt of the 107 bytes:```00000224 10:10:25 PM [3768] SBBSVDM: recv returned 107```Also my TCP networking is a little rusty, but given this is TCP, from the tcpdump that I took it can validate that the receiver (Windows) got my packet right? IE (looking at the `w7-bad-1.pcap` file):I `...fe98:b5d1` sent (the 107 bytes - confirmed by `length 107`) as `SEQuence 1:108`:```09:11:24.459482 IP6 fd00:368:0:1000:20c:29ff:fe98:b5d1.11179 > fd00:368:0:1000:29a7:8f39:26b1:9e6.49366: Flags [P.], seq 1:108, ack 1, win 507, length 107 0x0000: 6001 989b 007f 0640 fd00 0368 0000 1000 `......@...h.... 0x0010: 020c 29ff fe98 b5d1 fd00 0368 0000 1000 ..)........h.... 0x0020: 29a7 8f39 26b1 09e6 2bab c0d6 a286 a9c0 )..9&...+....... 0x0030: a8d2 2a12 5018 01fb eb44 0000 2a2a 454d ..*.P....D..**EM 0x0040: 5349 5f52 4551 4137 3745 0808 0808 0808 SI_REQA77E...... 0x0050: 0808 0808 0808 0808 5468 6973 2069 7320 ........This.is. 0x0060: 6120 6d61 696c 206f 6e6c 7920 7379 7374 a.mail.only.syst 0x0070: 656d 202d 2075 6e6c 6573 7320 796f 7520 em.-.unless.you. 0x0080: 6172 6520 6120 6d61 696c 6572 2c20 796f are.a.mailer,.yo 0x0090: 7520 7368 6f75 6c64 2064 6973 636f 6e6e u.should.disconn 0x00a0: 6563 7420 3a29 0d ect.:).```The windows system `...26b1:9e6` provided both a SACK and an ACK `ack 108`:```09:11:24.462327 IP6 fd00:368:0:1000:29a7:8f39:26b1:9e6.49366 > fd00:368:0:1000:20c:29ff:fe98:b5d1.11179: Flags [.], ack 108, win 16533, options [nop,nop,sack 1 {1:108}], length 0 0x0000: 6000 0000 0020 0680 fd00 0368 0000 1000 `..........h.... 0x0010: 29a7 8f39 26b1 09e6 fd00 0368 0000 1000 )..9&......h.... 0x0020: 020c 29ff fe98 b5d1 c0d6 2bab a8d2 2a12 ..).......+...*. 0x0030: a286 aa2b 8010 4095 a956 0000 0101 050a ...+..@..V...... 0x0040: a286 a9c0 a286 aa2b .......+09:11:27.695760 IP6 fd00:368:0:1000:29a7:8f39:26b1:9e6.49366 > fd00:368:0:1000:20c:29ff:fe98:b5d1.11179: Flags [P.], seq 1:3, ack 108, win 16533, length 2 0x0000: 6000 0000 0016 0680 fd00 0368 0000 1000 `..........h.... 0x0010: 29a7 8f39 26b1 09e6 fd00 0368 0000 1000 )..9&......h.... 0x0020: 020c 29ff fe98 b5d1 c0d6 2bab a8d2 2a12 ..).......+...*. 0x0030: a286 aa2b 5018 4095 5850 0000 200d ...+P.@.XP....```The `ACK` included the two bytes (TX: 2 bytes) 0x20 0x0d - as sequence 1:3, which my linux server also ACKed as `ack 3`:```09:11:27.695901 IP6 fd00:368:0:1000:20c:29ff:fe98:b5d1.11179 > fd00:368:0:1000:29a7:8f39:26b1:9e6.49366: Flags [.], ack 3, win 507, length 0 0x0000: 6001 989b 0014 0640 fd00 0368 0000 1000 `......@...h.... 0x0010: 020c 29ff fe98 b5d1 fd00 0368 0000 1000 ..)........h.... 0x0020: 29a7 8f39 26b1 09e6 2bab c0d6 a286 aa2b )..9&...+......+ 0x0030: a8d2 2a14 5010 01fb ead9 0000 ..*.P.......```But this did give me an idea - I'm wondering if it is timing issue, ie: I send the EMSI_REQ immediately upon connection and perhaps it got lost (?) before the mailer was ready to consume it? Since the mailer hasnt seen it, it is sending the 0x20 0x0d and not getting a reply (because I already send the EMSI_REQ).(Not sure why then restarting SVDM resolves it.)So is there anyway to proof the mailer picked up the 107 bytes?I'll also get my mailer, to resend the EMSI_REQ a second time (or even delay a few seconds after connect), to see if that helps...
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Deon George@1:103/705 to GitLab note in main/sbbs on Fri Dec 30 19:21:03 2022
    https://gitlab.synchro.net/main/sbbs/-/issues/468#note_2987

    OK, it is a timing issue, and perhaps a lost data issue?My mailer should have been resending the emsi_dat again after 5 seconds (but wasnt because it didnt time out per-se, it was receiving from SVDM - albeit 0x20 0x0d).When I fixed that, the session proceeded as normal. So I'll keep an eye on it and see if that does actually provide a workaround to the issue.```00000149 2:15:19 PM [2928] SBBSVDM: Modem response: CONNECT 960000000150 2:15:19 PM [2928] SBBSVDM: select returned 1 at 77298474800000151 2:15:19 PM [2928] SBBSVDM: recv returned 10700000152 2:15:22 PM [3884] SBBS: !VDD_OUTBUF_PURGE: NOT IMPLEMENTED00000153 2:15:22 PM [2928] SBBSVDM: New esc count = 0 at 77298801000000154 2:15:22 PM [2928] SBBSVDM: TX: 2 bytes00000155 2:15:23 PM [2928] SBBSVDM: TX: 2 bytes00000156 2:15:24 PM [2928] SBBSVDM: TX: 2 bytes00000157 2:15:24 PM [2928] SBBSVDM: TX: 2 bytes00000158 2:15:24 PM [2928] SBBSVDM: select returned 1 at 77298975200000159 2:15:24 PM [2928] SBBSVDM: recv returned 10700000160 2:15:25 PM [2928] SBBSVDM: TX: 2 bytes00000161 2:15:26 PM [3884] SBBS: !VDD_OUTBUF_PURGE: NOT IMPLEMENTED00000162 2:15:26 PM [2928] SBBSVDM: TX: 15 bytes00000163 2:15:26 PM [2928] SBBSVDM: select returned 1 at 77299119100000164 2:15:26 PM [2928] SBBSVDM: recv returned 1500000165 2:15:26 PM [3884] SBBS: !VDD_OUTBUF_PURGE: NOT IMPLEMENTED00000166 2:15:26 PM [2928] SBBSVDM: TX: 280 bytes00000167 2:15:26 PM [2928] SBBSVDM: select returned 1 at 77299200500000168 2:15:26 PM [2928] SBBSVDM: recv returned 30```The question is, where is the first 107 bytes? :)
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Rob Swindell@1:103/705 to GitLab note in main/sbbs on Fri Dec 30 19:31:41 2022
    https://gitlab.synchro.net/main/sbbs/-/issues/468#note_2988

    I think the question is: where are the *second* 107 bytes?The working log you sent previously, showed:> 00000013 10:37:39 AM [3572] SBBSVDM: recv returned 107 > 00000016 10:37:43 AM [3572] SBBSVDM: TX: 2 bytes > 00000018 10:37:44 AM [3572] SBBSVDM: recv returned 107 > 00000020 10:37:44 AM [3572] SBBSVDM: TX: 2 bytes In the failing logs, SVDM is responding to the first 107 bytes received (with a 2 byte response), but never gets the apparently-expected second 107 message from the server.
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Deon George@1:103/705 to GitLab note in main/sbbs on Fri Dec 30 19:49:03 2022
    https://gitlab.synchro.net/main/sbbs/-/issues/468#note_2989

    Actually a response to an EMSI_REQ is an EMSI_DAT (not 0x20 0x0d) and the logs show the mailer responds to it when it is sent a second time.There was a particular scenario that my mailer wasnt sending it a second time, and I think that was what resulted in each failed session.IIRC, the process should be - upon connection:* Client sends something <-- I always thought this was optional, perhaps not. (eg: 0x20 0x0d)* Server sends EMSI_REQ* Client responds with EMSI_DATThus it could well be that the first 107 bytes is actually sent to POP, but because it hasnt sent it's 0x20 0x0d it ignores it. (ie it's not ready to to start a session).Now that I've fixed sending it again, I'll monitor it over the next few days - and if there are no more problems, I guess we can put it down to POP throwing away the first EMSI_REQ because it wasnt ready for it yet...
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Deon George@1:103/705 to GitLab note in main/sbbs on Sat Dec 31 13:42:50 2022
    https://gitlab.synchro.net/main/sbbs/-/issues/468#note_2993

    So appreciate your help working through this one.I put in a 2 second delay in my mailer when a TCP connection is made, but POP is still not reacting to the first EMSI_REQ (it still responds with 0x20 0x0d). When a second EMSI_REQ is sent 5 seconds later, it still responds with 0x20 0x0d, but then followed by an EMSI_DAT and all is good.So my fix was to make sure that that second EMSI_DAT gets sent and all is good :)HNY.
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
  • From Deon George@1:103/705 to GitLab issue in main/sbbs on Sat Dec 31 13:42:50 2022
    close https://gitlab.synchro.net/main/sbbs/-/issues/468
    --- SBBSecho 3.20-Linux
    * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)