Problem with Mono / tcp queries / kvm

Sep 8, 2012 at 12:38 PM

Hi,

This may or may not be an issue with the code, though I'm struggling to find the cause.  I have my code running on Windows, and Ubuntu with Mono (xen, hyperV etc).  Works fine.  I just tried an Ubuntu KVM node for the first time (same OS as elsewhere - fresh ubuntu 12.04 lts from cd).  Interestingly the code stopped running about every 10 minutes or so.  I've just figured out what causes this and it's TCP requests.  Given that my code is actually the same regardless of whether it's TCP or UDP, I'm guessing mono/kvm (might be nothing to do with kvm of course but I know of no other difference) doesn't like something in the library. 

I can run thousands of queries per second against the code via udp, then one single 'dig +tcp @server www.test.com' will stop it responding.  If via the code I server.stop(), wait 500ms, server.start(), udp requests start working again.  I'm listening on IPv6, if that makes a difference.

When the query is received, it does get as far as calling the 'processquery' routine, because at the end of that routine, my code logs the query and the tcp query appears in the log.  After that I 'return query'.  The client never gets the response and the server stops receiving requests (though the code stays responsive and I server.stop, server.start via a button and all starts working again).

Any ideas what might cause this?

Sep 14, 2012 at 11:13 AM
Edited Sep 14, 2012 at 11:14 AM

Some more information.  I had thought this was a KVM issue due to the amount of times I've used this code elsewhere without issue, but further testing shows, whilst this does not occur on Windows, it does for mono.  I'm still struggling to find out why.  I have some servers setup using XEN and whilst mono prints out a 'Got a bad hardware address length for an AF_PACKET 16 8' error, which I believe is some sort of mono network interface bug, it works.   Same version of mono, same OS under vmware and XEn with another provider, any tcp query inbound will stop the library listener responding to all udp and tcp requests. I replicated this under vmware player and the same OS on my desktop but could not get it to fault on same OS under XEN with a vps host, nor on ESX with an older version of ubuntu.

There would appear to be a definite problem using the library as a server receiving tcp queries, but oddly, only on some platforms under mono.  Perhaps this is some sort of deadlock.  The OS in each case is identical.  The version of mono is the stock default from the OS and the same in each case.  The only difference I can see is the virtulization technology and perhaps as a result the network interfaces (though I've tried every combination of these I can). Serving UDP queries is fine.

Any hints on how to debug this?

Sep 14, 2012 at 11:42 AM
Edited Sep 14, 2012 at 11:42 AM

One more thing I just noted in trying to figure out what is going on that might help pinpoint this.

If I do dig@server www.test.com +tcp

dig hangs - It does not respond and times out. 'no servers could be reached'.  I can replicate this constantly.

If I run the same command then press the button I have on my app  that says server.stop, pause, server.start the response is received immediately server.stop is requested.  So, what ever the library does to flush / close down during server.stop makes the outstanding request get answered - the library is getting caught up somewhere when answering tcp requests.

Sep 20, 2012 at 5:47 PM

I managed to get traces of the library making the same request via udp and tcp under mono.  Here's the data

UDP query for www.me.com, works

[0x7fd088245700: 20.80227 0] ENTER: ARSoft.Tools.Net.Dns.DnsServer:EndUdpReceive (System.IAsyncResult)(this:0x7fd08b2b8160[ARSoft.Tools.Net.Dns.DnsServer pjdns.exe], [.MyAsyncResult:0x7fd08b2ba850], )
[0x7fd088245700: 20.80313 1] ENTER: ARSoft.Tools.Net.Dns.DnsMessageBase:Create (byte[],bool,ARSoft.Tools.Net.Dns.DnsServer/SelectTsigKey,byte[])(0x7fd08b419280, 1, (nil), (nil), )
<snip - board does not allow large posts>
[0x7fd088245700: 20.97559 0] LEAVE: ARSoft.Tools.Net.Dns.DnsServer:EndUdpReceive (System.IAsyncResult)
[0x7fd08182f700: 20.97565 0] ENTER: ARSoft.Tools.Net.Dns.DnsServer:EndUdpSend (System.IAsyncResult)(this:0x7fd08b2b8160[ARSoft.Tools.Net.Dns.DnsServer pjdns.exe], [.MyAsyncResult:0x7fd08b2bacb0], )
[0x7fd08182f700: 20.97607 0] LEAVE: ARSoft.Tools.Net.Dns.DnsServer:EndUdpSend (System.IAsyncResult)

 

TCP query for www.me.com, failed

[0x7fd08182f700: 104.02094 0] ENTER: ARSoft.Tools.Net.Dns.DnsServer:EndTcpAcceptConnection (System.IAsyncResult)(this:0x7fd08b2b8160[ARSoft.Tools.Net.Dns.DnsServer pjdns.exe], [.SocketAsyncResult:0x7fd08b42ccc0], )
[0x7fd08182f700: 104.02141 1] ENTER: ARSoft.Tools.Net.Dns.DnsServer:StartTcpAcceptConnection ()(this:0x7fd08b2b8160[ARSoft.Tools.Net.Dns.DnsServer pjdns.exe], )
[0x7fd08182f700: 104.02144 1] LEAVE: ARSoft.Tools.Net.Dns.DnsServer:StartTcpAcceptConnection ()
[0x7fd08182f700: 104.02145 0] LEAVE: ARSoft.Tools.Net.Dns.DnsServer:EndTcpAcceptConnection (System.IAsyncResult)

 

stops here and the dig client will retry every 5 seconds producing the same as the above again.  The query eventually times out.  If I server.stop just after the query is made, then I get all this and the query is answered

 

[0x7fd0815e9700:] EXCEPTION handling: System.Net.Sockets.SocketException: Invalid arguments
[0x7fd0815a4700:] EXCEPTION handling: System.Threading.ThreadAbortException: Thread was being aborted
[0x7fd0815a4700:] EXCEPTION handling: System.Net.Sockets.SocketException: interrupted
[0x7fd0815e9700:] EXCEPTION handling: System.ObjectDisposedException: The object was used after being disposed.
[0x7fd0815e9700:] EXCEPTION handling: System.ObjectDisposedException: The object was used after being disposed.
[0x7fd08162a700:] EXCEPTION handling: System.Net.Sockets.SocketException: Invalid arguments
[0x7fd08151a700:] EXCEPTION handling: System.Net.Sockets.SocketException: Invalid arguments
[0x7fd0814d5700:] EXCEPTION handling: System.Net.Sockets.SocketException: Invalid arguments
[0x7fd08155f700:] EXCEPTION handling: System.Net.Sockets.SocketException: Invalid arguments
[0x7fd0814d5700:] EXCEPTION handling: System.ObjectDisposedException: The object was used after being disposed.
[0x7fd0814d5700:] EXCEPTION handling: System.ObjectDisposedException: The object was used after being disposed.
[0x7fd0814d5700:] EXCEPTION handling: System.ObjectDisposedException: The object was used after being disposed.
[0x7fd08155f700:] EXCEPTION handling: System.ObjectDisposedException: The object was used after being disposed.
[0x7fd0814d5700:] EXCEPTION handling: System.ObjectDisposedException: The object was used after being disposed.
[0x7fd08155f700:] EXCEPTION handling: System.ObjectDisposedException: The object was used after being disposed.
[0x7fd0814d5700:] EXCEPTION handling: System.Net.Sockets.SocketException: The descriptor is not a socket
[0x7fd08182f700: 120.35884 0] ENTER: ARSoft.Tools.Net.Dns.DnsServer:EndTcpAcceptConnection (System.IAsyncResult)(this:0x7fd08b2b8160[ARSoft.Tools.Net.Dns.DnsServer pjdns.exe], [.SocketAsyncResult:0x7fd08b42c6c0], )
[0x7fd08182f700:] EXCEPTION handling: System.ObjectDisposedException: The object was used after being disposed.
EXCEPTION: catch found at clause 0 of ARSoft.Tools.Net.Dns.DnsServer:EndTcpAcceptConnection (System.IAsyncResult)
[0x7fd08182f700: 120.36139 0] LEAVE: ARSoft.Tools.Net.Dns.DnsServer:EndTcpAcceptConnection (System.IAsyncResult)
[0x7fd08182f700: 120.36143 0] ENTER: ARSoft.Tools.Net.Dns.DnsServer:EndTcpAcceptConnection (System.IAsyncResult)(this:0x7fd08b2b8160[ARSoft.Tools.Net.Dns.DnsServer pjdns.exe], [.SocketAsyncResult:0x7fd08b42c540], )
[0x7fd08182f700:] EXCEPTION handling: System.ObjectDisposedException: The object was used after being disposed.
EXCEPTION: catch found at clause 0 of ARSoft.Tools.Net.Dns.DnsServer:EndTcpAcceptConnection (System.IAsyncResult)
[0x7fd08182f700: 120.36148 0] LEAVE: ARSoft.Tools.Net.Dns.DnsServer:EndTcpAcceptConnection (System.IAsyncResult)
[0x7fd08182f700: 120.36153 0] ENTER: ARSoft.Tools.Net.Dns.DnsServer:EndTcpAcceptConnection (System.IAsyncResult)(this:0x7fd08b2b8160[ARSoft.Tools.Net.Dns.DnsServer pjdns.exe], [.SocketAsyncResult:0x7fd08b42c3c0], )
[0x7fd088245700: 120.35889 0] ENTER: ARSoft.Tools.Net.Dns.DnsServer:EndTcpAcceptConnection (System.IAsyncResult)(this:0x7fd08b2b8160[ARSoft.Tools.Net.Dns.DnsServer pjdns.exe], [.SocketAsyncResult:0x7fd08b42c600], )
[0x7fd088245700:] EXCEPTION handling: System.ObjectDisposedException: The object was used after being disposed.
EXCEPTION: catch found at clause 0 of ARSoft.Tools.Net.Dns.DnsServer:EndTcpAcceptConnection (System.IAsyncResult)
[0x7fd088245700: 120.36197 0] LEAVE: ARSoft.Tools.Net.Dns.DnsServer:EndTcpAcceptConnection (System.IAsyncResult)
[0x7fd088245700: 120.36203 0] ENTER: ARSoft.Tools.Net.Dns.DnsServer:EndTcpAcceptConnection (System.IAsyncResult)(this:0x7fd08b2b8160[ARSoft.Tools.Net.Dns.DnsServer pjdns.exe], [.SocketAsyncResult:0x7fd08b42c780], )
[0x7fd088245700:] EXCEPTION handling: System.ObjectDisposedException: The object was used after being disposed.
EXCEPTION: catch found at clause 0 of ARSoft.Tools.Net.Dns.DnsServer:EndTcpAcceptConnection (System.IAsyncResult)
[0x7fd08182f700:] EXCEPTION handling: System.ObjectDisposedException: The object was used after being disposed.
[0x7fd08151a700:] EXCEPTION handling: System.ObjectDisposedException: The object was used after being disposed.
EXCEPTION: catch found at clause 0 of ARSoft.Tools.Net.Dns.DnsServer:EndTcpAcceptConnection (System.IAsyncResult)
[0x7fd08182f700: 120.36228 0] LEAVE: ARSoft.Tools.Net.Dns.DnsServer:EndTcpAcceptConnection (System.IAsyncResult)
[0x7fd08182f700: 120.36230 0] ENTER: ARSoft.Tools.Net.Dns.DnsServer:EndTcpAcceptConnection (System.IAsyncResult)(this:0x7fd08b2b8160[ARSoft.Tools.Net.Dns.DnsServer pjdns.exe], [.SocketAsyncResult:0x7fd08b42ca80], )
[0x7fd08182f700:] EXCEPTION handling: System.ObjectDisposedException: The object was used after being disposed.
EXCEPTION: catch found at clause 0 of ARSoft.Tools.Net.Dns.DnsServer:EndTcpAcceptConnection (System.IAsyncResult)
<snip - board does not allow large posts>
[0x7fd08182f700: 120.40488 0] LEAVE: ARSoft.Tools.Net.Dns.DnsServer:EndTcpSendData (System.IAsyncResult)
[0x7fd088245700: 120.40491 0] LEAVE: ARSoft.Tools.Net.Dns.DnsServer:EndTcpReadData (System.IAsyncResult)
[0x7fd08182f700: 120.40924 0] ENTER: ARSoft.Tools.Net.Dns.DnsServer:EndTcpReadLength (System.IAsyncResult)(this:0x7fd08b2b8160[ARSoft.Tools.Net.Dns.DnsServer pjdns.exe], [.SocketAsyncResult:0x7fd088425840], )
[0x7fd08182f700: 120.40942 1] ENTER: ARSoft.Tools.Net.Dns.DnsMessageBase:ParseUShort (byte[],int&)(0x7fd08841f730, [BYREF:0x7fd08182ea3c], )
[0x7fd08182f700: 120.40955 1] LEAVE: ARSoft.Tools.Net.Dns.DnsMessageBase:ParseUShort (byte[],int&)result=0
[0x7fd08182f700: 120.40967 0] LEAVE: ARSoft.Tools.Net.Dns.DnsServer:EndTcpReadLength (System.IAsyncResult)
pjsmith@ubuntu:~/pjdns$ [0x7fd08182f700: 104.02094 0] ENTER: ARSoft.Tools.Net.Dns.DnsServer:EndTcpAcceptConnection (System.IAsyncResult)(this:0x7fd08b2b8160[ARSoft.Tools.Net.Dns.DnsServer pjdns.exe], [.SocketAsyncResult:0x7fd08b42ccc0], )
-bash: syntax error near unexpected token `('
pjsmith@ubuntu:~/pjdns$ [0x7fd08182f700: 104.02141 1] ENTER: ARSoft.Tools.Net.Dns.DnsServer:StartTcpAcceptConnection ()(this:0x7fd08b2b8160[ARSoft.Tools.Net.Dns.DnsServer pjdns.exe], )
-bash: syntax error near unexpected token `('
pjsmith@ubuntu:~/pjdns$ [0x7fd08182f700: 104.02144 1] LEAVE: ARSoft.Tools.Net.Dns.DnsServer:StartTcpAcceptConnection ()
-bash: syntax error near unexpected token `('
pjsmith@ubuntu:~/pjdns$ [0x7fd08182f700: 104.02145 0] LEAVE: ARSoft.Tools.Net.Dns.DnsServer:EndTcpAcceptConnection (System.IAsyncResult)
[0x7f01ce633700:] EXCEPTION handling: System.IO.FileNotFoundException: Could not load file or assembly 'UIAutomationWinforms, Version=

 

If I try the dig via TCP and abort it so there is one outstanding query with the library, I get

[0x7f5bc0e16700: 8.05668 0] ENTER: ARSoft.Tools.Net.Dns.DnsServer:EndTcpAcceptConnection (System.IAsyncResult)(this:0x7f5bca329160[ARSoft.Tools.Net.Dns.DnsServer pjdns.exe], [.SocketAsyncResult:0x7f5bca49df00], )
[0x7f5bc0e16700: 8.05891 1] ENTER: ARSoft.Tools.Net.Dns.DnsServer:StartTcpAcceptConnection ()(this:0x7f5bca329160[ARSoft.Tools.Net.Dns.DnsServer pjdns.exe], )
[0x7f5bc0e16700: 8.05900 1] LEAVE: ARSoft.Tools.Net.Dns.DnsServer:StartTcpAcceptConnection ()
[0x7f5bc0e16700: 8.05902 0] LEAVE: ARSoft.Tools.Net.Dns.DnsServer:EndTcpAcceptConnection (System.IAsyncResult)

 

and there is stops. It does eventually time out

 

[0x7f5bc0e16700: 378.28046 0] ENTER: ARSoft.Tools.Net.Dns.DnsServer:TcpTimedOut (object)([.MyState:0x7f5bca338410], )
[0x7f5bc0e16700: 378.28056 0] LEAVE: ARSoft.Tools.Net.Dns.DnsServer:TcpTimedOut (object)

but won't answer queries in time unless you abort the thread via server.stop.

Anyone got any ideas how to resolve?

 

I had to snip a lot of the content out because the board will not allow me to post the full log.  Hopefully there is enough start and end to help.

 

 

Coordinator
Sep 20, 2012 at 9:10 PM

Thanks for all the investigation.

I have an idea, what could cause the problems and will rewrite the tcp connection handling the next days. Hopefully I will finish it until Wednesday.

Alex

Coordinator
Sep 24, 2012 at 10:58 AM

Hi,

unfortunately I could not reproduce the problem. But I rewrote the tcp connection handling. Maybe you can test the changes. A debug build can be found under http://docs.ar-soft.de/ARSoft.Tools.Net.Pre_1.7.1.zip

Alex

Sep 24, 2012 at 3:29 PM

Many thanks for this.  I've travelling at the moment but will be back in a couple days.  It might be a couple days before I get to test this, but will report back asap.

Sep 24, 2012 at 5:18 PM

I've just had a chance to try this, quickly before I leave.  I first re-produced the issue then uploaded the new dll.  Immediate success :)  I should be packing, but I'm going to try and hammer it a bit to make sure all is good.

Thank you.

Coordinator
Sep 25, 2012 at 3:13 PM

I released the changes as version 1.7.1 a few moments ago.

Sep 27, 2012 at 3:16 PM

Just to confirm I've had this running for a few hours now without issues.