CGate Error 105001 "Cancelled connection to specified server"

Discussion in 'C-Bus Toolkit and C-Gate Software' started by more-solutions, Jul 29, 2011.

  1. more-solutions

    more-solutions

    Joined:
    Apr 23, 2006
    Messages:
    283
    Likes Received:
    4
    Location:
    Peterborough, UK
    I'm trying to connect to C-gate across an SSH tunnel, which has been working previously. The only thing that may have changed since it last worked is I have upped the log level to 9 (although to be honest I thought I had connected with Toolkit since then).

    The full text of the error message is (typed manually so there may be typos):
    The connection to CGate was cancelled as CGate failed to meet the connection requirements. Connection was disallowed for the following reason "Unable to determine server version"
    This is the relevant (hopefully!) section from event.log:
    Code:
    20110729-090434 803 cmd5735 - Host:/127.0.0.1 opened command interface from port: 46921
    20110729-090434 766 cmd5735 - Response: 201 Service ready: Clipsal C-Gate Version: v2.8.0 (build 2316) #cmd-syntax=1.0
    20110729-090434 761 cmd5735 - Command: SESSION_ID
    20110729-090435 803 cmd5737 - Host:localhost/127.0.0.1 opened command interface from port: 54713
    20110729-090435 766 cmd5737 - Response: 201 Service ready: Clipsal C-Gate Version: v2.8.0 (build 2316) #cmd-syntax=1.0
    20110729-090435 766 cmd5735 - Response: 300 sessionID=cmd5735
    20110729-090435 761 cmd5737 - Command: &4[14]get cgate version
    20110729-090436 761 cmd5737 - Command: &4[15]apiver
    20110729-090436 766 cmd5737 - Response: [14] 300 cgate: version=v2.8.0 (build 2316)
    20110729-090436 766 cmd5737 - Response: [15] 138-event=2.0
    20110729-090437 766 cmd5737 - Response: [15] 138-ccp=1.1
    20110729-090437 766 cmd5737 - Response: [15] 138-scp=1.0
    20110729-090437 766 cmd5737 - Response: [15] 138-schema-tag=1.3
    20110729-090437 766 cmd5737 - Response: [15] 138-schema-unitspec=1.0
    20110729-090437 766 cmd5737 - Response: [15] 138-schema-patch=1.0
    20110729-090437 766 cmd5737 - Response: [15] 138-schema-cbusunits=1.0
    20110729-090438 766 cmd5737 - Response: [15] 138-schema-transform=1.0
    20110729-090438 761 cmd5737 - Command: STOP 15 # Exceeded timeout of 2000ms
    20110729-090438 766 cmd5737 - Response: [15] 138-schema-applications=2.0
    20110729-090438 766 cmd5737 - Response: [15] 138-obj-application=1.4
    20110729-090439 766 cmd5737 - Response: [15] 138-obj-application-accesscontrol=1.1
    20110729-090439 766 cmd5737 - Response: [15] 138-obj-application-aircon=1.1
    20110729-090439 766 cmd5737 - Response: [15] 138-obj-application-clock=1.1
    20110729-090439 766 cmd5737 - Response: [15] 138-obj-application-enable=1.1
    20110729-090439 766 cmd5737 - Response: [15] 138-obj-application-ereport=1.2
    20110729-090439 804 cmd5737 - Host:localhost/127.0.0.1 closed command interface from port: 54713
    20110729-090439 766 cmd5737 - Response: [15] 138-obj-application-identify=1.1
    20110729-090440 766 cmd5737 - Response: [15] 138-obj-application-lighting=1.1
    20110729-090440 766 cmd5737 - Response: [15] 138-obj-application-measurement=2.0
    20110729-090441 766 cmd5737 - Response: [15] 138-obj-application-mediatransport=1.0
    20110729-090441 766 cmd5737 - Response: [15] 138-obj-application-network=1.1
    20110729-090441 766 cmd5737 - Response: [15] 138-obj-application-security=1.1
    20110729-090441 766 cmd5737 - Response: [15] 138-obj-application-shortmessage=1.0
    20110729-090442 766 cmd5737 - Response: [15] 138-obj-application-telephony=1.1
    20110729-090442 766 cmd5737 - Response: [15] 138-obj-application-temperature=1.1
    20110729-090442 766 cmd5737 - Response: [15] 138-obj-application-trigger=1.1
    20110729-090442 766 cmd5737 - Response: [15] 138-obj-unit=1.6
    20110729-090442 766 cmd5737 - Response: [15] 138-obj-network=1.1
    20110729-090443 766 cmd5737 - Response: [15] 138-cmd-syntax=1.0
    20110729-090443 766 cmd5737 - Response: [15] 138-cmd-project=1.3
    20110729-090443 766 cmd5737 - Response: [15] 138-cmd-pp=1.5
    20110729-090443 766 cmd5737 - Response: [15] 138-cmd-net=1.3
    20110729-090443 766 cmd5737 - Response: [15] 138-cmd-port=1.1
    20110729-090443 766 cmd5737 - Response: [15] 138-cmd-file=1.2
    20110729-090443 766 cmd5737 - Response: [15] 138-cmd-general=2.0
    20110729-090444 766 cmd5737 - Response: [15] 138-cmd-cbus=1.0
    20110729-090444 766 cmd5737 - Response: [15] 138-cmd-db=1.4
    20110729-090444 766 cmd5737 - Response: [15] 138-cmd-transform=1.2
    20110729-090445 766 cmd5737 - Response: [15] 138-cmd-label=1.0
    20110729-090445 766 cmd5737 - Response: [15] 138-topology=1.0
    20110729-090445 766 cmd5737 - Response: [15] 138-calculator=1.2
    20110729-090445 766 cmd5737 - Response: [15] 138-config=1.4
    20110729-090446 766 cmd5737 - Response: [15] 138-access=1.0
    20110729-090446 766 cmd5737 - Response: [15] 207 Stopped
    
    Looks like Toolkit expects a response to the apiver command within 2s but it's taking longer than that to send the response? It's tempting to blame the SSH connection but if I run "telnet localhost 20023" at the remote end then "apiver" it's taking a very long time to respond.
     
    more-solutions, Jul 29, 2011
    #1
  2. more-solutions

    more-solutions

    Joined:
    Apr 23, 2006
    Messages:
    283
    Likes Received:
    4
    Location:
    Peterborough, UK
    PS: As would be expected, restarted CGate has "fixed" this, but CGate needs to be running 24/7 on this site so I need to find the cause if I can.
     
    more-solutions, Jul 29, 2011
    #2
  3. more-solutions

    Darren Senior Member

    Joined:
    Jul 29, 2004
    Messages:
    2,361
    Likes Received:
    0
    Location:
    Adelaide, South Australia
    The only time I have seen this is if the computer is very busy. I think that ToolKit is probably not waiting for quite long enough. There is not necessarily an issue with C-Gate.
     
    Darren, Jul 30, 2011
    #3
  4. more-solutions

    more-solutions

    Joined:
    Apr 23, 2006
    Messages:
    283
    Likes Received:
    4
    Location:
    Peterborough, UK
    I think in this case the problem was cgate, it was repeatable over several minutes but restarting cgate cured it. Toolkit seems to wait 2s, it looks like well over that was needed (10+s).

    Assuming Cgate was busy and doing something useful, what can I do to "help" it along? How can I test if it's memory or cpu bound?
     
    more-solutions, Jul 30, 2011
    #4
  5. more-solutions

    ashleigh Moderator

    Joined:
    Aug 4, 2004
    Messages:
    2,392
    Likes Received:
    24
    Location:
    Adelaide, South Australia
    It would help to know what the processor performance is that you have cgate running on.

    Can you tell us, what processor, what speed, is it multi-threaded or multi-cored, how much RAM. That will at least give a rough idea of what you are dealing with.

    The question "how much is enough" is a tough one.

    There are some (old now) machine specs up on the clipsal downloads wed site, and those should be considered a minimum.

    As to cgate being "busy", well, it depends on what "busy" actually is, ie what its doing at the time. C-Gate makes extensive use of threads, so a multi-core machine will in general give a better opportunity to un-stuck things than a single core machine.
     
    ashleigh, Jul 31, 2011
    #5
  6. more-solutions

    more-solutions

    Joined:
    Apr 23, 2006
    Messages:
    283
    Likes Received:
    4
    Location:
    Peterborough, UK
    AMD Athlon(tm) II Neo N36L Dual-Core Processor
    1GB RAM for the system (running Linux).

    CGate currently using 30% of available RAM, and at time of writing CPU usage is minimal, cgate is doing "nothing" (see event.log snippet) but apiver is still taking 5s to complete.

    When I checked earlier this morning CPU usage was nearer 80-90% (same memory).

    What can I do to determine how "busy" cgate is? A quick analysis of the logs files shows that some minutes there'll be 2-300 events written in any one minute, at other times it'll be 5-10. (Since I have a script that runs every minute I'd never expect it to go to zero!)

    There are 8 networks on site (joined via bridges, we connect to the backbone via CNI), around 20-30 units per network, and most of the time very little should be happening (there are no inputs on site so unless someone requests a lights change from our end there should be nothing changing on site), although depending on the screen being displayed at our end we may be querying anything up to 100 group address statuses every few seconds.

    Note that it's not just apiver which is slow; the whole system becomes sluggish, so (for example) "net list" will take a couple of seconds to churn out its responses instead of being instant. I've just restarted cgate, and memory usage is now 5%, CPU 4%. Responses to things like apiver and "net list" are now instant again.

    I'm happy to email or upload some event logs somewhere if someone wants to look at this? I have a 3MB compressed file comprising 54MB of level 9 logs over three days from one C-Gate start to the next, therefore showing (I assume, and if you know where to look) the deterioration from fresh start to slowdown.

    Code:
    20110731-111857 804 cmd6139 - Host:/127.0.0.1 closed command interface from port: 36332
    20110731-111857 803 cmd6141 - Host:/127.0.0.1 opened command interface from port: 36334
    20110731-111857 766 cmd6141 - Response: 201 Service ready: Clipsal C-Gate Version: v2.8.0 (build 2316) #cmd-syntax=1.0
    20110731-111858 761 cmd6141 - Command: SESSION_ID
    20110731-111858 766 cmd6141 - Response: 300 sessionID=cmd6141
    20110731-111858 803 null - Host:/127.0.0.1 opened event interface from port: 43064 on local port: 20024
    20110731-111958 804 cmd6141 - Host:/127.0.0.1 closed command interface from port: 36334
    20110731-111958 803 cmd6143 - Host:/127.0.0.1 opened command interface from port: 42522
    20110731-111958 766 cmd6143 - Response: 201 Service ready: Clipsal C-Gate Version: v2.8.0 (build 2316) #cmd-syntax=1.0
    20110731-111958 761 cmd6143 - Command: SESSION_ID
    20110731-111958 766 cmd6143 - Response: 300 sessionID=cmd6143
    20110731-111959 803 null - Host:/127.0.0.1 opened event interface from port: 42119 on local port: 20024
    20110731-112059 804 cmd6143 - Host:/127.0.0.1 closed command interface from port: 42522
    20110731-112059 803 cmd6145 - Host:/127.0.0.1 opened command interface from port: 42524
    20110731-112059 766 cmd6145 - Response: 201 Service ready: Clipsal C-Gate Version: v2.8.0 (build 2316) #cmd-syntax=1.0
    20110731-112059 761 cmd6145 - Command: SESSION_ID
    20110731-112059 803 null - Host:/127.0.0.1 opened event interface from port: 42121 on local port: 20024
    20110731-112059 766 cmd6145 - Response: 300 sessionID=cmd6145
    20110731-112200 804 cmd6145 - Host:/127.0.0.1 closed command interface from port: 42524
    20110731-112200 803 cmd6147 - Host:/127.0.0.1 opened command interface from port: 42526
    20110731-112200 766 cmd6147 - Response: 201 Service ready: Clipsal C-Gate Version: v2.8.0 (build 2316) #cmd-syntax=1.0
    20110731-112200 761 cmd6147 - Command: SESSION_ID
    20110731-112200 766 cmd6147 - Response: 300 sessionID=cmd6147
    20110731-112200 803 null - Host:/127.0.0.1 opened event interface from port: 42123 on local port: 20024
    20110731-112227 803 cmd6149 - Host:/127.0.0.1 opened command interface from port: 42528
    20110731-112227 766 cmd6149 - Response: 201 Service ready: Clipsal C-Gate Version: v2.8.0 (build 2316) #cmd-syntax=1.0
    20110731-112230 761 cmd6149 - Command: apiver
    20110731-112230 766 cmd6149 - Response: 138-event=2.0
    20110731-112230 766 cmd6149 - Response: 138-ccp=1.1
    20110731-112230 766 cmd6149 - Response: 138-scp=1.0
    20110731-112230 766 cmd6149 - Response: 138-schema-tag=1.3
    20110731-112230 766 cmd6149 - Response: 138-schema-unitspec=1.0
    20110731-112231 766 cmd6149 - Response: 138-schema-patch=1.0
    20110731-112231 766 cmd6149 - Response: 138-schema-cbusunits=1.0
    20110731-112231 766 cmd6149 - Response: 138-schema-transform=1.0
    20110731-112231 766 cmd6149 - Response: 138-schema-applications=2.0
    20110731-112231 766 cmd6149 - Response: 138-obj-application=1.4
    20110731-112231 766 cmd6149 - Response: 138-obj-application-accesscontrol=1.1
    20110731-112232 766 cmd6149 - Response: 138-obj-application-aircon=1.1
    20110731-112232 766 cmd6149 - Response: 138-obj-application-clock=1.1
    20110731-112232 766 cmd6149 - Response: 138-obj-application-enable=1.1
    20110731-112232 766 cmd6149 - Response: 138-obj-application-ereport=1.2
    20110731-112232 766 cmd6149 - Response: 138-obj-application-identify=1.1
    20110731-112232 766 cmd6149 - Response: 138-obj-application-lighting=1.1
    20110731-112232 766 cmd6149 - Response: 138-obj-application-measurement=2.0
    20110731-112232 766 cmd6149 - Response: 138-obj-application-mediatransport=1.0
    20110731-112233 766 cmd6149 - Response: 138-obj-application-network=1.1
    20110731-112233 766 cmd6149 - Response: 138-obj-application-security=1.1
    20110731-112233 766 cmd6149 - Response: 138-obj-application-shortmessage=1.0
    20110731-112233 766 cmd6149 - Response: 138-obj-application-telephony=1.1
    20110731-112233 766 cmd6149 - Response: 138-obj-application-temperature=1.1
    20110731-112233 766 cmd6149 - Response: 138-obj-application-trigger=1.1
    20110731-112233 766 cmd6149 - Response: 138-obj-unit=1.6
    20110731-112234 766 cmd6149 - Response: 138-obj-network=1.1
    20110731-112234 766 cmd6149 - Response: 138-cmd-syntax=1.0
    20110731-112234 766 cmd6149 - Response: 138-cmd-project=1.3
    20110731-112234 766 cmd6149 - Response: 138-cmd-pp=1.5
    20110731-112234 766 cmd6149 - Response: 138-cmd-net=1.3
    20110731-112234 766 cmd6149 - Response: 138-cmd-port=1.1
    20110731-112234 766 cmd6149 - Response: 138-cmd-file=1.2
    20110731-112234 766 cmd6149 - Response: 138-cmd-general=2.0
    20110731-112235 766 cmd6149 - Response: 138-cmd-cbus=1.0
    20110731-112235 766 cmd6149 - Response: 138-cmd-db=1.4
    20110731-112235 766 cmd6149 - Response: 138-cmd-transform=1.2
    20110731-112235 766 cmd6149 - Response: 138-cmd-label=1.0
    20110731-112235 766 cmd6149 - Response: 138-topology=1.0
    20110731-112235 766 cmd6149 - Response: 138-calculator=1.2
    20110731-112235 766 cmd6149 - Response: 138-config=1.4
    20110731-112235 766 cmd6149 - Response: 138 access=1.0
    20110731-112238 761 cmd6149 - Command: quit
    20110731-112238 766 cmd6149 - Response: 204 Closing connection.
     
    more-solutions, Jul 31, 2011
    #6
  7. more-solutions

    daniel C-Busser Moderator

    Joined:
    Jul 26, 2004
    Messages:
    766
    Likes Received:
    20
    Location:
    Adelaide
    Hi Mark,

    Regarding the APIVER timeout, Toolkit gives it 10 seconds to complete, but it should also reset the timeout with each response line received. I've reported that internally (Ref: 21787)

    However, that is still a very long time for the command to complete, and it seems that the general C-Gate slowdown is the underlying cause.

    Two things you can do:

    1. Level 9 logging is intensive and adds substantially to the load on the system - so for day to day operations you should restore it to level 5 logging.

    2. Check the amount of memory you are allocating to C-Gate, you can do this in Toolkit preferences.

    I certainly would like to have a look at your logs if that's ok, there may be certain operations in there that blow out the memory usage in unexpected ways. I've sent you a PM with where to send them. Thanks!
     
    Last edited by a moderator: Aug 1, 2011
    daniel, Aug 1, 2011
    #7
  8. more-solutions

    more-solutions

    Joined:
    Apr 23, 2006
    Messages:
    283
    Likes Received:
    4
    Location:
    Peterborough, UK
    Indeed, I only upped it to level 9 to see if I could track down the reason we sometimes lose connectivity to C-Gate (ie it's still running, but connections to port 20023 do not result in the welcome prompt).

    I did wonder whether I maybe have issues at the disk level handling the volume of data, but recent tests (including the above) have been on occasions when there's been nothing happening - the log attached to this thread wasn't filtered, the only thing C-Gate was having to worry about was the apiver command at that time, so whatever was wrong was something that had happened previously and left C-Gate in a mess.

    What are the symptoms of (eg) running out of memory?

    I thought I was specifying an amount of RAM at C-Gate startup but it appears not at the moment. (I thought that the settings in Toolkit only influenced C-Gate when started by Toolkit though?)

    I'm happy to increase the amount of RAM allocated, and (if necessary) I can go to site and install more RAM in the server, but I'd like to do this because I can see C-Gate is using the RAM not just because it might help. If, for example, I'm doing something at an application level that's causing C-Gate to leak memory, then doubling the memory won't fix the problem, it'll just double the time between failures, which for something that's supposed to run 24/365 isn't really a solution. If anything I should probably reduce the RAM to see if I can make it break faster to help me fix it (except that it's now a live system and without it working they don't have lights....)

    Thanks for the PM, email sent. Happy reading!
     
    more-solutions, Aug 1, 2011
    #8
  9. more-solutions

    daniel C-Busser Moderator

    Joined:
    Jul 26, 2004
    Messages:
    766
    Likes Received:
    20
    Location:
    Adelaide
    Thanks Mark, flagged for investigation. (#21816)
     
    daniel, Aug 5, 2011
    #9
  10. more-solutions

    daniel C-Busser Moderator

    Joined:
    Jul 26, 2004
    Messages:
    766
    Likes Received:
    20
    Location:
    Adelaide
    Hi Mark,

    I'm just following up on this report. Can you confirm whether this problem still exists with the Toolkit 1.12.x / C-Gate 2.9.x series?
     
    daniel, May 29, 2013
    #10
  11. more-solutions

    more-solutions

    Joined:
    Apr 23, 2006
    Messages:
    283
    Likes Received:
    4
    Location:
    Peterborough, UK
    We haven't been able to upgrade this site yet; it's a Linux box and my attempts at getting the latest C-gate running on Linux haven't been particularly successful (to be fair I haven't had the opportunity to spend all that much time on it). Is there a reason to expect a change?
     
    more-solutions, May 29, 2013
    #11
  12. more-solutions

    daniel C-Busser Moderator

    Joined:
    Jul 26, 2004
    Messages:
    766
    Likes Received:
    20
    Location:
    Adelaide
    I think so. C-Gate 2.9.0 fixed 3 leaks:

    fix a thread leak where 'events' is executed many times (#21860)
    fix a thread/memory leak with backgrounded commands (#21474)
    fix a thread leak with repeated NET OPENs (#20150)

    Also there is a new property, 'get cgate threads' which will tell you if there is a runaway thread count.

    Hope you get a chance to update the site, if so let me know how it went.
     
    daniel, May 29, 2013
    #12
Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.