
DO NOT MERGE - do not leak connection file descriptors - delete connection from connection pool map only when connection is closed#5

Closed
w8ai wants to merge 2 commits into develop from feature/2357-remove-leak-on-error

Conversation


@w8ai w8ai commented Nov 10, 2015

======== DO NOT MERGE - WORK IN PROGRESS ===========

refs https://github.com/stackengine/controller/issues/2357

ISSUE - upon error:

  • the map entry for the key for the connection is removed
  • client shutdown for the connection is called
  • client release for the connection is called
    • since there are more than 0 references (1)
    • the connection is not closed

leakage

[ admin              ] [ Debug  ] 2015/11/10 22:52:43 v2imagehandlers.go:50 API:  /api/v2/images/non-existent:latest/hosts/2f91a299-73ae-5493-d13e-f7de9dce2e24/pull
[ admin              ] [ Debug  ] 2015/11/10 22:52:43 v2imagehandlers.go:102 performing pull action for repo_id non-existent:latest
[ clntrpc            ] [ Debug  ] 2015/11/10 22:52:43 pool.go:138 ========= get Clnt: 0x0 for key 172.16.44.191:8001/REG
[ clntrpc            ] [ Debug  ] 2015/11/10 22:52:43 pool.go:144 ========= new conn: 0xc20898aa20 for key 172.16.44.191:8001/REG
[ svcrpc             ] [ Debug  ] 2015/11/10 22:52:43 server.go:232 Processing connection from 172.16.44.191:45135
[ clntrpc            ] [ Debug  ] 2015/11/10 22:52:43 pool.go:147 ========= get Clnt: 0xc20898aa20 for key 172.16.44.191:8001/REG  172.16.44.191:45135<==>172.16.44.191:8001
[ clntrpc            ] [ Debug  ] 2015/11/10 22:52:43 pool.go:138 ========= get Clnt: 0xc20898aa20 for key 172.16.44.191:8001/REG
[ clntrpc            ] [ Debug  ] 2015/11/10 22:52:43 pool.go:147 ========= get Clnt: 0xc20898aa20 for key 172.16.44.191:8001/REG  172.16.44.191:45135<==>172.16.44.191:8001
[ status-sender      ] [ Debug  ] 2015/11/10 22:52:44 sender.go:100 sent 1 events

[ wharf              ] [ Warn   ] 2015/11/10 22:52:45 image_server.go:50 pull image non-existent:latest on host mesh-02 due to error: Error: image library/non-existent:latest not found
[ clntrpc            ] [ Debug  ] 2015/11/10 22:52:45 pool.go:114 Shutdown: 0xc2084a6cc0
[ clntrpc            ] [ Debug  ] 2015/11/10 22:52:45 client.go:136 ========= not closing 0xc20898aa20 - refCount: 2  shutdown: 1
[ clntrpc            ] [ Debug  ] 2015/11/10 22:52:45 pool.go:169 error on Call():  Error: image library/non-existent:latest not found
[ clntrpc            ] [ Debug  ] 2015/11/10 22:52:45 client.go:136 ========= not closing 0xc20898aa20 - refCount: 1  shutdown: 1

[ storelib           ] [ Debug  ] 2015/11/10 22:52:46 fts.go:16 indexing status.Record:task|6affd2c4-15c6-4dbe-a23c-cd855c5d3dd8|||container_state
[ status-sender      ] [ Debug  ] 2015/11/10 22:52:46 sender.go:100 sent 2 events

@w8ai w8ai force-pushed the feature/2357-remove-leak-on-error branch from c3acc4e to fb2d900 on November 10, 2015 23:09
@w8ai w8ai changed the title added debug messages do not leak connection file descriptors - delete connection from connection pool map only when connection is closed Nov 10, 2015

w8ai commented Nov 10, 2015

The problem with this approach may be that we never spawn a new connection when there are actual RPC errors. Thinking...


w8ai commented Nov 10, 2015

DEMO

[ admin              ] [ Debug  ] 2015/11/10 23:13:16 v2imagehandlers.go:50 API:  /api/v2/images/non-existent:latest/hosts/2f91a299-73ae-5493-d13e-f7de9dce2e24/pull
[ admin              ] [ Debug  ] 2015/11/10 23:13:16 v2imagehandlers.go:102 performing pull action for repo_id non-existent:latest
[ clntrpc            ] [ Debug  ] 2015/11/10 23:13:16 pool.go:139 get Clnt: 0xc20843c090 for key 172.16.44.191:8001/REG
[ clntrpc            ] [ Debug  ] 2015/11/10 23:13:16 pool.go:148 get Clnt: 0xc20843c090 for key 172.16.44.191:8001/REG  172.16.44.191:45174<==>172.16.44.191:8001
[ clntrpc            ] [ Debug  ] 2015/11/10 23:13:16 pool.go:139 get Clnt: 0xc20843c090 for key 172.16.44.191:8001/REG
[ clntrpc            ] [ Debug  ] 2015/11/10 23:13:16 pool.go:148 get Clnt: 0xc20843c090 for key 172.16.44.191:8001/REG  172.16.44.191:45174<==>172.16.44.191:8001
[ status-sender      ] [ Debug  ] 2015/11/10 23:13:17 sender.go:100 sent 1 events
[ wharf              ] [ Warn   ] 2015/11/10 23:13:17 image_server.go:50 pull image non-existent:latest on host mesh-02 due to error: Error: image library/non-existent:latest not found
[ clntrpc            ] [ Debug  ] 2015/11/10 23:13:17 pool.go:114 Shutdown: 0xc208660360
[ clntrpc            ] [ Debug  ] 2015/11/10 23:13:17 client.go:139 Release: not calling Close() 0xc20843c090 with refCount: 2  shutdown: 1
[ clntrpc            ] [ Debug  ] 2015/11/10 23:13:17 pool.go:170 error on Call():  Error: image library/non-existent:latest not found
[ clntrpc            ] [ Debug  ] 2015/11/10 23:13:17 client.go:139 Release: not calling Close() 0xc20843c090 with refCount: 1  shutdown: 1
[ storelib           ] [ Debug  ] 2015/11/10 23:13:18 fts.go:16 indexing status.Record:task|f6e8599a-0360-4fa6-b838-267b515f2dbb|||container_state
[ status-sender      ] [ Debug  ] 2015/11/10 23:13:18 sender.go:100 sent 2 events

@w8ai w8ai force-pushed the feature/2357-remove-leak-on-error branch from fb2d900 to 3dc2ad2 on November 11, 2015 01:32

w8ai commented Nov 11, 2015

closing for now until panic from https://github.com/stackengine/controller/pull/2373 is resolved

@w8ai w8ai closed this Nov 11, 2015

w8ai commented Nov 11, 2015

easier repro steps - latest debug logging for panic:

vagrant@mesh-02:~/src/github.com/stackengine/controller$ cat ~/src/mice-conflict.yaml 
mickey:
    image: ubuntu:14.04
    command: bash -c 'echo this is a message from {{.ContainerName}}; echo sleeping 42m; sleep 42m'
    ports:
      - 8989:8989
minnie:
    image: ubuntu:14.04
    command: bash -c 'echo this is a message from {{.ContainerName}}; echo sleeping 99m; sleep 99m'
    ports:
      - 8989:8989   # guaranteed to cause port conflict and will cause minnie to never start
    links:
        - mickey
vagrant@mesh-02:~/src/github.com/stackengine/controller$ ./build/dev/stack s run ~/src/mice-conflict.yaml 
vagrant-20151111-132800
[ crpc               ] [ Debug  ] 2015/11/11 13:29:20 leaderclient.go:40 Calling: 172.16.44.191:8001
[ clntrpc            ] [ Debug  ] 2015/11/11 13:29:20 pool.go:157 getClnt: 0xc2086f4090 for key 172.16.44.191:8001/REG  172.16.44.191:50419<==>172.16.44.191:8001
[ clntrpc            ] [ Debug  ] 2015/11/11 13:29:20 pool.go:157 getClnt: 0xc2086f4090 for key 172.16.44.191:8001/REG  172.16.44.191:50419<==>172.16.44.191:8001
[ wharf              ] [ Debug  ] 2015/11/11 13:29:20 container_server.go:97 Inspected container: &docker.Container{ID:"50d763bcac7beaf5bbcbf5774e8478039ac0c1e0a28c63be26ef6c5c321d040d", Created:time.Time{sec:63582845360, nsec:53735382, loc:(*time.Location)(0x1d3bf40)}, Path:"bash", Args:[]string{"-c", "echo this is a message from 0.minnie.vagrant-20151111-132800; echo sleeping 99m; sleep 99m"}, Config:(*docker.Config)(0xc2085b3500), State:docker.State{Running:false, Paused:false, Restarting:false, OOMKilled:false, Pid:0, ExitCode:0, Error:"", StartedAt:time.Time{sec:0, nsec:0, loc:(*time.Location)(0x1d3bf40)}, FinishedAt:time.Time{sec:0, nsec:0, loc:(*time.Location)(0x1d3bf40)}}, Image:"e9ae3c220b23b699cb5e6914af806219b028c78d5cf6fed8eeca98ffaa8c9b43", Node:(*docker.SwarmNode)(nil), NetworkSettings:(*docker.NetworkSettings)(0xc20854f720), SysInitPath:"", ResolvConfPath:"", HostnamePath:"", HostsPath:"", Name:"/0.minnie.vagrant-20151111-132800", Driver:"aufs", Volumes:map[string]string(nil), VolumesRW:map[string]bool(nil), HostConfig:(*docker.HostConfig)(0xc208931b00), ExecIDs:[]string(nil), RestartCount:0, AppArmorProfile:""}
[ clntrpc            ] [ Debug  ] 2015/11/11 13:29:20 client.go:140 Release: not calling Close() 0xc2086f4090 for key 172.16.44.191:8001/REG with refCount: 3 shutdown: 1
[ storelib           ] [ Debug  ] 2015/11/11 13:29:20 sd_callback.go:153 ignoring service, nil NetworkSettings or Ports on 50d763bcac7beaf5bbcbf5774e8478039ac0c1e0a28c63be26ef6c5c321d040d
[ storelib           ] [ Debug  ] 2015/11/11 13:29:20 sd_callback.go:153 ignoring service, nil NetworkSettings or Ports on 50d763bcac7beaf5bbcbf5774e8478039ac0c1e0a28c63be26ef6c5c321d040d
[ wharf              ] [ Debug  ] 2015/11/11 13:29:20 container_leader.go:657 rpc client for start container of container 50d763bcac7beaf5bbcbf5774e8478039ac0c1e0a28c63be26ef6c5c321d040d has hostTCPAddress 172.16.44.191:8001
[ clntrpc            ] [ Debug  ] 2015/11/11 13:29:20 pool.go:157 getClnt: 0xc2086f4090 for key 172.16.44.191:8001/REG  172.16.44.191:50419<==>172.16.44.191:8001
[ plugins:docker     ] [ Debug  ] 2015/11/11 13:29:20 docker.go:674 
 >>>>>> EVENT: &docker.APIEvents{Status:"die", ID:"50d763bcac7beaf5bbcbf5774e8478039ac0c1e0a28c63be26ef6c5c321d040d", From:"ubuntu:14.04", Time:1447248560}
[ wharf              ] [ Warn   ] 2015/11/11 13:29:20 container_server.go:117 Unable to start container on host 172.16.44.191: &docker.HostConfig{Binds:[]string{"/var/lib/stackengine/containers/env:/var/lib/stackengine/containers/env"}, CapAdd:[]string(nil), CapDrop:[]string(nil), ContainerIDFile:"", LxcConf:[]docker.KeyValuePair(nil), Privileged:false, PortBindings:map[docker.Port][]docker.PortBinding{"8989/tcp":[]docker.PortBinding{docker.PortBinding{HostIP:"", HostPort:"8989"}}}, Links:[]string{"0.mickey.vagrant-20151111-132800:mickey"}, PublishAllPorts:false, DNS:[]string(nil), DNSSearch:[]string(nil), ExtraHosts:[]string(nil), VolumesFrom:[]string(nil), NetworkMode:"", IpcMode:"", PidMode:"", UTSMode:"", RestartPolicy:docker.RestartPolicy{Name:"", MaximumRetryCount:0}, Devices:[]docker.Device(nil), LogConfig:docker.LogConfig{Type:"", Config:map[string]string(nil)}, ReadonlyRootfs:false, SecurityOpt:[]string(nil), CgroupParent:"", Memory:0, MemorySwap:0, CPUShares:0, CPUSet:"", CPUQuota:0, CPUPeriod:0, Ulimits:[]docker.ULimit(nil)} due to error: API error (500): Cannot start container 50d763bcac7beaf5bbcbf5774e8478039ac0c1e0a28c63be26ef6c5c321d040d: failed to create endpoint 0.minnie.vagrant-20151111-132800 on network bridge: Bind for 0.0.0.0:8989 failed: port is already allocated

[ clntrpc            ] [ Debug  ] 2015/11/11 13:29:20 pool.go:115 Shutdown: 0xc208404ba0
[ clntrpc            ] [ Debug  ] 2015/11/11 13:29:20 client.go:140 Release: not calling Close() 0xc2086f4090 for key 172.16.44.191:8001/REG with refCount: 3 shutdown: 1
[ clntrpc            ] [ Debug  ] 2015/11/11 13:29:20 pool.go:179 error on Call():  Unable to start container 50d763bcac7b on host mesh-02 172.16.44.191 due to error API error (500): Cannot start container 50d763bcac7beaf5bbcbf5774e8478039ac0c1e0a28c63be26ef6c5c321d040d: failed to create endpoint 0.minnie.vagrant-20151111-132800 on network bridge: Bind for 0.0.0.0:8989 failed: port is already allocated

[ clntrpc            ] [ Debug  ] 2015/11/11 13:29:20 client.go:140 Release: not calling Close() 0xc2086f4090 for key 172.16.44.191:8001/REG with refCount: 2 shutdown: 1
[ clntrpc            ] [ Debug  ] 2015/11/11 13:29:20 pool.go:115 Shutdown: 0xc208404ba0
[ clntrpc            ] [ Debug  ] 2015/11/11 13:29:20 client.go:140 Release: not calling Close() 0xc2086f4090 for key 172.16.44.191:8001/REG with refCount: 1 shutdown: 1
[ clntrpc            ] [ Debug  ] 2015/11/11 13:29:20 pool.go:179 error on Call():  Unable to start container 50d763bcac7b on host mesh-02 172.16.44.191 due to error API error (500): Cannot start container 50d763bcac7beaf5bbcbf5774e8478039ac0c1e0a28c63be26ef6c5c321d040d: failed to create endpoint 0.minnie.vagrant-20151111-132800 on network bridge: Bind for 0.0.0.0:8989 failed: port is already allocated

[ clntrpc            ] [ Debug  ] 2015/11/11 13:29:20 client.go:134 Release: calling Close() 0xc2086f4090 Conn:0xc2086f4090 type: REG
 ref: 0 key: 172.16.44.191:8001/REG addr: 172.16.44.191:8001 shutdown: 1
[ replication        ] [ Err    ] 2015/11/11 13:29:20 agent.go:413 unable to start deployment container '0.minnie.vagrant-20151111-132800'/default - StartContainer 50d763bcac7beaf5bbcbf5774e8478039ac0c1e0a28c63be26ef6c5c321d040d failed: Unable to start container 50d763bcac7b on host mesh-02 172.16.44.191 due to error API error (500): Cannot start container 50d763bcac7beaf5bbcbf5774e8478039ac0c1e0a28c63be26ef6c5c321d040d: failed to create endpoint 0.minnie.vagrant-20151111-132800 on network bridge: Bind for 0.0.0.0:8989 failed: port is already allocated

: Unable to start container 50d763bcac7b on host mesh-02 172.16.44.191 due to error API error (500): Cannot start container 50d763bcac7beaf5bbcbf5774e8478039ac0c1e0a28c63be26ef6c5c321d040d: failed to create endpoint 0.minnie.vagrant-20151111-132800 on network bridge: Bind for 0.0.0.0:8989 failed: port is already allocated

[ svcrpc             ] [ Debug  ] 2015/11/11 13:29:20 server.go:238 Close connection from 172.16.44.191:50419
[ LogLinesOnDie      ] [ Debug  ] 2015/11/11 13:29:20 log_lines_on_die.go:69 sending 1000 log lines for container 0.minnie.vagrant-20151111-132800 50d763bcac7beaf5bbcbf5774e8478039ac0c1e0a28c63be26ef6c5c321d040d
[ storelib           ] [ Debug  ] 2015/11/11 13:29:20 sd_callback.go:153 ignoring service, nil NetworkSettings or Ports on 50d763bcac7beaf5bbcbf5774e8478039ac0c1e0a28c63be26ef6c5c321d040d
...
[ storelib           ] [ Debug  ] 2015/11/11 13:29:20 fts.go:16 indexing status.Record:deployment|vagrant-20151111-132800|service/host_group/slot|minnie/default/0|container_death_last_logs
[ status-sender      ] [ Debug  ] 2015/11/11 13:29:20 sender.go:100 sent 7 events
[ replication        ] [ Debug  ] 2015/11/11 13:29:30 agent.go:154 polling for change after 10 seconds
[ replication        ] [ Debug  ] 2015/11/11 13:29:30 agent.go:159 ========== WORKING vagrant-20151111-132800
[ replication        ] [ Debug  ] 2015/11/11 13:29:30 agent.go:162 getting all containers for stack vagrant-20151111-132800
[ crpc               ] [ Debug  ] 2015/11/11 13:29:30 leaderclient.go:40 Calling: 172.16.44.191:8001
[ clntrpc            ] [ Debug  ] 2015/11/11 13:29:30 pool.go:157 getClnt: 0xc2086f4090 for key 172.16.44.191:8001/REG    connection is nil
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0xb6ab4f]

goroutine 344 [running]:
net/rpc.(*Client).send(0x0, 0xc208457f90)
    /usr/local/go/src/net/rpc/client.go:72 +0x3af
net/rpc.(*Client).Go(0x0, 0x15871d0, 0x23, 0x12cbd00, 0xc20888e5c0, 0xf3f1a0, 0xc20854c0a0, 0xc2087d8ba0, 0xfbd680)
    /usr/local/go/src/net/rpc/client.go:309 +0x13e
net/rpc.(*Client).Call(0x0, 0x15871d0, 0x23, 0x12cbd00, 0xc20888e5c0, 0xf3f1a0, 0xc20854c0a0, 0x0, 0x0)
    /usr/local/go/src/net/rpc/client.go:315 +0xa5
github.com/stackengine/serpc/client.(*ConnPool).RPC(0xc208404ba0, 0x7ffcc3e16b70, 0xc2088eacc0, 0x13c76f0, 0x3, 0xc208112602, 0x15871d0, 0x23, 0x12cbd00, 0xc20888e5c0, ...)
    /home/vagrant/src/github.com/stackengine/serpc/client/pool.go:176 +0x258
github.com/stackengine/serpc/client.(*ConnPool).Call(0xc208404ba0, 0x7ffcc3e16b70, 0xc2088eacc0, 0x13c76f0, 0x3, 0x2, 0x15871d0, 0x23, 0x12cbd00, 0xc20888e5c0, ...)
    /home/vagrant/src/github.com/stackengine/serpc/client/pool.go:189 +0xce
github.com/stackengine/controller/crpc.(*LeaderClient).Call(0x1d8f7d0, 0x15871d0, 0x23, 0x12cbd00, 0xc20888e5c0, 0xf3f1a0, 0xc20854c0a0, 0x0, 0x0)
    /home/vagrant/src/github.com/stackengine/controller/crpc/leaderclient.go:41 +0x449
github.com/stackengine/controller/wharf.(*client).ListDetailedContainers(0xc20863ed70, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /home/vagrant/src/github.com/stackengine/controller/wharf/container_client.go:20 +0x1a1
github.com/stackengine/controller/replication.(*agent).run(0xc2086129a0)
    /home/vagrant/src/github.com/stackengine/controller/replication/agent.go:163 +0x106e
created by github.com/stackengine/controller/replication.newAgent
    /home/vagrant/src/github.com/stackengine/controller/replication/agent.go:59 +0x3ec


@w8ai w8ai reopened this Nov 11, 2015
@w8ai w8ai changed the title do not leak connection file descriptors - delete connection from connection pool map only when connection is closed DO NOT MERGE - do not leak connection file descriptors - delete connection from connection pool map only when connection is closed Nov 11, 2015
@w8ai w8ai closed this Nov 11, 2015
