File Handle Leaks in Hudson

12/17/2010

Hudson is Awesome
We recently switched from cruisecontrol.rb to Hudson and have been much happier. It’s more reliable and we get much better resource management using build queues.

Hudson Failure
However, this week Hudson has stopped responding several times with the following error:

Dec 17, 2010 12:41:29 PM hudson.triggers.SCMTrigger$Runner runPolling
SEVERE: Failed to record SCM polling
hudson.plugins.git.GitException: Error retrieving tag names
        at hudson.plugins.git.GitAPI.getTagNames(GitAPI.java:650)
   ... snip ...
Caused by: java.io.IOException: Cannot run program "git" (in directory "/home/cruise/.hudson/server/jobs/plm-website-master/workspace"): java.io.IOException: error=24, Too many open files
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:475)
   ... snip ...
Caused by: java.io.IOException: java.io.IOException: error=24, Too many open files
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:164)
   ... snip ...

Proximate Cause
Hudson is bound by the default per-process open file limit, which is 1024 on this Linux box. Once it hits that limit, it can no longer fork child processes, and failures like the one above occur. Something was leaking file handles: lsof showed file handles allocated up to the limit, 99% of which were pipes.
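For anyone wanting to repeat that check, here is a minimal sketch of the same kind of tally without lsof. It assumes a Linux box with /proc and bash; the function names fd_count and fd_summary are made up for this example.

```shell
#!/bin/bash
# Hypothetical helpers (not part of Hudson or lsof). Linux only: they read
# /proc/<pid>/fd, where every open descriptor appears as a symlink.

# Total number of descriptors the process currently holds.
fd_count() {
    ls /proc/"$1"/fd | wc -l
}

# Descriptors grouped by what they point at: pipe, socket or anything else.
fd_summary() {
    local pid="$1" fd target
    for fd in /proc/"$pid"/fd/*; do
        # A descriptor can vanish between the glob and the readlink.
        target=$(readlink "$fd") || continue
        case "$target" in
            pipe:*)   echo pipe ;;
            socket:*) echo socket ;;
            *)        echo file ;;
        esac
    done | sort | uniq -c
}

# Demonstrate on the current shell; pass the Hudson PID instead to see the leak.
echo "soft limit for this shell: $(ulimit -n)"
echo "descriptors open in this shell: $(fd_count $$)"
fd_summary $$
```

Run as the user that owns the Hudson process and pass its PID in place of $$. A pipe count that keeps creeping toward the limit between builds would match what we saw.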

Root Cause
Whenever Hudson forks off a child process, it keeps monitoring that child's stdout/stderr streams, even after the child itself has finished executing. So if you spawn a daemonized process and don't close its output streams, you leak 2-3 file handles on every execution. We have a simple Ruby script that gets spawned in the background to report build status in our Campfire chatroom.

The script was naive and did not properly close its file handles.
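The underlying pipe behaviour is easy to reproduce outside Hudson. The sketch below is only a rough analogue: cat stands in for whatever is reading the build's output, sleep stands in for the background script, and the function names are invented for the demo.

```shell
#!/bin/bash
# The reader of a pipe only sees end-of-file once every process holding the
# write end has closed it, and that includes background children.

# Prints how many whole seconds cat had to wait for end-of-file.
wait_for_eof() {
    local start=$SECONDS
    "$@" | cat >/dev/null
    echo $((SECONDS - start))
}

# Background child inherits stdout, so it keeps the pipe open while it runs.
leaky() { sleep 2 & }

# Background child is pointed at /dev/null, so the pipe closes right away.
tidy() { sleep 2 </dev/null >/dev/null 2>&1 & }

echo "leaky: waited $(wait_for_eof leaky)s for EOF"
echo "tidy:  waited $(wait_for_eof tidy)s for EOF"
```

The leaky case should block for roughly the two seconds the background sleep stays alive, while the tidy case should return almost immediately. A daemon that never exits would hold the pipe open indefinitely.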

We switched the Post-build script execution from:

ruby campfire.rb &

To:

bash scrub_fds.sh ruby campfire.rb &

Where this is scrub_fds.sh:

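#!/bin/bash
# Brace expansion ({0..2}) is a bash feature, so run this with bash, not plain sh.

# Point stdin, stdout and stderr at /dev/null so no pipe back to Hudson stays open.
eval exec {0..2}\>/dev/null
# Close every other inherited file descriptor (3 through 255).
eval exec {3..255}\>\&-

# Run the wrapped command, preserving its arguments exactly.
exec "$@"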

UPDATE: Looks like this is mostly caused by a bug in the Git plugin for Hudson (Fixed in 1.390).
