Thread: Process being killed mysteriously

  1. #1
    Join Date
    Nov 2006
    Posts
    3

    Hi,

    We have a long-running daemon on a couple of our systems that has been running fine for the last 12 months, but over the last week the process has suddenly started being killed. There are no errors in the process log. Sometimes the log shows a clean shutdown (the process reports receiving a shutdown signal, then closes the log and exits), but at other times it is simply terminated with no evidence in the log.

    There is some evidence of problems in the failcnt values in /proc/user_beancounters, but the values there certainly do not change every time the process is killed. Failures are showing in the privvmpages and numtcpsock rows.

    The daemon in question runs on two separate OpenVZ instances, and we are seeing the mystery shutdowns on both machines. They do not happen at the same time on both; the timing appears to be random.

    I've included the last 100 lines of output from strace which recorded an instance of the process being killed.

    Code:
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_SETMASK, [], NULL)      = 0
    sigprocmask(SIG_SETMASK, [], NULL)      = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    gettimeofday({1269197890, 87030}, NULL) = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
    stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    write(3, "2010-03-21 18:58:10.87030 axon(9"..., 82) = 82
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    sigprocmask(SIG_SETMASK, [], NULL)      = 0
    sigprocmask(SIG_SETMASK, [], NULL)      = 0
    sigprocmask(SIG_SETMASK, [], NULL)      = 0
    sigprocmask(SIG_SETMASK, [], NULL)      = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    unlink("/u/apps/axon/releases/20100127135105/pids/axon.pid") = 0
    sigprocmask(SIG_BLOCK, NULL, [])        = 0
    rt_sigaction(SIGINT, {SIG_DFL}, {0xb7f478f0, [], SA_RESTORER, 0xb7eade78}, 8) = 0
    rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
    write(9, "\27\3\1\0 \312V\201f\323\376\215J\'M\25\344\323\227$\v"..., 74) = 74
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
    write(9, "\25\3\1\0 \307\253\301L\326n\371\227\0\r\303!p\225\000"..., 37) = 37
    rt_sigpending([])                       = 0
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    close(9)                                = 0
    close(8)                                = 0
    munmap(0xb6342000, 4096)                = 0
    close(8)                                = -1 EBADF (Bad file descriptor)
    munmap(0xb6343000, 4096)                = 0
    close(7)                                = 0
    munmap(0xb6344000, 4096)                = 0
    close(7)                                = -1 EBADF (Bad file descriptor)
    munmap(0xb7232000, 4096)                = 0
    close(6)                                = 0
    munmap(0xb7233000, 4096)                = 0
    close(6)                                = -1 EBADF (Bad file descriptor)
    munmap(0xb7234000, 4096)                = 0
    close(5)                                = 0
    munmap(0xb7235000, 4096)                = 0
    close(5)                                = -1 EBADF (Bad file descriptor)
    munmap(0xb7236000, 4096)                = 0
    rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
    write(4, "\27\3\1\0 ~\232\226X#\374t\367\263D\241\277\201\211\225"..., 74) = 74
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    rt_sigprocmask(SIG_BLOCK, [PIPE], [], 8) = 0
    write(4, "\25\3\1\0 <f0\262\217\7\220\345\344JE\373{Mg\252\22\233"..., 37) = 37
    rt_sigpending([])                       = 0
    rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
    close(4)                                = 0
    close(3)                                = 0
    munmap(0xb74fb000, 4096)                = 0
    write(2, "/u/apps/axon/releases/2010012713"..., 2400) = 2400
    exit_group(0)
    Can anyone help with any interpretation of this strace output?

    My feeling (not based on any good evidence) is that some process monitor in the OpenVZ container is killing our jobs. Does this sound likely? Would we see the same problem if our machines were hosted on a Xen system?
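One way to test the "something is signalling the daemon" theory is to record which PID and UID sent the signal. The following is only a minimal Python sketch, assuming Linux and Python 3; the helper name `wait_for_signal` is hypothetical, the real daemon would need an equivalent hook in its own language, and the sender PID may not be meaningful if the signal originates outside the container.

```python
import os
import signal


def wait_for_signal(signals, timeout):
    """Wait up to `timeout` seconds for one of `signals`.

    The signals must already be blocked in the calling thread.
    Returns (signal_name, sender_pid, sender_uid), or None on timeout.
    """
    info = signal.sigtimedwait(signals, timeout)
    if info is None:
        return None
    return signal.Signals(info.si_signo).name, info.si_pid, info.si_uid


if __name__ == "__main__":
    watched = {signal.SIGTERM, signal.SIGINT}
    # Block the signals so they are queued for sigtimedwait()
    # instead of triggering the default action.
    signal.pthread_sigmask(signal.SIG_BLOCK, watched)
    print("pid", os.getpid(), "waiting for a signal")
    print(wait_for_signal(watched, 60))
```

Run it, send it a signal from another shell with `kill`, and it prints the signal name together with the sender's PID and UID, which can then be looked up with `ps`.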

    Any help with this would be greatly appreciated. Currently we work around the problem by restarting the processes whenever they stop, but this isn't a sustainable solution, so if we can't resolve the issue we may have to consider alternative hosting providers.

  2. #2
    Join Date
    Jul 2009
    Location
    New York
    Posts
    465

    Well, both failure counters point to high memory use (privvmpages) and too many open TCP connections (numtcpsock). You could ask your provider to increase those limits. What does cat /proc/user_beancounters show for numtcpsock on your OpenVZ container?
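To see which limits are actually being hit, the failcnt column can be pulled out with a short script. This is a rough Python sketch, assuming the usual /proc/user_beancounters layout (a "Version:" line, a header row, then one row per resource with held, maxheld, barrier, limit and failcnt, where the first row of each container is prefixed with "uid:"). The function names are hypothetical, and reading the file normally requires root inside the container.

```python
FIELDS = ("held", "maxheld", "barrier", "limit", "failcnt")


def parse_beancounters(text):
    """Return a list of (uid, resource, counters) tuples.

    `counters` maps each name in FIELDS to an int.
    """
    rows = []
    uid = None
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the "Version:" line and the column header.
        if len(parts) < 6 or parts[0] in ("Version:", "uid"):
            continue
        # The first row of each container starts with "<uid>:".
        if parts[0].endswith(":"):
            uid = parts[0].rstrip(":")
            parts = parts[1:]
        if len(parts) != 6:
            continue
        try:
            values = [int(p) for p in parts[1:]]
        except ValueError:
            continue  # not a counter row
        rows.append((uid, parts[0], dict(zip(FIELDS, values))))
    return rows


def failed_resources(text):
    """Return (uid, resource, failcnt) for every row with failcnt > 0."""
    return [
        (uid, resource, counters["failcnt"])
        for uid, resource, counters in parse_beancounters(text)
        if counters["failcnt"] > 0
    ]


if __name__ == "__main__":
    with open("/proc/user_beancounters") as handle:
        for uid, resource, failcnt in failed_resources(handle.read()):
            print(uid, resource, failcnt)
```

Running it before and after one of the mystery shutdowns and comparing the output would show whether a counter such as privvmpages or numtcpsock increased at the time the process died.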

    Alternatively, you could disable keepalive to reduce the number of open TCP connections, or ask to have your port speed increased.
