27 Replies Latest reply on Aug 30, 2010 2:07 PM by dmlloyd

    Process Manager

    kabirkhan

      I've started writing some tests in PM.

       

      1) I sometimes get the following exceptions when stopping a process:

      java.io.EOFException

      at org.jboss.as.process.StreamUtils.readChar(StreamUtils.java:59)

      at org.jboss.as.process.StreamUtils.readWord(StreamUtils.java:44)

      at org.jboss.as.process.ManagedProcess$OutputStreamHandler.run(ManagedProcess.java:155)

      at java.lang.Thread.run(Thread.java:637)

       

      The code is:

       

          static Status readWord(final InputStream input, final StringBuilder dest) throws IOException {
              dest.setLength(0);
              int c;
              for (;;) {
                  c = readChar(input);
                  switch (c) {
                      case -1: return Status.END_OF_STREAM;
                      case 0: return Status.MORE;
                      case '\n': return Status.END_OF_LINE;
                      default: dest.append((char) c);
                  }
              }
          }
      
          private static final String INVALID_BYTE = "Invalid byte";
      
          static int readChar(final InputStream input) throws IOException {
              final int a = input.read();
              if (a < 0) {
                  throw new EOFException(); // Here
              } else if (a == 0) {
                  return -1;
       
      

       

      Why is EOFException thrown here? If I change it to return -1, I get no exceptions.

       

      2) If I call PM.removeProcess() on a started process, the process does not get stopped. I suggest either having it stop the process and then remove it, or throwing an exception.

       

      3) Calling

      PM.startProcess("Test");

      PM.stopProcess("Test");

      PM.startProcess("Test");  //*

      PM.stopProcess("Test");

       

      does not seem to start the server again at *. I believe that should happen?

       

      4) For my purposes it would be useful to have some methods in PM to list started and added processes. Can I add those?

        • 1. Re: Process Manager
          brian.stansberry

          Re:

           

          1) Yeah, I noticed that yesterday myself. Better ping David; I think this code came from JBoss Marshalling.

           

          If readChar() now returns -1 we have to make sure other callers of that method deal with that properly. An alternative is to have readWord() catch the EOFException and return Status.END_OF_STREAM. Or, just get rid of Status.END_OF_STREAM. There are other methods, like readInt(), readLong() or even directly reading from the stream like CheckedBytes does w/ a readFully() call that will throw an EOFException.
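A minimal sketch of the second alternative above, where readWord() catches the EOFException and returns Status.END_OF_STREAM. The class name `StreamUtilsSketch` is made up for illustration, and its readChar() is a simplified stand-in that returns raw bytes; it does not reproduce the byte-level decoding of the real StreamUtils.readChar().

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class StreamUtilsSketch {
    enum Status { MORE, END_OF_LINE, END_OF_STREAM }

    // Simplified stand-in (assumption): one byte per character, and it keeps
    // throwing EOFException at end of stream, as the original does.
    static int readChar(final InputStream input) throws IOException {
        final int a = input.read();
        if (a < 0) {
            throw new EOFException();
        }
        return a;
    }

    // Only readWord() translates the exception into END_OF_STREAM, so other
    // callers of readChar() still see an EOFException on a truncated stream.
    static Status readWord(final InputStream input, final StringBuilder dest) throws IOException {
        dest.setLength(0);
        for (;;) {
            final int c;
            try {
                c = readChar(input);
            } catch (EOFException e) {
                return Status.END_OF_STREAM;
            }
            switch (c) {
                case 0: return Status.MORE;
                case '\n': return Status.END_OF_LINE;
                default: dest.append((char) c);
            }
        }
    }
}
```

With this shape, callers that read fixed-size values keep the exception, while the word-reading loop can treat end of stream as a normal way for a stopped process to finish.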

           

          (Background -- David prototyped this as a character-based messaging system. Last week I hacked in the SEND_BYTES/BROADCAST_BYTES/MSG_BYTES stuff because it was clear we needed to send serialized objects. So now things are a bit inconsistent.)

           

          2) Agreed. I think logging a WARN and trying to stop sounds good.

           

          3) Yes, that should work. TBH I don't know if we'll ever actually use the API that way, but if it's not working that smells like a bug.

           

          4) Go for it.

          • 2. Re: Process Manager
            kabirkhan

            3) was my fault; I've fixed my tests.

            • 3. Re: Process Manager
              kabirkhan

              I've got to move on to some EAP stuff for a while.

               

              I now have a test framework running, some tests working, and a good idea of what more is needed:

               

              The tests include

              -using ProcessManagerMaster to add/start/stop/remove processes

              -using PMM to send and broadcast string and byte[] messages to processes

              -using ProcessManagerSlave to send string and byte[] messages to processes

              -using ProcessManagerSlave to broadcast string and byte[] messages to processes (TBD)

              -using ProcessManagerSlave to add/start/stop/remove processes (TBD)

               

              I should be able to start working on improving PM tomorrow or Thursday.

               

              From testing I have found a few bugs that I need to fix:

              - 1) and 2) in my original post

              - If I have a few processes started and broadcast a message via PMM, and then remove a process and broadcast again I get some problems

               

              The next step is to replace the System.in/out communication mechanism between the processes with sockets. I have implemented something to gather data from my test processes, and should be able to reuse the concept:

               

              -PMM starts and listens on a given address and port

              -When PMM starts the processes it passes through the address and port in the args

              -Each process's PMS opens a socket to the PMM on the given address/port, which is then used for communication

               

              Do I need two sockets? One for PM->process and one for process->PM, or can the same one be used?
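On the one-socket-or-two question: a TCP connection is full duplex, so a single java.net.Socket carries traffic in both directions through its getInputStream() and getOutputStream(). The sketch below is only an illustration under assumptions: `SingleSocketSketch` is a made-up name, a thread stands in for the child process, and writeUTF()/readUTF() stand in for whatever message format PM would really use.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class SingleSocketSketch {

    // Sends one command from the "master" side and returns the reply that the
    // "slave" side wrote back over the very same connection.
    static String roundTrip(final String command) throws IOException, InterruptedException {
        final InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            server.setSoTimeout(5000); // do not wait forever if nothing connects
            final int port = server.getLocalPort();

            // Stand-in for the child process: connect, read a command, answer it.
            final Thread slave = new Thread(() -> {
                try (Socket socket = new Socket(loopback, port);
                     DataInputStream in = new DataInputStream(socket.getInputStream());
                     DataOutputStream out = new DataOutputStream(socket.getOutputStream())) {
                    out.writeUTF("ACK " + in.readUTF());
                    out.flush();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            slave.start();

            try (Socket socket = server.accept();
                 DataOutputStream out = new DataOutputStream(socket.getOutputStream());
                 DataInputStream in = new DataInputStream(socket.getInputStream())) {
                out.writeUTF(command);             // master -> slave
                out.flush();
                final String reply = in.readUTF(); // slave -> master, same socket
                slave.join();
                return reply;
            }
        }
    }
}
```

Calling `SingleSocketSketch.roundTrip("START")` sends the command one way and reads the acknowledgement back the other way without opening a second connection.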

              • 4. Re: Process Manager
                emuckenhuber

                Kabir Khan wrote:

                 

                The next step is to replace the System.in/out communication mechanism between the processes with sockets. I have implemented something to gather data from my test processes, and should be able to reuse the concept:


                 

                AFAIK, Stdio should be the default and socket communication will only be used if you configure -interprocess port and address. So we would need both.

                • 5. Re: Process Manager
                  kabirkhan

                  Emanuel Muckenhuber wrote:

                   

                  AFAIK, Stdio should be the default and socket communication will only be used if you configure -interprocess port and address. So we would need both.

                  Really? Stdio sounds a bit crappy and fragile to me. I've not tested this yet, but doesn't that mean that if somebody writes to System.out in the child process that we'll end up with an invalid message received in the PM? Anyway, for a proper AS set up we would probably use sockets?

                  • 6. Re: Process Manager
                    emuckenhuber

                    Kabir Khan wrote:

                     

                    Really? Stdio sounds a bit crappy and fragile to me. I've not tested this yet, but doesn't that mean that if somebody writes to System.out in the child process that we'll end up with an invalid message received in the PM? Anyway, for a proper AS set up we would probably use sockets?

                    System.in, .out and .err are replaced to avoid what you described, using a Stdio project David created. I do agree that it sounds a bit odd, but this is the reason we have PM: if the child process fails and stdio becomes unavailable, it can respawn the process without affecting other managed processes. I think the main reason for having Stdio as the default is to avoid opening a port by default. I'm not sure whether socket-based communication would be that much less fragile, or whether we recommend sockets as the proper setup?

                    • 7. Re: Process Manager
                      dmlloyd

                      Emanuel Muckenhuber wrote:

                       

                      Kabir Khan wrote:

                       

                      Really? Stdio sounds a bit crappy and fragile to me. I've not tested this yet, but doesn't that mean that if somebody writes to System.out in the child process that we'll end up with an invalid message received in the PM? Anyway, for a proper AS set up we would probably use sockets?

                      System.in, .out and .err are replaced to avoid what you described, using a Stdio project David created. I do agree that it sounds a bit odd, but this is the reason we have PM: if the child process fails and stdio becomes unavailable, it can respawn the process without affecting other managed processes. I think the main reason for having Stdio as the default is to avoid opening a port by default. I'm not sure whether socket-based communication would be that much less fragile, or whether we recommend sockets as the proper setup?

                       

                      Both IPC solutions have pros and cons.

                       

                      The advantages to stdio are:

                      1. No file descriptors need to be opened outside of the ones opened by Runtime.exec().  This is good because Runtime.exec() is a little fragile; if you start a process right when you create an FD, that FD may leak into the new process.
                      2. Fewer moving parts.  Some situations just don't have to be dealt with - like what to do if the server doesn't connect to the socket?
                      3. No port # needs to be configured or opened.

                       

                      The advantages to sockets are:

                      1. If some badly-behaved JNI extension is dumping garbage to stdout, it can cause message corruption; thus a stdio-based solution needs a reliable delivery layer.
                      2. If a bug crops up in the IPC layer then it's possible to just dump the connection and create a new one.

                       

                      Ultimately we decided that the user shouldn't have to configure a port by default, but we can't ignore the possibility of the JNI thing so sockets should be an option as well.
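To make the "garbage on stdout" concern concrete, here is a rough sketch of one way a stdio receiver could tell protocol lines from stray output: each message line carries a marker prefix and a CRC32 of its payload, and anything that does not verify is ignored. This is an assumption-laden illustration, not the PM wire format: `FramedLineSketch`, the marker string and the line layout are all invented. It only detects and filters bad lines; a real reliable delivery layer would also need acknowledgements or retransmission.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.zip.CRC32;

final class FramedLineSketch {
    // Hypothetical marker: a control character plus a tag that ordinary
    // log output is unlikely to start a line with.
    static final String MARKER = "\u0001PM:";

    static long checksum(final String payload) {
        final CRC32 crc = new CRC32();
        crc.update(payload.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Line layout (assumed): MARKER + hex CRC32 of the payload + ':' + payload
    static String encode(final String payload) {
        return MARKER + Long.toHexString(checksum(payload)) + ":" + payload;
    }

    // Returns the payload if the line is a well-formed frame, otherwise empty.
    static Optional<String> decode(final String line) {
        if (!line.startsWith(MARKER)) {
            return Optional.empty();
        }
        final int sep = line.indexOf(':', MARKER.length());
        if (sep < 0) {
            return Optional.empty();
        }
        final String hex = line.substring(MARKER.length(), sep);
        final String payload = line.substring(sep + 1);
        return hex.equals(Long.toHexString(checksum(payload)))
                ? Optional.of(payload)
                : Optional.empty();
    }
}
```

Output that lands in the middle of a framed line would still cost that one message, which is part of why the socket option stays on the table.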

                      • 8. Re: Process Manager
                        kabirkhan

                         

                        kkhan: I've got a stable set of tests now. I see there is a TODO in ManagedProcess for "detect crash & respawn logic". I take that to mean that if a managed process goes down without being initiated by PMM we should start it again?
                        [19:22] dmlloyd: kkhan: yes, it should come back up - subject to some rules
                        [19:23] kkhan: dmlloyd: And those rules are?
                        [19:23] dmlloyd: i.e. we don't want to get into a situation where the process crashes instantly and we respawn it 100 times in a second
                        [19:23] dmlloyd: that kind of thing can bog down a system, not to mention filling up log files
                        [19:23] kkhan: dmlloyd: ok, makes sense
                        [19:24] dmlloyd: maybe we should always have a crash restart delay, which grows in relation to the rate of restart or something
                        [19:24] dmlloyd: a non-crash restart should be instant though, so we need to differentiate

                         

                        Branch 'process-mgr-brian' on git://github.com/kabir/jboss-as.git now contains the respawn logic. It has a pluggable RespawnPolicy; someone may want to check the defaults in RespawnPolicy.DefaultRespawnPolicy. I differentiate between a process that was shut down via ProcessManagerMaster and one that was killed.
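For readers who want a feel for the rule from the chat (a crash restart delay that grows with the restart rate, while a non-crash restart is instant), here is a small sketch. It is not the RespawnPolicy from the branch: `BackoffRespawnSketch`, its constructor parameters and the doubling rule are assumptions picked for illustration.

```java
final class BackoffRespawnSketch {
    private final long baseDelayMs;    // delay after the first crash
    private final long maxDelayMs;     // upper bound on the delay
    private final long quietPeriodMs;  // crash-free time after which the count resets
    private int recentCrashes;
    private long lastCrashMs;

    BackoffRespawnSketch(final long baseDelayMs, final long maxDelayMs, final long quietPeriodMs) {
        this.baseDelayMs = baseDelayMs;
        this.maxDelayMs = maxDelayMs;
        this.quietPeriodMs = quietPeriodMs;
    }

    // Returns how long to wait before starting the process again.
    long delayBeforeRespawn(final boolean crashed, final long nowMs) {
        if (!crashed) {
            return 0L; // a non-crash restart is instant
        }
        if (recentCrashes > 0 && nowMs - lastCrashMs > quietPeriodMs) {
            recentCrashes = 0; // the process ran long enough; start over
        }
        lastCrashMs = nowMs;
        final int shift = Math.min(recentCrashes, 20); // cap the doubling
        recentCrashes++;
        return Math.min(maxDelayMs, baseDelayMs << shift);
    }
}
```

With `new BackoffRespawnSketch(100, 1000, 10000)`, crashes in quick succession yield delays of 100, 200, 400, 800 and then 1000 ms, and a crash after a long quiet stretch starts again at 100 ms.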

                        [19:25] dmlloyd: not to mention detecting the case where a server comes down because the user wants it down

                         

                        Here I am unsure whether David means:

                        • the server was shut down via PMM - I am handling this already
                        • the server was shut down some other way (e.g. similar to shutdown via the JMXConsole). If we are meant to handle this, I don't really see any other way than to check the exit code of the process, with 0 considered a clean shutdown and all other values a crash.
                        • 9. Re: Process Manager
                          brian.stansberry

                          On the "server shut down because the user wants it shut down", can that be handled by the server sending a message to PM telling it it's going down deliberately?

                           

                          The server-manager Main class has something in that direction, i.e. a method that was meant to be called early in the SM bootstrap if there was an unrecoverable condition:

                           

                          private void abort(Throwable t) {
                              try {
                                  // Inform the process manager that we are shutting down on purpose
                                  // so it doesn't try to respawn us

                                  // FIXME implement abort()
                                  throw new UnsupportedOperationException("implement me");

                                  // if (t != null) {
                                  //     t.printStackTrace(System.err);
                                  // }
                              } finally {
                                  System.exit(1);
                              }
                          }
                          
                          • 10. Re: Process Manager
                            dmlloyd

                            Brian Stansberry wrote:

                             

                            On the "server shut down because the user wants it shut down", can that be handled by the server sending a message to PM telling it it's going down deliberately?

                             

                            The server-manager Main class has something in that direction, i.e. a method that was meant to be called early in the SM bootstrap if there was an unrecoverable condition:

                             

                            private void abort(Throwable t) {
                                try {
                                    // Inform the process manager that we are shutting down on purpose
                                    // so it doesn't try to respawn us

                                    // FIXME implement abort()
                                    throw new UnsupportedOperationException("implement me");

                                    // if (t != null) {
                                    //     t.printStackTrace(System.err);
                                    // }
                                } finally {
                                    System.exit(1);
                                }
                            }
                            

                             

                            I dunno.  Maybe.  You'd have to guarantee that the server/SM's shutdown hook doesn't exit until the PM acknowledges the message though.

                             

                            Might be easier to use specific exit values: 0 means shut down on purpose, anything else means crash?

                             

                            Either way the SM should be notified if a Server exits, regardless of the cause: if the server is not configured to be shut down but it returns 0, the SM needs to kick it back alive again.  Also the same respawn logic has to apply; otherwise a poorly-placed System.exit(0) can bog down the system with fork/execs.
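The exit-value idea could be sketched roughly as follows. Everything here is hypothetical (`ExitClassifierSketch`, the `Cause` names, the `respawnAfterCleanExit` flag); the flag exists because whether an unrequested exit with status 0 should bring the server back is a policy decision rather than something the exit code can answer.

```java
final class ExitClassifierSketch {
    enum Cause { REQUESTED_STOP, UNREQUESTED_CLEAN_EXIT, CRASH }

    // stopRequested: true if the stop went through the manager's own API.
    static Cause classify(final int exitCode, final boolean stopRequested) {
        if (stopRequested) {
            return Cause.REQUESTED_STOP;
        }
        // 0 means "shut down on purpose"; anything else is treated as a crash.
        return exitCode == 0 ? Cause.UNREQUESTED_CLEAN_EXIT : Cause.CRASH;
    }

    // Whether to start the process again; any respawn should still go through
    // the same throttling so a stray System.exit(0) cannot cause a fork storm.
    static boolean shouldRespawn(final Cause cause, final boolean respawnAfterCleanExit) {
        switch (cause) {
            case CRASH: return true;
            case UNREQUESTED_CLEAN_EXIT: return respawnAfterCleanExit;
            default: return false;
        }
    }
}
```

The exit code itself would come from `Process.waitFor()` on the child, and the SM can be notified with the `Cause` in every case, including the ones that do not lead to a respawn.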

                            • 11. Re: Process Manager
                              brian.stansberry

                              David Lloyd wrote:

                               

                              Might be easier to use specific exit values: 0 means shut down on purpose, anything else means crash?

                               

                              That certainly seems simpler.

                               

                              Either way the SM should be notified if a Server exits, regardless of the cause: if the server is not configured to be shut down but it returns 0, the SM needs to kick it back alive again.  Also the same respawn logic has to apply; otherwise a poorly-placed System.exit(0) can bog down the system with fork/execs.

                               

                              +1 on notifying the SM and the restart rules. Whether the SM should automatically restart the server after getting a signal that one shut down cleanly, I'm not so sure. The signal should imply some human action was taken to cause the shutdown, and if that's the case I think requiring human action to restart makes sense. The SM should squawk in the logs though if the shutdown happens via something other than the intended management API.

                              • 12. Re: Process Manager
                                brian.stansberry

                                David Lloyd wrote:

                                 

                                The advantages to sockets are:

                                1. If some badly-behaved JNI extension is dumping garbage to stdout, it can cause message corruption; thus a stdio-based solution needs a reliable delivery layer.

                                 

                                Found another one. List the processes to find the pid for a server and then

                                 

                                kill -3 40547

                                 

                                The thread dump goes to stdout, resulting in this in the PM's log:

                                 

                                17:32:31,897 ERROR [Server:server-one] Received unknown command: java.lang.IllegalArgumentException: No enum const class org.jboss.as.process.Command.2010-08-12 17:32:31
                                    at java.lang.Enum.valueOf(Enum.java:196) [:1.6.0_20]
                                    at org.jboss.as.process.Command.valueOf(Command.java:28)
                                    at org.jboss.as.process.ManagedProcess$OutputStreamHandler.run(ManagedProcess.java:182)
                                    at java.lang.Thread.run(Thread.java:637) [:1.6.0_20]
                                
                                17:32:31,902 ERROR [Server:server-one] Received unknown command: java.lang.IllegalArgumentException: No enum const class org.jboss.as.process.Command.Full thread dump Java HotSpot(TM) 64-Bit Server VM (16.3-b01-279 mixed mode):
                                    at java.lang.Enum.valueOf(Enum.java:196) [:1.6.0_20]
                                    at org.jboss.as.process.Command.valueOf(Command.java:28)
                                    at org.jboss.as.process.ManagedProcess$OutputStreamHandler.run(ManagedProcess.java:182)
                                    at java.lang.Thread.run(Thread.java:637) [:1.6.0_20]
                                
                                17:32:31,903 ERROR [Server:server-one] Received unknown command: java.lang.IllegalArgumentException: No enum const class org.jboss.as.process.Command.
                                    at java.lang.Enum.valueOf(Enum.java:196) [:1.6.0_20]
                                    at org.jboss.as.process.Command.valueOf(Command.java:28)
                                    at org.jboss.as.process.ManagedProcess$OutputStreamHandler.run(ManagedProcess.java:182)
                                    at java.lang.Thread.run(Thread.java:637) [:1.6.0_20]
                                
                                17:32:31,904 ERROR [Server:server-one] Received unknown command: java.lang.IllegalArgumentException: No enum const class org.jboss.as.process.Command."DestroyJavaVM" prio=5 tid=12f1c2800 nid=0x100501000 waiting on condition [00000000]
                                    at java.lang.Enum.valueOf(Enum.java:196) [:1.6.0_20]
                                    at org.jboss.as.process.Command.valueOf(Command.java:28)
                                    at org.jboss.as.process.ManagedProcess$OutputStreamHandler.run(ManagedProcess.java:182)
                                    at java.lang.Thread.run(Thread.java:637) [:1.6.0_20]
                                

                                 

                                That actually wouldn't be so bad if we just wrote an ERROR message with the unknown command string, no exception message or stack trace.
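A sketch of that change: parse the word without letting the IllegalArgumentException from Enum.valueOf() reach the log, and report only the offending string. The `Command` constants below are a guessed subset for illustration, `CommandParsingSketch` is a made-up name, and java.util.logging stands in for whatever logger PM actually uses.

```java
import java.util.Optional;
import java.util.logging.Logger;

final class CommandParsingSketch {
    // Hypothetical subset of commands, for illustration only.
    enum Command { START, STOP, SEND_BYTES, BROADCAST_BYTES, MSG_BYTES }

    // Enum.valueOf() throws IllegalArgumentException for unknown names;
    // turn that into an empty Optional instead of propagating it.
    static Optional<Command> parse(final String word) {
        try {
            return Optional.of(Command.valueOf(word));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    static String unknownCommandMessage(final String word) {
        return "Received unknown command: " + word;
    }

    // Returns true if the word was a known command. Unknown input is logged
    // as a single line with no exception class and no stack trace.
    static boolean handle(final String word, final Logger log) {
        final Optional<Command> command = parse(word);
        if (!command.isPresent()) {
            log.severe(unknownCommandMessage(word));
            return false;
        }
        // Dispatching on command.get() would go here.
        return true;
    }
}
```

A thread dump arriving on stdout would then show up as one ERROR line per dumped line instead of a stack trace for each.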

                                • 13. Re: Process Manager
                                  brian.stansberry

                                  Getting rid of the exception class and the stack trace actually results in fairly nicely formatted info in the PM logs:

                                   

                                  18:34:31,134 ERROR [ServerManager] Received unknown command: 2010-08-12 18:34:31
                                  18:34:31,135 ERROR [ServerManager] Received unknown command: Full thread dump Java HotSpot(TM) 64-Bit Server VM (16.3-b01-279 mixed mode):
                                  18:34:31,135 ERROR [ServerManager] Received unknown command: 
                                  18:34:31,135 ERROR [ServerManager] Received unknown command: "DestroyJavaVM" prio=5 tid=101930000 nid=0x100501000 waiting on condition [00000000]
                                  18:34:31,136 ERROR [ServerManager] Received unknown command:    java.lang.Thread.State: RUNNABLE
                                  18:34:31,136 ERROR [ServerManager] Received unknown command: 
                                  18:34:31,136 ERROR [ServerManager] Received unknown command: "Poller SunPKCS11-Darwin" daemon prio=1 tid=103186800 nid=0x1178ef000 waiting on condition [1178ee000]
                                  18:34:31,136 ERROR [ServerManager] Received unknown command:    java.lang.Thread.State: TIMED_WAITING (sleeping)
                                  18:34:31,136 ERROR [ServerManager] Received unknown command:     at java.lang.Thread.sleep(Native Method)
                                  18:34:31,137 ERROR [ServerManager] Received unknown command:     at sun.security.pkcs11.SunPKCS11$TokenPoller.run(SunPKCS11.java:692)
                                  18:34:31,137 ERROR [ServerManager] Received unknown command:     at java.lang.Thread.run(Thread.java:637)
                                  18:34:31,137 ERROR [ServerManager] Received unknown command: 
                                  18:34:31,137 ERROR [ServerManager] Received unknown command: "Server Manager Process" prio=5 tid=1030ca000 nid=0x117781000 runnable [117780000]
                                  18:34:31,137 ERROR [ServerManager] Received unknown command:    java.lang.Thread.State: RUNNABLE
                                  18:34:31,137 ERROR [ServerManager] Received unknown command:     at java.io.FileInputStream.readBytes(Native Method)
                                  18:34:31,138 ERROR [ServerManager] Received unknown command:     at java.io.FileInputStream.read(FileInputStream.java:199)
                                  18:34:31,138 ERROR [ServerManager] Received unknown command:     at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
                                  18:34:31,138 ERROR [ServerManager] Received unknown command:     at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
                                  18:34:31,138 ERROR [ServerManager] Received unknown command:     - locked <107c454c0> (a java.io.BufferedInputStream)
                                  18:34:31,138 ERROR [ServerManager] Received unknown command:     at org.jboss.as.process.StreamUtils.readChar(StreamUtils.java:57)
                                  18:34:31,139 ERROR [ServerManager] Received unknown command:     at org.jboss.as.process.StreamUtils.readWord(StreamUtils.java:44)
                                  18:34:31,139 ERROR [ServerManager] Received unknown command:     at org.jboss.as.process.ProcessManagerSlave$Controller.run(ProcessManagerSlave.java:217)
                                  18:34:31,139 ERROR [ServerManager] Received unknown command:     at java.lang.Thread.run(Thread.java:637)
                                  18:34:31,139 ERROR [ServerManager] Received unknown command: 
                                  18:34:31,139 ERROR [ServerManager] Received unknown command: "ClassLoader Thread" daemon prio=5 tid=101805800 nid=0x11746d000 in Object.wait() [11746c000]
                                  18:34:31,139 ERROR [ServerManager] Received unknown command:    java.lang.Thread.State: WAITING (on object monitor)
                                  18:34:31,140 ERROR [ServerManager] Received unknown command:     at java.lang.Object.wait(Native Method)

                                   

                                  etc etc

                                  • 14. Re: Process Manager
                                    dmlloyd

                                    Blah.  Having stack traces go to stdout is probably a killer for this idea - that's a very basic troubleshooting technique.  Is it worth trying to use a clever protocol that can't be mistaken for this output?  It might be easier overall just to bind to 127.0.0.1:0 and pass the bound port number to each started up process as a command-line argument, and just forget the stdio idea.
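Roughly what binding to 127.0.0.1:0 and handing the port to children might look like. Port 0 asks the OS for any free port, and getLocalPort() reports the one actually bound. The class name `EphemeralPortSketch` and the `--pm-port` argument are invented for illustration; nothing in this thread fixes the real argument name.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;

final class EphemeralPortSketch {

    // Binds to the loopback interface only, on a port chosen by the OS,
    // so nothing has to be configured and nothing is exposed externally.
    static ServerSocket bindLoopback() throws IOException {
        return new ServerSocket(0, 50, InetAddress.getByName("127.0.0.1"));
    }

    // Appends the bound port to the child's command line.
    // "--pm-port" is a hypothetical argument name.
    static List<String> childCommand(final List<String> baseCommand, final int port) {
        final List<String> command = new ArrayList<>(baseCommand);
        command.add("--pm-port");
        command.add(Integer.toString(port));
        return command;
    }
}
```

The resulting list can be passed to `new ProcessBuilder(command).start()`; the child would then connect back to 127.0.0.1 on that port.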
