18 Replies. Latest reply on May 14, 2005 9:28 AM by snowcr
      • 15. Re: Integrating jASEN
        snowcr

        Hi all,

        I'm the founder of the jASEN project and thought I may be able to answer some of the questions posted.


        jASEN has two different directories of data that it relies on - one for configuration information and one is the "database" it uses to determine scoring.

        ...

        Where can/should these directories go for a production style deployment?


        The two directories referred to here only need to be on the application's classpath. They can physically reside anywhere, as long as the engine can reference them as relative paths, which is why they must be on the classpath.
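A quick way to confirm that a directory or file really is visible on the classpath is to ask the class loader for it. The following is only a minimal sketch using the standard `ClassLoader` API; the class name `ClasspathLookup` and the default resource name are placeholders made up for illustration, not actual jASEN names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

// Sketch only: ClasspathLookup and the default resource name below are
// hypothetical placeholders, not part of jASEN.
class ClasspathLookup {

    /** Returns the URL of a classpath resource, or null if it cannot be found. */
    static URL locate(String relativePath) {
        return ClasspathLookup.class.getClassLoader().getResource(relativePath);
    }

    /** Returns true if the resource can be opened for reading via the classpath. */
    static boolean isReadable(String relativePath) {
        try (InputStream in =
                 ClasspathLookup.class.getClassLoader().getResourceAsStream(relativePath)) {
            return in != null;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the relative path you expect the engine to use as the first argument.
        String path = args.length > 0 ? args[0] : "example/placeholder.properties";
        URL url = locate(path);
        System.out.println(url == null
                ? path + " is NOT on the classpath"
                : path + " resolved to " + url);
    }
}
```

Run it with the same classpath as the production application; if the path does not resolve here, the engine is unlikely to find it either.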


        I need to do a bit more research but from what I currently read the configuration can be read-only (if you never want to change the config of course) but the "database" should be read-write to allow training.


        This is not quite correct. The "database" (which is really just a file) is loaded at start-up and referenced in a read-only fashion during operation. Training is currently an offline process and whilst this database can be updated dynamically (without restarting the engine) there is no facility for live "training" in the current release.


        5 - run out of memory when initializing the jasen config :(


        This is likely happening due to the in-memory nature of the spam heuristics used in the engine. For this there are two solutions:

        1. Increase the default heap size (as you have done)
        2. Implement your own JasenMapStore class. This is the actual "database" of heuristics. The default implementation loads the DB into memory, but you can implement it however you like.

        The simplest solution is to just increase the heap size.
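On a standard JVM the maximum heap is normally raised with the `-Xmx` launch option (for example `java -Xmx256m ...`, where 256m is an arbitrary illustrative value, not a jASEN recommendation). To check what the engine's JVM actually received, something like the following sketch can be run under the same launch settings; the class name `HeapReport` is made up for this example.

```java
// Sketch only: HeapReport is a hypothetical helper, not part of jASEN.
class HeapReport {

    /** Maximum heap this JVM will attempt to use, in whole megabytes. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM: " + maxHeapMegabytes() + " MB");
    }
}
```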

        wrt MIME formatting: jASEN MUST be given a MIME formatted email. It uses JavaMail to parse the email message, so the message must conform to the relevant RFCs (822, 2822). If the message is not in MIME format you pretty much can't scan it. The assumption here is that jASEN scans only email.

        • 16. Re: Integrating jASEN

           

          2. Implement your own JasenMapStore class. This is the actual "database" of heuristics. The default implementation loads the DB into memory, but you can implement it however you like.


          I would like to see this implemented, such that it can be loaded from an RDBMS (hibernate is your friend).

          What sort of tests would be appropriate?


          Mostly check that when a message is submitted to the engine, an X-SpamScore is applied. Don't go overboard unit testing how appropriate the score is, as we don't really need to unit test jASEN. Perhaps one spam and one non-spam message to test both cases.

          • 17. Re: Integrating jASEN
            snowcr

             


            I would like to see this implemented, such that it can be loaded from an RDBMS (hibernate is your friend).


            Agreed. I have used Hibernate in a previous project and have found it to be better than a cold beer on a hot day (notwithstanding the time I was banned from the Hibernate forum for being a rude pr@#k).

            The only reason this is not the default implementation is so the engine can be immediately tested and usable. The objective is that jASEN provides a framework first and foremost, and a default (reference) implementation as a secondary consideration.

            Having said that, the default implementation is currently being used in a commercial production system...

            The implementation of a JasenMapStore class is a very simple task (it literally consists of two methods: load and save). The creation of a compatible db schema is a separate issue but is still pretty simple (really just a list of words and associated probabilities obtained during training).
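To give a feel for the shape of such a store, here is a rough sketch of a two-method load/save store over a word-to-probability map. Everything in it is an assumption made for illustration: `TokenStore` and `FileTokenStore` are invented names, the method signatures are guesses and not the real JasenMapStore API, and a tab-separated file stands in for the RDBMS so the example stays self-contained. An RDBMS-backed version would swap the file I/O for JDBC or Hibernate calls against a table of words and probabilities.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// HYPOTHETICAL interface: stands in for JasenMapStore, whose real
// signatures are not shown in this thread.
interface TokenStore {
    Map<String, Double> load() throws IOException;
    void save(Map<String, Double> probabilities) throws IOException;
}

// File-backed stand-in for a database-backed store: one "word<TAB>probability" per line.
class FileTokenStore implements TokenStore {
    private final Path file;

    FileTokenStore(Path file) {
        this.file = file;
    }

    @Override
    public Map<String, Double> load() throws IOException {
        Map<String, Double> result = new HashMap<>();
        if (!Files.exists(file)) {
            return result; // nothing trained yet
        }
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // skip malformed lines
            }
            result.put(line.substring(0, tab), Double.parseDouble(line.substring(tab + 1)));
        }
        return result;
    }

    @Override
    public void save(Map<String, Double> probabilities) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Double> e : probabilities.entrySet()) {
            lines.add(e.getKey() + "\t" + e.getValue());
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }
}

class TokenStoreDemo {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("tokens", ".tsv");
        TokenStore store = new FileTokenStore(tmp);
        Map<String, Double> trained = new HashMap<>();
        trained.put("example", 0.5); // made-up word and probability
        store.save(trained);
        System.out.println("Reloaded: " + store.load());
        Files.delete(tmp);
    }
}
```

Keeping the store behind a small interface like this means the engine does not need to know whether the probabilities come from a file or a database table.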



            • 18. Re: Integrating jASEN
              snowcr

              P.S.


              What sort of tests would be appropriate?


              If you get a false positive, or a spam gets through, the best way to identify why is to test that single email (if you can) against the ScanFile test tool provided with jASEN. This will scan the message and display the results of each plugin in the engine (to standard out).
