1 Reply Latest reply on Jul 19, 2011 5:25 PM by Rogelio Sevilla

    Using regex on Fuse ESB

    Rogelio Sevilla Novice

      Hello everyone:

       

      I don't know if this is the right place to ask this question, but here it goes.

       

      I'm building a Camel route that receives HTML and then applies a regular expression to extract certain data.

       

      I know the regex works correctly when deployed as a simple Java app. But I also know that Fuse ESB is a multithreaded environment, so there are additional difficulties when using non-thread-safe code.

       

      To be honest, I'm not an advanced multithreading coder, so I was wondering if anyone here has advice on using regex on Fuse ESB. This is the method I'm using:

       

       

       

          private static final Pattern regex = Pattern.compile("link href='(.*)'");

          public String getUrls(@Body String htmlcode) {
              Matcher matcher = regex.matcher(htmlcode);
              StringBuffer urls = new StringBuffer("");

              synchronized (this) {
                  matcher.matches();
                  matcher.find();

                  while (matcher.find()) {
                      urls.append(matcher.group(1));
                      urls.append("\n");
                  }
                  matcher.reset();
              }
              return urls.toString();
          }
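      For reference, here is a minimal sketch of how the calls in the method above interact on a small input. The class name FindLoopSketch, the two helper methods, and the sample HTML are made up for illustration. It assumes the input does not begin with "link href='", so matches() fails and the following find() starts from the beginning. Under that assumption, the extra find() before the loop consumes the first match, so the first URL never reaches the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FindLoopSketch {
    // Same pattern as in the post.
    private static final Pattern REGEX = Pattern.compile("link href='(.*)'");

    // Mirrors the call order of the posted method: matches(), one find(), then the loop.
    static List<String> asPosted(String html) {
        Matcher matcher = REGEX.matcher(html);
        List<String> urls = new ArrayList<>();
        matcher.matches(); // tries to match the whole input; result is ignored
        matcher.find();    // finds the first link and discards it
        while (matcher.find()) {
            urls.add(matcher.group(1));
        }
        return urls;
    }

    // Same pattern, but find() is only called as the loop condition.
    static List<String> loopOnly(String html) {
        Matcher matcher = REGEX.matcher(html);
        List<String> urls = new ArrayList<>();
        while (matcher.find()) {
            urls.add(matcher.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical sample input with one link per line.
        String html = "<link href='http://a'>\n<link href='http://b'>\n<link href='http://c'>";
        System.out.println(asPosted(html)); // prints [http://b, http://c]
        System.out.println(loopOnly(html)); // prints [http://a, http://b, http://c]
    }
}
```

      Note that (.*) is greedy, so on a line containing more than one single-quoted attribute it would capture everything up to the last quote on that line.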

       

       

      What I want to get here is a list of URLs. Curiously, when I extract the URL list, around 8 links out of 150 are totally wrong, something like:

       

      http://right_url.com?a=1111

      http://right_url.com?a=2222

      http://right_url.com?a=3333

      htp:/wrong_url.com?a=asdsa

      http://right_url.com?a=4444

      http://right_url.com?a=5555

      h:/wrong_url.com?asdsa

       

       

      I know my regex is fine because I'm executing this code every 5 minutes using a Quartz component, and the next time the match is executed (on the exact same HTML), the previously wrong URLs are extracted correctly while some others are now extracted incorrectly :-S. I thought the synchronized block would fix this, but it did not.
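      On the thread-safety side, a rough sketch may help frame the question. java.util.regex.Pattern is documented as immutable and safe to share between threads, while Matcher is not. A Matcher created inside the method is only ever seen by the calling thread, so under that reading no synchronized block should be needed. The class name UrlExtractorSketch, the helper sameResultFromAllThreads, and the [^']* variant of the pattern are assumptions made for illustration, not part of the original route.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UrlExtractorSketch {
    // Pattern is immutable and can be shared between threads.
    // Assumption: [^']* is used so the capture stops at the first closing quote.
    private static final Pattern LINK = Pattern.compile("link href='([^']*)'");

    // The Matcher is a local variable, so every call works on its own instance.
    static String getUrls(String html) {
        Matcher matcher = LINK.matcher(html);
        StringBuilder urls = new StringBuilder();
        while (matcher.find()) {
            urls.append(matcher.group(1)).append('\n');
        }
        return urls.toString();
    }

    // Calls getUrls on the same input from several threads and reports whether
    // every call returned the same text as a single-threaded call.
    static boolean sameResultFromAllThreads(String html, int threads) {
        String expected = getUrls(html);
        Callable<String> task = () -> getUrls(html);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                results.add(pool.submit(task));
            }
            for (Future<String> result : results) {
                if (!expected.equals(result.get())) {
                    return false;
                }
            }
            return true;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample input modelled on the URLs in the post.
        String html = "<link href='http://right_url.com?a=1111' rel='x'>\n"
                + "<link href='http://right_url.com?a=2222'>";
        System.out.print(getUrls(html));
        System.out.println(sameResultFromAllThreads(html, 8));
    }
}
```

      This sketch only covers the regex handling. It cannot confirm where the malformed URLs come from; if the extraction is stable on a fixed string like this, the HTML body delivered to the method would be the next thing to inspect.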

       

      I've been dealing with this problem for a couple of days without success. Does anyone have any experience with regex usage on Fuse ESB that could shed some light on this?

       

      Thanks a lot in advance.

       

      Edited by: rogelio_sevilla1 on Jul 19, 2011 5:56 PM