Hello everyone,
I was using the camel-http component to fetch a site's HTML code with this route (triggered by a quartz component every 8 minutes):
.to("http://alerts.weather.gov/cap/us.php?x=0");
I noticed that some of the extracted content was wrong, and I don't really think it was about the encoding. I was getting things like:
original:
This really caused me a lot of trouble. Every time the quartz component triggered the http component, the extracted code had different errors on different lines, so the issue is not consistent. Anyway, if anyone else is using the http component to extract HTML code, I solved it by replacing the component with a method on a bean:
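import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

// Fetches the page and returns its content with "\n" line endings.
public String getHtmlCode() throws IOException {
    URL capAlertAtomFeedSite = new URL("http://alerts.weather.gov/cap/us.php?x=0");
    StringBuilder code = new StringBuilder();
    URLConnection yc = capAlertAtomFeedSite.openConnection();
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader in = new BufferedReader(
            new InputStreamReader(yc.getInputStream()))) {
        String inputLine;
        while ((inputLine = in.readLine()) != null) {
            code.append(inputLine).append('\n');
        }
    }
    return code.toString();
}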
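In case it is useful: the method above decodes the response with the platform default charset and sets no timeouts. Below is a rough sketch of a variant that makes both explicit and separates decoding from fetching, so the decoding part can be checked without network access. The class name HtmlFetcher, the method names, the UTF-8 charset and the timeout values are all my own assumptions, and I have not verified that any of this is related to the corruption I saw with the http component.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

// Hypothetical helper class; the name is only for illustration.
class HtmlFetcher {

    // Decodes the whole stream with the given charset and normalises
    // every line ending to "\n". The caller owns and closes the stream.
    static String readAll(InputStream in, Charset charset) {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset));
        return reader.lines()
                .map(line -> line + "\n")
                .collect(Collectors.joining());
    }

    // Fetches the page at the given address.
    // Assumptions: the page is UTF-8; 10 s connect and 30 s read timeouts.
    static String fetch(String address) throws IOException {
        URLConnection connection = new URL(address).openConnection();
        connection.setConnectTimeout(10_000);
        connection.setReadTimeout(30_000);
        try (InputStream in = connection.getInputStream()) {
            return readAll(in, StandardCharsets.UTF_8);
        }
    }
}
```

The bean method would then just return HtmlFetcher.fetch("http://alerts.weather.gov/cap/us.php?x=0").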
Hope it helps anyone with the same problem :-D
You may want to check this blog post:
http://blog.nanthrax.net/2011/07/website-mashup-with-apache-camel/
It uses tidy markup to clean up the returned HTML pages. It may help in your situation as well.