I'm still trying to understand why my pages aren't being transformed properly by the portletbridge XSLT transform.
At first, I thought it might have something to do with the uppercase tag names in the XSLT, but I found out that was a conscious choice by the portletbridge folks. NekoHTML, the library the team uses to clean up the HTML, uppercases all tag names by default. According to their FAQ at http://people.apache.org/~andyc/neko/doc/html/faq.html#uppercase, this follows the HTML 4 specification. I still haven't gotten to the root of the issue. If anyone else is interested, I can post an example of the source and the XSLT here.
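To illustrate why the case of the tag names matters, here is a minimal sketch. It is not the portletbridge code and it does not use NekoHTML or an XSLT engine; it uses Python's standard xml.etree.ElementTree as a stand-in, and the CLEANED markup is a made-up sample of what an uppercasing cleaner might emit. The point is only that XML name matching is case-sensitive, so a lowercase pattern will silently match nothing against uppercase elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: output of an HTML cleaner that uppercases element names.
CLEANED = "<HTML><BODY><P id='intro'>Hello</P></BODY></HTML>"

root = ET.fromstring(CLEANED)

# XML names are case-sensitive, so a lowercase path finds nothing...
lower_matches = root.findall(".//p")

# ...while the uppercase path finds the paragraph.
upper_matches = root.findall(".//P")

print(len(lower_matches), len(upper_matches))
```

The same rule applies to match patterns in an XSLT stylesheet, which is presumably why the stylesheet has to use uppercase names when the cleaned document does.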
In any event, I got to thinking about this a bit, and I'm wondering whether XSLT will be feasible for a portal user. Although I think that transforming to XHTML and then using XSLT is technically the best way to go for web-clipping, I doubt most portal admins will be able to use it. I'm also not sure that the default behavior of simply grabbing a whole page and displaying it in a portlet is the right approach. In a lot of situations, you really want to grab some small piece of a page instead, like a particular paragraph or a table. It would be great to have a UI that lets you simply select the sections of the page you want to clip out, and lets the portlet figure out the best way to grab those elements (by "id" attribute, regexp, XPath, etc.).
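As a rough sketch of the simplest of those strategies, clipping by "id" attribute, something like the following could work once the page has been cleaned up into well-formed XHTML. This is only an illustration in Python's standard library, not anything portletbridge provides; clip_by_id and the PAGE sample are hypothetical names invented for the example.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def clip_by_id(xhtml: str, element_id: str) -> Optional[str]:
    """Return the serialized element with the given id, or None if absent.

    Assumes the input is already well-formed XHTML and that the target
    element is a descendant of the root element.
    """
    root = ET.fromstring(xhtml)
    match = root.find(f".//*[@id='{element_id}']")
    if match is None:
        return None
    match.tail = None  # drop any text that follows the clipped element
    return ET.tostring(match, encoding="unicode")


# Hypothetical page: we only want the price table, not the nav or footer.
PAGE = (
    "<html><body>"
    "<div id='nav'>menu</div>"
    "<table id='prices'><tr><td>42</td></tr></table>"
    "<p>footer</p>"
    "</body></html>"
)

fragment = clip_by_id(PAGE, "prices")
print(fragment)
```

A selection UI could generate the id (or an XPath expression) behind the scenes, so the portal admin never has to write a stylesheet by hand.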
For now, I think the portletbridge still makes sense structurally, but it may need to be adjusted to improve ease of use and to support alternate transformation methods.