-
1. Re: UTF-8 encoding in Stomp frames
jmesnil Feb 16, 2010 10:33 AM (in response to mjustin)
You're right, it was a bug: when the frame body was a String, it was not properly encoded as UTF-8.
I've just fixed it in the trunk (r8882).
Thanks for the heads-up.
-
2. Re: UTF-8 encoding in Stomp frames
timfox Feb 16, 2010 10:46 AM (in response to jmesnil)
Actually, the fix is not right.
If a core message is of type text, the data won't necessarily be encoded as UTF-8.
The encoding is defined in ChannelBufferWrapper::readStringInternal; depending on the length, the string is encoded in different ways for optimal performance.
I think the problem here is that you are trying to mix and match STOMP and core messages without defining any proper mapping, and assuming that a text message sent by core will be available as a STOMP text message.
Also I can't see anywhere in the STOMP protocol definition where it says the text is encoded as UTF-8 on the wire, for all we know it might just be ascii, or some other encoding.
-
3. Re: UTF-8 encoding in Stomp frames
jmesnil Feb 16, 2010 10:49 AM (in response to timfox)
timfox wrote:
Also I can't see anywhere in the STOMP protocol definition where it says the text is encoded as UTF-8 on the wire, for all we know it might just be ascii, or some other encoding.
It's implied that UTF-8 is the default encoding: http://activemq.apache.org/stomp/stomp10/additional.html#character_encoding
-
4. Re: UTF-8 encoding in Stomp frames
timfox Feb 16, 2010 10:55 AM (in response to jmesnil)
jmesnil wrote:
timfox wrote:
Also I can't see anywhere in the STOMP protocol definition where it says the text is encoded as UTF-8 on the wire, for all we know it might just be ascii, or some other encoding.
It's implied that UTF-8 is the default encoding: http://activemq.apache.org/stomp/stomp10/additional.html#character_encoding
OK, so that's what the ActiveMQ guys have assumed, since it's omitted from the spec, so let's go with that.
However, using TEXT_TYPE is incorrect: this will fail if you try to consume a message that has been sent by a core client with a size < 9 or > 0xfff bytes.
You need to define your own types for STOMP messages, not hijack the core types.
Also, this seems very convoluted:
byte[] content = frame.getContent();
if (type == Message.TEXT_TYPE)
{
   message.getBodyBuffer().writeNullableSimpleString(SimpleString.toSimpleString(new String(content)));
}
Why not write the content directly into the buffer?
message.getBodyBuffer().writeBytes(content);
-
5. Re: UTF-8 encoding in Stomp frames
jmesnil Feb 17, 2010 8:57 AM (in response to timfox)
I don't understand how it is different from JMS HornetQTextMessage "hijacking" the core TEXT_TYPE.
The idea was to provide interoperability between Stomp messages and our Core/JMS messages:
- if the Stomp message has no content-length, treat its body as a String => convert it to a TEXT_TYPE core message so that we can consume it as a JMS TextMessage
- else treat it as a BYTES_TYPE, so we can consume it as a JMS BytesMessage
Am I missing a more obvious way to do this?
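For readers following along, the rule described above (no content-length header means text, otherwise bytes) can be sketched roughly as follows. This is only an illustration: the class, enum, and method names are hypothetical and are not taken from the HornetQ code base.

```java
import java.util.Map;

// Hypothetical sketch of the mapping rule discussed in this post.
final class StompBodyTypeSketch {

    // Stand-ins for the TEXT_TYPE / BYTES_TYPE distinction; not HornetQ constants.
    enum BodyType { TEXT, BYTES }

    // No content-length header => treat the body as text, otherwise as raw bytes.
    static BodyType bodyTypeFor(Map<String, String> stompHeaders) {
        return stompHeaders.containsKey("content-length") ? BodyType.BYTES : BodyType.TEXT;
    }

    public static void main(String[] args) {
        System.out.println(bodyTypeFor(Map.of("destination", "/queue/test")));
        System.out.println(bodyTypeFor(Map.of("destination", "/queue/test", "content-length", "5")));
    }
}
```

Note that with this rule the content-length header doubles as a type marker, so a client that always sends content-length would never produce a text message.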
-
6. Re: UTF-8 encoding in Stomp frames
mjustin Feb 17, 2010 10:24 AM (in response to jmesnil)
Hello Jeff,
many thanks for the information, the last change causes problems with other languages (for example expected: <አማርኛ> but was: <አማáˆáŠ›> for the code sequence am = 'አማርኛ'). It is not a high priority for me at the moment and I see it is work in progress.
Regards,
Michael
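The kind of garbling reported above is what typically happens when UTF-8 bytes are decoded with a different charset. Below is a minimal sketch of that effect, assuming ISO-8859-1 as the mismatched charset; the charset actually involved in the failing test is not stated in this thread, and the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: encode once as UTF-8, then decode with the right and the wrong charset.
final class CharsetMismatchSketch {

    // The four Ethiopic characters from the post, written as escapes so the
    // encoding of this source file does not matter.
    static final String AMHARIC = "\u12A0\u121B\u122D\u129B";

    static byte[] encodeBody(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    static String decodeBody(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        byte[] body = encodeBody(AMHARIC); // 4 characters, 3 bytes each in UTF-8

        String good = decodeBody(body, StandardCharsets.UTF_8);
        String bad = decodeBody(body, StandardCharsets.ISO_8859_1); // one char per byte

        System.out.println(good.equals(AMHARIC));
        System.out.println(bad.equals(AMHARIC));
        System.out.println(bad.length());
    }
}
```

Calling new String(bytes) without a charset uses the platform default, so the same code can pass on one machine and fail on another.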
-
7. Re: UTF-8 encoding in Stomp frames
jmesnil Feb 18, 2010 9:56 AM (in response to timfox)
timfox wrote:
Also, this seems very convoluted:
byte[] content = frame.getContent();
if (type == Message.TEXT_TYPE)
{
   message.getBodyBuffer().writeNullableSimpleString(SimpleString.toSimpleString(new String(content)));
}
Why not write the content directly into the buffer?
message.getBodyBuffer().writeBytes(content);
This is for interoperability with JMS.
If I wrote the content directly, the message would not be readable as a JMS TextMessage (which expects a nullable string from its body buffer).
Either we keep this, or we remove all this code and tell our users that they must use only JMS BytesMessage if they want to interact with messages sent or consumed by Stomp (not very friendly for a text-oriented protocol).
I'd prefer to be able to use JMS TextMessage by default to interoperate with Stomp messages.
WDYT?
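To illustrate the point about a reader that expects a specially encoded string, here is a small analogy using only java.io. It deliberately does not use the HornetQ buffer API: DataOutputStream.writeUTF (a two-byte length prefix followed by modified UTF-8) is just a stand-in for "a string format the consumer expects", and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Analogy only: a length-prefixed string format versus raw content bytes.
final class PrefixedStringSketch {

    // Writes the text in the format the reader below expects.
    static byte[] writePrefixed(String text) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            new DataOutputStream(out).writeUTF(text);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the decoded string, or null if the body is not in the expected format.
    static String tryReadPrefixed(byte[] body) {
        try {
            return new DataInputStream(new ByteArrayInputStream(body)).readUTF();
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] prefixed = writePrefixed("hello");
        byte[] raw = "hello".getBytes(StandardCharsets.UTF_8);

        System.out.println(tryReadPrefixed(prefixed)); // the reader understands this body
        System.out.println(tryReadPrefixed(raw));      // raw bytes are misread as a length prefix
    }
}
```

The same trade-off applies here: either the producer writes the format the text consumer expects, or the consumer reads the body as plain bytes and decodes it itself.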
-
8. Re: UTF-8 encoding in Stomp frames
timfox Feb 18, 2010 10:30 AM (in response to jmesnil)
jmesnil wrote:
I don't understand how it is different from JMS HornetQTextMessage "hijacking" the core TEXT_TYPE.
The idea was to provide interoperability between Stomp messages and our Core/JMS messages:
- if the Stomp message has no content-length, treat its body as a String => convert it to a TEXT_TYPE core message so that we can consume it as a JMS TextMessage
- else treat it as a BYTES_TYPE, so we can consume it as a JMS BytesMessage
Am I missing a more obvious way to do this?
Even if we were to provide some automatic transformation between JMS text messages and STOMP messages, which is not required to implement the STOMP protocol, the way you have done it wouldn't work anyway. Strings are encoded in core messages in a more complex way than a NullableSimpleString, as I mentioned in a previous post.
Let's get the basic STOMP protocol implemented first, and we can think about implementing "extras" like mappings between STOMP and JMS later.
-
9. Re: UTF-8 encoding in Stomp frames
mjustin Feb 18, 2010 11:00 AM (in response to timfox)
Hi Tim,
I agree here. Stomp-JMS mapping is nice to have, but clients can also set a user-defined property like 'content-type' to detect text or binary messages. In this case, the Stomp frame could always contain the content-length header, which would then no longer indicate the content type. Such a basic protocol implementation would be fine for most use cases. So far I am very impressed by the Stomp transport and the ease of use of HornetQ.
btw I would like to post a short announcement for my (commercial) Delphi and Free Pascal client library for HornetQ, would this be allowed in this forum?
Regards,
Michael
-
10. Re: UTF-8 encoding in Stomp frames
timfox Feb 18, 2010 11:16 AM (in response to mjustin)
mjustin wrote:
btw I would like to post a short announcement for my (commercial) Delphi and Free Pascal client library for HornetQ, would this be allowed in this forum?
Regards,
Michael
Sure, I don't mind.
-
11. Re: UTF-8 encoding in Stomp frames
jmesnil Feb 18, 2010 11:19 AM (in response to mjustin)
OK, I'll remove all the code I added to provide JMS interop and add a JIRA issue for JMS/Stomp interop in a later release.
I'll also rewrite the StompTest tests (they were mixing JMS and Stomp messages).
Once this and the frame decoder code are done, the task should be finished.
-
12. Re: UTF-8 encoding in Stomp frames
timfox Feb 18, 2010 11:26 AM (in response to jmesnil)
You can keep a test in there that validates that a STOMP message can be received as a JMS BytesMessage containing the UTF-8 encoded bytes, and that a JMS BytesMessage can be received as a STOMP message containing those bytes, which would be the default behaviour in the absence of any more complex STOMP<->JMS mapping.
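As a rough illustration of the bytes-only default described above, the sketch below builds a STOMP SEND frame whose body is the UTF-8 encoding of a string and whose content-length header carries the byte count. It does not use HornetQ or JMS classes; the class name, method name, and destination are hypothetical, and a real test would go through an actual connection.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a STOMP SEND frame carrying UTF-8 encoded body bytes.
final class StompFrameSketch {

    // Frame layout: command line, header lines, blank line, body, NUL terminator.
    static byte[] sendFrame(String destination, byte[] body) {
        String head = "SEND\n"
                + "destination:" + destination + "\n"
                + "content-length:" + body.length + "\n"
                + "\n";
        byte[] headBytes = head.getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(headBytes, 0, headBytes.length);
        out.write(body, 0, body.length);
        out.write(0); // frame terminator
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // content-length counts bytes, not characters: 4 characters become 12 bytes here.
        byte[] body = "\u12A0\u121B\u122D\u129B".getBytes(StandardCharsets.UTF_8);
        byte[] frame = sendFrame("/queue/test", body);
        System.out.println(frame.length);
    }
}
```

A consumer that receives these bytes unchanged, for example through a JMS BytesMessage, can decode them with UTF-8 itself, which is all the default behaviour needs to guarantee.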