I have implemented a two-level cache built on top of JBoss Cache, which is mostly what I will document here. This can then serve as a starting point for designing a similar protocol for the Infinispan binary protocol.
I will start by describing the three main interactions:
- Getting Data
- Putting Data
- Flushing Data
The Get Request
The steps for getting data are as follows. If the data is not locally available (in the L1 cache), it is requested from the caching servers (the L2 cache).
- The application makes a request against the L1 cache.
- The L1 cache does not have the data, and forwards the request to a caching server.
- The receiving server determines which server owns the data.
- The request is forwarded to the appropriate server.
- The owning server retrieves the data.
- The data is returned to the client's server.
- The data is forwarded to the original requesting client.
- The L1 cache receives the response, adds the data to the local cache, and returns the data to the requesting process.
A get request retrieves data from the cache
Subsequent requests for the same data will be fulfilled by the L1 cache, until an application writes new data.
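The steps above amount to a read-through pattern: consult the local L1 map first, and only go to the remote L2 tier on a miss. A minimal sketch, where the class and method names are illustrative rather than the actual JBoss Cache API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the client-side read path: check the L1 cache first, and
// fall back to a caching server (the L2 tier) only on a miss.
public class L1Cache {
    private final Map<String, Object> local = new ConcurrentHashMap<>();
    private final RemoteCache l2; // wraps the connection to a caching server

    public L1Cache(RemoteCache l2) { this.l2 = l2; }

    public Object get(String fqn, String key) {
        String localKey = fqn + "#" + key;
        Object value = local.get(localKey);
        if (value != null) {
            return value;               // fulfilled by the L1 cache
        }
        value = l2.get(fqn, key);       // forwarded to a caching server,
                                        // which routes to the owning server
        if (value != null) {
            local.put(localKey, value); // cache locally for later reads
        }
        return value;
    }

    // Hypothetical stand-in for the connection to the caching servers.
    public interface RemoteCache {
        Object get(String fqn, String key);
    }
}
```

A second get for the same key never leaves the process, which is exactly the behavior the list above describes.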
The Put Request
A put is the primary way to write data into the cache. Data is always written to the caching servers.
- A client executes a put request against the L1 cache.
- The put request is forwarded to one of the caching servers.
- The receiving server determines which server owns the data.
- The request is forwarded to the appropriate server.
- The owning server places the data in its cache.
- A response is sent back to the original server.
- The original server receives the put response.
- Data Modified events are sent to all connections (server and client) except the original submitter.
- Data Modified events are resent to all connected clients by each server.
- Each client removes the modified data from the local L1 cache.
- The original server sends a put response to the original client.
- The original client puts the data into its local cache.
At the end of this process, the original client, as well as the caching server, has the data that was just put into the cache. No other client will have a copy of the data, and so each will have to retrieve it from the central cache on its next get.
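The Data Modified events in the put flow implement invalidation rather than replication: every client except the submitter simply drops its stale copy. A sketch of the client-side handler, with hypothetical names:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of client-side handling of a Data Modified event: the entry is
// removed from the L1 cache (not updated in place), forcing a fresh read
// from the caching servers the next time it is needed.
public class InvalidationListener {
    private final Map<String, Object> l1 = new ConcurrentHashMap<>();

    public void putLocal(String fqn, String key, Object value) {
        l1.put(fqn + "#" + key, value);
    }

    // Called by the connection layer when a DataModifiedEvent arrives.
    public void onDataModified(String fqn, String key) {
        l1.remove(fqn + "#" + key);
    }

    public boolean containsLocal(String fqn, String key) {
        return l1.containsKey(fqn + "#" + key);
    }
}
```

Invalidation keeps the event small (just the Fqn and key) and avoids shipping values to clients that may never read them.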
The Flush Request
The last major functionality for the caching servers is a flush request. This removes the data from a section of the cache. For example, when a client sync is done, the caches that might have depended on the client data are flushed.
- A client executes a flush, or clearData, request against the L1 cache.
- The clearData request is forwarded to one of the caching servers.
- The clearData request is executed against the local cache on the receiving server.
- The clearData request is forwarded to each server.
- Each server clears the data from the named cache segment.
- Each server responds to the original server.
- The original server receives the responses.
- NodeDataRemoved events are sent to each client, except the original client connected to the original server.
- NodeDataRemoved events are sent to each server.
- NodeDataRemoved events are sent from each server to each connected client.
- Each client removes all data within the named cache.
- A clearData response is sent to the original client.
- The data is removed from the L1 cache on the original client.
At the end of this process, neither the central servers nor the clients have copies of the cleared data. Any get of the data will return a null, and the data will be retrieved from the primary source, and put into the cache. Eventually these puts will repopulate the caching servers with up to date data.
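Because a flush names a cache segment rather than a single key, the client-side reaction to a NodeDataRemoved event is to clear everything under the named Fqn. A sketch, again with hypothetical names:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the client-side reaction to a NodeDataRemoved event: drop
// every locally cached entry within the named cache segment (Fqn).
public class FlushHandler {
    private final Map<String, Object> l1 = new ConcurrentHashMap<>();

    public void putLocal(String fqn, String key, Object value) {
        l1.put(fqn + "#" + key, value);
    }

    // Remove all data under the named Fqn, leaving other segments intact.
    public void onNodeDataRemoved(String fqn) {
        l1.keySet().removeIf(k -> k.startsWith(fqn + "#"));
    }

    public int size() { return l1.size(); }
}
```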
The Messages
Requests, responses and events are sent between components using a binary protocol that provides both efficiency and extensibility. Each message is simply a sequence of bytes; only the first five are required to follow a specific pattern, and the remainder of the message takes whatever form best suits the specific message. For example, a request will follow this form:
Byte | Description |
---|---|
Byte 0 | (0x90) Cache Request Marker |
Bytes 1–4 | int containing the request type |
Bytes 5–n | Data specific to the request type |
A Cache Request contains a magic byte identifying it as a cache message, an int giving the specific message type, and then the message.
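The fixed prefix of a request, together with the id and status fields used in the examples later in this document, can be written with a ByteBuffer, which is big-endian by default. A sketch (the class name is illustrative):

```java
import java.nio.ByteBuffer;

// Sketch of building the start of a request: the 0x90 marker byte, the
// four-byte big-endian message type, then the request id and status.
public class RequestHeader {
    public static final byte REQUEST_MARKER = (byte) 0x90;

    public static byte[] encode(int type, int id, byte status) {
        ByteBuffer buf = ByteBuffer.allocate(10); // 1 + 4 + 4 + 1 bytes
        buf.put(REQUEST_MARKER);
        buf.putInt(type);   // e.g. 104 for a GetRequest
        buf.putInt(id);     // matched against the corresponding response
        buf.put(status);    // 0 = new request from a client
        return buf.array();
    }
}
```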
The Initial Byte
The initial byte marker serves two main purposes. First, it allows the parser to confirm that the current byte is the beginning of a request or response as we have defined them. Secondly, in the longer term, it is possible that the cache can check the value of this byte and interpret the request as appropriate to the first byte. For example, the memcached binary protocol uses 0x80 to mark a request.
Value | Meaning |
---|---|
0x80 | memcached binary request |
0x81 | memcached binary response |
0x90 | binary request |
0x91 | binary response |
0x92 | cache event |
Message Types
This set of messages serves well to implement a second level cache (L2). The Fqn refers to a fully qualified name from JBoss Cache. In this implementation the Fqn is expected to be a string. It provides a useful way to segment data by category. For example, customer data will be under the customer Fqn, and system data will be under the system Fqn. It is easy to examine one set of data, limit the number of entries, or retrieve all the entries in any one cache, and there is no need to be concerned about collisions between keys.
Message | Marker | Type | ID | Status | Payload |
---|---|---|---|---|---|
EchoRequest | 0x90 | 100 | id | 0 | UTF-8 String |
ExistsRequest | 0x90 | 110 | id | 0,1 | Fqn |
ExistsResponse | 0x91 | 111 | id | | boolean |
GetRequest | 0x90 | 104 | id | 0,1,2 | Fqn, key |
GetResponse | 0x91 | 105 | id | | value |
GetChildrenRequest | 0x90 | 108 | id | 0,1 | Fqn |
GetChildrenResponse | 0x91 | 109 | id | | int n followed by n Strings |
GetSegmentRequest | 0x90 | 106 | id | 0,1 | Fqn |
GetSegmentResponse | 0x91 | 107 | id | | key-value map |
PutRequest | 0x90 | 102 | id | 0,1,2 | Fqn, key, value |
PutResponse | 0x91 | 103 | id | | value |
RegistrationRequest | 0x90 | 112 | id | 0 | UTF-8 name, UTF-8 host, UTF-8 port, int weight |
RegistrationResponse | 0x91 | 113 | id | | boolean |
RemoveNodeData | 0x90 | 116 | id | 0,1 | UTF-8 Fqn, key |
RemoveNodeDataResponse | 0x91 | 117 | id | | |
RemoveRequest | 0x90 | 114 | id | 0,1 | UTF-8 Fqn, key |
RemoveResponse | 0x91 | 115 | id | | value |
ErrorResponse | 0x91 | 500 | id | | UTF-8 Error message, UTF-8 Stack trace |
DataModifiedEvent | 0x92 | 200 | id | | UTF-8 Fqn, key |
NodeDataRemovedEvent | 0x92 | 201 | id | | UTF-8 Fqn |
- The type
- A unique integer that identifies the message type. It is used to retrieve the encoders and decoders for messages, and everything after the type is completely dependent on the encoder and decoder. Thus arbitrary messages can be introduced over time.
- The id
- A four byte integer that is sent with each request, and the same id is returned by the corresponding response. This allows the client to match the response to the request. The id is ignored with events, which are by their nature asynchronous.
- The status
- Reflects the stage in processing of the request. When first generated by the client, and on initial arrival at the server, the status is 0. The request is then routed to the appropriate servers. The first, primary, request will have a status of 1, and is generally done synchronously. The additional requests, made for replication, will have a status of 2, and are generally handled asynchronously.
- The Fqn
- (and other strings) are UTF-8 encoded.
- The Keys and Values
- Can be arbitrary objects, with the sole condition that their encoding can be contained in a Java array. Practically, the keys should not be too large.
The keys, values, and other fields are encoded using a platform-neutral encoding scheme detailed below.
The most interesting fields are the message payloads, especially the keys and values, which can have any type. Representing these fields is particularly challenging because they have to be represented in a platform-neutral way.
Encoding Data
I have chosen an encoding that presents the length in bytes, followed by the type of the field, then the data encoded as described by the first two fields. There are alternatives that should also be investigated, such as Google's Protocol Buffers.
The length field is a four-byte big-endian int, and usually counts the bytes of the data plus the bytes giving the data type. However, for maps and arrays, the length field holds the number of entries. This allows maps and arrays to be encoded and sent before the full size of the message is known; each field within the map or array encodes its own size, allowing it to be decoded consistently.
The type is also an int (but might be expanded), where the type is encoded as a combination of bit fields.
Type | Value |
---|---|
ARRAY | 1 |
BYTE | 2 |
BOOLEAN | 4 |
CHARACTER | 8 |
COMPRESSED | 16 |
DATE | 32 |
DOUBLE | 64 |
FLOAT | 128 |
INTEGER | 256 |
LONG | 512 |
MAP | 1024 |
PRIMITIVE | 2048 |
SERIALIZED | 4096 |
SHORT | 8192 |
STRING | 16384 |
STRINGBUFFER | 32768 |
STRINGBUILDER | 65536 |
An example will help clarify this. Consider how a Boolean true would be encoded.
Length field | Type field | Data |
---|---|---|
Byte 0-3 | Byte 4-7 | Byte 8 |
00 00 00 05 | 00 00 00 04 | 0x01 |
An interesting variation is the representation of a primitive boolean. It is identical, except the primitive bit is set in the type field (PRIMITIVE | BOOLEAN = 2048 + 4 = 0x804).
Length field | Type field | Data |
---|---|---|
Byte 0-3 | Byte 4-7 | Byte 8 |
00 00 00 05 | 00 00 08 04 | 0x01 |
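The two examples above can be produced with a few lines of code. This sketch implements only the BOOLEAN and PRIMITIVE flags of the bit-field table; the class name is illustrative:

```java
import java.nio.ByteBuffer;

// Sketch of the field encoding described above: a four-byte length that
// counts the type field plus the data, a four-byte bit-field type, then
// the data itself. Only two of the type flags are shown.
public class FieldCodec {
    public static final int BOOLEAN = 4;
    public static final int PRIMITIVE = 2048;

    public static byte[] encodeBoolean(boolean value, boolean primitive) {
        int type = BOOLEAN | (primitive ? PRIMITIVE : 0);
        ByteBuffer buf = ByteBuffer.allocate(9);
        buf.putInt(5);                         // 4 type bytes + 1 data byte
        buf.putInt(type);                      // 0x004, or 0x804 with the primitive bit
        buf.put(value ? (byte) 1 : (byte) 0);  // the data itself
        return buf.array();
    }
}
```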
We can tie together the encoding for the messages as well as for the fields to produce a full message.
90 00 00 00 68 00 00 00 05 00 00 00 00 13 2F 43 6C 69 65 6E 74 52 65 67 69 73 74 72 61 74 69 6F 6E 00 00 00 27 00 00 40 00 31 30 31 38 2E 69 65 77 35 76 68 6E 43 46 79 4B 4B 4F 44 46 48 30 6A 58 57 53 61 30 4E 41 39 77 57 6A 38
Field | Value | Meaning |
---|---|---|
Marker | 90 | This is a request |
Request Type | 00 00 00 68 | 104, this is a get request |
Request ID | 00 00 00 05 | This is the fifth request from this client |
Request Status | 00 | This is a new request from a client |
Cache name character count | 00 00 00 13 | 19 characters; the cache name is always a string |
Cache name | 2F 43 6C 69 65 6E 74 52 65 67 69 73 74 72 61 74 69 6F 6E | /ClientRegistration |
Key byte count | 00 00 00 27 | The key is contained in 39 bytes |
Key data type | 00 00 40 00 | 16384, this is a string |
The key | 31 30 31 38 2E 69 65 77 35 76 68 6E 43 46 79 4B 4B 4F 44 46 48 30 6A 58 57 53 61 30 4E 41 39 77 57 6A 38 | 1018.iew5vhnCFyKKODFH0jXWSa0NA9wWj8 |
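A decoder for the start of such a message is a straightforward inversion of the layout: marker, type, id, status, then the length-prefixed cache name. A sketch with an illustrative class name:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of decoding the fixed prefix of a request plus the
// length-prefixed cache name that follows it.
public class RequestDecoder {
    public static String describe(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        byte marker = buf.get();    // 0x90: a request
        int type = buf.getInt();    // e.g. 104: GetRequest
        int id = buf.getInt();      // request id
        byte status = buf.get();    // 0: new request from a client
        int nameLen = buf.getInt(); // character count of the cache name
        byte[] name = new byte[nameLen];
        buf.get(name);
        return String.format("marker=%02X type=%d id=%d status=%d fqn=%s",
                marker & 0xFF, type, id, status,
                new String(name, StandardCharsets.UTF_8));
    }
}
```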
Another interesting issue is the performance of the server-side routing.
If there are x connections spread over n servers, we can estimate the order of magnitude of the work done by each of the n servers.
There will be 1/n direct connections to each server, and the other (1 − 1/n) connections will be scattered over the other servers.
For requests coming in over the direct connections, 1/n will be handled directly. That is, the request will be read, and the response written.
The remaining 1 − 1/n requests will be routed to another server. This involves an additional request-response pair to forward the request, so handling these requests takes roughly twice the work of one handled directly.
Of the requests directed to the other servers, 1/n will be routed back to our server, which will handle them with a single request-response pair.
We can add these terms to see that the workload on any one server scales as
1/n * (1/n + 2*(1 − 1/n)) + (1 − 1/n) * 1/n = 3/n − 2/n²
which is just a little less than triple the work that would be done if the client directed the request to the appropriate server initially. However, the solution still scales predominantly as 1/n.
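A quick numeric check of this estimate, expressing the overhead relative to the 1/n a server would do if clients always contacted the owning server directly:

```java
// Numeric check of the per-server workload 3/n - 2/n^2 derived above.
public class Workload {
    public static double perServer(int n) {
        return 3.0 / n - 2.0 / ((double) n * n);
    }

    // Ratio versus ideal client-side routing (1/n): equals 3 - 2/n,
    // so it approaches 3 as the cluster grows.
    public static double overhead(int n) {
        return perServer(n) / (1.0 / n);
    }
}
```

With one server everything is local (ratio 1); the penalty climbs toward, but never reaches, a factor of three.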