Version 5

    I have implemented a two-level cache built on top of JBoss Cache, which is mostly what I will document here. This can then serve as a starting point for designing a similar protocol for the Infinispan binary protocol.

    I will start by describing the three main interactions.

      • Getting Data
      • Putting Data
      • Flushing Data

    The Get Request

    The steps for getting data are listed below. If the data is not locally available (in the L1 cache), it is requested from the caching servers (the L2 cache).

      1. The application makes a request against the L1 cache.
      2. The L1 cache does not have the data, and forwards the request to a caching server.
      3. The receiving server determines which server owns the data.
      4. The request is forwarded to the appropriate server.
      5. The owning server retrieves the data.
      6. The data is returned to the client's server.
      7. The data is forwarded to the original requesting client.
      8. The L1 cache receives the response, adds the data to the local cache, and returns the data to the requesting process.
    GetSteps.png

    A get request retrieves data from the cache

     

    Subsequent requests for the same data will be fulfilled by the L1 cache, until an application writes new data.
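
    A minimal client-side sketch of this flow, assuming hypothetical L1Cache and RemoteCacheConnection classes (the names are illustrative, not taken from the actual implementation):

        import java.util.Map;
        import java.util.concurrent.ConcurrentHashMap;

        // Hypothetical client-side get path: check the L1 cache first and
        // fall back to the caching servers (L2) on a miss.
        public class L1Cache {
            private final Map<String, Map<Object, Object>> local = new ConcurrentHashMap<>();
            private final RemoteCacheConnection l2;  // connection to one caching server

            public L1Cache(RemoteCacheConnection l2) {
                this.l2 = l2;
            }

            public Object get(String fqn, Object key) {
                Map<Object, Object> segment = local.get(fqn);
                if (segment != null && segment.containsKey(key)) {
                    return segment.get(key);               // step 1: served locally
                }
                Object value = l2.get(fqn, key);           // steps 2-7: ask the caching servers
                if (value != null) {                       // step 8: populate the local cache
                    local.computeIfAbsent(fqn, f -> new ConcurrentHashMap<>()).put(key, value);
                }
                return value;
            }
        }

        // Assumed interface over the wire protocol described later in this document.
        interface RemoteCacheConnection {
            Object get(String fqn, Object key);
            Object put(String fqn, Object key, Object value);
            void clearData(String fqn);
        }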

    The Put Request

    A put is the primary way to write data into the cache. Data is always written to the caching servers.

      1. A client executes a put request against the L1 cache.
      2. The put request is forwarded to one of the caching servers.
      3. The receiving server determines which server owns the data.
      4. The request is forwarded to the appropriate server.
      5. The owning server places the data in its cache.
      6. A response is sent back to the original server.
      7. The original server receives the put response.
      8. Data Modified events are sent to all connections (server and client) except the original submitter.
      9. Data Modified events are resent to all connected clients by each server.
      10. Each client removes the modified data from the local L1 cache.
      11. The original server sends a put response to the original client.
      12. The original client puts the data into its local cache.
    PutSteps.png

    At the end of this process, the original client, as well as the caching server, has the data that was just put into the cache. No other clients will have copies of the data, and so will have to retrieve it from the central cache when they do a get.
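
    Continuing the same hypothetical L1Cache sketch, the put path and the handling of the Data Modified events might look like this (again an illustration, not the actual code):

        // Hypothetical put path (steps 1-12 as seen from the writing client).
        public Object put(String fqn, Object key, Object value) {
            Object previous = l2.put(fqn, key, value);              // steps 2-11
            local.computeIfAbsent(fqn, f -> new ConcurrentHashMap<>())
                 .put(key, value);                                  // step 12: keep the new value locally
            return previous;
        }

        // Invoked when a DataModifiedEvent arrives from the server (steps 9-10):
        // other clients simply drop the stale entry rather than updating it.
        public void onDataModified(String fqn, Object key) {
            Map<Object, Object> segment = local.get(fqn);
            if (segment != null) {
                segment.remove(key);
            }
        }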

    The Flush Request

    The last major piece of functionality for the caching servers is the flush request. This removes the data from a section of the cache. For example, when a client sync is done, the caches that might have depended on the client data are flushed.

      1. A client executes a flush, or clearData, request against the L1 cache.
      2. The clearData request is forwarded to one of the caching servers.
      3. The clearData request is executed against the local cache on the receiving server.
      4. The clearData request is forwarded to each server.
      5. Each server clears the data from the named cache segment.
      6. Each server responds to the original server.
      7. The original server receives the responses.
      8. NodeDataRemoved events are sent to each client, except the original client connected to the original server.
      9. NodeDataRemoved events are sent to each server.
      10. NodeDataRemoved events are sent from each server to each connected client.
      11. Each client removes all data within the named cache.
      12. A clearData response is sent to the original client.
      13. The data is removed from the L1 cache on the original client.
    FlushSteps.png

    At the end of this process, neither the central servers nor the clients have copies of the cleared data. Any get of the data will return null, and the data will be retrieved from the primary source and put into the cache. Eventually these puts will repopulate the caching servers with up-to-date data.
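
    The flush path, again as a sketch on the same hypothetical L1Cache:

        // Hypothetical flush path (steps 1-13 as seen from the client).
        public void clearData(String fqn) {
            l2.clearData(fqn);     // steps 2-12: fan out to every server and connected client
            local.remove(fqn);     // step 13: drop the whole segment locally
        }

        // Invoked when a NodeDataRemovedEvent arrives from the server (step 11).
        public void onNodeDataRemoved(String fqn) {
            local.remove(fqn);
        }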


    The Messages

    Requests, responses and events are sent between components using a binary protocol that provides both efficiency and extensibility. Each message is simply a sequence of bytes. Only the first five bytes are required to follow a specific pattern; the remainder of the message follows a form that best suits the specific message. For example, a request will follow this form:

     

    Byte         Description
    Byte 0       (0x90) Cache Request Marker
    Bytes 1-4    int containing the request type
    Bytes 5-n    Data specific to the request type

    A Cache Request contains a magic byte identifying it as a cache message, an int giving the specific message type, and then the message.
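
    As a sketch, the fixed part of a request could be written with a ByteBuffer. The class and method names are assumptions; the layout follows the tables in this document (marker, four byte type, and, for requests, a four byte id and a status byte):

        import java.nio.ByteBuffer;

        // Illustrative helper for the fixed request prefix.
        public final class RequestHeader {
            public static final byte REQUEST_MARKER = (byte) 0x90;

            public static ByteBuffer start(int requestType, int requestId, byte status, int payloadSize) {
                ByteBuffer buf = ByteBuffer.allocate(1 + 4 + 4 + 1 + payloadSize);
                buf.put(REQUEST_MARKER);   // byte 0: cache request marker
                buf.putInt(requestType);   // bytes 1-4: message type, big endian
                buf.putInt(requestId);     // next 4 bytes: request id, echoed by the response
                buf.put(status);           // status byte: 0 = new, 1 = primary, 2 = replication
                return buf;                // the caller appends the type specific payload
            }
        }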

     

    The Initial Byte

    The initial byte marker serves two main purposes. First, it allows the parser to confirm that the current byte is the beginning of a request or response as we have defined them. Second, in the longer term, the cache could check the value of this byte and interpret the rest of the message as appropriate to that first byte. For example, the memcached binary protocol uses 0x80 to mark a request.

    Value    Meaning
    0x80     memcached binary request
    0x81     memcached binary response
    0x90     binary request
    0x91     binary response
    0x92     cache event

    The first byte identifies the protocol.
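
    A parser can dispatch on this first byte; a minimal sketch using the values from the table above (the enum and method are assumptions):

        // Hypothetical classification of an incoming message by its first byte.
        static MessageKind classify(byte first) {
            switch (first & 0xFF) {
                case 0x80: return MessageKind.MEMCACHED_REQUEST;
                case 0x81: return MessageKind.MEMCACHED_RESPONSE;
                case 0x90: return MessageKind.CACHE_REQUEST;
                case 0x91: return MessageKind.CACHE_RESPONSE;
                case 0x92: return MessageKind.CACHE_EVENT;
                default:   throw new IllegalArgumentException("Not a cache message: " + first);
            }
        }

        enum MessageKind {
            MEMCACHED_REQUEST, MEMCACHED_RESPONSE, CACHE_REQUEST, CACHE_RESPONSE, CACHE_EVENT
        }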

    Message Types

    This set of messages serves well to implement a second-level cache (L2). The Fqn refers to a fully qualified name from JBoss Cache; in this implementation the Fqn is expected to be a string. It provides a useful way to segment data by category. For example, customer data will be under the customer Fqn, and system data will be under the system Fqn. This makes it easy to examine one set of data, limit the number of entries, or retrieve all the entries in any one cache, and there is no need to be concerned about collisions between keys.

    Message                 Marker  Type  ID  Status  Payload
    EchoRequest             0x90    100   id  0       UTF-8 String
    ExistsRequest           0x90    110   id  0,1     Fqn
    ExistsResponse          0x91    111   id          boolean
    GetRequest              0x90    104   id  0,1,2   Fqn, key
    GetResponse             0x91    105   id          value
    GetChildrenRequest      0x90    108   id  0,1     Fqn
    GetChildrenResponse     0x91    109   id          int n followed by n Strings
    GetSegmentRequest       0x90    106   id  0,1     Fqn
    GetSegmentResponse      0x91    107   id          key-value map
    PutRequest              0x90    102   id  0,1,2   Fqn, key, value
    PutResponse             0x91    103   id          value
    RegistrationRequest     0x90    112   id  0       UTF-8 name, UTF-8 host, UTF-8 port, int weight
    RegistrationResponse    0x91    113   id          boolean
    RemoveNodeData          0x90    116   id  0,1     UTF-8 Fqn, key
    RemoveNodeDataResponse  0x91    117   id
    RemoveRequest           0x90    114   id  0,1     UTF-8 Fqn, key
    RemoveResponse          0x91    115   id          value
    ErrorResponse           0x91    500   id          UTF-8 Error message, UTF-8 Stack trace
    DataModifiedEvent       0x92    200   id          UTF-8 Fqn, key
    NodeDataRemovedEvent    0x92    201   id          UTF-8 Fqn
    The type
    A unique integer that identifies the message type. It is used to retrieve the encoders and decoders for messages, and everything after the type is completely dependent on the chosen encoder and decoder; thus arbitrary messages can be introduced over time (a sketch of such a registry follows this list).
    The id
    A four byte integer that is sent with each request; the same id is returned by the corresponding response, which allows the client to match the response to the request. The id is ignored for events, which are by their nature asynchronous.
    The status
    Reflects the stage of processing of the request. When first generated by the client, and on initial arrival at the server, the status is 0. The request is then routed to the appropriate servers. The first, primary, request will have a status of 1, and is generally handled synchronously. The additional requests, made for replication, will have a status of 2, and are generally handled asynchronously.
    The Fqn
    (and other strings) are UTF-8 encoded.
    The keys and values
    Can be arbitrary objects, with the sole condition that their encoding can be contained in a Java array. Practically, the keys should not be too large.
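
    A sketch of the kind of registry the type lookup implies; the interface and names are assumptions, not the actual implementation:

        import java.nio.ByteBuffer;
        import java.util.Map;
        import java.util.concurrent.ConcurrentHashMap;

        // Illustrative codec registry keyed by the four byte message type.
        // New message types can be introduced simply by registering another codec.
        final class CodecRegistry {
            interface Codec {
                void encode(Object message, ByteBuffer out);
                Object decode(ByteBuffer in);
            }

            private final Map<Integer, Codec> codecs = new ConcurrentHashMap<>();

            void register(int messageType, Codec codec) {
                codecs.put(messageType, codec);
            }

            Codec codecFor(int messageType) {
                Codec codec = codecs.get(messageType);
                if (codec == null) {
                    throw new IllegalArgumentException("Unknown message type " + messageType);
                }
                return codec;
            }
        }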

    The keys, values, and other fields are encoded using a platform-neutral encoding scheme, detailed below.

    The most interesting fields are the message payloads, especially the keys and values, which can have any type. Representing these fields is particularly challenging because they have to be represented in a platform-neutral way.

    Encoding Data

    I have chosen an encoding that gives the length in bytes, followed by the type of the field, and then the data encoded as described by the first two fields. There are alternatives that should also be investigated, such as Google's Protocol Buffers.

    The length field is a four byte big endian int, and usually counts the bytes of data as well as the bytes giving the data type. However, for maps and arrays, the size field holds the number of entries. This allows maps and arrays to be encoded and sent before the full size of the message is known. Each field within the map or array encodes its own size, allowing it to be decoded consistently.
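
    A sketch of how a single simple field could be written, using the type codes from the table below (the helper is illustrative only):

        import java.nio.ByteBuffer;

        // Illustrative field writer: four byte length, four byte type, then the data.
        // For simple values the length counts the type bytes plus the data bytes;
        // for maps and arrays the first int would instead hold the number of entries.
        final class FieldWriter {
            static byte[] writeField(int typeCode, byte[] data) {
                ByteBuffer buf = ByteBuffer.allocate(8 + data.length);
                buf.putInt(4 + data.length);   // length: type field plus data, big endian
                buf.putInt(typeCode);          // type: bit field from the table below
                buf.put(data);                 // the encoded value itself
                return buf.array();
            }
        }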

    The type is also an int (though it might be expanded later), encoded as a combination of bit fields.

    Type            Value
    ARRAY           1
    BYTE            2
    BOOLEAN         4
    CHARACTER       8
    COMPRESSED      16
    DATE            32
    DOUBLE          64
    FLOAT           128
    INTEGER         256
    LONG            512
    MAP             1024
    PRIMITIVE       2048
    SERIALIZED      4096
    SHORT           8192
    STRING          16384
    STRINGBUFFER    32768
    STRINGBUILDER   65536

    An example will help clarify this. Consider how a Boolean true would be encoded.

    Length field    Type field     Data
    Byte 0-3        Byte 4-7       Byte 8
    00 00 00 05     00 00 00 04    0x01

    An interesting variation is the representation of a primitive boolean. It is identical, except the primitive bit is set in the type field.

     

    Length field    Type field     Data
    Byte 0-3        Byte 4-7       Byte 8
    00 00 00 05     00 00 08 04    0x01
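
    In code, the primitive variation is just a bitwise OR of the flags from the type table; a short sketch with assumed constant names, reusing the illustrative FieldWriter above:

        // Constants matching the values in the type table above.
        static final int BOOLEAN   = 4;
        static final int PRIMITIVE = 2048;

        static void encodeBooleans() {
            // Boolean true as an object:   length 5, type 00 00 00 04, data 0x01
            byte[] boxed     = FieldWriter.writeField(BOOLEAN, new byte[] { 1 });
            // boolean true as a primitive: length 5, type 00 00 08 04, data 0x01
            byte[] primitive = FieldWriter.writeField(BOOLEAN | PRIMITIVE, new byte[] { 1 });
        }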

    We can tie together the encoding for the messages as well as for the fields to produce a full message.

    90 00 00 00 68 00 00 00 05 00 00 00 00 13 2F 43 6C 69 65 6E 74 52 65 67 69 73 74 72 61 74 69 6F 6E 00 00 00 27 00 00 40 00 31 30 31 38 2E 69 65 77 35 76 68 6E 43 46 79 4B 4B 4F 44 46 48 30 6A 58 57 53 61 30 4E 41 39 77 57 6A 38
    Field                        Value           Meaning
    Marker                       90              This is a request.
    Request Type                 00 00 00 68     104, this is a get request.
    Request ID                   00 00 00 05     This is the fifth request from this client.
    Request Status               00              This is a new request from a client.
    Cache name character count   00 00 00 13     19 characters; the cache name is always a string.
    Cache name                   2F 43 6C 69 65 6E 74 52 65 67 69 73 74 72 61 74 69 6F 6E     /ClientRegistration
    Key byte count               00 00 00 27     The key is contained in 39 bytes.
    Key data type                00 00 40 00     16384, this is a string.
    The key                      31 30 31 38 2E 69 65 77 35 76 68 6E 43 46 79 4B 4B 4F 44 46 48 30 6A 58 57 53 61 30 4E 41 39 77 57 6A 38     1018.iew5vhnCFyKKODFH0jXWSa0NA9wWj8

    Another interesting issue is the performance of the server-side routing.

    If there are x connections spread over n servers, we can estimate the order of magnitude of the work done by each of the n servers.

    A fraction 1/n of the connections will be direct connections to any given server, and the other (1 - 1/n) will be scattered over the other servers.

    Of the requests coming in over the direct connections, 1/n will be handled directly. That is, the request will be read, and the response written.

    The other (1 - 1/n) requests will be routed to another server. This involves an additional request-response pair to forward the request, so handling these requests requires roughly twice the work of one handled directly.

    Of the requests directed to other servers, 1/n of them will be routed back to our server, which will handle them with a single request-response pair.

    We can add these terms to see that the workload on any one server scales as

      1/n * (1/n + 2*(1 - 1/n)) + (1 - 1/n) * 1/n

      = 3/n - 2/n^2
    This is just a little less than triple the work that would be done if the client directed the request to the appropriate server initially; with n = 4, for example, the factor is 3/4 - 2/16 = 0.625, against 0.25 for perfect client-side routing. However, the per-server workload still scales predominantly as 1/n.