David's Model offers 5 patterns to follow, and these have been around for a while. They certainly aren't comprehensive, and I think a few of them stem from the particulars and limitations of the Jackrabbit implementation.
 
In particular, Rule #4 "Beware of Same Name Siblings" is right that you should be aware of them, but I think the last sentence is a bit extreme. Yes, SNS paths are less stable/durable, but I think they're okay in some situations. For example, they're no worse than generating a name that includes some counter -- that simply doesn't scale beyond a single cluster. SNS indexes, on the other hand, are processed by the backend and are guaranteed to be incremental. (This is a PITA from an implementation perspective, but there is literally no way a JCR client can do this itself when operating against a clustered repository.)
 
Also, I've never really understood Rule #5 "References considered harmful". I think this is where the constraints/limitations of the implementation (Jackrabbit) are sneaking into the design. So while it may be a pragmatic rule when dealing with Jackrabbit, IMO the caution this rule raises is completely unnecessary with ModeShape (and especially in 3.x, where references are actually cheaper than a parent-child relationship). Additionally, REFERENCE properties work amazingly well to overcome the disadvantages of SNS paths (see Rule #4), because a REFERENCE will always point to the same node regardless of that node's path.
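To make the SNS-versus-reference point concrete, here is a minimal sketch in plain Java. It is a toy in-memory model, not the JCR or ModeShape API: ToyNode, ToyParent, segmentOf and byId are all hypothetical names invented for illustration. It only shows that a same-name-sibling index shifts when an earlier sibling is removed, while a lookup by identifier (which is what a REFERENCE gives you) still finds the same node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Toy in-memory model for illustration only; none of these types are JCR or ModeShape API.
class SnsVersusReferenceSketch {

    static final class ToyNode {
        final String id = UUID.randomUUID().toString(); // stable identifier, playing the role of a REFERENCE target
        final String name;

        ToyNode(String name) {
            this.name = name;
        }
    }

    static final class ToyParent {
        private final List<ToyNode> children = new ArrayList<>();

        ToyNode add(String name) {
            ToyNode node = new ToyNode(name);
            children.add(node);
            return node;
        }

        void remove(ToyNode node) {
            children.remove(node);
        }

        // Path segment such as "state[2]": the 1-based position among siblings with the same name.
        String segmentOf(ToyNode node) {
            int index = 0;
            for (ToyNode child : children) {
                if (child.name.equals(node.name)) index++;
                if (child == node) return node.name + "[" + index + "]";
            }
            throw new IllegalArgumentException("Not a child of this parent");
        }

        // Lookup by identifier, independent of where the node sits among its siblings.
        ToyNode byId(String id) {
            for (ToyNode child : children) {
                if (child.id.equals(id)) return child;
            }
            return null;
        }
    }

    public static void main(String[] args) {
        ToyParent machine = new ToyParent();
        ToyNode first = machine.add("state");
        machine.add("state");
        ToyNode third = machine.add("state");

        System.out.println(machine.segmentOf(third));        // state[3]
        machine.remove(first);
        System.out.println(machine.segmentOf(third));        // state[2]
        System.out.println(machine.byId(third.id) == third); // true
    }
}
```

Anything that stored the path ending in "state[3]" would now miss the node or land on a different one, whereas the identifier-based lookup is unaffected.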
 
The rest of the rules are pretty good.
 
I've started documenting some patterns and best practices in our 3.0 documentation (see here). Unfortunately, there are a number of empty pages there, but a few have some good content. Feel free to comment here or on the documentation pages if there are other patterns or use cases worth documenting.
 
I'll give you my take on some of your questions:
 
 
I have a document object that has 0 or more states attached to it. The state of a document is represented as a set of key/value pairs (probably the simple context from an apache SCXML state engine). I could model this in a few different ways.
 
- a node property with an array of STRING attribute one per document state machine being represented. The string value contains the identity of the state machine and the serialised context from the state machine representing the current document state
- 0 or more child nodes that contain two attributes, one is the identity of the state machine and the other is an array of STRING attributes, one per context variable, content something like "name=value".
- as in 2 but without the array and having 0 or more child nodes, one per context key/value, containing attribute name and value.
 
There are questions about what I need at runtime that can help select which of the above I need (ie: what do I need to search on) but I have no idea of the impact on storage or speed of access which are equally valid concerns when designing the runtime model.

You definitely could store a serialized STRING property for each state machine. This works, would certainly store the content, and might be sufficient if your application accesses the content by reading it all. But this is far less beneficial if you need to query for the content and/or access it in parts.
 
One of the nice features of JCR nodes is that you can treat the node structure as a syntax tree. For example, consider how you might represent a Java source file in JCR. You could represent each source file as a node with a "content" STRING property, but then you can't query its structure, and applications that want to access it need to get the "content" property value and then parse the Java source. But the parsing is the hard part, so it's something you want to do once. Rather than store the Java source as a single STRING property, you might want to parse the file to create a Java syntax tree structure using nodes. So you'd have a node for each type in a source file, and each type node would have properties that describe the type (e.g., the name; whether it's an interface, class, or enum; the type's visibility; maybe JavaDoc; etc.), and nodes for each member in the type, again with properties describing each member.

This is actually what our Java sequencer does. When a JCR client uploads an "nt:file" node named "*.java", our sequencer kicks in (if configured) and parses the Java source, creates the node structure, and saves that node structure in the workspace. From that point forward, any JCR client can query for the content (e.g., find all class or method-level annotations with a name matching some pattern) or directly access the "syntax tree" without having to parse the Java source. You can see more detail about the kinds of node types and mixins we use by looking at the CND file. The JavaMetadata class is used to parse the source and create an object representation of the syntax tree, and the ClassFileRecorder converts the syntax tree objects into JCR nodes.
 
Does the "simple context" refer to the kind of XML structure described in the last code section in http://en.wikipedia.org/wiki/SCXML#Examples ? In other words:
 
<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0" initial="ready">
    <state id="ready">
        <transition event="watch.start" target="running"/>
    </state>
</scxml>
 
If so, another approach you might consider is to take advantage of how closely JCR structure maps to XML structure (see JCR import and Document View export for details), which would correspond to:
 
- A node for the "scxml" element, with properties for each XML attribute
- A child node for each state element, named either by the value of the "id" attribute or, if order is important, simply "state" with SNS indexes distinguishing the siblings. With the latter naming style, you'd want to store the "id" attribute in an "id" property. Any remaining attributes (e.g., name-value pairs) would each be represented as a property on the node.
- A child node (under the state node) for the transition element. Again there are a couple of options for the name, and any remaining attributes (e.g., name-value pairs) would each be represented as a property on the node.
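
The mapping in the list above can be sketched roughly as follows. This is only an illustration using the JDK's DOM parser; it does not touch the JCR API, and the class name ScxmlLayoutSketch and its layout method are made up for this example. It prints one line per would-be node: the path, followed by the properties taken from the XML attributes. It uses the first naming style (state nodes named by their "id"), so the "id" is not repeated as a property, and it makes no attempt to add SNS indexes when siblings share a name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: shows the node/property layout, not how to write it to a repository.
class ScxmlLayoutSketch {

    static final String SCXML =
        "<scxml xmlns=\"http://www.w3.org/2005/07/scxml\" version=\"1.0\" initial=\"ready\">"
      + "<state id=\"ready\">"
      + "<transition event=\"watch.start\" target=\"running\"/>"
      + "</state>"
      + "</scxml>";

    // Returns one line per would-be node: its path followed by its properties.
    static List<String> layout(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            List<String> lines = new ArrayList<>();
            walk(root, "", lines);
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse SCXML", e);
        }
    }

    private static void walk(Element element, String parentPath, List<String> lines) {
        // Name the node by its "id" attribute when present; otherwise use the element name.
        String id = element.getAttribute("id");
        String name = id.isEmpty() ? element.getTagName() : id;
        String path = parentPath + "/" + name;

        // Every remaining attribute becomes a property; sorted by name for stable output.
        Map<String, String> properties = new TreeMap<>();
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            String attributeName = attribute.getNodeName();
            if (attributeName.startsWith("xmlns")) continue; // namespace declaration, not data
            if (attributeName.equals("id")) continue;        // already used as the node name
            properties.put(attributeName, attribute.getNodeValue());
        }

        StringBuilder line = new StringBuilder(path);
        for (Map.Entry<String, String> property : properties.entrySet()) {
            line.append(' ').append(property.getKey()).append('=').append(property.getValue());
        }
        lines.add(line.toString());

        // Child elements become child nodes.
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i) instanceof Element) {
                walk((Element) children.item(i), path, lines);
            }
        }
    }

    public static void main(String[] args) {
        layout(SCXML).forEach(System.out::println);
    }
}
```

For the SCXML snippet above this yields three nodes: /scxml with "initial" and "version" properties, /scxml/ready, and /scxml/ready/transition with "event" and "target" properties.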
 
I think this is a little different than your option 2 above, since IIUC it seems to imply using a single property to hold multiple name-value pairs. It's also different than your option 3 above, which breaks the name-value pairs up but uses a single node for each name-value pair (which I think is overkill).
 
There are questions about what I need at runtime that can help select which of the above I need (ie: what do I need to search on) but I have no idea of the impact on storage or speed of access which are equally valid concerns when designing the runtime model.
 
It's hard to gauge how performance is impacted without measuring it on your own data. In general, nodes are accessed individually, so the more nodes the more accesses. In practice, you may not really notice. For example, if you're storing the entire state as a big STRING property on one node, that might be accessed quickly, but then your application will likely have to parse that String value into usable parts. OTOH, if you've already broken up the information into multiple nodes with a property for each name-value pair, then your application doesn't have to do anything but navigate and call getProperty(String).
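
As a rough idea of the extra work the serialized approach pushes onto the application, here is a small sketch of the parsing step for "name=value" strings. It assumes the format from your option 2; the class and method names are hypothetical, and the real format may well need escaping rules that this ignores.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the client-side parsing needed when context variables are stored as "name=value" strings.
class SerializedContextSketch {

    // Splits each entry at the first '=' into a name and a value, keeping the original order.
    static Map<String, String> parse(String[] serialized) {
        Map<String, String> context = new LinkedHashMap<>();
        for (String entry : serialized) {
            int separator = entry.indexOf('=');
            if (separator < 0) {
                throw new IllegalArgumentException("Not a name=value pair: " + entry);
            }
            context.put(entry.substring(0, separator), entry.substring(separator + 1));
        }
        return context;
    }

    public static void main(String[] args) {
        Map<String, String> context = parse(new String[] {"status=running", "expression=a=b"});
        System.out.println(context.get("status"));     // running
        System.out.println(context.get("expression")); // a=b
    }
}
```

With one property per name-value pair, this step goes away: the repository already knows the names, and they are also available to queries.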
 
Another thing to consider is versioning. If you want it, you probably want to version at the state machine node and version the entire subgraph all together. This would correspond to an on-parent version (OPV) of COPY in the node type definitions.
 
Another modelling question is: How detailed and type specific should I go?
 
Ie: I don't actually need a (CND) model at all. nt:unstructured is a fine node type and I could just have an implicit model realised by the usage in the code. Maybe that is more efficient at runtime? 
 
I like type safety and explicit models so I tend to avoid the unstructured approach, particularly for product development where the code has to live for a long time (the system I'm re-architecting is about 11 years old). What I don't know are the tradeoffs and costs associated with this in a JCR model.

Personally, I like using "nt:unstructured", which gives ultimate flexibility. But I also like using mixins to describe the sets of properties useful for a "facet", and then applying each mixin to the nodes that have that facet. For example, an "ex:describable" mixin might add an "ex:description" property. Similarly, an "ex:hashed" mixin might define properties for SHA-1, MD5, and SHA-256 hashes. Note that the facet mixins aren't required, since "nt:unstructured" means the node can have any property. But it gives the ability to say that a node "isa" something. Perhaps the best benefit is that all nodes with a mixin can easily be queried.
 
Hope this helps!