15 Replies. Latest reply on Nov 22, 2005 3:03 PM by adamw

Commiting

adamw Nov 18, 2005 4:28 PM

Hello,
here's how I think saving and updates may be (re)implemented.
Firstly, new content is never written directly to the WC. On a save, we don't do delayed commits anymore (or create a temporary wc :) ), but send the data directly to the repository - tmate has such a function. So as not to take too much memory, we may write the data to a temporary file first (if the user calls, for example, node.getOutputStream()), and later transmit it to the repo (in a way, it can be seen as some sort of temporary wc).
To preserve data integrity (that is, after writing data, then getting a node and reading it, we'd like to get the same content), the wc must be updated immediately after the commit - but only on the committed paths, of course. This must be synchronized with updates of the whole repository - and these may be slow. So we need a way to minimize the time of these updates.
The first optimization is to poll the repository for the latest revision number, and update only if it has changed. The second is commit hooks. If we can get notified of changed paths, we can update only these paths, and never do a full update (maybe with the exception of startup).
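
The polling idea can be sketched roughly as below. This is only an illustration: `RevisionPoller` and its two callbacks are hypothetical names, with the revision lookup and the wc update assumed to be supplied by whatever svn library is in use.

```java
import java.util.function.LongSupplier;

// Hypothetical helper: runs the (possibly slow) wc update only when the
// repository's latest revision number has moved since the last poll.
class RevisionPoller {
    private final LongSupplier latestRevision; // assumed: cheap call to the repository
    private final Runnable updateWorkingCopy;  // assumed: the expensive wc update
    private long lastSeenRevision;

    RevisionPoller(LongSupplier latestRevision, Runnable updateWorkingCopy, long startRevision) {
        this.latestRevision = latestRevision;
        this.updateWorkingCopy = updateWorkingCopy;
        this.lastSeenRevision = startRevision;
    }

    // Returns true if an update was actually run.
    synchronized boolean pollOnce() {
        long latest = latestRevision.getAsLong();
        if (latest == lastSeenRevision) {
            return false; // nothing new in the repository, skip the update
        }
        updateWorkingCopy.run();
        lastSeenRevision = latest;
        return true;
    }
}
```

A background thread would call `pollOnce()` periodically; the method is synchronized so that two polls cannot start overlapping updates.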

Another solution for preserving data integrity is not to do an update after a commit, but to store the file's path in a set held by the service. On reading that node, we check if it is in the "dirty" set, and if so, read it not from the wc but from the repository. The file would be removed from the "dirty" set upon an update.
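
A minimal sketch of the "dirty" set idea, assuming plain maps stand in for the repository and the wc. `DirtyTrackingReader` and its methods are made-up names for illustration, not Shotoku or tmate API.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustration only: the two maps stand in for the svn repository and the local wc.
class DirtyTrackingReader {
    private final Map<String, String> repository;
    private final Map<String, String> workingCopy;
    private final Set<String> dirty = new HashSet<>();

    DirtyTrackingReader(Map<String, String> repository, Map<String, String> workingCopy) {
        this.repository = repository;
        this.workingCopy = workingCopy;
    }

    // Commit straight to the repository; the wc is left untouched.
    synchronized void save(String path, String content) {
        repository.put(path, content);
        dirty.add(path);
    }

    // Dirty paths are read from the repository, everything else from the wc.
    synchronized String read(String path) {
        return dirty.contains(path) ? repository.get(path) : workingCopy.get(path);
    }

    // A wc update brings the wc up to date, so nothing is dirty afterwards.
    synchronized void update() {
        workingCopy.putAll(repository);
        dirty.clear();
    }
}
```

All three methods are synchronized so that a save cannot slip in between the wc update and the clearing of the set.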

So, what do you think?

--
Cheers,
Adam

1. Re: Commiting

soshah Nov 18, 2005 5:26 PM (in response to adamw)

"adamw" wrote:
Firstly, new content is never written directly to the WC. On a save, we don't do delayed commits anymore (or create a temporary wc :) ), but send the data directly to the repository - tmate has such a function.

Being able to send the data to the repository directly, without any dependency on the .svn files, solves everything. That is the cleanest approach. The temporary wc exists only so that you don't have to mess with the main WC.

BTW, do you know how to send the content over directly? Is it part of their new API, or did I just miss that function in the API?
2. Re: Commiting

dsicore Nov 18, 2005 5:32 PM (in response to adamw)

Please walk through a 'conflict' scenario.
3. Re: Commiting

soshah Nov 18, 2005 5:32 PM (in response to adamw)

"adamw" wrote:

The first optimization is to poll the repository for the latest revision number, and update only if it has changed. The second is commit hooks. If we can get notified of changed paths, we can update only these paths, and never do a full update (maybe with the exception of startup).

Polling for the latest revision number and updating only if it has changed does not buy you much. When you do an update, it only downloads the content that has changed in the repository since the previous update. If nothing has changed, I believe the update does nothing.

As for "Commit Hooks": is there a standard API way of doing this, or would we need to use a monitoring script and a webservice in our system to receive the notifications?
4. Re: Commiting

soshah Nov 18, 2005 5:38 PM (in response to adamw)

"adamw" wrote:

Another solution for preserving data integrity is not to do an update after a commit, but to store the file's path in a set held by the service. On reading that node, we check if it is in the "dirty" set, and if so, read it not from the wc but from the repository. The file would be removed from the "dirty" set upon an update.

This should be easy, but this technique would run into issues in a clustered environment unless the Set is stored in some sort of clustered cache like JBossCache.
5. Re: Commiting

soshah Nov 18, 2005 5:41 PM (in response to adamw)

I think a "Commit Hook" would be the cleanest approach - Standard or Non-Standard.
6. Re: Commiting

soshah Nov 18, 2005 5:53 PM (in response to adamw)

"damon.sicore@jboss.com" wrote:
Please walk through a 'conflict' scenario.

First, since the main WC is never written to by the application, except when being updated from the repository by a background thread, it should "never" be conflicted [unless there is a scenario I am not thinking about here].

Now, say you are trying to update a document "x" from the website. A potential conflict could occur here:

1) An external client updated "x", so the content you are trying to commit to the repository is outdated. In this case, your save method will receive an error and you can handle it appropriately in the application. If you beat the external client to it, the external client will receive an error. In either case your WC will still not be conflicted.

2) More than one concurrent thread tries to update content from the website. In this case it's a race condition and only one of them wins; the others will receive the same "exception" as in the external client scenario.

BTW, the conflict scenarios will behave the same way whether you implement with a temporary wc approach or a direct save to the repository approach. I think the direct save to the repository is the better approach.

I may have missed some conflict scenarios, but this is a good starting point.
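
The two scenarios above can be illustrated with a toy in-memory repository that rejects a commit whose base revision is out of date. `ToyRepository` and `OutOfDateException` are invented for this sketch; a real svn client library would report the failure through its own exception type.

```java
import java.util.HashMap;
import java.util.Map;

// Invented for this sketch; stands in for the error a save would receive.
class OutOfDateException extends RuntimeException {
    OutOfDateException(String message) {
        super(message);
    }
}

// Toy repository: a commit succeeds only if it was based on the
// revision in which the path was last changed.
class ToyRepository {
    private final Map<String, Long> lastChanged = new HashMap<>();
    private final Map<String, String> contents = new HashMap<>();
    private long head = 0;

    synchronized long revisionOf(String path) {
        return lastChanged.getOrDefault(path, 0L);
    }

    synchronized String read(String path) {
        return contents.get(path);
    }

    // Returns the new head revision, or throws if somebody else committed first.
    synchronized long commit(String path, String content, long baseRevision) {
        long current = revisionOf(path);
        if (current != baseRevision) {
            throw new OutOfDateException(
                path + " is at r" + current + ", commit was based on r" + baseRevision);
        }
        head++;
        lastChanged.put(path, head);
        contents.put(path, content);
        return head;
    }
}
```

Whoever commits second, the website or the external client, gets the exception, and the wc is never involved, so it cannot end up conflicted.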
7. Re: Commiting

adamw Nov 18, 2005 6:09 PM (in response to adamw)

1. You can commit straight to the repository (you can do almost everything without a wc) using an "SVNEditor" - you can see examples of how to do it, and more, on tmate's web site:
http://tmate.org/svn/kb/examples/index.php
2. I think Sohil already answered the conflicts question :).
3. About polling and updating - on update, doesn't svn browse the whole tree to check for modifications? This is just my observation from doing an "svn update" on my normal wc: it always takes a long time, even if nothing has changed.
4. I don't quite see the problems with clustering ... on each computer, shotoku will be autonomous (until maybe there is some shotoku-cluster-wide-locking implemented), so will be its service. Unless I don't know something about ejb3 services :).
5. Commit hooks - I don't think you can add these from java. But I haven't investigated yet.

--
Cheers,
Adam
8. Re: Commiting

dsicore Nov 18, 2005 6:15 PM (in response to adamw)

Well.. if you can do everything that you can do in a working copy... to resolve a conflict, I'm fine with that.

We'll just need to implement some sort of "sticky sessions."

And.. the hook scripts are simply that--a bash script calling a java program. You will have to call into Shotoku via an exported service.
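
As a rough sketch of the receiving side, assuming the hook script passes along the list of changed paths: `CommitNotificationReceiver` and `onCommit` are hypothetical names, and plain maps stand in for the repository and the wc.

```java
import java.util.List;
import java.util.Map;

// Hypothetical entry point that a post-commit hook script could call
// through an exported service.
class CommitNotificationReceiver {
    private final Map<String, String> repository; // stand-in for the svn repository
    private final Map<String, String> workingCopy; // stand-in for the local wc

    CommitNotificationReceiver(Map<String, String> repository, Map<String, String> workingCopy) {
        this.repository = repository;
        this.workingCopy = workingCopy;
    }

    // Refreshes only the reported paths and returns how many were processed.
    synchronized int onCommit(List<String> changedPaths) {
        int processed = 0;
        for (String path : changedPaths) {
            String content = repository.get(path);
            if (content == null) {
                workingCopy.remove(path); // path was deleted in the repository
            } else {
                workingCopy.put(path, content);
            }
            processed++;
        }
        return processed;
    }
}
```

With notifications like this, a full update would only be needed at startup or after a missed notification.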
9. Re: Commiting

soshah Nov 18, 2005 6:49 PM (in response to adamw)

"adamw" wrote:

4. I don't quite see the problems with clustering ... on each computer, shotoku will be autonomous (until maybe there is some shotoku-cluster-wide-locking implemented), so will be its service. Unless I don't know something about ejb3 services :).

Right, so each shotoku instance in the cluster will have its own copy of this "Dirty Set". These sets will go out-of-sync unless the Set is shared within the cluster through JBossCache or a "Stateful Session Bean" (assuming EJB3 handles clustering out-of-the-box).
10. Re: Commiting

dsicore Nov 18, 2005 6:51 PM (in response to adamw)

Now.. that is interesting... a stateful session bean with the reference to the dirty node... hmm..
11. Re: Commiting

adamw Nov 21, 2005 9:19 AM (in response to adamw)

"sohil.shah@jboss.com" wrote:

Right, so each shotoku instance in the cluster will have its own copy of this "Dirty Set". These sets will go out-of-sync unless the Set is shared within the cluster through JBossCache or a "Stateful Session Bean" (assuming EJB3 handles clustering out-of-the-box).

Yes, they probably will go out-of-sync. What problems does this cause?
12. Re: Commiting

soshah Nov 21, 2005 11:03 AM (in response to adamw)

It will cause inconsistency in the way data integrity works.

The point of the "Dirty Set" is to tell the Shotoku instance to load a fresh document from the repository as opposed to the dirty one in the wc.

Now consider this scenario: say there are two machines, "A" and "B", that form a clustered environment.

1) Client "x" updates a document. The "Dirty Set" of cluster instance "A", which processed the update request, knows that the document just updated is dirty. The "Dirty Set" of cluster instance "B" does not know that the document is dirty.

2) Client "y" tries to read the document. If this request is processed by instance "B", client "y" receives the dirty document from the wc instead of the updated document.

Again, it's not that big a deal, because on the next background repo update instance B's wc will be updated. But in that case we don't need to introduce the complexity of the "Dirty Set" stuff anyway, because you can just wait for the update of the wc.
13. Re: Commiting

adamw Nov 21, 2005 11:50 AM (in response to adamw)

Yes, but I think that this scenario isn't harmful. If we use sticky sessions then it shouldn't cause any problems. However, I think it is important for this sequence to work as expected:

Node n = cm.getNode("xxx");
n.setContent("z");
n.save();
assertTrue("z".equals(cm.getNode("xxx")));

And it will work with either:
- dirty sets,
- update of a given path immediately after a commit,
- some other idea that we haven't thought about yet :).
14. Re: Commiting

soshah Nov 21, 2005 12:45 PM (in response to adamw)

"adamw" wrote:

Node n = cm.getNode("xxx");
n.setContent("z");
n.save();
assertTrue("z".equals(cm.getNode("xxx")));

In a clustered environment with sticky sessions, the above sequence breaks as follows (before a background wc update, of course):

<execution inside cluster instance A>
Node n = cm.getNode("xxx");
n.setContent("z");
n.save();
</execution inside cluster instance A>
<execution inside cluster instance B>
assertTrue("z".equals(cm.getNode("xxx")));
</execution inside cluster instance B>

Basically cm.getNode("xxx") will not return the value "z" for all sessions "stuck" to cluster instance B.

The assert works fine for all sessions "stuck" to cluster instance A.

In my opinion, application logic should yield the same result regardless of whether the environment is clustered or not and independent of what clustering strategy is employed.

Anyways, the issue is an implementation detail on the scope of the "Dirty Set". Making the "Dirty Set" cluster-aware should solve the problem. I don't think this is that big a deal and can be done easily using JBossCache I believe.

Maybe shotoku can be implemented in two phases:

First, the non-clustered "Dirty Set" implementation, and later,
a cluster-friendly "Dirty Set" implementation.
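
One way to keep the two phases swappable is to hide the set behind a small interface. The sketch below is only an assumption about how that could look: `DirtySet`, `LocalDirtySet` and `ClusterNode` are hypothetical names, maps stand in for the repository and each instance's wc, and a cluster-aware implementation (for example one backed by JBossCache) is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction: phase one plugs in a per-instance set,
// phase two would plug in a cluster-replicated one.
interface DirtySet {
    void markDirty(String path);
    boolean isDirty(String path);
}

// Phase one: a plain in-memory set, visible only to whoever holds this object.
class LocalDirtySet implements DirtySet {
    private final Set<String> paths = ConcurrentHashMap.newKeySet();

    public void markDirty(String path) {
        paths.add(path);
    }

    public boolean isDirty(String path) {
        return paths.contains(path);
    }
}

// One shotoku instance in the cluster: shared repository, private wc.
class ClusterNode {
    private final Map<String, String> repository;
    private final Map<String, String> workingCopy = new HashMap<>();
    private final DirtySet dirty;

    ClusterNode(Map<String, String> repository, DirtySet dirty) {
        this.repository = repository;
        this.dirty = dirty;
    }

    void save(String path, String content) {
        repository.put(path, content); // direct commit, wc untouched
        dirty.markDirty(path);
    }

    String read(String path) {
        return dirty.isDirty(path) ? repository.get(path) : workingCopy.get(path);
    }
}
```

If instances A and B each get their own `LocalDirtySet`, a save on A is not visible to a read on B until B's wc is updated. If both are handed the same `DirtySet` object, which is what a cluster-aware implementation would emulate, the read on B goes to the repository. Removing entries is deliberately left out here: in the shared case it needs care, because each instance updates its wc at a different time.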
