StructureModificationChecker and Synch with VFS3
johnbailey Dec 21, 2009 6:06 PM
One of the remaining items to resolve in the deployers-vfs VFS3 integration is how to determine whether an archive has been modified, and how to sync changes with temporary copies. Determining whether a VFS file tree has been modified is not a problem, since that relies on the underlying filesystem to manage modification times. The real problem is how to deal with copies of exploded archives or copies of VFS roots. Once a file, archive, or directory has been mounted into a VFS location, there is no way to get access to the original filesystem resources without unmounting.
Example of mounting /deploy/test.war:
1. Expand content of archive into temporary location
2. Mount temporary location as /deploy/test.war
From this point on there would be no easy way to get the VirtualFile for the original /deploy/test.war, as that VFS location would point to the mounted temporary location. So in the archive case we need some way to hold onto, or get access to, the original file at /deploy/test.war to track modifications. In the case of copying an existing directory structure, we need some way to track changes to the original directory structure, and this would likely require the original to be mounted into a separate (somewhat secret) location in the VFS.
An idea was to manage this as part of the Automounting process. So when the Automounter is called it will maintain the references to the original files needed, and allow some mechanism to get modification information from the originals. The goal would be to not allow external access to the original files.
There was some discussion on #jboss-dev:
[12:47pm] baileyje: Nihility: yeah. I am going to create a forum to discuss with Ales. There are some facilities that seem to exist only to keep the changes to temporary file systems up to date in VFS. This should not have a need in VFS3.
[12:48pm] dmlloyd: those need to be rewritten
[12:48pm] dmlloyd: that stuff is based on the broken notion of VFS tracking modified time
[12:49pm] dmlloyd: it's actually up to the interested parties
[12:49pm] dmlloyd: so each thing tracking the modification time of a file needs to remember its own last modified time
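The point about each interested party remembering its own last modified time can be sketched in plain Java. This is only an illustration under stated assumptions: `LastModifiedWatcher` and `checkModified` are hypothetical names, not part of VFS or the deployers code, and the timestamp source is passed in as a `LongSupplier` so the same class could wrap a `java.io.File` (for example `physicalFile::lastModified`) or anything else that reports a modification time.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper: one interested party remembering the last timestamp it saw. */
final class LastModifiedWatcher {
    private final LongSupplier modifiedTime; // e.g. physicalFile::lastModified
    private long lastSeen;

    LastModifiedWatcher(LongSupplier modifiedTime) {
        this.modifiedTime = modifiedTime;
        this.lastSeen = modifiedTime.getAsLong();
    }

    /** Returns true once per observed change, then remembers the new timestamp. */
    boolean checkModified() {
        long current = modifiedTime.getAsLong();
        if (current == lastSeen) {
            return false;
        }
        lastSeen = current;
        return true;
    }
}
```

Because the remembered timestamp lives in the watcher, two independent watchers of the same file each see the change exactly once, without VFS keeping any modification state.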
[12:50pm] Nihility: right also
[12:50pm] Nihility: some of that may be related to a feature
[12:50pm] Nihility: with copying and exploded deployments
[12:51pm] Nihility: there is this thing where if you deploy an exploded deployment
[12:51pm] Nihility: it makes a copy
[12:51pm] Nihility: and then monitors files to copy to
[12:51pm] Nihility: and then monitors files to copy to the copy
[12:51pm] baileyje: Yeah. That seems like the primary purpose.
[12:52pm] baileyje: So if the real copy is updated the temp copy is updated.
[12:53pm] Nihility: hmm this brings up an interesting problem
[12:53pm] Nihility: once you mount the copy to the original's location
[12:53pm] Nihility: you can no longer access the original
[12:54pm] baileyje: Right. Not without unmounting, etx
[12:54pm] baileyje: etc
[12:54pm] dmlloyd: you can mount its parent somewhere else
[12:54pm] Nihility: dmlloyd: any ideas for how to deal with that
[12:54pm] dmlloyd: I'd take a two-way approach
[12:55pm] dmlloyd: first, if it's an archive, get the physical file before mounting and use that to check the mod time of the archive
[12:56pm] Nihility: doesnt the time on the mount
[12:56pm] Nihility: reflect that though
[12:56pm] dmlloyd: second, if it's a copy of an exploded archive (for whatever reason), then you're best off creating a mount in a tmp location for the "original" and using that to track updates and copy to the other mount
[12:56pm] dmlloyd: Nihility: I don't think it does, no, because if the file is deleted the mount continues to work
[12:56pm] dmlloyd: (because it's operating off of a copy)
[12:57pm] Nihility: ah right
[12:57pm] dmlloyd: as a side note, I'd make sure that whatever monitors archives gives it about 500ms to "settle" before it decides to do anything about a change
[12:58pm] dmlloyd: in case someone does a non-atomic copy to replace the archive
[12:58pm] dmlloyd: in other words, don't do anything until the last mod time hasn't changed for the past 500ms
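The "settle" rule described above (do nothing until the last modification time has been stable for 500ms) could look roughly like the following. This is a sketch, not existing scanner code: `SettleDetector` and `poll` are hypothetical names, and both the modification time and the current clock value are passed in by the caller rather than read from the filesystem.

```java
/** Hypothetical sketch of a quiet-period check for a monitored archive. */
final class SettleDetector {
    private final long settleMillis;
    private long lastSeenModified;
    private long lastChangeObservedAt;
    private boolean pending;

    SettleDetector(long initialModified, long settleMillis) {
        this.lastSeenModified = initialModified;
        this.settleMillis = settleMillis;
    }

    /**
     * Feed the current modification time and the current clock value.
     * Returns true exactly once per change, and only after the modification
     * time has stayed the same for at least settleMillis.
     */
    boolean poll(long modified, long now) {
        if (modified != lastSeenModified) {
            // Still being written: restart the quiet period.
            lastSeenModified = modified;
            lastChangeObservedAt = now;
            pending = true;
            return false;
        }
        if (pending && now - lastChangeObservedAt >= settleMillis) {
            pending = false;
            return true;
        }
        return false;
    }
}
```

A scanner thread would typically call something like `detector.poll(archive.lastModified(), System.currentTimeMillis())` on each pass and only trigger a redeploy when it returns true.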
[12:59pm] baileyje: dmlloyd: in the archive case, who holds onto the physical file to track the changes?
[12:59pm] dmlloyd: whoever is interested in it, I guess
[1:00pm] dmlloyd: for the automount case perhaps it's just a question of having support methods which handle that
[1:00pm] dmlloyd: it's mainly for redeploy detection so I guess HDScanner or whatever its current moral equivalent is
[1:00pm] Nihility: it would be nice if there was an easy way
[1:01pm] Nihility: to get the original from the VF of the mount
[1:01pm] Nihility: i suppose that would be using the FileSystem api directly
[1:01pm] dmlloyd: you can get the physical file for a node but that's usually a copy
[1:01pm] Nihility: your saying though that you can mount the parent directory
[1:01pm] Nihility: say
[1:01pm] dmlloyd: a filesystem doesn't necessarily have an "original"
[1:01pm] Nihility: deploy/
[1:01pm] Nihility: into tmp/foo
[1:02pm] dmlloyd: yeah you could
[1:02pm] Nihility: and then mounts underneath deploy would not be affected right?
[1:02pm] dmlloyd: or you could mount the deployment directory somewhere else too (then it's not a copy)
[1:02pm] dmlloyd: the only time a RealFileSystem mount is a copy is when you manually copy it first
[1:02pm] Nihility: well
[1:02pm] dmlloyd: which would be the seam use case, where stuff has to magically hang around after it's been deleted for a while
[1:02pm] Nihility: the question i am really asking
[1:03pm] Nihility: is what happens when you do
[1:03pm] Nihility: VFS.findChild("/deploy/foo.ear");
[1:03pm] Nihility: oops
[1:03pm] Nihility: VFS.findChild("/secret/tmp/location/deploy/foo.ear");
[1:03pm] dmlloyd: you'll get the original foo.ear
[1:04pm] dmlloyd: mount points are absolute, so if you make a dupe mount of a parent, submounts are not cloned
[1:04pm] Nihility: ok so the other thing
[1:04pm] Nihility: is that you might be asked to deploy something anywhere
[1:04pm] Nihility: it might not be in deploy
[1:04pm] Nihility: for example
[1:05pm] Nihility: so really the need is to have a original for every individual deployment
[1:05pm] dmlloyd: yeah, you don't really want to mount the deploy dir as a solution to that problem
[1:05pm] Nihility: there is also the problem of name conflict then
[1:05pm] Nihility: because you could theoretically
[1:05pm] Nihility: do deploy/bob.war
[1:05pm] Nihility: deploy2/bob.war
[1:06pm] bobmcw: taking my name in vain?
[1:06pm] Nihility: hahaha
[1:06pm] dmlloyd: if you're mounting a *copy* of an exploded archive, you will: (a) make a complete copy in a temp location (b) mount the original in a temp location (c) mount the copy in the original location
[1:06pm] dmlloyd: then you can monitor the mounted original, and port over changes at your discretion
[1:06pm] Nihility: right the key though
[1:07pm] Nihility: is that that temp location
[1:07pm] dmlloyd: now I don't think we're really going to be mounting copies of exploded archives though
[1:07pm] Nihility: resolve name conflicts
[1:07pm] dmlloyd: yeah, it does
[1:07pm] dmlloyd: it generates random names with a static base, so you'll get e.g. tmp-8903214-bob.war
[1:07pm] dmlloyd: for example
[1:07pm] dmlloyd: you can mount 9000 bob.wars and never get a conflict
[1:07pm] Nihility: sure
[1:07pm] Nihility: but in that case
[1:07pm] Nihility: you have to keep a ref around
[1:08pm] dmlloyd: yes, of course
[1:08pm] Nihility: you wont be able to find these things by name
[1:08pm] dmlloyd: you'll need the VirtualFile of your original
[1:08pm] Nihility: of the "mounted" original
[1:09pm] dmlloyd: right
[1:09pm] Nihility: the current code makes copies of exploded dirs
[1:09pm] Nihility: we could optionally not do that
[1:09pm] dmlloyd: ok, so the automounter will have to do that then
[1:09pm] dmlloyd: yeah it should be configurable
[1:09pm] Nihility: thats what jboss4 did actually
[1:09pm] dmlloyd: I guess making a copy is most "correct" at handling undeploy/redeploy
[1:10pm] Nihility: right
[1:10pm] dmlloyd: but you still need a way to port changes over
[1:10pm] Nihility: so i was thinking porting is probably the best
[1:10pm] dmlloyd: if we can avoid reinventing that, it'd be a good thing
[1:10pm] dimitris_jboss: sorry to jump in - copying solves a lot of the windows locking problems, too...
[1:10pm] dmlloyd: dimitris_jboss: acknowledged
[1:11pm] Nihility: yes thats a good point
[1:11pm] Nihility: a user cant actually delete an exploded dir
[1:11pm] baileyje: dmlloyd: are you imagining the Automounter makes the copy of the exploded dir and holds onto both the temp and original handle?
[1:11pm] Nihility: if we have file refs all over the place
[1:12pm] dmlloyd: it would have to, baileyje, since we can't keep deployment data on the deployment like sane people
[1:12pm] dmlloyd: what would be really nice would be to have all these deployments be modules, first-class entities, then our module system could handle all this shit
[1:12pm] dmlloyd: and GC could drive unmounting
[1:13pm] Nihility: thats really what the VFSDeployment is supposed to be
[1:13pm] baileyje: dmlloyd: right. I also don't like the idea of holding onto a physical File object. So the super secret mount approach may be best.
[1:13pm] dmlloyd: holding on to the physical file is the only way, with archives. But don't worry about it too much, it doesn't really retain any OS resources.
[1:13pm] Nihility: but yeah the nice thing about putting it on automounter
[1:13pm] Nihility: is that everything does it correctly
[1:13pm] Nihility: just by using it
[1:14pm] baileyje: Nihility: right.
[1:14pm] Nihility: haha i still cant believe you guys are calling it automount though
[1:14pm] Nihility: i guess its "somewhat" automated
[1:14pm] baileyje: Anyone looking to use an archive, be it in zip or exploded format, just requests the mount.
[1:15pm] dmlloyd: with archives the detailed process is: (a) copy the archive to a temp location (b) retain the physicalFile of the archive (c) mount the temp location over the archive's location
[1:15pm] dmlloyd: then monitor the physical file for changes.
[1:16pm] baileyje: dmlloyd: Yeah. We are going to need to expose the ability for the deployer code to get access to the physical to get change data.
[1:16pm] Nihility: it might also be nice to not expose the File
[1:16pm] Nihility: keep people from using that to do stupid things
[1:16pm] baileyje: Nihility: yeah. Really just the VirtualFile.
[1:17pm] dmlloyd: well yeah, for automounted archives the outward API should stay the same regardless of whether it's an exploded/unexploded copied/uncopied archive
[1:17pm] baileyje: Nihility: We could expose it, but log out horrible messages every time someone calls it.
[1:17pm] Nihility: i can totally see someone asking for the original handle
[1:17pm] Nihility: then opening it
[1:17pm] Nihility: and therefore causing windows locking issues
[1:17pm] Nihility: haha
[1:18pm] Nihility: although we could just add a note in the javadocs
[1:18pm] Nihility: to say
[1:18pm] Nihility: "dont do that"
[1:19pm] dmlloyd: maybe something like : class Automounter { ... public static void trackChanges(VirtualFile relativePath, ChangeTracker t); } interface ChangeTracker { void modified(VirtualFile original, VirtualFile updated); }
[1:19pm] dmlloyd: I guess that wouldn't work so well for archives, since it would say the whole archive changed, but maybe that's OK too
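The callback idea floated above can be fleshed out as a small, self-contained sketch. Everything here is an assumption made for illustration: `java.nio.file.Path` stands in for `VirtualFile`, `AutomounterSketch`, `registerMount` and `notifyModified` are hypothetical names, and the parameter meaning (hidden source first, mounted copy second) is one possible reading of the proposed signature. The point it shows is that the hidden original is only ever handed to registered trackers, never looked up by name.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Callback in the spirit of the proposed ChangeTracker; Path stands in for VirtualFile. */
interface ChangeTracker {
    void modified(Path source, Path mountedCopy);
}

/** Hypothetical registry that keeps the reference to each hidden original. */
final class AutomounterSketch {
    private final Map<Path, Path> originals = new HashMap<>();
    private final Map<Path, List<ChangeTracker>> trackers = new HashMap<>();

    /** Remember where the original of a mounted copy lives. */
    void registerMount(Path mountPoint, Path hiddenOriginal) {
        originals.put(mountPoint, hiddenOriginal);
    }

    /** Register interest in changes for a mount point. */
    void trackChanges(Path mountPoint, ChangeTracker tracker) {
        trackers.computeIfAbsent(mountPoint, k -> new ArrayList<>()).add(tracker);
    }

    /** Notify trackers of a change; returns how many were called. */
    int notifyModified(Path mountPoint) {
        Path original = originals.get(mountPoint);
        if (original == null) {
            return 0; // not an automounted location
        }
        List<ChangeTracker> interested =
                trackers.getOrDefault(mountPoint, Collections.emptyList());
        for (ChangeTracker tracker : interested) {
            tracker.modified(original, mountPoint);
        }
        return interested.size();
    }
}
```

With this shape, the hot deployment scanner never needs to crawl the secret root itself; it only reacts to callbacks.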
[1:20pm] dmlloyd: anyway the real requirements should be figured out from the HD code
[1:21pm] Nihility: yeah some kind of specialized class
[1:22pm] Nihility: that meets the needs of HD
[1:22pm] Nihility: would be a good way to prevent misuse
[1:22pm] baileyje: So the current modification checker is really just a visitor that crawls the VirtualFile and checks all children. Well it checks meta-data first.
[1:23pm] baileyje: We really don't want to hand the secret root over and have them crawl it do we? So the ChangeTracker seems like it would be ideal to keep that from happening. I will take a look at the HD code and see how the current checker is being used.
[1:24pm] Nihility: so there are really two scenarios
[1:24pm] Nihility: we have one case which scans deploy directories
[1:24pm] Nihility: that will work as is
[1:24pm] Nihility: since the deploy -> real filesystem
[1:25pm] baileyje: Nihility: right.
[1:25pm] Nihility: monitoring changes in a exploded deployment
[1:25pm] Nihility: is different though
[1:25pm] Nihility: hmm actually i take back scenario 1
[1:26pm] Nihility: there is still the problem david mentions earlier
[1:26pm] Nihility: which is that mod time is going to be reported by the copy
[1:26pm] Nihility: in a non-exploded case
[1:26pm] dmlloyd: in a non-exploded case you only care about the mod time of the archive itself
[1:26pm] dmlloyd: you just need a sensible way to express that
[1:27pm] Nihility: so both scenarios need adjustment
[1:27pm] Nihility: the first scenario could just ask for the mod time for a given mount
[1:27pm] dmlloyd: the detailed processes I described should be able to accommodate update checking for each type
[1:27pm] Nihility: and automounter (or a special inner class) could give it
[1:28pm] Nihility: the second scenario on the other hand
[1:28pm] Nihility: is going to require recursing into the dir
[1:28pm] Nihility: which will likely be very different code
[1:29pm] dmlloyd: yeah, but it should be possible to derive a common interface
[1:29pm] baileyje: There is also the case of a non archive deployment, -ds.xml. Unless that is handled very differently.
[1:29pm] dmlloyd: I see clebert stopped by for his usual 12 seconds
[1:30pm] dmlloyd: we won't be automounting those, so it's not our problem
[1:30pm] Nihility: right i suppose at the end of the day
[1:31pm] Nihility: the scanner wants to know
[1:31pm] Nihility: what the list of changes are
[1:31pm] Nihility: and then somehow act on them
[1:31pm] Nihility: in pretty much all scenarios
[1:31pm] dmlloyd: yeah, adds, removes, replacements, on a file or whole deployment level
[1:32pm] dmlloyd: that basic api would let us add all sorts of stuff in the future - class reloading using instrumentation for example
[1:32pm] dmlloyd: if we do go down that road ever
[1:33pm] dmlloyd: maybe just add/remove really
[1:33pm] dmlloyd: add/remove file/deployment-root
[1:33pm] Nihility: gavin had a feature request
[1:33pm] Nihility: he sent in awhile back
[1:33pm] Nihility: that we support delaying redeployment
[1:33pm] Nihility: until the first request comes in
[1:33pm] dmlloyd: enum Action { ADD, REMOVE } enum Range { FILE, DEPLOYMENT_ROOT }
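To make the add/remove vocabulary concrete, here is a minimal sketch that compares two snapshots of a deployment (path to modification time) and emits events using the two enums named above. The `Change` and `SnapshotDiff` classes are hypothetical, and reporting a replaced file as a REMOVE followed by an ADD is an assumption drawn from the "maybe just add/remove really" remark, not a settled design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

enum Action { ADD, REMOVE }

enum Range { FILE, DEPLOYMENT_ROOT }

/** Hypothetical event type combining the two enums with the affected path. */
final class Change {
    final Action action;
    final Range range;
    final String path;

    Change(Action action, Range range, String path) {
        this.action = action;
        this.range = range;
        this.path = path;
    }

    @Override
    public String toString() {
        return action + " " + range + " " + path;
    }
}

final class SnapshotDiff {
    /** Compares path-to-modification-time snapshots, visiting paths in sorted order. */
    static List<Change> diff(Map<String, Long> before, Map<String, Long> after) {
        List<Change> changes = new ArrayList<>();
        TreeSet<String> paths = new TreeSet<>(before.keySet());
        paths.addAll(after.keySet());
        for (String path : paths) {
            Long oldTime = before.get(path);
            Long newTime = after.get(path);
            if (oldTime == null) {
                changes.add(new Change(Action.ADD, Range.FILE, path));
            } else if (newTime == null) {
                changes.add(new Change(Action.REMOVE, Range.FILE, path));
            } else if (!oldTime.equals(newTime)) {
                // Assumption: a replacement is reported as REMOVE then ADD.
                changes.add(new Change(Action.REMOVE, Range.FILE, path));
                changes.add(new Change(Action.ADD, Range.FILE, path));
            }
        }
        return changes;
    }
}
```

The same `Change` type with `Range.DEPLOYMENT_ROOT` could describe a whole archive appearing or disappearing, which is the only granularity available in the non-exploded case.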
[1:34pm] dmlloyd: Nihility: should be easy enough to support, IF he gets sponsorship for it
[1:34pm] Nihility: yeah it hasnt been decided if it will be done
[1:34pm] Nihility: but
[1:34pm] Nihility: it does point to keeping copies
[1:34pm] dmlloyd: yeah
[1:34pm] Nihility: and having a generic change tracking mechanism
[1:35pm] dmlloyd: maybe we still want to have an option to disable copies though, for production environments
[1:35pm] dmlloyd: iow disable hot deployment
[1:36pm] dmlloyd: well HDScanner anyway
[1:36pm] Nihility: yeah that could be interesting
[1:36pm] Nihility: btw some history on this
[1:36pm] dmlloyd: should cut down startup time a little
[1:36pm] Nihility: the reason for the feature request, is to prevent someone using an ide changing stuff and having jboss bounce all around
[1:36pm] Nihility: after a file changed
[1:37pm] dmlloyd: yeah makes sense
[1:37pm] Nihility: in jboss4 only touching the TLD would trigger a bounce
[1:37pm] Nihility: in 5 touching *any* file
[1:37pm] Nihility: causes it to bounce
[1:37pm] Nihility: and in 5.1 (i think)
[1:37pm] dmlloyd: see, that's silly though. The deployers should know what files are important to track.
[1:37pm] Nihility: it was changed back to TLKs
[1:37pm] Nihility: TLDs
[1:37pm] Nihility: but this isnt the best indicator either
[1:38pm] Nihility: glassfish created a special file
[1:38pm] Nihility: i think its called
[1:38pm] Nihility: "redeploy"
[1:38pm] Nihility: or ".redeploy"
[1:38pm] Nihility: something like that
[1:38pm] Nihility: which is much more reasonable
[1:38pm] dmlloyd: some changes don't require a redeploy but still need handling, like copying class or resource files over to the copy from the HD source
[1:38pm] dmlloyd: yeah a deploy file is a good idea
[1:38pm] dmlloyd: maybe making a ".deployed" file that you have to *remove*
[1:39pm] Nihility: the only bad thing about it is that it would mean jboss-ide would have to add a button
[1:39pm] Nihility: "redeploy" or something
[1:39pm] dmlloyd: or it could intelligently decide for you
[1:39pm] Nihility: which could be annoying if you forget to touch it
[1:39pm] Nihility: the first request idea though is interesting
[1:40pm] Nihility: because by then you probably did want the changes to happen
[1:40pm] Nihility: although the bad aspects
[1:40pm] Nihility: would be accidental triggering
[1:40pm] Nihility: (AJAX script gone awry)
[1:41pm] dmlloyd: and of course the slow first request
[1:41pm] Nihility: right
[1:41pm] Nihility: the big problem with TLDs is now multiple descriptors
[1:41pm] Nihility: are in modern deployments
[1:41pm] Nihility: in servlet 3 you have this whole web-fragment descriptor
[1:42pm] Nihility: in addition to web.xml
[1:42pm] Nihility: also there is faces configs
[1:42pm] Nihility: etc etc
[1:42pm] Nihility: so is touching web.xml really the best way to trigger a redeploy?
[1:42pm] Nihility: so imo i like the redeploy file better
[1:42pm] dmlloyd: I like the file idea
[1:42pm] dmlloyd: yeah
[1:43pm] dmlloyd: though I think a marker file to indicate that something was picked up for deployment is better than a file to tell it to redeploy
[1:43pm] dmlloyd: because you can't really tell if something was deployed or not with the latter
[1:44pm] dmlloyd: you'd still need the notifications anyway though, like I mentioned, to copy resources over
[1:44pm] Nihility: yes definitely
[1:45pm] Nihility: its just the time you copy is different
[1:45pm] Nihility: its also confusing because users expect particular behavior
[1:45pm] Nihility: for example
[1:45pm] Nihility: they expect that a .html file
[1:45pm] Nihility: should copy across instantly always
[1:45pm] Nihility: no redeploy
[1:45pm] dmlloyd: yeah
[1:45pm] dmlloyd: it would be up to the individual deployers to implement though
[1:46pm] Nihility: yeah so like the rule should probably be
[1:46pm] Nihility: copy all files
[1:46pm] Nihility: and if a magic file is in that list
[1:46pm] Nihility: then do something
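The rule sketched in the last few lines (copy every changed file, and only redeploy if a magic file is among them) might be expressed like this. The marker name `.redeploy` is purely illustrative, since the discussion does not settle on a name or even on whether the marker signals "redeploy" or "deployed"; `RedeployRule` is likewise a hypothetical class.

```java
import java.util.List;

/** Hypothetical rule: all changes get copied, only the marker triggers a redeploy. */
final class RedeployRule {
    static final String MARKER = ".redeploy"; // illustrative name only

    /** True if any changed path ends in the marker file name. */
    static boolean requiresRedeploy(List<String> changedPaths) {
        for (String path : changedPaths) {
            String fileName = path.substring(path.lastIndexOf('/') + 1);
            if (fileName.equals(MARKER)) {
                return true;
            }
        }
        return false;
    }
}
```

Under this rule a changed .html file is simply copied across, while touching the marker in the same batch asks for a full redeploy.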
[1:46pm] baileyje: https://svn.jboss.org/repos/jbossas/projects/profileservice/trunk/spi/src/main/java/org/jboss/profileservice/spi/ModificationInfo.java
[1:46pm] dmlloyd: ah, of course this would be duplicated in profileservice...
[1:47pm] baileyje: It looks like that is the only place I see it used. The profile service is delegating to the modification checker from deployers
[1:47pm] dmlloyd: the thing they'd be missing there is, for modifications, granting access to the before and after files
[2:04pm] baileyje: dmlloyd, Nihility: Ok. So profileservice is driving these checks. https://svn.jboss.org/repos/jbossas/trunk/system/src/main/java/org/jboss/system/server/profileservice/repository/HotDeploymentRepository.java
[2:04pm] dmlloyd: you mean, profileservice has code to do these checks. It's anyone's guess whether that code is being used
[2:05pm] Nihility: it is being used
[2:06pm] baileyje: It delegates to the StructureModificationChecker from the deployers code. I can not find any other reference to it. Seems to be the only hot deployment component as of M1
[2:12pm] baileyje: The problem I see is the delete and add cases are being determined by the profileservice and the modification case is being handled by the deployer. So the profile service would have to be able to check the original for existence to see if the deployment was removed, and the deployer would need the original to search for modifications.