From rns@fore.com Thu Feb 3 07:34:57 2000
Date: Thu, 18 Mar 1999 13:56:55 -0500
From: Bob Sidebotham
To: Steven Knight
Subject: Re: Cons doing parallel builds

Hi Steve,

I've enclosed a copy of a message I sent to John Gibson. I'd love it if
you did this work; I was very impressed with the previous work you did,
and the care you took to get it right.

How's it going with your remote repository, BTW? Are you using it?

Bob

You said:
>Bob--
>
>I'd be glad to take a look at it and give it a try. If that
>works for you, let me know how you want to proceed.
>
>    --SK
>
>On Thu, 18 Mar 1999, Bob Sidebotham wrote:
>
>> I did build a version of Cons a couple of years ago that could
>> build multiple targets in parallel (with a -j flag). I'd be happy
>> to give that version to anyone who wanted to try to hack the changes
>> into the current version of cons. The changes were quite simple, and
>> it *mostly* worked, but I recall there was some minor bug. It should
>> not be too difficult to port the changes into the current version.

To: jgibson@lexmark.com
Subject: parallel cons
Date: Thu, 18 Mar 1999 13:51:25 -0500
From: Bob Sidebotham

Hi John,

As I said, this was not a finished version, and it was applied to a
fairly old version of cons.

In the tar file, you'll find cons, cons.multi, cons.checkpoint, and
cons.checkpoint.2. I'm not sure what the state of all of these is, but I
can tell you that cons was the base, and it appears that cons.multi was
the derived version. I assume that I was doing some further work, to
create cons.checkpoint and cons.checkpoint.2. Your best bet, initially,
is probably cons.multi, and you can figure out what changed by diffing
against cons.

The files Rmx.pm, rma, rmx, draft.rmx all have to do with the remote
execution model (which is possibly what cons.checkpoint* were about, I'm
not sure).

The version of cons here is quite old, so some small amount of work
would have to be done in order to port the changes.
If you do do this work, we'd appreciate having the results back, of
course!

And BTW, when I said that the solution was not multi-threaded, I didn't
mean to imply that it didn't fork off a separate process to do each
step--it does do that. It's just that the cons build engine, itself, is
not multi-threaded. If it were, some additional degree of parallelism
could be obtained.

In the file "stats", you'll see that I did manage to get a
multi-processor machine using both processors. That was a couple of
years ago, back in the sparc 10 days. We also have NTAP servers, which
are very fast (we have ours attached to 100MB Ethernet, gatewayed into
ATM), and I think you could do a number of parallel build steps before
thrashing occurred.

The NFS issue is basically that when you stat an NFS file, you don't see
the current modification time if you have recently stat'd it. This is an
issue if you implement remote builds: the model is that there is one
process directing the build; this process farms out builds of particular
items to slave processes somewhere. When each step is finished, the
master is notified and it does things like update the signature files
(the .consign files). The .consign file has the last modification time
of the file in it, and that must be correct. It turns out that you can
force the modification time to be updated simply by opening the file for
read. The master must therefore do this (in the remote execution module
within cons) before updating the .consign file. Everything else will
work: any other step that requires the output of an operation on some
other machine will get the up-to-date results because that machine will
open the file; it also doesn't care about the modification time--it's
just told by the master to use the file. I've thought about this and
tested it, and this model *will* work.

The reason I started writing the rmx module is that I wanted something
that would respond instantly to a compilation request.
I found that the latency for rsh was too high for what I wanted to
accomplish. I got a bit bogged down in the fact that there were various
administrative issues that would have to be solved: you probably want a
proxy process that you can talk to in order to configure your cons job
(add or delete machines). You need a way to detect that a machine has
"gone west", and be able to remove it from the pool, reschedule that
job step, etc.

Let me know if you're planning to do anything.

Thanks,
Bob

[UUENCODED TAR FILE DELETED]
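
The two master-side duties described in the message (refreshing a
target's modification time before recording it, and rescheduling a step
when a machine has "gone west") can be sketched roughly as below. This
is an illustration only, written in Python rather than taken from cons
or Rmx.pm. The names refreshed_mtime, record_target, and Pool are
hypothetical, a plain dict stands in for a .consign file, and the claim
that an open-for-read refreshes the NFS attributes is the message's own
observation, not something this sketch verifies.

```python
import os
from collections import deque


def refreshed_mtime(path):
    """Open the file for read, then stat it.

    Per the message, the open is what prompts an NFS client to refresh
    its cached attributes, so the stat that follows should see the
    current modification time. On a local filesystem it is harmless.
    """
    with open(path, "rb"):
        pass
    return int(os.stat(path).st_mtime)


def record_target(consign, path, signature):
    """Store (mtime, signature) for a finished target.

    `consign` is a plain dict standing in for a .consign file; the real
    on-disk format is not described in the message and is not modelled.
    """
    consign[os.path.basename(path)] = (refreshed_mtime(path), signature)


class Pool:
    """Hypothetical bookkeeping for build hosts and the steps they hold."""

    def __init__(self, hosts):
        self.idle = deque(hosts)   # hosts ready for work
        self.busy = {}             # host -> step currently running there
        self.pending = deque()     # steps waiting for a host

    def submit(self, step):
        self.pending.append(step)

    def dispatch(self):
        """Pair waiting steps with idle hosts; return the new assignments."""
        started = []
        while self.idle and self.pending:
            host, step = self.idle.popleft(), self.pending.popleft()
            self.busy[host] = step
            started.append((host, step))
        return started

    def finished(self, host):
        """A host reported completion: free it and return the step it ran."""
        step = self.busy.pop(host)
        self.idle.append(host)
        return step

    def gone_west(self, host):
        """Drop a dead host and put its step back at the front of the queue."""
        step = self.busy.pop(host, None)
        if step is not None:
            self.pending.appendleft(step)
        if host in self.idle:
            self.idle.remove(host)
```

In this sketch the master would call record_target() only after
finished() reports a step, and gone_west() whenever whatever liveness
check is in use gives up on a host. How that detection should work is
left open, as it is in the message.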