Message boards : Number crunching : Checkpointing?
Author | Message |
---|---|
Is checkpointing at reasonable intervals going to be implemented soon? I have to occasionally restart machines, and I can't have WUs running for hours on end with no checkpointing. I just had to abort a batch of the Long WUs because they were running too long with no end in sight (24+ hours on a couple of them without a checkpoint). | |
ID: 650 · Rating: 0 | |
Checkpointing happens every 10 minutes, except on Mac, where there is no checkpointing yet. Please note that the elapsed runtime is not saved with the checkpoint, but the fixed credits will compensate for the missing time. | |
ID: 651 · Rating: 0 | |
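For readers unfamiliar with the idea, here is a minimal sketch of interval-based checkpointing, showing why the work survives a restart even though the elapsed-time counter starts again from zero. It is not the SRBase application's actual code: the file name `checkpoint.json`, the JSON layout, and the functions `load_checkpoint`, `save_checkpoint`, and `run` are all hypothetical; only the 10-minute interval comes from the post above.

```python
import json
import os
import time

CHECKPOINT_FILE = "checkpoint.json"  # hypothetical file name
CHECKPOINT_INTERVAL = 600            # seconds; the 10 minutes mentioned in the thread


def load_checkpoint(path=CHECKPOINT_FILE):
    """Return the next iteration to run, or 0 if there is no usable checkpoint."""
    try:
        with open(path) as f:
            return json.load(f)["next_iteration"]
    except (FileNotFoundError, KeyError, TypeError, ValueError):
        return 0


def save_checkpoint(next_iteration, path=CHECKPOINT_FILE):
    """Write to a temp file and rename it, so a crash never leaves a half-written checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_iteration": next_iteration}, f)
    os.replace(tmp, path)


def run(total_iterations, work, path=CHECKPOINT_FILE, interval=CHECKPOINT_INTERVAL):
    """Run work(i) for the remaining iterations; return the iteration we resumed from."""
    start = load_checkpoint(path)
    last_save = time.monotonic()  # the timer restarts at zero; only the iteration is restored
    for i in range(start, total_iterations):
        work(i)
        if time.monotonic() - last_save >= interval:
            save_checkpoint(i + 1, path)
            last_save = time.monotonic()
    save_checkpoint(total_iterations, path)
    return start
```

If the process is killed between two saves, at most one interval of work is repeated on the next start, which matches the "only until the checkpoint" behaviour described later in the thread.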
I don't see any checkpointing on my Windows machine. I forgot and shut BOINC off on one box, and when it restarted the WUs were reset to 0 elapsed time. It's a Windows 8.1 box. | |
ID: 655 · Rating: 0 | |
That's what I meant by the time reset: the elapsed time goes back to 0, but the work itself is only rolled back to the last checkpoint. You can check the stderr.txt in the slot folder; it shows a restart at the last iteration point. | |
ID: 656 · Rating: 0 | |
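To make the stderr.txt check less tedious, a small helper along these lines could print the last few lines of every slot's stderr.txt. This is a sketch under assumptions: `tail_slot_stderr` is a hypothetical name, the exact wording of the restart message is not given in the thread (so the script only prints the tail rather than searching for a specific string), and the BOINC data directory has to be supplied by the user.

```python
from pathlib import Path


def tail_slot_stderr(data_dir, lines=5):
    """Return {slot directory name: last `lines` lines of its stderr.txt}."""
    tails = {}
    # BOINC keeps running tasks in numbered folders under <data_dir>/slots/
    for stderr in sorted(Path(data_dir).glob("slots/*/stderr.txt")):
        text = stderr.read_text(errors="replace")
        tails[stderr.parent.name] = text.splitlines()[-lines:]
    return tails
```

On a default Windows installation the data directory is usually `C:\ProgramData\BOINC`, so `tail_slot_stderr(r"C:\ProgramData\BOINC")` would be a typical call; adjust the path if BOINC was installed elsewhere.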
I think I see what you mean. One of the WUs only ran for about 35 minutes after restarting BOINC, then finished and got the full 1000 points. The others are still running though, so I'll see how they do. | |
ID: 657 · Rating: 0 | |
Are there any plans for checkpointing on Mac? Long tasks run for ~7 hours on my MacBook Pro. | |
ID: 727 · Rating: 0 | |
I am still looking for someone who is able to compile the wrappers for 32/64-bit Mac. | |
ID: 728 · Rating: 0 | |
I have some compilers installed. If it is not too complicated, I can try. | |
ID: 729 · Rating: 0 | |
You only need to download the BOINC source and run make. I will give you the wrapper source and some instructions later today. | |
ID: 730 · Rating: 0 | |
HI, | |
ID: 757 · Rating: 0 | |
The only way to prevent computing time loss right now is to not mix long and short tasks, I guess. | |
ID: 758 · Rating: 0 | |
Hi Rasputin42, as far as I know: (1) checkpoints do exist at SRBase (every 10 minutes, if I remember correctly), even though the elapsed time is reset to 0 each time a task is restarted from suspended mode; we've been told that there's no way to fix this at the moment, and that it is not due to this specific project. (2) Maybe you could check your BOINC Manager options (Tools/Computation Preferences) and increase the value in the 'Switch application every XXX minutes' field, to avoid jumping between tasks too often. | |
ID: 759 · Rating: 0 | |
@Rasputin42: | |
ID: 760 · Rating: 0 | |
The task has been running for about 1.5 h (Sierpinski/Riesel Base - long0.01). | |
ID: 764 · Rating: 0 | |
Hmm, I have no problems with the long runners: I restarted BOINC several times and the tasks resumed from the last checkpoint each time. If there is nothing in the stderr.txt, then something must be wrong on this computer. | |
ID: 765 · Rating: 0 | |
To avoid losing work due to task switches, you can set the flag "Leave applications in memory while suspended". | |
ID: 766 · Rating: 0 | |
I really doubt that this will work here. You either prevent BOINC from switching tasks or stick to the short queues. | |
ID: 768 · Rating: 0 | |
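The two client preferences mentioned in this thread ('Switch application every XXX minutes' and 'Leave applications in memory while suspended') can also be set through a `global_prefs_override.xml` file in the BOINC data directory, using the `cpu_scheduling_period_minutes` and `leave_apps_in_memory` elements. The sketch below only generates the file contents; `build_override` is a hypothetical helper, the 120-minute value is an arbitrary illustration, and, as the last reply notes, the in-memory flag may not help with this project's applications.

```python
def build_override(switch_minutes=120, leave_in_memory=True):
    """Return the text of a global_prefs_override.xml with the two settings.

    switch_minutes: the 'Switch application every XXX minutes' value
                    (120 is an arbitrary example, not a recommendation).
    leave_in_memory: the 'Leave applications in memory while suspended' flag.
    """
    flag = 1 if leave_in_memory else 0
    return (
        "<global_preferences>\n"
        f"   <cpu_scheduling_period_minutes>{switch_minutes}</cpu_scheduling_period_minutes>\n"
        f"   <leave_apps_in_memory>{flag}</leave_apps_in_memory>\n"
        "</global_preferences>\n"
    )


if __name__ == "__main__":
    # Print the XML; save it as global_prefs_override.xml in the BOINC data directory.
    print(build_override(), end="")
```

After saving the file, the client should pick it up on restart, or after running `boinccmd --read_global_prefs_override`.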