Do "Sierpinski / Riesel base long" checkpoint?

Message boards : Number crunching : Do "Sierpinski / Riesel base long" checkpoint?

Author Message
Drago75
Joined: 29 Mar 21
Posts: 6
Credit: 7,005,275
RAC: 0
Message 7608 - Posted: 27 May 2021, 14:39:04 UTC

Hello everyone. I couldn't find information on this in the forum; if I overlooked it, please excuse me. My Ryzen 9 is running a bunch of the long units under Windows, and according to the properties of each individual task they don't checkpoint. Is that correct? Thanks

rebirther
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 2 Jan 13
Posts: 7255
Credit: 42,729,227
RAC: 2
Message 7609 - Posted: 27 May 2021, 15:11:39 UTC - in response to Message 7608.
Last modified: 27 May 2021, 15:13:56 UTC


Checkpoints are written every 10 minutes once the test has reached the first bit. You can check stderr.txt in the slot folder. The runtime is not saved across a restart.

https://srbase.my-firewall.org/sr5/forum_thread.php?id=6&postid=14

Drago75
Message 7610 - Posted: 28 May 2021, 6:43:53 UTC - in response to Message 7609.

Hello rebirther. Last night I turned off my computer after 10 hours of crunching the long WUs, hoping I would be able to continue where I left off. Unfortunately that wasn't the case. This morning all 24 WUs started from scratch. That was disappointing. I couldn't find a way to copy and paste them here for reference, so I'll just give you one example. Maybe they need to be looked at to see what went wrong. S340_900-950k_wu_2610_0

rebirther
Message 7611 - Posted: 28 May 2021, 7:49:20 UTC - in response to Message 7610.


You can check stderr.txt and look for a "Resume... bit" line.
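
For anyone who wants to check this for many tasks at once, here is a minimal Python sketch that looks through every slot folder for such a line. It rests on assumptions that are not confirmed in this thread: that each running task has its own numbered subfolder with a stderr.txt in it, and that the resume line contains the text "Resum" (so it would match both "Resume" and "Resuming"). The function names `find_resume_lines` and `scan_slots` are made up for this example.

```python
from pathlib import Path


def find_resume_lines(stderr_text, marker="resum"):
    """Return the stripped lines of stderr_text that contain marker (case-insensitive).

    The default marker is an assumption about how the resume line is worded.
    """
    needle = marker.lower()
    return [
        line.strip()
        for line in stderr_text.splitlines()
        if needle in line.lower()
    ]


def scan_slots(slots_dir):
    """Map each slot folder name to the resume lines found in its stderr.txt."""
    results = {}
    for path in sorted(Path(slots_dir).glob("*/stderr.txt")):
        # errors="replace" keeps the scan going if a log contains odd bytes.
        text = path.read_text(encoding="utf-8", errors="replace")
        results[path.parent.name] = find_resume_lines(text)
    return results
```

You would call it as `scan_slots(r"C:\ProgramData\BOINC\slots")`, assuming BOINC uses its default data directory on Windows; adjust the path if yours is elsewhere. An empty list for a slot only means no matching line was found in that log, which is also what you would expect for a task that has never been restarted.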

Drago75
Message 7612 - Posted: 28 May 2021, 8:20:19 UTC - in response to Message 7611.

OK, found it. But that doesn't solve the problem because, like I said, all WUs started again from zero. After 2 hours they now show close to 20%. Obviously the BOINC Manager didn't use the stderr.txt files to resume crunching.

rebirther
Message 7613 - Posted: 28 May 2021, 8:35:28 UTC - in response to Message 7612.


Yeah, don't worry. It's only a wrapper design issue: the runtime is not saved, but the calculation itself restarts from the last checkpoint.

Drago75
Message 7616 - Posted: 29 May 2021, 18:54:51 UTC - in response to Message 7613.
Last modified: 29 May 2021, 18:56:58 UTC

OK, maybe I was too impatient, because the progress bar is also way off. Now I understand how your WUs work.

The suffix "long" in the name is no exaggeration, as they will need around 40 hours each on my PC. I have read that it is possible to set up app_config.xml to run them multithreaded. My question is: where do I find the exact name of the app, or can I just put "all" under name? Thanks, rebirther, and thanks again for always answering so quickly. That is rather unusual, I must say, but greatly appreciated!

rebirther
Message 7617 - Posted: 29 May 2021, 18:57:51 UTC - in response to Message 7616.


You can find all of that in the FAQ section.

https://srbase.my-firewall.org/sr5/forum_thread.php?id=6&postid=3795

Drago75
Message 7619 - Posted: 30 May 2021, 8:49:58 UTC
Last modified: 30 May 2021, 8:50:14 UTC

When I came back to my PC this morning I found that all 24 WUs had errored after 40-50 hours of crunching; not one turned out valid! That is extremely frustrating! If you want to take a look at the log, here is one example:

Name S294_900-950k_wu_2799_0
Workunit 48959768
Created 24 May 2021, 9:54:55 UTC
Sent 27 May 2021, 12:17:37 UTC
Report deadline 2 Jun 2021, 0:17:37 UTC
Received 30 May 2021, 8:07:09 UTC
Server state Over
Outcome calculation error
Client state calculation error
Exit status 194 (0xc2) EXIT_ABORTED_BY_CLIENT
Computer ID 213118
Run time 1 day 18 hours 23 min 7 sec
CPU time 4 min 55 sec
Validate state invalid
Credit 0.00
Device peak FLOPS 4.85 GFLOPS
Application version Sierpinski / Riesel Base - long v0.22
Peak working set size 70.34 MB
Peak swap size 72.20 MB
Peak disk usage 35.20 MB

rebirther
Message 7620 - Posted: 30 May 2021, 11:08:45 UTC - in response to Message 7619.

Process still present 5 min after writing finish file; aborting

https://srbase.my-firewall.org/sr5/result.php?resultid=50936674

61*294^907078+1 is not prime. RES64: F04AD92BC1E328B1. OLD64: D0E08B8345A97A10 Time : 204517.266 sec.
06:57:53 (1224): llr.exe exited; CPU time 146818.328125
06:57:53 (1224): called boinc_finish(0)
10:00:22 (14604): wrapper (7.5.26012): starting
10:00:22 (14604): wrapper: running llr.exe ( -d -oPgenInputFile=input.prp -oPgenOutputFile=primes.txt -oDiskWriteTime=10 -oOutputIterations=50000 -oResultsFileIterations=99999999)
Base factorized as : 2*3*7^2
Base prime factor(s) taken : 7
Starting N-1 prime test of 61*294^907078+1
Using all-complex FMA3 FFT length 768K, Pass1=384, Pass2=2K, a = 3


It's better to suspend the WUs or the project before you restart your PC. In all of your cases the WU had already finished and was then started again.

Drago75
Message 7621 - Posted: 30 May 2021, 11:59:46 UTC - in response to Message 7620.


Now THAT is also worth mentioning in the FAQs. Right where they deal with the long units! :-)



Copyright © 2014-2024 BOINC Confederation / rebirther