Posts by marmot
1) Message boards : Number crunching : All end in computation error; needs Visual C 14 library. (Message 9689)
Posted 15 Feb 2024 by Profile marmot
Your system is old and needs some fresh drivers. Why do you want to run 3 TF in parallel on one card?


The current video card is old; the system is a newer build to host 4 Mi25's for a ChatGPT trainer.

I ran 1, 2, 3, and 4 at once, and 2 or 3 WUs returned the highest RACs on sets of 120 WUs at the same room temperature and GPU settings.

Running 1 WU gave about 15% lower RAC, and running 4 WUs had the GPU swapping to system RAM on some work sets, which gave a much worse RAC.
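For reference, a minimal app_config.xml sketch that runs 3 TF tasks per GPU looks roughly like this (the app name TF is taken from the project's config discussed later in these threads; the 0.33 gpu_usage is what tells the client to schedule 3 tasks per card):

<app_config>
  <app>
    <name>TF</name>
    <!-- at most 3 TF tasks running at once -->
    <max_concurrent>3</max_concurrent>
    <gpu_versions>
      <!-- 3 x 0.33 GPU share fits on a single card -->
      <gpu_usage>0.33</gpu_usage>
      <cpu_usage>0.05</cpu_usage>
    </gpu_versions>
  </app>
</app_config>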
2) Message boards : Number crunching : Project d/l's master file table every new batch (Message 9687)
Posted 15 Feb 2024 by Profile marmot
That's unusual behavior, to regularly d/l the master file table.

Is this intended?


No, is this the same host as the other?


Same host.
Can't repeat that behavior now that it's returned successful tasks.

Is it a server attempt to correct a host that errors out on all 80 WU's?
3) Message boards : Number crunching : Project d/l's master file table every new batch (Message 9684)
Posted 14 Feb 2024 by Profile marmot
That's unusual behavior, to regularly d/l the master file table.

Is this intended?
4) Message boards : Number crunching : All end in computation error; needs Visual C 14 library. (Message 9683)
Posted 14 Feb 2024 by Profile marmot
AMD AMD Radeon R9 200 Series (3072MB) OpenCL: 1.2 running 3 TF at once.
Microsoft Windows 10
Professional x64 Edition, (10.00.16299.00)
The wrapper makes it into RAM and calls the command console, but mfakto never makes it into RAM. The OS considers it an unviable application.
Some library is missing?

I decompressed mfakto-win-v8.zip, then tried to run it manually.
VCRUNTIME140_1.dll is missing. That library is freely distributable; you could include it in the zipped mfakto-win-v8.zip local d/l.
This machine had all the Visual C runtimes installed last year, but it's time to get the v14 library.
This is the machine meant for ChatGPT, and the Windows install was Tiny10 for temporary use; no VC patching. My life got hectic and I haven't gotten around to finishing the build.

SUCCESS!!!
mfakto is now running again!


<core_client_version>7.24.1</core_client_version>
<![CDATA[
<message>
The operating system cannot run %1.
(0xc3) - exit code 195 (0xc3)</message>
<stderr_txt>
2024-02-14 05:35:07 (288748): wrapper (7.24.26018): starting
2024-02-14 05:35:07 (288748): wrapper: running mfakto.exe (-d 0)
2024-02-14 05:35:07 (288748): wrapper: created child process 287960
2024-02-14 05:35:13 (288748): mfakto.exe exited; CPU time 0.000000
2024-02-14 05:35:13 (288748): app exit status: 0xc0000135
2024-02-14 05:35:13 (288748): called boinc_finish(195)

</stderr_txt>
]]>
5) Message boards : News : new linux wrapper test (Message 8892)
Posted 24 May 2023 by Profile marmot
New work is up, the beta test is disabled, and you will no longer find it in your prefs.


So the beta version is now the live, actual version?
6) Message boards : Number crunching : what happened on this WU? (Message 8679)
Posted 28 Jan 2023 by Profile marmot
94684920 90801188 23 Jan 2023, 10:25:27 UTC 27 Jan 2023, 23:21:08 UTC Completed and validated 150,394.20 150,394.20 13,500.00 TF v0.21 (cuda120)

The normal run time is 1000 to 12000 seconds.

Why would it run for 150k sec and complete?
It's not a bug, but the work actually needed 150k sec to do all the factoring?

Does that mean it's likely a prime?
7) Message boards : Number crunching : 2 GPUs, 2 tasks on one card, not utilizing gpu 1, (Message 8654)
Posted 18 Jan 2023 by Profile marmot
Hello, this is the content of a job file:
../../projects/srbase.my-firewall.org_sr5/job_TF_l64c_00020.xml

<job_desc>
<task>
<application>./mfaktc.exe</application>
<append_cmdline_args/>
</task>
<unzip_input>
<zipfilename>mfaktc-linux64-v6.zip</zipfilename>
</unzip_input>
</job_desc>

If you want a device number, you have to add a -d <number> parameter (default is 0).

Usage: ./mfaktc.exe [options]
-h display this help and exit
-d <device number> specify the device number used by this program
-tf <exp> <min> <max> trial factor M<exp> from 2^<min> to 2^<max> and exit
instead of parsing the worktodo file
-st run builtin selftest and exit
-st2 same as -st but extended range for k_min/m_max
-v <number> set verbosity (min = 0, default = 1, more = 2, max/debug = 3)


So could you give us an example app_config to force a WU onto dev x with a switch, please?
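Something like this, maybe? (Just a guess at the syntax: it assumes the wrapper's <append_cmdline_args/> forwards an app_config <cmdline> through to mfaktc, which I haven't verified, and a fixed -d only makes sense with one BOINC install per GPU.)

<app_config>
  <app_version>
    <app_name>TF</app_name>
    <plan_class>cuda120</plan_class>
    <!-- hypothetical: send every TF task in this client to GPU device 1 -->
    <cmdline>-d 1</cmdline>
  </app_version>
</app_config>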
8) Message boards : Number crunching : 2 GPUs, 2 tasks on one card, not utilizing gpu 1, (Message 8643)
Posted 10 Jan 2023 by Profile marmot
Hello, this is the content of a job file:
../../projects/srbase.my-firewall.org_sr5/job_TF_l64c_00020.xml


Can we permanently edit the job file or is it created freshly upon every WU?
Can we edit a job file template that supersedes the default?
9) Message boards : Number crunching : Intel ARC GPUs (Message 8642)
Posted 10 Jan 2023 by Profile marmot


The main question above was:
Do TF's end early when they find a factor so we will see runtimes ending as low as a few seconds to maybe half the normal run time?


I have removed my last post and checked a found-factor result; you are right, the test ends earlier if a factor is found. That's a bonus in credits too.


Maybe I should start a new thread about this since it's not about ARC GPUs.

Running 3 WUs at once is giving interesting results, with several ending quite quickly:
89848291 7 Jan 2023, 14:49:44 UTC 9 Jan 2023, 17:01:16 UTC Completed and validated 641.52 1.27 14000 TF v0.21 (cuda120)

The typical run appears to be:
89847929 7 Jan 2023, 14:57:16 UTC 10 Jan 2023, 11:44:51 UTC Completed and validated 10447.65 3.31 14000 TF v0.21 (cuda120)



Running 4 WUs at once had 10x longer CPU times when Parlea@Home was on the CPUs vs LODA WUs.
Typical run (4 WUs at once) with Parlea@Home on the CPUs:
89834340 6 Jan 2023, 8:43:36 UTC 8 Jan 2023, 13:56:58 UTC Completed and validated 15232.17 128.64 14000 TF v0.19 (cuda120)


Typical with LODA on CPU's:
89848299 7 Jan 2023, 14:45:49 UTC 9 Jan 2023, 2:41:13 UTC Completed and validated 15036.94 6.58 14000 TF v0.21 (cuda120)


I'm not sure what happened to this one; its CPU usage is severe:
89833841 6 Jan 2023, 7:44:22 UTC 8 Jan 2023, 11:19:22 UTC Completed and validated 15971.15 2114.92 14000 TF v0.19 (cuda120)


Since runs can end early when a factor is found... that means my data sets will need to be larger. ::sigh::
10) Message boards : Number crunching : Intel ARC GPUs (Message 8631)
Posted 7 Jan 2023 by Profile marmot
There is nearly no CPU time to run TF on GPU.


If the CPU is starved from 3300 MHz down to 1800 MHz, I'd think it might have some 5 to 10% effect. I'll test that hypothesis some weeks from now.

Of course, it's possible the GPU downclocking from the room getting hotter caused the 17k run times, when 15k is normal today.

The main question above was:
Do TF's end early when they find a factor so we will see runtimes ending as low as a few seconds to maybe half the normal run time?
11) Message boards : Number crunching : 2 GPUs, 2 tasks on one card, not utilizing gpu 1, (Message 8630)
Posted 7 Jan 2023 by Profile marmot


I can speak for all of us when I say, "we want to run the same amount of TF projects that we have of GPUs". E.g.: 2 GPUs, we should run 2 TF projects, one for each GPU. Or one TF project on ANY GPU and run whatever project BOINC wants to run on the other GPU. Oh, wait, BOINC does that on every project out there like Prime Grid, GPU Grid, Amicable Numbers, Einstein, etc. Even all of SRBase. Well, for some reason NOT for TF projects.


I think if you changed the word "project" to the word "tasks", no quotes, it would be easier to get what you are trying to say.

What I think you are trying to say is that you want EVERY GPU in the PC to run at least one TF task here at SRBase, just like it does at every other BOINC project (i.e. MilkyWay, Einstein, etc.) when you use the <use_all_gpus> line in the cc_config file.


Yes, that's what I understand them to be saying also.
If you have three GTX NVidia cards in a computer then a single TF should get assigned to each GPU device automatically, as long as <use_all_gpus>1</use_all_gpus> is in the cc_config.

From Rebirther:
Q Does mfaktc support multiple GPUs?
A Yes, with the exception that a single instance of mfaktc can only use one
GPU. For each GPU you want to run mfaktc on you need (at least) one
instance of mfaktc. For each instance of mfaktc you can use the
commandline option "-d <GPU number>" to specify which GPU to use for each
specific mfaktc instance.


So how does the "-d <GPU number>" switch work?
Do we send that switch to the TF instance and how do we do that?
Or is it a switch applied in the batch file starting a BOINC client with <use_all_gpus>0</use_all_gpus> set in cc_config?

This is strange. If you start Milkyway@Home or Einstein@Home, then a single BOINC client with <use_all_gpus>1</use_all_gpus> will send an equal number of WUs to each GPU.
SRBase TF should do the same thing, automatically, without the user sending "-d <GPU number>" to any process.
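For completeness, that client option lives in cc_config.xml in the BOINC data directory:

<cc_config>
  <options>
    <!-- use every GPU in the host, not just the "best" one -->
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>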

I'll restate my opinion: BOINC management of GPU's is archaic and clunky. We should easily be able to use the BOINC Management GUI to assign GPU WU's without touching config files....
12) Message boards : Number crunching : Intel ARC GPUs (Message 8628)
Posted 7 Jan 2023 by Profile marmot
I just thought it found a factor and decided not to factor further and report at once.


Yeah, if this is the logic then you would find TF's that end much more quickly than others.

Maybe that explains my data set so far, unless these are shorter because I restarted the BOINC client. ThrottleStop won't adjust AMD clock frequencies so I've been rebooting to adjust TDP wattages on the CPU. Ryzen Master won't work on my severely stripped down Windows 10 OS.
We had drastic temperature shifts the last 2 weeks (-15C back to 12C).

92207740 88474001 23 Dec 2022, 4:05:08 UTC 28 Dec 2022, 5:58:22 UTC Completed and validated 11788.17 21.27 14000 TF v0.12 (cuda100)
92208081 88474342 23 Dec 2022, 2:46:03 UTC 28 Dec 2022, 5:53:03 UTC Completed and validated 11503.82 21.2 14000 TF v0.12 (cuda100)
92207948 88474209 23 Dec 2022, 2:32:35 UTC 28 Dec 2022, 5:26:11 UTC Completed and validated 9958.87 19.86 14000 TF v0.12 (cuda100)
92207038 88473299 22 Dec 2022, 22:59:13 UTC 28 Dec 2022, 5:28:12 UTC Completed and validated 10067.67 18.89 14000 TF v0.12 (cuda100)
92206586 88472847 22 Dec 2022, 21:39:10 UTC 27 Dec 2022, 22:08:58 UTC Completed and validated 15354.27 36.41 14000 TF v0.12 (cuda100)
92206262 88472523 22 Dec 2022, 20:19:05 UTC 27 Dec 2022, 22:04:50 UTC Completed and validated 15309.16 35.41 14000 TF v0.12 (cuda100)
92205654 88471915 22 Dec 2022, 18:58:21 UTC 27 Dec 2022, 21:59:16 UTC Completed and validated 15301.56 36.69 14000 TF v0.12 (cuda100)
92204936 88471197 22 Dec 2022, 17:39:26 UTC 27 Dec 2022, 21:37:44 UTC Completed and validated 15086.4 34.75 14000 TF v0.12 (cuda100)
92204702 88470963 22 Dec 2022, 16:18:33 UTC 27 Dec 2022, 16:42:44 UTC Completed and validated 7395.35 18.67 14000 TF v0.12 (cuda100)
92203730 88469991 22 Dec 2022, 14:59:05 UTC 27 Dec 2022, 16:38:33 UTC Completed and validated 7212.77 17.42 14000 TF v0.12 (cuda100)
92202885 88469146 22 Dec 2022, 11:37:02 UTC 27 Dec 2022, 16:32:23 UTC Completed and validated 6952.6 16.97 14000 TF v0.12 (cuda100)
92202790 88469051 22 Dec 2022, 10:17:17 UTC 27 Dec 2022, 16:12:36 UTC Completed and validated 6095.67 14.05 14000 TF v0.12 (cuda100)
92202335 88468596 22 Dec 2022, 8:58:02 UTC 27 Dec 2022, 9:13:32 UTC Completed and validated 17284.71 36.39 14000 TF v0.12 (cuda100)
92201866 88468127 22 Dec 2022, 7:38:42 UTC 27 Dec 2022, 9:06:25 UTC Completed and validated 17268.87 36.86 14000 TF v0.12 (cuda100)
92201549 88467810 22 Dec 2022, 6:20:40 UTC 27 Dec 2022, 9:02:18 UTC Completed and validated 17283.26 36.09 14000 TF v0.12 (cuda100)
92200970 88467231 22 Dec 2022, 5:02:27 UTC 27 Dec 2022, 8:42:04 UTC Completed and validated 17116.18 37.14 14000 TF v0.12 (cuda100)
93642926 89833820 6 Jan 2023, 7:31:31 UTC 7 Jan 2023, 14:05:03 UTC Completed and validated 15996.08 137.91 14000 TF v0.19 (cuda120)
93642629 89833523 6 Jan 2023, 7:31:03 UTC 7 Jan 2023, 13:00:32 UTC Completed and validated 15698.36 139.28 14000 TF v0.19 (cuda120)
93642584 89833478 6 Jan 2023, 7:30:33 UTC 7 Jan 2023, 12:48:38 UTC Completed and validated 15814.41 138.52 14000 TF v0.19 (cuda120)
93642649 89833543 6 Jan 2023, 7:30:03 UTC 7 Jan 2023, 12:26:14 UTC Completed and validated 15666.06 136.45 14000 TF v0.19 (cuda120)


These were all run on the GTX 1060, 4 at a time, at the same room temperature (within 3°F) and MSI Afterburner settings.
So either the shorter runs (6 to 7k secs) are the WU ending when it found a factor, or I reset my BOINC client. The runs at 172xx seconds might have been when the CPU was running at a lower TDP/frequency. I'm not sure how much CPU performance affects the TF runs.
I'll start the data set from the 4 that completed today, not restart the BOINC client, and hold the CPU at the current TDP of 40 watts (if the weather doesn't get hot in January again...).
13) Message boards : Number crunching : Intel ARC GPUs (Message 8580)
Posted 28 Dec 2022 by Profile marmot

2) Are the WUs TF CPU times only showing the amount of time after a BOINC Time Of Day suspension? (As in 3600sec of an actual 34 hour run)
3) Should my NVidia 1060 run only 1 WU at a time instead of 4 if the suspensions are an issue?


(snip)
2. CPU time is overall
3. yes


2. So, how did the UHD on the 8250U complete a TF in 36xx secs (normal is 100k+)?
Is there a checkpoint in the algorithm that can determine that there is no possible prime in the remaining data set, so the WU ends prematurely? Or did I find a prime on the UHD! (lol)

3. I'm compiling a data set of 4 TF at once vs 1 TF at once. I will report back the results in a couple of days. Sorry, I have no ARC to test on.

Currently the 1060 at 4x WU is putting out 4.45 credit/(CPU+GPU sec).
I control the GPU clock/power and the room temp for these tests.

(BTW, running the TF on the UHD is for WUProp hours.
At least it does beat the deadline and get credit.
Its only other project is Einstein, AFAIK.)
14) Message boards : Number crunching : Intel ARC GPUs (Message 8574)
Posted 27 Dec 2022 by Profile marmot


It's at 8 hours and still crunching on the UHD. Most will be aborted for deadline.



It took 118,000 sec for that 1st WU.
The next one finished in 3680 sec though. EDIT: That 118k is a guess. I do not see 2 TF Valid WU's. The valid WU's usually purge after 24 hours so maybe the 3680sec reported IS the long WU and only the last hour of CPU time after a BOINC suspension is showing. See my question below...
The current is at 12 hours and still going.

My electric company has put me on a peak/off-peak plan where it's 4 cents/kWh off-peak and 31 cents peak (6am-8am and 6pm-8pm weekdays).

So I've had to pause all electronic devices, including BOINC, 6-8am and 6-8pm (which is its own issue since BOINC doesn't support 2 pause periods per day; I have to manually pause the GPU hosts daily until my lazy butt gets around to setting up 2nd BOINC data folders).

1) Is the vast discrepancy in run times related to TF restarting from the beginning after a suspension period?
2) Are the WUs TF CPU times only showing the amount of time after a BOINC Time Of Day suspension? (As in 3600sec of an actual 34 hour run)
3) Should my NVidia 1060 run only 1 WU at a time instead of 4 if the suspensions are an issue?
15) Message boards : Number crunching : Intel ARC GPUs (Message 8573)
Posted 27 Dec 2022 by Profile marmot
I have given up on the Intel ARC 770 and replaced it with a Gigabyte GTX1660 Super 6GB card.




How many seconds does it take to complete a TF on average?
If you got that info before giving up on it.
16) Message boards : Number crunching : Intel ARC GPUs (Message 8563)
Posted 25 Dec 2022 by Profile marmot


I have changed the plan_class to not send out work for Intel HD Graphics; it's too slow, and BOINC doesn't have Intel ARC support.


OK, thanks.

It's at 8 hours and still crunching on the UHD. Most will be aborted for deadline.

I didn't consider buying an Intel ARC.
They seem to underperform for computation compared to the same-year models from NVidia or AMD.
Maybe price per performance or performance per watt is something to compare instead?

Anyway, off topic.
17) Message boards : Number crunching : 2 GPUs, 2 tasks on one card, not utilizing gpu 1, (Message 8559)
Posted 25 Dec 2022 by Profile marmot
BOINC's GPU management is archaic and clunky.
It was designed for sharing projects on a single core CPU.

The dev's refuse to add a user managed GUI work queue for each GPU device or CPU core (group of cores).
(Folding at Home has better work queue management)

They expect us to solve these issues by multiple installs of BOINC (I use VM's to solve some CPU queue issues including multiple time of day pauses imposed by the electric company's 10x rates for 2 hours twice a day).

You can set up a BOINC install for each GPU and attempt to exclude the other GPUs in each install (<exclude_gpu>).
This should work fine with 2 GPUs, but 3 or more will get trickier.
Using VMs to control the work queue on each GPU needs GPU passthrough, which only works readily on Linux with VMware (going by what I read; no personal experience).
I've read that Hyper-V on Windows could work, but I haven't read anyone reporting success.
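In the two-install approach, each client's cc_config.xml excludes the other card for this project, roughly like this (the <url> below is only a guess based on the project directory name; use whatever URL your client shows for SRBase and the device numbers from its startup log):

<cc_config>
  <options>
    <!-- in install #1: hide device 1 so only device 0 is used for SRBase -->
    <exclude_gpu>
      <url>http://srbase.my-firewall.org/sr5/</url>
      <device_num>1</device_num>
    </exclude_gpu>
  </options>
</cc_config>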

Even when the GPU sharing amongst projects is supposed to be working correctly, I've seen the client properly share WU's among the GPU's then improperly share once WU's complete and new ones start. It takes a complete BOINC client shutdown to fix the issue.

From what Rebirther says, it's not supposed to work, but try this app_config.xml for 2 GPU's
<app_config>
  <project_max_concurrent>X</project_max_concurrent>
  <app>
    <name>TF</name>
    <max_concurrent>4</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.50</gpu_usage>
      <cpu_usage>0.05</cpu_usage>
    </gpu_versions>
  </app>
</app_config>


This could kick the client into loading 2 TF tasks on each of the two GPUs.
X = your number of CPU threads; also try threads + 4 as X.
Make sure you do a complete restart of the client; don't just reread the config files from the Options menu (which won't change the GPU loadout on the fly).

Maybe setting
<max_concurrent>2</max_concurrent> ... <gpu_usage>0.98</gpu_usage>

or
<gpu_usage>0.49</gpu_usage>

will trick BOINC into loading the WU on the other GPU.
The client sometimes acts in ways the documentation claims it won't.
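Spelled out, that second variant would be (same caveats as above; just a sketch):

<app_config>
  <app>
    <name>TF</name>
    <!-- only 2 TF tasks in total -->
    <max_concurrent>2</max_concurrent>
    <gpu_versions>
      <!-- just under half a GPU per task, nudging the scheduler to spread 2 tasks over 2 cards -->
      <gpu_usage>0.49</gpu_usage>
      <cpu_usage>0.05</cpu_usage>
    </gpu_versions>
  </app>
</app_config>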

My dual GPU machine is doing Milkyway atm, don't have time to experiment.
18) Message boards : Number crunching : Intel ARC GPUs (Message 8556)
Posted 25 Dec 2022 by Profile marmot
My 8250U received 50+ of these WUs.
Since this is real data, the GPU (Intel(R) UHD Graphics 620 (3228MB) OpenCL: 2.1) runs at 740 MHz, and I expect it to take 4-7 hours to complete, although BOINC thinks they will finish in 2 seconds (the 1060 takes 45-90 mins).

The other laptop is an i5-1035G1 with UHD Graphics (3173MB) OpenCL: 2.1, but it's not receiving any WUs. The server stats do not differentiate TF by GPU app, so I can't tell if there are any Intel WUs available.
The return code is:
"12/24/2022 7:40:25 PM | SRBase | Scheduler request completed: got 0 new tasks"
(The GPU queue is empty; CPU tasks are set to 0 resource share and have only a few hours of work in the queue.)

Are all the WU's out or is something being rejected about the 1035G1?
(It was working on Einstein WU's)

EDIT: verified the 1035G1 is using Home preferences, which has TF chosen, Intel and NVidia GPUs, no CPU.
19) Message boards : Number crunching : Having trouble connecting to project (Message 8060)
Posted 6 Jan 2022 by Profile marmot
7.16.20+, you can also update the cert file from FAQ


FAQ on certs

Oh, now I remember some project last winter having this issue, and I installed the crt file on a few machines it was running on with BOINC ver 7.8.x.
Probably why those machines still ran YoYo successfully.

Gonna take some hours to upgrade all to 7.16.20... uggg.

Nice to be back; it stayed way too warm for SRBase in our city till this week.
Good to see SRBase still chugging away.
20) Message boards : Number crunching : Having trouble connecting to project (Message 8055)
Posted 3 Jan 2022 by Profile marmot
Getting the infamous error on my GPU host " Project communication failed: attempting access to reference site"

The server status appears normal, so I reset the project, then removed it, and got "Failed to attach".

This regular usage laptop is also failing to communicate with the project.

BOINC 7.16.5 here and GPU host is on 7.16.11.

EDIT:
So I updated the GPU host to 7.16.20 and it again got the same error, but on the 4th attempt to reattach (sometimes trying the same thing over and over is not insanity) 7.16.20 attached and downloaded 1 WU each for NVidia and AMD.

This laptop (still on 7.16.5) now says: " Scheduler request failed: Peer certificate cannot be authenticated with given CA certificates"

2 Questions:
1) What is the oldest version of BOINC that has the correct CA certs?
2) Why could I edit the forums while the feeder servers refused to communicate w/ the GPU host?

(This must be the reason why some of my VM's suddenly stopped working w/ YoYo@home a month back; but some still get work w/ BOINC 7.8.x)

