
Mad_Max


Joined: 31 Dec 09

Posts: 192

Credit: 14,604,469

RAC: 7,947

Message 91678 - Posted: 12 Feb 2020, 8:32:02 UTC

Last modified: 12 Feb 2020, 9:05:43 UTC



One of my computers crashed today. I started digging into why: it had run out of RAM.

A second one was in a "swap of death" state (swapping non-stop for hours while doing almost no useful work).

More digging showed that the reason for the out-of-RAM condition and the non-stop swapping was Rosetta.



I see HUGE RAM usage by some of the latest WUs: from 1.5 to 3.5 GB of RAM per running WU.



You can see a lot of tasks currently using 1400-1600 MB of RAM, with ~2800 MB of RAM as a peak value.

Before the crash and reboot, a few tasks peaked at ~3200-3500 MB, and then the system crashed after running out of both RAM and disk swap space.



Usual consumption for R@H is in the 300-1000 MB range. Are these WUs something completely new?

Or are these just bugs like memory leaks?



All of them are Rosetta 4.07 WUs, and their names start with "rb_02_xx" (where xx = 29, 08, 09 and 10).

I guess these are Robetta WUs generated on 29 Jan, 08 Feb, 09 Feb and 10 Feb.



I was forced to limit the maximum number of concurrently running R@H units using the "max concurrency" setting in app config.
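A minimal sketch of the arithmetic behind such a limit, assuming a per-task peak like the ~3500 MB seen above and a hypothetical amount of RAM held back for the OS and other programs. It also renders a small BOINC app_config.xml using the `max_concurrent` element; the app name "rosetta" is an assumption, so use whatever name your BOINC client actually reports for the project's app.

```python
import math


def safe_concurrency(total_ram_gb: float, reserve_gb: float,
                     peak_per_task_gb: float, cpu_threads: int) -> int:
    """Largest number of tasks whose combined peak RAM fits in memory.

    reserve_gb is RAM held back for the OS and other programs (an assumed
    value; pick your own). The result never exceeds the number of CPU
    threads given to BOINC and is at least 1.
    """
    usable = max(total_ram_gb - reserve_gb, 0.0)
    by_ram = math.floor(usable / peak_per_task_gb)
    return max(1, min(cpu_threads, by_ram))


def app_config_xml(app_name: str, max_concurrent: int) -> str:
    """Render a minimal app_config.xml limiting one app's concurrent tasks."""
    return (
        "<app_config>\n"
        "  <app>\n"
        f"    <name>{app_name}</name>\n"
        f"    <max_concurrent>{max_concurrent}</max_concurrent>\n"
        "  </app>\n"
        "</app_config>\n"
    )


if __name__ == "__main__":
    # Example: 16 GB machine, 11 threads for BOINC, ~3.5 GB peak per task,
    # 2 GB held back for everything else (hypothetical reserve).
    n = safe_concurrency(16, 2, 3.5, 11)
    print(n)
    # "rosetta" is an assumed app name; use the name your client reports.
    print(app_config_xml("rosetta", n))
```

With these numbers only 4 of the 11 threads can safely run such tasks at once; the remaining threads would be left for other projects or smaller work.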



Some example WUs

https://boinc.bakerlab.org/rosetta/result.php?resultid=1121861215

https://boinc.bakerlab.org/rosetta/result.php?resultid=1121861165

https://boinc.bakerlab.org/rosetta/result.php?resultid=1121861118

https://boinc.bakerlab.org/rosetta/result.php?resultid=1121861128

https://boinc.bakerlab.org/rosetta/result.php?resultid=1121861130

https://boinc.bakerlab.org/rosetta/result.php?resultid=1121861138

https://boinc.bakerlab.org/rosetta/result.php?resultid=1121861090

https://boinc.bakerlab.org/rosetta/result.php?resultid=1121861114

https://boinc.bakerlab.org/rosetta/result.php?resultid=1121613378

Mad_Max


Joined: 31 Dec 09

Posts: 192

Credit: 14,604,469

RAC: 7,947

Message 91680 - Posted: 12 Feb 2020, 10:31:00 UTC

Last modified: 12 Feb 2020, 10:41:20 UTC







Now over 3000 MB per WU after ~5 hours of running:

rb_02_08_15652_15556__t000__0_C3_SAVE_ALL_OUT_IGNORE_THE_REST_891233_7217

and

rb_02_08_15652_15556__t000__0_C3_SAVE_ALL_OUT_IGNORE_THE_REST_891233_7469



It looks a lot like a memory leak. The growth is not linear: RAM usage jumps each time one stage of computation finishes and a new one begins.

It smells like data/objects not being released properly after use. The longer they run, the more RAM they consume.

Jim1348


Joined: 19 Jan 06

Posts: 480

Credit: 25,919,419

RAC: 72,278

Message 91683 - Posted: 12 Feb 2020, 15:22:32 UTC - in response to Message 91680.

Last modified: 12 Feb 2020, 15:23:53 UTC

I think the moderator has said that this happens on the development versions. In any case, I am glad to see my memory being used.

I have 16 GB on a Ryzen 2600 (using 11 cores) and 32 GB on a Ryzen 3700x (using 15 cores), and haven't run out yet, though I see over 3 GB used by several of the tasks.



Thanks for the warning.

Jim1348


Joined: 19 Jan 06

Posts: 480

Credit: 25,919,419

RAC: 72,278

Message 91687 - Posted: 13 Feb 2020, 7:00:26 UTC - in response to Message 91683.

Last modified: 13 Feb 2020, 7:01:23 UTC

I just got my first work unit suspended "waiting for memory" on my Ryzen 2600 (with 16 GB).

There was about 1 GB available.



So I will continue on my Ryzen 3700x (32 GB). That should work for the foreseeable future.

Admin



Project administrator

Joined: 1 Jul 05

Posts: 5239

Credit: 0

RAC: 0

Message 91696 - Posted: 14 Feb 2020, 6:16:56 UTC

These are likely jobs that are modeling the Spike complex (http://new.robetta.org/results.php?id=15652) of 2019-nCoV_S, the coronavirus. The genome has been sequenced, and there is a mad rush to determine structures for possible drug targets.



We are collaborating with a number of different research groups to model coronavirus proteins that may be possible drug targets, including the NIH/NIAID and SSGCID https://www.ssgcid.org/.

Jim1348


Joined: 19 Jan 06

Posts: 480

Credit: 25,919,419

RAC: 72,278

Message 91698 - Posted: 14 Feb 2020, 10:19:41 UTC - in response to Message 91696.

We are collaborating with a number of different research groups to model coronavirus proteins that may be possible drug targets, including the NIH/NIAID and SSGCID https://www.ssgcid.org/.

Great! I couldn't ask for more. I have re-arranged my machines so that Rosetta has plenty of memory.

Throw them at us, though I would not be surprised if it causes a lot of problems. I hope people check here for what is going on.

Nick Name


Joined: 12 Aug 09

Posts: 3

Credit: 2,047,689

RAC: 113

Message 91702 - Posted: 14 Feb 2020, 19:24:39 UTC - in response to Message 91696.

These are likely jobs that are modeling the Spike complex (http://new.robetta.org/results.php?id=15652) of 2019-nCoV_S, the coronavirus. The genome has been sequenced, and there is a mad rush to determine structures for possible drug targets.



We are collaborating with a number of different research groups to model coronavirus proteins that may be possible drug targets, including the NIH/NIAID and SSGCID https://www.ssgcid.org/.

This is exciting, but these types of jobs should be accompanied by a News notice so that users aren't surprised. I'd also suggest they be put in a special category in project preferences requiring users to explicitly allow tasks this large. Most users are not going to be able to run these without problems.

Team USA page | Team USA forum

Follow us on Twitter

jringo


Joined: 15 Aug 17

Posts: 12

Credit: 2,628,933

RAC: 0

Message 91724 - Posted: 17 Feb 2020, 12:31:57 UTC

We're using BOINC Network to spread the word that R@H is working on coronavirus problems.



This is the sort of news that would be a great public driver! This news will not only bring cycles from other BOINC projects to yours (likely only temporarily, to solve an immediate and tangible problem, so don't feel guilty), but would likely bring a significant number of people into the BOINC network at large.



Always feel free to reach out if you'd like help getting a PR made up.



Good luck on the project!



email: boinc.network@gmail.com

discord: https://discord.gg/wPRafUq

twitter: @BOINCNetwork

retalaznstyle


Joined: 18 Feb 20

Posts: 1

Credit: 0

RAC: 0

Message 91729 - Posted: 18 Feb 2020, 2:33:56 UTC - in response to Message 91696.

Hi, mod here from the coronavirus subreddit. Do you have a post for new users who want to sign up for the coronavirus research efforts via Rosetta?

Jim1348


Joined: 19 Jan 06

Posts: 480

Credit: 25,919,419

RAC: 72,278

Message 91730 - Posted: 18 Feb 2020, 3:24:40 UTC - in response to Message 91702.

I'd also suggest they be put in a special category in project preferences requiring users to explicitly allow tasks this large.

Yes, exactly. That would allow the use of machines with more memory where they are needed, while the ordinary machines can do the ordinary work.



Also, is more capacity needed? Just ask and we will do it, but we need to know what the need is.

dcdc


Joined: 3 Nov 05

Posts: 1662

Credit: 77,876,601

RAC: 33,004

Message 91732 - Posted: 18 Feb 2020, 8:32:43 UTC - in response to Message 91729.

Hi! Can you ask people just to install BOINC and choose Rosetta, and explain the following?



There is a huge pool of Rosetta tasks, so if some people were to single out and run only the coronavirus tasks, the rest of us would just end up running more of the other tasks, as that is all that would be left.



Does that make sense?



Danny

JP


Joined: 18 Feb 20

Posts: 2

Credit: 37,156

RAC: 286

Message 91733 - Posted: 18 Feb 2020, 11:22:45 UTC - in response to Message 91732.

Hi Danny,



-I am a brand new user to both BOINC & Rosetta. I was brought here from the COVID-19 Reddit post.



-I do not have a clear understanding of how tasks are distributed and/or prioritized among users.



-If it is possible, I am trying to clarify the best course of action with the simplest set of instructions, to communicate to a wide, non-technical audience how best to use Rosetta for COVID-19 related tasks.



-I may have misunderstood, but I believe what you are saying is that it is not possible to prioritize particular tasks in Rosetta because a user's resources are distributed among many tasks at once. Therefore, people wishing to commit processing resources to COVID-19 tasks should just run Rosetta. In the course of running Rosetta, their computing power will be added to the pool working on all tasks, including COVID-19 related jobs.



Further, if there were the capability to allocate one's own computing power to specific (COVID-19) tasks, it would force the resources of other users to be allocated to other, non-specified (non-COVID-19) tasks, rendering the power of any task specification moot.



-I am running Rosetta now and, unless I missed it, I do not see where I could specify or prioritize particular tasks. It appears that this is not an option anyway.



-It seems the best course of action is to simply download BOINC & run Rosetta?



Thank you for any clarification you could provide.



Best,

-JP

Jim1348

Send message

Joined: 19 Jan 06

Posts: 480

Credit: 25,919,419

RAC: 72,278

Message 91735 - Posted: 18 Feb 2020, 13:25:00 UTC - in response to Message 91733.

-I may have misunderstood, but I believe what you are saying is that it is not possible to prioritize particular tasks in Rosetta because a user's resources are distributed among many tasks at once. Therefore, people wishing to commit processing resources to COVID-19 tasks should just run Rosetta. In the course of running Rosetta, their computing power will be added to the pool working on all tasks, including COVID-19 related jobs.



Further, if there were the capability to allocate one's own computing power to specific (COVID-19) tasks, it would force the resources of other users to be allocated to other, non-specified (non-COVID-19) tasks, rendering the power of any task specification moot.



-I am running Rosetta now and, unless I missed it, I do not see where I could specify or prioritize particular tasks. It appears that this is not an option anyway.



-It seems the best course of action is to simply download BOINC & run Rosetta?

As a long-time Rosetta user, I can answer that. Yes, you just work on the pool of all the tasks. In fact, unless you can figure out their obscure nomenclature, you don't even know which ones are for COVID-19.



That is fine with me. It doesn't matter on which machine which particular task is run, as long as they have enough resources.

And if they run out of work, then they have more than enough.

Mod.Sense



Volunteer moderator

Joined: 22 Aug 06

Posts: 4015

Credit: 0

RAC: 0

Message 91737 - Posted: 18 Feb 2020, 22:25:01 UTC - in response to Message 91733.

-It seems the best course of action is to simply download BOINC & run Rosetta?





Yes, with the expectation that your contributed effort will benefit research teams that use Rosetta to study COVID-19, as well as other protein structures. Your efforts also benefit the team at the University of Washington that is developing improvements to Rosetta, which makes this type of computational structure prediction possible.

Rosetta Moderator: Mod.Sense

Mad_Max


Joined: 31 Dec 09

Posts: 192

Credit: 14,604,469

RAC: 7,947

Message 91747 - Posted: 19 Feb 2020, 13:36:47 UTC - in response to Message 91696.

Last modified: 19 Feb 2020, 13:44:30 UTC

These are likely jobs that are modeling the Spike complex (http://new.robetta.org/results.php?id=15652) of 2019-nCoV_S, the coronavirus. The genome has been sequenced, and there is a mad rush to determine structures for possible drug targets.



We are collaborating with a number of different research groups to model coronavirus proteins that may be possible drug targets, including the NIH/NIAID and SSGCID https://www.ssgcid.org/.

So it is not a memory leak, just an abnormally big protein model (compared to the average R@H work)? 1273 amino acid residues, if I get it right?



Is there any work on developing a multi-threaded app for such big targets, so as not to waste huge amounts of RAM on a complete dataset copy for each working thread?

Modern computers are getting more and more CPU cores/threads, and just running multiple copies, one per thread, means more and more "overhead" in RAM, disk and Internet (bandwidth) usage, because the use of all of these resources is multiplied by the number of tasks running. A multi-threaded app shares all of this and only needs multiple CPU threads.



The usual (common) setup for non-server computers is about 1 GB of RAM per CPU thread.

2 GB per thread is a much rarer case, and there are almost no "consumer", "office" or "home" computers with more than 2 GB of RAM per CPU thread.

So you cannot just send out tasks that consume 3 GB or more of RAM per thread and expect everything to work OK. There WILL be problems on the majority of computers.



On the other hand, if a multi-threaded app were available, then using even 5-10 GB of RAM per single large model would be acceptable for most volunteer computers. It would also help with the runtimes of the biggest models on older CPUs: really big models often get aborted by the watchdog on old (or just slow, like Intel Atom or AMD Puma/Jaguar/Bobcat) CPUs for exceeding the maximum allowed runtime (8 + 4 = 12 hours max by default) before the very first model/decoy is calculated, and the CPU time spent is wasted.
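A rough sketch of the RAM argument above, under a simplifying assumption that is not from the project: each task's memory splits into shared model data (loaded once per process) and per-thread working memory. The 2.5 GB / 0.5 GB split below is purely illustrative, not a measurement of the actual Rosetta WUs.

```python
def ram_independent_tasks(threads: int, shared_gb: float, per_thread_gb: float) -> float:
    """RAM when every thread runs its own task, each with a full dataset copy."""
    return threads * (shared_gb + per_thread_gb)


def ram_multithreaded_task(threads: int, shared_gb: float, per_thread_gb: float) -> float:
    """RAM when one multi-threaded task loads the dataset once and shares it."""
    return shared_gb + threads * per_thread_gb


if __name__ == "__main__":
    # Hypothetical split of a ~3 GB task: 2.5 GB model data + 0.5 GB working memory.
    for threads in (4, 8, 16):
        separate = ram_independent_tasks(threads, 2.5, 0.5)
        shared = ram_multithreaded_task(threads, 2.5, 0.5)
        print(f"{threads:>2} threads: {separate:5.1f} GB separate vs {shared:5.1f} GB shared")
```

Under that assumed split, 16 threads would need 48 GB as separate tasks but about 10.5 GB as one shared multi-threaded task; the real saving depends entirely on how much of a task's memory is actually shareable.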