Author Message

Marius



Project administrator, Project developer, Project scientist

Joined: 29 Jun 15

Posts: 470

Credit: 4,276

RAC: 0

Message 20463 - Posted: 21 Oct 2015, 15:12:58 UTC

Last modified: 21 Oct 2015, 15:16:10 UTC

Upgrade completed successfully.

Marius




Message 20464 - Posted: 21 Oct 2015, 15:14:06 UTC

Last modified: 21 Oct 2015, 21:26:07 UTC



Today, with Kevin's help, we completed the upgrade of the C@H server.

Everything appears to have gone well, and we don't anticipate that this will disrupt the jobs you were running. However, please comment here with any issues you find related to the upgrade; we'll use this thread as tech support. Please also make sure you read the new requirements and FAQ.

This upgrade is a big milestone for us. It's the first time in several years that the server or the application has been upgraded. It sets us up to deploy and update future applications very easily compared to the work that was required before, and I'm really excited about what we can and will do at C@H in the future!

So what exactly is new?

A new app, "camb_boinc2docker", based on the very latest version of CAMB. It runs in an entirely new way, using software I developed for BOINC based on Virtualbox and Docker, and is what will make future development much more efficient.

Mac OSX support.

Multi-threaded support.

An accurate progress bar.

The new default "third" BOINC credit system.

A very recent version of the BOINC server software, which includes a number of changes, e.g. to the forum functionality.

A visual redesign of the site.

For 32-bit users or users who don't have Virtualbox installed, the existing camb app, now called "camb_legacy", is still supported.

The server code is (almost) entirely public on github.



For now I have marked the new app, camb_boinc2docker, as "beta" which means if you would like to run it, you need to check "Run test applications" under your Cosmology@Home preferences in your account. We'll get rid of the beta tag shortly after we are sure everything checks out.



Thank you also to those who ran the beta server over the last month. It is now shut down permanently, and I'll be transferring over the credits you earned in the next few days.



Thanks everyone and feel free to leave comments / questions below,



Marius & C@H team

Jim1348


Joined: 17 Nov 14

Posts: 103

Credit: 4,253,888

RAC: 341

Message 20465 - Posted: 21 Oct 2015, 17:36:03 UTC - in response to Message 20464.

After a shaky start, things are going well. The first one got stuck at 0.100 percent, so I aborted it after 10 minutes (it was by then running High Priority). The second one finished in 34 seconds, so it clearly will be a validate error. But the next four finished OK in about 5 1/2 minutes each on six cores of an i7-4770 (Win7 64-bit, VBox 5.0.6). I think it will work.

Marius




Message 20466 - Posted: 21 Oct 2015, 19:50:22 UTC - in response to Message 20465.

Last modified: 21 Oct 2015, 19:50:32 UTC

After a shaky start, things are going well. The first one got stuck at 0.100 percent, so I aborted it after 10 minutes (it was by then running High Priority). The second one finished in 34 seconds, so it clearly will be a validate error. But the next four finished OK in about 5 1/2 minutes each on six cores of an i7-4770 (Win7 64-bit, VBox 5.0.6). I think it will work.

Thanks for the update. I'm a bit surprised by the stuck job; hopefully it's an exception. The validate error has been happening sporadically to everyone on the beta, as you know. At least it dies very quickly, so there's not much wasted effort. I think both are caused by a bug in Docker (this, if you're following along), which I believe should be fixed in version 1.9.0, which should be out literally any day now. As soon as it is, I'll update camb_boinc2docker.

Crystal Pellet


Joined: 12 Feb 13

Posts: 23

Credit: 363,133

RAC: 2

Message 20467 - Posted: 21 Oct 2015, 20:44:46 UTC - in response to Message 20466.

I'm a bit surprised by the stuck job; hopefully it's an exception. The validate error has been happening sporadically to everyone on the beta, as you know.

Is it not caused by this failure:

Error while pulling image: Get https://index.docker.io/v1/repositories/marius311/camb_boinc2docker/images: dial tcp: lookup index.docker.io: no DNS servers
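For anyone who wants to rule out a name-resolution problem on their own machine, here is a minimal sketch. It only checks whether the host operating system can resolve a hostname such as the one in the error above. The lookup in that error presumably happens inside the VirtualBox guest, so a successful result on the host does not rule out the failure quoted here. The function name `can_resolve` is made up for illustration.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if the local resolver can look up `hostname`."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        # Raised when the name cannot be resolved (for example, no DNS servers).
        return False
```

Calling `can_resolve("index.docker.io")` on the host gives a quick yes/no answer; it says nothing about the resolver configuration inside the VM.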

Marius




Message 20468 - Posted: 21 Oct 2015, 21:40:10 UTC - in response to Message 20467.

Is it not caused by this failure:

Error while pulling image: Get https://index.docker.io/v1/repositories/marius311/camb_boinc2docker/images: dial tcp: lookup index.docker.io: no DNS servers

That's right, this is what I think is fixed in 1.9.0, though granted I'm not 100% sure. We should find out in a few days, and if it's not, there are some other options.

Jim1348


Message 20469 - Posted: 21 Oct 2015, 22:57:45 UTC

Another one got stuck, but at 99+%. After 45 minutes I aborted it; the CPU usage was down to practically zero.

camb_boinc2docker_1826_1445437510.429958_0

Marius




Message 20470 - Posted: 21 Oct 2015, 23:18:41 UTC

Last modified: 21 Oct 2015, 23:18:57 UTC

Note to Mac users: I'm aware of a bug affecting Mac that might be causing your job to finish after ~30 seconds with no error, but produce an invalid result. I'll look into fixing it as soon as I can.

Marius




Message 20471 - Posted: 21 Oct 2015, 23:22:42 UTC - in response to Message 20469.

Another one got stuck, but at 99+%. After 45 minutes I aborted it; the CPU usage was down to practically zero.

camb_boinc2docker_1826_1445437510.429958_0

That's weird, the log looks like the calculation actually ran, so this is unlike any other stuck job I've seen so far, where it gets stuck pulling the Docker image at the beginning. Correct me if I'm wrong, you didn't see any stuck jobs on the beta server, right?

Jim1348


Message 20472 - Posted: 22 Oct 2015, 0:22:02 UTC - in response to Message 20471.

Last modified: 22 Oct 2015, 0:23:28 UTC

Correct me if I'm wrong, you didn't see any stuck jobs on the beta server, right?

I think that there were a small number there also; probably one every other day or so. I don't recall whether they stuck at the beginning or the end of a job (more likely the end), and I have detached from that server so the logs are no longer available at my end.

Marius




Message 20477 - Posted: 22 Oct 2015, 13:29:33 UTC

Last modified: 22 Oct 2015, 14:37:38 UTC

I just pushed two updates which should fix:

Win/Linux users seeing sporadic invalid jobs (there may still be a few jobs getting stuck the very first time you run camb_boinc2docker; hopefully not many, and this should be fixed by Docker 1.9.0, coming out in the next few days).

Mac users having all jobs invalid.

Note: I got rid of all the old jobs which didn't have these updates and weren't in progress for anyone, but it may take a bit to flush out the ones that were.

Marius




Message 20490 - Posted: 22 Oct 2015, 19:50:11 UTC - in response to Message 20489.

Last modified: 22 Oct 2015, 19:54:03 UTC

Thanks very much for sorting through and sending these logs, it's really helpful. According to this, they're all getting stuck after the calculation is complete and the VM is shut down, so this has nothing to do with Docker. Unfortunately the log doesn't offer many hints.



One thing that would help, which I know is asking a lot, so don't feel obliged: the next time you or anyone else sees a job get stuck, before you abort it, go into your BOINC folder, open the subfolder slots/X (where X is whatever number this job happens to be), and send me the contents of the various text files you find in there (you can send them via PM).
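If copying the files by hand is tedious, something like the following sketch could bundle them into one zip to attach to a PM. It is only an illustration under stated assumptions: the function name `collect_slot_files` and the set of "text file" suffixes are made up here, and you would need to pass in your own BOINC data folder and slot number.

```python
import zipfile
from pathlib import Path

# Assumption: which suffixes count as "text files"; adjust to what you see in the slot.
TEXT_SUFFIXES = {".txt", ".log", ".xml", ".ini"}


def collect_slot_files(boinc_dir: Path, slot: int, out_zip: Path) -> list[str]:
    """Zip the text-like files in <boinc_dir>/slots/<slot> and return their names."""
    slot_dir = boinc_dir / "slots" / str(slot)
    collected = []
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(slot_dir.iterdir()):
            # Skip subfolders and anything that is not a text-like file
            # (for example, large VM disk images).
            if not path.is_file() or path.suffix.lower() not in TEXT_SUFFIXES:
                continue
            archive.write(path, arcname=path.name)
            collected.append(path.name)
    return collected
```

For example, `collect_slot_files(Path("/path/to/BOINC"), 3, Path("slot3.zip"))` would write `slot3.zip` to the current folder and return the names of the files it included; the path and slot number here are placeholders.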



I'll keep looking into this.

newman


Joined: 25 Oct 08

Posts: 3

Credit: 181,743

RAC: 0

Message 20508 - Posted: 24 Oct 2015, 21:22:15 UTC

My new WUs are also all stuck. In the log I find the following:



Guest Log: progress_template
2015-10-24 23:07:44 (7640): Guest Log: params_00.ini
2015-10-24 23:07:44 (7640): Guest Log: params_01.ini
2015-10-24 23:07:44 (7640): Guest Log: params_02.ini
2015-10-24 23:07:44 (7640): Guest Log: params_03.ini
2015-10-24 23:07:44 (7640): Guest Log: params_04.ini
2015-10-24 23:07:44 (7640): Guest Log: Error: No such image or container: marius311/camb_boinc2docker:0.02

newman


Message 20509 - Posted: 24 Oct 2015, 21:49:34 UTC

0:00:34.466021 VMMDev: Guest Log: b3d362b23ec1: Download complete
00:00:35.137288 VMMDev: Guest Log: time="2015-10-24T21:45:17.667661453Z" level=debug msg="Downloaded b3d362b23ec1a7ba1694e6607b44c5e3fb63d68e5ae01f339c6abe8b0c995601 to tempfile /var/lib/docker/tmp/GetImageBlob710106124"
00:00:50.600772 VMMDev: Guest Log: a7e6eea8e649: Verifying Checksum
00:00:51.200290 VMMDev: Guest Log: time="2015-10-24T21:45:33.802853465Z" level=error msg="filesystem layer verification failed for digest sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"
00:00:53.591200 VMMDev: Guest Log: 757de7f408a1: Verifying Checksum
00:00:54.201561 VMMDev: Guest Log: time="2015-10-24T21:45:36.793200800Z" level=error msg="filesystem layer verification failed for digest sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"

Marius




Message 20510 - Posted: 24 Oct 2015, 21:52:57 UTC - in response to Message 20508.

My new WUs are also all stuck. In the log I find the following:

2015-10-24 23:07:44 (7640): Guest Log: Error: No such image or container: marius311/camb_boinc2docker:0.02

That error is actually expected; it just means this is your first time running camb_boinc2docker and the image needs to be downloaded. The problem is that this download fails, which is what is shown in the several lines below that. This is the problem that I believe will be solved in Docker 1.9.0, which is due in a couple of days. Alternatively, if you're eager to get it working now: the failure is pretty sporadic, so you might just try aborting jobs that get stuck and running new ones. Once your client gets the image downloaded, it won't have to do it again for subsequent jobs.
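To make the distinction above concrete, here is a rough sketch of how one might sort guest-log lines into the expected first-run notice and the lines that point at a real failure. It is an illustration only: the function name `triage` is made up, and the matching strings are taken from the log excerpts posted in this thread rather than from any documented log format.

```python
import re

# Message described above as expected on a first run (image not yet downloaded).
EXPECTED_FIRST_RUN = "No such image or container"
# The Docker lines in the posted logs carry a level=... field.
ERROR_LEVEL = re.compile(r"\blevel=error\b")


def triage(lines):
    """Split guest-log lines into expected first-run notices and real errors."""
    expected, errors = [], []
    for line in lines:
        if "Guest Log:" not in line:
            continue
        if EXPECTED_FIRST_RUN in line:
            expected.append(line.strip())
        elif ERROR_LEVEL.search(line) or "Error while pulling image" in line:
            errors.append(line.strip())
    return expected, errors
```

Run over the two excerpts above, this would put the "No such image or container" line in the first list and the two "filesystem layer verification failed" lines in the second, which are the ones worth reporting.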

Marius




Message 20522 - Posted: 26 Oct 2015, 23:19:37 UTC

Last modified: 26 Oct 2015, 23:28:00 UTC

Just pushed two updates:

I had fixed the problem of no camb_legacy jobs being sent out, which introduced the problem that users were now getting camb_legacy jobs instead of camb_boinc2docker ones. Anyway, this is all fixed now; it took a bug fix in BOINC's scheduler, thanks to David Anderson.

I had accidentally deleted the log out button. It's back now on your "Your Account" page. (But why would you want to leave? :)

Next on the TODO list is to fix errors running jobs for Mac users. Hang in there, OSX guys, sorry it's taken this long!

kararom


Joined: 9 Jan 09

Posts: 69

Credit: 29,506,700

RAC: 0

Message 20532 - Posted: 29 Oct 2015, 16:39:29 UTC

Last modified: 29 Oct 2015, 16:39:50 UTC

Is camb_boinc2docker still in beta test now?

http://www.cosmologyathome.org/apps.php

P.S.: The buttons B, i, u, k, Quote, Code, List, List=, Img, URL are not working.

Marius




Message 20533 - Posted: 29 Oct 2015, 16:45:32 UTC - in response to Message 20532.

Last modified: 29 Oct 2015, 16:45:41 UTC

Is camb_boinc2docker still in beta test now?

http://www.cosmologyathome.org/apps.php

Yeah, it still is. There's still a Mac issue to fix, and I'd like to upgrade to Docker 1.9.0 before removing the tag, which will likely be another week or so.

P.S.: The buttons B, i, u, k, Quote, Code, List, List=, Img, URL are not working.

Thanks, I did notice that too; I'll look into it.