David Anderson

Volunteer moderator

Project administrator

Project developer

Project scientist



Send message

Joined: 13 Feb 99

Posts: 142

Credit: 502,653

RAC: 0



Message 1956619 - Posted: 21 Sep 2018, 8:19:33 UTC



What I realized is that when we find a multiplet whose signals are in the middle or end of the window, it's possible that there are signals beyond the end of the window which, if added to the multiplet, would increase its score.



These additional signals might be detected as another multiplet. But then we'd have two multiplets with lower scores, rather than one big multiplet with a higher score. If there's an ET signal in our data it's critical that the corresponding multiplet have as high a score as possible, so that it stands out from the noise multiplets. Similarly, we want



I came up with a way to address the problem. The idea is that when you find a multiplet M, instead of outputting it immediately, you "stash" it, slide the window so that it starts at the first signal in M, and look for a multiplet in the resulting window. If you find one, and either a) it is entirely after M, or b) its score is lower than that of M, then you output M. Otherwise you stash the new multiplet and continue.



That's the basic idea. In full detail (see below) it's quite a bit more complex, and figuring it out required extra coffee, extra concentration, and a lot of trial and error. BTW, there may be errors in my solution, and there may be a better approach entirely. If so, please let me know!



I should point out that the multiplet-finding algorithm is "heuristic". That means it only approximates the right answer,

i.e. the highest-scoring multiplets. Finding the actual right answer, I believe, is "NP-complete", essentially meaning that it would take more than the lifetime of the universe to solve. In any case, the score function is itself a heuristic for estimating the probability of a set of signals occurring in noise.



When I work on things like this I write "pseudocode", which is halfway between English and a program. Here's my final pseudocode for this.

maintain a variable "have_mp" and mp0 Invariant: if have_mp, then mp0 begins at start of window Invariant: MPs are disjoint and don't overlap in freq Assumption: if X is a subset of Y, then find_multiplet(X) has lower score than find_multiplet(Y) for signals S (in bary freq order) while window not empty if S doesn't exceed window, break // invariant: current window is maximal (S is beyond it) // each iteration of while removes at least 1 signal from start of window if find a multiplet m if have_mp have_mp = false if start(m) > end(mp0) add mp0 to result vector else if score(mp0) > score(m) add mp0 to result vector move start of window past mp0 // discard m in this case continue // decide if we should stash m f = freq of first signal in m if S.freq < f + window size // S is in the window starting at start of m; extend window mp0 = m have_mp = true move start of window to start(m) break else add m to result vector move start of window past end(m) else trim front of window enough to include S at end break append S to window at end if have_mp add m0 to result look for multiplet in final window I recently thought of a way to improve our multiplet-finding algorithm . In a nutshell, the algorithm works on one pixel at a time. It collects the signals whose sky locations are in a small disk centered at that pixel. It sorts these signals by barycentric frequency , then scans the signals with a "sliding window" whose frequency width depends on whether we're looking for barycentric and non-barycentric multiplets. When it finds a multiplet, it records it and moves the window past it.What I realized is that when we find a multiplet whose signals are in the middle or end of the window, it's possible that there are signals beyond the end of the window which, if added to the multiplet, would increase its score.These additional signals might be detected as another multiplet. But then we'd have two multiplets with lower scores, rather than one big multiplet with a higher score. If there's an ET signal in our data it's critical that the corresponding multiplet have as high a score as possible, so that it stands out from the noise multiplets. Similarly, we want birdies to be detected with as high scores as possible, to demonstrate the full sensitivity of our system.I came up with a way to address the problem. The idea is that when you find a multiplet M, instead of outputting it immediately, you "stash" it, slide the window so that it starts at the first signal in M, and look for a multiplet in the resulting window. If you find one, and either a) it is entirely after M, or b) its score is lower than that of M, then you output M. Otherwise you stash the new multiplet and continue.That's the basic idea. In full detail (see below) it's quite a bit more complex, and figuring it out required extra coffee, extra concentration, and a lot of trial and error. BTW, there may be errors in my solution, and there may be a better approach entirely. If so, please let me know!I should point out that the multiplet-finding algorithm is "heuristic". That means it only approximates the right answer,i.e. the highest-scoring multiplets. Finding the actual right answer, I believe, is "NP-complete", essentially meaning that it would take more than the lifetime of the universe to solve. In any case, the score function is itself a heuristic for estimating the probability of a set of signals occurring in noise.When I work on things like this I write "pseudocode", which is halfway between English and a program. Here's my final pseudocode for this. ID: 1956619 ·