Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Porting over jump finding stuff #180

Merged
merged 1 commit into from
Jul 18, 2024
Merged

Porting over jump finding stuff #180

merged 1 commit into from
Jul 18, 2024

Conversation

skhrg
Copy link
Member

@skhrg skhrg commented May 20, 2024

Used by simonsobs/sotodlib#807

Been a few years since I've written C++ for use by other humans, so some sanity checks that I don't implement things in a dumb way are appreciated.

@skhrg skhrg requested a review from mhasself May 20, 2024 16:57
@skhrg skhrg changed the title Block moment Porting over jump finding stuff May 23, 2024
@skhrg skhrg requested a review from tskisner May 23, 2024 21:48
Copy link
Member

@mhasself mhasself left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. A few things:

  • These functions assume the arrays (buffers) or C-ordered. But nothing enforces that. So, segfault city. It's pretty easy to generalize these for arbitrary strides; see other uses of BufferWrapper. Or else check that it's C-ordered and throw an informative error. But do not allow for segfaults.
  • Add docstrings to the bp::def entries.
  • Add tests.
  • The rest of so3g formats if and for blocks like this:
if (condition) {
    block;
}

So please add that space and remove that extra new line.

@skhrg skhrg requested a review from mhasself June 9, 2024 00:06
@skhrg skhrg force-pushed the block_moment branch 2 times, most recently from e0642d3 to 382966e Compare June 26, 2024 12:34
@skhrg
Copy link
Member Author

skhrg commented Jun 26, 2024

@mhasself I think this is good to go, could you take another look? I'd like this get this faster jumpfinder out in the wild soon.

Copy link
Member

@mhasself mhasself left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments inline.

I have some reservations about the jump finding algorithm. My immediate reactions were (a) this relies on "blocks" too much and (b) I don't understand what kind of jumps this will detect / not detect.

Adapting your unit test case, I loop over several signals, where the jump is shifting to the right. Note the threshold for detection is substantially higher than the true jump size -- 1.2.

for offset in range(20):
    a = np.zeros((1, 100), dtype="float32", order="C")
    a[0, 50+offset:] += 1
    b = np.zeros((1, 100), dtype="int32", order="C")
    min_size = np.ones(1, dtype="float32", order="C")*1.2
    so3g.matched_jumps(a, b, min_size, 10)
    flagged = np.where(b)[1]
    print(flagged)

What I get is:

[48 49 50]
[49 50]
[51]
[52]
[53 54]
[53 54 55]
[54 55]
[56]
[57]
[58 59]
[58 59 60]
[59 60]
[61]
[62]
[63 64]
[63 64 65]
[64 65]
[66]
[67]
[68 69]

So the sensitivity of the finder varies depending on where the jump is relative to the window edge. I think we can cook up versions of this that will have a more consistent kernel. It will probably involve a boxcar filter -- this is easy to implement and is the continuous limit of your block-moment based approach.

flagged,
np.array(
[
0,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please collapse this to ~2 lines. It would make sense to have a line break between 7 and 50, and 59 and 95. It might be clearer to construct this as an hstack of aranges.

This is the reason some people don't like automatic formatters!

Comment on lines 881 to 896
" atol: how close to the multiple of scale a value needs to be to be a jump\n"
" should be an array (float32) with shape (ndet,)\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is atol in units of the signal, or a fraction of the "scale"?

}

template <typename T>
void scale_jumps(const bp::object & tod, const bp::object & out, const bp::object & atol, int win_size, T scale)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename to find_scale_jumps. Or find_quantized_jumps. "Scale" is a verb and a noun so the name is currently pretty confusing.

for(int di = 0; di < ndet; di++) {
int ioff = di*nsamp;
for(int si = 0; si < nsamp; si++) {
int i = ioff + si;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The contents of this loop are not implemented correctly, to isolate detectors. For example, you set idx=0 and then refer to tod_data[idx]. That's always the first detector's first sample.

There are patterns to reduce the risk of out-of-bounds accesses. In this case, you should make pointers for just this detector's data and output (T* det_data = tod_data + ioff; T* det_out = output + ioff;) and then only use si and quantities based on si to access det_data and det_out, in the loop.

Better yet, use T* det_data = tod_buf->ptr_1d(di);, and similarly for det_out. Then you never need tod_data at all. And you can relax the requirements on strides[0], because ptr_1d has handled that for you!

Would I be asking you to do all this, if this inner loop were right already? No, probably not. Lesson learned I hope. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes I see I should have done idx = ioff below to do what I want, good catch.

Yeah I can change things to grab a 1d pointer to make that better.

" win_size: size of window to use as buffer when differencing\n"
" scale: the scale of jumps to look for\n");
bp::def("scale_jumps64", scale_jumps<double>,
"scale_jumps64(tod, out, bsize)\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

signature doesn't have the right args.

int i = ioff + si;
buffer[i] = flag_data[i];
int comp = 0;
if(i>=width) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect -- see my long comment in "scale_jumps". Solution is the same.

int bsize = stop - start;
// Could replace the loops with boost accumulators?
T center = 0.0;
if(central == 1 | moment == 1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the types, this should be if (central || moment == 1).

@skhrg
Copy link
Member Author

skhrg commented Jun 26, 2024

On the sensitivity of the jumpfinder caring about the location relative to the edge of the window. This is known with calculalable effects. The shifted window scheme here was chosen in such a way that there is some garunteed minumum sensitivity. See https://simonsobs.atlassian.net/wiki/spaces/PRO/pages/389939204/Matched+Filter+Jump+Finding for an explanation (page is slightly out of date but the math for this is still accurate).

A boxcar would indeed give us maximum consistency and sensitivity since you will always have a case when the jump is in the center of the kernal which maximizes the sensitivity. But I dont think that is necessary since we pad anyways.

Also the jumps getting caught with min size greater than the jump size I beleive is becuase I correct for a samples worth of slop in min size. That can be removed, I dont recall the exact edge case I was worried about when I added that.

@skhrg
Copy link
Member Author

skhrg commented Jul 3, 2024

Ah actually, the correct thing to do is to determine the sensitivity as a function of distance from the window edge and compare to min_size*sensitivity . What I do now is use the sensitivity at the minimum gaurunteed distance from the edge given the phased block approach.

I'll work out the math but I think with that (and adding a cutoff where we just skip and don't bother checking since the other block will always do better) we should get a consistent and well behaved kernel.

While yes I know that taking the continuous limit solves this problem as well, one of the reasons for going with the phased blocks like I did here was to reduce the computational complexity since making it continuous will make it block_size/2 times slower.

@mhasself let me know if that sounds sane.

@@ -624,26 +624,26 @@ void block_moment(const bp::object & tod, const bp::object & out, int bsize, int

void _clean_flag(int* flag_data, int width, int ndet, int nsamp)
{
int* buffer = new int[ndet * nsamp];
int* buffer = new int[nsamp];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need an independent buffer per thread. This I will insist on.

src/array_ops.cxx Show resolved Hide resolved
@mhasself
Copy link
Member

While yes I know that taking the continuous limit solves this problem as well, one of the reasons for going with the phased blocks like I did here was to reduce the computational complexity since making it continuous will make it block_size/2 times slower.

I am skeptical that you'd suffer a performance cost of this magnitude. But I admit I haven't thought about it super carefully.

@mhasself let me know if that sounds sane.

Let's go with this for now, sure, but I will welcome something with less structured behavior in the future...

@skhrg skhrg requested a review from mhasself July 16, 2024 16:15
@skhrg
Copy link
Member Author

skhrg commented Jul 16, 2024

Behavior is more consistent now. If you turn off samp_correction then it gives you exactly what I calculate with pen and paper. Leaving that on for now but I could be convinced its unneeded paranoia.

Copy link
Member

@mhasself mhasself left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments. Then rebase to make it mergeable and prepare for a final review...

src/array_ops.cxx Show resolved Hide resolved
Comment on lines 688 to 690
if(si==12){
std::cout << thresh << "\t" << std::flush;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove debug

@skhrg
Copy link
Member Author

skhrg commented Jul 16, 2024

Sounds good. Any opposition to me squashing the whole PR into one commit?

@mhasself
Copy link
Member

Sounds good. Any opposition to me squashing the whole PR into one commit?

Rebase + squash into one commit would be perfect. Thanks!

* Matched filter jump finder
* Quantized size jump finder
* Compute n'th moment is blocks
* Clean flag by removing small segments
Copy link
Member

@mhasself mhasself left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mhasself mhasself merged commit 34781f0 into master Jul 18, 2024
4 checks passed
@mhasself mhasself deleted the block_moment branch July 18, 2024 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants