-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prepsubband writes less samples than original file has. #191
Comments
Yes, this is expected behavior. And the quick reason why is that we are missing some data to be able to output the full-duration of the data because you are de-dispersing. Since you are dedispering to a positive DM, you are assuming that the low-frequency data is arriving later than the high-frequency data. That's fine at the beginning of the observation where we can simply grab later samples at lower freqs to add them to the initial sample at the highest freq. But that doesn't work at the end of the observation where we have no low freq data recorded! So since we don't know what to add in order to properly dedisperse, we simply ignore parts of the original file where we don't have all frequencies we need. This problem gets worse and worse (meaning you lose a bigger and bigger fraction of your input data) when 1) the observation duration is relatively short and 2) when the total amount dispersion (in time) is long between the top and bottom of the observing band. In your case, for a DM of 300, the differential delay across the band (1200-1800MHz) is 0.48 seconds, and so you will always lose at least that amount of data. However, there is another issue in that prepsubband (and prepdata) work on blocks of data of (by default) 2400 samples (see the Spectra/Subint number in the output) for filterbank data. So you will often get an integer number of samples of that duration out. And finally, both of those commands automatically choose a highly factorable number of data points to output so that FFTs on that data will be efficient. For your initial case you are seeing all of these effects. You can mitigate some of this (but you can't get back data you don't have because of dispersion) by specifying the number of output data points to use with the Hope this helps! |
Thank you very much for your response. I understood your explanation and it makes sense. I tried your suggestion of looking into the .dat file but I don't see anything weird. Is there a way to predict how much data I will lose? Because like you said, with the DM and frequency range I can obtain the least amount of data that I will lose, in the example you mentioned it was ~0.5 seconds but I am losing around a minute of data. I assume this is due to the integer number of blocks that you mentioned. When executing |
Have you tried specifying |
One other request for you to try: can you try using |
Hi, I am using PRESTO to process some data and look for single pulses. The files have the following specifications:
Number of frequency channels: 2048
time resolution: 0.01 seconds
lenght: 298.68 seconds
bandwidth: 1200-1800 MHz
Number of samples: 29868
I realized that when using prepsubband
prepsubband -nobary -lodm 300 -dmstep 10 -numdms 20 -nsub 2048 -runavg -o prep_output 2023-10-07_18_16_51_752602noise.fil
the number of samples written is less than the original number of samples of the fileSo when I use single_pulse_search.py to search for transients it doesn't find anything after the sample 24000. I lose ~6000 samples which correspond to one minute of my five minute files.
I have processed data with a higher time resolution and I noticed this happens as well but due to the higher resolution the time lost is not as much as in my lower resolution case. For example for data with the following specifications:
Number of frequency channels: 512
time resolution: 54.613 microseconds
length: 3599.99913984 seconds
bandwidth: 1210-1510 MHz
Number of samples: 65917953
I lose approximately 25000 samples which correspond to 1.5 seconds of an observation of one hour.
A "solution" to this would be to make the files larger so I dont lose a significant amount of data but still a full minute is a lot of time lost.
I have seen something similar in the PRESTO_search_tutorial.pdf where the samples written is less than the original ones.
This is taken from slide 11 and as you can see the total number of points is 531000 and the data written points is 530000
Is this an expected behaviour? I am not sure why this is happening.
Is there a way to make it process the full file with its current resolution?
Any indication is much aprecciated. :)
The text was updated successfully, but these errors were encountered: