Trouble running SCAT with huge dataset #12
SCAT was written for a small number of microsatellites. I would be
Matthew
On Sun, Jun 26, 2016 at 10:00 PM, lilnet notifications@github.com wrote:
|
I am unsure whether the following message indicates that the dataset is too large for the program, or whether it points to another problem that I am unaware of. I am hoping you'd be able to point me in the right direction.
|
It's definitely too large. You can recompile the program with MAXLOCI increased, which should let it run (maybe). |
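Before recompiling, it can help to know how many loci the input actually holds, so that the new MAXLOCI value is chosen with some headroom. The following is only a minimal sketch. It assumes a whitespace-delimited genotype file in which every row starts with a fixed number of non-genotype columns (taken here to be 2, for example a sample ID and a location code) followed by one allele per locus. The function names, `n_meta_cols`, and the `max_loci` argument are hypothetical and should be adjusted to the real file layout and to whatever limit is compiled into your build.

```python
def count_loci(lines, n_meta_cols=2):
    """Return the number of locus columns in the widest non-empty row.

    Assumes each row is whitespace-delimited and begins with
    `n_meta_cols` non-genotype fields (an assumption, not SCAT's spec).
    """
    widest = 0
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            widest = max(widest, len(fields) - n_meta_cols)
    return max(widest, 0)


def exceeds_limit(lines, max_loci, n_meta_cols=2):
    """True if the data has more loci than the given compiled-in limit."""
    return count_loci(lines, n_meta_cols) > max_loci


if __name__ == "__main__":
    # Toy rows: two metadata columns followed by three loci.
    demo = [
        "ind1 1 101 205 310",
        "ind1 1 103 205 312",
        "",
        "ind2 2 101 207 310",
    ]
    print(count_loci(demo))
    print(exceeds_limit(demo, max_loci=2))
```

For a real file, the rows could be passed in with `open(path)` in place of the toy list. Using the widest row rather than the first one guards against a short or malformed leading line.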
I know this is an older thread, but I was hoping to get a little clarification on why you think the results will be unreliable. Is it because of the LD that certainly exists in such a large data set, which would bias the confidence intervals? I edited the MAXLOCI parameter and got it to run on a SNP set with ~7000 loci. The average assignment seemed reasonable. |
Well, perhaps I should just say I am being cautious because I have no
The main thing I would check is that results from multiple starting points
Matthew
On Thu, Sep 29, 2016 at 5:38 PM, ddrinan notifications@github.com wrote:
|
Thanks, we'll definitely check that different seeds result in similar answers. Dan |
Well, the plot thickens. When I don't provide values for alpha and beta, the program runs, but when I provide alpha and beta values (estimated the same way as Wasser et al.), I get a segmentation fault. Any insight into what could be driving this pattern? Also, do you think a reasonable approach would be to simply run the program with default settings multiple times and if I get the same answer, be o.k. with it? In this data set, we have general ideas of where the samples should be assigned, so we would know if it is way off. Dan |
What are the estimates used for alpha and beta?
On Fri, Oct 7, 2016 at 4:37 PM, ddrinan notifications@github.com wrote:
|
I ran the data with Nburn=100, Nthin=1000, and Niter=100 and got alpha and beta values of
As a side note, when I ran each data set 3 times (default settings and no fixed alpha or beta values) and compared the assigned locations, the median distance between assigned locations was about 50 km over an area that ranges from Washington state to western Alaska (>2000 km of distance). |
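The run-to-run comparison described above can be made repeatable with a small script. This is a sketch, not the method actually used in this thread: it assumes each run's assigned location per sample has already been extracted as a (latitude, longitude) pair in degrees, and it measures agreement as the median great-circle (haversine) distance between paired assignments. The function names and the dictionary layout are hypothetical, and parsing the program's output files is left out because their format is not shown here.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def median_run_distance_km(run_a, run_b):
    """Median distance between the locations two runs assign to the same samples.

    run_a, run_b: dicts mapping sample ID -> (lat, lon) in degrees.
    Only samples present in both runs are compared.
    """
    shared = sorted(set(run_a) & set(run_b))
    if not shared:
        raise ValueError("runs share no sample IDs")
    return median(haversine_km(*run_a[s], *run_b[s]) for s in shared)
```

With three runs, the same function can be applied to each of the three pairs. A median that is small relative to the study area suggests that the chains started from different seeds are landing in similar places, though it does not by itself show that the confidence intervals are trustworthy.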
I am having trouble executing SCAT with my SNP dataset, which consists of more than 18k loci, both for assigning individuals and for estimating allele frequencies. My file is formatted as specified. Is there any limit to the number of loci that can be used when estimating allele frequencies and/or assigning individuals? SCAT is able to run when executed with the test dataset that was included with the software.
SCAT was executed with the following:
Header of genotype file: