munge_sumstats issue: Usecols do not match names #446

Open
ElizabethWallace97 opened this issue Aug 9, 2024 · 1 comment

@ElizabethWallace97

I am running ldsc via CELLECT (its ldsc files are the same as the ldsc files in this repository) with the following code:
```
conda config --append channels conda-forge
CONDA_SUBDIR=osx-64 conda create -n cellect python=2.7 -y
conda init
conda activate cellect
conda config --env --set subdir osx-64
pip install argparse
pip install bitarray==0.8
conda env create -f ${CELLECT_dir}/ldsc/environment.yml
conda init
conda activate ldsc

## Preprocessing file
# Make file with header
head -n 1 ${sumstats_dir}/clozuk_pgc2.meta.sumstats.txt > ${sumstats_dir}/tmp_clozuk_header
# Select all SNPs with rs number
grep 'rs' ${sumstats_dir}/clozuk_pgc2.meta.sumstats.txt > ${sumstats_dir}/clozuk_pgc2.meta.sumstats.only_rs.txt
# Create extra column with only rs#
gawk -v n=1 '{s = gensub("^(([^:]*:){"n"}).*$", "\\1", 1); sub(".$","",s); print $0, s}' ${sumstats_dir}/clozuk_pgc2.meta.sumstats.only_rs.txt > ${sumstats_dir}/clozuk_pgc2.meta.sumstats.only_rs.rs_col.txt
sed -i 's/\r//' ${sumstats_dir}/clozuk_pgc2.meta.sumstats.only_rs.rs_col.txt
sed -i 's/\r//' ${sumstats_dir}/tmp_clozuk_header
# Create new header with SNP_RS column name
sed 's/$/\tSNP_RS/' ${sumstats_dir}/tmp_clozuk_header > ${sumstats_dir}/tmp_clozuk_header_2
# Merge new header with file containing rs# column
cat ${sumstats_dir}/tmp_clozuk_header_2 ${sumstats_dir}/clozuk_pgc2.meta.sumstats.only_rs.rs_col.txt > ${sumstats_dir}/clozuk_pgc2.meta.sumstats.only_rs.rs_col.header.txt
# Munge SCZ
python ${CELLECT_dir}/ldsc/munge_sumstats.py --sumstats ${sumstats_dir}/clozuk_pgc2.meta.sumstats.only_rs.rs_col.header.txt --out ${munged_dir}/SCZ --merge-alleles ${ldsc_dir}/w_hm3.snplist --N 105317 --snp SNP_RS --ignore SNP --p P --frq Freq.A1
```
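
For reference, the "extra column with only rs#" step can be sketched in plain awk as below. This is an illustrative alternative, not the command from the linked pipeline. It assumes the first column holds colon-separated IDs of the form `rs12345:100:A:G` (an assumption about the file layout), and `add_rs_column` is a hypothetical helper name.

```shell
# Hypothetical helper: append the part of column 1 that precedes the first ":"
# as an extra tab-separated column. Reads stdin, writes stdout.
add_rs_column() {
  awk '{ split($1, parts, ":"); print $0 "\t" parts[1] }'
}

# Made-up sample row, for illustration only.
printf 'rs12345:100:A:G\t0.25\t0.01\n' | add_rs_column
```

Unlike `print $0, s`, which joins the new column with a space, this version joins it with a tab, matching the tab used when `SNP_RS` is appended to the header. Whether that difference matters to `munge_sumstats.py` is not established here.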

The data processing is from this source: https://github.com/mitchellolislagers/cell_type_enrichment_pipeline

Every time I run munge_sumstats.py on this file, I get the error 'ValueError: Usecols do not match names.', as detailed in the attached log:
SCZ.log
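
One way to narrow this down is to inspect the merged file before munging. The sketch below assumes, without confirmation in this thread, that the error stems from a mismatch between the file's header and the column names passed on the command line. `has_columns` and `field_counts` are hypothetical helper names, not part of ldsc or CELLECT.

```shell
# Hypothetical helpers for inspecting a sumstats file before munging.

# has_columns FILE NAME...: report whether each NAME appears in the header row.
has_columns() {
  file=$1
  shift
  for name in "$@"; do
    # Strip carriage returns, put one header field per line, match exactly.
    if head -n 1 "$file" | tr -d '\r' | tr -s ' \t' '\n' | grep -qxF -- "$name"; then
      echo "found: $name"
    else
      echo "missing: $name"
    fi
  done
}

# field_counts FILE: print the distinct per-line field counts (whitespace-split).
field_counts() {
  awk '{ print NF }' "$1" | sort -nu | tr '\n' ' ' | sed 's/ $//'
}
```

For example, `has_columns "${sumstats_dir}/clozuk_pgc2.meta.sumstats.only_rs.rs_col.header.txt" SNP_RS SNP P Freq.A1` lists which of the names used in the munge command are present in the header. If `field_counts` on the same file prints more than one number, the header and the data rows do not all have the same number of columns.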

@aksarkar

aksarkar commented Aug 9, 2024

@ElizabethWallace97 The ldsc packaged with CELLECT is forked from here.

You need to ask at https://github.com/pascaltimshel/ldsc/issues, since it involves code introduced in that fork.
