munge_sumstats issue: Usecols do not match names #446

ElizabethWallace97 · 2024-08-09T17:52:16Z

I am running ldsc via CELLECT with the following code (the files are the same as the ldsc files from this GitHub repository):
```shell
conda config --append channels conda-forge
CONDA_SUBDIR=osx-64 conda create -n cellect python=2.7 -y
conda init
conda activate cellect
conda config --env --set subdir osx-64
pip install argparse
pip install bitarray==0.8
conda env create -f ${CELLECT_dir}/ldsc/environment.yml
conda init
conda activate ldsc

##Preprocessing file
#Make file with header
head -n 1 ${sumstats_dir}/clozuk_pgc2.meta.sumstats.txt > ${sumstats_dir}/tmp_clozuk_header
#Select all SNPs with rsnumber
grep 'rs' ${sumstats_dir}/clozuk_pgc2.meta.sumstats.txt > ${sumstats_dir}/clozuk_pgc2.meta.sumstats.only_rs.txt
#Create extra column with only rs#
gawk -v n=1 '{s = gensub("^(([^:]:){"n"}).$", "\1", 1); sub(".$","",s); print $0, s}' ${sumstats_dir}/clozuk_pgc2.meta.sumstats.only_rs.txt > ${sumstats_dir}/clozuk_pgc2.meta.sumstats.only_rs.rs_col.txt
sed -i 's/\r//' ${sumstats_dir}/clozuk_pgc2.meta.sumstats.only_rs.rs_col.txt
sed -i 's/\r//' ${sumstats_dir}/tmp_clozuk_header
#Create new header with SNP_RS columnname
sed 's/$/\tSNP_RS/' ${sumstats_dir}/tmp_clozuk_header > ${sumstats_dir}/tmp_clozuk_header_2
#Merge new header with file containing rs# column
cat ${sumstats_dir}/tmp_clozuk_header_2 ${sumstats_dir}/clozuk_pgc2.meta.sumstats.only_rs.rs_col.txt > ${sumstats_dir}/clozuk_pgc2.meta.sumstats.only_rs.rs_col.header.txt
#Munge SCZ
python ${CELLECT_dir}/ldsc/munge_sumstats.py --sumstats ${sumstats_dir}/clozuk_pgc2.meta.sumstats.only_rs.rs_col.header.txt --out ${munged_dir}/SCZ --merge-alleles ${ldsc_dir}/w_hm3.snplist --N 105317 --snp SNP_RS --ignore SNP --p P --frq Freq.A1
```

The data processing is from this source: https://github.com/mitchellolislagers/cell_type_enrichment_pipeline

Every time I run munge_sumstats.py on this file, I get the error 'ValueError: Usecols do not match names.', as detailed in the attached log:
SCZ.log
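
A quick sanity check on the preprocessed file may help narrow this down. pandas typically raises this error when a requested column name is not among the names it parsed from the header, so it is worth confirming that the header really contains the columns passed on the command line and that the data rows have the same number of fields as the header. The following is only a minimal sketch, not part of ldsc or CELLECT: the helper name `check_columns` is hypothetical, and it assumes the file is whitespace-delimited text.

```python
from __future__ import print_function  # keeps the sketch usable on Python 2.7 as well
import io


def check_columns(path, required, n_rows=1000):
    """Compare the header of a whitespace-delimited file with its data rows.

    Hypothetical helper for illustration. Returns a tuple of:
      header   - the column names found on the first line
      missing  - names from `required` that are absent from the header
      bad_rows - (line_number, field_count) for rows whose field count
                 differs from the header, checked over the first n_rows rows
    """
    with io.open(path, encoding="utf-8") as handle:
        # Split on any run of whitespace, so tabs and spaces both count as delimiters.
        header = handle.readline().split()
        missing = [name for name in required if name not in header]

        bad_rows = []
        # Data rows start on line 2 of the file.
        for line_no, line in enumerate(handle, start=2):
            if line_no > n_rows + 1:
                break
            n_fields = len(line.split())
            if n_fields != len(header):
                bad_rows.append((line_no, n_fields))

    return header, missing, bad_rows
```

For the command above, this could be called as `check_columns(path, ["SNP_RS", "P", "Freq.A1"])`, where `path` points to the `.only_rs.rs_col.header.txt` file. A non-empty `missing` or `bad_rows` would suggest the problem lies in the preprocessing steps rather than in munge_sumstats.py itself.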

aksarkar · 2024-08-09T18:57:25Z

@ElizabethWallace97 The ldsc packaged with CELLECT is forked from here.

You need to ask at https://github.com/pascaltimshel/ldsc/issues, since it involves code introduced in that fork.
