FileNotFoundError: [Errno 2] No such file or directory: './Data/ReichLabEigenstrat/Raw/meta.csv' #12

Open
mariamnawaz1 opened this issue Jul 27, 2023 · 1 comment


@mariamnawaz1

Hi,

I am combining results from hapsb_ind() output using pp_individual_roh(), and I don't want to add meta information. But pp_individual_roh() throws an error even when I set meta_info=False. It seems to be because the function reads meta_path before it checks meta_info, which is only checked at the merging step. Am I missing something?

Function:

def pp_individual_roh(iids, meta_path="./Data/ReichLabEigenstrat/Raw/meta.csv", 
                      base_folder="./Empirical/Eigenstrat/Reichall/", 
                      suffix='_roh_full.csv', save_path="", min_cm=[4,8,12], snp_cm=50, 
                      gap=0.5, min_len1=2.0, min_len2=4.0,
                      output=True, meta_info=True):
    """Post-process Individual ROH .csv files. Combines them into one summary ROH.csv, saved in save_path.
    Use Individuals iids, create paths and run the combining.
    iids: List of target Individuals
    base_folder: Folder where to find individual results .csvs
    min_cm: Minimum post-processed Length of ROH blocks. Array (to have multiple possible values)
    snp_cm: Minimum Number of SNPs per cM
    gap: Maximum length of gaps to merge
    output: Whether to plot output per Individual.
    meta_info: Whether to merge in Meta-Info from the original Meta File
    save_path: If given, save resulting dataframe there
    min_len1: Minimum Length of shorter block to merge [cM]
    min_len2: Maximum Length of longer block to merge [cM]"""
    
    ### Look up Individuals in meta_df and extract relevant sub-table
    df_full = pd.read_csv(meta_path)
    df_meta = df_full[df_full["iid"].isin(iids)]  # Extract only relevant Individuals
    
    print(f"Loaded {len(df_meta)} / {len(df_full)} Individuals from Meta")
    
    paths = give_iid_paths(df_meta["iid"], base_folder=base_folder, suffix=suffix)
    df1 = create_combined_ROH_df(paths, df_meta["iid"].values, df_meta['clst'].values, 
                                 min_cm=min_cm, snp_cm=snp_cm, gap=gap, 
                                 min_len1=min_len1, min_len2=min_len2, output=output)
    
    ### Merge results with Meta-Dataframe
    if meta_info:
        df1 = pd.merge(df1, df_meta, on="iid")
        
    if len(save_path) > 0:
        df1.to_csv(save_path, sep="\t", index=False)
        print(f"Saved to: {save_path}")
    
    return df1
@hringbauer
Owner

Ah yes, this function for post-processing is overly complicated as it strictly requires a meta table path.

"meta_info" just indicates whether the info from the meta file should be part of the output table - but it does not turn off the requirement to provide such a meta table.

As a quick fix, you can simply create a table with "iid" and "clst" columns and pass its path as meta_path - I know that this is not ideal.
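The quick fix above could look roughly like the sketch below. It assumes that "iid" and "clst" are the only columns the function reads from the meta table, as the pasted code suggests. The helper name write_placeholder_meta and the "unknown" cluster label are made up for illustration and are not part of the package.

```python
import pandas as pd


def write_placeholder_meta(iids, path, clst="unknown"):
    """Write a minimal comma-separated meta table with 'iid' and 'clst' columns.

    Hypothetical helper (not part of the package): every individual gets
    the same placeholder cluster label.
    """
    iids = list(iids)  # accept any iterable of individual IDs
    df = pd.DataFrame({"iid": iids, "clst": [clst] * len(iids)})
    # pd.read_csv(meta_path) in the pasted function uses the default
    # comma separator, so the file is written with the same default.
    df.to_csv(path, index=False)
    return df
```

The resulting file would then be passed along the lines of pp_individual_roh(iids, meta_path="./placeholder_meta.csv", meta_info=False, ...), where the file name is again only an example.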

I will try to update this function to make the meta table file fully optional. Stay tuned for the next release!
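Until such an update exists, one way the meta table could become optional is sketched below. This is only an assumption about a possible approach, not the maintainer's actual change. The helper load_meta_or_placeholder, the meta_path=None convention, and the placeholder label are all hypothetical.

```python
import pandas as pd


def load_meta_or_placeholder(iids, meta_path=None, clst="unknown"):
    """Return a table with 'iid' and 'clst' columns for the requested individuals.

    Hypothetical helper: if meta_path is given, read the file and keep only
    the requested individuals (as the pasted function does); otherwise build
    a placeholder table so that no meta file is needed.
    """
    iids = list(iids)
    if meta_path:
        df_full = pd.read_csv(meta_path)
        return df_full[df_full["iid"].isin(iids)]
    # No meta file given: fall back to a placeholder cluster label.
    return pd.DataFrame({"iid": iids, "clst": [clst] * len(iids)})
```

Inside pp_individual_roh(), the returned table could stand in for df_meta, with the merge step skipped whenever no meta_path was supplied.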
