cbl commented May 13, 2025

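This pull request adds a from_path method to InitialEstimates. It loads the required estimate files from a directory and runs the necessary consistency checks, replacing the manual loading and checking code that callers previously had to write themselves.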

Before

import numpy as np
import pandas as pd

data_df = pd.read_csv("hodgkins_disease.csv")
surv_0_df = pd.read_csv("initial_estimates/surv_0.csv")
surv_1_df = pd.read_csv("initial_estimates/surv_1.csv")
cens_surv_0_df = pd.read_csv("initial_estimates/cens_surv_0.csv")
cens_surv_1_df = pd.read_csv("initial_estimates/cens_surv_1.csv")
haz1_0_df = pd.read_csv("initial_estimates/haz1_0.csv")
haz1_1_df = pd.read_csv("initial_estimates/haz1_1.csv")
haz2_0_df = pd.read_csv("initial_estimates/haz2_0.csv")
haz2_1_df = pd.read_csv("initial_estimates/haz2_1.csv")
prop_0_df = pd.read_csv("initial_estimates/prop_0.csv", header=None)
prop_1_df = pd.read_csv("initial_estimates/prop_1.csv", header=None)

# List of dataframes to check
dataframes = [surv_0_df, surv_1_df, cens_surv_0_df, cens_surv_1_df, haz1_0_df, haz1_1_df, haz2_0_df, haz2_1_df]

# Check if all dataframes have the same columns
all_col_equal = all(df.columns.equals(dataframes[0].columns) for df in dataframes)

dataframes += [data_df, prop_0_df, prop_1_df]
# Check if all dataframes have the same index
all_indices_equal = all(df.index.equals(dataframes[0].index) for df in dataframes)


print("All dataframes have the same index and columns:", all_indices_equal and all_col_equal)

initial_estimates = {
    0: InitialEstimates(times=surv_0_df.columns.astype(float),
                        g_star_obs=1 - data_df["chemo"].values,
                        propensity_scores=prop_0_df.values.squeeze(),
                        hazards=np.stack([haz1_0_df.values, haz2_0_df.values], axis=-1),
                        event_free_survival_function=surv_0_df.values,
                        censoring_survival_function=cens_surv_0_df.values),
    1: InitialEstimates(times=surv_1_df.columns.astype(float),
                        g_star_obs=data_df["chemo"].values,
                        propensity_scores=prop_1_df.values.squeeze(),
                        hazards=np.stack([haz1_1_df.values, haz2_1_df.values], axis=-1),
                        event_free_survival_function=surv_1_df.values,
                        censoring_survival_function=cens_surv_1_df.values),
}

After

data_df = pd.read_csv("hodgkins_disease.csv")
initial_estimates = InitialEstimates.from_path(data_df, 'initial_estimates')
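
To make the steps that from_path takes over more concrete, here is a minimal sketch of the loading and checking logic from the Before snippet, written as a standalone function. It is not the implementation in this pull request: the function name `load_estimate_arrays`, the `treatment_col` parameter, and the choice to return plain dictionaries of arrays instead of InitialEstimates objects are all assumptions made for illustration. Only the file names and the keyword names are taken from the Before snippet.

```python
from pathlib import Path

import numpy as np
import pandas as pd

# File stems taken from the "Before" snippet; everything else is hypothetical.
CURVE_FILES = ["surv", "cens_surv", "haz1", "haz2"]


def load_estimate_arrays(data_df, path, treatment_col="chemo"):
    """Hypothetical helper: load per-arm estimate arrays and check alignment."""
    path = Path(path)
    result = {}
    reference_columns = None  # time grid shared by all curve files

    for arm in (0, 1):
        curves = {
            stem: pd.read_csv(path / f"{stem}_{arm}.csv") for stem in CURVE_FILES
        }
        prop = pd.read_csv(path / f"prop_{arm}.csv", header=None)

        for stem, df in curves.items():
            if reference_columns is None:
                reference_columns = df.columns
            # All curve files must share the same time grid (columns)
            if not df.columns.equals(reference_columns):
                raise ValueError(f"{stem}_{arm}.csv has a different time grid")
            # ... and the same rows (index) as the data
            if not df.index.equals(data_df.index):
                raise ValueError(f"{stem}_{arm}.csv does not match the data index")
        if not prop.index.equals(data_df.index):
            raise ValueError(f"prop_{arm}.csv does not match the data index")

        treated = data_df[treatment_col].to_numpy()
        result[arm] = {
            "times": reference_columns.astype(float).to_numpy(),
            # Indicator of the observed treatment matching this arm
            "g_star_obs": treated if arm == 1 else 1 - treated,
            "propensity_scores": prop.to_numpy().squeeze(),
            # Stack the two cause-specific hazards along a trailing axis
            "hazards": np.stack(
                [curves["haz1"].to_numpy(), curves["haz2"].to_numpy()], axis=-1
            ),
            "event_free_survival_function": curves["surv"].to_numpy(),
            "censoring_survival_function": curves["cens_surv"].to_numpy(),
        }
    return result
```

Unlike the Before snippet, which only prints the result of the checks, this sketch raises a ValueError as soon as a file is misaligned. That is a design choice of the sketch; the behaviour of from_path itself may differ.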
