Skip to content

Error when running angle_data_preparation_py.ipynb #46

@dsikar

Description

@dsikar

I am running the README.md steps on the Intel DevCloud. I generated full_under_200.txt both with the julia script get_proteins_under_200aa.jl and julia_get_proteins_under_200aa.ipynb for good measure. A diff says files are different but they look the same (tab separated values).
In the DevCloud environment, when I run angle_data_preparation_py.ipynb, I get an error when extracting data from text:

# Scan first n proteins
names = []
seqs = []
psis = []
phis = []
pssms = []
(...)

ValueError: could not convert string to float: '0.0\ (...)

Which can be suppresed by changing function parse_lines(raw) to:

# Helper functions to extract numeric data from text
def parse_lines(raw):
    # added tab \t to suppress previous error
    return np.array([[float(x) for x in line.split("\t") if x != ""] for line in raw])
(...)

That gets passed the first error, but then throws another one further down:

(...)
---> 10             outputs.append([phis[i][j], psis[i][j]])
     11             # break
     12         # print(i, "Added: ", len(seqs[i])-34,"total for now:  ", long)

IndexError: list index out of range

Which I suspect has someting to do with one of the previous outputs, and the features' "n. prots" not being the same:

# Ensure all features have same n. prots

print("Names: ", len(names))
print("Seqs: ", len(seqs))
print("PSSMs: ", len(pssms))
print("Phis: ", len(phis))
print("Psis: ", len(psis))

Names:  601
Seqs:  600
PSSMs:  600
Phis:  0
Psis:  0

Any suggestions on what could be wrong in parsing the full_under_200.txt file?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions