I am following the README.md steps on the Intel DevCloud. For good measure, I generated full_under_200.txt twice: once with the Julia script get_proteins_under_200aa.jl and once with the notebook julia_get_proteins_under_200aa.ipynb. A diff reports that the two files differ, but they look the same to me (tab-separated values).
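When two files look identical but a diff disagrees, the difference is often invisible on screen (line endings, trailing whitespace, tabs versus spaces). The following is a minimal sketch for locating such differences. The helper names `read_raw_lines` and `first_differences` are hypothetical and not part of the repository, and the second file name in the usage comment is a placeholder.

```python
from itertools import zip_longest


def read_raw_lines(path):
    """Read a file as bytes so tabs and line endings are preserved exactly."""
    with open(path, "rb") as handle:
        return handle.read().splitlines(keepends=True)


def first_differences(lines_a, lines_b, limit=5):
    """Return up to `limit` (line_number, a, b) tuples where the lines differ.

    A missing line (one file is shorter) is reported as None.
    """
    found = []
    for number, (a, b) in enumerate(zip_longest(lines_a, lines_b), start=1):
        if a != b:
            found.append((number, a, b))
            if len(found) >= limit:
                break
    return found


# Hypothetical usage; the second file name is a placeholder:
# a = read_raw_lines("full_under_200.txt")
# b = read_raw_lines("full_under_200_from_notebook.txt")
# for number, left, right in first_differences(a, b):
#     print(number, repr(left), repr(right))
```

Printing with `repr` makes characters such as `\t` and `\r` visible, which may help show whether the two generation routes write different delimiters or line endings.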
In the DevCloud environment, when I run angle_data_preparation_py.ipynb, I get an error when extracting data from text:
# Scan first n proteins
names = []
seqs = []
psis = []
phis = []
pssms = []
(...)
ValueError: could not convert string to float: '0.0\ (...)
This can be suppressed by changing the function parse_lines(raw) to:
# Helper functions to extract numeric data from text
def parse_lines(raw):
    # added tab \t to suppress previous error
    return np.array([[float(x) for x in line.split("\t") if x != ""] for line in raw])
(...)
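The error message above is truncated, so the exact delimiter in the file is not known here. As a sketch under the assumption that the values are separated by some mix of tabs and spaces, splitting on any whitespace avoids committing to a single delimiter. The name `parse_lines_tolerant` is hypothetical, and unlike the function above it returns plain lists and skips blank lines.

```python
def parse_lines_tolerant(raw):
    """Parse lines of whitespace-separated numbers into lists of floats.

    str.split() with no argument splits on any run of whitespace
    (spaces, tabs, carriage returns, newlines) and never yields
    empty strings, so no explicit filtering is needed.
    """
    rows = []
    for line in raw:
        fields = line.split()
        if fields:  # assumption: blank lines carry no data and can be skipped
            rows.append([float(x) for x in fields])
    return rows


# To match the notebook's return type, the result could be wrapped:
# import numpy as np
# array = np.array(parse_lines_tolerant(raw))
```

If a field is still not numeric, `float` raises a `ValueError` that names the offending string, which points at the line that does not fit the expected layout.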
That gets past the first error, but another one is thrown further down:
(...)
---> 10 outputs.append([phis[i][j], psis[i][j]])
11 # break
12 # print(i, "Added: ", len(seqs[i])-34,"total for now: ", long)
IndexError: list index out of range
I suspect this has something to do with one of the earlier outputs, where the features' "n. prots" are not the same:
# Ensure all features have same n. prots
print("Names: ", len(names))
print("Seqs: ", len(seqs))
print("PSSMs: ", len(pssms))
print("Phis: ", len(phis))
print("Psis: ", len(psis))
Names: 601
Seqs: 600
PSSMs: 600
Phis: 0
Psis: 0
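With Phis and Psis at 0, indexing `phis[i][j]` fails with the IndexError shown earlier. A minimal sketch of a guard that stops as soon as the counts disagree, rather than at the later indexing step, could look like this. The helper name `check_feature_counts` is hypothetical and not part of the notebook.

```python
def check_feature_counts(**features):
    """Return a {name: length} dict, or raise ValueError if lengths differ."""
    counts = {name: len(values) for name, values in features.items()}
    if len(set(counts.values())) > 1:
        raise ValueError(f"Feature counts differ: {counts}")
    return counts


# Hypothetical usage with the lists built by the notebook:
# check_feature_counts(names=names, seqs=seqs, pssms=pssms, phis=phis, psis=psis)
```

This does not explain why the lists are empty; it only makes the mismatch surface right after parsing.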
Any suggestions on what could be wrong in parsing the full_under_200.txt file?