Expand supported file formats

Parsers for additional file formats are needed, especially for CSV files that contain several trajectories.

| File format | Description | Priority |
|---|---|---|
| .xyz | file with several atoms and time points | 1 |
| .csv | CSV file with several trajectories | 2 |
| LAMMPS | molecular dynamics | 2 |
| .pdb | Protein Data Bank | 3 |

## .xyz file

```
Nparticles [integer]
comment [character]
X Y Z [repeat Nparticles]
[repeat Nframes] 
```
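A minimal sketch of a reader for the layout above, using only the standard library. The function name `read_xyz` is hypothetical, and taking the last three fields of each line as coordinates is an assumption, made so that files with a leading element symbol on each atom line are also accepted.

```python
import io


def read_xyz(handle):
    """Return a list of frames; each frame is a list of (x, y, z) tuples."""
    frames = []
    while True:
        header = handle.readline()
        if not header.strip():
            break  # end of file or trailing blank line
        n_particles = int(header)
        handle.readline()  # comment line, ignored in this sketch
        frame = []
        for _ in range(n_particles):
            fields = handle.readline().split()
            # Assumption: the last three fields are x, y, z.
            x, y, z = (float(value) for value in fields[-3:])
            frame.append((x, y, z))
        frames.append(frame)
    return frames


sample = io.StringIO(
    "2\nframe 0\n0.0 0.0 0.0\n1.0 0.0 0.0\n"
    "2\nframe 1\n0.1 0.0 0.0\n1.0 0.2 0.0\n"
)
trajectory = read_xyz(sample)
print(len(trajectory), "frames,", len(trajectory[0]), "particles per frame")
```

The trajectory of particle `i` is then `[frame[i] for frame in trajectory]`.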


## CSV with several trajectories - format definition

The CSV should contain five columns: time `t`, the three spatial components (`x`, `y`, `z`), and the trajectory identifier `id`.
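As an illustration, the rows could be grouped by `id` along these lines. This is a sketch under two assumptions that the format definition does not state: the file has a header row naming the columns `t,x,y,z,id`, and rows of different trajectories may be interleaved. The function name `read_trajectories` is hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_trajectories(handle):
    """Group rows by `id`; return {id: [(t, x, y, z), ...]} sorted by time."""
    trajectories = defaultdict(list)
    for row in csv.DictReader(handle):
        point = tuple(float(row[key]) for key in ("t", "x", "y", "z"))
        trajectories[row["id"]].append(point)
    for points in trajectories.values():
        points.sort(key=lambda point: point[0])  # order each trajectory by t
    return dict(trajectories)


sample = io.StringIO(
    "t,x,y,z,id\n"
    "0.0,0.0,0.0,0.0,a\n"
    "0.0,1.0,1.0,1.0,b\n"
    "1.0,0.5,0.0,0.0,a\n"
)
trajectories = read_trajectories(sample)
for traj_id, points in trajectories.items():
    print(traj_id, len(points), "points")
```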

## LAMMPS data file format
[Large-scale Atomic/Molecular Massively Parallel Simulator](https://www.lammps.org/) is a molecular dynamics program from Sandia National Laboratories.

More details about the file format: https://docs.lammps.org/read_data.html

LAMMPS can write its dump output in YAML, with one YAML document per timestep and the following structure:

```
---
creator: LAMMPS
timestep: 0
units: lj
time: 0
natoms: 3
boundary: [ p, p, p, p, p, p, ]
thermo:
  - keywords: [ Step, Temp, E_pair, E_mol, TotEng, Press, ]
  - data: [ 0, 0, -27093.472213010766, 0, 0, 0, ]
box:
  - [ 0, 16.795961913825074 ]
  - [ 0, 16.795961913825074 ]
  - [ 0, 16.795961913825074 ]
  - [ 0, 0, 0 ]
keywords: [ id, type, x, y, z, vx, vy, vz, ix, iy, iz,  ]
data:
  - [     1 , 1 ,  0.000000e+00 ,  0.000000e+00 ,  0.000000e+00 ,  -1.841579e-01 , -9.710036e-01 , -2.934617e+00 , 0 , 0 , 0, ]
  - [     2 , 1 ,  8.397981e-01 ,  8.397981e-01 ,  0.000000e+00 ,  -1.799591e+00 ,  2.127197e+00 ,  2.298572e+00 , 0 , 0 , 0, ]
  - [     3 , 1 ,  8.397981e-01 ,  0.000000e+00 ,  8.397981e-01 ,  -1.807682e+00 , -9.585130e-01 ,  1.605884e+00 , 0 , 0 , 0, ]
---
timestep: 100
...
---
```

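Parsing this format is straightforward with PyYAML's `yaml.load_all()` function, which yields one document per timestep. `yaml.safe_load_all()` is the safer variant for files from untrusted sources.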
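A sketch of such a parser, assuming PyYAML is installed and that every frame has `timestep`, `keywords`, and `data` entries as in the example above. The function name `read_lammps_yaml` is hypothetical, and the sample string is a shortened version of the structure shown, not real simulation output.

```python
import yaml  # PyYAML (third-party)


def read_lammps_yaml(stream):
    """Yield (timestep, {atom_id: (x, y, z)}) for each YAML document."""
    for frame in yaml.safe_load_all(stream):
        if not frame:
            continue  # skip empty documents, e.g. after a trailing '---'
        columns = frame["keywords"]
        i_id, i_x, i_y, i_z = (columns.index(key) for key in ("id", "x", "y", "z"))
        positions = {
            int(row[i_id]): (float(row[i_x]), float(row[i_y]), float(row[i_z]))
            for row in frame["data"]
        }
        yield frame["timestep"], positions


sample = """\
---
timestep: 0
keywords: [ id, type, x, y, z, ]
data:
  - [ 1 , 1 , 0.000000e+00 , 0.000000e+00 , 0.000000e+00, ]
  - [ 2 , 1 , 8.397981e-01 , 8.397981e-01 , 0.000000e+00, ]
---
timestep: 100
keywords: [ id, type, x, y, z, ]
data:
  - [ 1 , 1 , 1.000000e-01 , 0.000000e+00 , 0.000000e+00, ]
  - [ 2 , 1 , 8.397981e-01 , 9.000000e-01 , 0.000000e+00, ]
"""
frames = list(read_lammps_yaml(sample))
for timestep, positions in frames:
    print(timestep, positions)
```

Looking up the columns by name rather than by fixed position keeps the sketch working when the dump is written with a different set of per-atom keywords.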

## Protein Data Bank (PDB) format

The PDB format is the standard file format for protein structures. Each PDB file holds the coordinates of many atoms and can contain a snapshot of the system or several trajectories, so we need to process several PDB files at once to extract trajectories.

A possible workflow would be: 

1. Read each PDB file and extract the trajectories per atom
2. Write a CSV file using the format (`t`, `x`, `y`, `z`, `id`), where `id` is the atom identifier.
3. Use the CSV file to compute the features using trajpy
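
The first two steps of this workflow might look like the sketch below. It assumes each PDB file is one snapshot and uses the snapshot index as the time `t`, which the issue does not specify. The names `pdb_atoms`, `snapshots_to_csv`, and `make_atom_line` are hypothetical, and `make_atom_line` exists only to build sample data. The column slices follow the fixed-width `ATOM`/`HETATM` record layout of the PDB format.

```python
import csv
import io


def pdb_atoms(lines):
    """Yield (atom_id, x, y, z) from the ATOM/HETATM records of one snapshot."""
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            # Fixed-width columns: serial 7-11, x 31-38, y 39-46, z 47-54.
            yield (int(line[6:11]), float(line[30:38]),
                   float(line[38:46]), float(line[46:54]))


def snapshots_to_csv(snapshots, out):
    """Write rows (t, x, y, z, id); t is the snapshot index (assumption)."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["t", "x", "y", "z", "id"])
    for t, lines in enumerate(snapshots):
        for atom_id, x, y, z in pdb_atoms(lines):
            writer.writerow([t, x, y, z, atom_id])


def make_atom_line(serial, x, y, z):
    """Build a sample ATOM record (illustration only)."""
    return f"ATOM  {serial:5d}  CA  ALA A   1    {x:8.3f}{y:8.3f}{z:8.3f}"


snapshots = [
    [make_atom_line(1, 1.0, 2.0, 3.0), make_atom_line(2, 4.0, 5.0, 6.0)],
    [make_atom_line(1, 1.5, 2.0, 3.0), make_atom_line(2, 4.0, 5.5, 6.0)],
]
out = io.StringIO()
snapshots_to_csv(snapshots, out)
print(out.getvalue())
```

In practice each element of `snapshots` would be an open PDB file rather than a list of generated lines.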

More information about pdb file format: https://en.wikipedia.org/wiki/Protein_Data_Bank_(file_format)

