pdb2fasta

Convert PDB and mmCIF structure files to FASTA format.

Features

Parse PDB format files
Parse mmCIF format files
Auto-detect file format
Configurable output options
C++ library with Python bindings
Command-line interface

Installation

Python Package

pip install pdb2fasta

From Source

Requirements:

CMake >= 3.15
C++17 compiler
Python >= 3.10
pybind11

Build and install:

# Install in development mode (recommended for testing)
pip install -e .

# Or install normally
pip install .

# With test dependencies
pip install -e ".[test]"

Build the C++ extension:

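The Python package uses scikit-build-core to build the C++ extension automatically during installation, so a separate build step is normally unnecessary. To force a rebuild, use the command below. Note that --no-build-isolation makes pip use the build requirements already installed in the active environment, so scikit-build-core and pybind11 must be present there: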

# Clean and rebuild
pip install --no-build-isolation --force-reinstall -e .

C++ Library Only

mkdir build && cd build
cmake .. -DBUILD_CLI=ON -DBUILD_PYTHON=OFF
make
make install

Usage

Python

import pdb2fasta

# Convert a file
fasta = pdb2fasta.convert("structure.pdb")
print(fasta)

# Convert from string (the with-block closes the file after reading)
with open("structure.pdb") as f:
    pdb_content = f.read()
fasta = pdb2fasta.pdb_to_fasta(pdb_content)

# Convert mmCIF (the with-block closes the file after reading)
with open("structure.cif") as f:
    cif_content = f.read()
fasta = pdb2fasta.mmcif_to_fasta(cif_content)

# With options
fasta = pdb2fasta.pdb_to_fasta(
    pdb_content,
    line_width=60,
    include_chain_id=True
)

# Using the Converter class
options = pdb2fasta.ConversionOptions()
options.line_width = 80
converter = pdb2fasta.Converter(options)
fasta = converter.convert_file("structure.pdb")

# Parse and inspect structure
parser = pdb2fasta.PDBParser()
structure = parser.parse(pdb_content)
for chain in structure.chains:
    print(f"Chain {chain.id}: {len(chain.residues)} residues")
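
To make the chain and residue model above more concrete, here is a minimal standalone sketch of how per-chain sequences can be read from PDB ATOM records. It does not use pdb2fasta and is not its implementation: the function name chain_sequences, the THREE_TO_ONE table, and the choice to emit X for non-standard residues are all assumptions made for illustration. The column positions follow the fixed-width PDB ATOM record layout.

```python
# Illustrative sketch only; not how pdb2fasta is implemented.
# Maps the 20 standard three-letter residue names to one-letter codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_text: str) -> dict[str, str]:
    """Return {chain_id: one-letter sequence} built from ATOM records."""
    sequences: dict[str, list[str]] = {}
    seen: set[tuple[str, str]] = set()
    for line in pdb_text.splitlines():
        # Skip non-ATOM records and lines too short to hold the residue fields.
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        res_name = line[17:20].strip()  # columns 18-20: residue name
        chain_id = line[21]             # column 22: chain identifier
        res_key = line[22:27]           # columns 23-27: number + insertion code
        # Each residue spans many atoms; count it once per chain.
        if (chain_id, res_key) in seen:
            continue
        seen.add((chain_id, res_key))
        # Assumption: unknown residue names become "X".
        sequences.setdefault(chain_id, []).append(THREE_TO_ONE.get(res_name, "X"))
    return {chain: "".join(residues) for chain, residues in sequences.items()}
```

Calling chain_sequences(pdb_content) returns one sequence string per chain ID, in the order the chains first appear in the file.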

Command Line

# Basic usage
pdb2fasta-cli structure.pdb

# Multiple files
pdb2fasta-cli *.pdb *.cif

# With options
pdb2fasta-cli -w 60 -f mmcif structure.cif

# Options:
#   -h, --help          Show help message
#   -f, --format <fmt>  Force input format (pdb, mmcif, auto)
#   -w, --width <n>     Line width for FASTA output (default: 80)
#   -n, --no-chain      Don't include chain ID in header

C++

#include <pdb2fasta/pdb2fasta.hpp>
#include <iostream>
#include <string>

int main() {
    // Simple conversion
    std::string fasta = pdb2fasta::convert("structure.pdb");
    std::cout << fasta;
    
    // With options
    pdb2fasta::ConversionOptions options;
    options.line_width = 60;
    
    pdb2fasta::Converter converter(options);
    fasta = converter.convert_file("structure.cif");
    
    return 0;
}

Development

Running Tests

First, build and install the package:

pip install -e ".[test]"

Then run tests:

pytest
# or
uv run pytest

Troubleshooting

If you get ModuleNotFoundError: No module named '_pdb2fasta':

Make sure you've installed the package: pip install -e .
Check that the build completed successfully
Verify CMake and a C++ compiler are available
Try a clean rebuild: pip install --no-build-isolation --force-reinstall -e .
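
The checks above can also be scripted. The following is a small diagnostic sketch, not part of pdb2fasta: the helper names module_available and build_tools_report are hypothetical, and the list of compiler executables is a set of common defaults rather than anything the project specifies. The module name _pdb2fasta is taken from the error message; if the extension lives inside the package, a dotted name may be needed instead.

```python
import importlib.util
import shutil


def module_available(name: str) -> bool:
    """Return True if `name` can be located on the current import path."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A dotted name with a missing parent package raises instead of
        # returning None, so treat that as "not available".
        return False


def build_tools_report() -> dict[str, bool]:
    """Report which common build tools are found on PATH."""
    # Assumption: these executable names are typical, not project-defined.
    tools = ("cmake", "c++", "g++", "clang++", "cl")
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    print("_pdb2fasta importable:", module_available("_pdb2fasta"))
    for tool, found in build_tools_report().items():
        print(f"{tool}: {'found' if found else 'not found'}")
```

If the module is reported as not importable while cmake and a compiler are found, the clean rebuild in the last step is the likely next thing to try.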

Supported Formats

Input

PDB (.pdb, .ent)
mmCIF (.cif, .mmcif)
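
As a rough illustration of format auto-detection, the sketch below maps the extensions listed above to a format name and falls back to inspecting the content. This is an assumption about one reasonable approach, not a description of how pdb2fasta detects formats; guess_format and EXTENSION_FORMATS are hypothetical names. The content check relies on the fact that an mmCIF file opens with a data_ block header.

```python
from pathlib import Path

# Extensions as listed under Supported Formats.
EXTENSION_FORMATS = {
    ".pdb": "pdb", ".ent": "pdb",
    ".cif": "mmcif", ".mmcif": "mmcif",
}


def guess_format(filename: str, content: str = "") -> str:
    """Guess 'pdb' or 'mmcif' from the extension, then from the content."""
    suffix = Path(filename).suffix.lower()
    if suffix in EXTENSION_FORMATS:
        return EXTENSION_FORMATS[suffix]
    # Fallback: look at the first line that is neither blank nor a comment.
    for line in content.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # Assumption: anything that is not an mmCIF data block is PDB.
        return "mmcif" if stripped.startswith("data_") else "pdb"
    raise ValueError(f"cannot determine format of {filename!r}")
```

When detection is ambiguous, forcing the format explicitly (the -f option of the command-line tool) avoids relying on any heuristic.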

Output

FASTA format
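
For readers unfamiliar with FASTA, a record is a header line starting with > followed by the sequence wrapped at a fixed width. The sketch below shows how the line_width and include_chain_id options could shape a record. It is a standalone illustration, not pdb2fasta's code: format_fasta is a hypothetical helper, and the name_chain header layout is an assumption, since the exact header pdb2fasta writes is not documented here.

```python
def format_fasta(name: str, chain_id: str, sequence: str,
                 line_width: int = 80, include_chain_id: bool = True) -> str:
    """Format one chain as a FASTA record with wrapped sequence lines."""
    if line_width <= 0:
        raise ValueError("line_width must be positive")
    # Assumption: header layout chosen for illustration only.
    header = f">{name}_{chain_id}" if include_chain_id else f">{name}"
    # Split the sequence into chunks of at most line_width characters.
    lines = [sequence[i:i + line_width]
             for i in range(0, len(sequence), line_width)]
    return "\n".join([header, *lines]) + "\n"
```

For example, format_fasta("1abc", "A", "ACDEFGHIKL", line_width=4) yields a header line followed by the lines ACDE, FGHI, and KL.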

License

MIT License
