This repository contains various utils to parse GitHub repositories into function definition and docstring pairs. It is based on tree-sitter to parse code into ASTs and apply heuristics to parse metadata in more details. Currently, it supports 6 languages: Python, Java, Go, Php, Ruby, and Javascript.
It also parses function calls and links them with their definitions for Python.
Input library keras-team/keras
is parsed into list of functions including various metadata (e.g. identifier, docstring, sha, url, etc.). Below is an example output of Activation
function from keras
library.
{
'nwo': 'keras-team/keras',
'sha': '0fc33feb5f4efe3bb823c57a8390f52932a966ab',
'path': 'keras/layers/core.py',
'language': 'python',
'identifier': 'Activation.__init__',
'parameters': '(self, activation, **kwargs)',
'argument_list': '',
'return_statement': '',
'docstring': '',
'function': 'def __init__(self, activation, **kwargs):\n super(Activation, self).__init__(**kwargs)\n self.supports_masking = True\n self.activation = activations.get(activation)',
'url': 'https://github.com/keras-team/keras/blob/0fc33feb5f4efe3bb823c57a8390f52932a966ab/keras/layers/core.py#L294-L297'
}
One example of Activation
in the call sites of eriklindernoren/Keras-GAN
repository is shown below:
{
'nwo': 'eriklindernoren/Keras-GAN',
'sha': '44d3320e84ca00071de8a5c0fb4566d10486bb1d',
'path': 'dcgan/dcgan.py',
'language': 'python',
'identifier': 'Activation',
'argument_list': '("relu")',
'url': 'https://github.com/eriklindernoren/Keras-GAN/blob/44d3320e84ca00071de8a5c0fb4566d10486bb1d/dcgan/dcgan.py#L61-L61'
}
With an edge linking the two urls
(
'https://github.com/eriklindernoren/Keras-GAN/blob/44d3320e84ca00071de8a5c0fb4566d10486bb1d/dcgan/dcgan.py#L61-L61',
'https://github.com/keras-team/keras/blob/0fc33feb5f4efe3bb823c57a8390f52932a966ab/keras/layers/core.py#L294-L297'
)
A demo notebook is also provided for exploration.
script/bootstrap
to build docker containerscript/server
to run the jupyter notebook server and navigate tofunction_parser/demo.ipynb
script/bootstrap
to build docker containerscript/setup
to download libraries.io datascript/console
to ssh into the container- Inside the container, run
python function_parser/process.py --language python --processes 16 '/src/function-parser/data/libraries-1.4.0-2018-12-22/' '/src/function-parser/data/'