|
| 1 | +Contribution guidelines |
| 2 | +=================== |
| 3 | + |
| 4 | +This document contains guidelines for contributing to NlpTools. |
| 5 | + |
| 6 | +Coding style |
| 7 | +------------------ |
| 8 | + |
| 9 | +NlpTools adheres to the [psr-2][1] standard. It also follows the convention of |
| 10 | +appending the word *Interface* to any interface. |
| 11 | + |
| 12 | +To enforce the psr-2 style it is suggested to use the [php-cs-fixer][2] tool. |
| 13 | +While you 're at it why not enforce some more styles as well. The fixers used |
| 14 | +are the **default** (which are more than the psr-2 level uses) but they will be |
| 15 | +explicitly listed here in case they change in the future. |
| 16 | + |
| 17 | +* indentation |
| 18 | +* linefeed |
| 19 | +* trailing_spaces |
| 20 | +* unused_use |
| 21 | +* phpdoc_params |
| 22 | +* visibility |
| 23 | +* return |
| 24 | +* short_tag |
| 25 | +* braces |
| 26 | +* include |
| 27 | +* php_closing_tag |
| 28 | +* extra_empty_lines |
| 29 | +* psr0 |
| 30 | +* control_spaces |
| 31 | +* elseif |
| 32 | +* eof_ending |
| 33 | + |
| 34 | +The above fixers are the default. |
| 35 | + |
| 36 | +Commenting Style |
| 37 | +-------------------------- |
| 38 | + |
| 39 | +Every public method must have comments that follow the php doc convention. |
| 40 | +@param and @return annotations are mandatory. The comments should be |
| 41 | +explanatory not simply rewriting the method's name in a sentence. If the method |
| 42 | +is too simple or the name explains the actions sufficiently then just add the |
| 43 | +@param and @return annotations. |
| 44 | + |
| 45 | +Examples of bad commenting currently in the develop branch: |
| 46 | + |
| 47 | +``` php |
| 48 | +/** |
| 49 | + * Calls internal functions to handle data processing |
| 50 | + * @param type $string |
| 51 | + */ |
| 52 | +public function tokenize($str) |
| 53 | +{ |
| 54 | + ...... |
| 55 | +} |
| 56 | +``` |
| 57 | + |
| 58 | +It should be something along the lines of: |
| 59 | + |
| 60 | +``` php |
| 61 | +/** |
| 62 | + * Splits $str to smaller strings according to Penn Treebank tokenization rules. |
| 63 | + * |
| 64 | + * You can see the regexes in function initPatternAndReplacement() |
| 65 | + * @param string $str The string to be tokenized |
| 66 | + * @return array An array of smaller strings (the tokens) |
| 67 | + */ |
| 68 | +.... |
| 69 | +``` |
| 70 | + |
| 71 | +Equally necessary are class comments. The class comment should be explaining |
| 72 | +what the class does from a high point of view. Redirections to online resources |
| 73 | +like wikipedia are welcome. A good example that also contains a reference to an |
| 74 | +external resource is the following: |
| 75 | + |
| 76 | +``` php |
| 77 | +/** |
| 78 | + * Implement a gradient descent algorithm that maximizes the conditional |
| 79 | + * log likelihood of the training data. |
| 80 | + * |
| 81 | + * See page 24 - 28 of http://nlp.stanford.edu/pubs/maxent-tutorial-slides.pdf |
| 82 | + * @see NlpTools\Models\Maxent |
| 83 | + */ |
| 84 | +class MaxentGradientDescent extends GradientDescentOptimizer implements MaxentOptimizerInterface |
| 85 | +``` |
| 86 | + |
| 87 | +Pull Requests |
| 88 | +-------------------- |
| 89 | + |
| 90 | +### Find something to work on ### |
| 91 | + |
| 92 | +If it is your first contribution try to find something straightforward and |
| 93 | +concise to implement without many design decisions as much as development |
| 94 | +decisions. You could first submit an issue, if you like, and state your will to |
| 95 | +correct this issue yourself. |
| 96 | + |
| 97 | +### Branch off ### |
| 98 | + |
| 99 | +When you 've found something to develop, create a new branch off of the develop |
| 100 | +branch. Make your changes, add your tests (see below for testing) and then make |
| 101 | +a pull request. Always keep your develop branch in sync with the remote and |
| 102 | +before you create a pull request **rebase** your local branch to develop. |
| 103 | + |
| 104 | +If you rebased but there has been a change pushed since, you don't have to |
| 105 | +remove the pull request, rebase and recreate it. I will pull your changes |
| 106 | +rebase them, merge them and then close the pull request. This will have the |
| 107 | +effect of showing some merged pull requests as simply closed but it is worth |
| 108 | +keeping the commit history clean. |
| 109 | + |
| 110 | +So in two small sentences: Always create a new branch to develop on. Always |
| 111 | +rebase before making a pull request. |
| 112 | + |
| 113 | +### Tests ### |
| 114 | + |
| 115 | +If you are implementing a new feature always include tests in your pull request. |
| 116 | + |
| 117 | +Also contributing just tests is extremely welcome. |
| 118 | + |
| 119 | +Testing |
| 120 | +----------- |
| 121 | + |
| 122 | +A bit of information can be found in the tests folder in the README file. |
| 123 | + |
| 124 | +Tests should test the implementation thoroughly. You can test your |
| 125 | +implementation like a black box, based only on the outputs given some inputs, |
| 126 | +or you can test every small part for how it works. Either is acceptable. I will |
| 127 | +make my point clear with an example. |
| 128 | + |
| 129 | +The PorterStemmer implementation has 5 steps and some even have sub steps. One |
| 130 | +way to write the test would be to expose those steps (maybe by extending the |
| 131 | +PorterStemmer class) and write tests for each one. One other way would be to |
| 132 | +simply take a big list of English words and their stems according to the |
| 133 | +canonical implementation and check if your code produces the same results. |
| 134 | + |
| 135 | +While the second is a lot easier to implement, in case of failure, it gives |
| 136 | +very little information regarding the cause of the error. Both are acceptable |
| 137 | +(in the case of the example the second is implemented). |
| 138 | + |
| 139 | +[1]: https://github.com/php-fig/fig-standards/blob/master/accepted/PSR-2-coding-style-guide.md |
| 140 | +[2]: http://cs.sensiolabs.org/ |
0 commit comments