Merge pull request #25 from opendilab/doc/practice
doc(hansbug): add practice pages
HansBug authored Jan 1, 2022
2 parents 38871e4 + f021ad5 commit f5bd4fc
Showing 6 changed files with 58 additions and 3 deletions.
2 changes: 0 additions & 2 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -60,8 +60,6 @@ gen

### Images template
# JPEG
*.jpg
*.jpeg
*.jpe
*.jif
*.jfif
35 changes: 35 additions & 0 deletions docs/source/best_practice/sklearn/index.rst
@@ -0,0 +1,35 @@
Apply to Scikit-Learn
===========================

``TreeValue`` can be used in practice not only with ``numpy`` or ``torch``, but also with other libraries such as ``scikit-learn``.
This section demonstrates how to apply PCA to tree-structured arrays.

In traditional machine learning, PCA (Principal Component Analysis) is often used to preprocess data:
it centers the data and projects it onto a smaller number of dimensions, which reduces the complexity
of the input and can improve the efficiency and quality of the learning process. The following image illustrates the idea:

.. figure:: heading_of_pca.jpg
    :alt: PCA Principle

    PCA in a nutshell. Source: Lavrenko and Sutton 2011, slide 13.
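
To make the idea concrete, here is a minimal sketch of PCA written with plain ``numpy``. It is an illustration under stated assumptions, not how scikit-learn is implemented internally: the helper name ``pca_project`` is hypothetical, and the SVD is used here as one common way to obtain the principal directions.

```python
import numpy as np


def pca_project(x, n_components):
    """Hypothetical helper: project the rows of x onto the top principal directions."""
    x = np.asarray(x, dtype=float)
    # PCA works on centered data, so subtract the per-feature mean first.
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Keep only the first n_components directions and project onto them.
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
points = rng.normal(size=(6, 3))   # 6 samples with 3 features each
reduced = pca_project(points, 2)   # keep the 2 strongest directions
print(reduced.shape)               # (6, 2)
```

The projected columns are uncorrelated and sorted by variance, which is why dropping the trailing ones loses the least information.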

The scikit-learn library provides the ``PCA`` class for this purpose, and its ``fit_transform`` method
fits the model and returns the transformed data in a single call. For ``np.ndarray`` data organized in a tree structure,
wrapping ``fit_transform`` with ``FastTreeValue.func()`` is enough to make it operate on the whole tree.
The code is as follows:

.. literalinclude:: sklearn.demo.py
    :language: python
    :linenos:

The output should look like the following (the input arrays are random, so the exact values vary between runs):

.. literalinclude:: sklearn.demo.py.txt
    :language: text
    :linenos:
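
As a rough mental model of what the wrapped function does, the sketch below applies the same per-leaf PCA to a plain nested ``dict``. This is only an analogy: ``map_leaves`` and ``leaf_pca`` are hypothetical helpers written for this illustration, plain dictionaries stand in for ``FastTreeValue``, and nothing here reflects how ``treevalue`` is implemented.

```python
import numpy as np
from sklearn.decomposition import PCA


def map_leaves(func, tree):
    """Hypothetical helper: apply func to every non-dict leaf of a nested dict."""
    if isinstance(tree, dict):
        return {key: map_leaves(func, value) for key, value in tree.items()}
    return func(tree)


def leaf_pca(x):
    # Same per-leaf operation as the demo: keep min(rows, columns) components.
    return PCA(min(x.shape)).fit_transform(x)


rng = np.random.default_rng(0)
tree = {
    'a': rng.integers(-5, 15, (4, 3)),
    'x': {'c': rng.integers(-15, 5, (5, 4))},
}
result = map_leaves(leaf_pca, tree)
# The result keeps the layout of the input tree, one transformed array per leaf.
print(result['a'].shape, result['x']['c'].shape)  # (4, 3) (5, 4)
```

The wrapped ``fit_transform`` in the demo plays the role of ``map_leaves`` here, so the calling code does not have to write the recursion by hand.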

For further information, see the links below:

* `Official documentation of PCA in scikit-learn <https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html?highlight=pca#sklearn.decomposition.PCA>`_
* `Details of PCA <https://devopedia.org/principal-component-analysis>`_

20 changes: 20 additions & 0 deletions docs/source/best_practice/sklearn/sklearn.demo.py
@@ -0,0 +1,20 @@
import numpy as np
from sklearn.decomposition import PCA

from treevalue import FastTreeValue

fit_transform = FastTreeValue.func()(lambda x: PCA(min(*x.shape)).fit_transform(x))

if __name__ == '__main__':
    data = FastTreeValue({
        'a': np.random.randint(-5, 15, (4, 3)),
        'x': {
            'c': np.random.randint(-15, 5, (5, 4)),
        }
    })
    print("Original int data:")
    print(data)

    pdata = fit_transform(data)
    print("Fit transformed data:")
    print(pdata)
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -26,6 +26,7 @@ structure processing when the calculation is tree-based.
:caption: Best Practice

best_practice/numpy/index
best_practice/sklearn/index

.. toctree::
:maxdepth: 2
3 changes: 2 additions & 1 deletion requirements-doc.txt
@@ -7,4 +7,5 @@ packaging
sphinx-multiversion~=0.2.4
where~=1.0.2
numpy>=1.19,<2
easydict>=1.7,<2
easydict>=1.7,<2
scikit-learn>=0.24.2
