forked from nebuly-ai/optimate
Add forward forward app (nebuly-ai#126)
* add forward forward app
* implement comments
* import nebullvm as a dependency in apps
* add readme
* Add figs
* add architecture description
* rename title
* Add links and change image format
* Updated readme and added image
* rename matrixmaster and modify readme

Co-authored-by: diegofiori <d.fiori@nebuly.ai>
Co-authored-by: Nebuly <83510798+nebuly-ai@users.noreply.github.com>
1 parent 8bbe607 · commit 5fb48f6
Showing 20 changed files with 1,685 additions and 14 deletions.
# Forward-Forward Algorithm App

This app implements a complete open-source version of [Geoffrey Hinton's Forward-Forward](https://www.cs.toronto.edu/~hinton/FFA13.pdf) algorithm, an alternative approach to backpropagation.

The Forward-Forward algorithm is a method for training deep neural networks that replaces the forward and backward passes of backpropagation with two forward passes: one with positive (i.e., real) data and the other with negative data that can be generated by the network itself.

Unlike backpropagation, forward-forward does not require calculating the gradient of the loss function with respect to the network parameters. Instead, each optimization step can be performed locally, and the weights of each layer can be updated immediately after the layer has performed its forward pass.
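
To make the local-update idea concrete, here is a minimal, self-contained sketch of a single forward-forward layer. It is illustrative only — the names `FFLayer` and `train_step` are not part of this app's API. Each layer measures the "goodness" of its activations (the sum of squared activities), pushes it above a threshold `theta` for positive data and below it for negative data, and updates its own weights without any gradient flowing to earlier layers:

```python
import torch
import torch.nn as nn


class FFLayer(nn.Module):
    """One fully connected layer trained locally with the forward-forward rule."""

    def __init__(self, in_features: int, out_features: int, lr: float = 0.03, theta: float = 2.0):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.relu = nn.ReLU()
        self.theta = theta  # goodness threshold
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        # Pass on only the direction of the activity vector, as suggested in the paper.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-4)
        return self.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        # Goodness = sum of squared activations; push it above theta for
        # positive data and below theta for negative data.
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)
        loss = torch.log1p(torch.exp(torch.cat([
            self.theta - g_pos,   # positive goodness should exceed theta
            g_neg - self.theta,   # negative goodness should stay below theta
        ]))).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        # Detach before handing activations to the next layer:
        # no gradient ever flows back through earlier layers.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()
```

Layers trained this way can be stacked: the detached outputs of one layer become the positive and negative inputs of the next, which is what allows each layer to be optimized as soon as its own forward pass is done.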

If you appreciate the project, show it by [leaving a star ⭐](https://github.com/nebuly-ai/nebullvm/stargazers)

<img width="1012" alt="Screenshot 2022-12-20 at 14 45 22" src="https://user-images.githubusercontent.com/83510798/208681462-2d8fc8f8-b24e-41a3-978a-72101f7f6392.png">

## Installation

The forward-forward app is built on top of nebullvm, a framework for efficiency-based apps. The app can easily be installed from source. First, clone the repository and navigate to the app directory:

```bash
git clone https://github.com/nebuly-ai/nebullvm.git
cd nebullvm/apps/accelerate/forward_forward
```

Then install the app:

```bash
pip install .
```

This process installs only the minimum requirements for running the app. If you want to run the app on a GPU, you have to install the CUDA version of PyTorch; you can find the instructions on the official PyTorch website.
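
For reference, the command typically looks like the following — the exact index URL and CUDA version here are an assumption, so always check the selector on pytorch.org for the command matching your setup:

```bash
# Assumed example for CUDA 11.8; verify the correct command on pytorch.org
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```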

## Usage

At the current stage, this implementation supports the main architectures discussed by Hinton in his paper. Each architecture can be trained with the following command:

```python
from forward_forward import train_with_forward_forward_algorithm


trained_model = train_with_forward_forward_algorithm(
    model_type="progressive",
    n_layers=3,
    hidden_size=2000,
    lr=0.03,
    device="cuda",
    epochs=100,
    batch_size=5000,
    theta=2.,
)
```

Three architectures are currently supported:

* `progressive`: the simplest architecture described in the paper. It has a pipeline-like structure and each layer can be trained independently of the following ones. Our implementation differs from the original one in that the labels are injected by concatenating them to the flattened image tensor instead of replacing the first `n_classes` pixel values with a one-hot representation of the label (a minimal sketch of this label-injection step is shown after this list).

* `recurrent`: the recurrent architecture described in the paper. It has a recurrent-like structure and is based on the `GLOM` architecture proposed by Hinton.

* `nlp`: a simple network which can be used as a language model.

The recurrent and nlp network architectures are explained in more detail below.
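
Before moving on, here is the label-injection sketch referenced in the `progressive` bullet above. The helper name `inject_label` is hypothetical, but the resulting tensor size matches the `28 * 28 + len(datasets.MNIST.classes)` input size used in `functions.py` below:

```python
import torch
import torch.nn.functional as F


def inject_label(images: torch.Tensor, labels: torch.Tensor, n_classes: int = 10):
    """Concatenate a one-hot label to each flattened MNIST image.

    Positive samples use the true label, negative samples a wrong one.
    """
    flat = images.view(images.shape[0], -1)                      # (batch, 28 * 28)
    one_hot = F.one_hot(labels, num_classes=n_classes).float()   # (batch, n_classes)
    return torch.cat([flat, one_hot], dim=1)                     # (batch, 28 * 28 + n_classes)
```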

## Recurrent Architecture

The recurrent architecture is based on the `GLOM` architecture for videos, proposed by Hinton in the paper [How to represent part-whole hierarchies in a neural network](https://arxiv.org/pdf/2102.12627.pdf). Its application to the forward-forward algorithm aims at enabling each layer to learn not just from the previous layer's output, but from the following layers as well. This is done by concatenating the output of the previous layer with the outputs of the following layers computed at the previous time-step. A learned representation of the label (positive or negative) is given as input to the last layer. The following figure shows the structure of the network:

<p align="center">
<img width="500" alt="recurrent_net" src="https://user-images.githubusercontent.com/38586138/208651417-498c4bd4-f2dc-4613-a376-0b69317c73d4.png">
</p>
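
The time-stepping scheme can be sketched as follows — a simplified, hypothetical illustration in which `recurrent_step` and its arguments are not names from this repository. At time `t`, layer `i` sees the activity of layer `i-1` and layer `i+1` from time `t-1`, with the image feeding the bottom layer and the label embedding feeding the top layer:

```python
import torch


def recurrent_step(layers, prev_activities, image, label_embedding):
    """One recurrent time-step: each layer reads its neighbours' previous outputs.

    layers: list of modules, each taking the concatenation of the activity
        from below and the activity from above as input.
    prev_activities: list holding layer i's output from the previous time-step.
    """
    new_activities = []
    for i, layer in enumerate(layers):
        below = image if i == 0 else prev_activities[i - 1]
        above = label_embedding if i == len(layers) - 1 else prev_activities[i + 1]
        new_activities.append(layer(torch.cat([below, above], dim=1)))
    return new_activities
```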

## NLP Architecture

The forward-forward architecture developed for NLP is a simple network which can be used as a language model. The network is composed of a few normalized fully connected layers, each followed by a ReLU activation. All hidden representations are then concatenated together and given as input to a softmax layer that predicts the next token. The network can be trained in a progressive way, i.e. each layer can be trained sequentially and separately from the following ones. The following figure shows the structure of the network:

<p align="center">
<img width="500" class="center" alt="nlp_net" src="https://user-images.githubusercontent.com/38586138/208651624-c159b230-f903-4e13-aaa7-b39a0d1c52fc.png">
</p>
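
A minimal sketch of such a network is shown below. It is illustrative only — the class name, the toy one-value-per-token embedding, and the use of `LayerNorm` as the normalization are assumptions, and the repository's `LMFFNet` may differ in its details — but it shows the key idea: every hidden representation contributes to the next-token prediction.

```python
import torch
import torch.nn as nn


class TinyFFLanguageModel(nn.Module):
    """Toy next-token predictor in the spirit of the NLP architecture above."""

    def __init__(self, seq_len=10, vocab_size=30, hidden_size=2000, n_layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 1)  # toy embedding: one value per token
        sizes = [seq_len] + [hidden_size] * n_layers
        self.layers = nn.ModuleList(
            nn.Sequential(
                nn.Linear(sizes[i], sizes[i + 1]),
                nn.LayerNorm(sizes[i + 1]),
                nn.ReLU(),
            )
            for i in range(n_layers)
        )
        # All hidden representations are concatenated before the softmax head.
        self.head = nn.Linear(hidden_size * n_layers, vocab_size)

    def forward(self, tokens):                   # tokens: (batch, seq_len) int ids
        x = self.embed(tokens).squeeze(-1)       # (batch, seq_len)
        hiddens = []
        for layer in self.layers:
            x = layer(x)
            hiddens.append(x)
        logits = self.head(torch.cat(hiddens, dim=1))
        return torch.softmax(logits, dim=1)      # distribution over the next token
```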

## What is missing

This app implements the main architectures presented by Hinton in his paper. However, some features are not implemented yet. In particular, the following are missing:

* [ ] Implementation of unsupervised training.
* [ ] Implementation of the `progressive` architecture using local receptive fields instead of fully connected layers.
* [ ] Training on CIFAR-10 for CV-based architectures.

And don't forget to [leave a star ⭐](https://github.com/nebuly-ai/nebullvm/stargazers) if you appreciate the project!
If you have any questions about the implementation, [open an issue](https://github.com/nebuly-ai/nebullvm/issues) or contact us in the [community chat](https://discord.gg/RbeQMu886J).
from forward_forward.api.functions import (  # noqa F401
    train_with_forward_forward_algorithm,
)
File renamed without changes.
apps/accelerate/forward_forward/forward_forward/api/functions.py — 52 changes: 52 additions & 0 deletions
from torchvision import datasets

from forward_forward.root_op import (
    ForwardForwardRootOp,
    ForwardForwardModelType,
)


def train_with_forward_forward_algorithm(
    n_layers: int = 2,
    model_type: str = "progressive",
    device: str = "cpu",
    hidden_size: int = 2000,
    lr: float = 0.03,
    epochs: int = 100,
    batch_size: int = 5000,
    theta: float = 2.0,
    shuffle: bool = True,
    **kwargs,
):
    """Train a model with the forward-forward algorithm and return it.

    The model type selects the architecture and its input/output sizes:
    MNIST images for the progressive and recurrent models, a small
    character vocabulary for the NLP model.
    """
    model_type = ForwardForwardModelType(model_type)
    root_op = ForwardForwardRootOp(model_type)

    output_size = None
    if model_type is ForwardForwardModelType.PROGRESSIVE:
        input_size = 28 * 28 + len(datasets.MNIST.classes)
    elif model_type is ForwardForwardModelType.RECURRENT:
        input_size = 28 * 28
        output_size = len(datasets.MNIST.classes)
    else:  # model_type is ForwardForwardModelType.NLP
        input_size = 10  # number of characters
        output_size = 30  # length of vocabulary
        assert (
            kwargs.get("predicted_tokens") is not None
        ), "predicted_tokens must be specified for NLP model"

    root_op.execute(
        input_size=input_size,
        n_layers=n_layers,
        hidden_size=hidden_size,
        optimizer_name="Adam",
        optimizer_params={"lr": lr},
        loss_fn_name="alternative_loss_fn",
        batch_size=batch_size,
        epochs=epochs,
        device=device,
        shuffle=shuffle,
        theta=theta,
        output_size=output_size,
    )

    return root_op.get_result()
from nebullvm.apps.base import App

from forward_forward.root_op import ForwardForwardRootOp


class ForwardForwardApp(App):
    def __init__(self):
        super().__init__()
        self.root_op = ForwardForwardRootOp()

    def execute(self, *args, **kwargs):
        return self.root_op.execute(*args, **kwargs)
Empty file.
apps/accelerate/forward_forward/forward_forward/operations/build_models.py — 114 changes: 114 additions & 0 deletions
from abc import ABC, abstractmethod

import torch

from nebullvm.operations.base import Operation

from forward_forward.utils.modules import (
    FCNetFFProgressive,
    RecurrentFCNetFF,
    LMFFNet,
)


class BaseModelBuildOperation(Operation, ABC):
    """Base operation that builds a forward-forward model and exposes it via get_result."""

    def __init__(self):
        super().__init__()
        self.model = None

    @abstractmethod
    def execute(
        self,
        input_size: int,
        n_layers: int,
        hidden_size: int,
        optimizer_name: str,
        optimizer_params: dict,
        loss_fn_name: str,
        output_size: int = None,
    ):
        raise NotImplementedError

    def get_result(self):
        return self.model


class FCNetFFProgressiveBuildOperation(BaseModelBuildOperation):
    """Builds the progressive fully connected forward-forward network."""

    def __init__(self):
        super().__init__()

    def execute(
        self,
        input_size: int,
        n_layers: int,
        hidden_size: int,
        optimizer_name: str,
        optimizer_params: dict,
        loss_fn_name: str,
        output_size: int = None,
    ):
        layer_sizes = [input_size] + [hidden_size] * n_layers
        model = FCNetFFProgressive(
            layer_sizes=layer_sizes,
            optimizer_name=optimizer_name,
            optimizer_kwargs=optimizer_params,
            loss_fn_name=loss_fn_name,
            epochs=-1,
        )
        if output_size is not None:
            # Optionally append a standard linear readout on top of the FF-trained stack.
            output_layer = torch.nn.Linear(layer_sizes[-1], output_size)
            model = torch.nn.Sequential(model, output_layer)

        self.model = model


class RecurrentFCNetFFBuildOperation(BaseModelBuildOperation):
    """Builds the recurrent (GLOM-like) forward-forward network."""

    def __init__(self):
        super().__init__()

    def execute(
        self,
        input_size: int,
        n_layers: int,
        hidden_size: int,
        optimizer_name: str,
        optimizer_params: dict,
        loss_fn_name: str,
        output_size: int = None,
    ):
        layer_sizes = [input_size] + [hidden_size] * n_layers + [output_size]
        model = RecurrentFCNetFF(
            layer_sizes=layer_sizes,
            optimizer_name=optimizer_name,
            optimizer_kwargs=optimizer_params,
            loss_fn_name=loss_fn_name,
        )
        self.model = model


class LMFFNetBuildOperation(BaseModelBuildOperation):
    """Builds the forward-forward language model."""

    def __init__(self):
        super().__init__()

    def execute(
        self,
        input_size: int,
        n_layers: int,
        hidden_size: int,
        optimizer_name: str,
        optimizer_params: dict,
        loss_fn_name: str,
        output_size: int = None,
    ):
        model = LMFFNet(
            token_num=output_size,
            hidden_size=hidden_size,
            n_layers=n_layers,
            seq_len=input_size,
            optimizer_name=optimizer_name,
            optimizer_kwargs=optimizer_params,
            loss_fn_name=loss_fn_name,
            epochs=-1,
            predicted_tokens=-1,
        )
        self.model = model