Huge amount of data #36

Open
IonicaBizau opened this issue Jan 5, 2018 · 4 comments
Comments

@IonicaBizau

How would this work with a huge amount of data (e.g. thousands or millions of pairs) without freezing?

Nice project, btw!

@IonicaBizau
Author

For instance, my MacBook's CPU goes to 100% and the script gets stuck at the training step.

Here's my code:

var BrainJSClassifier = require('natural-brain');
var lorem = require('lorem-ipsum');

var classifier = new BrainJSClassifier();

// Generate 42 random words to use as category labels.
const word = () => lorem({ count: 1, units: 'words' });
const cats = new Array(42).fill(0).map(word);
// Pick one of the category labels at random.
const ran = () => cats[Math.floor(Math.random() * cats.length)];

console.log('Generating');
// Add 1000 random lorem-ipsum documents, each with a random label.
for (var i = 0; i < 1000; ++i) {
    classifier.addDocument(lorem({ count: 3 }), ran());
}

console.log('Training');
// This is the step where the CPU stays at 100%.
classifier.train();

console.log('Running');
console.log(classifier.classify('hi'));

@robertleeplummerjr

Let's work together to make this faster!

@daffl
Member

daffl commented Jan 5, 2018

Training with larger datasets can take a while. The lorem ipsum generator may also produce conflicting classifications (similar text assigned to different labels), in which case the Neural Network will run up to 10000 iterations trying to get the error rate as low as possible, and training may still fail to converge.
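To illustrate why conflicting labels make training run to the iteration cap, here is a toy sketch. It is not brain.js's actual trainer: `trainToy`, the single-parameter model, the learning rate, and the `errorThresh` value of 0.005 are all assumptions chosen for illustration. Only the 10000-iteration cap comes from the comment above.

```javascript
// Toy model (hypothetical): a single prediction p for one fixed input,
// trained by gradient descent on mean squared error.
function meanSquaredError(p, labels) {
  return labels.reduce((sum, y) => sum + (p - y) ** 2, 0) / labels.length;
}

function trainToy(labels, options = {}) {
  const {
    maxIterations = 10000, // cap mentioned in the comment above
    errorThresh = 0.005,   // assumed stopping threshold
    learningRate = 0.1,    // assumed step size
  } = options;

  let p = 0.5;
  let iterations = 0;
  let error = meanSquaredError(p, labels);

  // Stop as soon as the error is low enough, or when the cap is hit.
  while (iterations < maxIterations && error > errorThresh) {
    const grad =
      labels.reduce((sum, y) => sum + 2 * (p - y), 0) / labels.length;
    p -= learningRate * grad;
    error = meanSquaredError(p, labels);
    iterations++;
  }
  return { iterations, error };
}

// Same input labelled consistently: stops well before the cap.
console.log(trainToy([1, 1]));
// Same input labelled both 0 and 1: the error cannot drop below 0.25,
// so the loop only ends at maxIterations.
console.log(trainToy([0, 1]));
```

With conflicting labels no parameter value satisfies both examples, so the stopping condition on the error is never met and every iteration is spent.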

There are two options I can see for improving performance:

  1. Train in a separate process so it at least doesn't lock up the main Node process
  2. Store and load the trained Neural Network
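The first option could look roughly like the sketch below. It uses only Node's built-in `child_process` module. The "training" inside the child is a hypothetical stand-in (it just counts labels); in a real setup the child would build and train the classifier instead. `trainInChildProcess` and `workerSource` are names made up for this example.

```javascript
const { execFile } = require('child_process');

// Source of the child script. Hypothetical stand-in for the real training:
// it reads documents as JSON from stdin and writes a "model" to stdout.
const workerSource = `
  let input = '';
  process.stdin.setEncoding('utf8');
  process.stdin.on('data', (chunk) => { input += chunk; });
  process.stdin.on('end', () => {
    const documents = JSON.parse(input);
    const labelCounts = {};
    for (const doc of documents) {
      labelCounts[doc.label] = (labelCounts[doc.label] || 0) + 1;
    }
    process.stdout.write(JSON.stringify({ labelCounts }));
  });
`;

// Runs the expensive step in a second Node process so the parent's
// event loop stays free to handle other work while it waits.
function trainInChildProcess(documents) {
  return new Promise((resolve, reject) => {
    const child = execFile(
      process.execPath,
      ['-e', workerSource],
      (err, stdout) => {
        if (err) return reject(err);
        try {
          resolve(JSON.parse(stdout));
        } catch (parseErr) {
          reject(parseErr);
        }
      }
    );
    // Send the training data to the child and close its stdin.
    child.stdin.end(JSON.stringify(documents));
  });
}

trainInChildProcess([
  { text: 'first document', label: 'a' },
  { text: 'second document', label: 'b' },
]).then((model) => console.log('trained:', model));
```

On a single-core machine the child still competes with the parent for CPU time, so this keeps the main process responsive rather than making training itself faster.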

@IonicaBizau
Author

> Train in a separate process so it at least doesn't lock up the main Node process

Will that work on Heroku, assuming that there is just one CPU?
I guess the processes will share the same CPU. What would this example look like with multiple processes?

> Store and load the trained Neural Network

That sounds good.
Some kind of caching is needed anyway, because RAM is limited as well (e.g. 500MB).
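A minimal sketch of the store-and-load idea, assuming the trained model can be serialized to plain JSON. The library's own persistence API is not shown in this thread, so `loadOrTrain`, the cache path, and the `train` callback are hypothetical names used only for illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Returns the cached model if one exists on disk; otherwise runs the
// (expensive) train callback once and stores its result for next time.
function loadOrTrain(cachePath, train) {
  if (fs.existsSync(cachePath)) {
    return JSON.parse(fs.readFileSync(cachePath, 'utf8'));
  }
  const model = train();
  fs.writeFileSync(cachePath, JSON.stringify(model));
  return model;
}

// Usage: the second call reads the file and skips training entirely.
const cachePath = path.join(os.tmpdir(), 'trained-model-' + process.pid + '.json');
const expensiveTrain = () => ({ weights: [0.1, 0.2, 0.3] }); // stand-in
const first = loadOrTrain(cachePath, expensiveTrain);
const second = loadOrTrain(cachePath, expensiveTrain);
console.log(first, second);
fs.unlinkSync(cachePath); // clean up the demo file
```

This avoids retraining on every start, but the loaded model still has to fit in memory, so it does not by itself address the RAM limit.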


3 participants