GOAL: A person types in a paragraph of a thesis, and the system returns links to supporting documents, with full citations.

1. (LOAD STEP) Use the Wikipedia API as a first example source.
2. (TRANSFORM STEP) a) Remove stop words. b) Join words that frequently appear next to each other so that their association is not lost.
3. (EXTRACT STEP) a) Use node-word2vec (a Node.js interface to word2vec, the word-embedding tool released by Google) to determine which information is most relevant and important. b) Use latent Dirichlet allocation (LDA) to find underlying topics and branch on them.
4. Crawl N levels deep to enlarge the corpus for analysis.
5. Repeat steps 1-3 with Stack Overflow as the source.
6. Enhance the results with data gathered from OpenCyc.
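
The transform step (removing stop words, then joining words that frequently appear next to each other) can be sketched in plain Node.js as follows. This is only an illustrative sketch, not the project's actual code: the stop-word list, the function names (`tokenize`, `removeStopWords`, `joinFrequentPairs`, `transform`), the `minCount` threshold, and the underscore used to join words are all assumptions made for the example.

```javascript
// Hypothetical, deliberately tiny stop-word list; a real list would be much longer.
const STOP_WORDS = new Set([
  'a', 'an', 'the', 'of', 'in', 'on', 'and', 'or',
  'is', 'are', 'to', 'for', 'with', 'as', 'by',
]);

// Lowercase the text and split it into alphanumeric word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// (a) Drop every token that appears in the stop-word list.
function removeStopWords(tokens) {
  return tokens.filter((t) => !STOP_WORDS.has(t));
}

// (b) Join adjacent word pairs that occur at least minCount times
// into a single token, so the pair is later treated as one unit.
function joinFrequentPairs(tokens, minCount = 2) {
  // Count how often each adjacent pair occurs.
  const counts = new Map();
  for (let i = 0; i + 1 < tokens.length; i++) {
    const key = tokens[i] + ' ' + tokens[i + 1];
    counts.set(key, (counts.get(key) || 0) + 1);
  }

  // Greedy left-to-right pass: merge a frequent pair, otherwise keep the word.
  const out = [];
  let i = 0;
  while (i < tokens.length) {
    const key = tokens[i] + ' ' + tokens[i + 1];
    if (i + 1 < tokens.length && (counts.get(key) || 0) >= minCount) {
      out.push(tokens[i] + '_' + tokens[i + 1]);
      i += 2;
    } else {
      out.push(tokens[i]);
      i += 1;
    }
  }
  return out;
}

// Full transform step: tokenize, remove stop words, join frequent pairs.
function transform(text, minCount = 2) {
  return joinFrequentPairs(removeStopWords(tokenize(text)), minCount);
}

console.log(transform('The neural network is a model. A neural network learns in the network.'));
```

In this sketch a pair is joined purely on its raw count. A production pipeline would more likely score pairs relative to how often each word occurs on its own, so that merely common words are not glued together.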

/* ssh into the Linux proxy server */

ssh <nyu_id>@access.cim.nyu.edu

/* ssh into the hosting server */

ssh linserv2

/* Clone the project to your working directory (for example, on the server) */

/* Change to the project directory (on linserv2 it is ~/public_html/know) */

cd know

/* Run the server-side code; this also hosts the web application */

node server

/* View the last 50 lines of the log file, refreshed periodically by watch */

watch tail -n 50 know-server.log

