Feb 7th - Working on Neo4j JS Driver #1118

lindajiawenli · 2023-02-07T21:01:54Z

Ran "npm install --save neo4j-driver", hence the changes to package-lock.json and package.json.
Added code to starting-neo4j.js, but it doesn't seem to do anything yet.

DO NOT MERGE YET: the code does not actually do anything.


lindajiawenli · 2023-02-13T22:24:51Z

addNode( params )

This function takes params of the form shown in src/neo4j/graph-data.js. If no node with the same grounding id exists yet (the grounding id is the database prefix + ":" + the database id, e.g. 'ncbigene:5597'), it creates one "Gene" node with the specified parameters.

If the node already exists, it does nothing.

Possible change: instead of doing nothing when the node already exists, append the factoidId (UUID) to the node's factoidId field (Neo4j node properties can be arrays).
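The "create it if missing, otherwise do nothing" behaviour maps naturally onto Cypher's MERGE with ON CREATE SET. The sketch below is illustrative only: the helper names buildAddNodeQuery, addNode and APPEND_FACTOID_ID_QUERY are hypothetical, and `session` is assumed to be a neo4j-driver session (e.g. from driver.session()). The second query shows one way the "possible change" of keeping every factoidId in an array could look.

```javascript
// Hypothetical helper: Cypher for "create the Gene node if no node with this
// grounding id exists yet; otherwise leave the existing node untouched".
function buildAddNodeQuery(params) {
  const query = [
    'MERGE (n:Gene {id: $id})',
    // ON CREATE SET only runs when MERGE had to create the node,
    // so an existing node keeps its properties ("do nothing").
    'ON CREATE SET n.factoidId = $factoidId, n.name = $name,',
    '              n.type = $type, n.dbId = $dbId, n.dbName = $dbName'
  ].join('\n');
  return { query, params };
}

// Possible change: collect every Biofactoid UUID in an array property.
// coalesce() covers nodes that do not have n.factoidIds yet; the CASE
// avoids appending the same UUID twice.
const APPEND_FACTOID_ID_QUERY = [
  'MERGE (n:Gene {id: $id})',
  'ON CREATE SET n.name = $name, n.factoidIds = [$factoidId]',
  'ON MATCH SET n.factoidIds = CASE',
  '  WHEN $factoidId IN n.factoidIds THEN n.factoidIds',
  '  ELSE coalesce(n.factoidIds, []) + $factoidId END'
].join('\n');

// Runs the query in a write transaction (assumes a neo4j-driver session).
async function addNode(session, params) {
  const { query } = buildAddNodeQuery(params);
  return session.executeWrite(tx => tx.run(query, params));
}
```

Because MERGE matches on `id` alone, calling addNode twice with the same grounding id leaves a single node.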

lindajiawenli · 2023-02-13T22:28:17Z

addEdge( params )

This function takes params of the form shown in src/neo4j/graph-data.js. If no edge with the same grounding id exists yet (currently the UUID of that specific interaction, NOT the document UUID), it creates one edge with the specified parameters.

If the edge already exists, it does nothing (highly unlikely to happen while the grounding id is a UUID).
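The edge case can use the same MERGE pattern, keyed on the interaction UUID. This is a sketch under assumptions: buildAddEdgeQuery and addEdge are hypothetical names, and the relationship type INTERACTION is a placeholder (Cypher cannot take a relationship type as a parameter, which is one reason to store 'phosphorylation' etc. in a `type` property). If either endpoint node is missing, the MATCH yields no rows and nothing is created.

```javascript
// Hypothetical helper: create the edge between two existing Gene nodes
// unless an edge with the same interaction UUID (id3) already links them.
function buildAddEdgeQuery(params) {
  const query = [
    'MATCH (a:Gene {id: $id1}), (b:Gene {id: $id2})',
    'MERGE (a)-[r:INTERACTION {id: $id3}]->(b)',
    'ON CREATE SET r.type = $type, r.doi = $doi, r.pmid = $pmid,',
    '              r.documentId = $documentId, r.title = $title'
  ].join('\n');
  return { query, params };
}

// Runs the query in a write transaction (assumes a neo4j-driver session).
async function addEdge(session, params) {
  const { query } = buildAddEdgeQuery(params);
  return session.executeWrite(tx => tx.run(query, params));
}
```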

maxkfranz · 2023-02-14T14:26:02Z

Nice start.

Let’s say I’m a person who is using the functions that you’re creating. How would I know what the params are? Would I have to read the query strings every time I want to use these functions? Could things be made simpler or more explicit upfront? What parameters are needed exactly in each case?

lindajiawenli · 2023-02-14T14:46:37Z

What parameters are needed exactly in each case?

For a node, that would be (in the form of an object with these name-value pairs):

id: 'element.association.dbPrefix' + ':' + 'element.association.id', (ex. 'ncbigene:5597')
factoidId: 'element.id', (ex. '598f8bef-f858-4dd0-b1c6-5168a8ae5349')
name: 'element.name', (ex. 'MAPK6')
type: 'element.type', (ex. 'protein')
dbId: 'element.association.id', (ex. '5597')
dbName: 'element.association.dbName' (ex. 'NCBI Gene')

For a relationship/edge, that would be (again in the form of an object):

id1: 'element.association.dbPrefix' + ':' + 'element.association.id', (ex. 'ncbigene:5597')
id2: 'element.association.dbPrefix' + ':' + 'element.association.id', (ex. 'ncbigene:207')
id3: 'element.id', (ex. '01ef22cc-2a8e-46d4-9060-6bf1c273869b'; NOT the document id, but the interaction id)
type: 'element.type', (ex. 'phosphorylation')
doi: 'value.citation.doi', (ex. '10.1126/sciadv.abi6439')
pmid: 'value.citation.pmid', (ex. '34767444')
documentId: 'value.id', (ex. 'a896d611-affe-4b45-a5e1-9bc560ffceab'; this is the document id)
title: 'value.citation.title' (ex. 'MAPK6-AKT signaling promotes tumor growth and resistance to mTOR kinase blockade')
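To make the parameter lists above explicit in code, they can be written as small mapping functions. This is a sketch that assumes the Biofactoid objects expose exactly the property paths listed (element.association.dbPrefix, value.citation.doi, and so on); the function names groundingId, nodeParams and edgeParams are hypothetical.

```javascript
// Hypothetical mappers from Biofactoid objects to the parameter objects listed
// above. They assume the property paths in this comment exist as written.
function groundingId(entity) {
  // e.g. { dbPrefix: 'ncbigene', id: '5597' } -> 'ncbigene:5597'
  return `${entity.association.dbPrefix}:${entity.association.id}`;
}

function nodeParams(element) {
  return {
    id: groundingId(element),
    factoidId: element.id,
    name: element.name,
    type: element.type,
    dbId: element.association.id,
    dbName: element.association.dbName
  };
}

// `source` and `target` are the two participant entities, `interaction` is the
// interaction element and `doc` is the document (`value` above).
function edgeParams(source, target, interaction, doc) {
  return {
    id1: groundingId(source),
    id2: groundingId(target),
    id3: interaction.id, // interaction UUID, not the document UUID
    type: interaction.type,
    doi: doc.citation.doi,
    pmid: doc.citation.pmid,
    documentId: doc.id,
    title: doc.citation.title
  };
}
```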

Ideally, all the fields will come from the Document API, so users won't have to create nodes or edges themselves: we'll get all the info from the Biofactoid forms they fill out and create them automatically (I think this is what we said in a meeting ~2 weeks ago? I could be wrong though). For now, all of these parameters are hard-coded.

Let me know what you guys think! @jvwong @maxkfranz

jvwong · 2023-02-14T16:05:10Z

src/neo4j/graph-data.js

+    {
+        id1: 'ncbigene:5597', id2: 'ncbigene:207', id3: '01ef22cc-2a8e-46d4-9060-6bf1c273869b',
+        type: 'phosphorylation', doi: '10.1126/sciadv.abi6439', pmid: '34767444',
+        documentId: 'a896d611-affe-4b45-a5e1-9bc560ffceab',


Beware: DOI and PMID are not guaranteed to exist.

Right, in those cases Neo4j would be fine receiving 'null' or an empty string (I haven't figured out what the Document API does in those cases yet, but we can work around it).

It would complicate the use case where a user wants to search for a pathway (rather than a gene), unless people search by title, I think.

Still useful to store, but you'll have to be careful of null cases.

Also note that articleTitle could also be null.
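One way to handle the null cases is to normalise missing or blank citation fields to an explicit null before they reach the query. This sketch assumes a `citation` object shaped like value.citation above; optionalField and citationParams are hypothetical names. It prefers null over an empty string because every $parameter used in a Cypher query must be supplied, and Neo4j does not store a property whose value is null, so `r.doi IS NULL` then reliably means "no DOI".

```javascript
// Hypothetical helper: map undefined, null and blank strings to null so that
// optional citation fields are always passed to Cypher explicitly.
function optionalField(value) {
  if (value === undefined || value === null) return null;
  const s = String(value).trim();
  return s === '' ? null : s;
}

// Builds the optional edge parameters from a (possibly incomplete) citation.
function citationParams(citation = {}) {
  return {
    doi: optionalField(citation.doi),
    pmid: optionalField(citation.pmid),
    articleTitle: optionalField(citation.title)
  };
}
```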

jvwong · 2023-02-14T16:06:00Z

src/neo4j/graph-data.js

+    // MAPK6 data
+    {
+        id: 'ncbigene:5597', factoidId: '598f8bef-f858-4dd0-b1c6-5168a8ae5349', name: 'MAPK6',
+        type: 'protein', dbId: '5597', dbName: 'NCBI Gene'


dbId is redundant, I suppose.

Same with dbName? I can get rid of both of them.

dbName should be the normalised namespace, like ncbi, if we store it at all. We could remove dbId and dbName completely if we're confident we're not going to do queries on dbName, e.g. give me all proteins that are grounded to ncbi rather than uniprot.

I got "NCBI Gene" from the JSON file back when I was getting the values straight from those files, so I think it's pretty consistent.

I think I will keep those two fields for now, but I can easily remove them once we're certain they're not necessary.

For consistency of syntax, in Factoid I'm going by:

NCBI Gene

dbPrefix: ncbigene

dbName: NCBI Gene

Local id pattern (regex): ^\d+$

Compact identifier: ncbigene:^\d+$

ChEBI

dbPrefix: CHEBI

dbName: ChEBI

Local id pattern (regex): ^CHEBI:\d+$

Compact identifier: CHEBI:^\d+$
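The conventions above can be encoded as a small lookup table that validates local ids and builds compact identifiers. This is a sketch only: NAMESPACES and toCompactId are hypothetical names, and only the two namespaces listed here are included. Note that for ChEBI the prefix is already part of the local id, so the local id is used unchanged.

```javascript
// Hypothetical table built from the two conventions listed above.
const NAMESPACES = {
  ncbigene: { dbName: 'NCBI Gene', localIdPattern: /^\d+$/ },
  CHEBI: { dbName: 'ChEBI', localIdPattern: /^CHEBI:\d+$/ }
};

// Returns the compact identifier for (dbPrefix, localId), or throws if the
// prefix is unknown or the local id does not match the namespace's pattern.
function toCompactId(dbPrefix, localId) {
  const ns = NAMESPACES[dbPrefix];
  if (!ns) throw new Error(`Unknown dbPrefix: ${dbPrefix}`);
  const id = String(localId);
  if (!ns.localIdPattern.test(id)) {
    throw new Error(`Local id ${id} does not match ${ns.localIdPattern}`);
  }
  // ChEBI local ids already carry the 'CHEBI:' prefix.
  return id.startsWith(`${dbPrefix}:`) ? id : `${dbPrefix}:${id}`;
}
```

For example, `toCompactId('ncbigene', 5597)` gives 'ncbigene:5597', and `toCompactId('CHEBI', 'CHEBI:15377')` gives 'CHEBI:15377'.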

maxkfranz · 2023-02-14T16:20:13Z

OK, so addNode() might look something like this:

function addNode(id, type, factoidId (?), dbName (?), dbId (?), name)

Node notes:

The factoidId would have to be stored as an array in Neo4j. There are potentially multiple Biofactoid IDs for each Neo4j node.
It might be simplest to forgo factoidId for now, making the signature function addNode(id, type, dbName, dbId, name). If we really want to store the Biofactoid node UUIDs somewhere, we could do it in the edges as something like sourceFactoidId and targetFactoidId. @jvwong
Future: The name may not be consistent across documents. A user might use name A for somedb:123 and another user might use name B for the same somedb:123. It might make sense to use the 'official' name when hooking things up to the Document API.
See above re. possible removal of dbName and dbId. Then we'd just have function addNode(id, type, name).

And addEdge() might look like:

function addEdge(id, type, factoidId, sourceId, targetId, doi, pmid, documentId, articleTitle)

Edge notes:

It might make sense to store a factoidId field in edges, even if it's redundant with id. In future, the DB may contain data that's not just from Biofactoid (i.e. id = someUUID, factoidId = null).
sourceId and targetId are dbname:123 strings rather than UUIDs.
id is the interaction UUID.
articleTitle is a bit redundant, since you could get that info elsewhere from the IDs. We shouldn't be doing text search in Neo4j. Maybe Jeff has a use case in mind for getting the article title quickly, e.g. a PC Apps-like query where you click on an edge and see the title of its associated article as a link? @jvwong

General notes:

Functions should have the parameters up front rather than in an object, unless you're going to extensively document the object's format in Javadoc/JSDoc-like comments. An options object for the parameters can also make sense in cases where lots of fields are optional, but that's not really the case here.
It's best if commonly-named parameters have similar ordering across the functions, like id and type.
IDs should be sanitised in the implementation. For instance, you could have a DB name of 'NCBI' or 'ncbi'. IDs should always be forced to lower case. Same with UUIDs.
In future, you'll have higher level functions like addDocument(document), addInteraction(interaction), and addEntity(entity). Those functions will use the Document API and your lower level functions that you're working on now.
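A minimal sanitiser following the note on IDs might look like this. It is an assumption, not the merged implementation: sanitizeId is a hypothetical name, and it simply trims and lower-cases grounding ids and UUIDs alike, rejecting anything that is not a non-empty string.

```javascript
// Hypothetical sanitiser: every id (grounding id or UUID) is trimmed and
// forced to lower case, so 'NCBIGene:5597' and 'ncbigene:5597' match.
function sanitizeId(id) {
  if (typeof id !== 'string' || id.trim() === '') {
    throw new Error(`Invalid id: ${id}`);
  }
  return id.trim().toLowerCase();
}
```

Applying it inside addNode and addEdge (rather than expecting callers to do it) keeps the guarantee in one place, which fits the suggestion that higher-level functions like addDocument() build on these lower-level ones.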

maxkfranz · 2023-02-14T16:28:27Z

In short, I'd suggest for now:

function addNode(id, type, name)
function addEdge(id, type, factoidId, sourceId, targetId, doi, pmid, factoidDocumentId, articleTitle)

jvwong · 2023-02-14T19:10:26Z

In short, I'd suggest for now:

function addNode(id, type, name)

For nodes, type could vary. For example, I could have a node for TP53 (dbPrefix: ncbigene; id: 7157) with type protein but also RNA (as well as other types), depending on what the user specified. So an even simpler approach is to ignore type altogether. This way, if I ask for TP53, you give me everything, regardless of type.

I don't think this is a problem with type chemical.
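Ignoring node type means a lookup only needs the grounding id. The following read query is a sketch: searchById and SEARCH_BY_ID_QUERY are hypothetical names, the Gene label follows the earlier comments, and `session` is assumed to be a neo4j-driver session. OPTIONAL MATCH keeps the node in the result even when it has no edges.

```javascript
// Hypothetical read query: look a node up by grounding id only, so TP53
// ('ncbigene:7157') comes back once whether users typed it as protein or RNA.
const SEARCH_BY_ID_QUERY = [
  'MATCH (n:Gene {id: $id})',
  'OPTIONAL MATCH (n)-[r]-(m)',
  'RETURN n, r, m'
].join('\n');

// Returns one row per neighbouring edge (edge and neighbour are null when
// the node has no edges). Assumes a neo4j-driver session.
async function searchById(session, id) {
  const result = await session.executeRead(tx => tx.run(SEARCH_BY_ID_QUERY, { id }));
  return result.records.map(record => ({
    node: record.get('n'),
    edge: record.get('r'),
    neighbour: record.get('m')
  }));
}
```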

maxkfranz · 2023-02-14T22:06:47Z

For nodes, type could vary...

This is another example of something that, if needed, may be better placed in interaction data (e.g. sourceType and targetType).

Let's ignore node type for now.

You could make the same argument for an edge's factoidId being redundant with factoidDocumentId: with one, you could find the other.

factoidDocumentId might be better as a more general datasourceId or xref. Then you could give it a value like factoid:some-doc-uuid for Biofactoid edges, and you could use a value like pc:some-other-id if the interaction came from PC (in future).

Then we'd have:

function addNode(id, name)
function addEdge(id, type, sourceId, targetId, xref, doi, pmid, articleTitle)

I'd put xref before the following parameters, since doi, pmid and articleTitle can each be null, whereas xref should always be defined. Best to put mandatory things first.
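Putting the agreed signatures together, the query builders might look like the sketch below. The names addNodeQuery, addEdgeQuery, normalizeId and factoidXref, the Gene label and the INTERACTION relationship type are all assumptions for illustration. Mandatory parameters come first, the nullable citation fields default to null, and ids are lower-cased as suggested above; each result can be run with session.executeWrite(tx => tx.run(query, params)).

```javascript
// Hypothetical id normaliser (trim + lower case), as agreed above.
function normalizeId(id) {
  return String(id).trim().toLowerCase();
}

// Hypothetical xref builder for Biofactoid edges, e.g. 'factoid:<document uuid>'.
function factoidXref(documentId) {
  return `factoid:${normalizeId(documentId)}`;
}

function addNodeQuery(id, name) {
  return {
    query: 'MERGE (n:Gene {id: $id}) ON CREATE SET n.name = $name',
    params: { id: normalizeId(id), name }
  };
}

// Mandatory parameters first; doi, pmid and articleTitle may be null.
function addEdgeQuery(id, type, sourceId, targetId, xref, doi = null, pmid = null, articleTitle = null) {
  if (!xref) throw new Error('xref is required, e.g. factoid:<document uuid>');
  return {
    query: [
      'MATCH (s:Gene {id: $sourceId}), (t:Gene {id: $targetId})',
      'MERGE (s)-[r:INTERACTION {id: $id}]->(t)',
      'ON CREATE SET r.type = $type, r.xref = $xref, r.doi = $doi,',
      '              r.pmid = $pmid, r.articleTitle = $articleTitle'
    ].join('\n'),
    params: {
      id: normalizeId(id), type,
      sourceId: normalizeId(sourceId), targetId: normalizeId(targetId),
      xref: normalizeId(xref), doi, pmid, articleTitle
    }
  };
}
```

A PC-derived edge would pass something like 'pc:some-other-id' as xref instead of factoidXref(...).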

lindajiawenli added 17 commits February 7, 2023 15:53

Feb 7th:

69ef230

Ran "npm install --save neo4j-driver" hence the changes to package-lock.json and package.json Added code to starting-neo4j.js but it doesn't seem to be doing anything as of yet

Trying a different connection to neo4j

efe8cbc

Max has helped me start testing/actually running my code

58394a4

Can now run code in test-temp-run.js

2a56322

Test now runs, generates helpful errors about cypher/neo4j

67a5011

helloWorld function can write to sandbox successfully

208228e

makeGeneNodeTest() is successful, but has exit issue

905452b

Got rid of helloWorld test. Pruned relationship properties

f6a8bf3

Co-authored-by: Jeffrey <jvwong@users.noreply.github.com>

396c2b9

Docker-compose working now

5811664

code now writes to docker instance

783585c

put document functions in a different file for organization

b6ba65b

put query strings and hard coded data in different files

c0a491f

makeGeneNodeTest now works

970a0fa

using "executeWrite" in tests instead of "run"

db670c6

2 node + relationship test has succeeded

39c28f4

read1() test for primary use case works

9378be3

Made addNode, addEdge. addNode works confirmed

c2e484e

jvwong reviewed Feb 14, 2023


addEdge and searchByGeneId work confirmed

3210615

searchByGeneId prints edge info as well as node info

d945a8e

lindajiawenli added 2 commits February 14, 2023 11:39

updated addNode: fixed parameters, deleted factoidId

4c266eb

updated addNode: deleted dbName and dbId from parameters

ed3ca2f

updated addEdge: fixed parameters

47a5f90

lindajiawenli added 2 commits February 14, 2023 14:28

update addNode: delete type parameter

bf35b32

sanitized ids to always be lower case

2f5478c

lindajiawenli added 5 commits February 15, 2023 10:58

driver functions added

c1dfb2d

updated addEdge: xref instead of factoidId/factoidDocumentId

8827681

integrated the new driver functions in code

96a5264

made first test (not yet run successfully)

4d36f40

more work on tests

c8f1766

lindajiawenli merged commit 980c29a into unstable Feb 16, 2023

jvwong deleted the neo4j-beginning branch August 30, 2023 15:15

4 participants