Multiple edges with same edgeid using VertexCentric indices after modifications to MULTI edge #301

pankajydv · 2017-06-03T15:19:36Z

The issue reproduces randomly on my production environment (on Titan), and reproduces on my dev environment pretty easily even with the ConsistencyModifier of the edge set to LOCK.

Following is the code, I used to reproduce the issue on my local environment on JanusGraph 0.1.1, Backed by Apache Cassandra 2.1.7

//Edge
label = mgmt.makeEdgeLabel(HIE_CHILD).multiplicity(Multiplicity.MULTI).make();
mgmt.setConsistency(label, ConsistencyModifier.LOCK);
		
//VertexCentric Index
mgmt.buildEdgeIndex(label, "HSCCERDecr", Direction.BOTH, Order.decr,
	mgmt.getPropertyKey(HOSTID_E),
	mgmt.getPropertyKey(STATUS_E),
	mgmt.getPropertyKey(CMSTYPE_E),
	mgmt.getPropertyKey(HRANK));

The problem is If I try to concurrently modify an edge with n threads, I get n copies of the same edge, with same edgeid and then gremlin queries start behaving very abruptly. Like if the direct children on some vertex are m, and if I fire a query which uses the VertexCentricIndex the number of children returned are m+n (n is the number of threads I used for concurrent modifications). I used the following code to reproduce the issue.

JanusGraphTransaction trxn = graph.newTransaction();
GraphTraversalSource g = trxn.traversal();
Edge edge = (Edge) g.V().has("msid", parent).outE("hie_child").as("e")
	.has("hostid_e", hostid)
	.inV().has("msid", child)
	.select("e").next();

Long updatedAt = random.nextLong();
			
edge.property("hrank", random.nextInt());
edge.property("updatedAt_e", updatedAt);
			
trxn.commit();

Following is the behavior, I observed through gremlin console:

Before the concurrent modification, number of edges with or without the VertexCentric index are same.

gremlin> g.V().has('msid', 1000000).outE('hie_child').as('e').inV().has('msid', 1000002).select('e').count()
==>1
gremlin> g.V().has('msid', 1000000).outE('hie_child').has('hostid_e', 83).as('e').inV().has('msid', 1000002).select('e').count()
==>1

After the concurrent modification with 10 threads. I see 10 edges between the parent/child with VertexCentric index and only 1 wihtout the index.

gremlin> g.V().has('msid', 1000000).outE('hie_child').has('hostid_e', 83).as('e').inV().has('msid', 1000002).select('e').count()
==>10
gremlin> g.V().has('msid', 1000000).outE('hie_child').as('e').inV().has('msid', 1000002).select('e').count()
==>1

If I try to drop only a single edge, it seems as if all the edges are dropped.

gremlin> g.V().has('msid', 1000000).outE('hie_child').has('hostid_e', 83).as('e').inV().has('msid', 1000002).select('e').range(0,1).drop()
gremlin> g.V().has('msid', 1000000).outE('hie_child').has('hostid_e', 83).as('e').inV().has('msid', 1000002).select('e').count()
==>0
gremlin> g.V().has('msid', 1000000).outE('hie_child').as('e').inV().has('msid', 1000002).select('e').count()
==>0

But n-1 edges reapper through the VertexCentric index after the transaction is committed, but without using the index there is no edge.

gremlin> g.tx().commit()
==>null
gremlin> g.V().has('msid', 1000000).outE('hie_child').has('hostid_e', 83).as('e').inV().has('msid', 1000002).select('e').count()
==>9
gremlin> g.V().has('msid', 1000000).outE('hie_child').as('e').inV().has('msid', 1000002).select('e').count()
==>0

Please note within the transaction after deleting the affected edge, it seems all the edges are deleted, but after transaction commit, n-1 edges re-appear using the VertexCentric index query, but there is no edge if that VertexCentric Index is not used.

pankajydv · 2017-06-06T08:58:12Z

Any updates here, we are suffering from this issue in our Live environment.

pankajydv · 2018-01-02T12:58:58Z

Any update, on this?

mbrukman · 2018-01-02T17:25:59Z

@pankajydv — could you please try to see if it's possible to reproduce the same issue with JanusGraph 0.2.0 and/or master branch as well? A number of changes have been made, including upgrades to dependencies.

pankajydv · 2018-01-03T09:37:40Z

@mbrukman - thanks for the update, let's test this on the latest JanusGraph codebase and we'll let you know the results.

rahul38888 · 2018-01-09T12:03:54Z

Same issue is reproducible on JanusGraph 0.2.0 with Apache Cassandra 3.0.9 as well. The properties corresponding to are following:

storage.backend=cassandra
storage.cassandra.replication-strategy-class=org.apache.cassandra.locator.NetworkTopologyStrategy
storage.cassandra.replication-strategy-options= dc1, 2
storage.hostname=xxx.xxx.xxx.xxx
storage.cassandra.keyspace=janustest
storage.cassandra.frame-size-mb=300
storage.read-only=false
query.force-index=true

pankajydv · 2018-01-10T05:00:11Z

@mbrukman - one of my colleague tried this code with JanusGraph 0.2 and Apache Cassandra 3.0.9 and we are still seeing this issue - request you to please prioritize this as this sometimes affect our live production system and is tedious to fix. In our observation what goes wrong is 2 or more copies get indexed in the VertexCentric index having same edge id (if you query in such a way that this VertexCentric index is not used then only 1 edge is found) and starts giving confusing results. This is even after we have set ConsistencyModifier to LOCK for this edge, and it's not possible to change the ConsistencyModifer of the VertexCentric index.
Please let us know if you need any more help/information in the form of logs etc from our side.

pankajydv · 2018-01-10T05:02:59Z

@rahul38888 please attach the code and config you used to reproduce this issue here in zip format.

rahul38888 · 2018-01-10T10:04:21Z

~~The code used to reproduce the issue is following.~~

This part has been removed because of the long text content. Same code can be found here

robertdale · 2018-01-10T16:16:39Z

First, this isn't a working example. It would be great if you had something where one could just do mvn test.
Next, it's not clear what the expected is vs. actual. Again, if you had a failed test case with mvn test, it would be more clear.
Finally, it looks like you're missing a consistency LOCK on edgeIndex "HSCCERDecr"

rahul38888 · 2018-01-11T11:56:40Z

We have created a working maven test to reproduce the issue. Please find the project attached with this.

Change the graph.properties in classpath accordingly.

testTemp.zip

pankajydv · 2018-01-11T12:16:06Z

@robertdale We have updated our code in Junit Testcase format, so now the issue is reproducible via mvn test. We tried setting consistency LOCK on edgeindex(VertexCentric index) "HSCCERDecr", but it seems this is not supported by Titan/JanusGraph, as we get the following exception when we try this:

gremlin> edge=mgmt.getRelationType('hie_child')
gremlin> index=mgmt.getRelationIndex(edge, 'HSCCERDecr')
gremlin> mgmt.setConsistency(index, ConsistencyModifier.LOCK)
Cannot change consistency of schema element: HSCCERDecr
Type ':help' or ':h' for help.
Display stack trace? [yN]y
java.lang.IllegalArgumentException: Cannot change consistency of schema element: HSCCERDecr
at org.janusgraph.graphdb.database.management.ManagementSystem.setConsistency(ManagementSystem.java:1115)

amcp · 2018-01-11T12:21:21Z

Isn’t consistency part immutable? I think you need to set the consistency when you create the index.

rahul38888 · 2018-01-11T15:48:16Z

Thanks @amcp . I tried to set consistency after building vertex-centric index, as following

RelationTypeIndex hie_childIndex = mgmt.buildEdgeIndex(label, "HSCCERDecr", Direction.BOTH, Order.decr,
				mgmt.getPropertyKey("hostid_e"),
				mgmt.getPropertyKey("status_e"),
				mgmt.getPropertyKey("cmstype_e"),
				mgmt.getPropertyKey("hrank"));
		
mgmt.setConsistency(hie_childIndex, ConsistencyModifier.LOCK);

This throws following exception:

java.lang.IllegalArgumentException: Cannot change consistency of schema element: HSCCERDecr
at org.janusgraph.graphdb.database.management.ManagementSystem.setConsistency(ManagementSystem.java:1115)

Also, there seems no implementation of JanusGraphManagement#buildEdgeIndex which would take consistency as an argument.

Its seems this is not allowed even when the index is being created.

Please guide me if I am missing something.

rahul38888 · 2018-01-18T07:14:26Z

The re-runnable code to reproduce the issue is attached below:

janusedgetest.zip

chupman · 2018-01-18T23:17:54Z

Hi @rahul38888 I'd like to help reproduce the issue. Would it be possible for you to put the code from your zip into a fresh github repository with the steps required to reproduce on a fresh system included in the readme?

pankajydv · 2018-01-19T10:32:15Z

@chupman I have uploaded this test-case to a fresh git repository. Please let us know if you need more information.

chupman · 2018-01-19T18:29:27Z

Thanks @pankajydv, I was able to recreate the issue with the repo. I was trying the same thing with the zip yesterday and it was failing before it got to the use case. I forked and uploaded the output xml since 2400 lines seemed a bit long for a Gist.

pankajydv · 2018-01-23T11:18:53Z

@chupman - are you still working on this. Can you please share, if there is an update.

pluradj · 2018-02-04T22:05:36Z

It doesn't look like the test code (JanusMultipleEdgesTest and JanusEdgeUpdator ) is using multi-threaded transactions as described in the JanusGraph docs.

pankajydv · 2018-02-05T04:52:05Z

@pluradj - Sorry didn't get your point. Our use case suits a different transaction for every thread, allowing every thread to work independently.

@mbrukman - Request you to please assign this issue to someone so as to expedite the solution to this problem.

pankajydv · 2018-02-12T09:10:50Z

We tried to figure out the root cause of this problem by looking at the code of tag v0.2.0 and we see that the code in
org.janusgraph.graphdb.database.StandardJanusGraph.prepareCommit(...) is not able to acquire edge LOCK through
org.janusgraph.graphdb.database.StandardJanusGraph.acquireLock(InternalRelation, int, boolean) for the following operations

//1) Collect deleted edges and their index updates and acquire edge locks
&
//2) Collect added edges and their index updates and acquire edge locks

We modified the code of the acquireLock method to return true even in case of 'MULTI' edges if the edge consistency level is set to LOCK and could resolve this issue. Can anyone please let's know if this can lead to any discrepancy. Please note we have already tested MULTIPLE edges creation between a pair of vertices and it works fine with this change.

pankajydv · 2018-02-13T11:56:30Z

Looking at the code we can conclude that an edge with a multiplicity of MULTI will never be able to acquire LOCK. Can someone please throw some light on why is an edge with multiplicity MULTI is not supposed to acquire lock and if it is then how can we make the edge modifications acquire the LOCK. Please note edge consistency is already set to LOCK.

pluradj · 2018-02-14T14:47:33Z

@pankajydv Do you have a public fork with your changes? It would be easier for understanding what you are proposing, and ultimately could lead to a pull request from you.

pankajydv · 2018-02-14T17:20:34Z

@pluradj Please refer to my public fork with the changes corresponding to the comment.

chupman · 2018-02-14T21:08:40Z

HI @pankajydv, I'm curious why you're locking the edge label instead of the property. Based off the docs it seems like it's not recommended to rely on locking this way with Cassandra. Would using the consistency modifier FORK rather than LOCK work for your use case?

pankajydv · 2018-02-16T09:43:18Z

@chupman

I have tried setting LOCK on edge as well as on all of the edge properties, but the duplicate edges with the same edge-id issue still persist.
The issue doesn't look like a Cassandra level inconsistency issue. I can say that because if I make the aquireLock method to return true - the issue get's resolved. Hence it looks like the issue is because the edge modification path is not acquiring LOCK.
with edge consistency set to FORK, I am still getting multiple edges between these vertices the only difference is this time they all have different ids, this is expected behavior as per the docs. What we need is only a single edge between a pair of vertices depending on the value of an edge property.

graphdb2 · 2018-08-14T12:24:06Z

Hello, any new on this ? is the only way to handle concurrent update on an edge is to get duplicate edges and filter them out later ?

cjxqhhh · 2020-06-19T01:39:40Z

Any updates on this? It's an important issue with data consistency I think.

pankajydv changed the title ~~Multiple edges with same edgeid returned by VertexCentric indices~~ Multiple edges with same edgeid returned by VertexCentric indices after concurrent modifications to edge Jun 3, 2017

amcp added the kind/question label Jun 8, 2017

mbrukman added the storage/cassandra label Jan 2, 2018

pankajydv added a commit to pankajydv/janusedgetest that referenced this issue Jan 19, 2018

Junit testcase to reproduce - JanusGraph/janusgraph#301

c308e1a

pankajydv changed the title ~~Multiple edges with same edgeid returned by VertexCentric indices after concurrent modifications to edge~~ Multiple edges with same edgeid using VertexCentric indices after concurrent modifications to MULTI edge Feb 13, 2018

pankajydv changed the title ~~Multiple edges with same edgeid using VertexCentric indices after concurrent modifications to MULTI edge~~ Multiple edges with same edgeid using VertexCentric indices after modifications to MULTI edge May 9, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multiple edges with same edgeid using VertexCentric indices after modifications to MULTI edge #301

Multiple edges with same edgeid using VertexCentric indices after modifications to MULTI edge #301

pankajydv commented Jun 3, 2017 •

edited

Loading

pankajydv commented Jun 6, 2017

pankajydv commented Jan 2, 2018

mbrukman commented Jan 2, 2018

pankajydv commented Jan 3, 2018

rahul38888 commented Jan 9, 2018

pankajydv commented Jan 10, 2018

pankajydv commented Jan 10, 2018

rahul38888 commented Jan 10, 2018 •

edited

Loading

robertdale commented Jan 10, 2018 •

edited

Loading

rahul38888 commented Jan 11, 2018

pankajydv commented Jan 11, 2018

amcp commented Jan 11, 2018

rahul38888 commented Jan 11, 2018

rahul38888 commented Jan 18, 2018 •

edited

Loading

chupman commented Jan 18, 2018

pankajydv commented Jan 19, 2018

chupman commented Jan 19, 2018

pankajydv commented Jan 23, 2018

pluradj commented Feb 4, 2018

pankajydv commented Feb 5, 2018

pankajydv commented Feb 12, 2018

pankajydv commented Feb 13, 2018

pluradj commented Feb 14, 2018

pankajydv commented Feb 14, 2018

chupman commented Feb 14, 2018

pankajydv commented Feb 16, 2018

graphdb2 commented Aug 14, 2018

cjxqhhh commented Jun 19, 2020

Multiple edges with same edgeid using VertexCentric indices after modifications to MULTI edge #301

Multiple edges with same edgeid using VertexCentric indices after modifications to MULTI edge #301

Comments

pankajydv commented Jun 3, 2017 • edited Loading

pankajydv commented Jun 6, 2017

pankajydv commented Jan 2, 2018

mbrukman commented Jan 2, 2018

pankajydv commented Jan 3, 2018

rahul38888 commented Jan 9, 2018

pankajydv commented Jan 10, 2018

pankajydv commented Jan 10, 2018

rahul38888 commented Jan 10, 2018 • edited Loading

robertdale commented Jan 10, 2018 • edited Loading

rahul38888 commented Jan 11, 2018

pankajydv commented Jan 11, 2018

amcp commented Jan 11, 2018

rahul38888 commented Jan 11, 2018

rahul38888 commented Jan 18, 2018 • edited Loading

chupman commented Jan 18, 2018

pankajydv commented Jan 19, 2018

chupman commented Jan 19, 2018

pankajydv commented Jan 23, 2018

pluradj commented Feb 4, 2018

pankajydv commented Feb 5, 2018

pankajydv commented Feb 12, 2018

pankajydv commented Feb 13, 2018

pluradj commented Feb 14, 2018

pankajydv commented Feb 14, 2018

chupman commented Feb 14, 2018

pankajydv commented Feb 16, 2018

graphdb2 commented Aug 14, 2018

cjxqhhh commented Jun 19, 2020

pankajydv commented Jun 3, 2017 •

edited

Loading

rahul38888 commented Jan 10, 2018 •

edited

Loading

robertdale commented Jan 10, 2018 •

edited

Loading

rahul38888 commented Jan 18, 2018 •

edited

Loading