Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple edges with same edgeid using VertexCentric indices after modifications to MULTI edge #301

Open
pankajydv opened this issue Jun 3, 2017 · 28 comments

Comments

@pankajydv
Copy link
Contributor

pankajydv commented Jun 3, 2017

The issue reproduces randomly on my production environment (on Titan), and reproduces on my dev environment pretty easily even with the ConsistencyModifier of the edge set to LOCK.

Following is the code, I used to reproduce the issue on my local environment on JanusGraph 0.1.1, Backed by Apache Cassandra 2.1.7

//Edge
label = mgmt.makeEdgeLabel(HIE_CHILD).multiplicity(Multiplicity.MULTI).make();
mgmt.setConsistency(label, ConsistencyModifier.LOCK);
		
//VertexCentric Index
mgmt.buildEdgeIndex(label, "HSCCERDecr", Direction.BOTH, Order.decr,
	mgmt.getPropertyKey(HOSTID_E),
	mgmt.getPropertyKey(STATUS_E),
	mgmt.getPropertyKey(CMSTYPE_E),
	mgmt.getPropertyKey(HRANK));

The problem is If I try to concurrently modify an edge with n threads, I get n copies of the same edge, with same edgeid and then gremlin queries start behaving very abruptly. Like if the direct children on some vertex are m, and if I fire a query which uses the VertexCentricIndex the number of children returned are m+n (n is the number of threads I used for concurrent modifications). I used the following code to reproduce the issue.

JanusGraphTransaction trxn = graph.newTransaction();
GraphTraversalSource g = trxn.traversal();
Edge edge = (Edge) g.V().has("msid", parent).outE("hie_child").as("e")
	.has("hostid_e", hostid)
	.inV().has("msid", child)
	.select("e").next();

Long updatedAt = random.nextLong();
			
edge.property("hrank", random.nextInt());
edge.property("updatedAt_e", updatedAt);
			
trxn.commit();

Following is the behavior, I observed through gremlin console:

Before the concurrent modification, number of edges with or without the VertexCentric index are same.

gremlin> g.V().has('msid', 1000000).outE('hie_child').as('e').inV().has('msid', 1000002).select('e').count()
==>1
gremlin> g.V().has('msid', 1000000).outE('hie_child').has('hostid_e', 83).as('e').inV().has('msid', 1000002).select('e').count()
==>1

After the concurrent modification with 10 threads. I see 10 edges between the parent/child with VertexCentric index and only 1 wihtout the index.

gremlin> g.V().has('msid', 1000000).outE('hie_child').has('hostid_e', 83).as('e').inV().has('msid', 1000002).select('e').count()
==>10
gremlin> g.V().has('msid', 1000000).outE('hie_child').as('e').inV().has('msid', 1000002).select('e').count()
==>1

If I try to drop only a single edge, it seems as if all the edges are dropped.

gremlin> g.V().has('msid', 1000000).outE('hie_child').has('hostid_e', 83).as('e').inV().has('msid', 1000002).select('e').range(0,1).drop()
gremlin> g.V().has('msid', 1000000).outE('hie_child').has('hostid_e', 83).as('e').inV().has('msid', 1000002).select('e').count()
==>0
gremlin> g.V().has('msid', 1000000).outE('hie_child').as('e').inV().has('msid', 1000002).select('e').count()
==>0

But n-1 edges reapper through the VertexCentric index after the transaction is committed, but without using the index there is no edge.

gremlin> g.tx().commit()
==>null
gremlin> g.V().has('msid', 1000000).outE('hie_child').has('hostid_e', 83).as('e').inV().has('msid', 1000002).select('e').count()
==>9
gremlin> g.V().has('msid', 1000000).outE('hie_child').as('e').inV().has('msid', 1000002).select('e').count()
==>0

Please note within the transaction after deleting the affected edge, it seems all the edges are deleted, but after transaction commit, n-1 edges re-appear using the VertexCentric index query, but there is no edge if that VertexCentric Index is not used.

@pankajydv pankajydv changed the title Multiple edges with same edgeid returned by VertexCentric indices Multiple edges with same edgeid returned by VertexCentric indices after concurrent modifications to edge Jun 3, 2017
@pankajydv
Copy link
Contributor Author

Any updates here, we are suffering from this issue in our Live environment.

@pankajydv
Copy link
Contributor Author

Any update, on this?

@mbrukman
Copy link
Member

mbrukman commented Jan 2, 2018

@pankajydv — could you please try to see if it's possible to reproduce the same issue with JanusGraph 0.2.0 and/or master branch as well? A number of changes have been made, including upgrades to dependencies.

@pankajydv
Copy link
Contributor Author

@mbrukman - thanks for the update, let's test this on the latest JanusGraph codebase and we'll let you know the results.

@rahul38888
Copy link

Same issue is reproducible on JanusGraph 0.2.0 with Apache Cassandra 3.0.9 as well. The properties corresponding to are following:

storage.backend=cassandra
storage.cassandra.replication-strategy-class=org.apache.cassandra.locator.NetworkTopologyStrategy
storage.cassandra.replication-strategy-options= dc1, 2
storage.hostname=xxx.xxx.xxx.xxx
storage.cassandra.keyspace=janustest
storage.cassandra.frame-size-mb=300
storage.read-only=false
query.force-index=true

@pankajydv
Copy link
Contributor Author

@mbrukman - one of my colleague tried this code with JanusGraph 0.2 and Apache Cassandra 3.0.9 and we are still seeing this issue - request you to please prioritize this as this sometimes affect our live production system and is tedious to fix. In our observation what goes wrong is 2 or more copies get indexed in the VertexCentric index having same edge id (if you query in such a way that this VertexCentric index is not used then only 1 edge is found) and starts giving confusing results. This is even after we have set ConsistencyModifier to LOCK for this edge, and it's not possible to change the ConsistencyModifer of the VertexCentric index.
Please let us know if you need any more help/information in the form of logs etc from our side.

@pankajydv
Copy link
Contributor Author

@rahul38888 please attach the code and config you used to reproduce this issue here in zip format.

@rahul38888
Copy link

rahul38888 commented Jan 10, 2018

The code used to reproduce the issue is following.

This part has been removed because of the long text content. Same code can be found here

@robertdale
Copy link
Member

robertdale commented Jan 10, 2018

First, this isn't a working example. It would be great if you had something where one could just do mvn test.
Next, it's not clear what the expected is vs. actual. Again, if you had a failed test case with mvn test, it would be more clear.
Finally, it looks like you're missing a consistency LOCK on edgeIndex "HSCCERDecr"

@rahul38888
Copy link

We have created a working maven test to reproduce the issue. Please find the project attached with this.

Change the graph.properties in classpath accordingly.

testTemp.zip

@pankajydv
Copy link
Contributor Author

@robertdale We have updated our code in Junit Testcase format, so now the issue is reproducible via mvn test. We tried setting consistency LOCK on edgeindex(VertexCentric index) "HSCCERDecr", but it seems this is not supported by Titan/JanusGraph, as we get the following exception when we try this:

gremlin> edge=mgmt.getRelationType('hie_child')
gremlin> index=mgmt.getRelationIndex(edge, 'HSCCERDecr')
gremlin> mgmt.setConsistency(index, ConsistencyModifier.LOCK)
Cannot change consistency of schema element: HSCCERDecr
Type ':help' or ':h' for help.
Display stack trace? [yN]y
java.lang.IllegalArgumentException: Cannot change consistency of schema element: HSCCERDecr
at org.janusgraph.graphdb.database.management.ManagementSystem.setConsistency(ManagementSystem.java:1115)

@amcp
Copy link

amcp commented Jan 11, 2018

Isn’t consistency part immutable? I think you need to set the consistency when you create the index.

@rahul38888
Copy link

Thanks @amcp . I tried to set consistency after building vertex-centric index, as following

RelationTypeIndex hie_childIndex = mgmt.buildEdgeIndex(label, "HSCCERDecr", Direction.BOTH, Order.decr,
				mgmt.getPropertyKey("hostid_e"),
				mgmt.getPropertyKey("status_e"),
				mgmt.getPropertyKey("cmstype_e"),
				mgmt.getPropertyKey("hrank"));
		
mgmt.setConsistency(hie_childIndex, ConsistencyModifier.LOCK);

This throws following exception:

java.lang.IllegalArgumentException: Cannot change consistency of schema element: HSCCERDecr
at org.janusgraph.graphdb.database.management.ManagementSystem.setConsistency(ManagementSystem.java:1115)

Also, there seems no implementation of JanusGraphManagement#buildEdgeIndex which would take consistency as an argument.

Its seems this is not allowed even when the index is being created.

Please guide me if I am missing something.

@rahul38888
Copy link

rahul38888 commented Jan 18, 2018

The re-runnable code to reproduce the issue is attached below:

janusedgetest.zip

@chupman
Copy link
Member

chupman commented Jan 18, 2018

Hi @rahul38888 I'd like to help reproduce the issue. Would it be possible for you to put the code from your zip into a fresh github repository with the steps required to reproduce on a fresh system included in the readme?

pankajydv added a commit to pankajydv/janusedgetest that referenced this issue Jan 19, 2018
@pankajydv
Copy link
Contributor Author

@chupman I have uploaded this test-case to a fresh git repository. Please let us know if you need more information.

@chupman
Copy link
Member

chupman commented Jan 19, 2018

Thanks @pankajydv, I was able to recreate the issue with the repo. I was trying the same thing with the zip yesterday and it was failing before it got to the use case. I forked and uploaded the output xml since 2400 lines seemed a bit long for a Gist.

@pankajydv
Copy link
Contributor Author

@chupman - are you still working on this. Can you please share, if there is an update.

@pluradj
Copy link
Member

pluradj commented Feb 4, 2018

It doesn't look like the test code (JanusMultipleEdgesTest and JanusEdgeUpdator ) is using multi-threaded transactions as described in the JanusGraph docs.

@pankajydv
Copy link
Contributor Author

@pluradj - Sorry didn't get your point. Our use case suits a different transaction for every thread, allowing every thread to work independently.

@mbrukman - Request you to please assign this issue to someone so as to expedite the solution to this problem.

@pankajydv
Copy link
Contributor Author

We tried to figure out the root cause of this problem by looking at the code of tag v0.2.0 and we see that the code in
org.janusgraph.graphdb.database.StandardJanusGraph.prepareCommit(...) is not able to acquire edge LOCK through
org.janusgraph.graphdb.database.StandardJanusGraph.acquireLock(InternalRelation, int, boolean) for the following operations

//1) Collect deleted edges and their index updates and acquire edge locks
&
//2) Collect added edges and their index updates and acquire edge locks

We modified the code of the acquireLock method to return true even in case of 'MULTI' edges if the edge consistency level is set to LOCK and could resolve this issue. Can anyone please let's know if this can lead to any discrepancy. Please note we have already tested MULTIPLE edges creation between a pair of vertices and it works fine with this change.

@pankajydv pankajydv changed the title Multiple edges with same edgeid returned by VertexCentric indices after concurrent modifications to edge Multiple edges with same edgeid using VertexCentric indices after concurrent modifications to MULTI edge Feb 13, 2018
@pankajydv
Copy link
Contributor Author

Looking at the code we can conclude that an edge with a multiplicity of MULTI will never be able to acquire LOCK. Can someone please throw some light on why is an edge with multiplicity MULTI is not supposed to acquire lock and if it is then how can we make the edge modifications acquire the LOCK. Please note edge consistency is already set to LOCK.

@pluradj
Copy link
Member

pluradj commented Feb 14, 2018

@pankajydv Do you have a public fork with your changes? It would be easier for understanding what you are proposing, and ultimately could lead to a pull request from you.

@pankajydv
Copy link
Contributor Author

@pluradj Please refer to my public fork with the changes corresponding to the comment.

@chupman
Copy link
Member

chupman commented Feb 14, 2018

HI @pankajydv, I'm curious why you're locking the edge label instead of the property. Based off the docs it seems like it's not recommended to rely on locking this way with Cassandra. Would using the consistency modifier FORK rather than LOCK work for your use case?

@pankajydv
Copy link
Contributor Author

@chupman

  • I have tried setting LOCK on edge as well as on all of the edge properties, but the duplicate edges with the same edge-id issue still persist.

  • The issue doesn't look like a Cassandra level inconsistency issue. I can say that because if I make the aquireLock method to return true - the issue get's resolved. Hence it looks like the issue is because the edge modification path is not acquiring LOCK.

  • with edge consistency set to FORK, I am still getting multiple edges between these vertices the only difference is this time they all have different ids, this is expected behavior as per the docs. What we need is only a single edge between a pair of vertices depending on the value of an edge property.

@pankajydv pankajydv changed the title Multiple edges with same edgeid using VertexCentric indices after concurrent modifications to MULTI edge Multiple edges with same edgeid using VertexCentric indices after modifications to MULTI edge May 9, 2018
@graphdb2
Copy link

Hello, any new on this ? is the only way to handle concurrent update on an edge is to get duplicate edges and filter them out later ?

@cjxqhhh
Copy link

cjxqhhh commented Jun 19, 2020

Any updates on this? It's an important issue with data consistency I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants