Skip to content

Report dead entry points #187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 42 commits into from

Conversation

yiming-tang-cs
Copy link
Contributor

No description provided.

@yiming-tang-cs yiming-tang-cs requested a review from khatchad as a code owner March 8, 2018 20:57
@yiming-tang-cs
Copy link
Contributor Author

I did not change the set of entry points because I think the entry points should be consistent with the entry points which are used to create call graph. Hence, I just report the dead entry points for now.

@yiming-tang-cs
Copy link
Contributor Author

When the tool evaluates a project, it generates a set of dead entry points and prints those entry points into a txt file named dead_entry_points.txt.

I've added 3 methods:

  1. pruneEntryPoints(): the main methods to report dead entry points
  2. isReachable: given an entry point, check whether there is a stream creation node which is reachable from the entry point
  3. reportDeadEntryPoints(): print all entry points into txt file.

@yiming-tang-cs
Copy link
Contributor Author

yiming-tang-cs commented Mar 9, 2018

I've added two new dead entry points in to streamql and tested streamql.

The dead_entry_points.txt:

deephacks.streamql.LocalDateTimeTest.<clinit>()V
deephacks.streamql.LocalDateTimeTest.<init>()V
deephacks.streamql.DurationTest.<init>()V
deephacks.streamql.PersonTest.<clinit>()V
deephacks.streamql.PeriodTest.<init>()V
deephacks.streamql.ZonedDateTimeTest.<clinit>()V
deephacks.streamql.ExceptionalTest.<init>()V
deephacks.streamql.EnumTest.<init>()V
deephacks.streamql.UriTest.<clinit>()V
deephacks.streamql.NumbersTest.<init>()V
deephacks.streamql.UriTest.<init>()V
deephacks.streamql.UuidTest.<init>()V
deephacks.streamql.FileTest.<clinit>()V
deephacks.streamql.EnumTest.<clinit>()V
deephacks.streamql.StringTest.<init>()V
deephacks.streamql.NullTest.<init>()V
deephacks.streamql.NumbersTest.<clinit>()V
deephacks.streamql.ComparatorsTest.<init>()V
deephacks.streamql.ComparatorsTest.ints()V
deephacks.streamql.PersonTest.<init>()V
deephacks.streamql.UuidTest.<clinit>()V
deephacks.streamql.ComparatorsTest.longs()V
deephacks.streamql.DurationTest.<clinit>()V
deephacks.streamql.FileTest.<init>()V
deephacks.streamql.BooleanTest.<init>()V
deephacks.streamql.ZoneOffsetTest.<init>()V
deephacks.streamql.PeriodTest.<clinit>()V
deephacks.streamql.ZoneOffsetTest.<clinit>()V
deephacks.streamql.ZonedDateTimeTest.<init>()V
deephacks.streamql.StringTest.<clinit>()V
deephacks.streamql.BooleanTest.<clinit>()V

deephacks.streamql.ComparatorsTest.ints()V and deephacks.streamql.ComparatorsTest.longs()V are two dead entry points that I added. Others are from constructors.

@yiming-tang-cs
Copy link
Contributor Author

Project:

class A {

	void m() {
		HashSet h1 = new HashSet();
		h1.stream().count();
	}
	
	@EntryPoint
	void n() {
		m();
	}
	
	@EntryPoint
	void mm() {}
}

Result:

q.A.<init>()V
q.A.mm()V

Project:

class A {

	void m() {
		HashSet h1 = new HashSet();
		h1.stream().count();
	}
	
	@EntryPoint
	void n() {
		m();
	}
	
	@EntryPoint
	void mm() {n();}
}

Result:

q.A.<init>()V

This test case is interesting:
Project:

class A {
	@EntryPoint
	void m() {
		HashSet h1 = new HashSet();
		h1.stream().count();
	}
	
	@EntryPoint
	void n() {
		m();
	}
	
	@EntryPoint
	void mm() {m();}
}

Result:

q.A.mm()V
q.A.n()V
q.A.<init>()V

@khatchad
Copy link
Member

khatchad commented Mar 9, 2018

How are the results correct for your last example? There should be no dead entry points there. They all lead to methods instantiating streams.

@khatchad
Copy link
Member

khatchad commented Mar 9, 2018

Also, ctors and static initializers are only dead iff all non-ctor and non-static initializers in the associated class are dead.

@yiming-tang-cs
Copy link
Contributor Author

How are the results correct for your last example? There should be no dead entry points there. They all lead to methods instantiating streams.

It seems that the current strategy only detects one path from the any entry point to a specific stream creation node and ignore other paths. I do not think the logic of my code is wrong for now. I need to check whether my work is wrong or this issue is from difference of call graph. I will check it.later.

@khatchad
Copy link
Member

khatchad commented Mar 9, 2018 via email

@yiming-tang-cs
Copy link
Contributor Author

How are the results correct for your last example? There should be no dead entry points there. They all lead to methods instantiating streams.

The results are very strange here.
For this example:

class A {

	void m() {
		HashSet h1 = new HashSet();
		h1.stream().count();
	}
	
	@EntryPoint
	void n() {
		m();
	}
	
	@EntryPoint
	void mm() {m();}
}

, the result are q.A.mm()V or q.A.n()V. I wonder why the code just pick one path from any entry point to a specific entry point and ignore other paths.

@yiming-tang-cs
Copy link
Contributor Author

It would be good to add such test cases to the refactoring test suite, even if the assertions are not in place. At least we can see the output.

You have mentioned to test reporting entry points instead of refactoring. I have looked at helper() method. It calls analyzer.analyze(). However, this method contains both reporting dead entry points and refactoring. How can I separate them?

Or I just need to remove the parameters of helper()?

And the output file will be overwritten again and again. Should I rename them to avoid overwriting.

@yiming-tang-cs
Copy link
Contributor Author

How are the results correct for your last example? There should be no dead entry points there. They all lead to methods instantiating streams.

I guess the reason of it may be that one CGNode only binds one method in its call string context.
For the example below:

class A {

	void m() {
		HashSet h1 = new HashSet();
		h1.stream().count();
	}
	
	@EntryPoint
	void n() {
		m();
	}
	
	@EntryPoint
	void mm() {m();}
}

, the CGNode of m() should have two call string contexts. Actually, it only has one. Sometime, it has CallStringContext: [ q.A.mm()V@1 ] and sometime it has CallStringContext: [ q.A.n()V@1 ].

Because of this, I cannot get all entry points for a specific stream creation node and I can just get at most one entry point for a specific stream creation node.

@ponder-lab ponder-lab deleted a comment from yiming-tang-cs Mar 12, 2018
@khatchad
Copy link
Member

#187 (comment) should be another issue as well.

@yiming-tang-cs
Copy link
Contributor Author

yiming-tang-cs commented Mar 12, 2018

I've changed the code snippet of getting a set of stream creation nodes. Then, the results are right.
The test case:

class A {

	void m() {
		HashSet h1 = new HashSet();
		h1.stream().count();
	}
	
	@EntryPoint
	void n() {
		m();
	}
	
	@EntryPoint
	void mm() {m();}
}

, I get nothing in the txt file.

When the test case is:

class A {

	@EntryPoint
	void m() {
		HashSet h1 = new HashSet();
		h1.stream().count();
	}
	
	@EntryPoint
	void n() {
		m();
	}
	
	@EntryPoint
	void mm() {m();}
}

, I also get 0 dead entry points.

When the test is

class A {

	@EntryPoint
	void m() {
		HashSet h1 = new HashSet();
		h1.stream().count();
	}
	
	@EntryPoint
	void n() {
		m();
	}
	
	@EntryPoint
	void mm() {}
}

, the output is

q.A.mm()V

@yiming-tang-cs
Copy link
Contributor Author

yiming-tang-cs commented Mar 12, 2018

Each node represents a method IMethod in a context.
http://wala.sourceforge.net/wiki/index.php/UserGuide:CallGraph

I did not find any official words to describe edges. I think each edge means the predecessor calls the successor.

The test case:

class A {

	void m() {
		HashSet h1 = new HashSet();
		h1.stream().count();
	}
	
	@EntryPoint
	void n() {
		m();
	}
	
	@EntryPoint
	void mm() {m();}
}

The CallGraph class represents potentially context-sensitive call graphs.
https://github.com/wala/WALA/wiki/Call-Graph

Hence, there should be 2 nodes of m() in the call graph because m() has two different contexts.
image
Node 0:
image
Node 1:
image

I found the context of a CGNode only contains call string context:
image
Can I say call string context is used to distinguish different CGNodes for the same method?

Then, I found the CGNode Node: < Application, Lq/A, m()V > Context: CallStringContext: [ q.A.mm()V@1 ] is reachable from the only entry point node Node: < Application, Lq/A, mm()V > Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@6 ] and the CGNode Node: < Application, Lq/A, m()V > Context: CallStringContext: [ q.A.n()V@1 ] is reachable from the only entry point node Node: < Application, Lq/A, n()V > Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@9 ] by debugging.

image

@@ -362,20 +362,20 @@ private IR getEnclosingMethodIR(EclipseProjectAnalysisEngine<InstanceKey> engine
}

/**
* @return The {@link CGNode} representing the enclosing method of this stream.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is from cherry pick and I need your change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. This should not be part of the change set. No need to cherry-pick. You need to do a merge.

Copy link
Member

@khatchad khatchad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Includes changes that are not part of the problem. See comments.

CallGraph callGraph) {
Set<CGNode> deadEntryPoints = new HashSet<CGNode>();
Set<String> aliveClass = new HashSet<String>();
Set<CGNode> cotersOrStaticInitializerNodes = new HashSet<CGNode>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ctors.

@@ -362,20 +362,20 @@ private IR getEnclosingMethodIR(EclipseProjectAnalysisEngine<InstanceKey> engine
}

/**
* @return The {@link CGNode} representing the enclosing method of this stream.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. This should not be part of the change set. No need to cherry-pick. You need to do a merge.

This reverts commit e90d188.
Set<CGNode> streamNodes = new HashSet<CGNode>();
while (streamIterator.hasNext()) {
Stream stream = streamIterator.next();
streamNodes.addAll(stream.getEnclosingMethodNodes(engine));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can also catch the NoEnclosingMethodNodeFoundException here to avoid the interruption of a program.

Stream stream = streamIterator.next();
try {
streamNodes.addAll(stream.getEnclosingMethodNodes(engine));
} catch (NoEnclosingMethodNodeFoundException e) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the exception is not caught here, the tool cannot produce the correct result for the test case below:

class A {

	void m() {
		HashSet h1 = new HashSet();
		h1.stream().count();
	}

	@EntryPoint
	void n() {
		HashSet h2 = new HashSet();
		h2.stream().count();
	}
}

The program will be interrupted by throwing a NoEnclosingMethodNodeFoundException because there is no CGNode for m() in the call graph.

@ponder-lab ponder-lab deleted a comment from yiming-tang-cs Mar 26, 2018
} catch (IOException | CoreException | CancelException e) {
LOGGER.log(Level.SEVERE,
"Exception encountered while building call graph for: " + project.getElementName() + ".", e);
throw new RuntimeException(e);
}

Collection<CGNode> deadEntryPoints;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Collection<CGNode> deadEntryPoints = new HashSet<>();

Collection<CGNode> deadEntryPoints;
if (!usedEntryPoints.isEmpty())
deadEntryPoints = discoverDeadEntryPoints(engine);
else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, you wouldn't need this else clause.

*/
private Set<CGNode> getStreamCreationNodes(Iterator<Stream> streamIterator,
EclipseProjectAnalysisEngine<InstanceKey> engine) {
Set<CGNode> streamNodes = new HashSet<CGNode>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set<CGNode> streamNodes = new HashSet<>();

* a {@link CallGraph}.
* @return a set of dead entry points.
*/
private Set<CGNode> getDeadEntryPointNodes(Collection<CGNode> entryPointNodes, Set<CGNode> streamNodes,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have to be an instance method?

Set<CGNode> ctorsOrStaticInitializerNodes = new HashSet<CGNode>();
for (CGNode entryPointNode : entryPointNodes) {
// We will process ctors and static initializers later
if (Util.isCtors(entryPointNode) || Util.isStaticInitializers(entryPointNode)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

English: these method names should be singular, e.g., Util.isCtor().

* @param callGraph
* @return true: reachable; false: unreachable
*/
private boolean isReachable(CGNode entryPointNode, Collection<CGNode> streamNodes, CallGraph callGraph) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be an instance method?

} catch (IOException | CoreException | CancelException e) {
LOGGER.log(Level.SEVERE,
"Exception encountered while building call graph for: " + project.getElementName() + ".", e);
throw new RuntimeException(e);
}

Collection<CGNode> deadEntryPoints;
if (!usedEntryPoints.isEmpty())
deadEntryPoints = discoverDeadEntryPoints(engine);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an INFO log here for each entry point?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log one info for each dead entry point (one per line).

@khatchad
Copy link
Member

Currently, I do not build the call graph again. I have several questions about it.

  1. Where should the tool build the call graph again?

Probably line 243 in https://github.com/ponder-lab/Java-8-Stream-Refactoring/pull/187/files.

  1. The variables, such as ret and projectAnalysisResult should store the information for the first building or the second building?

If you do it on line 243 (above), then usedEntryPoints will reflect the latest build, however, deadEntryPoints will be from the original build.

// rebuild the callgraph
usedEntryPoints = getPrunedEntryPoints(deadEntryPoints, usedEntryPoints);
buildCallGraphFromEntryPoints(engine, usedEntryPoints);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting.

@yiming-tang-cs
Copy link
Contributor Author

I've evaluated the streamql for four times, but the result is not good.

subject #entrypoint dead entry point old version time (s)
streamql 92 0 no 752.56
streamql 241 18 no 830.43
streamql 259 / yes 777.425
streamql 92 / yes 749.496

In my opinion, the reasons may be

  1. This result is gotten occasionally and this can be avoided by evaluating project several times.
  2. The evaluation may cost less time if the call graph is built by less entry points, but the second building call graph may cost too much time.
  3. The number of dead entry points is not so large to reduce enough evaluating time and the reduced time is shorter than difference of evaluating time for each evaluation.

@khatchad
Copy link
Member

What is old version?

@khatchad
Copy link
Member

subject #entrypoint dead entry point old version time (s)
streamql 241 18 no 830.43
streamql 259 / yes 777.425

Why are the number of entry points different here?

@khatchad
Copy link
Member

khatchad commented Apr 2, 2018

It sounds to me that if rebuilding the call graph is too expensive, we'll have to find a way to alter the original call graph. But, I would say make sure that this is the operation that is dominating the run time. If, on the other hand, the dominant run time is finding dead entry points, then there's no hope. We should profile this and see where most of the time is being spent, i.e., in the call graph reconstruction or in the finding of the dead entry points.

@khatchad khatchad closed this Oct 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants