-
Notifications
You must be signed in to change notification settings - Fork 139
ECJ Analyse
< JDT Core Programmer Guide | ECJ
Flow analysis is dedicated to analysing for each location in a method body, which variable is potentially / definitely assigned or unassigned, null or nonnull.
Note that no interprocedural analysis is performed, as this is not feasible performancewise. For the case of null analysis, this is the main reason for using null annotations.
The analysis status for each location in the execution graph of a method body is captured by an instance of the light-weight class FlowInfo.
- FlowInfo is abstract, mostly dedicated to defining a protocol for
sub classes
- UnconditionalFlowInfo is the main work horse, capturing real state.
- ConditionalFlowInfo is a composite of two flow infos,
initsWhenTrue
andinitsWhenFalse
, which is able to capture all relevant information after evaluating anif
condition, e.g..
Inside UnconditionalFlowInfo, the optimized encoding of this status uses
several long
fields as bitvectors to capture information concerning 64
variables in scope. The bit position in this vector corresponds to
VariableBinding.id
for fields and VariableBinding.id + maxFieldCount
for locals.
- definiteInits: this bit indicates if the given variable is definitedly assigned at the current position
- potentialInits: indicates if the given variable is potentially assigned (i.e., some flow exists leading to this position, where the variable is indeed assigned).
- if both above bits are 0 then the given variable is definitely unassigned.
- nullBit1, ... nullBit4: for each variable, a bit slice through these
four fields gives a four-bit value encoding the "null status". See
comment in UnconditionalFlowInfo for the interpretation of the 15
relevant values (value 1000 is unused). In this table, the following
abbreviations are used:
- n - null
- nn - nonnull
- u - unknown, this is used for values from an external source: method arguments, method call results. Note that this is not related to "potentially" as used in situations with local branching.
- def - definitely
- pot - potentially
- prot - protected (prot nn means, the variable must be non-null, because it has been dereferenced before without through NPE -- I don't have an example for prot n).
Originally, the encoding has probably been chosen as to give individual bits distinct meaning, but then some bit combinations are used in unrelated ways. This implies that each combination of two null status values is a complex bit operation. For normal compiler maintenance these operations can safely be regarded as black box. Details follow below.
- iNBit, iNNBit: additional bits added at a later point to improve analysis of loops (see bug 453483, and these N&N. The (tricky) implementation idea is described in bug 453483#c8, with a slightly higher level explanation in bug 453483#c9.
Bugs relating to flow analysis with loops may be marked with [loop] as a prefix in the bug summary.
Some bug reports complain that ecj doesn't recognize "obvious" flows of null / nonnull. One cluster of these issues would require "correlation analysis", i.e., an analysis that captures correlations between different variables, like, "when flag f is true then reference x cannot be null".
As of today, no-one has come forward with an implementation strategy that would fit the typical requirements of ecj, in particular speed. Hence, correlation analysis is considered out of scope for ecj, unfortunately.
Existing bugs have been marked with prefix [correlation].
Since the above talks of long
bitvectors, it can encode the state only
for 64 variables. So how do we cope, if there are more than 64 variables
in scope?
Answer: field extra
, a dynamically growing array-of-array-of-long.
- First index encodes correspondence to the original fields:
- 0 : definiteInits
- 1 : potentialInits
- X+1, X in [1...4]: nullBitX
- IN(6): iNBit
- INN(7): iNNBit
- Second index corresponds to "variable id / 64" - this is the dynamically growing part, each additional slice adding space for 64 additional variables.
Operations using this field extra
are specifically tested using
the flag
jdt.flow.test.extra.
While ideally each AST node directly produces effects by computing a new FlowInfo, checking of some effects must be deferred, until all flows in a given context are known. The simplest example is a loop, where each position inside the loop body should also consider complete flows through the loop body which then return to the beginning of the loop body (i.e., we are analysing something like 1.5 iterations of each loop).
Specific flow contexts exist for loops, switches, try blocks, finally blocks etc.
Flow contexts have two families of methods:
-
record*
: during the first pass record the effect of a given AST node -
check*
andcomplainOnDeferred*
: when all individual flows have been computed, recheck all recorded information if it indicates any problem.
Outside FlowInfo, nullness is expressed using an int
nullStatus, as
one of UNKNOWN, NULL, NON_NULL, POTENTIALLY_UNKNOWN,
POTENTIALLY_NULL, POTENTIALLY_NON_NULL (constants in FlowInfo).
-
FlowInfo.nullStatus(LocalVariableBinding)
- Here a given flow info computes the nullStatus for the provided variable. -
Expression.nullStatus(FlowInfo,FlowContext)
- Give an incoming flow info and flow context, computes the null status of the given expression. Different kinds of expression provided different implementations.
Generally, each ASTNode will compute the FlowInfo that represents the state after its execution (if any). To achieve this, all state changing operations of FlowInfo should only be invoked an a flow info that is owned by the current ASTNode.
This implies that ASTNode wishing to update the incoming flow info must first copy the incoming flow info, before performing any updates.
Any ASTNode that represents some kind of branching, should first delegate to its children to compute each child's flow info (after execution), and then combine those flow infos according to its own logic. That's why analyseCode()
methods take an incoming flow info (from the parent node) and produce a new flow info to be combined by its parent, to yield the parent's final flow info (recursively).
When value of type Closeable
is handled, a specific use of flow
analysis determines, if that resource is closed on all paths.
Bugs relating to resource leak detection should have [resource] as a prefix in the bug summary for easier searching. Here are all currently open bugs in this area.
The core trick here is to synthesize a special LocalDeclaration of
subtype FakedTrackingVariable
which pretends that we have an extra
local variable. Whenever we see activity concering the resource value,
we encode the observation as "null info" concerning that faked tracking
variable.
- initially the tracking variable is marked as null
- when we see a
close()
call, we mark the tracking variable as nonnull.
Simply speaking, at the end of each tracking variable's scope, the tracking variable must be marked as definitely nonnull, or a leak (potential or definite) exists. Exception: a try-with-resources block automatically closes.
- Indeed, the motivation for this analysis was to alert users of cases where try-with-resources should be used (language feature and analysis were added in the Java 7 era).
Since these checks are connected to the end of a block, this is
implemented as Block.checkUnclosedCloseables()
.
Since leak analysis has a few requirements that differ from null
analysis, the following bit-wise information is stored in
FakedTrackingVariable.globalClosingState
: CLOSE_SEEN,
SHARED_WITH_OUTSIDE, OWNED_BY_OUTSIDE, CLOSED_IN_NESTED_METHOD,
REPORTED_EXPLICIT_CLOSE, REPORTED_POTENTIAL_LEAK,
REPORTED_DEFINITIVE_LEAK, FOREACH_ELEMENT_VAR (see the code comments
for explanations).
To understand nested resources where closing the outermost resource
is assumed to close also all inner resources (like, e.g.,
BufferedInputStream), each FakedTrackingVariable can be linked to an
innerTracker
and an outerTracker
.
In compiler warning messages, we prefer to use the name of a real local variable to signal which element lacks closing. For values that are never assigned to a local variable, and still subject to flow analysis, one of the constants UNASSIGNED_CLOSEABLE_NAME or UNASSIGNED_CLOSEABLE_NAME_TEMPLATE is used ("<unassigned Closeable value ...>")
Resource analysis is intended to avoid leaks of gc-resistent resources, i.e., resources outside of what the Java garbage collector can free.
Since subtypes of Closeable
have different semantics wrt the need to
close, interface TypeConstants
defines some black lists / white lists:
- JAVA_IO_WRAPPER_CLOSEABLES, JAVA_UTIL_ZIP_WRAPPER_CLOSEABLES, OTHER_WRAPPER_CLOSEABLES: no gc-resistent resource of its own, but wraps another resource that may be gc resistent.
- JAVA_IO_RESOURCE_FREE_CLOSEABLES: while generally Closeables in
package
java.io
are considered as gc-resistent, classes on this list are not. - JAVA_UTIL_STREAM: generally all subclasses of Stream are considered as not gc-resistent.
- RESOURCE_FREE_CLOSEABLE_J_U_STREAMS: additional classes in
package
java.util.streams
which should be treated likeStream
despite not being subtypes of it. - FLUENT_RESOURCE_CLASSES: resource classes (gc-resistent) meant for
fluent programming: methods returning an instance of the same type
are considered as returning
this
so that invokingclose()
on the final result of a call chainresult = rc.fluent().operation()
is sufficient to also close the original resource. - JAVA_NIO_FILE_FILES: where a method of this class returns a
Stream
, the result must be closed, despite the general rule forStream
After evaluation of a type name, the effect of the above lists are
encoded into ReferenceBinding.typeBits
. Here the following constants
from TypeIds
are relevant:
- BitAutoCloseable (inheritable)
- BitCloseable (inheritable)
- BitWrapperCloseable
- BitResourceFreeCloseable
Of these bits, those which are marked "inheritable" are propagated to all subtypes, unless canceled by a black list.
Additionally, TypeConstants.closeMethods
lists a few library methods
known to class a closeable argument.
TBC