Pointer analysis

Determining what or where each pointer points to in program code

From Wikipedia, the free encyclopedia

In computer science, pointer analysis, or points-to analysis, is a static code analysis technique that establishes which pointers, or heap references, can point to which variables, or storage locations. It is often a component of more complex analyses such as escape analysis. A closely related technique is shape analysis.

This is the most common colloquial use of the term. A secondary use treats pointer analysis as the collective name for both points-to analysis, as defined above, and alias analysis. Points-to analysis and alias analysis are closely related, but they are not always equivalent problems.

Example

Consider the following C program:

int *id(int* p) {
  return p;
}
int main(void) {
  int x;
  int y;
  int *u = id(&x);
  int *v = id(&y);
  return 0;
}

A pointer analysis computes a mapping from each pointer expression to the set of allocation sites of the objects it may point to. For the above program, an idealized, fully precise analysis would compute the following results:

Pointer expression    Allocation site
&x                    main::x
&y                    main::y
u                     main::x
v                     main::y
p                     main::x, main::y

(Here, X::Y denotes the stack allocation holding the local variable Y in the function X.)

However, a context-insensitive analysis such as Andersen's or Steensgaard's algorithm would lose precision when analyzing the calls to id, and compute the following result:

Pointer expression    Allocation site
&x                    main::x
&y                    main::y
u                     main::x, main::y
v                     main::x, main::y
p                     main::x, main::y

Introduction

As a form of static analysis, fully precise pointer analysis can be shown to be undecidable.[1] Most approaches are sound, but range widely in performance and precision. Many design decisions impact both the precision and performance of an analysis; often (but not always) lower precision yields higher performance. These choices include:[2][3]

  • Field sensitivity (also known as structure sensitivity): An analysis can either treat each field of a struct or object separately, or merge them.
  • Array sensitivity: An array-sensitive pointer analysis models each index in an array separately. Other choices include modelling just the first entry separately and the rest together, or merging all array entries.
  • Context sensitivity or polyvariance: Pointer analyses may qualify points-to information with a summary of the control flow leading to each program point.
  • Flow sensitivity: An analysis can model the impact of intraprocedural control flow on points-to facts.
  • Heap modeling: Run-time allocations may be abstracted by:
    • their allocation sites (the statement or instruction that performs the allocation, e.g., a call to malloc or an object constructor),
    • a more complex model based on a shape analysis,
    • the type of the allocation, or
    • one single allocation (this is called heap-insensitivity).
  • Heap cloning: Heap- and context-sensitive analyses may further qualify each allocation site by a summary of the control flow leading to the instruction or statement performing the allocation.
  • Subset constraints or equality constraints: When propagating points-to facts, different program statements may induce different constraints on a variable's points-to sets. Equality constraints (like those used in Steensgaard's algorithm) can be tracked with a union-find data structure, which gives high performance but lower precision than a subset-constraint based analysis (e.g., Andersen's algorithm).
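
To make the equality-constraint idea concrete, the following is a minimal sketch of a Steensgaard-style unification over a union-find forest. It is an illustration under simplifying assumptions, not the published algorithm: it handles only address-of and copy statements, and every name in it (find, unify, pointee_class, address_of, assign, may_alias, analyze_id_example) is hypothetical. The example statements model the id program from the Example section, with the parameter p receiving &x and &y and the return value flowing into u and v.

```c
#include <assert.h>

#define MAX_NODES 16
#define NONE (-1)

enum { X, Y, P, U, V, NUM_VARS };   /* variables of the id example */

int parent[MAX_NODES];    /* union-find forest */
int pointee[MAX_NODES];   /* for a class representative: the one class it points to */
int node_count;

int fresh(void) {
    assert(node_count < MAX_NODES);
    int n = node_count++;
    parent[n] = n;
    pointee[n] = NONE;
    return n;
}

int find(int a) {
    while (parent[a] != a) {
        parent[a] = parent[parent[a]];   /* path halving */
        a = parent[a];
    }
    return a;
}

/* Merge two classes; whatever they point to must then be merged too. */
void unify(int a, int b) {
    a = find(a);
    b = find(b);
    if (a == b) return;
    int pa = pointee[a], pb = pointee[b];
    parent[b] = a;
    if (pa == NONE) pointee[a] = pb;
    else if (pb != NONE) unify(pa, pb);
}

/* Class that p points to; a placeholder class is created on first use. */
int pointee_class(int p) {
    p = find(p);
    if (pointee[p] == NONE) pointee[p] = fresh();
    return find(pointee[p]);
}

void address_of(int p, int x) { unify(pointee_class(p), x); }                 /* p = &x */
void assign(int p, int q) { unify(pointee_class(p), pointee_class(q)); }      /* p = q  */
int may_alias(int p, int q) { return pointee_class(p) == pointee_class(q); }

void analyze_id_example(void) {
    node_count = 0;
    for (int i = 0; i < NUM_VARS; i++) fresh();
    address_of(P, X);   /* call id(&x) binds p to &x */
    address_of(P, Y);   /* call id(&y) binds p to &y */
    assign(U, P);       /* u receives id's return value */
    assign(V, P);       /* v receives id's return value */
}
```

After analyze_id_example() runs in this sketch, x and y share one equivalence class and may_alias(U, V) is nonzero, which matches the imprecise result described for context-insensitive analyses. Each statement is processed once with near-constant-time union-find operations, which is where the performance advantage of equality constraints comes from.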

Context-insensitive, flow-insensitive algorithms

Pointer analysis algorithms convert the raw pointer operations collected from a program (assignments of one pointer to another, or assignments that make a pointer point to some other variable) into a graph describing what each pointer can point to.[4]

Steensgaard's algorithm and Andersen's algorithm are common context-insensitive, flow-insensitive algorithms for pointer analysis. They are often used in compilers, and have implementations in SVF[5] and LLVM.
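
As a rough illustration of the subset-constraint style associated with Andersen's algorithm, the sketch below iterates copy constraints to a fixed point over bit-set points-to sets. It is deliberately simplified and is not the implementation found in SVF or LLVM: it supports only address-of and copy constraints (no loads or stores), and the names Copy, pts, solve and analyze_id_example are hypothetical. The constraints model the id program from the Example section.

```c
#include <assert.h>
#include <stdbool.h>

enum { X, Y, P, U, V, NUM_VARS };   /* variables of the id example */

/* Copy constraint "dst = src": pts[dst] must be a superset of pts[src]. */
typedef struct { int dst, src; } Copy;

/* pts[v] is a bit set: bit i is set when v may point to variable i. */
unsigned pts[NUM_VARS];

/* Propagate along copy constraints until nothing changes. */
void solve(const Copy *copies, int n) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < n; i++) {
            assert(copies[i].dst < NUM_VARS && copies[i].src < NUM_VARS);
            unsigned merged = pts[copies[i].dst] | pts[copies[i].src];
            if (merged != pts[copies[i].dst]) {
                pts[copies[i].dst] = merged;
                changed = true;
            }
        }
    }
}

void analyze_id_example(void) {
    for (int i = 0; i < NUM_VARS; i++) pts[i] = 0;
    /* Address-of constraints from the two call arguments. */
    pts[P] |= 1u << X;   /* id(&x) */
    pts[P] |= 1u << Y;   /* id(&y) */
    /* Copy constraints from the two uses of id's return value. */
    const Copy copies[] = { { U, P }, { V, P } };
    solve(copies, 2);
}
```

In this sketch both u and v end up with the set {x, y}, because the single, context-insensitive points-to set for p merges the facts from both calls.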

Flow-insensitive approaches

Many approaches to flow-insensitive pointer analysis can be understood as forms of abstract interpretation, where heap allocations are abstracted by their allocation site (i.e., a program location).[6]

Figure (not shown): Flow-insensitive pointer analyses often abstract possible runtime allocations by their allocation site. At runtime, the program in the figure creates three separate heap allocations. A flow-insensitive pointer analysis would treat these as a single abstract memory location, leading to a loss of precision.
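
A minimal, hypothetical C program with this property might look as follows. The helper names make_cell and make_cells and the constant COUNT are assumptions made for illustration. The program contains exactly one allocation site (the call to malloc), yet that site produces three distinct objects at run time, all of which an allocation-site abstraction represents by one abstract location.

```c
#include <assert.h>
#include <stdlib.h>

#define COUNT 3

/* The malloc call below is the only allocation site in this program. */
int *make_cell(int value) {
    int *cell = malloc(sizeof *cell);   /* single allocation site */
    assert(cell != NULL);
    *cell = value;
    return cell;
}

/* Fills out[] with COUNT distinct run-time objects that all
   originate from the same allocation site. */
void make_cells(int *out[COUNT]) {
    for (int i = 0; i < COUNT; i++) {
        out[i] = make_cell(i);
    }
}
```

Under the allocation-site abstraction, out[0], out[1] and out[2] would all be reported as pointing to the same abstract object, so the analysis cannot prove that they do not alias even though the three pointers are distinct at run time.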

Many flow-insensitive algorithms are specified in Datalog, including those in the Soot analysis framework for Java.[7]

Context-sensitive, flow-insensitive algorithms achieve higher precision, generally at the cost of some performance, by analyzing each procedure several times, once per context.[8] Most analyses use a "context-string" approach, where contexts consist of a list of entries (common choices of context entry include call sites, allocation sites, and types).[9] To ensure termination (and more generally, scalability), such analyses generally use a k-limiting approach, where the context has a fixed maximum size k, and the least recently added elements are removed as needed.[10] Three common variants of context-sensitive, flow-insensitive analysis are:[11]

  • Call-site sensitivity
  • Object sensitivity
  • Type sensitivity
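
The k-limiting scheme for context strings can be sketched as follows. This is only an illustration: the depth K = 2, the representation of context entries as integer identifiers, and the names Context and push_context are all assumptions. The same operation would apply whether the entries stand for call sites, allocation sites, or types.

```c
#include <assert.h>

#define K 2   /* assumed maximum context depth */

/* A context string; entries[0] is the most recently added entry. */
typedef struct {
    int len;
    int entries[K];
} Context;

/* Add a new entry to the front of the context. If the context is
   already full, the least recently added entry is dropped. */
Context push_context(Context ctx, int entry) {
    assert(ctx.len >= 0 && ctx.len <= K);
    Context out = { 0 };
    out.len = ctx.len < K ? ctx.len + 1 : K;
    out.entries[0] = entry;
    for (int i = 1; i < out.len; i++) {
        out.entries[i] = ctx.entries[i - 1];
    }
    return out;
}
```

Because a context can never hold more than K entries, the number of distinct contexts is bounded by the number of possible entries raised to the power K (plus the shorter contexts), which is what guarantees termination in this scheme.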

Call-site sensitivity

In call-site sensitivity, the points-to set of each variable (the set of abstract heap allocations each variable could point to) is further qualified by a context consisting of a list of call sites in the program. These contexts abstract the control flow of the program.

The following program demonstrates how call-site sensitivity can achieve higher precision than a flow-insensitive, context-insensitive analysis.

int *id(int* p) {
  return p;
}
int main(void) {
  int x;
  int y;
  int *u = id(&x);  // main.3
  int *v = id(&y);  // main.4
  return 0;
}

For this program, a context-insensitive analysis would (soundly but imprecisely) conclude that p can point to either the allocation holding x or that of y, so u and v may alias, and both could point to either allocation:

Pointer expression    Allocation site
&x                    main::x
&y                    main::y
u                     main::x, main::y
v                     main::x, main::y
p                     main::x, main::y

A call-site-sensitive analysis would analyze id twice, once for main.3 and once for main.4, and the points-to facts for p would be qualified by the call site. This enables the analysis to deduce that when main returns, u can only point to the allocation holding x and v can only point to the allocation holding y:

Context     Pointer expression    Allocation site
[]          &x                    main::x
[]          &y                    main::y
[]          u                     main::x
[]          v                     main::y
[main.3]    p                     main::x
[main.4]    p                     main::y
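
The way the facts for p are kept apart per call site can be sketched with a points-to table indexed by context as well as by variable. This is a hand-written model of the id example only, not a general solver: the names pts, analyze_call and analyze_id_example are hypothetical, and the two calls are processed in a single pass rather than by fixed-point iteration.

```c
#include <assert.h>

enum { X, Y, P, U, V, NUM_VARS };                       /* variables */
enum { CTX_EMPTY, CTX_MAIN_3, CTX_MAIN_4, NUM_CTX };    /* contexts  */

/* pts[c][v] is a bit set: bit i is set when, under context c,
   variable v may point to variable i. */
unsigned pts[NUM_CTX][NUM_VARS];

/* Models "lhs = id(&target)" at call site `site`: id is analyzed
   under a context made of that call site alone. */
void analyze_call(int site, int lhs, int target) {
    assert(site > CTX_EMPTY && site < NUM_CTX);
    pts[site][P] |= 1u << target;          /* bind p, qualified by the call site */
    pts[CTX_EMPTY][lhs] |= pts[site][P];   /* return value flows back to main */
}

void analyze_id_example(void) {
    for (int c = 0; c < NUM_CTX; c++)
        for (int v = 0; v < NUM_VARS; v++)
            pts[c][v] = 0;
    analyze_call(CTX_MAIN_3, U, X);   /* int *u = id(&x);  main.3 */
    analyze_call(CTX_MAIN_4, V, Y);   /* int *v = id(&y);  main.4 */
}
```

Since each call writes to its own row of the table, the fact that p may point to x under [main.3] never mixes with the fact that p may point to y under [main.4], and the sets computed for u and v stay disjoint.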

Object sensitivity

In an object-sensitive analysis, the points-to set of each variable is qualified by the abstract heap allocation of the receiver object of the method call. Unlike call-site sensitivity, object sensitivity is non-syntactic or non-local: the context entries are derived during the points-to analysis itself.[12]

Type sensitivity

Type sensitivity is a variant of object sensitivity in which the allocation site of the receiver object is replaced by the class or type that contains the method in which the receiver object is allocated.[13] This results in strictly fewer contexts than would be used in an object-sensitive analysis, which generally means better performance.
