\section{Introduction} Summarization in static analysis is a critical task for enabling compilers and developers to implement desired optimizations in programs and to rapidly identify and understand entities that must be changed to achieve a given task


Summarization in static analysis is a critical task for enabling compilers and developers to implement desired optimizations in programs and to rapidly identify and understand entities that must be changed to achieve a given task. For example, compilers rely on certain summaries from the static analysis of concurrent programs to optimize the placement and type of synchronization operations. These summaries are also used by compilers to notify programmers about the potentially conflicting concurrent object accesses by multiple threads.

We Will Write a Custom Essay Specifically
For You For Only $13.90/page!

order now

The summary information must be accurate, precise and inexpensive to generate. Transforming the information from static analysis into a useful summary can be difficult based on the summary use case. Given a static analysis use case, there are different challenges in constructing a sound summary as well as a variety of possible approaches to optimize the process. The two summarization techniques discussed in this paper have different use cases from one another and they employ different methodologies for summarization.

The first technique by Ali and Lhot{\’a}k \cite{DBLP:conf/ecoop/AliL13} is used for call graph construction. It focuses on object-oriented programs that depend on libraries. In such programs even if the application code is small in size, its dependency libraries often contain a lot of classes and methods making the call graph construction an expensive process. Ali and Lhot{\’a}k \cite{DBLP:conf/ecoop/AliL13} introduce a tool called Averroes that generates a summarized placeholder library with significantly reduced number of classes and methods in it as compared to the original library. The original library code is not used in this process and the placeholder library is based entirely on the analysis performed on the application code. After the placeholder library is generated, it is used with the application code as input to a whole-program analysis framework to construct a sound call graph. The summarized placeholder library makes this whole-program analysis significantly faster while also requiring lesser resource usage.

The second summarization technique by von Praun and Gross \cite{vonPraun:2003:SCA:781131.781145} focuses on multi-threaded object-oriented programs in which object sharing by multiple threads at runtime causes conflicts. In such programs overapproximating the interaction of threads with the objects by performing a static analysis enables the compiler to detect potential object sharing between threads at runtime. The results of this analysis are summarized in the Object Use Graph (OUG). When the compiler is able to detect conflicts using an OUG it uses this information to issue warnings, optimize the placement of synchronization operations and efficiently add instrumentation to detect access violations at runtime.

The remainder of this paper describes how Averroes summarizes methods and classes of a program’s dependency libraries in order to generate a placeholder library for faster whole-program analysis (section 2), describes the approach to construct OUG (section 3), compares and evaluates the two discussed summarization approaches (section 4) and presents a conclusion (section 5).

\section{Using Summarized Placeholder Libraries for Whole Program Analysis}
In modern object-oriented programming languages, the applications dependent on libraries present two challenges in constructing their call graphs. Firstly, the memory requirements for call graph generation are high because of library dependence. Secondly, the time it takes to analyze the whole program including the library code increases with the number of methods and size of the library (e.g., the Java standard library is sized at 25 MB).

These challenges are a consequence of how the call graph construction algorithms account for dynamic dispatch behavior in which the target of a call is determined on the basis of runtime type of the receiver. Since the receiver can exist anywhere in the application or its dependency library, these algorithms must approximate the flow of potential receivers throughout the program including all classes of the dependency libraries. When the dependency libraries are large the whole-program analysis becomes expensive and time consuming even for small application code.

The Averroes tool presented by Ali and Lhot{\’a}k \cite{DBLP:conf/ecoop/AliL13} eliminates the need for using the original library for call graph construction. With this summarization tool, sound call graphs can be constructed in the following two phases in which no library dependencies are used at any stage.
\item Averroes generates a placeholder library by taking only the application code as input
\item Any existing whole-program analysis framework uses the summarized placeholder library in conjunction with the application code to construct the call graph
The placeholder library overapproximates the behavior of the original library and contains a significantly reduced number of methods and file size. The basic idea of the placeholder library generation originates from \textit{separate compilation assumption}.

\subsubsection{Separate Compilation Assumption}
The underlying difference between the traditional whole program analysis approach and the placeholder library approach is the \textit{separate compilation assumption} and the constraints that follow from it. The assumption implies that a library can be compiled separately without the application that uses it. This implication infers some constraints imposed between application and library class hierarchy, instantiation, local variables, method calls, field and array access pattern, static initialization and exception handling. For example, the class hierarchy constraint means that an application class or interface cannot be extended by a library class. The constraints are described with detail in \cite{DBLP:conf/ecoop/AliL13}.

\caption{Comparing soundness of Averroes-based tools to the whole-program tools with respect to the dynamic call graphs \cite{DBLP:conf/ecoop/AliL13}}

\subsection{Generating Summarized Placeholder Libraries}
The main idea of call graph generation using placeholder libraries is that since a library code can be compiled separately, any placeholder library that would completely model the constraints following from the separate compilation assumption (discussed in section 2.0.1) would approximate the behavior of the original library. In other words, the goal of Averroes is to summarize the necessary classes and methods into a placeholder library that exercises all behaviors of the original library on the basis of separate compilation assumption. To achieve this, Averroes uses only the application code as input in the following manner.

Averroes uses Soot \cite{Vallee-Rai:2000:OJB:647476.727758} to find out all classes of the input application program. Within these classes it inspects the constant pool to search for all referenced classes, methods and fields of the library. The superclasses and superinterfaces of the referenced classes are also taken into account. Once all of these are consulted, Averroes has a constant pool of a restricted set of classes. Based on this information Averroes constructs a model of the class hierarchy and the overriding relationships among methods of the application. This constitutes the resulting placeholder library that contains placeholders for all the methods and fields referenced by the application. Averroes again uses Soot to output this model as an actual placeholder library.

\subsubsection{Contents of the Summarized Placeholder Library}
The summary of classes contained in Averroes placeholder library are classified into the following three categories:

\item \textbf{Directly referenced library classes:} includes directly referenced classes, their superclasses and superinterfaces, along with some classes required by whole-program analysis (e.g., \texttt{java.lang.Object}). Placeholders for all referenced methods are contained in these classes. A default public constructor is included with every class to make at least one constructor available for every class.
\item \textbf{Concrete Implementation Classes:} The library can create an object of any type as per the class instantiation constraint of the separate compilation assumption and the application can access an object of unreferenced type via its super-type. Therefore, a concrete implementation class is created by Averroes for every abstract class and interface referenced by the application. These classes contain all abstract method implementations in the abstract class or interface that triggered the creation of the concrete implementation class.
\item \textbf{Averroes Library Class:} \texttt{AverroesLibraryClass} with its following two members implements the possible behaviors of the library based on especially the following constraints from section 2.0.1: callbacks to application methods, object instantiation, exception handling and array writes.
\item \textbf{\texttt{doItAll()}}: is a public static method of the Averroes Library Class that exercises all of the behaviors resulting from the above constraints
\item \textbf{\texttt{libraryPointsTo}}: is a public static field which is assigned any object that could be assigned to any local variable in the original library

%\subsection{Static Call Graphs Constructed with Placeholder Libraries}%
\subsection{Call Graph Construction with Placeholder Libraries}
The placehlder library generated with Averroes has a single use case i.e. call graph generation. Once a placeholder library specific to an application code has been generated, any existing whole-program analysis framework can use the placeholder library in conjunction with the application code to construct a sound call graph

In order to evaluate the resulting call graphs, they can be compared with traditional call graphs in terms of overall time consumed in graph generation and more importantly soundness.

The Averroes tool generates the placeholder libraries much smaller in size and with much fewer number of methods than the original Java libraries used with the applications tested. For example, the average size of the original libraries tested by \cite{DBLP:conf/ecoop/AliL13} is 25MB, whereas the average size of the generated placeholder libraries is 80kB. When the placeholder library is used with whole program analysis tools to produce call graphs, the overall time to execute reduces significantly as compared to whole program analysis with the original libraries. Considering also the additional time taken for generating the placeholder libraries, the whole program analysis with original libraries is still significantly slower. For example Spark is on average 4.7x slower in constructing call graphs from original libraries as compared to call graphs constructed with placeholder libraries. Using placeholder libraries has also shown reduction in memory requirements of the whole program analysis tools.

Although dynamic call graph may miss different edges and separate executions, a comparison of dynamic call graph with the static call graph can nevertheless be useful for indicating violations in the static call graph.

In Figure \ref{fig:dynamic-vs-static} The \textsc{Dynamic} row shows the number of call edges in the dynamic call graph, produced by the application side only. The rest of the rows show, out of dynamic call graph edges how many edges are missed by the static call graphs generated by Doop and Spark with (Doop$_{AVE}$ and Spark$_{AVE}$) and without using a \textsc{Averros} generated placeholder library. The call graphs created by Spark and Doop without using placeholder libraries show a significant number of edges missing whereas the call graphs using the placeholder libraries have nominal misses.

\section{Summarizing Statically Detected Potential Conflicts in Multithreaded OOPs}

The detection of conflicts arising from object sharing in multi-threaded object-oriented programs plays a key role for the compilers to implement optimizations and reduce redundant synchronization operations. von Praun and Gross \cite{vonPraun:2003:SCA:781131.781145} present an approach that statically analyzes programs to show object accesses in a happened-before relationship in the form of Object Use Graph (OUG). This approach summarizes the potential conflicts in a more accurate way than similar implementations such as Heap Shape Graph (HSG) which lack the analysis of temporal, structural and lock-based aspects of object accesses.

An Object Use Graph builds upon the Heap Shape Graph (HSG) which is a compile-time abstraction for runtime objects (nodes of the heap) and their reference relations (edges of the heap). Although a heap shape graph includes information about an object accessed by multiple threads, it does not regard the temporal aspect of the access: whether the access occurs concurrently or in a happened-before manner. Object Use Graphs can approximate this relationship (as done in this approach) and summarize the conflicting shared objects in significantly reduced numbers closer to the runtime conflicts observed.

\subsection{OUG Construction}

In order to construct an OUG, the control flow inside individual threads and details about lock protection, object escape, thread-start and join are used to derive lock-based, structural and temporal information.

Construction of OUG involves the following phases:

\subsubsection{Abstract Thread Approximation}
Abstract threads are compile time abstraction of runtime threads. There are three types of abstract threads approximated in this phase.

\item an \textit{init} thread for every method that initializes a class
\item a \textit{main} thread related to the entry point of a the program
\item \textit{user threads} such as instances of \texttt{java.lang.Thread}
(Multiple or multiply executed allocation sites in a user thread signify that the multiple instances are concurrent.)

\subsubsection{HSG Construction}
In a HSG, methods are analyzed separately and the resulting method summaries are used to transfer the changes emerging from method executions to their call sites. The nodes in a HSG are alias sets. Alias sets represent the runtime values of reference variables. Alias sets contain a mapping (call \textit{fieldmap}) of fully qualified field names to the objects reachable from these fields. The edges of HSG are references to alias sets contained in the \textit{fieldmap}.

To approximate the effect of method calls inter-procedural and intraprocedural analyses are performed.
Inter-procedural analysis
Alias sets of abstract threads and classes are the root of HSG which are created in the inter-procedural analysis. The graph is extended by appending further alias sets to these roots during the intra-procedural analysis. The traversal of methods and threads during Inter-procedural analysis is in the reverse order of method invocation.

Intra-procedural analysis
In order to model the execution effect of a method to its calling context a method summary is generated by intra-procedural analysis. Method summary of a method m is defined as:

\textit{MSm := ((f$_{0}$, …,f$_{n}$), ret, except, allocs, reads, writes)}

It uses parameters (f0,…,fn – alias sets for all formal reference parameters and local variables), return values (ret – alias sets for the return values), and thrown exceptions (except – alias sets for thrown exceptions). These are the data shared between caller and callee upon method invocation. In addition, the effect of the method execution in terms of allocated, read and written objects is recorded.

Method summary construction starts by a control-flow insensitive traversal of the method statements and incrementally builds a method’s summary by the following method.

\item assignment combines alias sets
\item field and array accesses create field-reference relations (all slots of an array are represented by a single symbolic field)
\item object allocation and access is recorded in the sets allocs, reads and writes

Every abstract thread has a unique ID which is stored at object access events (method call, allocation, field/array access) in the alias set modelling the access target. Through this information it can be determined whether multiple or multiply executed abstract threads access any objects.

\subsubsection{Symbolic Execution}
This phase further summarizes the abstract object classification to create distinct between shared abstract objects as either conflicting or non-conflicting. During the sybolic execution OUGs are incrementally generated for all shared objects in HSG.

\subsection{Using OUG to Summarize Object Sharing by Threads}
OUG events are determined conflicting based on the following four conditions:
\item there is no ordering between the events
\item the events stem from different threads
\item at least one event is a PUT(write access to a field of the object)
\item the accesses are not done under common lock protection
Based on the above criteria OUG is able to summarize conflicts at compile time, which in turn represent the potential data race conditions at runtime. The summary allows us to identify the patterns that usually indicate thread synchronization and protect shared objects from unordered access.

The two summarization techniques described in this paper emphasize on different important aspects of summarization.

The technique presented in the first paper \cite{DBLP:conf/ecoop/AliL13} takes into account how most call graph construction algorithms attempt to track the flow of potential receivers throughout library-dependent programs to tackle the dynamic dispatch behavior. It uses separate compilation assumption and shows that when the call graph is generated by using a placeholder library with the application, instead of doing a whole program analysis with the actual library, the number of classes and methods needed to be analyzed for sound call graph generation is greatly reduced.

The technique in the second paper \cite{vonPraun:2003:SCA:781131.781145} complements existing techniques for generating compile time abstraction of runtime objects and their relations. This technique additionally recognizes temporal, structural and lock-based aspects of object accesses to present a more precise summary of potential conflicts via Object Use Graphs. The precision of this information is critical for introducing optimization, issuing warnings and adding instrumentation in an efficient way. OUGs provide information that is more precise than what can be obtained by analyses that do not model the temporal relationships. As a result, fewer accesses are approximately classified as conflicting, so a compiler that wants to draw the user’s attention to those (possibly erroneous) accesses has fewer accesses to report.

Although the basic ideas presented in these papers produce very useful summarization techniques, there are some scenarios where one would show more gains than the other. The approach in the first paper focuses on library-dependent applications, where even a small application may require a resource and time intensive static analysis that can span over a myriad number of classes and methods of the original library. In such cases the impact of using a summarized placeholder library based on the class hierarchy and information inferred from the input program can greatly reduce the overall resource and time required for constructing a sound call graph. The second approach implementing the Object Use Graph technique is useful in multithreaded programs where the reported potential conflicts by the OUG can help the compiler issue warnings, to add optimized instrumentation and detect potential access violations at runtime.

This paper explains two summarization techniques with different use cases. The first technique which is used for the construction of call graphs, summarizes methods and classes from an actual dependency library into a smaller size placeholder library. This improves the analysis time required for whole-program call graph construction. The second summarization technique discussed in this paper statically captures object accesses from different threads and summarizes this information into an Object Use Graph (OUG). The OUG can be used to detect conflicts arising from concurrent object accesses by multiple threads in multithreaded programs. Both of these techniques involve different summarization approaches which are compared in this paper in terms of effectiveness in potential optimizations.

%\end{document} % This is where a ‘short’ article might terminate