Approaches to Assess Validity of Derivatives and to Improve Efficiency in Automatic Differentiation of Programs
Mauricio Araya
(INRIA, BP93, 06902 Sophia-Antipolis, France)
PhD thesis, University of Nice, 2005 (152 pages)
Abstract:
The context of this work is Automatic Differentiation (AD).
Fundamentally, AD transforms a program that computes a
mathematical function into a new program that computes its
derivatives. Being automatic, AD can spare a large amount of
development effort. AD is increasingly used in Scientific Computing, but
some problems still delay its widespread adoption. In this thesis we
propose elements of solution for two of these problems.

The first problem is the non-differentiability of most real-life
programs for some particular inputs. AD tools often overlook this
problem. However, users need a strong degree of confidence in the
derivatives they obtain in order to use them, e.g., in optimization
engines. Non-differentiability in programs may come from various
causes, from the mathematical model down to the actual implementation
choices. In practice, non-differentiability in programs manifests
itself through control flow: tests and branches. Rather than studying
notions of extended derivatives, our approach is to describe the
domain in the input space on which the function actually remains
differentiable. We describe several alternatives for finding this
domain of validity and evaluate their complexity. We formally study
one method that computes the complete domain, but we discard it
because of its cost. Alternatively, we propose a simpler directional
method, which we have implemented and validated on several examples.

The second problem is the efficiency of the reverse mode of AD, which
computes gradients and is therefore of high interest. Reverse AD
programs use the intermediate values of the original program in
reverse order, and this has a cost whatever the strategy used.
Available strategies all rely on a combination of storing and
recomputing intermediate values, thus costing memory space and
execution time. The central tactic, called "checkpointing", trades
duplicate execution of complete segments of the code for memory
space. In this work, we formalize the static data-flow behavior of
reverse AD programs, taking checkpointing into account. Based on these
formal results, we contribute two improvements to reverse AD
strategies. First, the data-flow analysis lets us reduce the number of
stored values to a minimum, and we give elements of a proof of
minimality. Second, we obtain indications on the code segments that
are most appropriate for checkpointing. To experiment with
checkpointing schemes, we extend the data-flow analyses and the
reverse mode in our AD tool. We show the benefits on several large
Scientific Computing codes.
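To illustrate the two problems above, here is a minimal hand-written sketch in Python (a toy example, not the thesis's AD tool; the function f and all names are hypothetical). The branch makes the derivative input-dependent, so the returned derivative is only valid on one branch's domain; and the reverse sweep must reuse the forward sweep's intermediate values in reverse order.

```python
# Toy reverse-mode sweep for a piecewise function:
#   f(x) = x*x  if x > 0,  else  3*x
# df/dx = 2x on the first branch, 3 on the second; at x = 0 the
# function is continuous but not differentiable.

def f_and_grad(x):
    """Hand-written forward and reverse sweeps for the piecewise f above."""
    # Forward sweep: record the control decision and the intermediates.
    took_then = x > 0
    if took_then:
        y = x * x
    else:
        y = 3.0 * x
    # Reverse sweep: propagate the adjoint ybar = df/dy = 1 backwards,
    # replaying the recorded control decision and reusing the stored x.
    ybar = 1.0
    if took_then:
        xbar = ybar * 2.0 * x
    else:
        xbar = ybar * 3.0
    return y, xbar

print(f_and_grad(2.0))   # (4.0, 4.0): on the x > 0 branch, df/dx = 2x
print(f_and_grad(-1.0))  # (-3.0, 3.0): on the other branch, df/dx = 3
```

At x = 0 the two one-sided derivatives (0 and 3) disagree, so the value AD returns is only trustworthy while the input stays inside the domain of validity of the branch that was actually executed; this is the question the directional method addresses. Likewise, the reverse sweep's reuse of `x` and `took_then` in reverse order is the storage cost that checkpointing and the data-flow analyses aim to reduce.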
Keywords:
Automatic Differentiation, Reverse mode, Checkpointing,
Compilation, Program Transformation, Program Optimization,
Static Analysis, Data Flow Analysis, Non-Differentiability,
Scientific Computing, Optimization, Adjoint Models.
Full text (pdf)
Summary in French (pdf)
@phdthesis{phdAraya06,
  author = {Araya-Polo, M.},
  title  = {Approaches to Assess Validity of Derivatives and to
            Improve Efficiency in Automatic Differentiation of Programs},
  type   = {PhD},
  school = {Universit{\'e} de Nice Sophia-Antipolis},
  year   = 2006
}