Approaches to Assess Validity of Derivatives and to Improve Efficiency in Automatic Differentiation of Programs
Mauricio Araya
(INRIA, BP93, 06902 Sophia-Antipolis, France)
PhD thesis, University of Nice, 2005 (152 pages)
Abstract:
The context of this work is Automatic Differentiation (AD).
Fundamentally, AD transforms a program that computes a
mathematical function into a new program that computes its
derivatives. Being automatic, AD can spare a large amount of
development effort. AD is increasingly used in Scientific Computing, but
some problems still delay its widespread adoption. In this thesis we
propose elements of solution for two of these problems.

The first problem is the non-differentiability of most real-life
programs for some particular inputs. AD tools often overlook this
problem. However, users need a strong degree of confidence in the
derivatives they obtain in order to use them, e.g., in optimization
engines. Non-differentiability in programs may come from various
causes, from the mathematical model down to the actual implementation
choices. In practice, non-differentiability in programs manifests
itself through control flow: tests and branches. Rather than studying
notions of extended derivatives, our approach is to describe the
domain in the input space on which the function actually remains
differentiable. We describe several alternatives for finding this
domain of validity and evaluate their complexity. We formally study
one method that computes the complete domain, but we discard it
because of its cost. Alternatively, we propose a simpler directional
method, which we have implemented and validated on several examples.

The second problem is the efficiency of the reverse mode of AD, which
computes gradients and is therefore of high interest. Reverse AD
programs use the intermediate values of the original program in
reverse order, and this has a cost whatever the strategy used.
Available strategies all rely on a combination of storing and
recomputing intermediate values, thus costing memory space and
execution time. The central tactic, called "checkpointing", trades
duplicate execution of complete segments of the code for memory
space. In this work, we formalize the static data-flow behavior of
reverse AD programs, taking checkpointing into account. Based on these
formal results, we contribute two improvements to reverse AD
strategies. First, the data-flow analysis lets us reduce the number of
stored values to a minimum, and we give elements of a proof of
minimality. Second, we obtain indications on the code segments that
are most appropriate for checkpointing. To experiment with
checkpointing schemes, we extend the data-flow analyses and the
reverse mode in our AD tool. We show the benefits on several large
Scientific Computing codes.
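To illustrate the two problems above, here is a minimal hand-written sketch in Python (a toy example, not the thesis's AD tool; the function f and all names are hypothetical). The branch makes the derivative input-dependent, so the returned derivative is only valid on one branch's domain; and the reverse sweep must reuse the forward sweep's intermediate values in reverse order.

```python
# Toy reverse-mode sweep for a piecewise function:
#   f(x) = x*x  if x > 0,  else  3*x
# df/dx = 2x on the first branch, 3 on the second; at x = 0 the
# function is continuous but not differentiable.

def f_and_grad(x):
    """Hand-written forward and reverse sweeps for the piecewise f above."""
    # Forward sweep: record the control decision and the intermediates.
    took_then = x > 0
    if took_then:
        y = x * x
    else:
        y = 3.0 * x
    # Reverse sweep: propagate the adjoint ybar = df/dy = 1 backwards,
    # replaying the recorded control decision and reusing the stored x.
    ybar = 1.0
    if took_then:
        xbar = ybar * 2.0 * x
    else:
        xbar = ybar * 3.0
    return y, xbar

print(f_and_grad(2.0))   # (4.0, 4.0): on the x > 0 branch, df/dx = 2x
print(f_and_grad(-1.0))  # (-3.0, 3.0): on the other branch, df/dx = 3
```

At x = 0 the two one-sided derivatives (0 and 3) disagree, so the value AD returns is only trustworthy while the input stays inside the domain of validity of the branch that was actually executed; this is the question the directional method addresses. Likewise, the reverse sweep's reuse of `x` and `took_then` in reverse order is the storage cost that checkpointing and the data-flow analyses aim to reduce.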
Keywords:
Automatic Differentiation, Reverse mode, Checkpointing,
Compilation, Program Transformation, Program Optimization,
Static Analysis, Data Flow Analysis, Non-Differentiability,
Scientific Computing, Optimization, Adjoint Models.
Full text (pdf)
Summary in French (pdf)
@phdthesis{phdAraya06,
  author = {Araya-Polo, M.},
  title  = {Approaches to Assess Validity of Derivatives and to
            Improve Efficiency in Automatic Differentiation of Programs},
  type   = {PhD},
  school = {Universit{\'e} de Nice Sophia-Antipolis},
  year   = 2006
}