Manuel Serrano
inria > sophia > indes
-- Compilation

[29]Serrano, M. and Findler, R. -- Dynamic Property Caches, a Step towards Faster JavaScripts Proxy Objects -- Proceedings of the 29th Compiler Construction Conference (CC'20), San Dieo, USA, feb, 2020.
[ pdf ]
Inline caches and hidden classes are two essential components for closing the performance gap between static languages such as Java, Scheme, or ML and dynamic languages such as JavaScript or Python. They rely on the observation that for a particular object access located at a particular point of the program, the shapes, usually referred to as the hidden classes, of accessed objects are likely to be the same. Taking benefit of that invariant, they replace the expensive lookup the semantics of these languages normally demand with one test, the inline cache, and a memory read indexed by an offset computed during the last cache miss. These optimizations are essential but they are not general enough to cope with JavaScript's proxies. In particular, when the property name is itself unknown statically, inline cache-based optimizations always take a slow path. In this paper, we show how to generalize inline caches to cope with an unknown property name. The paper first discusses the general principle of the extension and then presents the experimental results we collected using a modified version of the Hop JavaScript compiler, demonstrating how the optimization is crucial for improving the performance of proxy objects (as they naturally use dynamic property names extensively). The evaluation report shows that the modified Hop outperforms all other implementations of the language, including the most efficient commercial ones, by a factor ranging from 2 times to 100 times. Even better, our optimizations are applicable to existing compilers as they require only straightforward changes to runtime data structures; no complex analyses are required.
[41]Serrano, M. and Feeley, M. -- Property Caches Revisited -- Proceedings of the 28th Compiler Construction Conference (CC'19), Washington, USA, feb, 2019.
[ pdf ]
Property caches are a well-known technique invented over 30 years ago to improve dynamic object accesses. They have been adapted to JavaScript, which they have greatly contributed to accelerate. However, this technique is applicable only when some constraints are satisfied by the objects, the properties, and the property access sites. In this work, we propose enhancements to improve two common usage patterns: prototype accesses, object extensions, and megamorphic accesses. We have implemented these in the Hopc AOT JavaScript compiler and we have measured their impact. We observe that they effectively complement traditional caches. They reduce cache misses and consequently accelerate execution. Moreover, they do not cause a slowdown in the handling of the other usage patterns.
[28]Serrano, M. -- JavaScript AOT Compilation -- 14th Dynamic Language Symposium (DLS), Boston, USA, Nov, 2018.
[ pdf ]
Static compilation, a.k.a., ahead-of-time (AOT) compilation, is an alternative approach to JIT compilation that can combine good speed and lightweight memory footprint, and that can accommodate read-only memory constraints that are imposed by some devices and some operating systems. Unfortunately the highly dynamic nature of JavaScript makes it hard to compile statically and all existing AOT compilers have either gave up on good performance or full language support. We have designed and implemented an AOT compiler that aims at satisfying both. It supports full unrestricted ECMAScript 5.1 plus many ECMAScript 2017 features and the majority of benchmarks are within 50% of the performance of one of the fastest JIT compilers.
[54]Serrano, M. and Grande, J. -- Locking Fast -- Proceedings of the ACM Symposium on Applied Computing (SAC'14), Gyeongju, Korea, Mar, 2014.
[ pdf | html ]
This article presents several independent optimizations of operations on monitors. They do not involve the low-level mutual exclusion mechanisms but rather their integration with and usage within higher-level constructs of the language. The paper reports acceleration of Hop, the Web programming language for which these optimizations have been created. The paper shows that other languages such as C and Java would also benefit from these optimizations.
[23]Serpette, B. and Serrano, M. -- An Interpreter for Server-Side Hop -- Proceedings of the Dynamic Language symposium (DLS), Portland, USA, Oct, 2011.
[ pdf | html ]
HOP is a Scheme-based multi-tier programming language for the Web. The client-side of a program is compiled to JavaScript, while the server-side is executed by a mix of natively compiled code and interpreted code. At the time where HOP programs were basic scripts, the performance of the server-side interpreter was not a concern; an inefficient interpreter was acceptable. As HOP expanded, HOP programs got larger and more complex. A more efficient interpreter was necessary. This new interpreter is described in this paper. It is compact, its whole implementation counting no more than 2.5 KLOC. It is more than twice faster than the old interpreter and consumes less than a third of its memory. Although it cannot compete with static or JIT native compilers, our experimental results show that it is amongst the fastest interpreters for dynamic languages.
[51]Bres, Y. and Serpette, B. and Serrano, M. -- Bigloo.NET: compiling Scheme to .NET CLR -- Journal of Object Technology, 3(9), Oct, 2004.
[ html | issue_2004_09 ]
This paper presents the compilation of the Scheme programming language to .NET. This platform provides a virtual machine, the Common Language Runtime (CLR), that executes bytecode, the Common Intermediate Language (CIL). Since CIL was designed with language agnosticism in mind, it provides a rich set of language constructs and functionalities. As such, the CLR is the first execution environment that offers type safety, managed memory, tail recursion support and several flavors of pointers to functions. Therefore, the CLR presents an interesting ground for functional language implementations.

We discuss how to map Scheme constructs to CIL. We present performance analyses on a large set of real-life and standard Scheme benchmarks. In particular, we compare the speed of these programs when compiled to C, JVM and .NET. We show that in term of speed performance of the Mono implementation of .NET, the best implementing running on both Windows and Linux, still lags behind C and fast JVMs such as the Sun's implementations.
[3]Bres, Y. and Serpette, B. and Serrano, M. -- Compiling Scheme programs to .NET Common Intermediate Language -- 2nd International Workshop on .NET Technologies, Plzen, Czech Republic, May, 2004.
[ pdf ]
We present in this paper the compilation of the Scheme programming language to .Net platform. .Net provides a virtual machine, the Common Language Runtime (CLR), that executes bytecode, the Common Intermediate Language (CIL). Since CIL was designed with language agnosticism in mind, it provides a rich set of language constructs and functionalities. Therefore, the CLR is the first execution environment that offers type safety, managed memory, tail recursion support and several flavors of pointers to functions. As such, the CLR presents an interesting ground for functional language implementations.

We discuss how to map Scheme constructs to CIL. We present performance analyses on a large set of real-life and standard Scheme benchmarks. In particular, we compare the performances of Scheme programs when compiled to C, JVM and .Net. We show that .Net still lags behind C and JVM.
[25]Serpette, B. and Serrano, M. -- Compiling Scheme to JVM bytecode: a performance study -- 7th Sigplan Int'l Conference on Functional Programming (ICFP), Pittsburgh, Pensylvanie, USA, Oct, 2002.
[ ps.gz ]
We have added a Java virtual machine (henceforth JVM) bytecode generator to the optimizing Scheme-to-C compiler Bigloo. We named this new compiler BiglooJVM. We have used this new compiler to evaluate how suitable the JVM bytecode is as a target for compiling strict functional languages such as Scheme. In this paper, we focus on the performance issue. We have measured the execution time of many Scheme programs when compiled to C and when compiled to JVM. We found that for each benchmark, at least one of our hardware platforms ran the BiglooJVM version in less than twice the time taken by the Bigloo version. In order to deliver fast programs the generated JVM bytecode must be carefully crafted in order to benefit from the speedup of just-in-time compilers.
[39]Serrano, M. -- Inline expansion: when and how -- 9th Int'l Symposium on Programming Language Implementation and Logic Programming (PLILP), Southampton, UK, Sep, 1997, pp. 143--147.
[ ps.gz ]
Inline function expansion is an optimization that may improve program performance by removing calling sequences and enlarging the scope of other optimizations. Unfortunately it also has the drawback of enlarging programs. This might impair executable programs performance. In order to get rid of this annoying effect, we present, an easy to implement, inlining optimization that minimizes code size growth by combining a compile-time algorithm deciding when expansion should occur with different expansion frameworks describing how they should be performed. We present the experimental measures that have driven the design of inline function expansion. We conclude with measurements showing that our optimization succeeds in producing faster codes while avoiding code size increase.
[65]Serrano, M. and Feeley, M. -- Storage Use Analysis and its Applications -- 1fst Sigplan Int'l Conference on Functional Programming (ICFP), Philadelphia, Penn, USA, May, 1996, pp. 50--61.
[ ps.gz ]
In this paper we present a new program analysis method which we call Storage Use Analysis. This analysis deduces how objects are used by the program and allows the optimization of their allocation. This analysis can be applied to both statically typed languages (e.g. ML) and latently typed languages (e.g. Scheme). It handles side-effects, higher order functions, separate compilation and does not require CPS transformation. We show the application of our analysis to two important optimizations: stack allocation and unboxing. The first optimization replaces some heap allocations by stack allocations for user and system data storage (e.g. lists, vectors, procedures). The second optimization avoids boxing some objects. This analysis and associated optimizations have been implemented in the Bigloo Scheme/ML compiler. Experimental results show that for many allocation intensive programs we get a significant speedup. In particular, numerically intensive programs are almost 20 times faster because floating point numbers are unboxed and no longer heap allocated.
[37]P. H. Hartel, et al. -- Pseudoknot: a Float-Intensive Benchmark for Functional Compilers -- Journal of Functional Programming, Juillet1996, pp. 621--655.
[ ps.gz ]
Over 25 implementations of different functional languages are benchmarked using the same program, a floating-point intensive application taken from molecular biology. The principal aspects studied are compile time and execution time for the various implementations that were benchmarked. An important consideration is how the program can be modified and tuned to obtain maximal performance on each language implementation.

With few exceptions, the compilers take a significant amount of time to compile this program, though most compilers were faster than the then current GNU C compiler (GCC version 2.5.8). Compilers that generate C or Lisp are often slower than those that generate native code directly: the cost of compiling the intermediate form is normally a large fraction of the total compilation time.

There is no clear distinction between the runtime performance of eager and lazy implementations when appropriate annotations are used: lazy implementations have clearly come of age when it comes to implementing largely strict applications, such as the Pseudoknot program. The speed of C can be approached by some implementations, but to achieve this performance, special measures such as strictness annotations are required by non-strict implementations.

The benchmark results have to be interpreted with care. Firstly, a benchmark based on a single program cannot cover a wide spectrum of `typical' applications. Secondly, the compilers vary in the kind and level of optimisations offered, so the effort required to obtain an optimal version of the program is similarly varied.
[12]Serrano, M. -- A Fresh Look to Inlining Decision -- 4th International Computer Symposium (invited paper), Mexico city, Mexico, Nov, 1995, pp. 40--52.
[ ps.gz ]
Included in a compiler for functional or object oriented languages, inline function expansion has been reported as one of the most valuable optimizations. Unfortunately, it has an important counterpart: since it duplicates function body, it enlarges the code of the compiled programs as well as the resulting object code. The main contribution of this paper is to present a simple compile time inlining decision algorithm where the code length increasing factor is a constant that can be tuned by the compiler designer and where execution improvements are comparable with other previous sophisticated technics. Our major concern is about functional languages. With these languages, recursive functions are widely used: the second contribution of this paper is the presentation of an original ad hoc inlining framework for recursive functions which is more accurate than function unfolding. Experimental results demonstrate that our inlining technics succeed in producing small and efficient compiled object codes.
[30]Serrano, M. and Weis, P. -- Bigloo: a portable and optimizing compiler for strict functional languages -- 2nd Static Analysis Symposium (SAS), Lecture Notes on Computer Science, Glasgow, Scotland, Sep, 1995, pp. 366--381.
[ ps.gz ]
We present Bigloo, a highly portable and optimizing compiler. Bigloo is the first compiler for strict functional languages that can efficiently compile several languages: Bigloo is the first compiler for full Scheme and full ML, and for these two languages, Bigloo is one of the most efficient compiler now available (Bigloo is available by anonymous ftp at

This high level of performance is achieved by numerous high-level optimizations. Some of those are classical optimizations adapted to higher-order functional languages (e.g. inlining), other optimization schemes are specific to Bigloo (e.g. a new refined closure analysis, an original optimization of imperative variables, and intensive use of higher-order control flow analysis). All these optimizations share the same design guideline: the reduction of heap allocation.
[47]Serrano, M. -- Control Flow Analysis: a Functional Languages Compilation Paradigm -- 10th ACM Symposium on Applied Computing (SAC), Nashville, Tennessee, USA, Feb, 1995, pp. 118-122.
Control flow analysis (cfa) is now well known but is not widely used in real compilers since optimizations that can be achieved via cfa are not so clear. This paper aims at showing that control flow analysis is very valuable in practice by presenting a fundamental optimization based on cfa: the closure representation algorithm, the essential optimizing phase of a lambda-language compiler. Since naïve and regular scheme to represent functions as heap allocated structures is far too inefficient, the main effort of modern functional languages compilers is devoted to minimize the amount of memory allocated for functions. In particular, compilers try to discover when a procedure can safely be handled without any allocation at all. Previously described methods to do so are ad hoc, depending on the language compiled, and not very precise. Using cfa, we present a general approach which produces better results. This refined closure analysis subsumes previously known ones and optimizes more than 80 % of closure on the average. We also present another analysis based on cfa which is useful for dynamically typed languages: the type investigation algorithm. Modifying Shivers' cfa, we realized a type inference algorithm that eliminates more than 60 % of dynamic type tests. These two optimizations are fully integrated into a real compiler, so that we can report reliable measures obtained for real world programs. Time figures show that analyses based on cfa can be very efficient: when the compiler uses the improved closure allocation scheme and the type investigation algorithm, the resulting executable programs run more than two time faster.
[34]Serrano, M. -- Vers une compilation portable et performante des langages fonctionnels -- Thèse de doctorat d'université, Université Pierre et Marie Curie (Paris 6), Paris, Dec, 1994.
[ ps.gz ]
Cette thèse est consacrée à la compilation portable et performante des langages fonctionnels modernes tels que Scheme et ML. Pour obtenir la portabilité, nous avons opté pour un schéma de compilation qui utilise C comme langage d'assemblage. Le choix de ce langage cible interdit l'utilisation de certaines des techniques traditionnellement efficaces. Nous avons comblé cet handicap en multipliant les analyses et les optimisations de haut niveau. Les travaux présentés dans ce document ont donné lieu à la réalisation d'un compilateur très performant : Bigloo. Distribué depuis plus d'un an sur ftp anonyme, il est maintenant largement utilisé. Toutes les optimisations présentées dans ce document y sont implantées. Nous avons donc pu mesurer l'impact de chacune d'entre elles. Nous présentons en conclusion de cette thèse des mesures de performances qui nous permettent d'affirmer que Bigloo est, actuellement, le compilateur Scheme le plus rapide. En plus de sa rapidité, notre compilateur est extensible à deux niveaux :

• au niveau du langage source : il est possible d'insérer des passes supplémentaires dans le processus de compilation. Plusieurs extensions ont déjà été réalisées. Bigloo est capable, à ce jour, de compiler, en plus de Scheme, Camllight ou Meroon (la couche objet de Scheme conçue par C. Queinnec).

• au niveau des langages externes : Bigloo possède une interface qui permet au programmeur d'avoir accès à un langage externe depuis Scheme (ou ML). Grâce à elle, plusieurs langages peuvent être mélangés pour produire des applications autonomes.

L'utilisation conjointe de nos deux niveaux d'extensibilité nous a permis, à titre d'expérience, de réaliser des programmes exécutables mélangeant Scheme, Camllight et C.
[42]Serrano, M. -- Using Higher Order Control Flow Analysis When Compiling Functional Languages -- 6th International Symposium on Programming Language Implementation and Logic Programming (PLILP), Lecture Notes on Computer Science, (844), Madrid, Spain, Sep, 1994, pp. 447--449.
[67]Serrano, M. and Weis, P. -- 1+1 =1: an optimizing Caml compiler -- 2nd ACM Sigplan Workshop on ML and its Applications, Orlando (Florida, USA), 1994, pp. 101--111.
We present a new Caml compiler, which was obtained by an original approach: a simple pipeline between two existing compilers, each one devoted to half of the compilation process. The first compiler is a Caml compiler, it is in charge of the front end, and ensures compatibility. The second compiler is an optimizing Scheme compiler, it constitutes the back end, and ensures efficiency. These are camlc 0.6 byte-code compiler and a Scheme compiler (Bigloo). Using this technology, we were able to write the optimizing compiler in only two man-months. The new compiler is bootstrapped, fully compatible with the camlc compiler, and features interesting inter-module optimizations for curried functions. It produces efficient code, comparable with the one produced by other ML compilers (including SML/NJ). Our new compiler, Bigloo, is freely available at
[68]Serrano, M. -- De l'utilisation des analyses de flot de contrôle dans la compilation des langages fonctionnels -- Cnrs, 1993.
[69]Serrano, M. -- Rgc: un générateur d'analyseurs lexicaux efficaces en Scheme -- Feb, 1992, pp. 235--252.
[ ps.gz ]
Cet article présente Rgc, un générateur d'analyseurs lexicaux rapides, développé pour Scheme. Nous ne décrivons pas ici une maquette mais un produit final efficace. Par ses performances, il est en concurrence directe avec le logiciel Flex. Après mesures, il apparaît que Rgc est entre 5 et 10 % plus rapide que Flex et entre 250 et 260 % plus rapide que Lex. Pour obtenir ce niveau de performance, nous avons réalisé un compilateur spécialisé restreint Scheme->C. De plus, puisque Scheme ne possède pas de primitives rapides de lecture il s'est avéré indispensable de programmer les requêtes systèmes et la gestion des tampons en C. Le code est donc composé de 90 % de Scheme et 10 % de C.

This Html page has been produced by Skribe.
Last update Thu Feb 27 08:23:03 2020.