[std-interval] Parameter Passing Performance

Lawrence.Crowl at Sun.com
Mon Apr 24 19:40:39 PDT 2006


Steve Clamage wrote:
 >> I compiled and ran on 64-bit sparc (US II) and 64-bit amd64 (Opteron).
 >> According to my earlier claims, pass-by-value should have been faster.
 >> 
 >> 64-bit sparc:
 >> addByValue:     690000 ticks 4.1e+08
 >> addByRef:       680000 ticks 4.1e+08
 >> addByConstRef:  450000 ticks 4.1e+08
 >> 
 >> 64-bit amd64:
 >> addByValue:     1280000 ticks 4.1e+08
 >> addByRef:        670000 ticks 4.1e+08
 >> addByConstRef:   670000 ticks 4.1e+08
 >> 
 >> Color my face red. :-)
 >> 
 >> This result is quite different from other experiments with small structs
 >> that showed pass-by-value performing better. One possibility is that the
 >> compiler is missing some optimization opportunities. Another is that
 >> this example is not representative.

Sorry for the delay in responding; I have been burdened with other
issues.  I decided to write my own benchmark, one that I find more
convincing in terms of likely use.  I chose to implement a simple
interval saxpy, but with the primitive operations out of line.  I will
not pretend this code is good interval arithmetic, but it should
provide a reasonable upper bound on the cost of parameter passing.  The
source is as follows.

% cat interval.h
#ifndef INTERVAL_H
#define INTERVAL_H

struct interval;

#ifdef VALUE
typedef interval interval_in;
#else
typedef const interval & interval_in;
#endif

struct interval
{
        double lo, hi;
        interval() { }
        interval( double x, double y ) : lo( x ), hi( y ) { }
        interval & operator +=( interval_in x );
        friend interval operator +( interval_in x, interval_in y );
        friend interval operator *( interval_in x, interval_in y );
};

interval operator +( interval_in x, interval_in y );
interval operator *( interval_in x, interval_in y );

#endif

% cat interval.cc
#include "interval.h"

interval & interval::operator +=( interval_in x )
{
        lo += x.lo;
        hi += x.hi;
        return *this;
}

interval operator +( interval_in x, interval_in y )
{
        return interval( x.lo + y.lo, x.hi + y.hi );
}

interval operator *( interval_in x, interval_in y )
{
        double min = x.lo * y.lo;
        double max = min;
        double next = x.lo * y.hi;
        if ( next < min ) min = next;
        if ( next > max ) max = next;
        next = x.hi * y.lo;
        if ( next < min ) min = next;
        if ( next > max ) max = next;
        next = x.hi * y.hi;
        if ( next < min ) min = next;
        if ( next > max ) max = next;
        return interval( min, max );
}

% cat saxpy.h
#ifndef SAXPY_H
#define SAXPY_H

#include "interval.h"

interval saxpy( interval_in a, interval x[], interval y[], int n );

#endif

% cat saxpy.cc
#include "saxpy.h"

interval saxpy( interval_in a, interval x[], interval y[], int n )
{
        interval sum( 0.0, 0.0 );
        for ( int i = 0; i < n; ++i )
                sum += a * x[i] * y[i];
        return sum;
}

% cat benchmark.cc
#include <stdio.h>
#include "interval.h"
#include "saxpy.h"

const int size = 10000;

interval x[size];
interval y[size];
interval sum( 0.0, 0.0 );

int main()
{
        for ( int i = 0; i < size; ++i )
        {
                x[i].lo = 1.0 - i / (double)size;
                x[i].hi = 1.0 + i / (double)size;
                y[i].lo = -1.0 - i / (double)size;
                y[i].hi = -1.0 + i / (double)size;
        }
        for ( int i = 0; i < size; ++i )
        {
                sum += saxpy( x[i], x, y, size );
        }
        printf( "[%f;%f]\n", sum.lo, sum.hi );
        return 0;
}

I tested both parameter mechanisms on four platforms.  The results
are times in seconds; for value parameters, the column shows how the
implementation passes the struct.

    C++ parameter declaration    reference   ----------- value -----------
    struct passing mechanism                 pointer     stack    register

    SPARC v8 (32-bit)                6.3       11.3
    SPARC v9 (64-bit)               14.6                            19.4
    IA32     (32-bit)                7.1                  9.0
    AMD64    (64-bit)                2.8                             4.3
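
The post does not say how these times were collected.  As a minimal
sketch, assuming CPU time from std::clock is an acceptable measure, a
helper such as the hypothetical cpu_seconds below could be wrapped
around the second loop of main().  The helper name and its interface
are illustrative only and are not part of the benchmark source.

```cpp
#include <ctime>

// Hypothetical helper, not part of the original benchmark.
// Returns the CPU time, in seconds, consumed by one call to work().
double cpu_seconds( void (*work)() )
{
        std::clock_t start = std::clock();
        work();
        std::clock_t stop = std::clock();
        return double( stop - start ) / CLOCKS_PER_SEC;
}
```

Building the same sources once with -DVALUE and once without, and
timing the same work in each build, would give one pair of numbers per
platform.  std::clock has coarse resolution on some systems, so the
work needs to run long enough for the difference to be meaningful.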

In all cases, pass by const reference was faster.  I looked in detail
at the generated code and discovered that in all cases our compiler was
introducing an unnecessary copy for each value parameter.  It is a bug,
probably in the higher levels of the compiler.  Once one corrects for
the unnecessary copying, the value parameters are faster.

Note that performance is not the only criterion for choosing value
parameters.  The reduced aliasing means less exposure to program
errors.
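
To illustrate the aliasing point, here is a small sketch.  The two
add_twice functions are hypothetical and do not appear in the benchmark;
they differ only in how the second parameter is declared.  When the
caller passes the same object for both arguments, the reference version
reads a value it has already modified, while the value version works on
a private copy.

```cpp
// Hypothetical illustration; not part of the benchmark source.
struct interval
{
        double lo, hi;
        interval( double x, double y ) : lo( x ), hi( y ) { }
};

// Intended effect: acc = acc + 2 * x.
// With a reference parameter, x may refer to the same object as acc.
void add_twice_ref( interval & acc, const interval & x )
{
        acc.lo += x.lo;  acc.hi += x.hi;
        acc.lo += x.lo;  acc.hi += x.hi;  // x has changed if it aliases acc
}

// Same body, but x is a copy and so cannot alias acc.
void add_twice_val( interval & acc, interval x )
{
        acc.lo += x.lo;  acc.hi += x.hi;
        acc.lo += x.lo;  acc.hi += x.hi;
}
```

Starting from [1;2] and passing the same object as both arguments,
add_twice_val yields [3;6] as intended, whereas add_twice_ref yields
[4;8] because the second addition sees the already updated value.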

  Lawrence Crowl             650-786-6146   Sun Microsystems, Inc.
                   Lawrence.Crowl at Sun.com   16 Network Circle, UMPK16-303
           http://www.Crowl.org/Lawrence/   Menlo Park, California, 94025

