[std-interval] C++ interval std

Thu Apr 6 01:31:20 PDT 2006

Alan Eliasen wrote:
> Steve Clamage wrote:
> 
> 
> 
>>>Bill Clarke wrote:
>>>
>>>
>>>>This experiment is extremely platform specific, as you probably are
>>>>aware.  Most platforms in use today other than x86 will be the other way
>>>>around (add-by-value will be faster than add-by-const-ref).
> 
> 
> Alan Eliasen wrote:
> 
>>>   How can that assertion possibly be made?  As you know, if a parameter
>>>is passed by value, then the copy constructor is automatically called
>>>for each object (e.g. twice for an add operator) on function entry, and
>>>then the destructor is called (twice) on function exit.
>>
> 
> Steve Clamage wrote:
> 
>>A simple object like complex or interval should have no user-defined
>>copy constructor or destructor. That is, the default copy constructor
>>generated by the compiler does the right thing and does not need to be
>>user-defined. The destructor has nothing to do, and so should not be
>>user-defined. In that case, pass-by-value can be optimized into register
>>passing with no additional copying.
> 
> 
>    You still agree, though, that even if the implementation doesn't do
> memory allocation in the constructor, the compiler will still generate a
> copy constructor and a destructor, (perhaps pretty simple and efficient)
> and these will get normally get called in a pass-by-value.  Again, a
> very smart compiler can optimize these away if it realizes that the copy
> is unnecessary, but by simply specifying call-by-constant-reference, we
> don't have to hope that the compiler is very smart.  We simply eliminate
> even the possibility of calling the copy constructor and destructor.

In C++, every type (including int) has a constructor and destructor. This point 
of definition simplifies the exposition of language rules. We are saved from 
hopelessly inefficient implementations by the "as-if rule." It says the 
implementation can do what is wants as long as observable behavior is preserved. 
Multiple copies of temporary objects (as in parameter and returned-value 
passing) are not counted in "observable behavior" -- extra copies can be elided.

For POD types, I have never seen or heard of a compiler that actually generated 
a real constructor or destructor. Compilers perform a simple copy by whatever 
means is most efficient.

> 
> 
>>It is possible to construct a test case where pass-by-value gives worse
>>performance than pass-by-reference, and your example does indeed run
>>faster with pass-by-reference using Sun compilers at low optimization.
> 
> 
>    As noted, it should always be the case that pass-by-value is at least
> as slow as pass-by-constant-reference because of the overhead of calling
> the copy constructors and destructors, which are almost always
> unnecessary and can be trivially eliminated by specifying the calling
> convention.  If it's ever faster, I'd argue that it's because the
> compiler missed obvious optimizations for pass-by-constant-reference, or
> is allocating registers badly.
> 
> 
>>At high optimization, the compiler eliminated the loops and all but one
>>call to "add", which it inlined. I played with the test case for a short
>>time, but was unable to defeat the optimizer with minor changes.
> 
> 
>    Heh!  That's a smart compiler!  I had considered adding some
> randomization to the values to eliminate this possibility, but the
> timing values would have likely been swamped by the random number generator.
> 

In the original example, the call to "add" in the loop uses the same parameters. 
The compiler inlines the call, then hoists the computation out of the loop. The 
loop is now empty, so the compiler just computes the answer at compile time. 
Since the result of the computation was not used, the compiler can elide that as 
well!

I modified the loop to use the result by incrementing a variable that was later 
printed. (If the variable wasn't used, the compiler elided it, too. The "as-if" 
rule again.) To prevent hoisting the "add" out of the loop, I caused each 
iteration to add different values. Here is an example of two modified loops, and 
the output statements:

    Interval c(0.0,0.0);
    start = clock();
    for (i=0; i<upper; i++)
    {
       c.upper += addByValue( a, addByValue(Interval(i,0), b) ).upper;
    }
    end = clock();
    cout << "addByValue: \t" << (end-start) << " ticks " << c.upper << endl;

    c.upper = 0;
    start = clock();
    for (i=0; i<upper; i++)
    {
       c.upper += addByConstRef( a, addByConstRef(Interval(i,0), b) ).upper;
    }
    end = clock();
    cout << "addByConstRef: \t" << (end-start) << " ticks " << c.upper << endl;

I compiled and ran on 64-bit sparc (US II) and 64-bit amd64 (Opteron). According 
to my earlier claims, pass-by-value should have been faster.

64-bit sparc:
addByValue:     690000 ticks 4.1e+08
addByRef:       680000 ticks 4.1e+08
addByConstRef:  450000 ticks 4.1e+08

64-bit amd64:
addByValue:     1280000 ticks 4.1e+08
addByRef:        670000 ticks 4.1e+08
addByConstRef:   670000 ticks 4.1e+08

Color my face red. :-)

This result is quite different from other experiments with small structs that 
showed pass-by-value performing better. One possibility is that the compiler is 
missing some optimization opportunities. Another is that this example is not 
representative.

---
Steve Clamage, stephen.clamage at sun.com