Sunday 7 November 2010

Has-a relationship: reference, pointer or value? ( 1/2 )



You are going to implement a has-a relationship between two classes. What is the best way to model this relation: by reference, by pointer or by value?


By Value
In general this is the preferable way to implement a has-a relationship; the main advantage is that the owner has complete ownership of the object. Let's have a look at this small chunk of code:
class Component
{
    public:
        /* some stuff */
    private:
        /* some other stuff */
};
class Owner
{
    public:
        explicit Owner(const Component& component)
            :component_(component) {}
    private:
        Component component_;
};
So far so good. Every time we create a relationship between two objects we bind the behaviour of one object to the interface of the other. I will use this simple example to show how the Owner class has to change depending on the Component interface.
The constructor of Owner takes a const reference to a Component object. The keyword const is important because we don't, and we shouldn't, change the object that we are going to copy and store in our class. The keyword explicit avoids undesired automatic conversions (have a look at the Strongly Type post for more information).

The first drawback of storing the component by value is performance: we are copying a value, so if the Component object is big we spend a lot of time copying it. Please note that an object can be arbitrarily big. For example, a std::vector can be very-very-very big.
class Component
{
    private:
        /* some stuff */
        Component(const Component& other);
        Component& operator=(const Component& other);
    public:
        /* some other stuff */
};
The author of Component is a good developer who knows that copying a very-very-very big object is expensive, so he decided to disallow copy semantics by declaring the copy constructor and the assignment operator private (as above).

Now we have the first problem: the Owner class will not compile! So, the sad moral of the story is: we cannot implement a has-a relationship by value if the component object has no copy semantics.
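
With a C++11 compiler the same intent can be spelled with = delete, and the type trait std::is_copy_constructible tells you up front whether a by-value member is possible at all. This is only a minimal sketch, assuming C++11 is available; BigComponent and SmallComponent are hypothetical names that are not part of the example above.

```cpp
#include <type_traits>

// Hypothetical stand-in for the non-copyable Component above (C++11 syntax).
class BigComponent
{
    public:
        BigComponent() = default;
        // Deleting the copy operations has the same effect as making them
        // private, but the compiler error points directly at the cause.
        BigComponent(const BigComponent&) = delete;
        BigComponent& operator=(const BigComponent&) = delete;
};

// Hypothetical component that keeps the compiler-generated copy operations.
class SmallComponent
{
};

// An Owner that stores its component by value needs the first trait to be true.
static_assert(!std::is_copy_constructible<BigComponent>::value,
              "BigComponent cannot be stored by value through a copy");
static_assert(std::is_copy_constructible<SmallComponent>::value,
              "SmallComponent can be stored by value");
```

If the first static_assert were inverted, compilation would stop there, which is the same failure the Owner class hits when it tries to copy the component.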


This is not the worst-case scenario. Let's take this Component interface as an example:
class Component
{
    public:
        /* some stuff */
        Component(Component& other);
        Component& operator=(Component& other);
    private:
        /* some other stuff */
};
This time copying is allowed, but the keyword const has been removed. Why? I don't know; people are different, and so are developers. Removing the const keyword may give you a very small performance advantage (do you really need it?), but it can spoil the day of another developer. How? This example explains it:
Component comp;
/* some calculation */
/* comp is very important! My whole program
   is based on it! */
Owner o(comp);

/* Arggggg.... Owner can change it! */
Changing an object that you take as an input parameter is like scratching your friend's car. You shouldn't do that.
So: we can, but we shouldn't, implement a has-a relationship by value if the component object has non-const copy semantics.
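
To make the risk concrete, here is a small sketch of a class whose copy constructor takes a non-const reference and really does touch its source. MutatingCopy and its touched() accessor are hypothetical names invented for this illustration, and the code assumes a C++11 compiler for the type trait and the in-class initializer.

```cpp
#include <type_traits>

// Hypothetical component with a non-const copy constructor.
class MutatingCopy
{
    public:
        MutatingCopy() {}
        // Nothing stops this constructor from modifying the object it copies.
        MutatingCopy(MutatingCopy& other) { other.touched_ = true; }
        bool touched() const { return touched_; }
    private:
        bool touched_ = false;
};

// A const object (or a const reference) cannot be copied at all,
// so an Owner constructor taking const MutatingCopy& would not compile.
static_assert(!std::is_constructible<MutatingCopy, const MutatingCopy&>::value,
              "copying from a const source is rejected");
static_assert(std::is_constructible<MutatingCopy, MutatingCopy&>::value,
              "copying from a non-const source is accepted");
```

After MutatingCopy copy(original); the call original.touched() returns true: the source of the copy has changed behind the caller's back.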


Actually, we haven't finished our analysis yet. There is another pitfall to avoid. It is a well-known feature of C++ that if you don't provide a constructor, destructor, copy constructor and assignment operator, the compiler provides a version for you. You may think: cool, less work. Actually there is a huge drawback. The compiler is smart, but it is not a mind reader. Even the smartest compiler cannot reproduce the copy semantics that you have in mind! You need to code them, otherwise the compiler does its best, which is a member-by-member copy: every member is copied with its own copy semantics, and for a raw pointer that means copying the address, not the data it points to. In order to show the possible outcomes I have prepared a small program. Even though the program is completely meaningless, its output is exactly what we want:
#include <iostream>
#include <string>
using namespace std;
class Component
{
    public:
        /* some stuff */
        Component(int numberOfValue,string myString) :
            i_(new int[numberOfValue]),
            numberOfValue_(numberOfValue),
            myString_(myString)
        {
            changeValue(10);
        }

        ~Component() { delete[] i_; }

        int getNumberOfValue() const { return numberOfValue_; }

        string getMyString() const { return myString_; }

        void printArray() const
        {
            for (int j=0; j < numberOfValue_; j++)
                cout << i_[ j ] << " " ;
            cout << " Data stored by pointer" << endl;
        }

        void modify(int newNumberOfValue,string anotherString)
        {
            delete[] i_;
            i_ = new int[newNumberOfValue];
            numberOfValue_ = newNumberOfValue;
            myString_ = anotherString;
            changeValue(100);
        }
    private:
        void changeValue(int leverage)
        {
            for (int j=0; j < numberOfValue_; j++)
                i_[ j ] = leverage*j;
        }
        /* some other stuff */
        int* i_;
        int numberOfValue_;
        string myString_;
};
class Owner
{
    public:
        Owner(Component& component)
            :component_(component)
    {
        // component_.i_ holds the same address as component.i_ here
        component_.modify(component_.getNumberOfValue(), "Bar");
    }

    private:
        Component component_;
};
int main (int argc, char *argv[])
{
    //numberOfValue equal to 10
    Component comp(10,"Foo");
    cout << comp.getNumberOfValue() << " Built-in type "  << endl;
    cout << comp.getMyString() << " Object with deep-copy semantic" << endl;
    comp.printArray();

    //Make a copy of Component!
    //Using the compiler automatic
    //generated copy constructor
    Owner o(comp);
    cout << comp.getNumberOfValue() << " Built-in type "  << endl;
    cout << comp.getMyString() << " Object with deep-copy semantic" << endl;
    comp.printArray();
    return 0;
}
The output is:
10 Built-in type
Foo Object with deep-copy semantic
0 10 20 30 40 50 60 70 80 90  Data stored by pointer
10 Built-in type
Foo Object with deep-copy semantic
0 100 200 300 400 500 600 700 800 900  Data stored by pointer



As you can see, the built-in type is copied correctly, and so is the string object. The array is doing something odd: its values have been modified after the creation of the Owner object. The reason is that the compiler makes a shallow copy of the pointer: it copies just the pointer and not the data it points to. So you end up with two entities sharing the same data. Sometimes that is what you want, but very often it is not. Note that comp keeps a pointer to an array that modify() has already freed, so strictly speaking the program has undefined behaviour; the output above appears when the new allocation reuses the same block of memory.



So: we shouldn't implement a has-a relationship by value if the component object doesn't have deep copy semantics. If we really want to do that, we need to carefully document our choice, because it is not the behaviour that the client of our class will expect.
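
For comparison, here is one possible way to give the component deep copy semantics without writing a copy constructor by hand: let a std::vector own the array. This is a sketch under my own assumptions, not a drop-in replacement for the program above; DeepComponent, DeepOwner and valueAt are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical variant of Component: std::vector owns the array, so the
// compiler-generated copy constructor copies the elements, not an address.
class DeepComponent
{
    public:
        DeepComponent(int numberOfValue, const std::string& myString)
            : values_(numberOfValue), myString_(myString)
        {
            fill(10);
        }

        void modify(int newNumberOfValue, const std::string& anotherString)
        {
            values_.assign(newNumberOfValue, 0);
            myString_ = anotherString;
            fill(100);
        }

        int valueAt(std::size_t index) const { return values_.at(index); }
        const std::string& getMyString() const { return myString_; }

    private:
        void fill(int leverage)
        {
            for (std::size_t j = 0; j < values_.size(); ++j)
                values_[j] = leverage * static_cast<int>(j);
        }

        std::vector<int> values_;
        std::string myString_;
};

// Hypothetical owner: it modifies only its private copy.
class DeepOwner
{
    public:
        explicit DeepOwner(const DeepComponent& component)
            : component_(component)
        {
            component_.modify(10, "Bar");
        }

        int valueAt(std::size_t index) const { return component_.valueAt(index); }

    private:
        DeepComponent component_;
};
```

With this version, building a DeepOwner from comp leaves comp untouched: comp.valueAt(1) is still 10 while the owner's copy holds 100 at the same index. The owner can also take its argument by const reference again, because it never needs to modify the original.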

Thursday 4 November 2010

Using RAII idiom for profiling


RAII is a well-known C++ idiom. The idea is simple and great at the same time. Two things in life are certain: birth and death. If you are wondering about taxes, please don't forget that I am Italian and in my country taxes are not mandatory.
In the object-oriented world birth means constructor. The constructor is the first function called when we create an object, so it is a wonderful place to acquire a resource. Death means destructor: when an object goes out of scope the destructor is always called (of course, if you leak a heap-allocated object, its destructor is never called). That's great: the destructor is the perfect place to release the resource.
Check this link for more information on RAII. What I am going to show you is a very simple class that implements this idiom to get profiling information.
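
Before moving to the profiler, the birth and death pairing described above can be sketched with a tiny guard object. ResourceGuard and liveGuardsInsideScope are hypothetical names made up for this illustration, and the counter merely stands in for a real resource such as a file or a lock; the = delete syntax assumes C++11.

```cpp
// Hypothetical guard: "acquires" in the constructor, "releases" in the destructor.
class ResourceGuard
{
    public:
        explicit ResourceGuard(int& liveCount) : liveCount_(liveCount)
        {
            ++liveCount_;   // birth: the resource is acquired
        }

        ~ResourceGuard()
        {
            --liveCount_;   // death: the resource is released
        }

        ResourceGuard(const ResourceGuard&) = delete;
        ResourceGuard& operator=(const ResourceGuard&) = delete;

    private:
        int& liveCount_;
};

// Returns how many guards are alive while the local guard is in scope.
int liveGuardsInsideScope(int& liveCount)
{
    ResourceGuard guard(liveCount);
    return liveCount;       // the value is read before guard is destroyed
}
```

Starting from a counter equal to 0, liveGuardsInsideScope returns 1 and the counter is back to 0 once the function has returned, without any explicit release call.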

Profiling
If you need to make your application go faster, you need to know where it is actually performing worst. DON'T SPECULATE! Developers are not good at guessing where an application is slow; if they were, there wouldn't be slow applications.
Profiling is tedious: you need to put extra debug output everywhere, check the results and, at the end, remove it from the code (of course you can use a profiling tool, but sometimes the learning curve is so steep that the trivial way is better).
The RAII idiom comes to the rescue. Have a look at this class:
// Names starting with an underscore and a capital letter are reserved,
// so the include guard avoids the leading underscore.
#ifndef PROFILER_H_
#define PROFILER_H_
#include <string>
#include <iostream>
#include <sys/time.h>
using std::string;
using std::ostream;
namespace BitingCpp
{
    class Profiler
    {
        public:
            Profiler(string what,
                     ostream& out = std::cout,
                     bool verbose = false);
            ~Profiler(); /* throw() */
        private:
            Profiler(const Profiler& rhs);
            Profiler& operator=(const Profiler& rhs);
            string what_;
            timeval start_;
            timeval end_;
            ostream& out_;
            bool verbose_;
    };
} //namespace BitingCpp
#endif

As you can see, the class provides only a constructor and a destructor. Copies are not allowed. There are some private members, and this is the implementation file:

#include "Profiler.h"
using std::endl;
namespace BitingCpp
{
    Profiler::Profiler(string what,ostream& out,
                                bool verbose) :
                                what_(what), start_(),
                                end_(),out_(out),verbose_(verbose)
    {
        gettimeofday(&start_,0);
        if (verbose_)
            out_ << "Begin of " << what_ << " at " << start_.tv_sec
                 << " s " << start_.tv_usec << " us" << endl;
    }

    Profiler::~Profiler() /* throw() */
    {
        try
        {
            gettimeofday(&end_,0);
            if (verbose_)
                out_ << "End of " << what_ << " at " << end_.tv_sec
                     << " s " << end_.tv_usec << " us" << endl;

            double timems =
              ((static_cast<double>(end_.tv_sec - start_.tv_sec))
              *1000.0 + ((end_.tv_usec - start_.tv_usec)/1000.0));
            out_ << what_ << " running time: "
                 << timems  << " ms " << endl;
        }
        catch(...) {} // to avoid core dump during the stack unwind!
    }
} //namespace BitingCpp
The constructor stores the creation timestamp (and prints an optional statement), and the destructor makes a simple calculation and prints the result.

This small program shows how to use this class:


#include <iostream>
#include "Profiler.h"
int main (int argc, char *argv[])
{
    BitingCpp::Profiler mainProfile(__FUNCTION__);

    BitingCpp::Profiler* justFirstLongLoop =
       new BitingCpp::Profiler("Just First Long Loop");

    // unsigned arithmetic wraps around; i*i would overflow a signed int
    unsigned long long j = 0;
    for (unsigned long long i=0; i < 10000000; i++)
        j += i*i;

    delete justFirstLongLoop; // trigger the destructor
    BitingCpp::Profiler anotherBigLoop("Another Big Loop");
    j = 0;
    for (unsigned long long i=0; i < 10000000; i++)
        j += i*i;


    return 0; // anotherBigLoop goes out of
}             // scope first and then mainProfile



The output of this program is something like this:
Just First Long Loop running time: 52.852 ms
Another Big Loop running time: 52.435 ms
main running time: 105.829 ms
Now it is clear why copies are not allowed: what does copying a Profiler object mean? The Profiler keeps track of its date of birth (actually a timeval structure). If I copy a Profiler object, which date of birth should the new object have? The same as the first one, or the timestamp of the copy operation? Too many questions, and furthermore I cannot see the point of making a copy of a Profiler object in the first place.
Note: the code works on UNIX platforms. You can easily change the implementation file to use Windows functions. The __FUNCTION__ macro is a compiler extension, not part of the standard, but it is widely supported; the standard spelling is __func__ (C99, and C++ from C++11 on).
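
If a C++11 compiler is available, a platform-independent variant can rely on std::chrono instead of gettimeofday. The following is only a sketch of that idea, not the class presented above: ChronoProfiler and timeEmptyScope are hypothetical names, and the verbose option is left out to keep it short.

```cpp
#include <chrono>
#include <iostream>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical portable counterpart of Profiler, based on std::chrono.
class ChronoProfiler
{
    public:
        explicit ChronoProfiler(std::string what, std::ostream& out = std::cout)
            : what_(std::move(what)),
              out_(out),
              start_(std::chrono::steady_clock::now())
        {
        }

        ChronoProfiler(const ChronoProfiler&) = delete;
        ChronoProfiler& operator=(const ChronoProfiler&) = delete;

        ~ChronoProfiler()
        {
            try
            {
                // steady_clock is monotonic, so the difference is never negative
                const std::chrono::duration<double, std::milli> elapsed =
                    std::chrono::steady_clock::now() - start_;
                out_ << what_ << " running time: "
                     << elapsed.count() << " ms\n";
            }
            catch (...) {} // never let an exception escape a destructor
        }

    private:
        std::string what_;
        std::ostream& out_;
        std::chrono::steady_clock::time_point start_;
};

// Profiles an empty scope and returns what the profiler wrote.
std::string timeEmptyScope()
{
    std::ostringstream log;
    {
        ChronoProfiler profiler("empty scope", log);
    }   // the destructor runs here and writes into log
    return log.str();
}
```

The usage is the same as before: declare a ChronoProfiler at the top of the scope you want to measure and let the closing brace do the rest. Passing an std::ostringstream instead of std::cout, as timeEmptyScope does, makes the output easy to capture.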

Friday 29 October 2010

Strongly Type



Does this program compile?

#include <iostream>
#include <string>
//in a dusty corner of your library
#define false 1
#define true 0
using std::string;
using std::endl;
using std::cout;
void foo(bool aBool,bool bBool,const string& aString,const string& bString)
{
    cout << "aBool ="   << aBool << endl;
    cout << "bBool ="   << bBool << endl;
    cout << "aString =" << aString << endl;
    cout << "bString =" << bString << endl;
}
int main (int argc, char *argv[])
{
    foo(false,true,"A string","B string");
    foo(false,"A string",true,"B string");
    foo("A String",false,true,"B string");
    return 0;
}

I have tried it on different compilers and I haven't got any error or warning. As you can see, a function that takes two bools and two const references to string can be fed with the parameters in the wrong order and the compiler does not complain.


If you run it you will get an output like this:


aBool =1
bBool =0
aString =A string
bString =B string
terminate called after throwing an instance of 'std::logic_error'
  what():  basic_string::_S_construct NULL not valid


Are you surprised? I was when I bumped into this error a few days ago. Let's try to understand what is going on and why this meaningless code compiles fine.
Let's start with bool. A string literal is an array of const char that decays to a const char* when it is passed to a function; in other words, we type "A string" and the function receives a pointer to read-only memory.
Any pointer can be implicitly converted to bool: a null pointer becomes false and any other pointer becomes true. So it is not surprising that a string literal, whose address is never null, is automatically converted to the bool value true.
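
The conversions involved can be checked directly. This is a small sketch assuming a C++11 compiler (for type_traits and nullptr); receivedAsBool is a hypothetical helper written only for this illustration.

```cpp
#include <string>
#include <type_traits>

// Hypothetical helper: returns the bool that the caller's argument became.
bool receivedAsBool(bool flag)
{
    return flag;
}

// A pointer converts to bool implicitly...
static_assert(std::is_convertible<const char*, bool>::value,
              "const char* converts to bool");
// ...but a std::string object does not.
static_assert(!std::is_convertible<std::string, bool>::value,
              "std::string does not convert to bool");
```

Calling receivedAsBool("A string") compiles and returns true, while passing a std::string variable to the same function is rejected by the compiler.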


Let's face the hard bit now. The std::string class has these constructors available:
  1. string ( );
  2. string ( const string& str );
  3. string ( const string& str, size_t pos, size_t n = npos );
  4. string ( const char * s, size_t n );
  5. string ( const char * s );
  6. string ( size_t n, char c );


Let's exclude n. 1, 3, 4 and 6 because they cannot be called with exactly one argument, so they are not suitable candidates for the implicit conversion. We need to choose between the copy constructor (const reference to string) and the overload of the constructor that takes a const char*.


Who is guilty? At this point we still don't know, but we can speculate and try to perform a test that gives us an answer. The standard says that a reference CANNOT be null; it must always refer to something. If you have sharp eyes, you have already noticed that we replaced the string parameter with 0 (remember #define true 0). So it cannot be the copy constructor, because a reference CANNOT be null; the literal 0, however, is a valid null pointer constant, so it converts to const char*. A non-zero integer does not, so if we write:


    foo(false,"A string",false,"B string");


The compiler will spot the error:

StronglyType/main.cpp(0,0): Error: invalid conversion from ‘int’ to ‘const char*’ (StronglyType)

The same problem will occur if the function foo has different signature (accepting string by value instead of by reference).


The moral of the story is: don't feed a function with string literals or constants; define the variables before calling the function.



This code, as we want, does not compile, because a std::string variable cannot be converted to bool:


int main (int argc, char *argv[])
{
    bool false_(false);
    bool true_(true);
    string aString("A string");
    string bString("B string");
    foo(false_,aString,true_,bString);
    return 0;
}


I know, you are thinking: hang on, you are telling me that I need to type 5 lines of code instead of one!


Well, writing 4 extra lines of code will take 30 seconds of your life. Losing the type-safety checks that the compiler guarantees can cost you hours of nasty debugging. Your choice.



Last but not least, the big question. Is it happening just with string or also with other objects?


class Bar
{
    public:
        explicit Bar(const char* value) : value_(value) {}
        Bar(const Bar& other) : value_(other.getValue()) {}
        string getValue() const { return value_; }
    private:
        string value_;
};
void foo(bool aBool,bool bBool,const Bar& aBar,const Bar& bBar)
{
  // some clever stuff
}
int main (int argc, char *argv[])
{
    foo(false, "A Bar", true, "B Bar");
    return 0;
}


This code does not compile. The keyword explicit in front of the one-parameter constructor causes a compilation error. If you remove the keyword, we get the same behaviour as with std::string.
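
The effect of the keyword can be observed without triggering the compilation error, by asking the compiler whether the implicit conversion exists. A minimal sketch, assuming C++11; ExplicitBar, ImplicitBar and valueOf are hypothetical names used only here.

```cpp
#include <string>
#include <type_traits>

// Hypothetical class with an explicit one-parameter constructor.
class ExplicitBar
{
    public:
        explicit ExplicitBar(const char* value) : value_(value) {}
        std::string getValue() const { return value_; }
    private:
        std::string value_;
};

// Hypothetical class with the same constructor, minus the keyword.
class ImplicitBar
{
    public:
        ImplicitBar(const char* value) : value_(value) {}
        std::string getValue() const { return value_; }
    private:
        std::string value_;
};

// A string literal is silently turned into an ImplicitBar here.
std::string valueOf(const ImplicitBar& bar)
{
    return bar.getValue();
}

// Direct construction works for both classes...
static_assert(std::is_constructible<ExplicitBar, const char*>::value,
              "ExplicitBar can be built from a const char* on request");
// ...but only ImplicitBar accepts the silent conversion.
static_assert(!std::is_convertible<const char*, ExplicitBar>::value,
              "no implicit conversion to ExplicitBar");
static_assert(std::is_convertible<const char*, ImplicitBar>::value,
              "implicit conversion to ImplicitBar");
```

valueOf("A Bar") compiles and returns "A Bar"; the same function written for ExplicitBar would force the caller to spell out ExplicitBar("A Bar"), which is exactly the check we want from the compiler.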


For more information about the explicit keyword I recommend this link: http://msdn.microsoft.com/en-us/library/h1y7x448.aspx


Note: in order to compile it using gcc version 4.4.3 I added the two #define lines at the top. Other compilers are not so smart, and the code compiles even without the two macros (by the way, a good reason not to use macros to define constants). If you run the example with a different compiler, let me know whether the program also compiles without the two #define lines.