Best Practices for Writing Efficient and Reliable Code with C++/CLI

Kenny Kerr

May 2006

Applies to:
   C++/CLI
   Visual C++ 2005

Summary: Learn patterns and practices used by expert programmers to write efficient and reliable code with C++/CLI. (18 printed pages)

Contents

Introduction
Prefer for each Statements to Hand-Written Loops
Practice Safe Resource Management
Mix Types Safely and Correctly
Take Advantage of C++ Interop
Conclusion

Introduction

Visual C++ 2005 provides a wealth of features that allow you to build sophisticated applications without limits. It can, however, be challenging to write efficient and reliable code, because it can be easier to produce poorly written managed code with C++ than with some of the newer and simpler languages. C++/CLI (Common Language Infrastructure) was designed to bring C++ to .NET as a first-class language for developing managed code applications, and specifically to simplify writing managed code with C++. This article walks through a number of best practices for writing efficient and reliable code with C++/CLI.

Prefer for each Statements to Hand-Written Loops

C++/CLI introduces the for each statement, which is analogous to the for_each function from the Standard C++ Library. Embrace it. It simplifies the task of iterating over a collection, and provides an abstraction that covers all kinds of managed and native containers. If you think the for statement is just fine as your default choice, think again.

Let's pretend that Jim the software developer is writing an application that needs to calculate the average of a collection of double-precision floating-point numbers. Initially, he only needs to average CLI arrays, and therefore he writes an Average function, as follows.

double Average(array<double>^ collection)
{
    ASSERT(nullptr != collection);

    int count = 0;
    double sum = 0;

    for (int index = 0; index < collection->Length; ++index)
    {
        ++count;
        sum += collection[index];
    }

    return sum / count;
}

His first attempt actually included an off-by-one error, but fortunately he's a true believer in unit tests, and caught it quickly. The for statement makes him feel comfortable. All is good.

The next day, Jim comes across another place in the application that needs to calculate an average, and feels confident that he'll be able to reuse the Average function—until he realizes that, in this case, the collection of numbers is coming from some other component, and is represented by the ICollection interface. He doesn't want to support two implementations of the same algorithm, and opts to look for a polymorphic solution. Fortunately, both CLI arrays and the ICollection interface provide the IEnumerable interface as part of their contracts. His second attempt is as follows.

double Average(IEnumerable^ collection)
{
    ASSERT(nullptr != collection);

    msclr::auto_handle<IEnumerator> enumerator = collection->GetEnumerator();

    int count = 0;
    double sum = 0;

    while (enumerator->MoveNext())
    {
        ++count;
        sum += safe_cast<double>(enumerator->Current);
    }

    return sum / count;
}

Jim is slightly less pleased with himself today. The good old for statement is gone, and he had to use the auto_handle template class, just in case the enumerator provided by the collection owns resources that need to be disposed. He's also not so comfortable paying the cost of boxing and unboxing the numbers read from the collection in cases where he passes the Average function a CLI array; however, he figures he'll profile it later to determine whether it's worth optimizing.

The application ships, and Jim's customers applaud, so it's time to get ready for the next version. Before beginning on new features, he figures it's a good time for some refactoring. If he can convert all the code that uses collections to use the new generic collections that shipped in the .NET Framework 2.0, he can avoid the cost of boxing, and his code will be more type-safe to boot. After converting most of the code, he realizes that the semantics of the generic collections aren't exactly the same as those of their non-generic counterparts, and he is forced to leave some collections alone in order to avoid destabilizing the product. Of course, he doesn't want to waste all the effort he put into converting so many of the collections, so he decides to provide an overload of the Average function for generic collections, so that at least those collections can be averaged without any boxing overhead.

double Average(IEnumerable<double>^ collection)
{
    ASSERT(nullptr != collection);

    msclr::auto_handle<IEnumerator<double>> enumerator = collection->GetEnumerator();

    int count = 0;
    double sum = 0;

    while (enumerator->MoveNext())
    {
        ++count;
        sum += enumerator->Current;
    }

    return sum / count;
}

At this point, he's starting to wish he was using one of those newer languages with simple constructs, like foreach statements, that magically just do the right thing no matter what collection is in use. After some good coffee, he pulls himself together and sends an e-mail message to a teammate, asking her to review his code. A few minutes pass, and she e-mails back. The e-mail message doesn't contain any comments on his code. Instead, it just contains the following function.

template <typename T>
double Average(T^ collection)
{
    ASSERT(nullptr != collection);
    
    int count = 0;
    double sum = 0;

    for each (double element in collection)
    {
        ++count;
        sum += element;
    }

    return sum / count;
}

He looks on in amazement, wondering how he could ever have thought of switching to one of those "other" languages. If only he had thought of this sooner, it would have saved him a lot of time.

When the collection is a CLI array, the for each statement expands to the equivalent of an efficient indexed for statement. For both generic and non-generic collections it uses an enumerator, and for generic collections it avoids boxing by using the generic enumeration interfaces. The template function simply ensures that the compiler has the type information it needs to produce a different implementation for each collection type on demand—something that is not available in languages that do not support static polymorphism.
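To make this concrete, here is a minimal usage sketch (not from the article; the namespaces, sample values, and Demo function are assumptions) that feeds the same template Average function three different kinds of collections.

using namespace System;
using namespace System::Collections;           // ArrayList, IEnumerable
using namespace System::Collections::Generic;  // List<T>, IEnumerable<T>

void Demo()
{
    array<double>^ numbers = gcnew array<double> { 1.0, 2.0, 3.0 };

    List<double>^ generic = gcnew List<double>(numbers);

    ArrayList^ nonGeneric = gcnew ArrayList();
    nonGeneric->Add(1.0);
    nonGeneric->Add(2.0);
    nonGeneric->Add(3.0);

    Console::WriteLine(Average(numbers));     // expands to an indexed for loop
    Console::WriteLine(Average(generic));     // uses IEnumerator<double>; no boxing
    Console::WriteLine(Average(nonGeneric));  // uses IEnumerator; elements are unboxed
}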

Embrace the for each statement. It will save you time, and the compiler will make sure that the best possible code is produced.

Practice Safe Resource Management

The Common Language Runtime (CLR) does a great job of transparently managing virtual memory so that you, as the developer, do not need to concern yourself with code for allocating and freeing memory. What it does not handle are the resources that your application uses that are not simply values or objects stored in managed memory. These resources include database connections; operating system handles such as those representing open files and other kernel objects; and memory that is not managed by the CLR, such as you might encounter when using native functions and allocating objects using the new operator. Writing correct and reliable code is, first and foremost, about safely managing these resources.

The CLI defines a pattern for resource management that developers must follow in order to ensure reliable resource management. Different languages provide varying degrees of support for this pattern, but as far as the CLR is concerned, it does not exist. That's okay. Some commercially successful runtimes prior to the CLR didn't even know what an object was, let alone destructors and stack semantics. As it was in the past, so it is today: developers need to use language features to ensure that resources are managed correctly. Luckily for the C++ developer, this language does a great job of providing a solid foundation for resource management, and C++/CLI extends this to encompass managed types.

The key is to ensure that all resources are contained within classes that embody the CLI's resource management pattern, as defined by the IDisposable interface. Implementing this interface correctly can be challenging, particularly when it comes to object hierarchies. Fortunately, C++/CLI takes care of the details and leaves you only to write a classic destructor for the class that is wrapping the resource. About the only thing that might be new to you is that destructors need to be implemented in such a way that they can be called multiple times. Luckily, this is not a big deal in most cases.
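To make the pattern concrete, here is a minimal sketch (not from the article; the class name and the use of a C runtime FILE* are assumptions for illustration) of a ref class wrapping a native resource. The destructor compiles down to the Dispose method and is written so that calling it more than once is harmless.

#include <cstdio>   // FILE, fopen, fclose

ref class LogFile
{
public:

    LogFile(const char* path) : m_file(fopen(path, "a")) {}

    // Compiled as Dispose; written so that it is safe to call more than once.
    ~LogFile()
    {
        if (0 != m_file)
        {
            fclose(m_file);
            m_file = 0;
        }
    }

private:

    FILE* m_file;
};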

Let's take a look at a common scenario.

What do you think about when starting a new project that will require a lot of database interaction? Do you come up with a pattern for database access, or do you just start coding with ADO.NET? Consider a simple function for printing order information. If your experience is predominantly with one of the other languages that produce managed code, or if you're just thinking very literally in terms of the CLI, you might come up with something like the following.

void PrintOrderInfo(String^ connectionString)
{
    SqlConnection^ connection = gcnew SqlConnection(connectionString);

    try
    {
        connection->Open();
        SqlCommand^ command = connection->CreateCommand();

        try
        {
            command->CommandType = CommandType::StoredProcedure;
            command->CommandText = "GetOrders";
            SqlDataReader^ reader = command->ExecuteReader();

            try
            {
                while (reader->Read())
                {
                    Console::WriteLine(reader["Description"]);
                    // Print additional fields...
                }
            }
            finally
            {
                delete reader;
            }
        }
        finally
        {
            delete command;
        }
    }
    finally
    {
        delete connection;
    }
}

Although this example is free from errors, it is neither simple nor maintainable. Let's take a look at how it can be improved upon by using C++/CLI language and library features.

The first thing we can do is use stack semantics for reference types. This can only be used for objects created locally—not for handles returned from a function, for example—but it is, nevertheless, a powerful language feature. Using this approach, we avoid the outer try and finally blocks.

void PrintOrderInfo(String^ connectionString)
{
    SqlConnection connection(connectionString);
    connection.Open();
    SqlCommand^ command = connection.CreateCommand();

    try
    {
        command->CommandType = CommandType::StoredProcedure;
        command->CommandText = "GetOrders";
        SqlDataReader^ reader = command->ExecuteReader();

        try
        {
            while (reader->Read())
            {
                Console::WriteLine(reader["Description"]);
                // Print additional fields...
            }
        }
        finally
        {
            delete reader;
        }
    }
    finally
    {
        delete command;
    }
}

Although this is an improvement, we can do better. We cannot employ stack semantics for objects returned from functions as handles, but we can still make use of Resource Acquisition Is Initialization (RAII) by providing a class that wraps the handle and provides automatic resource management. Visual C++ 2005 provides just such a class in the form of the auto_handle class in the <msclr\auto_handle.h> header file. auto_handle is analogous to the auto_ptr class from the Standard C++ Library, and provides the same semantics for CLI handles.

void PrintOrderInfo(String^ connectionString)
{
    SqlConnection connection(connectionString);
    connection.Open();

    msclr::auto_handle<SqlCommand> command = connection.CreateCommand();
    command->CommandType = CommandType::StoredProcedure;
    command->CommandText = "GetOrders";

    msclr::auto_handle<SqlDataReader> reader = command->ExecuteReader();

    while (reader->Read())
    {
        Console::WriteLine(reader.get()["Description"]);
        // Print additional fields...
    }
}

As clean and simple as this is, compared to what we started with, there is still a lot of boilerplate code that you need to write; and, if you consider that a non-trivial application may have dozens—or even hundreds—of similar functions, this can still be tedious, if not error-prone. What if you needed to execute a command in the context of a transaction? What if a subset of your database operations needs to disable connection pooling? A little abstraction can go a long way.

typedef msclr::auto_handle<SqlCommand> Command;
typedef msclr::auto_handle<SqlDataReader> DataReader;

[Flags]
enum class ConnectionOptions
{
    None        = 0x0000,
    Transaction = 0x0001,
    DoNotPool   = 0x0002
};

ref class Connection
{
public:

    Connection(String^ connectionString)
    {
        ASSERT(nullptr != connectionString);

        m_connection.ConnectionString = connectionString;
        m_connection.Open();
    }

    Connection(String^ connectionString,
               ConnectionOptions options)
    {
        ASSERT(nullptr != connectionString);

        if (ConnectionOptions::DoNotPool == (ConnectionOptions::DoNotPool & options))
        {
            connectionString = "Pooling=false;" + connectionString;
        }

        m_connection.ConnectionString = connectionString;
        m_connection.Open();

        if (ConnectionOptions::Transaction == (ConnectionOptions::Transaction & options))
        {
            m_transaction.reset(m_connection.BeginTransaction(IsolationLevel::ReadCommitted));
        }
    }

    Command CreateCommand(String^ commandText)
    {
        ASSERT(nullptr != commandText);

        Command command = m_connection.CreateCommand();
        command->CommandType = CommandType::StoredProcedure;
        command->CommandText = commandText;
        command->Transaction = m_transaction.get();
        return command;
    }

    void Commit()
    {
        ASSERT(nullptr != m_transaction.get());
        m_transaction->Commit();
    }

private:

    SqlConnection m_connection;
    msclr::auto_handle<SqlTransaction> m_transaction;
};

With the help of these type definitions, the PrintOrderInfo function is simplified even further.

void PrintOrderInfo(String^ connectionString)
{
    Connection connection(connectionString);
    Command command = connection.CreateCommand("GetOrders");
    DataReader reader = command->ExecuteReader();

    while (reader->Read())
    {
        Console::WriteLine(reader.get()["Description"]);
        // Print additional fields...
    }
}

Not only that, but adding a transaction to the mix is trivial.

void AddOrderForNewUser(String^ connectionString /* additional parameters */)
{
    Connection connection(connectionString,
                          ConnectionOptions::Transaction);

    Command addUser = connection.CreateCommand("AddUser");
    // Add command parameters...
    addUser->ExecuteNonQuery();

    Command addOrder = connection.CreateCommand("AddOrder");
    // Add command parameters...
    addOrder->ExecuteNonQuery();

    connection.Commit();
}

Have a strategy for resource management before you start coding. Use language and library features to simplify your code. You will save yourself time, while producing more-reliable and more-maintainable projects, with less hand-written code.

Mix Types Safely and Correctly

Chances are that one of the reasons you are using C++ to write managed code is because you appreciate the ability to reuse existing libraries and headers written in C and native C++. But how exactly do you reuse these existing libraries? How can you write managed code and still make use of classes from a C++ library, or call exported functions without resorting to Platform Invoke (P/Invoke)? In this section, we'll take a look at mixing managed and native types. In the next section, we'll take a look at additional techniques for interoperating with native code.

One of the first challenges you might encounter when trying to make use of your native types from within managed types is how to allocate and store them within the realm of the managed heap. After all, managed types are created on the managed heap, where the rules that native types take for granted are simply not available—namely, stable addresses in a process's address space.

Let's take a look at how we can solve this problem. The following mix just won't work.

class NativeType
{
    // ...
};

ref class ManagedType
{
    NativeType m_native;
};

The compiler will display an error indicating that "mixed types are not supported." Well, mixing types in this manner is not supported, but mixed types are, in fact, possible. Consider the following alternative.

ref class ManagedType
{
    NativeType* m_native;
};

See the difference? Although the CLI doesn't support native types as members of managed types directly, it doesn't have a problem storing a pointer to a native type. After all, a native pointer is just a 4-byte or 8-byte value, depending on your platform. Of course, now we're once again faced with a resource management problem, but one that can easily be solved with a bit of help from C++/CLI. A naïve solution would be to simply add a destructor to the managed type, to take care of deleting the pointer.

ref class ManagedType
{
    NativeType* m_native;

    ~ManagedType()
    {
        delete m_native;
    }
};

Although this code is technically correct, it does not lead to safe and reliable code. What we need is a managed template class, much like the auto_handle class that we used in the previous section, but one that can manage the lifetime of a native pointer instead of a managed object handle, so that we can write something like the following.

ref class ManagedType
{
    AutoPtr<NativeType> m_native;
};

Assuming that AutoPtr is itself a managed type, it can be declared as a member of another managed type, and its destructor, which takes care of deleting the native pointer, will automatically be called when the containing object is disposed. This is because the C++ compiler provides automatic destruction of member variables, even when the enclosing type and its members are managed types. Although Visual C++ does not currently supply such a smart pointer, it is reasonably easy to write one yourself. The following is a simple implementation.

template <typename T>
ref struct AutoPtr
{
    AutoPtr() : m_ptr(0) {}
    AutoPtr(T* ptr) : m_ptr(ptr) {}
    AutoPtr(AutoPtr<T>% right) : m_ptr(right.Release()) {}

    ~AutoPtr()
    {
        delete m_ptr;
        m_ptr = 0;
    }
    !AutoPtr()
    {
        ASSERT(0 == m_ptr);
        delete m_ptr;
    }
    T* operator->()
    {
        ASSERT(0 != m_ptr);
        return m_ptr;
    }

    T* Get()
    {
        return m_ptr;
    }
    T* Release()
    {
        T* released = m_ptr;
        m_ptr = 0;
        return released;
    }
    void Reset()
    {
        Reset(0);
    }
    void Reset(T* ptr)
    {
        if (ptr != m_ptr)
        {
            delete m_ptr;
            m_ptr = ptr;
        }
    }

private:
    T* m_ptr;
};

As with the auto_ptr template class from the Standard C++ Library, the AutoPtr class shown in the preceding code provides transfer-of-ownership semantics, and should be used accordingly. This means that you should only ever declare AutoPtr objects "by-value"—either on the stack or, more interestingly, as members of managed types. Creating AutoPtr objects on the managed heap directly will only lead to trouble.
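For example, here is a minimal sketch (reusing the NativeType and ManagedType names from earlier; the constructor body is an assumption) of how the AutoPtr shown above is meant to be used.

ref class ManagedType
{
public:

    ManagedType() : m_native(new NativeType) {}

    // No hand-written destructor is needed: when a ManagedType object is
    // disposed, the compiler-generated destructor runs the destructor of
    // m_native, which in turn deletes the native pointer.

private:

    AutoPtr<NativeType> m_native;
};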

What about storing managed types within native types? Luckily, Visual C++ has it all taken care of.

The gcroot native template class provides a type-safe garbage collected root. In other words, it allows you to store a handle to a managed object on the native heap, in such a way that the CLR will know that the handle exists, update it as the managed heap is compacted, and not reclaim the object's memory prematurely. There is no magic here. Although gcroot is only a class included in the Visual C++ libraries, it wraps the GCHandle type from the .NET Framework, which does the actual work of managing the native pointers used to look up the objects in the managed heap. You can find the gcroot class in the <vcclr.h> header file.

ref class ManagedType
{
    // ...
};

class NativeType
{
    gcroot<ManagedType^> m_managed;
};
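As noted above, there is no magic here. The following is a rough, simplified sketch (an assumption about the general approach, not the actual <vcclr.h> implementation) of how a gcroot-like class can keep a managed object alive from a plain native field by way of GCHandle; copying and error handling are ignored.

using namespace System;
using namespace System::Runtime::InteropServices;

class RoughGcRoot
{
public:

    RoughGcRoot(Object^ object)
        : m_handle(GCHandle::ToIntPtr(GCHandle::Alloc(object)).ToPointer())
    {
    }

    ~RoughGcRoot()
    {
        GCHandle handle = GCHandle::FromIntPtr(IntPtr(m_handle));
        handle.Free();
    }

    Object^ Get()
    {
        GCHandle handle = GCHandle::FromIntPtr(IntPtr(m_handle));
        return handle.Target;
    }

private:

    void* m_handle; // stable native value; the CLR tracks the object it refers to
};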

The problem with gcroot is that it does nothing more than provide a type-safe, CLR-aware object handle for use on the native heap. It's not much better than a raw pointer in this regard. What about managed types that need to be disposed? For that, you need to look elsewhere. Enter the auto_gcroot class, defined in the <msclr\auto_gcroot.h> header file. The auto_gcroot native template class provides the same transfer-of-ownership semantics as the auto_ptr class from the Standard C++ Library. As with the gcroot class, its purpose is to simplify the task of storing managed types within native types, but it also takes care of disposing of the object that it owns.

ref class ManagedType
{
    ~ManagedType();
};

class NativeType
{
    msclr::auto_gcroot<ManagedType^> m_managed;
};

Mixing types is indeed possible, and it can be done safely and reliably, provided that you take into account the resource management and ownership issues discussed in this section. Use an AutoPtr-like class for embedding native types within managed types. Use gcroot and auto_gcroot, as appropriate, for embedding managed types within native types. Avoid hand-written destructors when the compiler is more than happy to write them for you.

Take Advantage of C++ Interop

In the previous section, we took a look at how you can mix managed and native types. In this section, we're going to focus on mixing managed and natively-compiled code.

No doubt you've heard of P/Invoke, and you may be quite familiar with it. P/Invoke is the name given to a set of services that allow managed code to call natively-compiled functions. It is certainly vital for languages such as C# and Visual Basic, enabling them to call functions from the Windows SDK, use COM classes, and call functions exported from DLLs. Although P/Invoke can be used from C++, there is an alternative that is not only simpler in most cases, but can also be faster and more powerful.

Enter C++ Interop. In some older documentation, C++ Interop is referred to as It Just Works (IJW). Compared to P/Invoke, C++ Interop is almost completely invisible; you might be using it today without even knowing it. One of the main drawbacks of P/Invoke is that it is not type-safe: you need to re-declare the type signatures and marshalling semantics, and any error on your part is only reported at run time. C++ Interop, by contrast, makes use of the native definitions directly, benefiting from the type information that is already available and reducing these kinds of errors up front. P/Invoke, for its part, does a good job of providing a reliable mechanism for marshalling managed type parameters to appropriate native types, whereas C++ Interop puts performance first and leaves explicit marshalling to the programmer.
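As a hedged illustration (the ShowMessage wrapper, the choice of MessageBoxW, and the Greet function are mine, not part of the article or of any library): with P/Invoke you re-declare the native signature by hand, whereas with C++ Interop you include the native header and call the function the compiler already knows about.

#include <windows.h>   // C++ Interop: the real declaration of MessageBoxW

using namespace System;
using namespace System::Runtime::InteropServices;

// P/Invoke: the signature is re-declared by hand; a mistake in this
// declaration is only reported at run time.
[DllImport("user32.dll", CharSet = CharSet::Unicode, EntryPoint = "MessageBoxW")]
int ShowMessage(IntPtr window, String^ text, String^ caption, unsigned int type);

void Greet()
{
    // C++ Interop: call the native function directly; the compiler checks
    // the call against the declaration in windows.h.
    ::MessageBoxW(0, L"Hello from C++ Interop", L"Greeting", MB_OK);

    // P/Invoke: the same native function, through the hand-written declaration.
    ShowMessage(IntPtr::Zero, "Hello from P/Invoke", "Greeting", MB_OK);
}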

By default, the compiler will attempt to compile all member and non-member functions to managed code—in other words, MSIL. This is regardless of whether the function belongs to a managed or native type. If you think about it, this makes sense, because otherwise it would be very hard to make use of managed types from existing code. But what if you really want to compile a given function natively? Well, there are a few things that you can do. Obviously, one choice is to separate the native functions into their own DLL. Another solution is to simply separate the native functions into a separate source file, and compile the source file without the /clr switch, so that it is compiled natively while the rest of the project is compiled to managed code. The compiler takes care of automatically injecting the necessary thunks to allow managed code to make calls to these natively-compiled functions, without your having to do anything special in your code. Yet another option is to use pragmas to delineate managed and native code in a single source file.

#pragma unmanaged

void NativeFunction()
{
    // ...
}

#pragma managed

void ManagedFunction()
{
    NativeFunction();
}

As you can see, calling native code from managed code couldn't be simpler. One thing to keep in mind is that, even though C++ Interop will, in general, provide better performance when compared to P/Invoke, there is still a cost to the managed-to-native transition; therefore, keeping transitions to a minimum will help performance. Consider the following slightly contrived example.

#pragma unmanaged

void NativeFunction()
{
    // ...
}

#pragma managed

void ManagedFunction()
{
    for (int index = 0; index < 100000000; ++index)
    {
        NativeFunction();
    }
}

Because the native function is called inside a tight loop, the overhead of repeatedly transitioning between managed and native code may outweigh the cost of calling the function itself. Considerably better performance can be achieved simply by moving the loop into native code.

#pragma unmanaged

void NativeFunction()
{
    for (int index = 0; index < 100000000; ++index)
    {
        // ...
    }
}

#pragma managed

void ManagedFunction()
{
    NativeFunction();
}

C++ Interop can also greatly simplify access to COM classes, which would traditionally require a separate assembly generated by the Type Library Importer tool.
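For example, here is a hedged sketch (the CreateShortcut function and the choice of the ShellLink COM class are illustrative assumptions, and error handling is elided): because the native headers supply the interface definitions, code compiled with /clr can create and call a COM object directly, with no interop assembly in sight.

#include <shlobj.h>    // ShellLink coclass, IShellLinkW
#include <objbase.h>   // CoInitialize, CoCreateInstance
#pragma comment(lib, "ole32.lib")

void CreateShortcut()
{
    CoInitialize(0);

    IShellLinkW* link = 0;

    if (SUCCEEDED(CoCreateInstance(__uuidof(ShellLink),
                                   0,
                                   CLSCTX_INPROC_SERVER,
                                   __uuidof(IShellLinkW),
                                   reinterpret_cast<void**>(&link))))
    {
        link->SetPath(L"C:\\Windows\\notepad.exe");

        // Persisting the link to disk via IPersistFile is omitted from this sketch.

        link->Release();
    }

    CoUninitialize();
}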

A future article may cover C++ Interop in much greater detail than I can offer in this short section. C++ Interop is not the answer to every interop challenge, but it certainly comes close. Know what C++ Interop offers. Know what P/Invoke offers. Plan your managed–unmanaged transitions, and use the appropriate techniques to provide the best performance. Don't assume that using native code will necessarily provide better performance.

Conclusion

Visual C++ has a great deal to offer to managed code. It includes the most advanced optimizer for generating managed code, and, combined with language and library features, it can enable you to build rich applications that have great performance, and that are reliable and maintainable.

 

About the author

Kenny Kerr spends most of his time designing and building distributed applications for the Microsoft Windows platform. He also has a particular passion for C++ and security programming. Reach Kenny at https://weblogs.asp.net/kennykerr/ or visit his Web site: https://www.kennyandkarin.com/Kenny/.