How choice of Data Structures and Algorithms affect Performance

Saturday, February 7, 2009


The main data structure issues that you need to consider are as under:

Table 1



Choosing a collection without evaluating your needs (size, adding, deleting, updating)

Reduced efficiency; overly complex code.

Using the wrong collection for a given task

Reduced efficiency; overly complex code.

Excessive type conversion

Passing value type to reference type causing boxing and unboxing overhead, causing performance hit.

Inefficient lookups

Complete scan of all the content in the data structure, resulting in slow performance.

Not measuring the cost of your data structures or algorithms in your actual scenarios

Undetected bottlenecks due to inefficient code,


What are Collections

There are two basic types of collections: lists and dictionaries. Lists are index-based. Dictionaries are key-based, which means you store a key with the value. Table 2 summarizes the various list and dictionary types provided by the .NET Framework class libraries.

Collections are types that implement the IEnumerable, ICollection, or IList interfaces. If the types implement IDictionary separately or in addition to these three interfaces, they are referred to as Dictionaries. Table 2 lists the guidelines for each of these collection types.

Table 2




This is a dynamically sizable array. It is useful when you do not know the required array size at design time.


This is a collection of key/value pairs that are organized based on the hash code of the key. It is appropriate when you need to search but not sort.


This uses a ListDictionary when the collection is small, and switches to Hashtable when the collection gets large.


This is useful for storing 10 or less key/value pairs.


This is a sorted collection of associated String keys and String values that can be accessed either with the key or with the index.


This is a first-in, first-out collection that implements ICollection.


This is a collection of key/value pairs that are sorted by the keys and are accessible by key and by index.


This is a simple last-in, first-out collection of objects.


This is a strongly typed array list for strings.


This is a hash table with the key strongly typed to be a string rather than an object.


The correct use of data structures and algorithms plays a significant role in building high-performance applications. Your choices in these areas can significantly affect memory consumption and CPU loading.

The following guidelines help you to use efficient data structures and algorithms:

●    Choose an appropriate data structure.

●    Pre-assign size for large dynamic growth data types.

●    Use value and reference types appropriately.


Choose an Appropriate Data Structure

Before choosing the collection type for your scenarios, you should spend time analyzing your specific requirements by using the following common criteria:

●    Data storage. Consider how much data will be stored. Will you store a few records or a few thousand records? Do you know the amount of data to be stored ahead of time instead of at run time? How do you need to store the data? Does it need to be stored in order or randomly?

●    Type. What type of data do you need to store? Is it strongly typed data? Do you store variant objects or value types?

●    Growth. How will your data grow? What size of growth? What frequency?

●    Access. Do you need indexed access? Do you need to access data by using a key-value pair? Do you need sorting in addition to searching?

●    Concurrency. Does access to the data need to be synchronized? If the data is regularly updated, you need synchronized access. You may not need synchronization if the data is read-only.

●    Marshaling. Do you need to marshal your data structure across boundaries? For example, do you need to store your data in a cache or a session store? If so, you need to make sure that the data structure supports serialization in an efficient way.


Pre-Assign Size for Large Dynamic Growth Data Types

If you know that you need to add a lot of data to a dynamic data type, assign an approximate size up front wherever you can. This helps avoid unnecessary memory re-allocations.


Use Value and Reference Types Appropriately

Value types are stack-based and are passed by value, while reference types are heap-based and are passed by reference. Use the following guidelines when choosing between pass-by-value and pass-by-reference semantics:

●          Avoid passing large value types by value to local methods. If the target method is in the same process or application domain, the data is copied onto the stack. You can improve performance by passing a reference to a large structure through a method parameter, rather than passing the structure by value.

●          Consider passing reference types by value across process boundaries. If you pass an object reference across a process boundary, a callback to the client process is required each time the objects’ fields or methods are accessed. By passing the object by value, you avoid this overhead. If you pass a set of objects or a set of connected objects, make sure all of them can be passed by value.

●          Consider passing a reference type when the size of the object is very large or the state is relevant only within the current process boundaries. For example, objects that maintain handles to local server resources, such as files.



Filed Under: Guidelines

Add comment