You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Writing fast code requires understanding all aspects of your application and how it interacts with the system. This topic suggests alternatives to some of the more obvious coding techniques to help you ensure that the time-critical portions of your code perform satisfactorily.
10
+
Writing fast code requires understanding all aspects of your application and how it interacts with the system. This article suggests alternatives to some of the more obvious coding techniques to help you ensure that the time-critical portions of your code perform satisfactorily.
11
11
12
12
To summarize, improving time-critical code requires that you:
13
13
@@ -41,13 +41,13 @@ To gather information on the performance of your code, you can use the performan
41
41
42
42
Missed cache hits, on both the internal and external cache, as well as page faults (going to secondary storage for program instructions and data) slow the performance of a program.
43
43
44
-
A CPU cache hit can cost your program 10-20 clock cycles. An external cache hit can cost 20-40 clock cycles. A page fault can cost one million clock cycles (assuming a processor that handles 500 million instructions/second and a time of 2 millisecond for a page fault). Therefore, it is in the best interest of program execution to write code that will reduce the number of missed cache hits and page faults.
44
+
A CPU cache hit can cost your program 10-20 clock cycles. An external cache hit can cost 20-40 clock cycles. A page fault can cost a million clock cycles (assuming a processor that handles 500 million instructions/second and a time of 2 millisecond for a page fault). Therefore, it is in the best interest of program execution to write code that will reduce the number of missed cache hits and page faults.
45
45
46
-
One reason for slow programs is that they take more page faults or miss the cache more often than necessary. To avoid this, it's important to use data structures with good locality of reference, which means keeping related things together. Sometimes a data structure that looks great turns out to be horrible because of poor locality of reference, and sometimes the reverse is true. Here are two examples:
46
+
One reason for slow programs is that they take more page faults or miss the cache more often than necessary. To avoid this problem, it's important to use data structures with good locality of reference, which means keeping related things together. Sometimes a data structure that looks great turns out to be horrible because of poor locality of reference, and sometimes the reverse is true. Here are two examples:
47
47
48
-
- Dynamically allocated linked lists can reduce program performance because when you search for an item or when you traverse a list to the end, each skipped link could miss the cache or cause a page fault. A list implementation based on simple arrays might actually be much faster because of better caching and fewer page faults even — allowing for the fact that the array would be harder to grow, it still might be faster.
48
+
- Dynamically allocated linked lists can reduce program performance. When you search for an item, or when you traverse a list to the end, each skipped link could miss the cache or cause a page fault. A list implementation based on simple arrays might be faster because of better caching and fewer page faults. Even if you allow for the fact that the array would be harder to grow, it still might be faster.
49
49
50
-
- Hash tables that use dynamically allocated linked lists can degrade performance. By extension, hash tables that use dynamically allocated linked lists to store their contents might perform substantially worse. In fact, in the final analysis, a simple linear search through an array might actually be faster (depending on the circumstances). Array-based hash tables (so-called "closed hashing") is an often-overlooked implementation which frequently has superior performance.
50
+
- Hash tables that use dynamically allocated linked lists can degrade performance. By extension, hash tables that use dynamically allocated linked lists to store their contents might perform substantially worse. In fact, in the final analysis, a simple linear search through an array might actually be faster (depending on the circumstances). Use of an array-based hash table (so-called "closed hashing") is an often-overlooked implementation that frequently has superior performance.
51
51
52
52
## <aname="_core_sorting_and_searching"></a> Sorting and Searching
53
53
@@ -59,57 +59,57 @@ Sorting is inherently time consuming compared to many typical operations. The be
59
59
60
60
- Sort only the part of the data that truly needs sorting.
61
61
62
-
Sometimes, you can build the list in sorted order. Be careful, because if you need to insert data in sorted order, you may require a more complicated data structure with poor locality of reference, leading to cache misses and page faults. There is no approach that works in all cases. Try several approaches and measure the differences.
62
+
Sometimes, you can build the list in sorted order. Be careful, because if you need to insert data in sorted order, you may require a more complicated data structure with poor locality of reference, leading to cache misses and page faults. There's no approach that works in all cases. Try several approaches and measure the differences.
63
63
64
64
Here are some general tips for sorting:
65
65
66
66
- Use a stock sort to minimize bugs.
67
67
68
-
- Any work you can do beforehand to reduce the complexity of the sort is worthwhile. If a one-time pass over your data simplifies the comparisons and reduces the sort from O(n log n) to O(n), you will almost certainly come out ahead.
68
+
- Any work you can do beforehand to reduce the complexity of the sort is worthwhile. If a one-time pass over your data simplifies the comparisons and reduces the sort from O(n log n) to O(n), you'll almost certainly come out ahead.
69
69
70
70
- Think about the locality of reference of the sort algorithm and the data you expect it to run on.
71
71
72
-
There are fewer alternatives for searches than for sorting. If the search is time-critical, a binary search or hash table lookup is almost always best, but as with sorting, you must keep locality in mind. A linear search through a small array can be faster than a binary search through a data structure with a lot of pointers that causes page faults or cache misses.
72
+
There are fewer alternatives for searches than for sorting. If the search is time-critical, a binary search or hash table lookup is almost always best, but as with sorting, you must keep locality in mind. A linear search through a small array can be faster than a binary search through a data structure with many pointers that causes page faults or cache misses.
73
73
74
74
## <aname="_core_mfc_and_class_libraries"></a> MFC and Class Libraries
75
75
76
76
The Microsoft Foundation Classes (MFC) can greatly simplify writing code. When writing time-critical code, you should be aware of the overhead inherent in some of the classes. Examine the MFC code that your time-critical code uses to see if it meets your performance requirements. The following list identifies MFC classes and functions you should be aware of:
77
77
78
-
-`CString` MFC calls the C run-time library to allocate memory for a [CString](../atl-mfc-shared/reference/cstringt-class.md) dynamically. Generally speaking, `CString` is as efficient as any other dynamically-allocated string. As with any dynamically allocated string, it has the overhead of dynamic allocation and release. Often, a simple **`char`** array on the stack can serve the same purpose and is faster. Don't use a `CString` to store a constant string. Use `const char *` instead. Any operation you perform with a `CString` object has some overhead. Using the run-time library [string functions](../c-runtime-library/string-manipulation-crt.md) may be faster.
78
+
-`CString` MFC calls the C run-time library to allocate memory for a [`CString`](../atl-mfc-shared/reference/cstringt-class.md) dynamically. Generally speaking, `CString` is as efficient as any other dynamicallyallocated string. As with any dynamically allocated string, it has the overhead of dynamic allocation and release. Often, a simple **`char`** array on the stack can serve the same purpose and is faster. Don't use a `CString` to store a constant string. Use `const char *` instead. Any operation you perform with a `CString` object has some overhead. Using the run-time library [string functions](../c-runtime-library/string-manipulation-crt.md) may be faster.
79
79
80
-
-`CArray` A [CArray](../mfc/reference/carray-class.md) provides flexibility that a regular array doesn't, but your program may not need that. If you know the specific limits for the array, you can use a global fixed array instead. If you use `CArray`, use `CArray::SetSize` to establish its size and specify the number of elements by which it grows when a reallocation is necessary. Otherwise, adding elements can cause your array to be frequently reallocated and copied, which is inefficient and can fragment memory. Also be aware that if you insert an item into an array, `CArray` moves subsequent items in memory and may need to grow the array. These actions can cause cache misses and page faults. If you look through the code that MFC uses, you may see that you can write something more specific to your scenario to improve performance. Since `CArray` is a template, for example, you might provide `CArray` specializations for specific types.
80
+
-`CArray` A [`CArray`](../mfc/reference/carray-class.md) provides flexibility that a regular array doesn't, but your program may not need that. If you know the specific limits for the array, you can use a global fixed array instead. If you use `CArray`, use `CArray::SetSize` to establish its size and specify the number of elements by which it grows when a reallocation is necessary. Otherwise, adding elements can cause your array to be frequently reallocated and copied, which is inefficient and can fragment memory. Also, if you insert an item into an array, `CArray` moves subsequent items in memory and may need to grow the array. These actions can cause cache misses and page faults. If you look through the code that MFC uses, you may see that you can write something more specific to your scenario to improve performance. Since `CArray` is a template, for example, you might provide `CArray` specializations for specific types.
81
81
82
-
-`CList`[CList](../mfc/reference/clist-class.md) is a doubly linked list, so element insertion is fast at the head, tail, and at a known position (`POSITION`) in the list. Looking up an element by value or index requires a sequential search, however, which can be slow if the list is long. If your code does not require a doubly linked list you may want to reconsider using `CList`. Using a singly linked list saves the overhead of updating an additional pointer for all operations as well as the memory for that pointer. The additional memory is not great, but it is another opportunity for cache misses or page faults.
82
+
-`CList`[`CList`](../mfc/reference/clist-class.md) is a doubly linked list, so element insertion is fast at the head, tail, and at a known position (`POSITION`) in the list. Looking up an element by value or index requires a sequential search, however, which can be slow if the list is long. If your code doesn't require a doubly linked list, you may want to reconsider using `CList`. Using a singly linked list saves the overhead of updating another pointer for all operations and the memory for that pointer. The extra memory isn't large, but it's another opportunity for cache misses or page faults.
83
83
84
-
-`IsKindOf` This function can generate many calls and access a lot of memory in different data areas, leading to bad locality of reference. It is useful for a debug build (in an ASSERT call, for example), but try to avoid using it in a release build.
84
+
-`IsKindOf` This function can generate many calls and may access memory in different data areas, leading to bad locality of reference. It's useful for a debug build (in an ASSERT call, for example), but try to avoid using it in a release build.
85
85
86
-
-`PreTranslateMessage` Use `PreTranslateMessage` when a particular tree of windows needs different keyboard accelerators or when you must insert message handling into the message pump. `PreTranslateMessage` alters MFC dispatch messages. If you override `PreTranslateMessage`, do so only at the level needed. For example, it is not necessary to override `CMainFrame::PreTranslateMessage` if you are interested only in messages going to children of a particular view. Override `PreTranslateMessage` for the view class instead.
86
+
-`PreTranslateMessage` Use `PreTranslateMessage` when a particular tree of windows needs different keyboard accelerators or when you must insert message handling into the message pump. `PreTranslateMessage` alters MFC dispatch messages. If you override `PreTranslateMessage`, do so only at the level needed. For example, it isn't necessary to override `CMainFrame::PreTranslateMessage` if you're interested only in messages going to children of a particular view. Override `PreTranslateMessage` for the view class instead.
87
87
88
-
Do not circumvent the normal dispatch path by using `PreTranslateMessage` to handle any message sent to any window. Use [window procedures](../mfc/registering-window-classes.md) and MFC message maps for that purpose.
88
+
Don't circumvent the normal dispatch path by using `PreTranslateMessage` to handle any message sent to any window. Use [window procedures](../mfc/registering-window-classes.md) and MFC message maps for that purpose.
89
89
90
-
-`OnIdle` Idle events can occur at times you do not expect, such as between `WM_KEYDOWN` and `WM_KEYUP` events. Timers may be a more efficient way to trigger your code. Do not force `OnIdle` to be called repeatedly by generating false messages or by always returning `TRUE` from an override of `OnIdle`, which would never allow your thread to sleep. Again, a timer or a separate thread might be more appropriate.
90
+
-`OnIdle` Idle events can occur at times you don't expect, such as between `WM_KEYDOWN` and `WM_KEYUP` events. Timers may be a more efficient way to trigger your code. Don't force `OnIdle` to be called repeatedly by generating false messages or by always returning `TRUE` from an override of `OnIdle`, which would never allow your thread to sleep. Again, a timer or a separate thread might be more appropriate.
Code reuse is desirable. However, if you are going to use someone else's code, you should make sure you know exactly what it does in those cases where performance is critical to you. The best way to understand this is by stepping through the source code or by measuring with tools such as PView or Performance Monitor.
94
+
Code reuse is desirable. However, if you're going to use someone else's code, you should make sure you know exactly what it does in those cases where performance is critical to you. The best way to understand it is by stepping through the source code or by measuring with tools such as PView or Performance Monitor.
95
95
96
96
## <aname="_core_heaps"></a> Heaps
97
97
98
98
Use multiple heaps with discretion. Additional heaps created with `HeapCreate` and `HeapAlloc` let you manage and then dispose of a related set of allocations. Don't commit too much memory. If you're using multiple heaps, pay special attention to the amount of memory that is initially committed.
99
99
100
-
Instead of multiple heaps, you can use helper functions to interface between your code and the default heap. Helper functions facilitate custom allocation strategies that can improve the performance of your application. For example, if you frequently perform small allocations, you may want to localize these allocations to one part of the default heap. You can allocate a large block of memory and then use a helper function to suballocate from that block. If you do this, you will not have additional heaps with unused memory because the allocation is coming out of the default heap.
100
+
Instead of multiple heaps, you can use helper functions to interface between your code and the default heap. Helper functions facilitate custom allocation strategies that can improve the performance of your application. For example, if you frequently perform small allocations, you may want to localize these allocations to one part of the default heap. You can allocate a large block of memory and then use a helper function to suballocate from that block. Then you won't have multiple heaps with unused memory, because the allocation is coming out of the default heap.
101
101
102
102
In some cases, however, using the default heap can reduce locality of reference. Use Process Viewer, Spy++, or Performance Monitor to measure the effects of moving objects from heap to heap.
103
103
104
-
Measure your heaps so you can account for every allocation on the heap. Use the C run-time [debug heap routines](../c-runtime-library/crt-debug-heap-details.md) to checkpoint and dump your heap. You can read the output into a spreadsheet program like Microsoft Excel and use pivot tables to view the results. Note the total number, size, and distribution of allocations. Compare these with the size of working sets. Also look at the clustering of related-sized objects.
104
+
Measure your heaps so you can account for every allocation on the heap. Use the C run-time [debug heap routines](../c-runtime-library/crt-debug-heap-details.md) to checkpoint and dump your heap. You can read the output into a spreadsheet program like Microsoft Excel and use pivot tables to view the results. Note the total number, size, and distribution of allocations. Compare these results with the size of working sets. Also look at the clustering of related-sized objects.
105
105
106
106
You can also use the performance counters to monitor memory usage.
107
107
108
108
## <aname="_core_threads"></a> Threads
109
109
110
110
For background tasks, effective idle handling of events may be faster than using threads. It's easier to understand locality of reference in a single-threaded program.
111
111
112
-
A good rule of thumb is to use a thread only if an operating system notification that you block on is at the root of the background work. Threads are the best solution in such a case because it is impractical to block a main thread on an event.
112
+
A good rule of thumb is to use a thread only if an operating system notification that you block on is at the root of the background work. Threads are the best solution in such a case because it's impractical to block a main thread on an event.
113
113
114
114
Threads also present communication problems. You must manage the communication link between your threads, with a list of messages or by allocating and using shared memory. Managing the communication link usually requires synchronization to avoid race conditions and deadlock problems. This complexity can easily turn into bugs and performance problems.
The topics in this part of the documentation describe the important features of the Concurrency Runtime Task Scheduler. The Task Scheduler is useful when you want fine-tune the performance of your existing code that uses the Concurrency Runtime.
10
+
The topics in this part of the documentation describe the important features of the Concurrency Runtime Task Scheduler. The Task Scheduler is useful when you want to fine-tune the performance of your existing code that uses the Concurrency Runtime.
11
11
12
12
> [!IMPORTANT]
13
13
> The Task Scheduler is not available from a Universal Windows Platform (UWP) app. For more information, see [Creating Asynchronous Operations in C++ for UWP Apps](../../parallel/concrt/creating-asynchronous-operations-in-cpp-for-windows-store-apps.md).
0 commit comments