Foreword | vii
Preface | xiii
| 1 | (14)
| 2 | (4)
A First Glimpse of OpenMP | 6 | (2)
The OpenMP Parallel Computer | 8 | (1)
| 9 | (4)
| 13 | (1)
Navigating the Rest of the Book | 14 | (1)
Getting Started with OpenMP | 15 | (26)
| 15 | (1)
OpenMP from 10,000 Meters | 16 | (7)
OpenMP Compiler Directives or Pragmas | 17 | (3)
Parallel Control Structures | 20 | (1)
Communication and Data Environment | 20 | (2)
| 22 | (1)
Parallelizing a Simple Loop | 23 | (6)
Runtime Execution Model of an OpenMP Program | 24 | (1)
Communication and Data Scoping | 25 | (2)
Synchronization in the Simple Loop Example | 27 | (1)
Final Words on the Simple Loop Example | 28 | (1)
| 29 | (3)
| 32 | (3)
| 35 | (1)
Expressing Parallelism with Parallel Regions | 36 | (3)
| 39 | (1)
| 40 | (1)
Exploiting Loop-Level Parallelism | 41 | (52)
| 41 | (1)
Form and Usage of the parallel do Directive | 42 | (4)
| 43 | (1)
Restrictions on Parallel Loops | 44 | (2)
Meaning of the parallel do Directive | 46 | (1)
Loop Nests and Parallelism | 46 | (1)
| 47 | (18)
General Properties of Data Scope Clause | 49 | (1)
| 50 | (1)
| 51 | (2)
| 53 | (3)
Changing Default Scoping Rules | 56 | (3)
Parallelizing Reduction Operations | 59 | (4)
Private Variable Initialization and Finalization | 63 | (2)
Removing Data Dependences | 65 | (17)
Why Data Dependences Are a Problem | 66 | (1)
The First Step: Detection | 67 | (4)
The Second Step: Classification | 71 | (2)
| 73 | (8)
| 81 | (1)
| 82 | (8)
| 82 | (3)
Scheduling Loops to Balance the Load | 85 | (1)
Static and Dynamic Scheduling | 86 | (1)
| 86 | (2)
Comparison of Runtime Scheduling Behavior | 88 | (2)
| 90 | (1)
| 90 | (3)
Beyond Loop-Level Parallelism: Parallel Regions | 93 | (48)
| 93 | (1)
Form and Usage of the parallel Directive | 94 | (3)
Clauses on the parallel Directive | 95 | (1)
Restrictions on the parallel Directive | 96 | (1)
Meaning of the parallel Directive | 97 | (3)
Parallel Regions and SPMD-Style Parallelism | 100 | (1)
threadprivate Variables and the copyin Clause | 100 | (8)
The threadprivate Directive | 103 | (3)
| 106 | (2)
Work-Sharing in Parallel Regions | 108 | (11)
| 108 | (1)
Dividing Work Based on Thread Number | 109 | (2)
Work-Sharing Constructs in OpenMP | 111 | (8)
Restrictions on Work-Sharing Constructs | 119 | (4)
| 119 | (1)
| 120 | (2)
Nesting of Work-Sharing Constructs | 122 | (1)
Orphaning of Work-Sharing Constructs | 123 | (3)
Data Scoping of Orphaned Constructs | 125 | (1)
Writing Code with Orphaned Work-Sharing Constructs | 126 | (1)
| 126 | (4)
Directive Nesting and Binding | 129 | (1)
Controlling Parallelism in an OpenMP Program | 130 | (7)
Dynamically Disabling the parallel Directives | 130 | (1)
Controlling the Number of Threads | 131 | (2)
| 133 | (2)
Runtime Library Calls and Environment Variables | 135 | (2)
| 137 | (1)
| 138 | (3)
| 141 | (30)
| 141 | (1)
Data Conflicts and the Need for Synchronization | 142 | (5)
Getting Rid of Data Races | 143 | (1)
Examples of Acceptable Data Races | 144 | (2)
Synchronization Mechanisms in OpenMP | 146 | (1)
Mutual Exclusion Synchronization | 147 | (10)
The Critical Section Directive | 147 | (5)
| 152 | (3)
Runtime Library Lock Routines | 155 | (2)
| 157 | (5)
| 157 | (2)
| 159 | (2)
| 161 | (1)
Custom Synchronization: Rolling Your Own | 162 | (3)
| 163 | (2)
Some Practical Considerations | 165 | (6)
| 168 | (1)
| 168 | (3)
| 171 | (40)
| 171 | (2)
Key Factors That Impact Performance | 173 | (25)
| 173 | (2)
| 175 | (4)
| 179 | (13)
| 192 | (6)
Performance-Tuning Methodology | 198 | (3)
| 201 | (3)
Bus-Based and NUMA Machines | 204 | (3)
| 207 | (1)
| 207 | (4)
Appendix A: A Quick Reference to OpenMP | 211 | (6)
References | 217 | (4)
Index | 221