Scalable Shared-Memory Multiprocessing

by Daniel E. Lenoski; Wolf-Dietrich Weber
Format: Hardcover
Pub. Date: 1995-03-01
Publisher(s): Elsevier Science Ltd
  • This Item Qualifies for Free Shipping!*

    *Excludes marketplace orders.

  • Complimentary 7-Day eTextbook Access
    When you rent or buy this book, you will receive complimentary 7-day online access to the eTextbook version from your PC, Mac, tablet, or smartphone. Feature not included on Marketplace Items.
List Price: $79.75

Buy New

Arriving Soon. Will ship when available.
$75.95

Rent Textbook

Rent Digital

Online: 1825 Days Access - $87.54
Downloadable: Lifetime Access - $87.54

Used Textbook

Sold Out

How Marketplace Works:

  • This item is offered by an independent seller and is not shipped from our warehouse.
  • Item details like edition and cover design may differ from our description; see seller's comments before ordering.
  • Sellers must confirm and ship within two business days; otherwise, the order will be cancelled and refunded.
  • Marketplace purchases cannot be returned to eCampus.com. Contact the seller directly for inquiries; if no response within two days, contact customer service.
  • Additional shipping costs apply to Marketplace purchases. Review shipping costs at checkout.

Summary

Dr. Lenoski and Dr. Weber have experience with both the leading-edge research and the practical issues involved in implementing large-scale parallel systems. They were key contributors to the architecture and design of the DASH multiprocessor. Currently, they are involved in commercializing scalable shared-memory technology.

Table of Contents

Foreword vii
Preface xvii
PART 1 GENERAL CONCEPTS
Multiprocessing and Scalability 3(38)
Multiprocessor Architecture 6(7)
Single versus Multiple Instruction Streams 7(1)
Message-Passing versus Shared-Memory Architectures 8(5)
Cache Coherence 13(7)
Uniprocessor Caches 14(2)
Multiprocessor Caches 16(4)
Scalability 20(17)
Scalable Interconnection Networks 24(7)
Scalable Cache Coherence 31(2)
Scalable I/O 33(1)
Summary of Hardware Architecture Scalability 34(1)
Scalability of Parallel Software 35(2)
Scaling and Processor Grain Size 37(2)
Chapter Conclusions 39(2)
Shared-Memory Parallel Programs 41(46)
Basic Concepts 41(5)
Parallel Application Set 46(4)
MP3D 48(1)
Water 48(1)
PTHOR 49(1)
LocusRoute 49(1)
Cholesky 49(1)
Barnes-Hut 50(1)
Simulation Environment 50(2)
Basic Program Characteristics 51(1)
Parallel Application Execution Model 52(1)
Parallel Execution under a PRAM Memory Model 53(2)
Parallel Execution with Shared Data Uncached 55(1)
Parallel Execution with Shared Data Cached 56(2)
Summary of Results with Different Memory System Models 58(1)
Communication Behavior of Parallel Applications 59(1)
Communication-to-Computation Ratios 59(3)
Invalidation Patterns 62(22)
Classification of Data Objects 62(2)
Average Invalidation Characteristics 64(1)
Basic Invalidation Patterns for Each Application 65(2)
MP3D 67(1)
Water 67(2)
PTHOR 69(2)
LocusRoute 71(2)
Cholesky 73(1)
Barnes-Hut 73(3)
Summary of Individual Invalidation Distributions 76(1)
Effect of Problem Size 76(1)
Effect of Number of Processors 76(2)
Effect of Finite Caches and Replacement Hints 78(2)
Effect of Cache Line Size 80(3)
Invalidation Patterns Summary 83(1)
Chapter Conclusions 84(3)
System Performance Issues 87(30)
Memory Latency 88(1)
Memory Latency Reduction 89(6)
Nonuniform Memory Access (NUMA) 90(1)
Cache-Only Memory Architecture (COMA) 91(2)
Direct Interconnect Networks 93(1)
Hierarchical Access 93(1)
Protocol Optimizations 94(1)
Latency Reduction Summary 95(1)
Latency Hiding 95(16)
Weak Consistency Models 96(4)
Prefetch 100(3)
Multiple-Context Processors 103(5)
Producer-Initiated Communication 108(2)
Latency Hiding Summary 110(1)
Memory Bandwidth 111(5)
Hot Spots 112(1)
Synchronization Support 113(3)
Chapter Conclusions 116(1)
System Implementation 117(26)
Scalability of System Costs 117(17)
Directory Storage Overhead 119(8)
Sparse Directories 127(5)
Hierarchical Directories 132(1)
Summary of Directory Storage Overhead 133(1)
Implementation Issues and Design Correctness 134(8)
Unbounded Number of Requests 134(2)
Distributed Memory Operations 136(3)
Request Starvation 139(1)
Error Detection and Fault Tolerance 139(2)
Design Verification 141(1)
Chapter Conclusions 142(1)
Scalable Shared-Memory Systems 143(30)
Directory-Based Systems 143(7)
DASH 144(1)
Alewife 144(2)
S3.mp 146(1)
IEEE Scalable Coherent Interface 147(2)
Convex Exemplar 149(1)
Hierarchical Systems 150(7)
Encore GigaMax 151(1)
ParaDiGM 152(2)
Data Diffusion Machine 154(1)
Kendall Square Research KSR-1 and KSR-2 155(2)
Reflective Memory Systems 157(2)
Plus 157(1)
Merlin and Sesame 158(1)
Non-Cache-Coherent Systems 159(3)
NYU Ultracomputer 159(1)
IBM RP3 and BBN TC2000 160(1)
Cray Research T3D 161(1)
Vector Supercomputer Systems 162(4)
Cray Research Y-MP C90 163(1)
Tera Computer MTA 164(2)
Virtual Shared-Memory Systems 166(4)
Ivy and Munin/Treadmarks 166(1)
J-Machine 167(2)
MIT/Motorola *T and *T-NG 169(1)
Chapter Conclusions 170(3)
PART 2 EXPERIENCE WITH DASH
DASH Prototype System 173(32)
System Organization 174(7)
Cluster Organization 175(2)
Directory Logic 177(3)
Interconnection Network 180(1)
Programmer's Model 181(3)
Coherence Protocol 184(14)
Nomenclature 185(2)
Basic Memory Operations 187(5)
Prefetch Operations 192(1)
DMA/Uncached Operations 193(5)
Synchronization Protocol 198(3)
Granting Locks 198(2)
Fetch&Op Variables 200(1)
Fence Operations 200(1)
Protocol General Exceptions 201(1)
Chapter Conclusions 202(3)
Prototype Hardware Structures 205(32)
Base Cluster Hardware 206(5)
SGI Multiprocessor Bus (MPBUS) 206(1)
SGI CPU Board 207(3)
SGI Memory Board 210(1)
SGI I/O Board 211(1)
Directory Controller 211(7)
Reply Controller 218(6)
Pseudo-CPU 224(2)
Network and Network Interface 226(3)
Performance Monitor 229(3)
Logic Overhead of Directory-Based Coherence 232(4)
Chapter Conclusions 236(1)
Prototype Performance Analysis 237(40)
Base Memory Performance 237(9)
Overall Memory System Bandwidth 238(2)
Other Memory Bandwidth Limits 240(1)
Processor Issue Bandwidth and Latency 241(3)
Interprocessor Latency 244(1)
Summary of Memory System Bandwidth and Latency 244(2)
Parallel Application Performance 246(14)
Application Run-Time Environment 246(1)
Application Speedups 247(3)
Detailed Case Studies 250(7)
Application Speedup Summary 257(3)
Protocol Effectiveness 260(11)
Base Protocol Features 260(4)
Alternative Memory Operations 264(7)
Chapter Conclusions 271(6)
PART 3 FUTURE TRENDS
TeraDASH 277(28)
TeraDASH System Organization 277(9)
TeraDASH Cluster Structure 278(2)
Intracluster Operations 280(3)
TeraDASH Mesh Network 283(1)
TeraDASH Directory Structure 284(2)
TeraDASH Coherence Protocol 286(10)
Required Changes for the Scalable Directory Structure 286(2)
Enhancements for Increased Protocol Robustness 288(6)
Enhancements for Increased Performance 294(2)
TeraDASH Performance 296(7)
Access Latencies 297(1)
Potential Application Speedup 298(5)
Chapter Conclusions 303(2)
Conclusions and Future Directions 305(6)
SSMP Design Conclusions 306(1)
Current Trends 307(1)
Future Trends 308(3)
Appendix Multiprocessor Systems 311(6)
References 317(16)
Index 333

An electronic version of this book is available through VitalSource.

This book is viewable on PC, Mac, iPhone, iPad, iPod Touch, and most smartphones.

By purchasing, you will be able to view this book online, as well as download it, for the chosen number of days.

Digital License

You are licensing a digital product for a set duration. Durations are set forth in the product description, with "Lifetime" typically meaning five (5) years of online access and permanent download to a supported device. All licenses are non-transferable.

A downloadable version of this book is available through the eCampus Reader or compatible Adobe readers.

Applications are available on iOS, Android, PC, Mac, and Windows Mobile platforms.

Please view the compatibility matrix prior to purchase.