Scalable Shared-Memory Multiprocessing

by Lenoski, Daniel E.; Weber, Wolf-Dietrich

ISBN13: 9781558603158

ISBN10: 1558603158

Format: Hardcover

Pub. Date: 1995-03-01

Publisher(s): Elsevier Science Ltd


Summary

Dr. Lenoski and Dr. Weber have experience with both leading-edge research and the practical issues involved in implementing large-scale parallel systems. They were key contributors to the architecture and design of the DASH multiprocessor. Currently, they are involved in commercializing scalable shared-memory technology.

Table of Contents

Foreword

Preface

PART 1 GENERAL CONCEPTS

Multiprocessing and Scalability

Multiprocessor Architecture

Single versus Multiple Instruction Streams

Message-Passing versus Shared-Memory Architectures

Cache Coherence

Uniprocessor Caches

Multiprocessor Caches

Scalability

Scalable Interconnection Networks

Scalable Cache Coherence

Scalable I/O

Summary of Hardware Architecture Scalability

Scalability of Parallel Software

Scaling and Processor Grain Size

Chapter Conclusions

Shared-Memory Parallel Programs

Basic Concepts

Parallel Application Set

MP3D

Water

PTHOR

LocusRoute

Cholesky

Barnes-Hut

Simulation Environment

Basic Program Characteristics

Parallel Application Execution Model

Parallel Execution under a PRAM Memory Model

Parallel Execution with Shared Data Uncached

Parallel Execution with Shared Data Cached

Summary of Results with Different Memory System Models

Communication Behavior of Parallel Applications

Communication-to-Computation Ratios

Invalidation Patterns

Classification of Data Objects

Average Invalidation Characteristics

Basic Invalidation Patterns for Each Application

MP3D

Water

PTHOR

LocusRoute

Cholesky

Barnes-Hut

Summary of Individual Invalidation Distributions

Effect of Problem Size

Effect of Number of Processors

Effect of Finite Caches and Replacement Hints

Effect of Cache Line Size

Invalidation Patterns Summary

Chapter Conclusions

System Performance Issues

Memory Latency

Memory Latency Reduction

Nonuniform Memory Access (NUMA)

Cache-Only Memory Architecture (COMA)

Direct Interconnect Networks

Hierarchical Access

Protocol Optimizations

Latency Reduction Summary

Latency Hiding

Weak Consistency Models

Prefetch

Multiple-Context Processors

Producer-Initiated Communication

Latency Hiding Summary

Memory Bandwidth

Hot Spots

Synchronization Support

Chapter Conclusions

System Implementation

Scalability of System Costs

Directory Storage Overhead

Sparse Directories

Hierarchical Directories

Summary of Directory Storage Overhead

Implementation Issues and Design Correctness

Unbounded Number of Requests

Distributed Memory Operations

Request Starvation

Error Detection and Fault Tolerance

Design Verification

Chapter Conclusions

Scalable Shared-Memory Systems

Directory-Based Systems

DASH

Alewife

S3.mp

IEEE Scalable Coherent Interface

Convex Exemplar

Hierarchical Systems

Encore GigaMax

ParaDiGM

Data Diffusion Machine

Kendall Square Research KSR-1 and KSR-2

Reflective Memory Systems

Plus

Merlin and Sesame

Non-Cache-Coherent Systems

NYU Ultracomputer

IBM RP3 and BBN TC2000

Cray Research T3D

Vector Supercomputer Systems

Cray Research Y-MP C90

Tera Computer MTA

Virtual Shared-Memory Systems

Ivy and Munin/Treadmarks

J-Machine

MIT/Motorola *T and *T-NG

Chapter Conclusions

PART 2 EXPERIENCE WITH DASH

DASH Prototype System

System Organization

Cluster Organization

Directory Logic

Interconnection Network

Programmer's Model

Coherence Protocol

Nomenclature

Basic Memory Operations

Prefetch Operations

DMA/Uncached Operations

Synchronization Protocol

Granting Locks

Fetch&Op Variables

Fence Operations

Protocol General Exceptions

Chapter Conclusions

Prototype Hardware Structures

Base Cluster Hardware

SGI Multiprocessor Bus (MPBUS)

SGI CPU Board

SGI Memory Board

SGI I/O Board

Directory Controller

Reply Controller

Pseudo-CPU

Network and Network Interface

Performance Monitor

Logic Overhead of Directory-Based Coherence

Chapter Conclusions

Prototype Performance Analysis

Base Memory Performance

Overall Memory System Bandwidth

Other Memory Bandwidth Limits

Processor Issue Bandwidth and Latency

Interprocessor Latency

Summary of Memory System Bandwidth and Latency

Parallel Application Performance

Application Run-Time Environment

Application Speedups

Detailed Case Studies

Application Speedup Summary

Protocol Effectiveness

Base Protocol Features

Alternative Memory Operations

Chapter Conclusions

PART 3 FUTURE TRENDS

TeraDASH

TeraDASH System Organization

TeraDASH Cluster Structure

Intracluster Operations

TeraDASH Mesh Network

TeraDASH Directory Structure

TeraDASH Coherence Protocol

Required Changes for the Scalable Directory Structure

Enhancements for Increased Protocol Robustness

Enhancements for Increased Performance

TeraDASH Performance

Access Latencies

Potential Application Speedup

Chapter Conclusions

Conclusions and Future Directions

SSMP Design Conclusions

Current Trends

Future Trends

Appendix: Multiprocessor Systems

References

Index
