Introduction to parallel computing /

Saved in:
Bibliographic Details
Main Author: Grama, Ananth
Other Authors: Gupta, Anshul; Kumar, Vipin; Karypis, George
Format: Book
Language: English
Published: Harlow : Pearson Education, 2003
Edition: 2nd ed.
Subjects: Informática; Programación informática; Diseño de sistemas; Programación paralela
LEADER 15685nam a2200349Ia 4500
001 2528
003 WAA
005 20220629062820.0
006 a||||fr|||| 001 0
007 ta
008 t ag-|||||r|||| 001 0 rpa d
999 |c 2528  |d 2528 
020 |a 9780201648652 
040 |c WAA  |a WAA 
041 |a eng 
100 1 |a Grama, Ananth  |9 2973 
245 0 |a Introduction to parallel computing /   |c Ananth Grama... [et al.] 
250 |a 2nd. ed. 
260 3 0 |a Harlow :   |b Pearson Education,   |c 2003 
300 |a 636 p. 
505 |a Preface -- Acknowledgments -- 1. Introduction to Parallel Computing -- 1.1. Motivating Parallelism -- 1.1.1. The Computational Power Argument – from Transistors to FLOPS -- 1.1.2. The Memory/Disk Speed Argument -- 1.1.3. The Data Communication Argument -- 1.2. Scope of Parallel Computing -- 1.2.1. Applications in Engineering and Design -- 1.2.2. Scientific Applications -- 1.2.3. Commercial Applications -- 1.2.4. Applications in Computer Systems -- 1.3. Organization and Contents of the Text -- 1.4. Bibliographic Remarks -- Problems -- 2. Parallel Programming Platforms -- 2.1. Implicit Parallelism: Trends in Microprocessor Architectures -- 2.1.1. Pipelining and Superscalar Execution -- 2.1.2. Very Long Instruction Word Processors -- 2.2. Limitations of Memory System Performance -- 2.2.1. Improving Effective Memory Latency Using Caches -- 2.2.2. Impact of Memory Bandwidth -- 2.2.3. Alternate Approaches for Hiding Memory Latency -- Multithreading for Latency Hiding -- Prefetching for Latency Hiding -- 2.2.4. Tradeoffs of Multithreading and Prefetching -- 2.3. Dichotomy of Parallel Computing Platforms -- 2.3.1. Control Structure of Parallel Platforms -- 2.3.2. Communication Model of Parallel Platforms -- Shared-Address-Space Platforms -- Message-Passing Platforms -- 2.4. Physical Organization of Parallel Platforms -- 2.4.1. Architecture of an Ideal Parallel Computer -- Architectural Complexity of the Ideal Model -- 2.4.2. Interconnection Networks for Parallel Computers -- 2.4.3. Network Topologies -- Bus-Based Networks -- Crossbar Networks -- Multistage Networks -- Completely-Connected Network -- Star-Connected Network -- Linear Arrays, Meshes, and k-d Meshes -- Tree-Based Networks -- 2.4.4. Evaluating Static Interconnection Networks -- 2.4.5. Evaluating Dynamic Interconnection Networks -- 2.4.6. Cache Coherence in Multiprocessor Systems -- Snoopy Cache Systems -- Directory Based Systems -- 2.5. Communication Costs in Parallel Machines -- 2.5.1. Message Passing Costs in Parallel Computers -- Store-and-Forward Routing -- Packet Routing -- Cut-Through Routing -- A Simplified Cost Model for Communicating Messages -- 2.5.2. Communication Costs in Shared-Address-Space Machines -- 2.6. Routing Mechanisms for Interconnection Networks -- 2.7. Impact of Process-Processor Mapping and Mapping Techniques -- 2.7.1. Mapping Techniques for Graphs -- Embedding a Linear Array into a Hypercube -- Embedding a Mesh into a Hypercube -- Embedding a Mesh into a Linear Array -- Embedding a Hypercube into a 2-D Mesh -- Process-Processor Mapping and Design of Interconnection Networks -- 2.7.2. Cost-Performance Tradeoffs -- 2.8. Bibliographic Remarks -- Problems -- 3. Principles of Parallel Algorithm Design -- 3.1. Preliminaries -- 3.1.1. Decomposition, Tasks, and Dependency Graphs -- 3.1.2. Granularity, Concurrency, and Task-Interaction -- 3.1.3. Processes and Mapping -- 3.1.4. Processes versus Processors -- 3.2. Decomposition Techniques -- 3.2.1. Recursive Decomposition -- 3.2.2. Data Decomposition -- 3.2.3. Exploratory Decomposition -- 3.2.4. Speculative Decomposition -- 3.2.5. Hybrid Decompositions -- 3.3. Characteristics of Tasks and Interactions -- 3.3.1. Characteristics of Tasks -- 3.3.2. Characteristics of Inter-Task Interactions -- 3.4. Mapping Techniques for Load Balancing -- 3.4.1.
Schemes for Static Mapping -- Mappings Based on Data Partitioning -- Block Distributions -- Cyclic and Block-Cyclic Distributions -- Randomized Block Distributions -- Mappings Based on Task Partitioning -- Hierarchical Mappings -- 3.4.2. Schemes for Dynamic Mapping -- Centralized Schemes -- Distributed Schemes -- Suitability to Parallel Architectures -- 3.5. Methods for Containing Interaction Overheads -- 3.5.1. Maximizing Data Locality -- 3.5.2. Minimizing Contention and Hot Spots -- 3.5.3. Overlapping Computations with Interactions -- 3.5.4. Replicating Data or Computations -- 3.5.5. Using Optimized Collective Interaction Operations -- 3.5.6. Overlapping Interactions with Other Interactions -- 3.6. Parallel Algorithm Models -- 3.6.1. The Data-Parallel Model -- 3.6.2. The Task Graph Model -- 3.6.3. The Work Pool Model -- 3.6.4. The Master-Slave Model -- 3.6.5. The Pipeline or Producer-Consumer Model -- 3.6.6. Hybrid Models -- 3.7. Bibliographic Remarks -- Problems -- 4. Basic Communication Operations -- 4.1. One-to-All Broadcast and All-to-One Reduction -- 4.1.1. Ring or Linear Array -- 4.1.2. Mesh -- 4.1.3. Hypercube -- 4.1.4. Balanced Binary Tree -- 4.1.5. Detailed Algorithms -- 4.1.6. Cost Analysis -- 4.2. All-to-All Broadcast and Reduction -- 4.2.1. Linear Array and Ring -- 4.2.2. Mesh -- 4.2.3. Hypercube -- 4.2.4. Cost Analysis -- 4.3. All-Reduce and Prefix-Sum Operations -- 4.4. Scatter and Gather -- 4.5. All-to-All Personalized Communication -- 4.5.1. Ring -- 4.5.2. Mesh -- 4.5.3. Hypercube -- An Optimal Algorithm -- 4.6. Circular Shift -- 4.6.1. Mesh -- 4.6.2. Hypercube -- 4.7. Improving the Speed of Some Communication Operations -- 4.7.1. Splitting and Routing Messages in Parts -- One-to-All Broadcast -- All-to-One Reduction -- All-Reduce -- 4.7.2. All-Port Communication -- 4.8. Summary -- 4.9. Bibliographic Remarks -- Problems -- 5. Analytical Modeling of Parallel Programs -- 5.1. Sources of Overhead in Parallel Programs -- 5.2. Performance Metrics for Parallel Systems -- 5.2.1. Execution Time -- 5.2.2. Total Parallel Overhead -- 5.2.3. Speedup -- 5.2.4. Efficiency -- 5.2.5. Cost -- 5.3. The Effect of Granularity on Performance -- 5.4. Scalability of Parallel Systems -- 5.4.1. Scaling Characteristics of Parallel Programs -- 5.4.2. The Isoefficiency Metric of Scalability -- The Isoefficiency Function -- 5.4.3. Cost-Optimality and the Isoefficiency Function -- 5.4.4. A Lower Bound on the Isoefficiency Function -- 5.4.5. The Degree of Concurrency and the Isoefficiency Function -- 5.5. Minimum Execution Time and Minimum Cost-Optimal Execution Time -- 5.6. Asymptotic Analysis of Parallel Programs -- 5.7. Other Scalability Metrics -- 5.8. Bibliographic Remarks -- Problems -- 6. Programming Using the Message-Passing Paradigm -- 6.1. Principles of Message-Passing Programming -- 6.2. The Building Blocks: Send and Receive Operations -- 6.2.1. Blocking Message Passing Operations -- Blocking Non-Buffered Send/Receive -- Blocking Buffered Send/Receive -- 6.2.2. Non-Blocking Message Passing Operations -- 6.3. MPI: the Message Passing Interface -- 6.3.1. Starting and Terminating the MPI Library -- 6.3.2. Communicators -- 6.3.3. Getting Information -- 6.3.4. Sending and Receiving Messages -- 6.3.5. Example: Odd-Even Sort -- 6.4. Topologies and Embedding -- 6.4.1. Creating and Using Cartesian Topologies -- 6.4.2. Example: Cannon's Matrix-Matrix Multiplication -- 6.5. Overlapping Communication with Computation -- 6.5.1.
Non-Blocking Communication Operations -- Example: Cannon's Matrix-Matrix Multiplication (Using Non-Blocking Operations) -- 6.6. Collective Communication and Computation Operations -- 6.6.1. Barrier -- 6.6.2. Broadcast -- 6.6.3. Reduction -- 6.6.4. Prefix -- 6.6.5. Gather -- 6.6.6. Scatter -- 6.6.7. All-to-All -- 6.6.8. Example: One-Dimensional Matrix-Vector Multiplication -- 6.6.9. Example: Single-Source Shortest-Path -- 6.6.10. Example: Sample Sort -- 6.7. Groups and Communicators -- 6.7.1. Example: Two-Dimensional Matrix-Vector Multiplication -- 6.8. Bibliographic Remarks -- Problems -- 7. Programming Shared Address Space Platforms -- 7.1. Thread Basics -- 7.2. Why Threads? -- 7.3. The POSIX Thread API -- 7.4. Thread Basics: Creation and Termination -- 7.5. Synchronization Primitives in Pthreads -- 7.5.1. Mutual Exclusion for Shared Variables -- Overheads of Locking -- Alleviating Locking Overheads -- 7.5.2. Condition Variables for Synchronization -- 7.6. Controlling Thread and Synchronization Attributes -- 7.6.1. Attributes Objects for Threads -- 7.6.2. Attributes Objects for Mutexes -- 7.7. Thread Cancellation -- 7.8. Composite Synchronization Constructs -- 7.8.1. Read-Write Locks -- 7.8.2. Barriers -- 7.9. Tips for Designing Asynchronous Programs -- 7.10. OpenMP: a Standard for Directive Based Parallel Programming -- 7.10.1. The OpenMP Programming Model -- 7.10.2. Specifying Concurrent Tasks in OpenMP -- The for Directive -- Assigning Iterations to Threads -- Synchronization Across Multiple for Directives -- The sections Directive -- Merging Directives -- Nesting parallel Directives -- 7.10.3. Synchronization Constructs in OpenMP -- Synchronization Point: The barrier Directive -- Single Thread Executions: The single and master Directives -- Critical Sections: The critical and atomic Directives -- In-Order Execution: The ordered Directive -- Memory Consistency: The flush Directive -- 7.10.4. Data Handling in OpenMP -- 7.10.5. OpenMP Library Functions -- Controlling Number of Threads and Processors -- Controlling and Monitoring Thread Creation -- Mutual Exclusion -- 7.10.6. Environment Variables in OpenMP -- 7.10.7. Explicit Threads versus OpenMP Based Programming -- 7.11. Bibliographic Remarks -- Problems -- 8. Dense Matrix Algorithms -- 8.1. Matrix-Vector Multiplication -- 8.1.1. Rowwise 1-D Partitioning -- One Row Per Process -- Using Fewer than n Processes -- 8.1.2. 2-D Partitioning -- One Element Per Process -- Using Fewer than n² Processes -- Comparison of 1-D and 2-D Partitionings -- 8.2. Matrix-Matrix Multiplication -- 8.2.1. A Simple Parallel Algorithm -- 8.2.2. Cannon's Algorithm -- 8.2.3. The DNS Algorithm -- DNS Algorithm with Fewer than n³ Processes -- 8.3. Solving a System of Linear Equations -- 8.3.1. A Simple Gaussian Elimination Algorithm -- Parallel Implementation with 1-D Partitioning -- Parallel Implementation with 2-D Partitioning -- 8.3.2. Gaussian Elimination with Partial Pivoting -- 1-D Partitioning -- 2-D Partitioning -- 8.3.3. Solving a Triangular System: Back-Substitution -- 8.3.4. Numerical Considerations in Solving Systems of Linear Equations -- 8.4. Bibliographic Remarks -- Problems -- 9. Sorting -- 9.1. Is
650 4 |a Informática  |9 13 
650 4 |a Programación informática  |9 14 
650 4 |a Diseño de sistemas  |9 705 
650 4 |a Programación paralela  |9 3018 
700 1 |9 2974  |a Gupta, Anshul 
700 1 |9 2976  |a Kumar, Vipin 
700 1 |9 2975  |a Karypis, George 
942 |2 CDU  |c LIBRO 
952 |0 0  |1 0  |2 CDU  |4 0  |6 004_420000000000000_G7617  |7 0  |9 390  |a 09  |b 09  |d 2017-10-20  |l 0  |o 004.42 G7617  |p 10-04480  |r 2017-10-20  |w 2017-10-20  |y LIBRO NPP 
952 |0 0  |1 0  |2 CDU  |4 0  |6 004_420000000000000_G7617  |7 0  |9 391  |a 09  |b 09  |d 2017-10-20  |l 0  |o 004.42 G7617  |p 10-04481  |r 2017-10-20  |w 2017-10-20  |y LIBRO 
952 |0 0  |1 0  |2 CDU  |4 0  |6 004_420000000000000_G7617  |7 0  |9 10631  |a 09  |b 09  |d 2018-03-23  |l 0  |o 004.42 G7617  |p 10-06171  |r 2018-03-23  |w 2018-03-23  |y LIBRO