Introduction to parallel computing /

Saved in:
Bibliographic Details
Main Author: Grama, Ananth
Other Authors: Gupta, Anshul; Kumar, Vipin; Karypis, George
Format: Book
Language: English
Published: Harlow : Pearson Education, 2003
Edition: 2nd ed.
Subjects: Informática; Programación informática; Diseño de sistemas; Programación paralela
LEADER 15685nam a2200349Ia 4500
001 2528
003 WAA
005 20220629062820.0
006 a||||fr|||| 001 0
007 ta
008 t ag-|||||r|||| 001 0 rpa d
999 |c 2528  |d 2528 
020 |a 9780201648652 
040 |c WAA  |a WAA 
041 |a eng 
100 1 |a Grama, Ananth  |9 2973 
245 0 |a Introduction to parallel computing /   |c Ananth Grama... [et al.] 
250 |a 2nd. ed. 
260 3 0 |a Harlow :   |b Pearson Education,   |c 2003 
300 |a 636 p. 
505 |a Preface -- Acknowledgments -- 1. Introduction to Parallel Computing -- 1.1. Motivating Parallelism -- 1.1.1. The Computational Power Argument – from Transistors to FLOPS -- 1.1.2. The Memory/Disk Speed Argument -- 1.1.3. The Data Communication Argument -- 1.2. Scope of Parallel Computing -- 1.2.1. Applications in Engineering and Design -- 1.2.2. Scientific Applications -- 1.2.3. Commercial Applications -- 1.2.4. Applications in Computer Systems -- 1.3. Organization and Contents of the Text -- 1.4. Bibliographic Remarks -- Problems -- 2. Parallel Programming Platforms -- 2.1. Implicit Parallelism: Trends in Microprocessor Architectures -- 2.1.1. Pipelining and Superscalar Execution -- 2.1.2. Very Long Instruction Word Processors -- 2.2. Limitations of Memory System Performance -- 2.2.1. Improving Effective Memory Latency Using Caches -- 2.2.2. Impact of Memory Bandwidth -- 2.2.3. Alternate Approaches for Hiding Memory Latency -- Multithreading for Latency Hiding -- Prefetching for Latency Hiding -- 2.2.4. Tradeoffs of Multithreading and Prefetching -- 2.3. Dichotomy of Parallel Computing Platforms -- 2.3.1. Control Structure of Parallel Platforms -- 2.3.2. Communication Model of Parallel Platforms -- Shared-Address-Space Platforms -- Message-Passing Platforms -- 2.4. Physical Organization of Parallel Platforms -- 2.4.1. Architecture of an Ideal Parallel Computer -- Architectural Complexity of the Ideal Model -- 2.4.2. Interconnection Networks for Parallel Computers -- 2.4.3. Network Topologies -- Bus-Based Networks -- Crossbar Networks -- Multistage Networks -- Completely-Connected Network -- Star-Connected Network -- Linear Arrays, Meshes, and k-d Meshes -- Tree-Based Networks -- 2.4.4. Evaluating Static Interconnection Networks -- 2.4.5. Evaluating Dynamic Interconnection Networks -- 2.4.6. Cache Coherence in Multiprocessor Systems -- Snoopy Cache Systems -- Directory Based Systems -- 2.5. Communication Costs in Parallel Machines -- 2.5.1. Message Passing Costs in Parallel Computers -- Store-and-Forward Routing -- Packet Routing -- Cut-Through Routing -- A Simplified Cost Model for Communicating Messages -- 2.5.2. Communication Costs in Shared-Address-Space Machines -- 2.6. Routing Mechanisms for Interconnection Networks -- 2.7. Impact of Process-Processor Mapping and Mapping Techniques -- 2.7.1. Mapping Techniques for Graphs -- Embedding a Linear Array into a Hypercube -- Embedding a Mesh into a Hypercube -- Embedding a Mesh into a Linear Array -- Embedding a Hypercube into a 2-D Mesh -- Process-Processor Mapping and Design of Interconnection Networks -- 2.7.2. Cost-Performance Tradeoffs -- 2.8. Bibliographic Remarks -- Problems -- 3. Principles of Parallel Algorithm Design -- 3.1. Preliminaries -- 3.1.1. Decomposition, Tasks, and Dependency Graphs -- 3.1.2. Granularity, Concurrency, and Task-Interaction -- 3.1.3. Processes and Mapping -- 3.1.4. Processes versus Processors -- 3.2. Decomposition Techniques -- 3.2.1. Recursive Decomposition -- 3.2.2. Data Decomposition -- 3.2.3. Exploratory Decomposition -- 3.2.4. Speculative Decomposition -- 3.2.5. Hybrid Decompositions -- 3.3. Characteristics of Tasks and Interactions -- 3.3.1. Characteristics of Tasks -- 3.3.2. Characteristics of Inter-Task Interactions -- 3.4. Mapping Techniques for Load Balancing -- 3.4.1.
Schemes for Static Mapping -- Mappings Based on Data Partitioning -- Block Distributions -- Cyclic and Block-Cyclic Distributions -- Randomized Block Distributions -- Mappings Based on Task Partitioning -- Hierarchical Mappings -- 3.4.2. Schemes for Dynamic Mapping -- Centralized Schemes -- Distributed Schemes -- Suitability to Parallel Architectures -- 3.5. Methods for Containing Interaction Overheads -- 3.5.1. Maximizing Data Locality -- 3.5.2. Minimizing Contention and Hot Spots -- 3.5.3. Overlapping Computations with Interactions -- 3.5.4. Replicating Data or Computations -- 3.5.5. Using Optimized Collective Interaction Operations -- 3.5.6. Overlapping Interactions with Other Interactions -- 3.6. Parallel Algorithm Models -- 3.6.1. The Data-Parallel Model -- 3.6.2. The Task Graph Model -- 3.6.3. The Work Pool Model -- 3.6.4. The Master-Slave Model -- 3.6.5. The Pipeline or Producer-Consumer Model -- 3.6.6. Hybrid Models -- 3.7. Bibliographic Remarks -- Problems -- 4. Basic Communication Operations -- 4.1. One-to-All Broadcast and All-to-One Reduction -- 4.1.1. Ring or Linear Array -- 4.1.2. Mesh -- 4.1.3. Hypercube -- 4.1.4. Balanced Binary Tree -- 4.1.5. Detailed Algorithms -- 4.1.6. Cost Analysis -- 4.2. All-to-All Broadcast and Reduction -- 4.2.1. Linear Array and Ring -- 4.2.2. Mesh -- 4.2.3. Hypercube -- 4.2.4. Cost Analysis -- 4.3. All-Reduce and Prefix-Sum Operations -- 4.4. Scatter and Gather -- 4.5. All-to-All Personalized Communication -- 4.5.1. Ring -- 4.5.2. Mesh -- 4.5.3. Hypercube -- An Optimal Algorithm -- 4.6. Circular Shift -- 4.6.1. Mesh -- 4.6.2. Hypercube -- 4.7. Improving the Speed of Some Communication Operations -- 4.7.1. Splitting and Routing Messages in Parts -- One-to-All Broadcast -- All-to-One Reduction -- All-Reduce -- 4.7.2. All-Port Communication -- 4.8. Summary -- 4.9. Bibliographic Remarks -- Problems -- 5. Analytical Modeling of Parallel Programs -- 5.1. Sources of Overhead in Parallel Programs -- 5.2. Performance Metrics for Parallel Systems -- 5.2.1. Execution Time -- 5.2.2. Total Parallel Overhead -- 5.2.3. Speedup -- 5.2.4. Efficiency -- 5.2.5. Cost -- 5.3. The Effect of Granularity on Performance -- 5.4. Scalability of Parallel Systems -- 5.4.1. Scaling Characteristics of Parallel Programs -- 5.4.2. The Isoefficiency Metric of Scalability -- The Isoefficiency Function -- 5.4.3. Cost-Optimality and the Isoefficiency Function -- 5.4.4. A Lower Bound on the Isoefficiency Function -- 5.4.5. The Degree of Concurrency and the Isoefficiency Function -- 5.5. Minimum Execution Time and Minimum Cost-Optimal Execution Time -- 5.6. Asymptotic Analysis of Parallel Programs -- 5.7. Other Scalability Metrics -- 5.8. Bibliographic Remarks -- Problems -- 6. Programming Using the Message-Passing Paradigm -- 6.1. Principles of Message-Passing Programming -- 6.2. The Building Blocks: Send and Receive Operations -- 6.2.1. Blocking Message Passing Operations -- Blocking Non-Buffered Send/Receive -- Blocking Buffered Send/Receive -- 6.2.2. Non-Blocking Message Passing Operations -- 6.3. MPI: the Message Passing Interface -- 6.3.1. Starting and Terminating the MPI Library -- 6.3.2. Communicators -- 6.3.3. Getting Information -- 6.3.4. Sending and Receiving Messages -- 6.3.5. Example: Odd-Even Sort -- 6.4. Topologies and Embedding -- 6.4.1. Creating and Using Cartesian Topologies -- 6.4.2. Example: Cannon's Matrix-Matrix Multiplication -- 6.5. Overlapping Communication with Computation -- 6.5.1.
Non-Blocking Communication Operations -- Example: Cannon's Matrix-Matrix Multiplication (Using Non-Blocking Operations) -- 6.6. Collective Communication and Computation Operations -- 6.6.1. Barrier -- 6.6.2. Broadcast -- 6.6.3. Reduction -- 6.6.4. Prefix -- 6.6.5. Gather -- 6.6.6. Scatter -- 6.6.7. All-to-All -- 6.6.8. Example: One-Dimensional Matrix-Vector Multiplication -- 6.6.9. Example: Single-Source Shortest-Path -- 6.6.10. Example: Sample Sort -- 6.7. Groups and Communicators -- 6.7.1. Example: Two-Dimensional Matrix-Vector Multiplication -- 6.8. Bibliographic Remarks -- Problems -- 7. Programming Shared Address Space Platforms -- 7.1. Thread Basics -- 7.2. Why Threads? -- 7.3. The POSIX Thread API -- 7.4. Thread Basics: Creation and Termination -- 7.5. Synchronization Primitives in Pthreads -- 7.5.1. Mutual Exclusion for Shared Variables -- Overheads of Locking -- Alleviating Locking Overheads -- 7.5.2. Condition Variables for Synchronization -- 7.6. Controlling Thread and Synchronization Attributes -- 7.6.1. Attributes Objects for Threads -- 7.6.2. Attributes Objects for Mutexes -- 7.7. Thread Cancellation -- 7.8. Composite Synchronization Constructs -- 7.8.1. Read-Write Locks -- 7.8.2. Barriers -- 7.9. Tips for Designing Asynchronous Programs -- 7.10. OpenMP: a Standard for Directive Based Parallel Programming -- 7.10.1. The OpenMP Programming Model -- 7.10.2. Specifying Concurrent Tasks in OpenMP -- The for Directive -- Assigning Iterations to Threads -- Synchronization Across Multiple for Directives -- The sections Directive -- Merging Directives -- Nesting parallel Directives -- 7.10.3. Synchronization Constructs in OpenMP -- Synchronization Point: The barrier Directive -- Single Thread Executions: The single and master Directives -- Critical Sections: The critical and atomic Directives -- In-Order Execution: The ordered Directive -- Memory Consistency: The flush Directive -- 7.10.4. Data Handling in OpenMP -- 7.10.5. OpenMP Library Functions -- Controlling Number of Threads and Processors -- Controlling and Monitoring Thread Creation -- Mutual Exclusion -- 7.10.6. Environment Variables in OpenMP -- 7.10.7. Explicit Threads versus OpenMP Based Programming -- 7.11. Bibliographic Remarks -- Problems -- 8. Dense Matrix Algorithms -- 8.1. Matrix-Vector Multiplication -- 8.1.1. Rowwise 1-D Partitioning -- One Row Per Process -- Using Fewer than n Processes -- 8.1.2. 2-D Partitioning -- One Element Per Process -- Using Fewer than n² Processes -- Comparison of 1-D and 2-D Partitionings -- 8.2. Matrix-Matrix Multiplication -- 8.2.1. A Simple Parallel Algorithm -- 8.2.2. Cannon's Algorithm -- 8.2.3. The DNS Algorithm -- DNS Algorithm with Fewer than n³ Processes -- 8.3. Solving a System of Linear Equations -- 8.3.1. A Simple Gaussian Elimination Algorithm -- Parallel Implementation with 1-D Partitioning -- Parallel Implementation with 2-D Partitioning -- 8.3.2. Gaussian Elimination with Partial Pivoting -- 1-D Partitioning -- 2-D Partitioning -- 8.3.3. Solving a Triangular System: Back-Substitution -- 8.3.4. Numerical Considerations in Solving Systems of Linear Equations -- 8.4. Bibliographic Remarks -- Problems -- 9. Sorting -- 9.1. Is
650 4 |a Informática  |9 13 
650 4 |a Programación informática  |9 14 
650 4 |a Diseño de sistemas  |9 705 
650 4 |a Programación paralela  |9 3018 
700 1 |9 2974  |a Gupta, Anshul 
700 1 |9 2976  |a Kumar, Vipin 
700 1 |9 2975  |a Karypis, George 
942 |2 CDU  |c LIBRO 
952 |0 0  |1 0  |2 CDU  |4 0  |6 004_420000000000000_G7617  |7 0  |9 390  |a 09  |b 09  |d 2017-10-20  |l 0  |o 004.42 G7617  |p 10-04480  |r 2017-10-20  |w 2017-10-20  |y LIBRO NPP 
952 |0 0  |1 0  |2 CDU  |4 0  |6 004_420000000000000_G7617  |7 0  |9 391  |a 09  |b 09  |d 2017-10-20  |l 0  |o 004.42 G7617  |p 10-04481  |r 2017-10-20  |w 2017-10-20  |y LIBRO 
952 |0 0  |1 0  |2 CDU  |4 0  |6 004_420000000000000_G7617  |7 0  |9 10631  |a 09  |b 09  |d 2018-03-23  |l 0  |o 004.42 G7617  |p 10-06171  |r 2018-03-23  |w 2018-03-23  |y LIBRO