High Performance Computing
First Edition

by Charles Severance, Kevin Dowd

The purpose of this book, High Performance Computing, has always been to teach new programmers and scientists the basics of high performance computing. It is written for learners with a basic understanding of modern computer architecture rather than advanced degrees in computer engineering, and it offers an easily understood introduction to and overview of the topic.

Charles Severance is a Clinical Associate Professor in the School of Information at the University of Michigan, where he teaches Informatics courses; he has also taught Computer Science at Michigan State University. He is active in Open Source and Open Educational Resources and teaches a number of free Massive Open Online Courses (MOOCs) on Python and Web Technologies on Coursera. Previously he was the Executive Director of the Sakai Foundation and the Chief Architect of the Sakai Project (www.sakaiproject.org).

Kevin Dowd is a consultant to the aerospace and commercial industries, specializing in performance computing and information infrastructures. He is a veteran of two computer companies (which no longer make computers), and the nuclear power plant business (not many more of those have been made either). Kevin is a principal in the Atlantic Computing Technology Corporation, located in Wethersfield, Connecticut.

Creative Commons Attribution License (by 3.0)

CC-BY 3.0

Under this license, any user of this textbook or the textbook contents herein must provide proper attribution as follows:

  • If you use this textbook as a bibliographic reference, then you should cite it as follows:

    Charles Severance, Kevin Dowd, High Performance Computing. OpenStax CNX. 25 Aug. 2010 http://cnx.org/contents/bb821554-7f76-44b1-89e7-8a2a759d1347@5.2.

  • If you redistribute this textbook in a print format, then you must include on every physical page the following attribution:
    Download for free at http://cnx.org/contents/bb821554-7f76-44b1-89e7-8a2a759d1347@5.2.
  • If you redistribute part of this textbook, then you must retain in every digital format page view (including but not limited to EPUB, PDF, and HTML) and on every physical printed page the following attribution:
    Download for free at http://cnx.org/contents/bb821554-7f76-44b1-89e7-8a2a759d1347@5.2.

The OpenStax name, OpenStax logo, OpenStax book covers, OpenStax CNX name, and OpenStax CNX logo are not subject to the Creative Commons license and may not be reproduced without the prior and express written consent of Rice University. For questions regarding this license, please contact support@openstax.org.
Introduction. High Performance Computing
Introduction.1 - Introduction to the Connexions Edition
Introduction.2 - Introduction to High Performance Computing
Introduction.2.1 - Why Worry About Performance?
Introduction.2.2 - Scope of High Performance Computing
Introduction.2.3 - Studying High Performance Computing
Introduction.2.4 - Measuring Performance
Introduction.2.5 - The Next Step
Chapter One. Modern Computer Architectures
1.1 - Memory
1.1.1 - Introduction
1.1.2 - Memory Technology
1.1.3 - Registers
1.1.4 - Caches
1.1.5 - Cache Organization
1.1.6 - Virtual Memory
1.1.7 - Improving Memory Performance
1.1.8 - Closing Notes
1.1.9 - Exercises
1.2 - Floating-Point Numbers
1.2.1 - Introduction
1.2.2 - Reality
1.2.3 - Representation
1.2.4 - Effects of Floating-Point Representation
1.2.5 - More Algebra That Doesn't Work
1.2.6 - Improving Accuracy Using Guard Digits
1.2.7 - History of IEEE Floating-Point Format
1.2.8 - IEEE Operations
1.2.9 - Special Values
1.2.10 - Exceptions and Traps
1.2.11 - Compiler Issues
1.2.12 - Closing Notes
1.2.13 - Exercises
Chapter Two. Programming and Tuning Software
2.1 - What a Compiler Does
2.1.1 - Introduction
2.1.2 - History of Compilers
2.1.3 - Which Language To Optimize
2.1.4 - Optimizing Compiler Tour
2.1.5 - Optimization Levels
2.1.6 - Classical Optimizations
2.1.7 - Closing Notes
2.1.8 - Exercises
2.2 - Timing and Profiling
2.2.1 - Introduction
2.2.2 - Timing
2.2.3 - Subroutine Profiling
2.2.4 - Basic Block Profilers
2.2.5 - Virtual Memory
2.2.6 - Closing Notes
2.2.7 - Exercises
2.3 - Eliminating Clutter
2.3.1 - Introduction
2.3.2 - Subroutine Calls
2.3.3 - Branches
2.3.4 - Branches With Loops
2.3.5 - Other Clutter
2.3.6 - Closing Notes
2.3.7 - Exercises
2.4 - Loop Optimizations
2.4.1 - Introduction
2.4.2 - Operation Counting
2.4.3 - Basic Loop Unrolling
2.4.4 - Qualifying Candidates for Loop Unrolling
2.4.5 - Nested Loops
2.4.6 - Loop Interchange
2.4.7 - Memory Access Patterns
2.4.8 - When Interchange Won't Work
2.4.9 - Blocking to Ease Memory Access Patterns
2.4.10 - Programs That Require More Memory Than You Have
2.4.11 - Closing Notes
2.4.12 - Exercises
Chapter Three. Shared-Memory Parallel Processors
3.1 - Understanding Parallelism
3.1.1 - Introduction
3.1.2 - Dependencies
3.1.3 - Loops
3.1.4 - Loop-Carried Dependencies
3.1.5 - Ambiguous References
3.1.6 - Closing Notes
3.1.7 - Exercises
3.2 - Shared-Memory Multiprocessors
3.2.1 - Introduction
3.2.2 - Symmetric Multiprocessing Hardware
3.2.3 - Multiprocessor Software Concepts
3.2.4 - Techniques for Multithreaded Programs
3.2.5 - A Real Example
3.2.6 - Closing Notes
3.2.7 - Exercises
3.3 - Programming Shared-Memory Multiprocessors
3.3.1 - Introduction
3.3.2 - Automatic Parallelization
3.3.3 - Assisting the Compiler
3.3.4 - Closing Notes
3.3.5 - Exercises
Chapter Four. Scalable Parallel Processing
4.1 - Language Support for Performance
4.1.1 - Introduction
4.1.2 - Data-Parallel Problem: Heat Flow
4.1.3 - Explicitly Parallel Languages
4.1.4 - FORTRAN 90
4.1.5 - Problem Decomposition
4.1.6 - High Performance FORTRAN (HPF)
4.1.7 - Closing Notes
4.2 - Message-Passing Environments
4.2.1 - Introduction
4.2.2 - Parallel Virtual Machine
4.2.3 - Message-Passing Interface
4.2.4 - Closing Notes
Chapter Five. Appendixes
5.1 - Appendix C: High Performance Microprocessors
5.1.1 - Introduction
5.1.2 - Why CISC?
5.1.3 - Fundamentals of RISC
5.1.4 - Second-Generation RISC Processors
5.1.5 - RISC Means Fast
5.1.6 - Out-of-Order Execution: The Post-RISC Architecture
5.1.7 - Closing Notes
5.1.8 - Exercises
5.2 - Appendix B: Looking at Assembly Language
5.2.1 - Assembly Language
Chapter Six. Attributions