This folder contains my notes for the TM10028000 - Computer Architecture class.
Course Summary #1 (chapter1-6.pdf)
- Instructor
- Table of Contents
- Introduction
- Classes of Computers
- Defining Computer Architecture
- Trends in Technology
- Define and Quantify Dependability
- Definition of Performance
- 5 Quantitative Principles of Computer Design
- Fallacies and Pitfalls
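One of the quantitative principles covered in this chapter is Amdahl's Law. As a reminder for myself, a minimal sketch (function name is my own):

```python
def amdahl_speedup(enhanced_fraction, enhanced_speedup):
    """Amdahl's Law: overall speedup when only a fraction of
    execution time benefits from an enhancement."""
    return 1.0 / ((1.0 - enhanced_fraction) + enhanced_fraction / enhanced_speedup)

# If 40% of the time is sped up 10x, overall speedup is only ~1.56x,
# because the remaining 60% is untouched.
print(amdahl_speedup(0.4, 10))  # 1.5625
```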
Course Summary #2 (chapter2.pdf)
- Von Neumann Model: An Execution Model of Computers
- Classifying Instruction Set Architectures
- Classification of Instructions based on Number of Operands
- Memory Addressing
- Addressing Mode
- Type and Size of Operands
- Operations in the Instruction Set
- Instructions for Control Flow
- Encoding an Instruction Set
- Example: MIPS Architecture
- Fallacies and Pitfalls
- Conclusion
Course Summary #3 (chapter3.pdf)
- How computers handle machine instructions
- What is Pipelining?
- Pipelining: A Mechanism to Increase the Throughput in General-Purpose Register Architecture Processors
- Example of a Pipelined RISC Processor
- A RISC Data Path Drawn in a Pipeline Fashion
- A Pipeline with Pipeline Registers
- Basic Performance Issues in Pipelining
- Major Hurdle of Pipelining: Pipeline Hazards
- Performance of Pipelines with Stalls
- Structural Hazards
- A Processor with Only One Memory Port
- A Pipeline Stalled for a Structural Hazard
- Considerations about Structural Hazards
- Data Hazards
- Minimizing Data Hazard Stalls by Forwarding
- Data forwarding
- Implementation of Data Forwarding
- Data Hazards Requiring Stalls
- Pipeline Interlocking to Preserve Correct Execution
- Control (Branch) Hazards
- Reducing Pipeline Branch Penalties
- Scheduling the Branch Delay Slot
- Implementation of the MIPS Data Path (non-pipelined)
- How is Pipelining Implemented?
- Extending the MIPS Pipeline to Handle Multicycle Operations
- Example: MIPS R4000 Pipeline
- Conclusion
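The "Performance of Pipelines with Stalls" slides boil down to one formula: assuming an ideal CPI of 1, speedup over an unpipelined processor is pipeline depth divided by (1 + stall cycles per instruction). A minimal sketch (function name is my own):

```python
def pipeline_speedup(depth, stall_cycles_per_instr):
    """Speedup of an ideal CPI=1 pipeline over an unpipelined
    processor, accounting for hazard-induced stall cycles."""
    return depth / (1.0 + stall_cycles_per_instr)

# A 5-stage pipeline averaging 0.25 stall cycles per instruction
# achieves a 4x speedup rather than the ideal 5x.
print(pipeline_speedup(5, 0.25))  # 4.0
```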
Course Summary #4 (chapter4.pdf)
- Introduction
- Cache Performance
- Definition
- Four memory hierarchy questions
- Two options on a write miss
- Miss Rate Comparison of Instruction, Data, and Unified Caches
- Cache Performance
- How Memory Access Time Can Be Improved by Caches: Six Basic Cache Optimizations
- 3C’s of Cache Misses
- 1. Larger Block Size to Reduce Miss Rate
- 2. Larger Caches to Reduce Miss Rate
- 3. Higher Associativity to Reduce Miss Rate
- 4. Multilevel Caches to Reduce Miss Penalty
- Design of 2nd Level Cache: Inclusion or Exclusion?
- 5. Giving Priority to Read Misses over Writes to Reduce Miss Penalty
- Solutions to Read-After-Write Hazard in Memory
- 6. Avoiding Address Translation During Indexing of the Cache to Reduce Hit Time
- Ten Cache Optimizations
- 1. Small and simple first-level caches to reduce hit time and power
- 2. Way prediction to reduce hit time
- 3. Pipelined cache access to increase cache bandwidth
- 4. Non-blocking caches to increase cache bandwidth
- 5. Multi-banked caches to increase cache bandwidth
- 6. Critical word first and early restart to reduce miss penalty
- 7. Merging write buffer to reduce miss penalty
- 8. Compiler optimizations to reduce miss rate
- 9. Hardware prefetching of instructions and data to reduce miss penalty or miss rate
- 10. Compiler-controlled prefetching to reduce miss penalty or miss rate
- Virtual Memory
- Terminology
- Further Difference Between Caches and Virtual Memory
- Paging versus Segmentation
- Four Memory Hierarchy Questions for Virtual Memory
- Techniques for Fast Address Translation
- Fast Translation Using a TLB
- TLB Miss
- Page Fault Handler
- Selecting a Page Size
- Protection with Virtual Memory
- Implementing Protection with Virtual Memory
- Memory Hierarchies in the ARM Cortex-A8
- 2-Level TLB Organization
- 3-Level Cache Organization
- Miss Penalty Reduction Mechanisms Implemented in Nehalem-EX
- Fallacies and Pitfalls
- Conclusions
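The cache performance sections all revolve around average memory access time (AMAT = hit time + miss rate × miss penalty), where a second-level cache's AMAT becomes the first level's miss penalty. A minimal sketch (function name and cycle counts are my own example values):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time for a single cache level."""
    return hit_time + miss_rate * miss_penalty

# Two-level hierarchy: the L2's AMAT serves as the L1's miss penalty.
l2 = amat(hit_time=10, miss_rate=0.05, miss_penalty=100)  # 15.0 cycles
l1 = amat(hit_time=1, miss_rate=0.02, miss_penalty=l2)    # 1.3 cycles
print(l1)
```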
Course Summary #5 (chapter5.pdf)
- Trends in Computer Architecture Design
- Importance of Data Dependences and Hazards
- Data Dependences
- Data Hazards
- Exploiting Instruction-Level Parallelism with Hardware Approaches
- Overcoming Data Hazards with Dynamic Scheduling
- Dynamic Scheduling: The Idea
- Some Concerns about Dynamic Scheduling
- New Pipeline Staging for Out-Of-Order Execution
- Basic Dynamic Scheduling
- Scoreboarding (first introduced in the CDC 6600)
- The basic four steps for scoreboarding
- Data Structures for Scoreboarding
- Required Checks and Bookkeeping Actions
- Factors Limiting the Scoreboard Performance
- Tomasulo’s Approach: Solve the Problems of Scoreboarding!
- Reducing Branch Costs with Dynamic Hardware Prediction
- Control (Branch) Hazards
- Basic Branch Prediction and Branch-Prediction Buffers
- 1-bit Prediction Scheme
- 2-bit Prediction Scheme
- What Kind of Accuracy Can Be Expected from a 2-bit Branch Predictor?
- How Can We Improve the Accuracy of Branch Prediction?
- Correlating/Two-Level Predictors: Basic Idea
- (m, n) Predictors
- Comparison of 2-bit predictors
- Pipelining with Branch Prediction
- Penalties of Branch Misprediction
- Taking Advantage of More ILP with Multiple Issue
- Speculation
- The ARM Cortex-A8
- The Intel Core i7
- Exploiting Instruction-Level Parallelism with Software Approaches
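The 2-bit prediction scheme covered in this chapter is a saturating counter: two states predict taken, two predict not taken, and a single misprediction only moves the counter one step. A minimal simulation sketch (function name and the example branch pattern are my own):

```python
def simulate_2bit_predictor(outcomes, state=0):
    """2-bit saturating counter: states 0-1 predict not-taken,
    states 2-3 predict taken. Each actual outcome moves the
    counter one step toward the matching end. Returns the
    fraction of correct predictions."""
    correct = 0
    for taken in outcomes:
        predict_taken = state >= 2
        if predict_taken == taken:
            correct += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)

# A loop branch taken 9 times then falling through once: starting
# from "strongly not taken", the predictor mispredicts twice while
# warming up and once at loop exit.
print(simulate_2bit_predictor([True] * 9 + [False]))  # 0.7
```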
Course Summary #6 (chapter6.pdf)
WIP