2017/18
30237 - Multiprocessors
110 - Escuela de Ingeniería y Arquitectura
439 - Bachelor's Degree in Informatics Engineering
Compulsory
5.3. Syllabus
Module I: Pipelined Vector Processors: Supercomputers
1. Introduction. Parallelism
+ Numerical scientific problems
+ Performance of vector addition on scalar processors
- Pipelined, superpipelined, and superscalar
+ Vector version of vector addition
2. Vector extension of a load/store architecture
+ Architecture and Organization
+ Basic instruction set (DLXV)
+ Organizations and pipelining
- Vector register file
- Functional units (ALUs)
- Multibank memory (synchronous and concurrent access)
+ Five vector processor organizations and basic pipelining
+ Performance measures without strip mining: Rn, R∞, N½, Nv
+ ZV processor organization: a pipelined vector processor supporting DLXV
3. Two aspects of programming: vector length and vector stride
+ Vector length and strip mining
+ Two schemes for strip mining code generation. AXPY example
+ Performance with strip mining:
- AXPY example in assembly
- Rn, R∞, N½, Nv when processing noncontiguous elements of a vector (stride)
4. Conflicts in accessing memory banks
+ Introduction. Storage scheme. Fundamental property.
+ Tight Systems
+ Loose Systems
5. DLXV architecture: full instruction set
6. Vector compilation: automatic extraction of vector operations
+ Introduction
+ Preliminary transformations that simplify dependence analysis
+ Dependence analysis. Dependence graph. Approximate tests
+ Architecture-independent optimizations: renaming, scalar expansion, vector copy
+ Vectorization
- Basic procedure. Full vs. partial vectorization: loop distribution and loop interchange. Reductions
7. Final Thoughts: Amdahl's Law
8. Commercial Vector Processors
+ Introduction
+ Table of Supercomputers
+ NEC SX family: SX-4 and SX-9 ACE (may change)
- Concept of partitioned data path
+ Intel vector extensions: from SSE to AVX-512 (may change)
Module II: Shared Memory Multiprocessors
1. M.J. Flynn's classification of parallel computers
+ SISD, SIMD and MIMD
2. Objectives and problems of MIMD machines
3. H.S. Stone's simple model for distributing processes among processors
4. Shared-memory multiprocessors. Overview
+ Architecture-Programming: communication, synchronization, process creation
+ Organization: caches, interconnection network, main memory
5. Interconnection Network
+ Conflicts, degradation, topology, cost, circuit switching or packet switching, performance, availability
+ Dynamic topologies (indirect networks): bus, multibus, crossbar, multistage networks
+ Static topologies (direct networks): star, ring, mesh, tree, hypercube
6. Synchronization Mechanisms
+ Instruction set: Test & Set, Fetch & Op, Load Linked
+ Implementation. Combining of requests
+ Barriers
7. Parallel Compilation
+ Automatic extraction of parallel tasks
8. The problem of consistency
+ System, multiprocessor, multi-level cache, more examples
+ Copy-back and write-through
9. The memory model
+ Sequential consistency, pros and cons
+ A definition of consistency
10. Broadcast-based coherence protocols
+ Invalidation. Broadcast vs. selective sending
+ Examples of invalidation + copy-back (CB) + bus: MSI, EI, Write Once, MESI
+ Snoopy protocols
11. Multilevel cache hierarchy
12. Directory-based coherence protocols
+ Hardware requirements and some sample transactions
+ A simple directory protocol
13. Examples of current chips with more than one processor (core)
+ SUN, Intel, AMD, ARM, ...