This is the webpage of the course “Algorithms and Parallel Computing” that will be held at Politecnico di Milano from October 2016 to January 2017.
This is a 10-credit course for graduate students.
Danilo Ardagna, firstname.lastname@example.org
Eugenio Gianniti, email@example.com
Mehrnoosh Askarpour, firstname.lastname@example.org
Historically, parallel computing has been considered the high end of computing and has been used to model difficult problems in many areas of science and engineering. Today, commercial applications provide an equal or greater driving force in the development of faster programs. These applications require the processing of large amounts of data in sophisticated ways. Data-intensive applications such as data mining, recommender systems, financial modelling and multimedia processing have implications for the design of algorithms and pose a new challenge for the modern generation of computing platforms. Parallel processing is the only cost-effective method for the fast solution of these big-data problems. The emergence of inexpensive parallel computers such as commodity desktop multiprocessors, graphic processors, and clusters of PCs has made parallel methods generally applicable, as have software standards for portable parallel programming.
This course provides students with the skills necessary to write efficient algorithms capable of solving large-scale problems on parallel computers. The emphasis is on teaching concepts applicable across a wide variety of problem domains, and transferable across a broad set of computer architectures.
The course is structured in four parts.
- The first part of the course covers modern object-oriented programming (OOP) and introduces the fundamentals of the C++11 programming language. C++11 is used as the reference language throughout the rest of the course. This part is mainly hands-on, and students gain experience in designing simple but powerful object-oriented applications and in writing code using the C++11 language. Example problems cover both traditional computer science algorithms (sorting, searching, lists) and simple scientific computing algorithms (matrix computations, gradient descent).
- The second part covers data-intensive algorithms for information retrieval and data-mining problems and focuses on Spark, the new open-source framework for in-memory big-data computations, which also includes an extensive machine learning library.
- The third part covers the main aspects of parallel computing: parallel architectures, programming paradigms, parallel algorithms. Parallel architectures range from inexpensive commodity multi-core desktops, to general-purpose graphic processors, to clusters of computers, to massively parallel computers containing tens of thousands of processors. Students learn how to analyse and classify these architectures in terms of their components (processor architecture, memory organisation, and interconnection network). Pros and cons of different parallel programming paradigms (e.g., functional programming, shared memory, message passing) are evaluated by means of simple case studies.
- The fourth part of the course introduces MPI, one of the most widely used standards for writing portable parallel programs. This part includes a significant programming component in which students program concrete examples from big-data domains such as data mining, information retrieval, machine learning and operations research.
The prerequisite is knowledge of a programming language, preferably C.
The exam consists of a written test and, optionally, a project; the project is available only to students who score at least 27 in the written test.
- Object Oriented Programming Introduction & C++
  - C++ vs. C
  - Pointers and References
  - Classes & Constructors
  - Bash shell, compiling & linking
  - Destructors & Copy constructors
  - Inheritance & Polymorphism
  - Smart pointers
  - Overloading methods and operators
  - Template functions and template classes
  - Complexity of Algorithms
- Big Data & Spark
  - Functional programming
  - MapReduce & Spark
  - Introduction to Python
  - SQL & Spark DataFrames
  - Apache Zeppelin
  - Microsoft Azure & HDInsight
  - IBM Bluemix & InfoSphere
- Parallel Programming
  - Introduction to Parallel Programming
  - Flynn's taxonomy
  - Performance and speedup
  - Introduction to MPI
  - MPI data types
  - Point-to-point and collective communication
  - Examples of algorithm parallelisation