Tuesday, March 1, 2016, from 9:00 AM to 4:00 PM CST


Rice University 
Bioscience Research Collaborative
Room 280 - 2nd Floor Lecture Hall
6500 Main Street
Houston, TX 77005



Marissa Rogers 
Plan 365 Inc. 

Parallel Programming and Optimization with
Intel® Xeon® and Xeon Phi™ Platforms
Developer Training Event
Houston, TX

Event Details: Hybrid Course

Parking: Underground garage parking is located below the Biosciences building.
The rate for the BRC Garage is an $11 daily maximum.
More information on BRC parking can be found here


This class is very popular. If it is closed due to capacity, please sign up here. If we get a cancellation, you will be moved into the class on a first-come, first-served basis, and you will receive an email notification when that happens.

*Please bring your own laptop to the CDT 102 training. The necessary specifications are listed below:

  • Windows (XP or newer), Mac OS X (10.5 or later), or Linux (something from the 21st century)
  • Wired (Ethernet) and wireless (Wi-Fi 802.11g or later) network connectivity
  • Web Browser (any)
  • On Windows: PuTTY and Pageant (www.putty.org) and WinSCP (www.winscp.net)
  • On Mac OS X and Linux: ssh client
  • Optional: on all operating systems, the free software NoMachine (www.nomachine.com). This is only necessary if you are not comfortable programming in Linux in a text terminal over an SSH connection. System requirements and installation instructions for NoMachine can be found here

This one-day training features presentations and hands-on exercises on the available programming models and best optimization practices for the Intel Xeon Phi coprocessors, and on the usage of the Intel software development and diagnostic tools. 

  • Offload and Native:  "Hello World" to complex, using MPI.
  • Case Study:  All aspects of tuning in the N-body calculation.
  • Optimization I:  Strip-mining for vectorization, parallel reduction.
  • Optimization II:  Loop tiling, thread affinity.
  • Intel Xeon Phi architecture: purpose, organization, prerequisites for good performance, future technology
  • Programming models: native, offload, heterogeneous clustering 
  • Parallel frameworks: automatic vectorization, OpenMP, MPI
  • Optimization methods: general, scalar math, vectorization, multithreading, memory access, communication and special topics

Seminar abstract can be found here

Labs abstract can be found here