PANEL MODERATOR & SPEAKER



Guang Gao

ACM Fellow and IEEE Fellow,
Endowed Distinguished Professor
University of Delaware, USA
And
A Founder and Chair
IEEE/CS Dataflow STC (Special Technology Community)

Biography

PANELISTS


Hong An

Professor,
Director of Advanced Computer System Architecture Laboratory,
School of Computer Science and Technology,
University of Science and Technology of China
Anhui, China

Biography

Title: A Dataflow-based Runtime Support Implementation on a 100P Actual System

Abstract. The availability of actual systems of 100P or more, such as Sunway TaihuLight, for computational science and engineering applications remains very attractive to the supercomputing community, yet obtaining peak performance on irregular applications such as computer algebra problems is still a challenging problem. In this short talk, we will introduce our preliminary work on a dataflow-based runtime support implementation on Sunway TaihuLight, aimed at exploiting the computational resources of a 100P actual system with high efficiency.





Kemal Ebcioglu

PhD,
President, Global Supercomputing Corporation, USA

Biography

Title: Toward cloud center architectures for achieving performance in the limit

Abstract. The dataflow computing model introduced an elegant and highly influential parallel alternative to the Von Neumann sequential computing model. But the dataflow approach introduced new programming models which are fundamentally incompatible with Von Neumann's sequential single-threaded instruction execution. Furthermore, there is a misconception that in the sequential Von Neumann computing model (much criticized by the dataflow community) instructions have to be executed sequentially. While there exist proofs of the non-achievability of optimal theoretical performance in corner cases, in reality nothing prevents a modern Von Neumann computer from executing an application within a time period that is little more than the critical path length of the entire execution trace of the application plus speed-of-light delays, while remaining fully compatible with the Von Neumann sequential execution model. The exascale, very large cloud centers of the near future, comprising millions of FPGAs and ASICs, will provide the infrastructure for enabling such performance, using application-specific, customized hardware compiled from a sequential application.





Dongrui Fan

Professor,
Director of High Throughput Computer Research Center,
Institute of Computing Technology,
Chinese Academy of Sciences (ICT, CAS)
Beijing, China

Biography

Title: SmarCo – Designing a High-end Processor with a Dataflow Execution Model

Abstract. Dataflow architectures are playing an increasingly important role in high-end computing. In this short talk, we will present a feasible design method for dataflow processors and our newly developed dataflow processor, SmarCo, which demonstrates the outstanding efficiency and high performance of the dataflow execution model.





Michael Gschwind

PhD,
Chief Engineer, Machine and Deep Learning,
IBM Corp, USA

Biography

Title: High Performance Computing is the foundation for creating value from Big Data with AI

Abstract. The emergence of Deep Artificial Neural Networks (DNNs) is revolutionizing information technology with an emphasis on extracting information from massive data corpora. Deep Learning, the process of training a DNN, is a highly numerically intensive operation centered on a small number of computational kernels well known in the high-performance computing community, such as generalized matrix/matrix multiplication and other dense stencil computations. In 2016, IBM introduced the new S822LC for HPC server, designed to deliver unprecedented performance for both Artificial Intelligence and traditional High-Performance Computing (HPC) workloads. With its high-performance NVLink connection, the S822LC for HPC server offers a sweet spot of scalability, performance, and efficiency for Deep Learning applications. The next-generation S822LC for HPC systems combine the balanced high-performance Power server design with four high-performance P100 GPUs, which exploit dataflow principles to maximize throughput by scheduling groups of computational threads based on operand availability to hide latency and deliver peak performance. The GPUs are connected via NVLink for enhanced peer-to-peer GPU multiprocessing, and CPU-GPU NVLink for enhanced performance and programmability.
    Since their introduction in 2016, these accelerator-based server designs have demonstrated the benefits of numeric accelerators first introduced in the IBM RoadRunner supercomputer system based on IBM's Cell BE design. In 2016, we demonstrated training one of the most common DNNs, AlexNet, on the full ImageNet 2012 dataset (a de-facto industry-standard training corpus), setting a new industry record with the first training time of under an hour using a single S822LC for HPC server. In 2017, we demonstrated training of the most complex DNNs in use today in a cluster configuration with 64 server nodes, achieving a training time of just 7 hours for the ResNet-101 neural network model, as well as a record image recognition accuracy of 33.8% on 7.5M images from the ImageNet-22k dataset, compared to the previous best published result of 29.8%. We also achieved a record for fastest absolute training time of 50 minutes by training the ResNet-50 model with the ImageNet-1K dataset.





Aaron Smith

Principal Research Manager,
Microsoft, USA

Biography

Title: From Datacenters to Client Devices – How Microsoft is Preparing for the End of Moore's Law

Abstract. In this talk I will discuss two projects at Microsoft that deal with the end of Moore's Law and silicon scaling. Project Catapult uses reconfigurable computing to accelerate datacenter services such as Bing search and Azure networking in Microsoft datacenters. Project E2 is a next-generation Explicit Data Graph Execution (EDGE) architecture that utilizes a hybrid von Neumann dataflow model to overcome the limitations of traditional CISC/RISC instruction set architectures.





Greg Wright

Director, Engineering,
Head of Qualcomm Research Raleigh, North Carolina, USA
And
Leader of Processor Research

Biography

Title: Practical Dataflow – von Neumann Hybrid Architecture

Abstract. In this short talk, we will consider the relative strengths and weaknesses of the dataflow and conventional von Neumann models, and how they can be combined to obtain the best features of each for high performance and power efficiency, while maintaining compatibility with existing high-level software stacks and programming models.





