Deming Chen Lecture Presented
NATIONAL UNIVERSITY OF SINGAPORE
School of Computing
TITLE: "Reconfigurable Computing for High Performance"
SPEAKER: Prof. Deming Chen, ECE Department, University of Illinois at Urbana-Champaign
TIME: Thursday, March 25, 2010, 10:00-11:30 a.m.
VENUE: SR7, COM1, Level 2 Room 7, School of Computing, National University of Singapore
As growing power dissipation and thermal effects have halted the trend of rising clock frequencies and threatened to annul Moore's law, the computing industry has turned to parallel processing as its route to higher performance. GPUs and FPGAs are becoming popular platforms for accelerating the computation-intensive kernels of scientific, imaging, and simulation applications. GPUs can execute hundreds of concurrent threads, while FPGAs provide customized concurrency for highly parallel kernels. However, exploiting the parallelism available in these applications is currently not a push-button task. The programmer often has to expose the application's fine- and coarse-grained parallelism by using special programming languages. CUDA is one such language; it is driven by the GPU industry and is gaining significant popularity. In the first half of my talk, I will introduce a new FPGA design flow called FCUDA, which efficiently maps the coarse- and fine-grained parallelism exposed in CUDA onto the reconfigurable fabric.

In the second half of the talk, I will present an FPGA implementation for fast face detection. Face detection is the cornerstone of a wide range of applications such as video surveillance, robotic vision, and biometric authentication. One of the biggest challenges in these applications is the speed at which faces can be accurately detected. We implemented a novel FPGA solution for ultra-fast face detection in video and other image-rich content. Our implementation is based on an efficient and robust algorithm that uses a cascade of Artificial Neural Network (ANN) classifiers on AdaBoost-trained Haar features. Performance evaluations indicate a speedup of around 100X over the software implementation running on a 2.4 GHz Core 2 Quad CPU, with a detection speed of 625 frames per second.
Dr. Deming Chen obtained his BS in computer science from the University of Pittsburgh, Pennsylvania, in 1995, and his MS and PhD in computer science from the University of California at Los Angeles in 2001 and 2005, respectively. He worked as a software engineer from 1995 to 1999 and from 2001 to 2002. He has been an assistant professor in the ECE department of the University of Illinois at Urbana-Champaign since 2005. He is also a research assistant professor in the Coordinated Science Laboratory and an affiliate assistant professor in the CS department. His current research interests include nano-systems design and nano-centric CAD techniques, reconfigurable computing, FPGA synthesis and physical design, high-level synthesis, and microarchitecture and SoC design under parameter variation. He has served on the technical program committees of a number of conferences and symposia, and as a TPC subcommittee chair or CAD track chair for several conferences. He is an associate editor for TVLSI, TCAS-I, JCSC, and JOLPE. He received the Achievement Award for Excellent Teamwork from Aplus Design Technologies in 2001, the Arnold O. Beckman Research Award from UIUC in 2007, the NSF CAREER Award in 2008, the ASPDAC'09 Best Paper Award, and the SASP'09 Best Paper Award. He was included in the List of Teachers Ranked as Excellent in 2008.