Xiaodong Zhang (computer scientist)
Xiaodong Zhang is a computer scientist and academic. He is a University Distinguished Scholar and Robert M. Critchfield Professor in Engineering[1][2] at The Ohio State University. His research focuses on data management in computer memory, storage, and distributed systems. Zhang is also a founding member of the Asian American Scholar Forum (AASF) and a member of its board of directors.[1][3]
Xiaodong Zhang | |
---|---|
Alma mater | Beijing University of Technology (BS); University of Colorado Boulder (MS, PhD) |
Known for | Interleaved memory; LIRS and other caching methods; RCFile and big data systems; Hardware acceleration |
Awards | IEEE Fellow (2009); Distinguished Alumni Award, University of Colorado Boulder (2011); ACM Fellow (2012); ACM Microarchitecture Test of Time Award (2020); VLDB Endowment Test of Time Award (2024); IEEE Data Engineering Impact Award (2025) |
Scientific career | |
Fields | Data management; Memory systems; Storage systems; High-performance computing |
Institutions | University of Texas at San Antonio; Rice University; College of William & Mary; National Science Foundation; The Ohio State University |
Doctoral advisor | Robert B. Schnabel |
Other academic advisors | Ralph J. Slutz |
Early life and education
Born in Beijing, China, Zhang attended the Beijing University of Technology and received a bachelor of science in electrical engineering in 1982. After working at the same university for about two years, he moved to the United States and enrolled at the University of Colorado Boulder, where he studied computer science and received a master of science in 1985, followed by a PhD in 1989, supervised by Robert B. Schnabel.
Career
In 1983, Zhang began working as a research assistant in the COADS Project (now ICOADS) under physicist and computer architect Ralph J. Slutz (1917-2005), and completed his master's thesis in 1985. Later, in 2010, he endowed the Ralph J. Slutz Student Excellence Scholarship program in the Computer Science Department at the University of Colorado Boulder, which offers annual awards to selected students.[3]
Following his PhD in 1989, Zhang joined the University of Texas at San Antonio, where he served as an assistant professor and associate professor of computer science until 1997.[4] He was also a visiting scholar at the Center for Research on Parallel Computation (CRPC) at Rice University in Houston, Texas, from summer to the end of 1990. His visit was hosted by John E. Dennis for a research collaboration.
In 1997, Zhang joined the College of William and Mary, where he worked as the Lattie P. Evans Professor and Chairman of the Computer Science Department until the end of 2005. During this time, from 2001 to 2003, he also served as a Program Director at the National Science Foundation, where he was responsible for evaluating and recommending research proposals for grants in high-performance computing.
After leaving the College of William and Mary, Zhang joined The Ohio State University in 2006 as the Robert M. Critchfield Professor in Engineering and Chairman of the Computer Science and Engineering Department. He served as department chair until 2018.[4]
Research advancement and technology impact
His research focuses primarily on data management in computer and distributed systems. With his students and collaborators, Zhang has published papers on algorithms and their system implementations, which have been adopted in mainstream operating systems and database systems, as well as in commercial processors. Adopters include Sun Microsystems, MySQL, the BSD operating system, Apple's Fusion Drive, Nvidia's Geometric Performance Primitives (GPP), and others.
- In 2000, Zhang, together with Zhao Zhang and Zhichun Zhu, identified a structural issue in data transfers between the CPU cache and DRAM in existing computer architectures. Specifically, a conflict miss in the CPU cache would inevitably lead to a row buffer miss in DRAM, resulting in significant memory access delays. To address this problem, they proposed a permutation-based page interleaving method, which they presented at the International Symposium on Microarchitecture (MICRO). This method influenced interleaved memory design and was quickly adopted in commercial computer products, first by Sun Microsystems, and later by AMD, Intel, and NVIDIA. Twenty years later, in 2020, the three authors were honored with the ACM Microarchitecture Test of Time Award for their high-impact work.[5][6]
- In 2002, Song Jiang and Zhang presented their LIRS cache replacement algorithm at the ACM SIGMETRICS Conference. The LIRS algorithm addressed fundamental issues in the LRU replacement algorithm. LIRS, LIRS-like variants, and its approximation Clock-Pro have been widely adopted in production data management systems, including the MySQL and H2 databases; the key-value databases Cassandra, RocksDB, and Memcached; the in-memory data systems GridGain (now Ignite), Infinispan, Cloudera Impala, Red Hat Data Grid, and Spark; the data repository system Apache Jackrabbit; and the Red Hat virtualization system. The LIRS algorithm has also influenced the replacement algorithm implementations of operating systems, including Berkeley Software Distribution (BSD) and Linux. The LIRS approximation Clock-Pro is also part of a Rust library, available as an open system utility. More recently, the LIRS concept has been used for cache block replacement in Intel CPU caches, in a method called Re-Reference Interval Prediction (RRIP).[7]
- In 2008, with Jiang Lin, Qingda Lu, Xiaoning Ding, Zhao Zhang, and P. Sadayappan, he published a paper on using the operating system to allocate pages in the last-level cache (LLC) of multicore processors to avoid cache conflicts among concurrently running processes. The published methods, along with the open-source Linux code, have been adopted by Intel.[8]
- In 2011, with Rubao Lee and Yin Huai at Ohio State, Namit Jain and Zheng Shao at Facebook, and Yongqiang He and Zhi-Wei Xu at the Chinese Academy of Sciences, he presented the RCFile paper at the IEEE International Conference on Data Engineering (ICDE), defining an effective data storage format for databases and for big data processing on large-scale distributed systems. RCFile and its optimized version, Apache ORC, have been widely adopted in many data systems, including Apache Hive, Meta’s Data Lake, Cloudera’s Impala, and Amazon Athena and S3. RCFile and ORC have also been adopted in commercial data systems from IBM, Microsoft, Oracle, SAS, Teradata, and others.
- In 2011, Rubao Lee, Tian Luo, Yin Huai, Fusheng Wang, Yongqiang He, and Zhang presented their paper, titled “YSmart: Yet another SQL-to-MapReduce translator”, at the International Conference on Distributed Computing Systems (ICDCS). YSmart automatically converts SQL queries into MapReduce programs for execution. It has been adopted by Apache Hive to help SQL users automatically generate MapReduce programs.[9]
- In 2011, with Feng Chen and David Koufaty, he published a paper, titled “Hystor: making the best use of solid-state drives in high performance storage systems”, at the ACM International Conference on Supercomputing (ICS). Hystor is a Linux design and implementation of hybrid storage that combines hard disk drives (HDDs) and solid-state drives (SSDs); it influenced Apple's hybrid storage product, Fusion Drive.
- In 2012, with a group of researchers at Ohio State and Emory University Medical School, he published a paper on accelerating pathology image data processing. Its PixelBox algorithm and GPU implementation were included in Geometric Performance Primitives on NVIDIA Developer.[10]
- In 2013, with a group of researchers at Ohio State and Emory University Medical School, he published a paper, titled “Hadoop-GIS: a high-performance spatial data warehousing system over MapReduce”, at the International Conference on Very Large Data Bases. The Hadoop-GIS open-source software had been released in 2011. This work initiated the development of a new spatial data analytics ecosystem characterized by large-scale computing and storage capacity, high scalability, compatibility with clusters of low-cost commodity processors, and open-source software. After more than a decade of research and development, this ecosystem has matured and now serves many applications across various fields. The authors of the Hadoop-GIS paper received the 2024 VLDB Endowment Test of Time Award.[11][12]
A major theme of his work involves designing algorithms and systems for practical applications running in production systems and contributing to the development of computer systems.
Awards and honors
- Elected as an IEEE Fellow for his contributions to computer memory systems by the Institute of Electrical and Electronics Engineers (2009)[13]
- Received the Distinguished Engineering and Applied Science Alumni Award from the University of Colorado at Boulder (2011)[14]
- Elected as an ACM Fellow for his contributions to data and memory management in distributed systems by The Association for Computing Machinery (2012)[15]
- Received the Joel and Ruth Spira Award for Excellence in Education Leadership from the Lutron Foundation (2018)[4]
- Received the Microarchitecture Test of Time Award from the Association for Computing Machinery (2020)[5][6]
- Received the Distinguished Scholar Award from Ohio State University (2023)[1]
- Received the 2024 Very Large Data Bases (VLDB) Endowment Test of Time Award[11][12]
- Received the 2025 IEEE Data Engineering Impact Award from the IEEE Computer Society Technical Committee on Data Engineering (TCDE)[16][17]
Selected works
- A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit locality; Proceedings of the 33rd Annual International Symposium on Microarchitecture (Micro-33); Z. Zhang, Z. Zhu, X. Zhang; 2000
- LIRS: an efficient low inter-reference recency set replacement policy to improve buffer cache performance; Proceedings of the 2002 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 2002); S. Jiang, X. Zhang; 2002
- Clock-Pro: an effective improvement of the Clock replacement; Proceedings of the USENIX Annual Technical Conference (ATC 2005); S. Jiang, F. Chen, X. Ding, X. Zhang; 2005
- Gaining insights into multicore cache partitioning: bridging the gap between simulation and real systems; Proceedings of the 14th International Symposium on High Performance Computer Architecture (HPCA-14); J. Lin, Q. Lu, X. Ding, Z. Zhang, X. Zhang, P. Sadayappan; 2008
- RCFile: a fast and space-efficient data placement structure in MapReduce-based warehouse systems; Proceedings of the International Conference on Data Engineering (ICDE 2011); Y. He, R. Lee, Y. Huai, Z. Shao, N. Jain, X. Zhang, Z. Xu; 2011
- Hystor: making the best use of solid state drives in high performance storage systems; Proceedings of the 25th ACM International Conference on Supercomputing (ICS 2011); F. Chen, D. Koufaty, X. Zhang; 2011
- YSmart: Yet another SQL-to-MapReduce Translator; Proceedings of the 31st International Conference on Distributed Computing Systems (ICDCS 2011); R. Lee, T. Luo, Y. Huai, F. Wang, Y. He, X. Zhang; 2011
- Accelerating pathology image data cross-comparison on CPU-GPU hybrid systems; Proceedings of the VLDB Endowment, Vol. 5, Issue 11; K. Wang, Y. Huai, R. Lee, F. Wang, X. Zhang, J. Saltz; 2012
- Hadoop-GIS: a high performance spatial data warehousing system over MapReduce; Proceedings of the VLDB Endowment, Vol. 6, Issue 11; A. Aji, F. Wang, H. Vo, R. Lee, Q. Liu, X. Zhang, J. Saltz; 2013
- LDPC-in-SSD: making advanced error correction codes work effectively in Solid State Drives; Proceedings of the 11th USENIX Conference on File and Storage Technologies (FAST'13); K. Zhao, W. Zhao, H. Sun, X. Zhang, N. Zheng, T. Zhang; 2013
- Yin and Yang of processing data warehousing queries on GPU devices; Proceedings of the VLDB Endowment, Vol. 6, Issue 10; Y. Yuan, R. Lee, X. Zhang; 2013
- Data Management: Interactions with Computer Architecture and Systems; Cambridge University Press; X. Zhang, R. Lee; 2024
References