Neal Crago, Ph.D.
Senior Research Scientist, NVIDIA
ncrago{at}nvidia.com google scholar

Neal Crago is a Senior Research Scientist at NVIDIA Research. His research specializes in hardware/software co-design of data-parallel, spatial, and domain-specific computer architectures, with a focus on managing parallelism and data movement in the memory subsystem. Dr. Crago received his B.S., M.S., and Ph.D. in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign (UIUC).

Publications:

  1. “Symphony: Orchestrating Sparse and Dense Tensors with Hierarchical Heterogeneous Processing”, Michael Pellauer, Jason Clemons, Vignesh Balaji, Neal Crago, Aamer Jaleel, Donghyuk Lee, Mike O’Connor, Angshuman Parashar, Sean Treichler, Po-An Tsai, Stephen W. Keckler, Joel S. Emer, ACM Transactions on Computer Systems (TOCS), October 2023. [pdf]
  2. “Community-based Matrix Reordering for Sparse Linear Algebra Optimization”, Vignesh Balaji, Neal Crago, Aamer Jaleel, Stephen W Keckler, International Symposium on Performance Analysis of Systems and Software (ISPASS), April 2023. [pdf]
  3. “Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling”, Toluwanimi O. Odemuyiwa, Hadi Asghari-Moghaddam, Michael Pellauer, Kartik Hegde, Po-An Tsai, Neal Crago, Aamer Jaleel, John D Owens, Edgar Solomonik, Joel S Emer, Christopher W Fletcher, International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2023. [pdf]
  4. “P-OPT: Practical Optimal Cache Replacement for Graph Analytics”, Vignesh Balaji, Neal Crago, Aamer Jaleel, Brandon Lucia, International Symposium on High-Performance Computer Architecture (HPCA), February 2021. (Best Paper Nominee) [pdf]
  5. “ExTensor: An Accelerator for Sparse Tensor Algebra”, Kartik Hegde, Hadi Asghari-Moghaddam, Michael Pellauer, Neal Crago, Aamer Jaleel, Edgar Solomonik, Joel Emer, Christopher W Fletcher, International Symposium on Microarchitecture (MICRO), October 2019. (IEEE MICRO Top Picks Honorable Mention) [pdf]
  6. “Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration”, Michael Pellauer, Yakun Sophia Shao, Jason Clemons, Neal Crago, Kartik Hegde, Rangharajan Venkatesan, Stephen W Keckler, Christopher W. Fletcher, Joel Emer, International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), April 2019. (IEEE MICRO Top Picks Honorable Mention) [pdf]
  7. “Exposing Memory Access Patterns to Improve Instruction and Memory Efficiency in GPUs”, Neal Crago, Mark Stephenson, Stephen W Keckler, ACM Transactions on Architecture and Code Optimization (TACO), October 2018. [pdf]
  8. “Efficient Control and Communication Paradigms for Coarse-grained Spatial Architectures”, Michael Pellauer, Angshuman Parashar, Michael Adler, Bushra Ahsan, Randy Allmon, Neal Crago, Kermin Fleming, Mohit Gambhir, Aamer Jaleel, Tushar Krishna, Daniel Lustig, Stephen Maresh, Vladimir Pavlov, Rachid Rayess, Antonia Zhai, Joel Emer, ACM Transactions on Computer Systems (TOCS), September 2015. [pdf]
  9. “Exploiting Spatial Architectures for Edit Distance Algorithms”, Jesmin Jahan Tithi, Neal Crago, Joel Emer, International Symposium on Performance Analysis of Systems and Software (ISPASS), March 2014. (Best Paper Nominee) [pdf]
  10. “Triggered Instructions: A Control Paradigm for Spatially-programmed Architectures”, Angshuman Parashar, Michael Pellauer, Michael Adler, Bushra Ahsan, Neal Crago, Daniel Lustig, Vladimir Pavlov, Antonia Zhai, Mohit Gambhir, Aamer Jaleel, Randy Allmon, Rachid Rayess, Stephen Maresh, Joel Emer, International Symposium on Computer Architecture (ISCA), June 2013. (IEEE MICRO Top Picks Awardee) [pdf, pdf]
  11. “Hybrid Latency Tolerance for Robust Energy-efficiency on 1000-core Data Parallel Processors”, Neal Crago, Omid Azizi, Steven S Lumetta, Sanjay J Patel, International Symposium on High-Performance Computer Architecture (HPCA), February 2013. (Best Paper Nominee) [pdf]
  12. “Decoupled Architectures as a Low-Complexity Alternative to Out-of-order Execution”, Neal Crago, Sanjay J Patel, International Conference on Parallel Architectures and Compilation Techniques (PACT), October 2011. [pdf]
  13. “OUTRIDER: Efficient Memory Latency Tolerance with Decoupled Strands”, Neal Crago, Sanjay J Patel, International Symposium on Computer Architecture (ISCA), June 2011. [pdf]
  14. “Rigel: An Architecture and Scalable Programming Interface for a 1000-core Accelerator”, John H Kelm, Daniel R Johnson, Matthew R Johnson, Neal Crago, William Tuohy, Aqeel Mahesri, Steven S Lumetta, Matthew I Frank, Sanjay J Patel, International Symposium on Computer Architecture (ISCA), June 2009. [pdf]
  15. “Tradeoffs in Designing Accelerator Architectures for Visual Computing”, Aqeel Mahesri, Daniel Johnson, Neal Crago, Sanjay J Patel, International Symposium on Microarchitecture (MICRO), November 2008. [pdf]