Evaluation Scheme for NoC-based CMP with Integrated Processor Management System
Divisions of PAS
<jats:title>Evaluation Scheme for NoC-based CMP with Integrated Processor Management System</jats:title> <jats:p>Chip Multiprocessors (CMPs) offer many opportunities and benefits, but several challenges must be addressed before their full potential can be exploited. Aspects such as parallel programs, interconnection design, cache arrangement, and on-chip core allocation become limiting factors. To ensure the validity of approaches and research, we propose an evaluation system for CMPs in which the Network-on-Chip (NoC) and the processor management system are integrated on one die. The suggested experimentation system is described in detail. The system used for the tests and the results of the experiments are presented and discussed. As decision-making criteria, we consider the energy efficiency of the Processor Allocator (PA) and the NoC, as well as the NoC traffic characteristics (load balance). To aid understanding of the system, a brief overview of the most important NoC and PA architectures is also presented. The analyzed results reveal that a CMP with a PA controlled by the IFF allocation algorithm for mesh systems and a torus-based NoC driven by DORLB routing with express-virtual-channel flow control achieved the best traffic balance and energy characteristics.</jats:p>
ISSN 2081-8491 (until 2012) ; eISSN 2300-1933 (since 2013)