Search results

Number of results: 4

Abstract

This paper presents an alternative approach to sequential data classification that combines traditional machine learning algorithms (neural networks, principal component analysis, a multivariate Gaussian anomaly detector) with shortest-path search in a directed acyclic graph, using the A* algorithm with a regression-based heuristic. Palm gestures served as the sequential data, and a quadrocopter was the controlled object. The study covers the creation of a conceptual model and the practical construction of a system that uses the GPU to ensure real-time operation. The results report the classification accuracy for the chosen gestures and compare computation times between the CPU- and GPU-based solutions.
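The core search step described above can be sketched briefly. The following is a minimal, self-contained illustration of A* shortest-path search over a DAG, not the authors' implementation; the paper's regression-based heuristic is not specified in the abstract, so a trivial zero heuristic (which reduces A* to Dijkstra's algorithm) stands in, and the graph, node names and edge weights are invented for illustration.

```python
import heapq

def a_star_dag(graph, start, goal, heuristic):
    """Shortest path in a DAG via A*. `heuristic(node)` must not
    overestimate the remaining cost to `goal` for the result to be optimal."""
    # Priority queue of (estimated total cost, cost so far, node, path).
    frontier = [(heuristic(start), 0.0, start, [start])]
    best = {start: 0.0}  # cheapest known cost to reach each node
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for nxt, w in graph.get(node, []):
            ng = g + w
            if ng < best.get(nxt, float("inf")):
                best[nxt] = ng
                heapq.heappush(
                    frontier, (ng + heuristic(nxt), ng, nxt, path + [nxt]))
    return float("inf"), []

# Toy DAG: edge weights could stand in for per-segment classification costs.
graph = {"s": [("a", 1.0), ("b", 4.0)], "a": [("g", 2.0)], "b": [("g", 1.0)]}
cost, path = a_star_dag(graph, "s", "g", heuristic=lambda n: 0.0)
# cost == 3.0, path == ["s", "a", "g"]
```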


Authors and Affiliations

Marek Wodziński
Aleksandra Krzyżanowska

Abstract

The paper presents the results of work leading to the construction of a spatial hybrid model, based on the finite element (FE) and Monte Carlo (MC) methods, that allows computer simulation of the physical phenomena accompanying steel sample testing at temperatures characteristic of the soft-reduction process. The proposed solution accounts for local density variations at the level of the mechanical solution (the incompressibility condition was replaced with a mass conservation condition) and, at the same time, simulates grain growth during a comprehensive resistance heating process combined with local remelting followed by free or controlled cooling of the tested sample. Simulating grain growth over the entire computational domain would not have been possible without GPU support: a 59-fold increase in computing speed was achieved on the GPU compared to single-threaded computation on the CPU. The study is complemented by examples of experimental and computer simulation results showing the correctness of the adopted model assumptions.
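To give a flavour of the Monte Carlo side of such a hybrid model: grain growth is often simulated with a Potts-style MC scheme, where each cell may adopt a neighbouring grain orientation if doing so lowers the local boundary energy. The sketch below is an illustrative textbook-style sweep, not the authors' FE/MC formulation; the grid size, number of orientations and inverse temperature `beta` are arbitrary assumptions.

```python
import math
import random

def mc_grain_growth_sweep(grid, beta=1.0, rng=None):
    """One Monte Carlo sweep of a toy 2-D Potts-style grain-growth model.
    A visited cell adopts a random neighbour's orientation if that lowers
    the local boundary energy, or with Boltzmann probability otherwise."""
    rng = rng or random.Random()
    h, w = len(grid), len(grid[0])
    for _ in range(h * w):
        i, j = rng.randrange(h), rng.randrange(w)
        neigh = [grid[(i + di) % h][(j + dj) % w]
                 for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        old, new = grid[i][j], rng.choice(neigh)
        # Boundary energy = number of unlike nearest neighbours.
        d_e = sum(n != new for n in neigh) - sum(n != old for n in neigh)
        if d_e <= 0 or rng.random() < math.exp(-beta * d_e):
            grid[i][j] = new
    return grid

rng = random.Random(0)
grid = [[rng.randrange(4) for _ in range(8)] for _ in range(8)]
for _ in range(10):
    mc_grain_growth_sweep(grid, beta=2.0, rng=rng)
```

Because every cell update depends only on its immediate neighbourhood, sweeps like this parallelize naturally across GPU threads (e.g. with a checkerboard update pattern), which is what makes the reported 59-fold speed-up plausible for whole-domain simulation.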

Authors and Affiliations

Marcin Hojny
Tomasz Dębiński

Abstract

We report on the first application of graphics processing unit (GPU) accelerated computing to improve the performance of numerical methods used for the optical characterization of evaporating microdroplets. Single microdroplets of various liquids differing in volatility and molecular weight (glycerine, glycols, water, etc.), as well as mixtures of liquids and diverse suspensions, evaporate inside an electrodynamic trap under a chosen temperature and atmospheric composition. The series of scattering patterns recorded from the evaporating microdroplets is processed by fitting complete Mie theory predictions with a gradientless lookup-table method. We show that GPU computing can be applied effectively to inverse scattering problems. In particular, our technique accelerated Mie scattering calculations over 800 times relative to a single-core processor in a Matlab environment, and almost 100 times relative to the corresponding code in C. Additionally, we overcame the time-consuming data post-processing that arises when some parameters of the investigated liquid (particularly the refractive index) are uncertain. Our program allows us to track the parameters characterizing the evaporating droplet nearly simultaneously with the progress of evaporation.
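The gradientless lookup-table idea can be sketched in a few lines: precompute model patterns over a grid of candidate parameters, then pick the entry closest to the measured pattern by a sum-of-squares criterion. The forward model below is a toy stand-in for full Mie theory, and the parameter grid, angles and "measured" data are invented for illustration.

```python
import math

def build_lookup(param_grid, forward_model, angles):
    """Precompute model scattering patterns over candidate parameters."""
    return [(p, [forward_model(p, a) for a in angles]) for p in param_grid]

def fit_pattern(measured, table):
    """Gradientless fit: return the parameter whose precomputed pattern
    minimises the sum-of-squares distance to the measured pattern."""
    def sse(model):
        return sum((m - y) ** 2 for m, y in zip(measured, model))
    return min(table, key=lambda entry: sse(entry[1]))[0]

# Toy forward model standing in for Mie scattering intensity vs angle.
model = lambda radius, theta: radius * math.cos(theta) ** 2
angles = [i * 0.1 for i in range(10)]
table = build_lookup([1.0, 2.0, 3.0], model, angles)

measured = [2.05 * math.cos(a) ** 2 for a in angles]  # noisy "observation"
best_radius = fit_pattern(measured, table)  # closest grid point: 2.0
```

In practice the table is large (many candidate radii and refractive indices, many angles), and both the table generation and the distance evaluations are embarrassingly parallel, which is where the GPU acceleration pays off.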


Authors and Affiliations

D. Jakubczyk
S. Migacz
G. Derkachov
M. Woźniak
J. Archer
K. Kolwas

Abstract

Parallel realizations of discrete transform (DT) computation algorithms (DTCAs) performed on graphics processing units (GPUs) play a significant role in many modern data processing methods used across numerous areas of human activity. In this paper the authors propose a novel execution time prediction model that allows accurate and rapid estimation of the execution times of structurally different DTCAs on GPUs of distinct architectures, without conducting actual experiments on physical hardware. The model can guide a system analyst in choosing the optimal GPU hardware for a given computational task involving a particular DT calculation, or help select the most appropriate parallel implementation of a chosen DT given the limitations of the available hardware. Restricting the model to only the key common features of DTCAs significantly simplifies its structure, leading to its design as a hybrid analytical-simulational method that jointly exploits the main advantages of both techniques, namely time-effectiveness and high prediction accuracy, while eliminating the major weaknesses of each. The model is validated experimentally on two structurally different parallel methods of discrete wavelet transform (DWT) computation, i.e. the direct convolution-based and lattice structure-based schemes, by comparing its predictions with actual measurements taken on 6 different graphics cards representing a fairly broad spectrum of GPU compute architectures.
Experimental results show an overall average execution time prediction accuracy of 97.2%, with a global maximum prediction error of 14.5% across all conducted experiments, while maintaining a high evaluation speed with an average of 3.5 ms per single simulation. These results support the model's generality and its possible extrapolation to other DTCAs and GPU architectures, which, together with the proposed model's straightforwardness, time-effectiveness and ease of practical application, makes it, in the authors' opinion, a very interesting alternative to the related existing solutions.
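The reported accuracy figures admit a simple reading: per-case relative accuracy is one minus the relative prediction error, averaged over all runs. The exact metric definition is not given in the abstract, so the helper below is a plausible interpretation, and the predicted/measured kernel times are hypothetical numbers for illustration only.

```python
def prediction_accuracy(predicted, measured):
    """Return (average relative accuracy, maximum relative error),
    both as percentages, for paired predicted vs. measured times."""
    errors = [abs(p - m) / m for p, m in zip(predicted, measured)]
    avg_acc = 100.0 * (1.0 - sum(errors) / len(errors))
    max_err = 100.0 * max(errors)
    return avg_acc, max_err

# Hypothetical predicted vs. measured kernel execution times (ms).
pred = [1.02, 2.10, 0.95]
meas = [1.00, 2.00, 1.00]
acc, max_err = prediction_accuracy(pred, meas)
# acc ~ 96.0 (%), max_err ~ 5.0 (%)
```

Under this reading, the paper's 97.2% average accuracy corresponds to an average relative prediction error of about 2.8%, with the worst single case at 14.5%.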


Authors and Affiliations

Dariusz Puchala 1
Kamil Stokfiszewski 1
Kamil Wieloch 1

  1. Institute of Information Technology, Łódź University of Technology, ul. Wólczańska 215, 90-924 Łódź, Poland
