Parallel Implementations of ARX-Based Block Ciphers on Graphic Processing Units

An, SangWoo; Kim, YoungBeom; Kwon, Hyeokdong; Seo, Hwajeong; Seo, Seog Chung

Additional contact information
SangWoo An: Department of Financial Information Security, Kookmin University, Seoul 02707, Korea
YoungBeom Kim: Department of Information Security, Cryptology, and Mathematics, Kookmin University, Seoul 02707, Korea
Hyeokdong Kwon: Division of IT Convergence Engineering, Hansung University, Seoul 02876, Korea
Hwajeong Seo: Division of IT Convergence Engineering, Hansung University, Seoul 02876, Korea
Seog Chung Seo: Department of Information Security, Cryptology, and Mathematics, Kookmin University, Seoul 02707, Korea

Mathematics, 2020, vol. 8, issue 11, 1-25

Abstract: With the development of information and communication technology, various types of Internet of Things (IoT) devices have come into wide use for convenient services. Many users request services from servers through their IoT devices, so the amount of users’ personal information that servers need to protect has increased dramatically. To protect this information quickly and safely, the encryption process must be optimized for speed. Because it is difficult for a server to provide its basic services while encrypting a large amount of data on the existing CPU, several parallel optimization methods using Graphics Processing Units (GPUs) have been considered. In this paper, we propose several GPU optimization techniques for the efficient server-side implementation of lightweight block cipher algorithms. As target algorithms, we select high security and light weight (HIGHT), Lightweight Encryption Algorithm (LEA), and revised CHAM, which are Add-Rotate-Xor (ARX)-based block ciphers, because they are widely used on IoT devices. We exploit the features of the counter (CTR) operation mode to reduce unnecessary memory copying and operations in the GPU environment. In addition, we optimize memory usage by making full use of the GPU’s on-chip memory, such as registers and shared memory, and implement the core function of each target algorithm in inline PTX assembly code to maximize performance. With our optimization methods and handcrafted PTX code, we achieve encryption throughputs of 468, 2593, and 3063 Gbps for HIGHT, LEA, and revised CHAM, respectively, on an NVIDIA RTX 2070 GPU. We also present optimized implementations of the Counter Mode Based Deterministic Random Bit Generator (CTR_DRBG), a widely used deterministic random bit generator, to provide a large amount of random data to connected IoT devices. We apply several optimization techniques to maximize the performance of CTR_DRBG and achieve performance improvements of 52.2, 24.8, and 34.2 times over a CPU implementation of CTR_DRBG when HIGHT-64/128, LEA-128/128, and CHAM-128/128, respectively, are used as the underlying block cipher.

Keywords: CHAM; LEA; HIGHT; Graphic Processing Unit (GPU); CUDA; Counter (CTR) mode; parallel processing
JEL-codes: C
Date: 2020

Downloads:
https://www.mdpi.com/2227-7390/8/11/1894/pdf (application/pdf)
https://www.mdpi.com/2227-7390/8/11/1894/ (text/html)

Persistent link: https://EconPapers.repec.org/RePEc:gam:jmathe:v:8:y:2020:i:11:p:1894-:d:438051

Mathematics is currently edited by Ms. Emma He