A Generalized Crypto Logic Unit (GCLU) With Software and Hardware Implementations

Rabie A. Mahmoud¹, Magdy Saeb²

¹. General Organization of Remote Sensing (GORS), Damascus, Syria.
². Computer Engineering Department, Arab Academy of Science, Technology & Maritime Transport (AAST), Alexandria, Egypt.

mail@magdysaeb.net

Abstract: The Generalized Crypto Logic Unit (GCLU) is a key-driven encryption function derived from the Crypto Logic Unit (CLU), the cipher engine of the Stone Metamorphic Cipher. The GCLU uses eight bit-balanced operations: XOR, INV, ROR, NOP, XNOR, SWAP, ROL, and RevOr, standing for bitwise xor, invert, rotate right, no operation, xnor, swap, rotate left, and reverse order, respectively. In addition, we provide software and Field Programmable Gate Array (FPGA) implementations of the Generalized Crypto Logic Unit.

Keywords: Generalized Crypto Logic Unit, Metamorphic, Cipher, Cryptography, FPGA.

1. Introduction

The Crypto Logic Unit (CLU) is the cipher engine of the key-driven Stone Metamorphic Cipher [1], [2], and has been used to modify several well-known ciphers to increase their entropy and improve their security. These modified ciphers include the Metamorphic Twofish Cipher [3], the Metamorphic MARS Cipher [4], and the Metamorphic-Key-Hopping GOST Cipher [5]. The CLU is built using four low-level bit-balanced operations: XORing a key bit with a plaintext bit (XOR), inverting a plaintext bit (INV), exchanging one plaintext bit with another one in a given plaintext word using a right rotation operation (ROR), and producing a plaintext bit without any change (NOP). The Generalized Crypto Logic Unit (GCLU) extends this idea from four to eight low-level bit-balanced operations: the four operations of the CLU plus four new ones. The newly added operations are: XNORing a key bit with a plaintext bit (XNOR), swapping a plaintext bit with another one in a given plaintext word (SWAP), a left rotation operation (ROL), and the reverse order operation that reverses a plaintext word (RevOr). In the following sections, we discuss the GCLU structure and its software and hardware implementations. Finally, we provide a summary and our conclusions.

2. Generalized Crypto Logic Unit (GCLU)

As discussed in the introduction, the Generalized Crypto Logic Unit (GCLU) extends the Crypto Logic Unit (CLU) defined in the key-driven Stone Metamorphic Cipher by adding four more operations to the CLU operations. The resulting eight low-level operations are:

- (XOR) by XORing a key bit with a plaintext bit,
- (INV) by inverting a plaintext bit,
- (ROR) by exchanging one plaintext bit with another one in a given plaintext word using a right rotation operation,
- (NOP) by producing the plaintext bit without any change,
- (XNOR) by XNORing a key bit with a plaintext bit,
- (SWAP) by exchanging one plaintext bit with another one in a given plaintext word using a swap operation,
- (ROL) by exchanging one plaintext bit with another one in a given plaintext word using a left rotation operation,
- (RevOr) by exchanging one plaintext bit with another one in a given plaintext word using a reverse order operation.

Figure 1 shows the basic generalized crypto logic unit, and Table 1 details each of the GCLU operations.

Table 1. GCLU operations

<table>
<thead>
<tr>
<th>Mnemonic</th>
<th>Operation</th>
<th>Select Operation Code</th>
</tr>
</thead>
<tbody>
<tr>
<td>XOR</td>
<td>$C_i = K_i \oplus P_i$</td>
<td>“000”</td>
</tr>
<tr>
<td>INV</td>
<td>$C_i = \neg P_i$</td>
<td>“001”</td>
</tr>
<tr>
<td>ROR</td>
<td>$C_i = P_i \ggg m$</td>
<td>“010”</td>
</tr>
<tr>
<td>NOP</td>
<td>$C_i = P_i$</td>
<td>“011”</td>
</tr>
<tr>
<td>XNOR</td>
<td>$C_i = K_i \odot P_i$</td>
<td>“100”</td>
</tr>
<tr>
<td>SWAP</td>
<td>$C_i = \operatorname{SWAP}(P_i)$</td>
<td>“101”</td>
</tr>
<tr>
<td>ROL</td>
<td>$C_i = P_i \lll m$</td>
<td>“110”</td>
</tr>
<tr>
<td>RevOr</td>
<td>$C_i = \operatorname{RevOr}(P_i)$</td>
<td>“111”</td>
</tr>
</tbody>
</table>

Similar to the CLU, the GCLU can serve as both encryptor and decryptor: feeding the output cipher bit back in as the input plaintext bit reproduces the original plaintext bit. This holds for the XOR, INV, NOP, XNOR, SWAP, and RevOr functions, each of which is its own inverse. The exceptions are the rotations: the decryptor of ROR uses ROL, and the decryptor of ROL uses ROR. Appendix A shows the truth table of the GCLU. The operation_selection_bits ($S_2; S_1; S_0$) can be chosen from any three key bits; the same idea applies to the rotation_selection_bits ($S'_0; S'_1; S'_2$). Figure 2 shows the locations of the operation selection bits and rotation selection bits.

![Figure 2. The proposed key format where the location of the operation and rotation selection bits is shown](image2)

3. The Algorithm

In this section, we provide the formal description of the Generalized Crypto Logic Unit as follows:

**Function Generalized Crypto Logic Unit (GCLU)**

Begin
1. Read the next plaintext message $P_i$;
2. Read the next sub-key $K_i$;
3. Read n-bit rotation_selection_bits from sub-key where $2^n$=Block size $B$;
4. Read 3-bit operation_selection_bits from sub-key;
5. Use operation selection & rotation selection bits to select and perform the operation:
   - XOR when operation_selection_bits=“000”
   - INV when operation_selection_bits=“001”
   - ROR when operation_selection_bits=“010”
   - NOP when operation_selection_bits=“011”
   - XNOR when operation_selection_bits=“100”
   - SWAP when operation_selection_bits=“101”
   - ROL when operation_selection_bits=“110”
   - RevOr when operation_selection_bits=“111”;
6. Perform the encryption operation using plaintext bit and sub-key bit to get a cipher bit;
7. Store the resulting cipher bit;
End;

4. Software/Hardware Implementation

The software implementation is a C++ function [6] of the generalized crypto logic unit that reproduces the truth table of the GCLU, built with Microsoft Visual C++ 2010 Express. Appendix C provides sample C++ code for the GCLU. Figure 3 shows the successful build of the C++ project, and Figure 4 shows its execution screen. Furthermore, a proof-of-concept FPGA-based implementation encrypts a one-byte plaintext using a one-byte sub-key word. We implemented the GCLU in the VHDL hardware description language using the Altera design environment Quartus II 13.0 Service Pack 1 Web Edition [7]. The FPGA design targets the EP2C5AF256A7, a Cyclone II family device. Appendix D gives the sample VHDL code for the GCLU. The implementation results and the schematic diagram for the GCLU are shown in Figure 5. The RTL view and the technology map viewer are shown in Figures 6 and 7, respectively. Figure 8 shows the floor plan. The details of the analysis and synthesis summary and the timing analyzer are given in Appendix B.

![Figure 3. C++ project of GCLU showing the correct build solution](image3)

![Figure 4. Execution screen of the C++ project of GCLU showing the truth table of the GCLU](image4)

5. Summary and Conclusions

We have presented the Generalized Crypto Logic Unit (GCLU), a modified version of the Crypto Logic Unit (CLU) of the key-driven Stone Metamorphic Cipher. The GCLU is constructed using eight bit-balanced low-level operations, which are pseudo-randomly chosen using three key-dependent selection bits. These operations are: bitwise xor, invert, rotate right, no operation, xnor, swap, rotate left, and reverse order. In addition, we have shown that the GCLU can be implemented in software or in FPGA-based hardware, and we have included proof-of-concept implementations of both. The aim of extending the CLU to the GCLU is to increase a cipher’s entropy by providing a higher degree of randomness and thus enhanced security. The GCLU can then be used to modify well-known ciphers in order to achieve key-dependent encryption.

References


Appendix A: The truth table of the GCLU

<table>
<thead>
<tr>
<th>Pi</th>
<th>Ki</th>
<th>→Pj</th>
<th>S2</th>
<th>S1</th>
<th>S0</th>
<th>Operation</th>
<th>Ci</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>XOR</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>INV</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>ROR</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>NOP</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>XNOR</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>SWAP</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>ROL</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>RevOr</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>XOR</td>
<td>0</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>INV</td>
<td>1</td>
</tr>
</tbody>
</table>

Appendix B: The analysis & synthesis and fitter report details

FPGA synthesis of the GCLU for 1-byte inputs uses 83 logic elements, all combinational (implementing multiplexers) with no registers, and has a longest propagation delay of 17.255 ns, from input port “Plaintext[7]” to output port “Ciphertext[7]”. Table 2 and Table 3 show the logic element usage and the interconnections between the logic elements under the Area, Speed, and Balanced optimization techniques. Figure 9 shows the delays in the design of the GCLU.

Analysis & Synthesis and Fitter Summary

• Family: Cyclone II
• Device: EP2C5AF256A7
• Nominal Core Voltage: 1.20 V
• Minimum Core Junction Temperature: -40 °C
• Maximum Core Junction Temperature: 125 °C.

• Optimization Technique: Balanced
• Total logic elements: 83 out of 4,608 (2%)
  -- Combinational with no register: 83
  -- Register only: 0
  -- Combinational with a register: 0

Logic element usage by number of LUT inputs

-- 4 input functions: 47
-- 3 input functions: 34
-- <=2 input functions: 2
-- Register only: 0

Logic elements by mode

-- Normal mode: 83
-- Arithmetic mode: 0

• Total LABs: 6 out of 288 (2 %)
• Total fan-out: 302
• Average fan-out: 2.75
• Highest non-global fan-out: 27
• Maximum fan-out: 27

• Block interconnects: 80 out of 15,666 (< 1 %)
• C16 interconnects: 7 out of 812 (< 1 %)
• C4 interconnects: 69 out of 11,424 (< 1 %)
• Direct links: 7 out of 15,666 (< 1 %)
• Global clocks: 0 out of 8 (0 %)
• Local interconnects: 48 out of 4,608 (1 %)
• R24 interconnects: 7 out of 652 (1 %)
• R4 interconnects: 34 out of 13,328 (< 1 %)

Table 2. A synthesis comparison between optimization technique implementations of GCLU

<table>
<thead>
<tr>
<th>Logic elements</th>
<th>Balanced</th>
<th>Area</th>
<th>Speed</th>
</tr>
</thead>
<tbody>
<tr>
<td>Total logic elements</td>
<td>83</td>
<td>84</td>
<td>83</td>
</tr>
<tr>
<td>Total combinational functions</td>
<td>83</td>
<td>84</td>
<td>83</td>
</tr>
<tr>
<td>4 input functions</td>
<td>47</td>
<td>44</td>
<td>60</td>
</tr>
<tr>
<td>3 input functions</td>
<td>34</td>
<td>39</td>
<td>19</td>
</tr>
<tr>
<td>&lt;=2 input functions</td>
<td>2</td>
<td>1</td>
<td>4</td>
</tr>
<tr>
<td>Total fan-out</td>
<td>302</td>
<td>303</td>
<td>313</td>
</tr>
<tr>
<td>Maximum fan-out</td>
<td>27</td>
<td>23</td>
<td>31</td>
</tr>
<tr>
<td>Average fan-out</td>
<td>2.75</td>
<td>2.73</td>
<td>2.85</td>
</tr>
</tbody>
</table>

Table 3. A fitter comparison between optimization technique implementation of GCLU

<table>
<thead>
<tr>
<th>Logic elements</th>
<th>Balanced</th>
<th>Area</th>
<th>Speed</th>
</tr>
</thead>
<tbody>
<tr>
<td>Total logic elements</td>
<td>83</td>
<td>84</td>
<td>83</td>
</tr>
<tr>
<td>Total combinational functions</td>
<td>83</td>
<td>84</td>
<td>83</td>
</tr>
<tr>
<td>4 input functions</td>
<td>47</td>
<td>44</td>
<td>60</td>
</tr>
<tr>
<td>3 input functions</td>
<td>34</td>
<td>39</td>
<td>19</td>
</tr>
<tr>
<td>&lt;=2 input functions</td>
<td>2</td>
<td>1</td>
<td>4</td>
</tr>
<tr>
<td>Total fan-out</td>
<td>302</td>
<td>303</td>
<td>313</td>
</tr>
<tr>
<td>Maximum fan-out</td>
<td>27</td>
<td>23</td>
<td>31</td>
</tr>
<tr>
<td>Average fan-out</td>
<td>2.75</td>
<td>2.73</td>
<td>2.85</td>
</tr>
</tbody>
</table>

In Balanced Optimization
• Longest propagation delay was 17.255 ns from input port “Plaintext[7]” to output port “Ciphertext[7]”.
• Longest minimum propagation delay was 7.182 ns from input port “Plaintext[1]” to output port “Ciphertext[7]”.

In Area Optimization
• Longest propagation delay was 17.639 ns from input port “Plaintext[0]” to output port “Ciphertext[2]”.
• Longest minimum propagation delay was 7.229 ns from input port “Plaintext[2]” to output port “Ciphertext[4]”.

In Speed Optimization
• Longest propagation delay was 15.911 ns from input port “Plaintext[4]” to output port “Ciphertext[3]”.
• Longest minimum propagation delay was 6.708 ns from input port “Plaintext[4]” to output port “Ciphertext[5]”.

### Appendix C: Sample C++ code for GCLU

```cpp
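#include <iostream>
using namespace std;

// The definition of encrypt() is not part of the extracted listing. The
// version below is an ASSUMPTION made so that the sample compiles: the nine
// arguments are read as Pi, Ki, the four bits that ROR, SWAP, ROL and RevOr
// would move into position i, and the selection bits S2, S1, S0 (Table 1).
int encrypt(int Pi, int Ki, int Pror, int Pswap, int Prol, int Prev,
            int S2, int S1, int S0) {
    switch ((S2 << 2) | (S1 << 1) | S0) {
        case 0:  return Pi ^ Ki;        // XOR
        case 1:  return Pi ^ 1;         // INV
        case 2:  return Pror;           // ROR
        case 3:  return Pi;             // NOP
        case 4:  return (Pi ^ Ki) ^ 1;  // XNOR
        case 5:  return Pswap;          // SWAP
        case 6:  return Prol;           // ROL
        default: return Prev;           // RevOr
    }
}
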
int main() {
    cout << encrypt(0,0,0,0,0,0,0,0,0) << endl;
    cout << encrypt(0,0,0,0,0,0,1,0,0) << endl;
    cout << encrypt(0,0,0,0,0,0,1,0,1) << endl;
    cout << encrypt(0,0,0,0,0,1,0,0,0) << endl;
    cout << encrypt(0,0,0,0,0,1,0,0,1) << endl;
    cout << encrypt(0,0,0,0,0,1,1,0,0) << endl;
    cout << encrypt(0,0,0,0,0,1,1,0,1) << endl;
    cout << encrypt(0,0,0,0,1,0,0,0,0) << endl;
    cout << encrypt(0,0,0,0,1,0,0,0,1) << endl;
    cout << encrypt(0,0,0,0,1,0,0,1,0) << endl;
    cout << encrypt(0,1,1,1,1,1,0,0,0) << endl;
    cout << encrypt(1,1,1,1,1,0,0,0,0) << endl;
    cout << encrypt(1,1,1,1,1,0,0,1,0) << endl;
    cout << encrypt(1,1,1,1,1,0,1,0,0) << endl;
    cout << encrypt(1,1,1,1,1,0,1,0,1) << endl;
    cout << encrypt(1,1,1,1,1,1,0,0,0) << endl;
    cout << encrypt(1,1,1,1,1,1,0,0,1) << endl;
    cout << encrypt(1,1,1,1,1,1,0,1,0) << endl;
    cout << encrypt(1,1,1,1,1,1,0,1,1) << endl;
    cout << encrypt(1,1,1,1,1,1,1,0,0) << endl;
    cout << encrypt(1,1,1,1,1,1,1,0,1) << endl;
    cout << encrypt(1,1,1,1,1,1,1,1,0) << endl;
    cout << encrypt(1,1,1,1,1,1,1,1,1) << endl;
    return 0;
}
```

### Appendix D: Sample VHDL code for GCLU

```vhdl
LIBRARY IEEE;
USE IEEE.STD_LOGIC_1164.ALL;

ENTITY GCLU IS
PORT( P : in std_logic_vector (7 downto 0);
      K : in std_logic_vector (7 downto 0);
      C : out std_logic_vector (7 downto 0));
END GCLU;

ARCHITECTURE behavioral OF GCLU IS

-- Both selection fields are three bits wide.
SIGNAL Operation_sel_bits : std_logic_vector (2 downto 0);
SIGNAL Rotation_sel_bits : std_logic_vector (2 downto 0);
SIGNAL SWAP_P : std_logic_vector (7 downto 0);
SIGNAL RevOr_P : std_logic_vector (7 downto 0);

BEGIN

Operation_sel_bits <= K(7) & K(5) & K(3);
Rotation_sel_bits <= K(1) & K(0) & K(4);

-- The listing does not show how SWAP_P and RevOr_P are driven.
-- Assumption: SWAP exchanges the two nibbles; RevOr reverses the bit order.
SWAP_P <= P(3 downto 0) & P(7 downto 4);
RevOr_P <= P(0) & P(1) & P(2) & P(3) & P(4) & P(5) & P(6) & P(7);

C <= P XOR K WHEN Operation_sel_bits="000" ELSE
    NOT P WHEN Operation_sel_bits="001" ELSE
    P WHEN Operation_sel_bits="011" ELSE
    P XNOR K WHEN Operation_sel_bits="100" ELSE
    SWAP_P WHEN Operation_sel_bits="101" ELSE
    RevOr_P WHEN Operation_sel_bits="111" ELSE

    ---ROR------
    P WHEN
    Operation_sel_bits="010" AND Rotation_sel_bits="000" ELSE
    P(0) & P(7 downto 1) WHEN
    Operation_sel_bits="010" AND Rotation_sel_bits="001" ELSE
    P(1 downto 0) & P(7 downto 2) WHEN
    Operation_sel_bits="010" AND Rotation_sel_bits="010" ELSE
    P(2 downto 0) & P(7 downto 3) WHEN
    Operation_sel_bits="010" AND Rotation_sel_bits="011" ELSE
    P(3 downto 0) & P(7 downto 4) WHEN
    Operation_sel_bits="010" AND Rotation_sel_bits="100" ELSE
    P(4 downto 0) & P(7 downto 5) WHEN
    Operation_sel_bits="010" AND Rotation_sel_bits="101" ELSE
    P(5 downto 0) & P(7 downto 6) WHEN
    Operation_sel_bits="010" AND Rotation_sel_bits="110" ELSE
    P(6 downto 0) & P(7) WHEN
    Operation_sel_bits="010" AND Rotation_sel_bits="111" ELSE
    ---ROL------
    P WHEN
    Operation_sel_bits="110" AND Rotation_sel_bits="000" ELSE
    P(6 downto 0) & P(7) WHEN
    Operation_sel_bits="110" AND Rotation_sel_bits="001" ELSE
    P(5 downto 0) & P(7 downto 6) WHEN
    Operation_sel_bits="110" AND Rotation_sel_bits="010" ELSE
    P(4 downto 0) & P(7 downto 5) WHEN
    Operation_sel_bits="110" AND Rotation_sel_bits="011" ELSE
    P(3 downto 0) & P(7 downto 4) WHEN
    Operation_sel_bits="110" AND Rotation_sel_bits="100" ELSE
    P(2 downto 0) & P(7 downto 3) WHEN
    Operation_sel_bits="110" AND Rotation_sel_bits="101" ELSE
    P(1 downto 0) & P(7 downto 2) WHEN
    Operation_sel_bits="110" AND Rotation_sel_bits="110" ELSE
    P(0) & P(7 downto 1) WHEN
    Operation_sel_bits="110" AND Rotation_sel_bits="111";

END behavioral;
```

### Timing Analyzer Summary

![Figure 9. Delays in the design of the GCLU](image-url)

Magdy Saeb received the BSEE degree from the School of Engineering, Cairo University, in 1974, and the MSEE and Ph.D. degrees in Electrical & Computer Engineering from the University of California, Irvine, in 1981 and 1985, respectively. He was with Kaiser Aerospace and Electronics, Irvine, California, and The Atomic Energy Establishment, Anshas, Egypt. He is a professor and former head of the Department of Computer Engineering, Arab Academy for Science, Technology & Maritime Transport, Alexandria, Egypt. He was on leave working as a principal researcher at the Malaysian Institute of Microelectronic Systems (MIMOS). He is the Chief Technology Officer of the information security company GWIS. He holds five international patents in cryptography. His current research interests include Cryptography, FPGA Implementations of Cryptography and Steganography Data Security Techniques, Encryption Processors, and Mobile Agent Security. www.magdysaeb.net

Rabie A. Mahmoud received the B.Sc. degree from the Faculty of Science, Tishreen University, Latakia, Syria, in 2001, and the M.S. and Ph.D. degrees in Computational Science from the Faculty of Science, Cairo University, Egypt, in 2007 and 2011, respectively. Currently, he works at the General Organization of Remote Sensing (GORS), Damascus, Syria. His current interests include Cryptography, FPGA Implementations of Cryptography, and Data Security Techniques. rabiemah@yahoo.com