# Fast Big-Integer Arithmetic on SVP64 at up to 256-bits/cycle and beyond

Jacob R. Lifshay

FOSDEM 2024



- SVP64: a Vectorization Extension for PowerISA, developed by Libre-SOC
- Basically, a way to modify nearly any PowerISA instruction to run it in a hardware loop.

Simple Example:

```
setvl 0, 0, 3, 0, 1, 1 # makes stuff run 3 times
sv.add *r3, *r15, r12  # adds 3 times
# expands to:
add r3, r15, r12       # * means r3 and r15 increment
add r4, r16, r12       # no * means r12 doesn't increment
add r5, r17, r12
```
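To make the expansion rule concrete, here is a small Python model of it. It is only a sketch: `expand_sv` is a hypothetical helper, and real SVP64 also has predication, element-width overrides and other modes that this ignores. Operands written as `*rN` step by one register per element; plain `rN` operands stay fixed.

```python
def expand_sv(mnemonic, operands, vl):
    """Return the scalar instructions that sv.<mnemonic> runs for a given VL.

    operands: strings like "*r3" (vector, steps per element) or "r12" (scalar).
    Simplified model: no predication or element-width overrides.
    """
    expanded = []
    for i in range(vl):
        regs = []
        for op in operands:
            if op.startswith("*"):
                regs.append(f"r{int(op[2:]) + i}")  # vector: next register each element
            else:
                regs.append(op)  # scalar: same register every element
        expanded.append(f"{mnemonic} " + ", ".join(regs))
    return expanded


for line in expand_sv("add", ["*r3", "*r15", "r12"], vl=3):
    print(line)
# add r3, r15, r12
# add r4, r16, r12
# add r5, r17, r12
```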



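## Big-Integer Addition on SVP64

```
# 256-bit inputs in r4-7 and r8-11
# 256-bit output in r4-7 (carry-out left in CA)
setvl 0, 0, 4, 0, 1, 1 # makes stuff run 4 times
addic r0, r0, 0 # clear CA (carry flag)
sv.adde *r4, *r4, *r8 # carry-propagating add
```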
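As a cross-check of what this loop computes, the following Python sketch models `sv.adde` over 64-bit registers. It assumes the lowest-numbered register of each group (r4, r8) holds the least-significant 64 bits, matching the word order of the multiply example; `sv_adde`, `to_words` and `from_words` are hypothetical helper names.

```python
MASK64 = (1 << 64) - 1


def sv_adde(a_words, b_words, ca):
    """Model of `sv.adde *rT, *rA, *rB`: VL chained adde instructions.

    Words are least-significant first. Returns (result_words, carry_out).
    """
    result = []
    for a, b in zip(a_words, b_words):
        total = a + b + ca          # adde: RT = RA + RB + CA
        result.append(total & MASK64)
        ca = total >> 64            # CA feeds the next element
    return result, ca


def to_words(x, n):
    return [(x >> (64 * i)) & MASK64 for i in range(n)]


def from_words(words):
    return sum(w << (64 * i) for i, w in enumerate(words))


a = 2**256 - 1
b = 1
# ca=0 mirrors "addic r0, r0, 0" clearing CA before the loop
words, carry = sv_adde(to_words(a, 4), to_words(b, 4), ca=0)
print(words, carry)  # [0, 0, 0, 0] 1
```

The carry ripples through all four elements here, which is exactly the case a hardware implementation has to handle quickly.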







Disclaimer: SVP64 is designed for everything from tiny CPUs to big, fast ones; this example only shows a hypothetical big, fast CPU design.



## Big-Integer Addition on an example CPU








- New instruction: maddedu RT, RA, RB, RC
  - $64 \times 64 + 64 \rightarrow 128$-bit Multiply-Add
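Why 128 bits is always enough: even with all three inputs at their maximum value $2^{64}-1$, the result stays below $2^{128}$. So the high half never exceeds $2^{64}-1$ and can be fed straight back in as the next 64-bit addend:

$$
(2^{64}-1)(2^{64}-1) + (2^{64}-1) = (2^{64}-1)\left((2^{64}-1) + 1\right) = (2^{64}-1)\cdot 2^{64} = 2^{128} - 2^{64} < 2^{128}
$$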

Semantics as used in this presentation (somewhat simplified):

```
result = (RA * RB) + RC
RT = LSB_HALF(result)
RC = MSB_HALF(result)
```
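The simplified semantics can be written as a Python function. This is a model for illustration, not the ISA's own pseudocode, and `maddedu` here is just a Python name. Because the high half is returned as the new RC, chaining calls passes the carry from one 64-bit word to the next.

```python
MASK64 = (1 << 64) - 1


def maddedu(ra, rb, rc):
    """Simplified maddedu: returns (RT, new RC) = low/high halves of ra*rb + rc."""
    result = ra * rb + rc           # at most 128 bits
    return result & MASK64, result >> 64


# Worst case: all inputs at 2**64 - 1. The high half still fits in 64 bits.
rt, rc = maddedu(MASK64, MASK64, MASK64)
print(hex(rt), hex(rc))  # 0x0 0xffffffffffffffff
```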



## Big-Integer Multiply on SVP64

```
# 64-bit input in r3
# 256-bit input in r20-23
# 320-bit output in r4-8
setvl 0, 0, 4, 0, 1, 1 # makes stuff run 4 times
li r8, 0 # clear carry register
sv.maddedu *r4, r3, *r20, r8 # carrying multiply
# equivalent scalar code:
li r8, 0 # clear carry register
maddedu r4, r3, r20, r8
maddedu r5, r3, r21, r8
maddedu r6, r3, r22, r8
maddedu r7, r3, r23, r8
```
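A Python model of the loop above, assuming r20 and r4 hold the least-significant words (consistent with the carry flowing from r4 toward r8). `mul_1x4` is a hypothetical name, and `maddedu` is the same simplified model of the instruction.

```python
MASK64 = (1 << 64) - 1


def maddedu(ra, rb, rc):
    # Simplified maddedu: low half -> RT, high half -> new RC.
    result = ra * rb + rc
    return result & MASK64, result >> 64


def mul_1x4(r3, r20_to_r23):
    """Model of `li r8, 0` + `sv.maddedu *r4, r3, *r20, r8` with VL = 4.

    Returns [r4, r5, r6, r7, r8]: the 320-bit product, least-significant first.
    """
    r8 = 0                              # li r8, 0: clear carry register
    out = []
    for r_b in r20_to_r23:              # one element per loop iteration
        rt, r8 = maddedu(r3, r_b, r8)   # carry stays in scalar r8
        out.append(rt)                  # r4, r5, r6, r7
    out.append(r8)                      # final high half is the top word
    return out
```

For example, multiplying `2**64 - 1` by `2**256 - 1` (split into four words) gives five words whose combined value equals the exact 320-bit product.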



## Big-Integer Multiply on an example CPU

```
sv.maddld *r4, *r8, *r16, *r20 # mul-add
```

*(Figure: Partial Products)*

```
sv.maddedu *r4, r3, *r20, r8 # carrying multiply
```

*(Figure: Partial Products)*
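The partial-product figures hint at how a larger product is built. One standard way, assumed here rather than taken from the slides, is schoolbook multiplication: each 64-bit word of one operand produces a row via a carrying multiply (like `sv.maddedu`), and each row is added into the accumulator, shifted by one word per row, with a carry-propagating add (like `sv.adde`). A Python sketch with hypothetical helper names:

```python
MASK64 = (1 << 64) - 1


def maddedu(ra, rb, rc):
    # Simplified maddedu: low half -> RT, high half -> new RC.
    result = ra * rb + rc
    return result & MASK64, result >> 64


def mul_row(a_word, b_words):
    """One partial-product row: a_word times all of b_words (like sv.maddedu)."""
    carry = 0
    row = []
    for b in b_words:
        lo, carry = maddedu(a_word, b, carry)
        row.append(lo)
    return row + [carry]  # len(b_words) + 1 words


def add_into(acc, row, offset):
    """Add row into acc starting at word `offset`, propagating carries (like sv.adde)."""
    ca = 0
    for i, word in enumerate(row):
        total = acc[offset + i] + word + ca
        acc[offset + i] = total & MASK64
        ca = total >> 64
    # The running sum after rows 0..offset fits in offset + len(row) words,
    # so no carry can leave the top word.
    assert ca == 0


def mul_schoolbook(a_words, b_words):
    """Full product of two word lists, least-significant word first."""
    acc = [0] * (len(a_words) + len(b_words))
    for j, a_word in enumerate(a_words):
        add_into(acc, mul_row(a_word, b_words), offset=j)
    return acc


def to_words(x, n):
    return [(x >> (64 * i)) & MASK64 for i in range(n)]


def from_words(words):
    return sum(w << (64 * i) for i, w in enumerate(words))
```

A 256×256-bit multiply then takes four rows of four words each, giving an eight-word (512-bit) result.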



- Discussion: https://lists.libre-soc.org
- IRC: #libre-soc on OFTC or Libera
- Matrix: `#_oftc_#libre-soc:matrix.org`
- https://libre-soc.org/
- Thanks to NLnet for funding this: https://nlnet.nl/assure
- https://libre-soc.org/nlnet/#faq