3.59 128-bit Multiply

★★

The following code computes the 128-bit product of two 64-bit signed values x and y and stores the result in memory:

typedef __int128 int128_t;
void store_prod(int128_t *dest, int64_t x, int64_t y) {
    *dest = x * (int128_t) y;
}

GCC generates the following assembly code implementing the computation:

store_prod:
    movq     %rdx, %rax
    cqto
    movq     %rsi, %rcx
    sarq     $63, %rcx
    imulq     %rax, %rcx
    imulq     %rsi, %rdx
    addq     %rdx, %rcx
    mulq     %rsi
    addq     %rcx, %rdx
    movq     %rax, (%rdi)
    movq     %rdx, 8(%rdi)
    ret

This code uses three multiplications for the multiprecision arithmetic required to implement 128-bit arithmetic on a 64-bit machine. Describe the algorithm used to compute the product, and annotate the assembly code to show how it realizes your algorithm. Hint: When extending arguments of x and y to 128 bits, they can be rewritten as $x=2^{64}\cdot x_h+x_l$ and $y=2^{64}\cdot y_h+y_l$ , where $x_h$ , $x_l$ , $y_h$ , and $y_l$ are 64-bit values. Similary, the 128-bit product can be written as $p=2^{64}\cdot p_h+p_l$ , where $p_h$ , and $p_l$ are 64-bit values. Show how the code computes the values of $p_h$ and $p_l$ in terms of $x_h$ , $x_l$ , $y_h$ and $y_l$ .

Solve:

assume:

ux=x+x_{63}\cdot 2^{64}\\ uy=y+y_{63}\cdot 2^{64}

then:

ux\cdot uy=(x+x_{63}\cdot 2^{64})\cdot (y+y_{63}\cdot 2^{64})\\ = x\cdot y+x\cdot y_{63}\cdot 2^{64}+y\cdot x_{63}\cdot 2^{64}+x_{63}\cdot y_{63}\cdot 2^{128}

$2^{128}$ overflows and don't care. So:

x\cdot y=ux\cdot uy-(x\cdot y_{63}\cdot 2^{64}+y\cdot x_{63}\cdot 2^{64})

store_prod:
    ; dest in %rdi, x in %rsi, y in %rdx
    movq     %rdx, %rax         ; %rax = y
    cqto                        ; Singend Extend %rax=>%rdx:%rax, so %rdx = y_h, -1 if y_63=1, and 0 if y_63=0
    movq     %rsi, %rcx         ; %rcx = x
    sarq     $63, %rcx          ; %rcx = x >> 63, so %rcx = x_h, -1 if x_63=1, and 0 if x_63=0
    imulq    %rax, %rcx         ; %rcx = %rax * %rcx = y * -x_63
    imulq    %rsi, %rdx         ; %rdx = %rsi * %rdx = x * -y_63
    addq     %rdx, %rcx         ; %rcx = %rdx + %rcx = x * -y_63 + y * -x_63
    mulq     %rsi               ; %rdx:%rax = %rax * %rsi = uy * ux, because mulq is unsigned full multiply
    addq     %rcx, %rdx         ; %rdx = %rcx + %rdx = uy * ux + x * -y_63 + y * -x_63
    movq     %rax, (%rdi)       ; set lower 64 bits
    movq     %rdx, 8(%rdi)      ; set higher 64 bits
    ret

Previous3.58 Decode Next3.60 For Loop

Last updated 5 years ago

Was this helpful?