3.59 128-bit Multiply

★★

The following code computes the 128-bit product of two 64-bit signed values x and y and stores the result in memory:

typedef __int128 int128_t;
void store_prod(int128_t *dest, int64_t x, int64_t y) {
    *dest = x * (int128_t) y;
}

GCC generates the following assembly code implementing the computation:

store_prod:
    movq     %rdx, %rax
    cqto
    movq     %rsi, %rcx
    sarq     $63, %rcx
    imulq     %rax, %rcx
    imulq     %rsi, %rdx
    addq     %rdx, %rcx
    mulq     %rsi
    addq     %rcx, %rdx
    movq     %rax, (%rdi)
    movq     %rdx, 8(%rdi)
    ret

This code uses three multiplications for the multiprecision arithmetic required to implement 128-bit arithmetic on a 64-bit machine. Describe the algorithm used to compute the product, and annotate the assembly code to show how it realizes your algorithm. Hint: When extending arguments of x and y to 128 bits, they can be rewritten as x=264xh+xlx=2^{64}\cdot x_h+x_land y=264yh+yly=2^{64}\cdot y_h+y_l , where xhx_h , xlx_l , yhy_h , and yly_l are 64-bit values. Similary, the 128-bit product can be written as p=264ph+plp=2^{64}\cdot p_h+p_l , where php_h , and plp_l are 64-bit values. Show how the code computes the values of php_h and plp_l in terms of xhx_h , xlx_l , yhy_h and yly_l .

Solve:

assume:

ux=x+x63264uy=y+y63264ux=x+x_{63}\cdot 2^{64}\\ uy=y+y_{63}\cdot 2^{64}

then:

uxuy=(x+x63264)(y+y63264)=xy+xy63264+yx63264+x63y632128ux\cdot uy=(x+x_{63}\cdot 2^{64})\cdot (y+y_{63}\cdot 2^{64})\\ = x\cdot y+x\cdot y_{63}\cdot 2^{64}+y\cdot x_{63}\cdot 2^{64}+x_{63}\cdot y_{63}\cdot 2^{128}

21282^{128} overflows and don't care. So:

xy=uxuy(xy63264+yx63264)x\cdot y=ux\cdot uy-(x\cdot y_{63}\cdot 2^{64}+y\cdot x_{63}\cdot 2^{64})
store_prod:
    ; dest in %rdi, x in %rsi, y in %rdx
    movq     %rdx, %rax         ; %rax = y
    cqto                        ; Singend Extend %rax=>%rdx:%rax, so %rdx = y_h, -1 if y_63=1, and 0 if y_63=0
    movq     %rsi, %rcx         ; %rcx = x
    sarq     $63, %rcx          ; %rcx = x >> 63, so %rcx = x_h, -1 if x_63=1, and 0 if x_63=0
    imulq    %rax, %rcx         ; %rcx = %rax * %rcx = y * -x_63
    imulq    %rsi, %rdx         ; %rdx = %rsi * %rdx = x * -y_63
    addq     %rdx, %rcx         ; %rcx = %rdx + %rcx = x * -y_63 + y * -x_63
    mulq     %rsi               ; %rdx:%rax = %rax * %rsi = uy * ux, because mulq is unsigned full multiply
    addq     %rcx, %rdx         ; %rdx = %rcx + %rdx = uy * ux + x * -y_63 + y * -x_63
    movq     %rax, (%rdi)       ; set lower 64 bits
    movq     %rdx, 8(%rdi)      ; set higher 64 bits
    ret

Last updated