📘
Deuterium Wiki
  • Hello
  • Linux
    • cmd
      • du: 显式文件大小
      • seq: 生成序列
      • cat: 连接
      • cp: 复制
      • cd: 切换目录
      • mv: 移动
    • awk
      • 执行awk脚本
      • 删除空行
      • 个数统计
      • 文件的交集
      • 文件的差集
    • mysql
      • 删除重复数据
      • 导出数据不带标题
  • Reading
    • Novel
      • 《基督山伯爵》人物关系
    • Awesome CS Books
      • csapp-3e-homework-solution
        • 1. A Tour of Computer Systems
        • 2. Representing and Manipulating Information
          • 2.55 Compile and Run
          • 2.56 Another Try
          • 2.57 More show Procedures
          • 2.58 Check Little-Endian
          • 2.59 Bit Expressions
          • 2.60 Replace Byte
          • 2.61 More Bit Expressions
          • 2.62 Check Arithmetic Right Shift
          • 2.63 Logic & Arithmetic Right Shift
          • 2.64 Any Odd One
          • 2.65 Odd Ones
          • 2.66 Leftmost One
          • 2.67 Int Size is 32
          • 2.68 Lower One Mask
          • 2.69 Rotate Left
          • 2.70 Fits Bits
          • 2.71 Xbyte
          • 2.72 Copy Int
          • 2.73 Saturating Add
          • 2.74 Sub OK
          • 2.75 Unsigned High Prod
          • 2.76 calloc
          • 2.77 Multiple By Shifts
          • 2.78 Divide Power 2
          • 2.79 Mul3div4
          • 2.80 Three Fourths
          • 2.81 Generate Bits
          • 2.82 Signed and Unsigned
          • 2.83 Binary Floating Value
          • 2.84 Float Le
          • 2.85 Floating Point I
          • 2.86 Extend Precision
          • 2.87 Floating-Point II
          • 2.88 Floating-Point III
          • 2.89 Floating-Point IV
          • 2.90 fpwr2
          • 2.91 π
          • 2.92 Float Negate
          • 2.93 Float Absval
          • 2.94 Float Twice
          • 2.95 Float Half
          • 2.96 Float f2i
          • 2.97 Float i2f
        • 3. Machine-Level Representation of Programs
          • 3.58 Decode
          • 3.59 128-bit Multiply
          • 3.60 For Loop
          • 3.61 Conditional Data Transfer
          • 3.62 Switch I
          • 3.63 Switch II
          • 3.64 Multiple Dimension Array I
          • 3.65 Multiple Dimension Array II
          • 3.66 Multiple Dimension Array III
          • 3.67 Caller and Callee
          • 3.68 Alignment
          • 3.69 Struct
          • 3.70 Union
          • 3.71 fgets
          • 3.72 Variable-Size Stack
          • 3.73 Find Range I
          • 3.74 Find Range II
          • 3.75 Complex
      • tcpv1
        • ch01: Introduction
        • ch02: Link Layer
        • ch03: Internet Protocol
        • ch04: Address Resolutin Protocol
        • ch05: Reverse Address Resolution Protocol
        • ch06: Internet Control Message Protocol
        • ch07: Ping Program
        • ch08: Traceroute Program
        • ch09: IP Routing
        • ch10: Dynamic Routing Protocols
        • ch11: User Datagram Protocol
        • ch12: Broadcasting and Multicasting
        • ch13: Internet Group Management Protocol
        • ch14: The Domain Name System
        • ch15: Trivial File Transfer Protocol
        • ch16: Boostrap Protocol
        • ch17: Transmission Control Protocol
        • ch18: TCP Connection Establishment and Termination
        • ch 19: TCP Interactive Data Flow
        • ch20: TCP Bulk Data Flow
      • http
        • ch01: Overview of HTTP
        • ch02: URLs and Resources
        • ch03: HTTP Messages
        • ch04: Connection Management
        • ch05: Web Servers
        • ch06: Proxies
        • ch07: Caching
        • ch08: Integration Points
        • ch09: Web Robots
        • ch10: HTTP-NG
        • ch11: Client Identification and Cookies
        • ch12: Basic Authentication
        • ch13: Digest Authentication
        • ch14: Secure HTTP
        • ch15: Entities and Encodings
        • ch16: Internationalizated
        • ch17: Content Negotiation and Transcoding
        • ch18: Web Hosting
        • ch19: Publishing Systems
        • ch20: Redirections and Load Balancing
        • ch21: Logging and Usage Tracking
    • 提升认知
      • 《为什么需要生物学思维》
      • 《大话西方艺术史》
  • Mathematics
Powered by GitBook
On this page

Was this helpful?

  1. Reading
  2. Awesome CS Books
  3. csapp-3e-homework-solution
  4. 3. Machine-Level Representation of Programs

3.59 128-bit Multiply

★★

Previous3.58 DecodeNext3.60 For Loop

Last updated 4 years ago

Was this helpful?

The following code computes the 128-bit product of two 64-bit signed values x and y and stores the result in memory:

typedef __int128 int128_t;
void store_prod(int128_t *dest, int64_t x, int64_t y) {
    *dest = x * (int128_t) y;
}

GCC generates the following assembly code implementing the computation:

store_prod:
    movq     %rdx, %rax
    cqto
    movq     %rsi, %rcx
    sarq     $63, %rcx
    imulq     %rax, %rcx
    imulq     %rsi, %rdx
    addq     %rdx, %rcx
    mulq     %rsi
    addq     %rcx, %rdx
    movq     %rax, (%rdi)
    movq     %rdx, 8(%rdi)
    ret

This code uses three multiplications for the multiprecision arithmetic required to implement 128-bit arithmetic on a 64-bit machine. Describe the algorithm used to compute the product, and annotate the assembly code to show how it realizes your algorithm. Hint: When extending arguments of x and y to 128 bits, they can be rewritten as x=264⋅xh+xlx=2^{64}\cdot x_h+x_lx=264⋅xh​+xl​and y=264⋅yh+yly=2^{64}\cdot y_h+y_ly=264⋅yh​+yl​ , where xhx_hxh​ , xlx_lxl​ , yhy_hyh​ , and yly_lyl​ are 64-bit values. Similary, the 128-bit product can be written as p=264⋅ph+plp=2^{64}\cdot p_h+p_lp=264⋅ph​+pl​ , where php_hph​ , and plp_lpl​ are 64-bit values. Show how the code computes the values of php_hph​ and plp_lpl​ in terms of xhx_hxh​ , xlx_lxl​ , yhy_hyh​ and yly_lyl​ .

Solve:

assume:

then:

store_prod:
    ; dest in %rdi, x in %rsi, y in %rdx
    movq     %rdx, %rax         ; %rax = y
    cqto                        ; Singend Extend %rax=>%rdx:%rax, so %rdx = y_h, -1 if y_63=1, and 0 if y_63=0
    movq     %rsi, %rcx         ; %rcx = x
    sarq     $63, %rcx          ; %rcx = x >> 63, so %rcx = x_h, -1 if x_63=1, and 0 if x_63=0
    imulq    %rax, %rcx         ; %rcx = %rax * %rcx = y * -x_63
    imulq    %rsi, %rdx         ; %rdx = %rsi * %rdx = x * -y_63
    addq     %rdx, %rcx         ; %rcx = %rdx + %rcx = x * -y_63 + y * -x_63
    mulq     %rsi               ; %rdx:%rax = %rax * %rsi = uy * ux, because mulq is unsigned full multiply
    addq     %rcx, %rdx         ; %rdx = %rcx + %rdx = uy * ux + x * -y_63 + y * -x_63
    movq     %rax, (%rdi)       ; set lower 64 bits
    movq     %rdx, 8(%rdi)      ; set higher 64 bits
    ret
ux=x+x63⋅264uy=y+y63⋅264ux=x+x_{63}\cdot 2^{64}\\ uy=y+y_{63}\cdot 2^{64}ux=x+x63​⋅264uy=y+y63​⋅264
ux⋅uy=(x+x63⋅264)⋅(y+y63⋅264)=x⋅y+x⋅y63⋅264+y⋅x63⋅264+x63⋅y63⋅2128ux\cdot uy=(x+x_{63}\cdot 2^{64})\cdot (y+y_{63}\cdot 2^{64})\\ = x\cdot y+x\cdot y_{63}\cdot 2^{64}+y\cdot x_{63}\cdot 2^{64}+x_{63}\cdot y_{63}\cdot 2^{128}ux⋅uy=(x+x63​⋅264)⋅(y+y63​⋅264)=x⋅y+x⋅y63​⋅264+y⋅x63​⋅264+x63​⋅y63​⋅2128

21282^{128}2128 overflows and don't care. So:

x⋅y=ux⋅uy−(x⋅y63⋅264+y⋅x63⋅264)x\cdot y=ux\cdot uy-(x\cdot y_{63}\cdot 2^{64}+y\cdot x_{63}\cdot 2^{64})x⋅y=ux⋅uy−(x⋅y63​⋅264+y⋅x63​⋅264)