2020-03-03

(Cryptocurrency) Trying out ROCm OpenCL on Linux Mint

I had a little spare time, so I installed ROCm on Linux Mint and tried running ethminer.

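My current setup is as follows.

・ Linux Mint 19.3 (Linux Kernel 5.3.0-40-lowlatency)
・ AMDGPU-PRO 19.10 (partial install, OpenCL-related parts only)
・ GPU#1 is the integrated GPU of the Ryzen 3 2200G (used only for display output)
・ GPU#2 is a Radeon RX 570 (dedicated to OpenCL)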


The version I tried this time is ROCm v3.1.0.




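I installed it by following the Ubuntu section of "Deploying ROCm". Going through the steps as written, the installation itself finished without any particular trouble, but in the post-install check rocminfo exited normally while clinfo died with a segmentation fault.
Note: during the DKMS phase the installation pushes CPU usage up and takes quite a while.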

I suspected that leaving AMDGPU-PRO installed was the culprit, so I uninstalled the AMDGPU-PRO packages, but that did not fix it, and I put the whole thing on hold for a while.

Looking for a workaround, I kept reading through README.md and found the following passage quite far down.

Similarly, a user that only wants to install OpenCL support instead of HCC and HIP may want to skip the rocm-dkms and rocm-dev packages. Instead, they could directly install rock-dkms, rocm-opencl, and rocm-opencl-dev and their dependencies.
I wish they would put something this important somewhere easier to find...

So I used Timeshift to restore the system to its pre-install state, and this time installed only the following packages.

apt install rock-dkms rocm-opencl rocm-opencl-dev rocm-utils

This seems to have more or less worked. With fewer packages to install, it is easier on disk space and the installation is quicker too.

Now for the check. First up is rocminfo.

=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 3 2200G with Radeon Vega Graphics
  Marketing Name:          AMD Ryzen 3 2200G with Radeon Vega Graphics
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32(0x20) KB                        
  Chip ID:                 5597(0x15dd)                       
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3500                               
  BDFID:                   1792                               
  Internal Node ID:        0                                  
  Compute Unit:            4                                  
  SIMDs per CU:            4                                  
  Shader Engines:          1                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    8388224(0x7ffe80) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Acessible by all:        TRUE                               
  ISA Info:                
    N/A                      
*******                  
Agent 2                  
*******                  
  Name:                    gfx902                             
  Marketing Name:          AMD Ryzen 3 2200G with Radeon Vega Graphics
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          4096(0x1000)                       
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 5597(0x15dd)                       
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1100                               
  BDFID:                   1792                               
  Internal Node ID:        0                                  
  Compute Unit:            11                                 
  SIMDs per CU:            4                                  
  Shader Engines:          1                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      FALSE                              
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        160(0xa0)                          
  Max Work-item Per CU:    10240(0x2800)                      
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Acessible by all:        FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx902+xnack    
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*******                  
Agent 3                  
*******                  
  Name:                    gfx803                             
  Marketing Name:          Ellesmere [Radeon RX 470/480/570/570X/580/580X]
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          4096(0x1000)                       
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 26591(0x67df)                      
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1244                               
  BDFID:                   256                                
  Internal Node ID:        1                                  
  Compute Unit:            32                                 
  SIMDs per CU:            4                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      FALSE                              
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    4194304(0x400000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Acessible by all:        FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Acessible by all:        FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx803          
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***

Agent 1 is the CPU part of the Ryzen 3 2200G, Agent 2 is the GPU part of the same Ryzen 3 2200G, and Agent 3 is the Radeon RX 570. All three are detected correctly.

Excellent.
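
As a side note, the agent list can be pulled out of the rocminfo text above with a few lines of Python. This is only a rough sketch: the `list_agents` helper and the shortened `SAMPLE` text are made up for illustration (they are not part of ROCm), and the parsing assumes the `Name:` / `Marketing Name:` / `Device Type:` layout seen in the output above.

```python
import re


def list_agents(rocminfo_text):
    """Return one dict per HSA agent with its name, marketing name and device type."""
    agents = []
    current = None
    for line in rocminfo_text.splitlines():
        stripped = line.strip()
        header = re.fullmatch(r"Agent (\d+)", stripped)
        if header:
            # A new "Agent N" header starts a new record.
            current = {"agent": int(header.group(1))}
            agents.append(current)
            continue
        if current is None or ":" not in stripped:
            continue
        key, _, value = stripped.partition(":")
        key = key.strip()
        # Keep only the first occurrence, so the per-ISA "Name" line further
        # down does not overwrite the agent's own name.
        if key in ("Name", "Marketing Name", "Device Type") and key not in current:
            current[key] = value.strip()
    return agents


# Shortened excerpt of the rocminfo output above (illustration only).
SAMPLE = """\
*******
Agent 1
*******
  Name:                    AMD Ryzen 3 2200G with Radeon Vega Graphics
  Marketing Name:          AMD Ryzen 3 2200G with Radeon Vega Graphics
  Device Type:             CPU
*******
Agent 2
*******
  Name:                    gfx902
  Marketing Name:          AMD Ryzen 3 2200G with Radeon Vega Graphics
  Device Type:             GPU
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx902+xnack
*******
Agent 3
*******
  Name:                    gfx803
  Marketing Name:          Ellesmere [Radeon RX 470/480/570/570X/580/580X]
  Device Type:             GPU
"""

for a in list_agents(SAMPLE):
    print(a["agent"], a["Device Type"], a["Name"], "|", a["Marketing Name"])
```

In practice the text would come from running rocminfo and capturing its output rather than from a pasted string.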


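Next, the output of ROCm's clinfo.

Hm? It picked up the GPU part of the Ryzen rather than the RX 570.

In this state ethminer does start up, but then fails with an error because there is not enough memory. ethminer needs more than 3 GB of memory, while the Ryzen's integrated GPU can only be assigned up to 2 GB.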


Number of platforms:     1
  Platform Profile:     FULL_PROFILE
  Platform Version:     OpenCL 2.1 AMD-APP (3084.0)
  Platform Name:     AMD Accelerated Parallel Processing
  Platform Vendor:     Advanced Micro Devices, Inc.
  Platform Extensions:     cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 


  Platform Name:     AMD Accelerated Parallel Processing
Number of devices:     1
  Device Type:      CL_DEVICE_TYPE_GPU
  Vendor ID:      1002h
  Board name:      AMD Ryzen 3 2200G with Radeon Vega Graphics
  Device Topology:     PCI[ B#7, D#0, F#0 ]
  Max compute units:     11
  Max work items dimensions:    3
    Max work items[0]:     1024
    Max work items[1]:     1024
    Max work items[2]:     1024
  Max work group size:     256
  Preferred vector width char:    4
  Preferred vector width short:    2
  Preferred vector width int:    1
  Preferred vector width long:    1
  Preferred vector width float:    1
  Preferred vector width double:   1
  Native vector width char:    4
  Native vector width short:    2
  Native vector width int:    1
  Native vector width long:    1
  Native vector width float:    1
  Native vector width double:    1
  Max clock frequency:     1100Mhz
  Address bits:      64
  Max memory allocation:    3102316236
  Image support:     No
  Max size of kernel argument:    1024
  Alignment (bits) of base address:   1024
  Minimum alignment (bytes) for any datatype:  128
  Single precision floating point capability
    Denorms:      Yes
    Quiet NaNs:      Yes
    Round to nearest even:    Yes
    Round to zero:     Yes
    Round to +ve and infinity:    Yes
    IEEE754-2008 fused multiply-add:   Yes
  Cache type:      Read/Write
  Cache line size:     64
  Cache size:      16384
  Global memory size:     3649783808
  Constant buffer size:     3102316236
  Max number of constant args:    8
  Local memory type:     Scratchpad
  Local memory size:     65536
  Max pipe arguments:     16
  Max pipe active reservations:    16
  Max pipe packet size:     3102316236
  Max global variable size:    3102316236
  Max global variable preferred total size:  3649783808
  Max read/write image args:    0
  Max on device events:     1024
  Queue on device max size:    8388608
  Max on device queues:     1
  Queue on device preferred size:   262144
  SVM capabilities:     
    Coarse grain buffer:    Yes
    Fine grain buffer:     Yes
    Fine grain system:     Yes
    Atomics:      No
  Preferred platform atomic alignment:   0
  Preferred global atomic alignment:   0
  Preferred local atomic alignment:   0
  Kernel Preferred work group size multiple:  64
  Error correction support:    0
  Unified memory for Host and Device:   1
  Profiling timer resolution:    1
  Device endianess:     Little
  Available:      Yes
  Compiler available:     Yes
  Execution capabilities:     
    Execute OpenCL kernels:    Yes
    Execute native function:    No
  Queue on Host properties:     
    Out-of-Order:     No
    Profiling :      Yes
  Queue on Device properties:     
    Out-of-Order:     Yes
    Profiling :      Yes
  Platform ID:      0x7fd6efa63d30
  Name:       gfx902+xnack
  Vendor:      Advanced Micro Devices, Inc.
  Device OpenCL C version:    OpenCL C 2.0 
  Driver version:     3084.0 (HSA1.1,LC)
  Profile:      FULL_PROFILE
  Version:      OpenCL 2.0 
  Extensions:      cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program
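
To put numbers on the memory issue, here is a small Python sketch that reads two fields from the clinfo output above and compares them with a 3 GiB threshold. The threshold is only an assumed reading of "more than 3 GB", and `clinfo_value` is a hypothetical helper, not something clinfo or ethminer provides. Which of these two limits ethminer actually runs into is not something this post has confirmed.

```python
GIB = 1024 ** 3
# Assumption: "more than 3 GB" is modelled here as a 3 GiB threshold.
REQUIRED_BYTES = 3 * GIB


def clinfo_value(clinfo_text, field):
    """Return the integer value of the first `field:` line in clinfo output."""
    for line in clinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == field:
            return int(value.strip())
    raise KeyError(field)


# Three lines copied from the ROCm clinfo output above.
ROCM_IGPU = """\
  Board name:      AMD Ryzen 3 2200G with Radeon Vega Graphics
  Max memory allocation:    3102316236
  Global memory size:     3649783808
"""

for field in ("Max memory allocation", "Global memory size"):
    size = clinfo_value(ROCM_IGPU, field)
    verdict = "above threshold" if size > REQUIRED_BYTES else "below threshold"
    print(f"{field}: {size / GIB:.2f} GiB ({verdict})")
```

With the values above, the largest single allocation comes out below the assumed 3 GiB threshold even though the reported global memory size is above it.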

So I restored the system with Timeshift once more and checked the clinfo output under AMDGPU-PRO. It also detects only one device, but here it is the RX 570, which has the lower bus ID.

I would like ROCm's clinfo to detect things this way too.

Number of platforms:     1
  Platform Profile:     FULL_PROFILE
  Platform Version:     OpenCL 2.1 AMD-APP (2841.4)
  Platform Name:     AMD Accelerated Parallel Processing
  Platform Vendor:     Advanced Micro Devices, Inc.
  Platform Extensions:     cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 


  Platform Name:     AMD Accelerated Parallel Processing
Number of devices:     1
  Device Type:      CL_DEVICE_TYPE_GPU
  Vendor ID:      1002h
  Board name:      Radeon RX 570 Series
  Device Topology:     PCI[ B#1, D#0, F#0 ]
  Max compute units:     32
  Max work items dimensions:    3
    Max work items[0]:     1024
    Max work items[1]:     1024
    Max work items[2]:     1024
  Max work group size:     256
  Preferred vector width char:    4
  Preferred vector width short:    2
  Preferred vector width int:    1
  Preferred vector width long:    1
  Preferred vector width float:    1
  Preferred vector width double:   1
  Native vector width char:    4
  Native vector width short:    2
  Native vector width int:    1
  Native vector width long:    1
  Native vector width float:    1
  Native vector width double:    1
  Max clock frequency:     1244Mhz
  Address bits:      64
  Max memory allocation:    3418680524
  Image support:     Yes
  Max number of images read arguments:   128
  Max number of images write arguments:   8
  Max image 2D width:     16384
  Max image 2D height:     16384
  Max image 3D width:     2048
  Max image 3D height:     2048
  Max image 3D depth:     2048
  Max samplers within kernel:    16
  Max size of kernel argument:    1024
  Alignment (bits) of base address:   2048
  Minimum alignment (bytes) for any datatype:  128
  Single precision floating point capability
    Denorms:      No
    Quiet NaNs:      Yes
    Round to nearest even:    Yes
    Round to zero:     Yes
    Round to +ve and infinity:    Yes
    IEEE754-2008 fused multiply-add:   Yes
  Cache type:      Read/Write
  Cache line size:     64
  Cache size:      16384
  Global memory size:     4282875904
  Constant buffer size:     3418680524
  Max number of constant args:    8
  Local memory type:     Scratchpad
  Local memory size:     32768
  Max pipe arguments:     0
  Max pipe active reservations:    0
  Max pipe packet size:     0
  Max global variable size:    0
  Max global variable preferred total size:  0
  Max read/write image args:    0
  Max on device events:     0
  Queue on device max size:    0
  Max on device queues:     0
  Queue on device preferred size:   0
  SVM capabilities:     
    Coarse grain buffer:    No
    Fine grain buffer:     No
    Fine grain system:     No
    Atomics:      No
  Preferred platform atomic alignment:   0
  Preferred global atomic alignment:   0
  Preferred local atomic alignment:   0
  Kernel Preferred work group size multiple:  64
  Error correction support:    0
  Unified memory for Host and Device:   0
  Profiling timer resolution:    1
  Device endianess:     Little
  Available:      Yes
  Compiler available:     Yes
  Execution capabilities:     
    Execute OpenCL kernels:    Yes
    Execute native function:    No
  Queue on Host properties:     
    Out-of-Order:     No
    Profiling :      Yes
  Queue on Device properties:     
    Out-of-Order:     No
    Profiling :      No
  Platform ID:      0x7f6634f3d1b0
  Name:       Ellesmere
  Vendor:      Advanced Micro Devices, Inc.
  Device OpenCL C version:    OpenCL C 1.2 
  Driver version:     2841.4
  Profile:      FULL_PROFILE
  Version:      OpenCL 1.2 AMD-APP (2841.4)
  Extensions:      cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event 
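
Comparing the two clinfo dumps by eye is tedious, so here is a minimal Python sketch that lists only the fields that differ. `parse_clinfo` and `diff_fields` are hypothetical helpers written for this post, and the two sample strings are just a handful of lines copied from the outputs above, with trailing spaces trimmed.

```python
def parse_clinfo(text):
    """Map each `key: value` line of clinfo output to a dict entry (first occurrence wins)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip section headers such as "SVM capabilities:" that carry no value.
        if sep and value.strip():
            fields.setdefault(key.strip(), value.strip())
    return fields


def diff_fields(left, right):
    """Yield (key, left_value, right_value) for every key whose values differ."""
    for key in sorted(left.keys() | right.keys()):
        a, b = left.get(key), right.get(key)
        if a != b:
            yield key, a, b


# A few lines from the ROCm clinfo output.
ROCM = """\
  Vendor ID:      1002h
  Board name:      AMD Ryzen 3 2200G with Radeon Vega Graphics
  Device Topology:     PCI[ B#7, D#0, F#0 ]
  Max compute units:     11
  Image support:     No
  Device OpenCL C version:    OpenCL C 2.0
"""

# The same fields from the AMDGPU-PRO clinfo output.
AMDGPU_PRO = """\
  Vendor ID:      1002h
  Board name:      Radeon RX 570 Series
  Device Topology:     PCI[ B#1, D#0, F#0 ]
  Max compute units:     32
  Image support:     Yes
  Device OpenCL C version:    OpenCL C 1.2
"""

for key, a, b in diff_fields(parse_clinfo(ROCM), parse_clinfo(AMDGPU_PRO)):
    print(f"{key}: {a} -> {b}")
```

Fields that are identical in both dumps, such as the vendor ID, are left out of the listing.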

Just finding out that ROCm is not a complete failure made today worthwhile. With a bit more digging, I feel like I might be able to say goodbye to AMDGPU-PRO.