ethOS : Cronjob : Check Mining RIG Hash

Prerequisites: Mining RIG using ethOS

 

This procedure sets up a cron job that monitors the hashrate of your ethOS rig. If the hashrate stays at or below the threshold you define for 5 consecutive checks (one check per minute), the cron job reboots the rig.

 

REQUIREMENTS:

As described in the Requirements section of the script, you need to do the following if you want it to be able to restart your rig:

  • Create a gpu_crashReboot.log file containing the value 0 (as the ethos user, first elevate to root):

    sudo -s
    echo 0 > /root/gpu_crashReboot.log
  • Edit the crontab (still as root):

    crontab -e
  • Add the following at the end:

    # Cron Check Rig Hash
    * * * * * /usr/bin/python /root/check_rig-hash > /dev/null 2>&1

    Save and quit.

    Do not forget to adjust the minimum Global Rig hashrate value [rigMinHashRate] to your needs (no units):

      rigMinHashRate = XX.X
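One way to pick a value is to take the hashrate the rig reports while it is healthy and subtract a safety margin. The sketch below is only an illustration: `suggest_threshold` and the 0.9 margin are hypothetical, and it assumes the status text starts with the global hashrate, which is how the script below treats /var/run/ethos/status.file. The sample strings are made up.

```python
def suggest_threshold(status_text, margin=0.9):
    """Return a candidate rigMinHashRate, or None if no hashrate is reported.

    status_text: contents of the ethOS status file, assumed to start with
                 the global hashrate when the rig is mining normally.
    margin:      hypothetical safety factor (0.9 tolerates a 10% drop).
    """
    fields = status_text.split()
    if not fields:
        return None
    try:
        current = float(fields[0])
    except ValueError:
        # The first word is not a number, so the rig is not reporting a hashrate.
        return None
    return round(current * margin, 1)


if __name__ == "__main__":
    # Made-up sample; on a rig you would pass the real status file contents.
    print(suggest_threshold("122.7 trailing text"))
```

Run it a few times while the rig is stable before settling on a number, since a threshold set too close to the normal hashrate can cause needless reboots.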

     

    The check script itself must be saved as /root/check_rig-hash:

    #!/usr/bin/python
    #*****************************************************************
    # Author: David Bayle
    # Contact: contact@davidbayle.com
    # This Python script checks your ethOS mining rig hashrate using cron.
    #
    # Usage: If the script detects a hashrate lower than your threshold, or crashed GPU(s),
    # it will start counting up to 5 (5 x 1 min by default) and then force a reboot.
    #
    #
    # Settings:
    # Do not forget to adjust your global minimum hashrate value by rig [rigMinHashRate] to your needs :
    # # rigMinHashRate = 110.0
    #
    # Requirements:
    #
    # R1) Create a gpu_crashReboot.log file with 0 as value inside :
    # (as ethos user, elevate as root and: )
    # # sudo -s
    # # echo 0 > /root/gpu_crashReboot.log
    #
    # R2) Do not forget to setup the cron itself:
    # # Cron Check Rig Hash
    # * * * * * /usr/bin/python /root/check_rig-hash > /dev/null 2>&1
    #
    #
    # Script will now be allowed to restart your rig if something goes wrong as explained in Usage section.
    # Enjoy ;)
    #
    #

    import os
    import sys

    rigMinHashRate = 110.0
    rigRebootLogFile = "/root/gpu_crashReboot.log"
    rigStatusLogFile = "/var/run/ethos/status.file"
    rigStatsLogFile = "/var/run/ethos/stats.file"


    # ================================   functions  =============================
    def PrintOutput(dumpStr):
      print dumpStr


    def ReadStatusFile():
      try:
        # read rig hash rate from ethos status file
        pStatusLogFile = open(rigStatusLogFile, "r")
        returnedStatus = pStatusLogFile.read()
        return returnedStatus
      except:
        print "File read error in - " + rigStatusLogFile


    def ReadRigName():
      try:
        # read rig name from the "rack_loc" line of the ethos stats file
        # (the file is opened once and closed automatically)
        with open(rigStatsLogFile, "r") as pStatsLogFile:
          for line in pStatsLogFile:
            if "rack_loc" in line:
              # keep the text after the first ":" and drop the newline
              returnedString = line.split(":", 1)
              returnedName = returnedString[1]
              rName = returnedName.replace("\n", "")
              return rName
      except:
        print "File read error in - " + rigStatsLogFile


    def WriteRebootCount(count):
      #print count
      try:
        # writes reboot counter in a file
        pLogFile = open(rigRebootLogFile, "w")
        pLogFile.write("%i" % int(count))
        pLogFile.close()
      except:
        print "File write error in - " + rigRebootLogFile


    def ReadRebootCount():
      try:
        # read reboot counter from a file
        pLogFile = open(rigRebootLogFile, "r")
        returnedValue = pLogFile.read(1)
    #    print returnedValue
        return returnedValue
      except:
        print "File read error in - " + rigRebootLogFile


    def IsNumber(n):
        is_number = True
        try:
            num = float(n)
            # check for "nan" floats
            is_number = num == num   # or use `math.isnan(num)`
        except ValueError:
            is_number = False
        return is_number


    #================================ RUN ======================================== #

    try:
        # try reading status file
        rigStats = ReadStatusFile()
        # try reading stats file (rig name)
        rigName = ReadRigName()
        # gpu reboot count init
        gpuRebootCount = int(ReadRebootCount())
    except:
        PrintOutput("UNKNOWN - Invalid Status File / Reboot counter read")
        # stop here: the checks below need rigStats and gpuRebootCount
        sys.exit(1)

    # extract data
    try:
        hashRateData =  rigStats.split(" ", 1)
    #    print hashRateData[0]
        hashRate = hashRateData[0]
    except:
        PrintOutput("UNKNOWN - Invalid RIG Hashrate reading")
        # stop here: the checks below need hashRateData
        sys.exit(1)


    if (IsNumber(hashRateData[0])):

      if (float(hashRate) <= float(rigMinHashRate)):
        PrintOutput("WARNING: RIG HASHRATE LOWER THAN THRESHOLD (" + str(rigMinHashRate) + ") : " + hashRate + " (MH/s)")
        gpuRebootCount = gpuRebootCount + 1
        WriteRebootCount(int(gpuRebootCount))
        if (gpuRebootCount >= 5):
          PrintOutput("CRITICAL: REBOOTING: MINER CRASHED FOR 5 MINs : RIG HASHRATE LOWER THAN THRESHOLD (" + str(rigMinHashRate) + ") : " + hashRate + " (MH/s)")
          WriteRebootCount(0)
          os.system("/sbin/reboot")

      else:
        WriteRebootCount(0)
        PrintOutput("OK - [" + str(rigName) + "] Global Rig hashrate : " + hashRate + " (MH/s) [Threshold: " +str(rigMinHashRate) + "]")

    else:
      if (hashRateData[0].strip() == "gpu" and "clock problem" in hashRateData[1].strip()):
        PrintOutput("WARNING: A GPU CRASHED !!!")
        gpuRebootCount = gpuRebootCount + 1
        WriteRebootCount(int(gpuRebootCount))
        if (gpuRebootCount >= 5):
          PrintOutput("CRITICAL: REBOOTING: MINER GPU CRASHED FOR 5 MINs !!!")
          WriteRebootCount(0)
          os.system("/sbin/reboot")

      if (hashRateData[0].strip() == "possible" and "miner stall" in hashRateData[1].strip()):
        PrintOutput("WARNING: POSSIBLE MINER CRASH !!!")
        gpuRebootCount = gpuRebootCount + 1
        WriteRebootCount(int(gpuRebootCount))
        if (gpuRebootCount >= 5):
          PrintOutput("CRITICAL: REBOOTING: MINER GPU CRASHED FOR 5 MINs !!!")
          WriteRebootCount(0)
          os.system("/sbin/reboot")

      if (hashRateData[0].strip() == "miner" and "started" in hashRateData[1].strip()):
        PrintOutput("WARNING: Miner Starting")
        gpuRebootCount = gpuRebootCount + 1
        WriteRebootCount(int(gpuRebootCount))
        if (gpuRebootCount >= 5):
          PrintOutput("CRITICAL: REBOOTING: MINER DIDN'T START IN 5 MINs !!!")
          WriteRebootCount(0)
          os.system("/sbin/reboot")
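The counting behaviour of the script above can be illustrated in isolation. This is a minimal sketch and not part of the original script: `checks_until_reboot` is a hypothetical helper that replays a list of hashrate readings (one per cron run) and reports which check would trigger the reboot. It only models the numeric branch; the script also counts the "gpu clock problem", "miner stall" and "miner started" states, and those branches never reset the counter.

```python
def checks_until_reboot(readings, threshold, limit=5):
    """Return the 1-based check number that triggers a reboot, or None.

    readings:  one hashrate value per cron run (one per minute).
    threshold: plays the role of rigMinHashRate.
    limit:     the script reboots once its counter reaches 5.
    """
    count = 0
    for check_number, rate in enumerate(readings, start=1):
        if rate <= threshold:
            count += 1              # another bad check in a row
            if count >= limit:
                return check_number
        else:
            count = 0               # one healthy reading resets the counter
    return None


if __name__ == "__main__":
    # Four bad minutes, one good minute, then five bad minutes.
    print(checks_until_reboot([100, 100, 100, 100, 120,
                               100, 100, 100, 100, 100], 110.0))
```

Note that a reading exactly equal to the threshold counts as a failure, because the script compares with `<=`.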

    You can test the script using:

     

    # python /root/check_rig-hash
    OK - Global Rig hashrate : 122.7 (MH/s) [Threshold: 110.0]
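If you would rather check the decision logic without touching the reboot counter or risking a reboot, the branch selection can be reproduced as a pure function. This is a rough sketch: `classify_status` and its return labels are hypothetical, and the sample strings are built only from the words the script tests for, so the real ethOS status messages may be longer.

```python
def classify_status(status_text, min_hashrate):
    """Return a label for the branch the check script would take."""
    parts = status_text.strip().split(" ", 1)
    head = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    try:
        rate = float(head)
    except ValueError:
        rate = None
    # rate == rate is False for NaN, the same trick IsNumber() uses.
    if rate is not None and rate == rate:
        return "low" if rate <= min_hashrate else "ok"
    if head == "gpu" and "clock problem" in rest:
        return "gpu_crash"
    if head == "possible" and "miner stall" in rest:
        return "miner_stall"
    if head == "miner" and "started" in rest:
        return "miner_starting"
    return "ignored"


if __name__ == "__main__":
    for sample in ("122.7", "95.0", "gpu clock problem", "miner started"):
        print(sample, "->", classify_status(sample, 110.0))
```

Every label except "ok" and "ignored" corresponds to a branch that increments the counter in the script.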

    Side note: this script no longer uses the ethOS API or panel. It reads the local stats files instead, in order to avoid being blocked by the ethOS stats site/API.
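To give an idea of what reading the local files involves, here is a small sketch that turns stats text into a dictionary. It assumes the stats file holds one `key:value` pair per line, which is what the script's handling of the `rack_loc` line suggests; `parse_stats` is a hypothetical helper, and every key in the sample other than `rack_loc` is invented for illustration.

```python
def parse_stats(stats_text):
    """Parse 'key:value' lines into a dict, splitting on the first colon."""
    stats = {}
    for line in stats_text.splitlines():
        if ":" not in line:
            continue                      # skip lines without a key
        key, value = line.split(":", 1)   # values may contain further colons
        stats[key.strip()] = value.strip()
    return stats


if __name__ == "__main__":
    # Made-up sample; on a rig, read /var/run/ethos/stats.file instead.
    sample = "rack_loc:rig01\nexample_key:a:b\n"
    print(parse_stats(sample).get("rack_loc"))
```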

    Tested against ethOS 1.2.3 to 1.3.0

     
    https://github.com/davidbayle/ethos_GPUmonitoring-Cron