A common issue we experience is that agents often go offline in the SA console. Whether the admin of the system screwed up a gateway, shut down the server without deactivating it in SA, or it happened for one of a hundred other reasons, when you have 20,000+ devices in SA it becomes painful to track them. On top of that, we don’t want to track every single offline agent, just the ones that have been offline for a while, meaning the agent has some type of permanent issue rather than occasional flapping.

For our solution, we wrote the following simple Python script to do the job for us.

import sys
import time
import smtplib
from email.MIMEText import MIMEText
from optparse import OptionParser
from getpass import getpass
sys.path.append('/opt/opsware/smopylibs2')
sys.path.append('/opt/opsware/agent_tools/') 

import agenttools_common
from pytwist import *
from pytwist.com.opsware.search import Filter
from pytwist.com.opsware.server import ServerRef

ts = twistserver.TwistServer()
ses = ts.search.SearchService
ss = ts.server.ServerService
DAY = 86400

parser = OptionParser()
parser.add_option("-u","--user",dest="sauser",
                  help="SA login id")
parser.add_option("-p","--pass",dest="sapasswd",
                  help="SA login password")
parser.add_option("-r","--prompt",action="store_true",dest="readstdin",
                  help="Prompt for passwords instead of using arguments")
parser.add_option("-c","--customer",dest="customer",
                  help="Customer name to search for")
parser.add_option("-d","--days",dest="days",
                  help="Expiration limit in days")

(options, args) = parser.parse_args()

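if not options.customer:
    print "You must specify a customer to search for."
    sys.exit(1)

if not options.sapasswd and not options.readstdin:
    print "You must specify a password to login to SA."
    sys.exit(1)

if not options.sauser:
    print "You must specify a username to login to SA."
    sys.exit(1)

# without this check, int(options.days) fails later when -d is omitted
if not options.days:
    print "You must specify an expiration limit in days."
    sys.exit(1)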

if options.readstdin:
    options.sapasswd = getpass("Enter Opsware Password: ")

ts.authenticate(options.sauser, options.sapasswd)

filt = Filter()
filt.objectType = 'device'
filt.expression = "(ServerVO.state = UNREACHABLE) & " \
                  "(ServerVO.opswLifecycle != DEACTIVATED) & " \
                  "(device_customer_name = %s)" % options.customer

offline_servers = ses.findObjRefs(filt)
expired_servers = []

if len(offline_servers) == 0:
    print 'There are no unreachable servers'
    sys.exit(0)
else:
    print "Processing %d servers" % len(offline_servers)

now = int(time.time())
for sref in offline_servers:
    shvo = ss.getServerHardwareVO(sref)
    beginDate = shvo.beginDate
    if (now - int(beginDate)) >= (DAY * int(options.days)):
        server_data = [sref.id, sref.name, beginDate]

        # this sorts the list.  if you don't care about
        # sorting, remove this section.

        # sort while create - could have sorted
        # after create, which would probably been
        # faster, but this was an educational exercise

        # if the list is blank, just append
        if len(expired_servers) == 0:
            expired_servers.append(server_data)
            continue

        # place it in the correct spot
        inserted = None
        for i in range(0,len(expired_servers)):
            if beginDate < expired_servers[i][2]:
                expired_servers.insert(i,server_data)
                inserted = 1
                break

        if not inserted:
            expired_servers.append(server_data)
        # end of sorting

        # if you don't sort, then just uncomment this line
        # expired_servers.append(server_data)

# check the filtered list here, not offline_servers (already tested above)
if len(expired_servers) == 0:
    print "No unreachable servers have been offline for more than %s " \
          "days" % options.days
    sys.exit(0)

report = "\r\n\r\nThe following servers have not registered for more than %s " \
         "days.\r\n\r\n" % options.days

for server in expired_servers:
    datestr = "%s/%s/%s %s:%s:%s" % time.gmtime(float(server[2]))[0:6]
    report += "%s : %s has not registered since %s.  \r\n" % \
            (server[0],server[1],datestr)

# If you want output to the screen
print report

# Create a text/plain message
msg = MIMEText(report)

sender = "root@example.com"
rcptto = ["user1@example.com", "user2@example.com"]
msg['Subject'] = "[opsware] Offline Servers for %s" % options.customer
msg['From'] = sender
msg['To'] = ",".join(rcptto)

# Send the message via our own SMTP server, but don't include the
# envelope header.
s = smtplib.SMTP('localhost')
s.sendmail(sender, rcptto, msg.as_string())
s.quit()

A couple of notes are probably in order. First, this was written for our environment, so it may require some massaging to get it to work in your own. The multiple ‘\r\n’ sequences are there so that the report shows up correctly in Outlook and not as a huge, one-line mess. By default, the script sorts by date. There are some comments in the script if you’d like to remove that functionality.
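The comments in the script mention that sorting after the list is built would probably have been faster. A minimal sketch of that alternative follows, assuming the same [id, name, beginDate] row layout the script uses. The sample rows and the helper names sort_by_begin_date and format_row are made up for illustration and are not part of the original script. It also swaps in time.strftime for the date string, which zero-pads the fields.

```python
import time

# Hypothetical sample rows in the script's [id, name, beginDate] layout;
# the ids, names, and epoch timestamps are invented for illustration.
expired_servers = [
    [1001, "web01", 1300000000],
    [1002, "db01", 1200000000],
    [1003, "app01", 1250000000],
]


def sort_by_begin_date(servers):
    """Return a new list ordered oldest-first by beginDate (index 2)."""
    return sorted(servers, key=lambda row: row[2])


def format_row(row):
    """Build one report line with a zero-padded UTC timestamp."""
    datestr = time.strftime("%Y/%m/%d %H:%M:%S", time.gmtime(float(row[2])))
    return "%s : %s has not registered since %s.  \r\n" % (row[0], row[1], datestr)


# Sort once after collection, then build the report body.
report = "".join(format_row(row) for row in sort_by_begin_date(expired_servers))
```

With this approach the loop over offline servers only needs to append each matching row, and the insertion logic can go away.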

We wanted to be able to run this from any agent, rather than from the cores, so that our internal groups that use SA could run this if they chose to. It only requires that the agent tools and the Python 2 API for Server Modules packages be installed.

To run, use something like the following:

/opt/opsware/smopython2/python ./offline.py -u username -p password -d 30 -c customername

Make sure to change the bottom of the script to have the email addresses you’d like to use and then just cron the script.
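If the script is going to run unattended from cron, one possible refinement is to let optparse convert and default the -d value, so the threshold always arrives as an integer. The sketch below is not part of the original script: the default of 30 days and the helper names build_parser and threshold_seconds are assumptions chosen for illustration.

```python
from optparse import OptionParser

DAY = 86400  # seconds per day, as in the script


def build_parser():
    """Parser with only the -d option; 30 is an assumed default."""
    parser = OptionParser()
    parser.add_option("-d", "--days", dest="days", type="int", default=30,
                      help="Expiration limit in days (default: %default)")
    return parser


def threshold_seconds(argv):
    """Return the offline threshold in seconds for the given arguments."""
    options, args = build_parser().parse_args(argv)
    return options.days * DAY
```

For example, threshold_seconds(["-d", "7"]) gives the seven-day cutoff in seconds, and an empty argument list falls back to the assumed 30-day default. A non-numeric value makes optparse print an error and exit.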

Comments welcome.

( Thanks to Trey Ratcliffe for the use of the photo used on the header of this post. )


6 Responses to HP SA – Finding offline agents

  1. prasadh says:

    good one, really helpful to the opsware community, keep them coming…

  2. Arun says:

    Really nice. Can you give an example of getting a server script reference or starting a server script, both of which are available in ServerScriptService?

  3. steve says:

    Hi Arun,

    Thanks for the comment. I’ve posted a new article that may help you out and I’ll be posting more soon with more information.

  4. Arun says:

    Steve,

    I have seen the post; it’s very helpful :). Also, I modified the code to get ServerRefs dynamically as you explained.

    Could you please let me know how to create a user account with group options in the managed Node using PyTwist Libs?

    If possible, please share some sample script for the same.

    Cheers!

    • steve says:

      Unfortunately Arun, the pytwist libs don’t have any functionality for adding users on a managed node.

      The best solution would be to create a script that creates the users and then to run that script via the pytwist.

      You can use the ‘setParameters’ method of the ServerScriptJobArgs object to add some parameters about which user to add and the group options you’d like to run.

  5. Arun says:

    Steve,

    Is it possible to install a LAMP stack on the Opsware core? If so, please explain the installation procedure.

    Also, please explain how I can execute OGFS/OGSH commands from the web server.


    Thanks
    Arun
