Saturday, June 2, 2012

High Availability for Runbook Servers on Invoking Runbooks

Here's a quick tip for specifying the runbook servers used by the Invoke Runbook standard activity.  It’s beneficial to store the runbook server names in global variables, as a semicolon-delimited list.  That way you can insert the variable in the Invoke Runbook activity instead of hardcoding the server name.

If a runbook server has issues and is powered down or unavailable, the Invoke Runbook activity will automatically start the invoked runbook on the next runbook server in line.  Another benefit is that you can add or remove runbook servers by updating a single variable, rather than finding and replacing the server name in every Invoke Runbook activity.
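
The failover order described above can be modeled in a few lines of PowerShell. This is only an illustration of "the first available server in the list wins": Orchestrator does the selection internally, and the function name Select-FirstAvailableServer and the caller-supplied availability check are assumptions made for this sketch, not part of the product.

```powershell
# Illustration only: models "first available server in the list wins".
# Orchestrator performs this selection itself; the availability check here
# is a caller-supplied script block, which is an assumption of this sketch.
function Select-FirstAvailableServer {
    param(
        [string]$ServerList,        # semicolon-delimited, e.g. 'rs1;rs2;rs3'
        [scriptblock]$IsAvailable   # returns $true if the server can take the job
    )
    foreach ($server in ($ServerList -split ';')) {
        if (& $IsAvailable $server) {
            return $server
        }
    }
    return $null   # no server in the list was available
}

# Example: pretend myrunbookserver1 is powered down.
$down = @('myrunbookserver1')
$list = 'myrunbookserver1;myrunbookserver2;myrunbookserver3'
$chosen = Select-FirstAvailableServer -ServerList $list -IsAvailable {
    param($name)
    $down -notcontains $name
}
```

With the first server marked as down, the sketch falls through to the second name in the list, which mirrors what the variable-based setup is meant to achieve.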

You can also use the variables to group runbook servers by their role in the environment and to load balance runbooks across them.  For example, with three internal runbook servers you could define three variables with the following values:

Primary:  myrunbookserver1;myrunbookserver2;myrunbookserver3
Secondary:  myrunbookserver2;myrunbookserver3;myrunbookserver1
Tertiary:  myrunbookserver3;myrunbookserver1;myrunbookserver2
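
The three values above are rotations of one list, so they can be generated instead of typed by hand. Here is a minimal PowerShell sketch using the same placeholder server names; the helper name Get-RotatedServerList is hypothetical and is not an Orchestrator cmdlet.

```powershell
# Hypothetical helper: rotate a list of runbook server names so that the
# server at index $Offset comes first, then join the names with semicolons,
# the delimiter used in the variable values above.
function Get-RotatedServerList {
    param(
        [string[]]$Servers,
        [int]$Offset
    )
    $count = $Servers.Count
    $rotated = for ($i = 0; $i -lt $count; $i++) {
        $Servers[($i + $Offset) % $count]
    }
    return ($rotated -join ';')
}

$servers   = 'myrunbookserver1', 'myrunbookserver2', 'myrunbookserver3'
$primary   = Get-RotatedServerList -Servers $servers -Offset 0
$secondary = Get-RotatedServerList -Servers $servers -Offset 1
$tertiary  = Get-RotatedServerList -Servers $servers -Offset 2
```

Each resulting string can be pasted into the matching global variable, and adding a fourth server only means extending the input list.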

So for runbook servers interacting w/ servers on an internal domain you could use:



So for runbook servers interacting w/ servers on a dmz domain you could use:



Finally, subscribe to the variable in the Invoke Runbook activity.

7 comments:

  1. Would it work just as fine with a "computer group" variable?

    Or are they meant for another purpose?

    1. Computer groups are intended to run an activity against every machine in the computer group. The semicolon-delimited variable shown in the "Runbook Servers" field is intended to invoke the runbook on the first RS that is available.

      For funzies I just tested this and you can only insert a computer group on a "Computer" field in an activity. You would not be able to insert a computer group into the Runbook Servers field on the invoke runbook activity.

  2. Here's a way to do this using random server selection. This goes a step further than your solution, but I would think this approach is a bit more flexible and a great way to implement the don't repeat yourself rule.

    Instead of creating global variables that list the order of preference, just randomly select the server.

    Set up your global variables with the server names of each of the runbook servers.

    Let's keep it simple. Global variable "SCORCH01" and global variable of "SCORCH02".

    Just before your Invoke Runbook activity in your initiation runbook, add a Run .Net Script activity (PowerShell of course) and return the data to the bus.

    Here's the simple one-liner for the script:

    $RBServer = ($Server = "SCORCH01","SCORCH02" ) | Get-Random

    Replace each server name above by subscribing to the corresponding global variable. Return $RBServer to the bus and use it in the Invoke Runbook activity.

    Voila! Now you are randomly distributing your runbook workload.

    1. Hi,

      Rather than making it completely random (which could still overload specific runbook servers w/o truly balancing runbooks across them), you could run a query against the database to count the active running runbooks on each runbook server and choose the one with the fewest to run it on.

      Jon

    2. Yes you could query and I have a script here to do that:

      select top 1 rb.computer
      from ACTIONSERVERS rb
      order by isnull((select count(RunbookServerId)
                       from [Orchestrator].[Microsoft.SystemCenter.Orchestrator.Runtime.Internal].[Jobs] job
                       where rb.UniqueID = job.RunbookServerId and job.statusid = 1
                       group by RunbookServerId), 0)

      But the issue I have is this: let's say you do an action that spawns 1000 runbooks and you want to distribute the load among your servers. I put the query before I call the runbooks but after the split array object, so the query runs 1000 times and returns the same result every time, because none of the runbooks have started yet. All 1000 go to the same server. Just not sure how to get past that one.

    3. Hello,

      Hmmmm....there are probably a couple of options to get around that. I used to have a workflow that queried a static list of ~150 servers to run processes on every hour. To make it run most efficiently (and quickly), I had a Database Query activity return the server names and linked it to four Invoke Runbook activities, each invoking the same runbook but set to a different runbook server. I controlled which servers went to which RS through the link logic.

      Link 1 > Server names from 01 - 30
      Link 2 > Server names from 31 - 60
      Link 3 > Server names from 61 - 90
      etc....

      This didn't necessarily load balance based on the current # of active runbooks already running, but in this case you're just trying to spread a huge number of runbooks across all possible runbook servers. Also, this would only work if your list is static and will have the same values (although you could throw in some additional logic to make it dynamic). If the list is in a database table, you could also grab an ID column along with the value and put the ID in the link logic.
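
The range-based link logic above can be sketched in PowerShell as a mapping from an item's position in the list to a runbook server. The function name Get-RunbookServerForItem, the server names RS01 to RS04, and the wrap-around when there are more ranges than servers are all assumptions for illustration; in Orchestrator itself this lives in the link conditions, not in a script.

```powershell
# Hypothetical sketch of the link logic: items 1-30 map to the first runbook
# server, 31-60 to the second, and so on. Server names, the bucket size of 30
# and the wrap-around behavior are illustrative assumptions.
function Get-RunbookServerForItem {
    param(
        [int]$ItemNumber,       # 1-based position (or ID) of the item in the list
        [string[]]$Servers,
        [int]$BucketSize = 30
    )
    # Which range of $BucketSize items does this item fall into (0-based)?
    $bucket = [int][math]::Floor(($ItemNumber - 1) / $BucketSize)
    # Wrap around if there are more ranges than runbook servers.
    return $Servers[$bucket % $Servers.Count]
}

$rs = 'RS01', 'RS02', 'RS03', 'RS04'
$first  = Get-RunbookServerForItem -ItemNumber 1   -Servers $rs
$second = Get-RunbookServerForItem -ItemNumber 31  -Servers $rs
$third  = Get-RunbookServerForItem -ItemNumber 90  -Servers $rs
$wrap   = Get-RunbookServerForItem -ItemNumber 121 -Servers $rs
```

Because the mapping depends only on the item's position, it gives the same answer no matter how many runbooks are already running, which is why it does not reflect current load.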

      Let me know if that is a possible solution.
      Jon
