Tuesday, May 8, 2018

Parallel PowerShell - Part 2

My next challenge was that I had to check multiple URLs for each server. Plus I also needed to check if a server was online, and do some other checks and validations to verify its status, before I went and tried to open a remote PowerShell (WinRM) session to it. I didn't want to check servers I was simply going to time out on.  So I had two cases where I needed to work out a way to run multiple tasks at the same time within a single PowerShell script.

One way to do that is with Jobs.
Another is to use PowerShell Workflows. I went with workflows...

I like Workflows, as they are more like Function calls, and they have a nice coding aesthetic. Plus there are some other benefits at run time that they take advantage of, which I didn't know at the time; after the presentations at the PowerShell Summit this year, I have a new appreciation of how to make things go faster.

When I first investigated Workflows, I found them very hard to work with. There are a number of restricted commands, and things that Workflows won't let you do. Then one day I did an experiment: I wanted to see if I could get a workflow to execute a function and not be so restricted in what I could do.  It worked, and it worked very well.

What I later learned is that Function calls are compiled by the PowerShell interpreter when it processes the script. That makes Functions run quickly.  As it turns out, Workflows also get compiled, so the combination becomes very efficient for the computer to run.  Efficiency is an important component of speed. This turned out to be a lucky coincidence for my work, and I learned a lot by doing it. However, I lost that code in the jumble of activity and had to re-invent the wheel later on.

This method is in the reverse order from what you would normally think of in a script:
you have to create a Function, and then a Workflow, before you can execute them.
However you structure your code, I think it's good to keep such blocks easily understood,
because the next guy to maintain your code could be you in a year, and you are going to have to figure out what you did with only a vague memory.

Example code

function Test-webrequest { ... }

Workflow Get-VIPTest
{
    param ([array]$Servers)
    foreach -parallel ($Server in $Servers)
    { Test-webrequest -uri $Server }
} #end workflow

$VIPS = "https://outbound.corp.net/abc/isalive", "https://outbound.corp.net/abc/isalive"

Get-VIPTest $VIPS |
Select-Object Connect, Code, URL, Errormsg, Success |
Tee-Object -Variable VIPReport |
Out-Host



First I create the function to do all the dirty work.
function Test-webrequest {  ...  }

Here I leave a lot of detail out, but it's a sophisticated check of the URL it is given (a rough sketch follows below).
One painful but important lesson I tested and learned is that each workflow instance session is isolated from the other PowerShell session variables; $Using:variable and other things are not functional, and you end up with null responses.
On the upside, because each workflow instance session is isolated, if it fails and crashes, it won't take the other parallel sessions down. Of course, there are reasonable limits to this. For example, if one task goes runaway on resource consumption and crashes the OS, your program has failed at a higher order.
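
Here is roughly what a skeleton of that function could look like. This is a simplified sketch only; my real version does a lot more validation, and the property names here just match the Select-Object used later in this post.

function Test-webrequest
{
    param ([string]$uri)

    # Build a result object with the properties the report expects
    $Result = [PSCustomObject]@{
        URL      = $uri
        Connect  = $false
        Code     = $null
        Errormsg = ''
        Success  = $false
    }
    try
    {
        # -TimeoutSec keeps a dead endpoint from hanging the whole check
        $Response = Invoke-WebRequest -Uri $uri -UseBasicParsing -TimeoutSec 30
        $Result.Connect = $true
        $Result.Code    = $Response.StatusCode
        $Result.Success = ($Response.StatusCode -eq 200)
    }
    catch
    {
        $Result.Errormsg = $_.Exception.Message
    }
    $Result
} #end function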

Then I create a Workflow and set it to call the function I created.
Workflow Get-VIPTest
  { param ([array]$Servers)
    foreach -parallel ($Server in $Servers)
    { Test-webrequest -uri $Server }
  } #end workflow

The key feature that enables multi-threading in the workflow is the -Parallel switch on the foreach statement.


    foreach -parallel ($Server in $Servers)


I minimize the portion of code in the Workflow itself and use the Function to do all the dirty work.
Some of the things you can't do in a workflow, such as output via Write-Host, are (in my testing) simply ignored. In other cases you will get failures if you go too far beyond the limits of what workflows can accomplish, such as Read-Host.  So it's best to heed the guidelines as best you can in a workflow.

I next define the list of URLs to check in my example.  Of course, this would be different for whatever systems you are testing and what your function code might do.

$VIPS = "https://outbound.corp.net/abc/isalive", …

This could very well be imported from a CSV, JSON, text, or other file.
I have found that the ".Trim()" method is a great precaution when importing data, as an extra non-printing character such as a tab or space can cause code to throw an error, and the invisible character makes it quite baffling to debug.
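
For example, something like this (the file name here is made up for illustration):

# Read a URL list from a plain text file, trimming stray whitespace
# from each line and skipping any blank lines
$VIPS = Get-Content -Path .\VIPList.txt |
    ForEach-Object { $_.Trim() } |
    Where-Object { $_ }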

Then there is the Kickoff.
Get-VIPTest $VIPS 

This kicks off the Workflow with all the URLs in an array of System.String values.

Get-VIPTest $VIPS |
     Select-Object Connect, Code, URL, Errormsg, Success | 
     Tee-Object -Variable VIPReport | 
     Out-Host

And then, in my own inside-out method, I catch the output with Tee-Object in $VIPReport, and send the results to the screen as they come in so I can watch the action. I have had some fuss with some cmdlets in the flow holding up the output. For example, Sort-Object will hold things up because it must process the whole stream to do its work.

It is kind of dull seeing no results for a large number of tasks; it's like watching a toaster...

In my previous studies, I found that the number of parallel processes is limited to 5 at a time by default. I will look for a link for that later.
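
If I recall correctly, later versions of PowerShell (4.0 and up) let you raise that cap with a -ThrottleLimit parameter on foreach -parallel. Something like this sketch, which I haven't re-tested recently:

Workflow Get-VIPTest
{
    param ([array]$Servers)
    # -throttlelimit raises the default cap on simultaneous iterations
    foreach -parallel -throttlelimit 10 ($Server in $Servers)
    { Test-webrequest -uri $Server }
} #end workflow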

This workflow approach is great for tasks that take a variety of completion times. In a long queue of tasks, one task could sit waiting for completion while a freed-up slot starts another session.  This would be similar to a bank with a row of tellers all handling requests from a single feed line: as each teller becomes available, another customer can be served.
Of course, all the tasks would need to be driven by the same Function call.  You get the advantage of all of them running in parallel, plus the benefit of averaging the completion time across the group as a bonus.

In my next installment, I will go over the problem of when one of your tasks never returns for whatever reason.

Part 3

Tuesday, May 1, 2018

Parallel PowerShell - Part 1

Speaking of making code go faster...


I learned at the PowerShell Summit this year, from Joshua King, that doing repetitive tasks or loops one way may be quite a bit slower than doing them another way. I learned that functions get compiled by the PowerShell engine and don't continue to get interpreted.  I do recommend his presentation to understand how he did his research and validation: Whip Your Scripts into Shape: Optimizing PowerShell for Speed

I had done this without realizing the performance improvement it provided; I had chosen it because it worked better with the Workflow process.

It appears that Workflows also get compiled by the interpreter and get the same speed-up, so both options are good methods to speed things up.  Workflows give you the capability to run things in parallel.  Using functions within the workflow is the method I used to take advantage of the parallel features of Workflows, and it made the code easier to write and less restrictive.
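
If you want to see the difference for yourself, Measure-Command is the easy way to time one approach against another. A throwaway example (the function and numbers here are just for illustration, not from my scripts):

# Time the same work done through a function call versus inline
function Get-Squares
{
    param ([int]$Count)
    1..$Count | ForEach-Object { $_ * $_ }
}

Measure-Command { Get-Squares -Count 100000 }                 # via a function
Measure-Command { 1..100000 | ForEach-Object { $_ * $_ } }    # inline in the script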

But back to my original story. 
One of the things I was concerned about was what would happen if the server I tasked to do part of the testing didn't respond for some reason. Would my Health Check crash? Return a big red error? Or, worse, just stop dead and hang the script with no return? 
Fortunately, I haven't had a script zombie yet in the parallel version of the script, but it was a plague in the previous versions. (A zombie is a script that never stops, waiting indefinitely.)
I did make sure to incorporate timeout settings on the web calls to ensure that the computer would not wait forever for a return. There were other checks that could get hung, so I made sure to verify the server was functioning properly before I ran those; this seems to be where my scripts get stuck.
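
Something along these lines can be used as a pre-check (a simplified sketch with made-up details, not my actual health-check code):

# Hypothetical pre-check: ping first, then confirm WinRM responds,
# before paying the cost of a remote session
if (Test-Connection -ComputerName $Server -Count 1 -Quiet)
{
    if (Test-WSMan -ComputerName $Server -ErrorAction SilentlyContinue)
    {
        Invoke-Command -ComputerName $Server -ScriptBlock { Get-Service -Name WinRM }
    }
}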

The problem with running tasks in parallel became what would happen if the server was off or nonresponsive. Fortunately, the parallel requests didn't hang like the previous versions; they did the opposite and returned nothing.

So I had to devise a method to discover which tests had not returned. I winced and then tried for a Jimmy Neutron brain blast...  that didn't work.
I tried things and came up with a check to see if each item in the original list of servers to test was in the results.

One approach was to review the errors in the shell variable $Error, but that didn't work out in some cases.
So, to get a full picture, I found that comparing the list of servers submitted with the resulting responses worked best.


$Missing = $Servers | ?{ $result.PSComputerName -notcontains $_ }


So this gives us the ability to create some additional response entries to mark the missing responses.

$MissingServers = $Missing | foreach {
    # Build a placeholder entry, with the same properties as a real result,
    # to mark a server that never returned anything
    $MissingServer = @{
        "URI"               = "https://$($_):443/App/Service.svc?wsdl";
        "StatusCode"        = "Fail 00";
        "StatusDescription" = "Fail N/A";
        "Result1"           = "N/A";
        "Result2"           = "N/A"
        }
    New-Object -TypeName PSObject -Property $MissingServer
    } #Close foreach

So you can combine your results and get a complete table with:

$Result += $MissingServers

Where $Result holds the results of the successful tests.