How could Windows 10 1803 be delayed

TL;DR: Some background on Windows Releases, and some speculation on why the latest build of Windows 10 Version 1803 has been delayed

BVT

My first job at Microsoft was working as a tester in the Windows NT build lab. First build 807. The job was to test Windows NT to ensure that it passed a series of automated regression tests, and met the basic functionality requirements to be sent out for broader testing within the Windows NT Product Groups. Called Build Verification Testing.

Testing things like Word, Excel, Notepad, Network Connectivity, printing, etc… Believe it or not, this was *before* Internet Explorer. so no Web browsing.

The idea is that if you could *not* perform some basic operations, then you wouldn’t want the build to get out to the larger test org, so they don’t have to waste their time on a build version that can’t even run notepad.

Windows 10

Which brings up back to Windows 10. There are a lot of testing phases involved with Windows 10. Each phase involves more people, broader testing, each phase, hopefully testing more functionality in the OS:

  • Build lab testing (BVT)
  • Microsoft internal testing
  • Fast Ring ( external to Microsoft )
  • Slow Ring
  • Semi-Annual Channel – Targeted ( Official release ) – RTM
  • Semi-Annual Channel ( Broad Deployment )

I still call the full releases “RTM” Release to Manufacturing, although for most builds, they get published to Windows Update and/or the volume licensing sites. There are still some builds that actually get put on USB Sticks, so I guess there is a factory… somewhere.

Cumulative Updates

Additionally Patches ( Cumulative updates ) also follow a Fast/Slow/Release schedule, so if you wanted to you could be deploying pre-release Cumulative updates to your test machines to get ahead of potential problems.

What’s interesting is that as of this post, the Cumulative Updates for Windows 10 version 1803 are already up to 17133.73!  Seventy Three builds since the start.

For most minor bugs that are identified after an OS is released, Microsoft has a well defined update process defined that can update and fix most issues. If you find a minor bug in notepad, then don’t release the FULL OS again, just send out the CU notepad fix via Windows Update!

Showstoppers

Not all builds make it to the next level. Sometimes the builds are just tests, and there is no need for them to continue on, they don’t have the final set of features, or have too many bugs.

But what could cause a build that nears full release to get reset like 1803? I don’t know the exact details of what is causing 17133 to have problems, but I know it’s not a minor problem. Again, minor problems can and *should* be fixed via Cumulative Updates.

Instead I speculate the problem can’t be fixed by Cumulative Update, or some other problem that prevents some machines from even installing CU’s.

That would be bad. This is my thoughts of why 17133 is delayed.

Take away

Sometimes we as IT professionals get so wrapped up in just one kind of testing that we forget to test all the environments. Perhaps we are only testing the bare-metal OS Wipe and Load process, or just testing In-Place upgrade. Really we need to test both.

(I Have a client that just had this problem, they were only testing Bare Metal Wipe and Load scenarios, and were surprised that a couple of Dell’s didn’t survive a In-Place upgrade to 1709, even though we strongly recommended In-Place upgrade testing across a wide selection of hardware types).

Additionally, add at least one Cumulative Update to your testing procedures, if you can’t service a machine after installing an OS, then you are going to have problems sometime in the future :).

-k

Advertisements

Out-Default Considered Harmful!

TL;DR: Don’t use Out-Default within a PowerShell cmdlet/function, unless you REALLY need to go to the console, otherwise use Write-Output.

Working with a client trying to narrow down a very quirky, but potentially damaging issue with Windows Update.

After spending several hours on the issue, we realized that we really didn’t have enough data, and it was suggested we programmatically search the WindowsUpdate.log, on a subset of machines, to search the presence of a specific string. If we find the string, then the machine is marked for further investigation.

New Log file format

For whatever reason, back in 2015, Microsoft decided to change the WindowsUpdate.log file format to a new format using the Event Tracing for Windows system.

See the blog here (the comments at the end of the blog are not kind).

The new system uses the Event Tracing for Windows system, and requires a convoluted Set of steps necessary to decode the data and write to a log file. Took me about an hour just to determine what the steps were to construct the command line arguments to extract a single *.etl file. In addition you must also connect to the Microsoft Symbol Servers to decode the data.

Thankfully Microsoft has included a PowerShell module and cmdlet to perform the operations… or so I thought…

WindowsUpdate PowerShell Module

Included in Windows 10 is a PowerShell module called WindowsUpdate. It’s not really that complex, the script is included, and you can see what it does:

C:\Windows\system32\WindowsPowerShell\v1.0\Modules\WindowsUpdate\WindowsUpdateLog.psm1

The cmdlet get-WindowsUpdateLog really just parses the c:\windows\Logs\WindowsUpdate\*.etl files and places all the information in a single log file, on the desktop by default. 

Honestly, I didn’t like the way the module connected to the Microsoft Symbol Servers, so I spent a while trying to figure out how to work around that, unfortunately the TraceRPT.exe tool couldn’t parse the file without the Symbols, and it frustrated me for other reasons. So I decided to use the PowerShell module as-is.

We wrote a PowerShell script and tried it out, but I noticed that the get-WindowsUpdateLog cmdlet was writing a lot data to the Console. I tried piping the output to null:

get-WindowsUpdateLog | out-null

But it didn’t work. A quick scan of the script source revealed that the author elected to write all output to Out-Default. Not Write-Host, not Write-Output, not Write-Verbose. To Out-Default

Why is that a problem?

Out-Default

Turns out that Out-Default is just a default handler for host output, not pipeline output. In the case of get-WindowsUpdate, it was just acting as a default wrapper around write-host. The background of why you would *NOT* want output from a cmdlet or script to go to the console, please Jeff Snover’s blog post on the matter: https://www.jsnover.com/blog/2013/12/07/write-host-considered-harmful/

That’s fine if we KNOW that we want the output to go to the console, but what if we want the output from a cmdlet to go to the pipeline? Well in that case get-WindowsUpdate is forcing output to the console no matter what. 

During a code review, I would have recommended using Write-Output instead, that would have redirected all output to the pipeline, allowing the out-null hack above to work.

SCCM Configuration Items and console output

The challenge is that if we elected to place this compliance script into a System Center Configuration Manager – Configuration Item script, it could lead to some undefined results.

For what ever reason, the SCCM team decided to key a PowerShell script’s success based on the console output. If it passes, the script would have called:

Write-Host "Compliant"

and have the Configuration Item search for the output “Compliant”. This is a case where we *KNOW* we want the script to write to the console. But we can’t have anything else in the script write to the console. Nothing! Otherwise it would be marked as a failure.

Personally, I would have also designed Configuration Item’s to measure pass/fail based on the process exit code directly.

The Hack

OK, Super! We have a PowerShell script that insists on writing output to the console, and a controller that get’s confused by non-deterministic console output. Sigh…

Time to write a hack. I developed a solution, and afterwards came across the same answer posted to StackExchange/SuperUser.com, so I’ll include that here.

https://superuser.com/questions/1058117/powershell-v5-suppress-out-default-output-in-nested-functions

Essentially the goal is to remove or replace the out-default cmdlet with our own function, PowerShell allows this action, I don’t usually recommend doing that, but it works in this case.

The Code

-k

 

 

A replacement for SCCM Add-CMDeviceCollectionDirectMembershipRule PowerShell cmdlet

TL;DR – The native Add-CMDeviceCollectionDirectMembershipRule PowerShell cmdlet sucks for adding more than 100 devices, use this replacement script instead.

How fast is good enough? When is the default, too slow?

I guess most of us have been spoiled with modern machines: Quad Xeon Procesors, couple hundred GB of ram, NVME cache drives, and Petabytes of storage at our command.

And don’t get me started with modern database indexing, you want to know what the average annual rainfall on the Spanish Plains are? If I don’t get 2 million responses within a half a second, I’ll be surprised, My Fair Lady.

But sometimes as a developer we need to account for actual performance, we can’t just use the default process and expect it to work in all scenarios to scale.

Background

Been working on a ConfigMgr project in an environment with a machine count well over ~300,000 devices. And we were prototyping a project that involved creating Device Collections and adding computers to the Collections using Direct Membership Rules.

Our design phase was complete, when one of our engineers mentioned that Direct Memberships are generally not optimal at scale. We figured that during the lifecycle of our project we might need to add 5000 arbitrary devices to a collection. What would happen then?

My colleague pointed to this article: http://rzander.azurewebsites.net/collection-scenarios Which discussed some of the pitfalls of Direct Memberships, but didn’t go into the details of why, or discuss what the optimal solution would be for our scenario.

I went to our NWSCUG meeting last week, and there was a knowledgeable Microsoft fella there so I asked him during Lunch. He mentioned that there were no on-going performance problems with Direct Membership collections, however there might be some performance issues when creating/adding to the collection, especially within the Console (Load up the large collection in memory, then add a single device, whew!). He recommended, of course, running our own performance analysis, to find out what worked for us.

OK, so the hard way…

The Test environment

So off to my Standard home SCCM test environment: I’m using the ever efficient Microsoft 365 Powered Device Lab Kit. It’s a bit big, 50GB, but once downloaded, I’ll have a fully functional SCCM Lab environment with a Domain Controller, MDT server, and a SCCM Server, all running within a Virtual Environment, within Seconds!

My test box is an old Intel Motherboard circa 2011, with a i7-3930k processor, 32GB of ram, and running all Virtual Machines running off a Intel 750 Series NVME SSD Drive!

First step was to create 5000 Fake computers. That was fairly easy with a CSV file and the SCCM PowerShell cmdlet Import-CMComputerInformation.  Done!

Using the native ConfigMgr PowerShell cmdlets

OK, lets write a script to create a new Direct Membership rule in ConfigMgr, and write some Device Objects to the Collection.

Unfortunately the native Add-CMDeviceCollectionDirectMembershipRule cmdlet, doesn’t support adding devices using a pipe, and won’t let us add more than one Device at a time. Gee… I wonder if *that* will affect performance. Query the Collection, add a single device, and write back to the server, for each device added. Hum….

Well the performance numbers weren’t good:

Items to add Number of Seconds to add all items
5 4.9
50 53

As you can see the number of seconds increased proportionally to the number of items added. If I wanted to add 5000 items, were talking about 5000 seconds, or an hour and a half. Um… no.

In fact a bit of decompiling of the native function in CM suggests that it’s not really designed for scale, best for adding only one device at a time.

Yuck!

The WMI way

I decided to see if we could write a functional replacement to the Add-CMDeviceCollectionDirectMembershipRule cmdlet that made WMI calls instead.

I copied some code from Kadio on http://cm12sdk.net (sorry the site is down at the moment), and tried playing around with the function.

Turns out that the SMS_Collection WMI collection has  AddMembershipRule() <Singular> and a AddMembershipRules() <multiple> function. Hey, Adding more than once one device at a time sounds… better!

<Insert several hours of coding pain here>

And finally got something that I think works pretty well:

Performance numbers look much better:

Items to add Number of Seconds to add all items
5 1.1
50 1.62
500 8.06
5000 61.65

Takes about the same amount of time to add 5000 devices using my function as it takes to add 50 devices using the native CM function. Additionally some code testing suggests that about half of the time for each group is being performed creating each rule ( the process {} block ), and the remaining half in the call to AddMembershipRules(), my guess is that should be better for our production CM environment.

Note that this isn’t just a PowerShell Function, it’s operating like a PowerShell Cmdlet. The function will accept objects from the pipeline and process them as they arrive, as quickly as Get-CMDevice can feed them through the pipeline.

However more testing continues.

-k

 

 

 

New Tool – Disk Hogs

Edit: Heavily modified script for speed. Bulk of script is now running Compiled C# Code.

Been resolving some problems at work lately with respect to full disks. One of our charters is to manage the ConfigMgr cache sizes on each machine to ensure that the packages we need to get replicated, actually get replicated out to the right machines at the right time.

But we’ve been getting some feedback about one 3rd party SCCM caching tool failing in some scenarios. Was it really the 3rd party tool failing, or some other factor?

Well we looked at the problem and found:

  • Machines with a modest 120GB SSD Drive (most machines have a more robust 250GB SSD)
  • Configuration Manager Application Install packages that are around 10-5GB (yowza!)
  • Users who leave too much… crap laying around their desktop.
  • And several other factors that have contributed to disks getting full.

Golly, when I try to install an application package that requires 12GB to install, and there is only 10GB free, it fails.

Um… yea…

I wanted to get some data for machines that are full: What is using up the disk space? But it’s a little painful searching around a disk for directories that are larger than they should be.

Options

One of my favorite tools is “WinDirStat” which produces a great graphical representation of a disk, allowing you to visualize what directories are taking up the most space, and which files are the largest.  http://windirstat.net

Additionally I also like the “du.exe” tool from SysInternals.  https://live.sysinternals.com/du.exe

I wrap it up in a custom batch script file

@%~dps0du.exe -l 1 -q -accepteula %*

and it produces output that looks like:

PS C:\Users> dudir
    263,122 C:\Users\Administrator
      1,541 C:\Users\Default
  7,473,508 C:\Users\keith
      4,173 C:\Users\Public
  7,742,345 C:\Users
Files: 27330
Directories: 5703
Size: 7,928,161,747 bytes
Size on disk: 7,913,269,465 bytes

Cool, however, I wanted something that I could run remotely, and that would give me just the most interesting directories, say everything over 1GB, or something configurable like that.

So a tool was born.

Tool

The script will enumerate through all files on a local machine and return the totals. Along the way we can add in rules to “Group” interesting directories and output the results.

So, say we want to know if there are any folders under “c:\program files (x86)\Adobe\*” that are larger than 1GB. For the most part, we don’t care about Adobe Reader, since it’s under 1GB, but everything else would be interesting. Stuff like that.

We have a default set of rules built into the script, but you can pass a new set of rules into the script using a *.csv file ( I use excel )

Folder SizeMB
c:\* 500
C:\$Recycle.Bin 100
c:\Program Files 0
C:\Program Files\* 1000
C:\Program Files (x86) 0
C:\Program Files (x86)\Adobe\* 1000
C:\Program Files (x86)\* 1000
C:\ProgramData\* 1000
C:\ProgramData 0
C:\Windows 0
C:\Windows\* 1000
c:\users 0
C:\Users\* 100
C:\Users\*\* 500
C:\Users\*\AppData\Local\Microsoft\* 1000
C:\Users\*\AppData\Local\* 400

Example output:

The machine isn’t too interesting (it’s my home machine not work machine)

I’m still looking into tweaks and other things to modify in the rules to make the output more interesting.

  • Should I exclude \windows\System32 directories under X size?
  • etc…

If you have feedback, let me know

Script

Bypass OEM Setup and install your own image.

AutoPilot

Really Windows Autopilot is the future. As soon as the OEM’s get their act together, and offer machines without the bloatware and adware. Yea, I’m talking about you Anti-Virus Trial! Go away, shoo! Shoo! Give me Signature Images, or I’ll do it myself.

Unfortunately, I’m currently working for a client that is “Cloud Adverse”, and very… particular about Security. “have our machines go through the internet, and download our apps from a cloud, oh heavens no!!”.

So all machines come from the OEM’s and into a centralized distribution center, where they run a hodge-podge of OS Imaging tools to get the machines ready to ship out to each user.

And, No they don’t use any MDT… at least not yet…

Really it’s the Anti AutoPilot…

Where to start.

Well, when the machines arrive from the OEM, they are unboxed and placed on a configuration rack. If they are Desktop Machines, they are also connected to a KVM switch (Imagine several 8-port switches daisy chained together). Then they are plugged into power, network, and turned on.

Here’s our first challenge: How do we stop the PC from booting into the OEM’s OOBE process into OUR process? Well right now the technicians need to press the magic function key press at just the right time during boot up.

You know the drill, Press F12 for Dell, or perhaps press F9 for HP, or Press enter for Lenovo. Perhaps you have a Surface Device, and need to hold down the Volume button while starting the machine. Yuck, but better than nothing…

Well, the feedback we got from the technicians is that sometimes they miss pressing the button… at “just” the right time. This is really a problem for a Desktop PC’s connected to that KVM switch. If the Monitor doesn’t sync to the new PC quickly enough, you might easily miss pressing the boot override switch.

This sounded like a good challenge to start with.

Audit Mode

Really, IT departments don’t use Audit Mode. Audit Mode is a way to make customizations *during* Windows Setup and then re-seal the OS, so the end-user gets the nice shiny Windows Setup process (Specialize and OOBE) that they expect in a new PC.

Deployments in IT are all about bypassing the shiny Windows OOBE experience. No we don’t care about all the fancy new features in Cortana, We have already signed the SA agreement with Microsoft, we already know the domain to connect to, and our company has only one locale and keyboard type. IT departments would much rather skip all that, and get the user to their machine. So the thought of re-sealing a machine and going *back* to OOBE when we just finished joining to the domain and installing apps is silly.

But there are some Possibilities here. Turns out, that when Windows Setup is running, it will look for an Unattend.xml file and try to use it.

Methods for running Windows Setup

MDT uses an Unattend.xml file on the local machine it we can skip over the settings we know about, and re-launch MDT LiteTouch when finished. What about this process? If we place the Unattend.xml file on the root of a removable USB drive, the Windows version on the hard disk will look there and use these settings. The Lab Techs appeared to have a lot of USB sticks laying around, so using them shouldn’t be a problem.

We can’t use a MDT unattend.xml file as-is, but we can use AuditMode to get to a command prompt and install our own MDT LitetouchPE_x64.wim file.

  1. Boot into Audit Mode.
  2. While in Audit Mode, auto login using the Administrator Account.
  3. Find our PowerShell script and run it!

PowerShell script

Once we are in PowerShell, we now have full access to the system, and can modify it in any we choose. In this case, I have copied a LiteTouchPE_x64.wim file to the USB Stick, and we can force the Hard Drive to boot from that instead, continuing our process in MDT LiteTouch. Yea!

Now we have a bridge between the OEM system and our LiteTouch, or any other automated WinPE disk.

Yea! Now for the *REAL* automation to begin… 🙂

-k

 

Windows 1709 In Place Upgrade Bug

Thanks to Johan Arwidmark and Dan Vega for pointing me out to this bug. It took me a while to set up a scenario where it could reproduce, but it’s a good bug. Windows 10 In-Place Upgrade is an important new feature of Windows 10, and it’s good to have MDT support it.

The Bug

When upgrading to Windows 10 Version 1709 using the built-in MDT “Standard Client Upgrade Task Sequence”, you will get an error during OS upgrade within a MDT Wizard page that shows something like:

A VBScript Runtime Error has occurred:
Error: 500 = Variable is undefined

VBScript Code:
-------------------
IsThereAtLeastOneApplicationPresent

At this point, the OS *has* been upgraded, but the Task Sequence can no longer continue.

The Background

During OS upgrades, MDT needs a way to hook into the OS Installation process when done and continue installation tasks in the New OS.

Before In-Place upgrades, MDT would perform this by adding in a custom step into the unattend.xml file. Perhaps you have seen this segment of code before:

<FirstLogonCommands>
 <SynchronousCommand wcm:action="add">
  <CommandLine>wscript.exe %SystemDrive%\LTIBootstrap.vbs</CommandLine>
  <Description>Lite Touch new OS</Description>
  <Order>1</Order>
 </SynchronousCommand>
</FirstLogonCommands>

Windows would run the LTIBootStrap.vbs script, which would call the LiteTouch.wsf script, which would find the existing Task Sequence environment and kick of the remaining steps within the “State Restore” of the Task Sequence.

For the Windows 10 In-Place upgrade process, instead of processing the unattend.xml file, they have their own method of calling back into our LiteTouch environment using a SetupComplete.cmd file. This SetupComplete.cmd is responsible for finding our LiteTouch script and calling it.

The Analysis

It took me a while to setup a repro scenario (my test lab is configured for new-computer scenarios with the Windows 10 Eval bits, which can’t be used in an In-Place upgrade scenario). But I was able to reproduce the issue, and I got the bug, and was able to get the bdd.log file for analysis.

The challenge here is that during In-Place upgrade, I can’t open a Cmd.exe window for debugging using F8 or Shift-F10. Instead I hard coded a “start cmd.exe” line into the SetupComplete.cmd file.

What I observed is that the Window being displayed was the ZTIGather.wsf progress screen. LiteTouch.wsf will kick off the ZTIGather.wsf script early in the process, and will show a customized version of the LiteTouch wizard as a progress dialog. Well clearly the wizard isn’t working properly. But after closer analysis, ZTIGather.wsf shouldn’t be running AT ALL. For some reason, LiteTouch.wsf didn’t recall that it was in the MIDDLE of a task sequence, and that it should just directly go back to the TS in progress.

MDT has two methods for storing variables. When within the SMS Stand Alone Task Sequencing Engine, MDT LiteTouch scripts will read the SMS variables through the Microsoft.SMS.TSEnvironment COM object. But if ZTIUtility.vbs can’t open the variable store, it will store variables locally in the Variables.dat file.

Finally, after getting a powershell.exe window, creating the Microsoft.SMS.TSEnvironment, and making a couple of test calls to verify the contents of the variable store, a surprise. All variables returned empty *Successfully* (which is bad) but writes caused an Exception (which is correct). Since we were not running the SMS Stand Alone Task Sequencing Engine, all calls should have caused an exception.

Why is the Microsoft.SMS.TSEnvironment COM object registered, but not working? Well, that’s still under investigation, but now we can work on a fix/work around for MDT!!!

The Fix

The fix (Not written by me), is to force the Microsoft.SMS.TSEnvironment COM object to unregister before calling LiteTouch.wsf. This can be done by a simple call at the start of the SetupComplete.cmd script:

:: Workaround for incorrectly-registered TS environment
reg delete HKCR\Microsoft.SMS.TSEnvironment /f

The issue has been acknoleged by Microsoft, and this fix is currently targeted for the next release of MDT.

<Code Removed>

Thanks!

k

Make DisMount-DiskImage work

TL;DR – DisMount-DiskImage doesn’t work the same it did in Windows Server 2012 R2, here is how to make it work in Windows 10 and Server 2016.

Dirty Boy

OK, with the release of Windows 1709, I’ve been downloading all sorts of *.ISO images from MSDN to try out the latest features, kits, and support utilities. As I develop scripts to auto mount and extract the contents, sometimes I leave the ISO images mounted, so I’ll need to clear everything off before I begin a new test run.

I developed a new powershell script function to do all of this for me: Dismount-Everything. I had the VHDX part working, but somehow the DVD part wasn’t working well.

I specifically wanted to dismount ISO images, even though I might now recall the path where they came from, even though the drive letter should be easily visible.

Blogs

I went to a couple of blog sites to find out how to dismount ISO images, and got some hits.

https://rcmtech.wordpress.com/2012/12/07/powershell-mounting-and-dismounting-iso-images-on-windows-server-2012-and-windows-8/

https://superuser.com/questions/499264/how-can-i-mount-an-iso-via-powershell-programmatically

But on the superuser site, the author suggests that you should be able to take the output from get-volume and pipe it into Get-DiskImage, but that was not working for me.

Get-Volume [Drive Letter] | Get-DiskImage | Dismount-DiskImage

No matter what, Get-DiskImage didn’t like the volume Path.

Get-Volume would output:

\\?\Volume{1f8dfd40-b7ae-11e7-bf06-9c2a70836dd4}\

but Get-DiskImage would expect:

\\.\CDROM1

Well, Windows NT will create virtual shortcuts between long path names like \\?\volume and something more readable like \\.\CDROM1, so I assumed there was an association there.

Well after testing I found out that this command didn’t work:

get-diskimage -devicepath \\?\Volume{1f8dfd40-b7ae-11e7-bf06-9c2a70836dd4}\

But this command did:

get-diskimage -devicepath \\?\Volume{1f8dfd40-b7ae-11e7-bf06-9c2a70836dd4}

Turns out that I just needed to strip out the trailing \.

Easy!

Code

Get-Volume | 
  Where-Object DriveType -eq 'CD-ROM' |
  ForEach-Object {
    Get-DiskImage -DevicePath  $_.Path.trimend('\') -EA SilentlyContinue
  } |
  Dismount-DiskImage