How could Windows 10 1803 be delayed?

TL;DR: Some background on Windows Releases, and some speculation on why the latest build of Windows 10 Version 1803 has been delayed

BVT

My first job at Microsoft was working as a tester in the Windows NT build lab. My first build: 807. The job was to test Windows NT to ensure that it passed a series of automated regression tests, and met the basic functionality requirements to be sent out for broader testing within the Windows NT Product Groups. This is called Build Verification Testing (BVT).

We tested things like Word, Excel, Notepad, network connectivity, printing, etc… Believe it or not, this was *before* Internet Explorer, so no Web browsing.

The idea is that if you could *not* perform some basic operations, then you wouldn’t want the build to get out to the larger test org, so they don’t have to waste their time on a build version that can’t even run notepad.

Windows 10

Which brings us back to Windows 10. There are a lot of testing phases involved with Windows 10. Each phase involves more people and broader testing, and hopefully exercises more functionality in the OS:

  • Build lab testing (BVT)
  • Microsoft internal testing
  • Fast Ring ( external to Microsoft )
  • Slow Ring
  • Semi-Annual Channel – Targeted ( Official release ) – RTM
  • Semi-Annual Channel ( Broad Deployment )

I still call the full releases “RTM” (Release to Manufacturing), although most builds just get published to Windows Update and/or the volume licensing sites. There are still some builds that actually get put on USB sticks, so I guess there is a factory… somewhere.

Cumulative Updates

Additionally, patches (Cumulative Updates) also follow a Fast/Slow/Release schedule, so if you wanted to, you could deploy pre-release Cumulative Updates to your test machines to get ahead of potential problems.

What’s interesting is that as of this post, the Cumulative Updates for Windows 10 version 1803 are already up to 17133.73! Seventy-three revisions since the start.
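If you want to see which revision a test machine is actually on, a minimal sketch like the one below reads it from the registry. It assumes the CurrentBuild and UBR values under the CurrentVersion key are present (they are on the Windows 10 machines I have looked at); the function name is made up for illustration.

```powershell
# Sketch: report the OS build and update revision, in the same form as 17133.73.
# Get-WindowsBuildRevision is a hypothetical helper name, not a built-in cmdlet.
function Get-WindowsBuildRevision {
    $key = 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion'
    $cv  = Get-ItemProperty -Path $key
    [pscustomobject]@{
        Build    = [int]$cv.CurrentBuild   # the part before the dot
        Revision = [int]$cv.UBR            # the part after the dot; changes with cumulative updates
        Full     = '{0}.{1}' -f $cv.CurrentBuild, $cv.UBR
    }
}

Get-WindowsBuildRevision
```

Running this before and after installing a Cumulative Update is a quick way to confirm that the update actually landed.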

For most minor bugs identified after an OS is released, Microsoft has a well-defined update process that can fix most issues. If you find a minor bug in notepad, then don’t release the FULL OS again, just send out the notepad fix in a CU via Windows Update!

Showstoppers

Not all builds make it to the next level. Sometimes the builds are just tests, and there is no need for them to continue on: they don’t have the final set of features, or they have too many bugs.

But what could cause a build that nears full release to get reset like 1803? I don’t know the exact details of what is causing 17133 to have problems, but I know it’s not a minor problem. Again, minor problems can and *should* be fixed via Cumulative Updates.

Instead, I speculate that the problem can’t be fixed by a Cumulative Update, or that there is some other problem that prevents some machines from even installing CUs.

That would be bad. These are my thoughts on why 17133 is delayed.

Take away

Sometimes we as IT professionals get so wrapped up in just one kind of testing that we forget to test all the environments. Perhaps we are only testing the bare-metal OS Wipe and Load process, or just testing In-Place upgrade. Really we need to test both.

(I have a client that just had this problem: they were only testing Bare Metal Wipe and Load scenarios, and were surprised that a couple of Dells didn’t survive an In-Place upgrade to 1709, even though we strongly recommended In-Place upgrade testing across a wide selection of hardware types.)

Additionally, add at least one Cumulative Update to your testing procedures. If you can’t service a machine after installing an OS, then you are going to have problems sometime in the future :).

-k


Out-Default Considered Harmful!

TL;DR: Don’t use Out-Default within a PowerShell cmdlet/function, unless you REALLY need to go to the console, otherwise use Write-Output.

Working with a client trying to narrow down a very quirky, but potentially damaging issue with Windows Update.

After spending several hours on the issue, we realized that we really didn’t have enough data, and it was suggested that we programmatically search the WindowsUpdate.log, on a subset of machines, for the presence of a specific string. If we find the string, then the machine is marked for further investigation.
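The search itself is the easy part. As a rough sketch (the function name, log path, and search string here are placeholders, not the ones from the client), it could look something like this:

```powershell
# Sketch: report whether a marker string appears anywhere in a log file.
# Test-LogForString is a hypothetical helper; path and pattern are supplied by the caller.
function Test-LogForString {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Pattern
    )
    if (-not (Test-Path -LiteralPath $Path)) { return $false }
    # -SimpleMatch treats the pattern as literal text rather than a regular expression.
    # -Quiet returns a boolean instead of the matching lines.
    [bool](Select-String -LiteralPath $Path -Pattern $Pattern -SimpleMatch -Quiet)
}

# Example call with placeholder values:
# Test-LogForString -Path "$env:USERPROFILE\Desktop\WindowsUpdate.log" -Pattern 'text to look for'
```

Returning a plain boolean keeps the function usable from a compliance script, which only needs a yes/no answer per machine.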

New Log file format

For whatever reason, back in 2015, Microsoft decided to change the WindowsUpdate.log file format to a new format using the Event Tracing for Windows system.

See the blog here (the comments at the end of the blog are not kind).

The new system uses the Event Tracing for Windows system, and requires a convoluted set of steps to decode the data and write it to a log file. It took me about an hour just to determine how to construct the command line arguments to extract a single *.etl file. In addition, you must also connect to the Microsoft Symbol Servers to decode the data.

Thankfully Microsoft has included a PowerShell module and cmdlet to perform the operations… or so I thought…

WindowsUpdate PowerShell Module

Included in Windows 10 is a PowerShell module called WindowsUpdate. It’s not really that complex, the script is included, and you can see what it does:

C:\Windows\system32\WindowsPowerShell\v1.0\Modules\WindowsUpdate\WindowsUpdateLog.psm1

The cmdlet get-WindowsUpdateLog really just parses the c:\windows\Logs\WindowsUpdate\*.etl files and places all the information in a single log file, on the desktop by default. 

Honestly, I didn’t like the way the module connected to the Microsoft Symbol Servers, so I spent a while trying to figure out how to work around that. Unfortunately, the TraceRPT.exe tool couldn’t parse the file without the Symbols, and it frustrated me for other reasons. So I decided to use the PowerShell module as-is.

We wrote a PowerShell script and tried it out, but I noticed that the get-WindowsUpdateLog cmdlet was writing a lot of data to the Console. I tried piping the output to null:

get-WindowsUpdateLog | out-null

But it didn’t work. A quick scan of the script source revealed that the author elected to write all output to Out-Default. Not Write-Host, not Write-Output, not Write-Verbose. To Out-Default.

Why is that a problem?

Out-Default

Turns out that Out-Default is just a default handler for host output, not pipeline output. In the case of get-WindowsUpdateLog, it was just acting as a default wrapper around write-host. For the background on why you would *NOT* want output from a cmdlet or script to go to the console, please see Jeff Snover’s blog post on the matter: https://www.jsnover.com/blog/2013/12/07/write-host-considered-harmful/

That’s fine if we KNOW that we want the output to go to the console, but what if we want the output from a cmdlet to go to the pipeline? Well, in that case get-WindowsUpdateLog is forcing output to the console no matter what.

During a code review, I would have recommended using Write-Output instead. That would have sent all output to the pipeline, allowing the out-null hack above to work.
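To make the difference concrete, here is a small sketch with two made-up functions that emit the same text in the two different ways. Neither function comes from the WindowsUpdate module; they only illustrate the behavior described above.

```powershell
# Sketch: two hypothetical functions that emit the same text different ways.
function Get-QuietReport {
    Write-Output 'report line'      # goes to the pipeline; the caller decides what happens to it
}

function Get-NoisyReport {
    'report line' | Out-Default     # goes straight to the host; the pipeline never sees it
}

$captured = Get-QuietReport         # captured in a variable, nothing printed
Get-QuietReport | Out-Null          # discarded, nothing printed
Get-NoisyReport | Out-Null          # the text still reaches the console
```

The caller of Get-QuietReport stays in control, while the caller of Get-NoisyReport has no pipeline output to capture or throw away.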

SCCM Configuration Items and console output

The challenge is that if we elected to place this compliance script into a System Center Configuration Manager – Configuration Item script, it could lead to some undefined results.

For whatever reason, the SCCM team decided to key a PowerShell script’s success on its console output. If it passes, the script would call:

Write-Host "Compliant"

and have the Configuration Item search for the output “Compliant”. This is a case where we *KNOW* we want the script to write to the console. But we can’t have anything else in the script write to the console. Nothing! Otherwise it would be marked as a failure.

Personally, I would have designed Configuration Items to measure pass/fail based on the process exit code directly.

The Hack

OK, Super! We have a PowerShell script that insists on writing output to the console, and a controller that gets confused by non-deterministic console output. Sigh…

Time to write a hack. I developed a solution, and afterwards came across the same answer posted to StackExchange/SuperUser.com, so I’ll include that here.

https://superuser.com/questions/1058117/powershell-v5-suppress-out-default-output-in-nested-functions

Essentially the goal is to remove or replace the out-default cmdlet with our own function. PowerShell allows this, because a function takes precedence over a cmdlet of the same name. I don’t usually recommend doing that, but it works in this case.
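A minimal sketch of the idea follows. This is my reconstruction of the technique, not necessarily the exact code from the SuperUser answer: the wrapper name Invoke-WithoutDefaultOutput is made up, and defining the replacement in the global scope is an assumption so that code running inside a module can find it.

```powershell
# Sketch: temporarily shadow Out-Default with a do-nothing function.
# Invoke-WithoutDefaultOutput is a hypothetical wrapper name.
function Invoke-WithoutDefaultOutput {
    param([Parameter(Mandatory)] [scriptblock] $ScriptBlock)

    # Functions win over cmdlets during command lookup, so this hides the real Out-Default.
    function global:Out-Default { }
    try {
        & $ScriptBlock
    }
    finally {
        # Always restore the real cmdlet, even if the script block throws.
        Remove-Item -Path Function:\Out-Default -ErrorAction SilentlyContinue
    }
}

# Usage on Windows 10 (paths are examples): decode the log quietly,
# then write the one string the Configuration Item is looking for.
# Invoke-WithoutDefaultOutput { Get-WindowsUpdateLog -LogPath "$env:TEMP\WindowsUpdate.log" }
# Write-Host "Compliant"
```

The try/finally matters: leaving a do-nothing Out-Default behind in an interactive session would swallow normal console output for everything that runs afterwards.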

The Code

-k

 

 

Dell XPS 13 9360 Hardware Reset

TL;DR – If you are having some spontaneous errors starting your laptop, try disconnecting your batteries for an hour, and try again.

XPS 13 9360

Got a new laptop last month, it was time to replace the old one. Did some searching online and found something light, powerful, and at a good price. Dell XPS 13 9360:

  •  8th Gen Intel® Core i7-8550U  (*Quad Core*)
  • 512GB PCIe Solid State Drive (*NVMe Drive*)
  • 16GB LPDDR3 1866MHz RAM
  • 1x Thunderbolt port
  • 13.3″ Touchscreen InfinityEdge QHD+ (3200 x 1800) Display


On sale at Costco for $1400. Overall a good value for a quad core laptop with NVMe.

The Break

Came back from a meeting (Starbucks? :)) Friday and the machine failed to boot. Got some display errors, rebooted, but got the recovery screen. So I shut down for a while, and when I rebooted, nothing. No screen, nothing.

However I did notice that the LED on the front was blinking, and I was able to catch the pattern, 2 and 7. Looking up in the service manual:

Capture.PNG

LCD Error!?!?! Crap.

A call to Dell Support confirmed the error, and an RMA ticket was generated. It could be two weeks before I get it back.

Battery

I wanted to archive the contents of the disk before I sent it off to Dell, so I got out my Torx screwdriver.

But while I had the case open I disconnected the main Battery and the CMOS battery.

With most modern PCs, each of the components has a small computer built into it. If they develop errors, do they reboot like the main OS when the power is off? If the battery is always connected, that might not be true. I had a similar problem recently with my SuperMicro test box, where flashing the BIOS wasn’t helping to resolve a complex problem I had with the box. Draining the CMOS battery and re-flashing the BIOS did work!

After an hour, I plugged in the batteries, and tried booting again. Yea, the machine works! It’s alive! I don’t have to send my machine in for repair.

Hopefully the machine will work a little bit longer than 45 days. We’ll know soon.

 

A replacement for SCCM Add-CMDeviceCollectionDirectMembershipRule PowerShell cmdlet

TL;DR – The native Add-CMDeviceCollectionDirectMembershipRule PowerShell cmdlet sucks for adding more than 100 devices, use this replacement script instead.

How fast is good enough? When is the default too slow?

I guess most of us have been spoiled with modern machines: Quad Xeon Processors, a couple hundred GB of RAM, NVMe cache drives, and Petabytes of storage at our command.

And don’t get me started with modern database indexing. You want to know what the average annual rainfall on the Spanish Plains is? If I don’t get 2 million responses within half a second, I’ll be surprised, My Fair Lady.

But sometimes as developers we need to account for actual performance; we can’t just use the default process and expect it to scale to all scenarios.

Background

Been working on a ConfigMgr project in an environment with a machine count well over 300,000 devices. We were prototyping a project that involved creating Device Collections and adding computers to the Collections using Direct Membership Rules.

Our design phase was complete, when one of our engineers mentioned that Direct Memberships are generally not optimal at scale. We figured that during the lifecycle of our project we might need to add 5000 arbitrary devices to a collection. What would happen then?

My colleague pointed to this article: http://rzander.azurewebsites.net/collection-scenarios Which discussed some of the pitfalls of Direct Memberships, but didn’t go into the details of why, or discuss what the optimal solution would be for our scenario.

I went to our NWSCUG meeting last week, and there was a knowledgeable Microsoft fella there so I asked him during Lunch. He mentioned that there were no on-going performance problems with Direct Membership collections, however there might be some performance issues when creating/adding to the collection, especially within the Console (Load up the large collection in memory, then add a single device, whew!). He recommended, of course, running our own performance analysis, to find out what worked for us.

OK, so the hard way…

The Test environment

So off to my Standard home SCCM test environment: I’m using the ever efficient Microsoft 365 Powered Device Lab Kit. It’s a bit big, 50GB, but once downloaded, I’ll have a fully functional SCCM Lab environment with a Domain Controller, MDT server, and a SCCM Server, all running within a Virtual Environment, within Seconds!

My test box is an old Intel Motherboard circa 2011, with an i7-3930k processor and 32GB of RAM, running all Virtual Machines off an Intel 750 Series NVMe SSD Drive!

First step was to create 5000 Fake computers. That was fairly easy with a CSV file and the SCCM PowerShell cmdlet Import-CMComputerInformation.  Done!

Using the native ConfigMgr PowerShell cmdlets

OK, let’s write a script to create a new Direct Membership rule in ConfigMgr, and write some Device Objects to the Collection.

Unfortunately the native Add-CMDeviceCollectionDirectMembershipRule cmdlet, doesn’t support adding devices using a pipe, and won’t let us add more than one Device at a time. Gee… I wonder if *that* will affect performance. Query the Collection, add a single device, and write back to the server, for each device added. Hum….

Well the performance numbers weren’t good:

Items to add    Number of Seconds to add all items
5               4.9
50              53

As you can see, the number of seconds increased proportionally to the number of items added, at roughly one second per device. If I wanted to add 5000 items, we’re talking about 5000 seconds, or an hour and a half. Um… no.

In fact a bit of decompiling of the native function in CM suggests that it’s not really designed for scale, best for adding only one device at a time.

Yuck!

The WMI way

I decided to see if we could write a functional replacement to the Add-CMDeviceCollectionDirectMembershipRule cmdlet that made WMI calls instead.

I copied some code from Kadio on http://cm12sdk.net (sorry the site is down at the moment), and tried playing around with the function.

Turns out that the SMS_Collection WMI class has an AddMembershipRule() <Singular> and an AddMembershipRules() <multiple> function. Hey, adding more than one device at a time sounds… better!

<Insert several hours of coding pain here>

And finally got something that I think works pretty well:

Performance numbers look much better:

Items to add    Number of Seconds to add all items
5               1.1
50              1.62
500             8.06
5000            61.65

It takes about the same amount of time to add 5000 devices using my function as it takes to add 50 devices using the native CM function. Additionally, some code testing suggests that about half of the time for each group is spent creating each rule ( the process {} block ), and the remaining half in the call to AddMembershipRules(). My guess is that this should be better for our production CM environment.

Note that this isn’t just a PowerShell Function, it’s operating like a PowerShell Cmdlet. The function will accept objects from the pipeline and process them as they arrive, as quickly as Get-CMDevice can feed them through the pipeline.
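For readers who want the general shape of the approach, here is a rough sketch. It is not the script from this post: the function name and parameters are made up, it assumes Windows PowerShell 5.1 (for Get-WmiObject and the [wmiclass] accelerator), and it assumes the incoming objects expose ResourceID and Name properties the way Get-CMDevice output does.

```powershell
# Sketch only: collect direct membership rules from the pipeline, then submit them in one WMI call.
# Add-DirectMembershipRulesSketch and its parameter names are hypothetical.
function Add-DirectMembershipRulesSketch {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string] $SiteServer,
        [Parameter(Mandatory)] [string] $SiteCode,
        [Parameter(Mandatory)] [string] $CollectionID,
        # Anything with ResourceID and Name properties, for example Get-CMDevice output.
        [Parameter(Mandatory, ValueFromPipeline)] [object] $Device
    )
    begin {
        $namespace = "root\sms\site_$SiteCode"
        $ruleClass = [wmiclass]"\\$SiteServer\${namespace}:SMS_CollectionRuleDirect"
        $rules     = New-Object System.Collections.ArrayList
    }
    process {
        # Build one rule per device locally; nothing is sent to the server yet.
        $rule = $ruleClass.CreateInstance()
        $rule.ResourceClassName = 'SMS_R_System'
        $rule.ResourceID        = $Device.ResourceID
        $rule.RuleName          = $Device.Name
        [void]$rules.Add($rule)
    }
    end {
        $collection = Get-WmiObject -ComputerName $SiteServer -Namespace $namespace `
            -Class SMS_Collection -Filter "CollectionID='$CollectionID'"
        # A single AddMembershipRules() call for the whole batch instead of one call per device.
        $inParams = $collection.psbase.GetMethodParameters('AddMembershipRules')
        $inParams.collectionRules = [System.Management.ManagementBaseObject[]]$rules.ToArray()
        $collection.psbase.InvokeMethod('AddMembershipRules', $inParams, $null)
    }
}

# Example call with placeholder values:
# Get-CMDevice -CollectionName 'Some Source Collection' |
#     Add-DirectMembershipRulesSketch -SiteServer 'CM01' -SiteCode 'PS1' -CollectionID 'PS100123'
```

The begin/process/end split is what lets the function behave like a cmdlet: rules are built as devices arrive, and the server is only contacted once at the end.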

However more testing continues.

-k

 

 

 

New Tool – Disk Hogs

Edit: Heavily modified script for speed. Bulk of script is now running Compiled C# Code.

Been resolving some problems at work lately with respect to full disks. One of our charters is to manage the ConfigMgr cache sizes on each machine to ensure that the packages we need to get replicated, actually get replicated out to the right machines at the right time.

But we’ve been getting some feedback about one 3rd party SCCM caching tool failing in some scenarios. Was it really the 3rd party tool failing, or some other factor?

Well we looked at the problem and found:

  • Machines with a modest 120GB SSD Drive (most machines have a more robust 250GB SSD)
  • Configuration Manager Application Install packages that are around 10-5GB (yowza!)
  • Users who leave too much… crap laying around their desktop.
  • And several other factors that have contributed to disks getting full.

Golly, when I try to install an application package that requires 12GB to install, and there is only 10GB free, it fails.

Um… yea…

I wanted to get some data for machines that are full: What is using up the disk space? But it’s a little painful searching around a disk for directories that are larger than they should be.

Options

One of my favorite tools is “WinDirStat” which produces a great graphical representation of a disk, allowing you to visualize what directories are taking up the most space, and which files are the largest.  http://windirstat.net

Additionally I also like the “du.exe” tool from SysInternals.  https://live.sysinternals.com/du.exe

I wrap it up in a custom batch script file

@%~dps0du.exe -l 1 -q -accepteula %*

and it produces output that looks like:

PS C:\Users> dudir
    263,122 C:\Users\Administrator
      1,541 C:\Users\Default
  7,473,508 C:\Users\keith
      4,173 C:\Users\Public
  7,742,345 C:\Users
Files: 27330
Directories: 5703
Size: 7,928,161,747 bytes
Size on disk: 7,913,269,465 bytes

Cool, however, I wanted something that I could run remotely, and that would give me just the most interesting directories, say everything over 1GB, or something configurable like that.

So a tool was born.

Tool

The script will enumerate through all files on a local machine and return the totals. Along the way we can add in rules to “Group” interesting directories and output the results.

So, say we want to know if there are any folders under “c:\program files (x86)\Adobe\*” that are larger than 1GB. For the most part, we don’t care about Adobe Reader, since it’s under 1GB, but everything else would be interesting. Stuff like that.

We have a default set of rules built into the script, but you can pass a new set of rules into the script using a *.csv file ( I use Excel ):
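As a simplified illustration of that one rule (this is not the Disk Hogs script itself, which handles many rules at once and runs compiled C# for speed), a sketch in plain PowerShell might look like this. The function name is made up.

```powershell
# Sketch: list the immediate subfolders of a root whose contents exceed a size threshold.
# Get-LargeFolder is a hypothetical helper, not part of the Disk Hogs script.
function Get-LargeFolder {
    param(
        [Parameter(Mandatory)] [string] $Root,
        [int] $MinimumMB = 1000
    )
    Get-ChildItem -LiteralPath $Root -Directory -Force -ErrorAction SilentlyContinue |
        ForEach-Object {
            # Sum every file beneath this folder; unreadable paths are skipped quietly.
            $bytes = (Get-ChildItem -LiteralPath $_.FullName -Recurse -File -Force -ErrorAction SilentlyContinue |
                Measure-Object -Property Length -Sum).Sum
            [pscustomobject]@{
                Folder = $_.FullName
                SizeMB = [math]::Round(([double]$bytes) / 1MB, 1)
            }
        } |
        Where-Object { $_.SizeMB -ge $MinimumMB } |
        Sort-Object -Property SizeMB -Descending
}

# Example call:
# Get-LargeFolder -Root 'C:\Program Files (x86)\Adobe' -MinimumMB 1000
```

This walks the tree once per subfolder, so it is fine for a spot check but slow on a full disk, which is one reason a dedicated tool is worth having.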

Folder                                    SizeMB
c:\*                                      500
C:\$Recycle.Bin                           100
c:\Program Files                          0
C:\Program Files\*                        1000
C:\Program Files (x86)                    0
C:\Program Files (x86)\Adobe\*            1000
C:\Program Files (x86)\*                  1000
C:\ProgramData\*                          1000
C:\ProgramData                            0
C:\Windows                                0
C:\Windows\*                              1000
c:\users                                  0
C:\Users\*                                100
C:\Users\*\*                              500
C:\Users\*\AppData\Local\Microsoft\*      1000
C:\Users\*\AppData\Local\*                400

Example output:

The machine isn’t too interesting (it’s my home machine not work machine)

I’m still looking into tweaks and other things to modify in the rules to make the output more interesting.

  • Should I exclude \windows\System32 directories under X size?
  • etc…

If you have feedback, let me know

Script

Silence is Golden during Setup

Thanks to @gwblok for pointing me to this twitter thread about Windows OOBE Setup.

When Unattended is not Silent

During Windows 10 OOBE, the Windows Welcome process uses the Cortana voice engine to speak during Windows Setup.

Now we can go look for any updates

Shut up!

Yes, I’m one of those guys who sets my Sound Profile to “silent”, Silence is Golden!

And if I’m going to be running several Windows Deployments in my lab (read my home office), then I would prefer the machine to be silent. Reminds me of the XP/Vista days when we had boot up sounds. How rude.

So how to disable… Well, the answer doesn’t appear to be that straightforward.

SkipMachineOOBE

At first I suggested SkipMachineOOBE, and it works on my test machine! Yea!

Then I got a reminder that SkipMachineOOBE is deprecated according to documentation.

DisableVoice

Thanks to @Jarwidmark for pointing me in the thread above to:

reg.exe add HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\OOBE /v DisableVoice /t REG_DWORD /d 1

However, Microsoft Documentation also states that you should only use this for testing, and that Cortana Voice should be enabled for users. OK… Fine, we’ll delete the value after setup is complete.

So where to place all this stuff?

Specialize

Several people suggested modifying the local registry within the imaging process, but I would prefer to avoid that, instead trying to see if we can perform the action during Setup using our unattend.xml file.

The command to disable would need to be *before* “OOBE”, sounds like the perfect job for the “Specialize” process.

Some quick testing, verified, and we are ready to go.

Automating OOBE

So, given the guidance from Microsoft on how to automate Windows 10:

https://docs.microsoft.com/en-us/windows-hardware/manufacture/desktop/settings-for-automating-oobe

Here are my changes:

  • We disable Cortana during the Specialize Pass before OOBE.
  • Then during OOBE, we clear the Cortana setting, and continue.
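The two registry operations behind those steps can be sketched in PowerShell as below. This only shows the set and clear operations, not how they are wired into unattend.xml, and the two function names are made up for illustration.

```powershell
# Sketch: set and clear the DisableVoice value. Both function names are hypothetical.
function Set-OobeVoiceDisabled {
    param([string] $Key = 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\OOBE')
    if (-not (Test-Path -Path $Key)) { New-Item -Path $Key -Force | Out-Null }
    # Same effect as the reg.exe command above: DisableVoice (REG_DWORD) = 1.
    New-ItemProperty -Path $Key -Name DisableVoice -PropertyType DWord -Value 1 -Force | Out-Null
}

function Clear-OobeVoiceDisabled {
    param([string] $Key = 'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\OOBE')
    # Remove the value again so the setting is not left behind for end users.
    Remove-ItemProperty -Path $Key -Name DisableVoice -ErrorAction SilentlyContinue
}
```

The first function would run before OOBE starts and the second once the voice prompts are no longer a concern. Both need to run elevated when pointed at the default HKLM key.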

 

Bypass OEM Setup and install your own image.

AutoPilot

Really Windows Autopilot is the future. As soon as the OEM’s get their act together, and offer machines without the bloatware and adware. Yea, I’m talking about you Anti-Virus Trial! Go away, shoo! Shoo! Give me Signature Images, or I’ll do it myself.

Unfortunately, I’m currently working for a client that is “Cloud Averse”, and very… particular about Security. “Have our machines go through the internet, and download our apps from a cloud, oh heavens no!!”.

So all machines come from the OEM’s and into a centralized distribution center, where they run a hodge-podge of OS Imaging tools to get the machines ready to ship out to each user.

And, No they don’t use any MDT… at least not yet…

Really it’s the Anti AutoPilot…

Where to start.

Well, when the machines arrive from the OEM, they are unboxed and placed on a configuration rack. If they are Desktop Machines, they are also connected to a KVM switch (Imagine several 8-port switches daisy chained together). Then they are plugged into power, network, and turned on.

Here’s our first challenge: How do we stop the PC from booting into the OEM’s OOBE process, and get it into OUR process instead? Well, right now the technicians need to press the magic function key at just the right time during boot up.

You know the drill, Press F12 for Dell, or perhaps press F9 for HP, or Press enter for Lenovo. Perhaps you have a Surface Device, and need to hold down the Volume button while starting the machine. Yuck, but better than nothing…

Well, the feedback we got from the technicians is that sometimes they miss pressing the button… at “just” the right time. This is really a problem for Desktop PCs connected to that KVM switch. If the Monitor doesn’t sync to the new PC quickly enough, you might easily miss pressing the boot override key.

This sounded like a good challenge to start with.

Audit Mode

Really, IT departments don’t use Audit Mode. Audit Mode is a way to make customizations *during* Windows Setup and then re-seal the OS, so the end-user gets the nice shiny Windows Setup process (Specialize and OOBE) that they expect in a new PC.

Deployments in IT are all about bypassing the shiny Windows OOBE experience. No we don’t care about all the fancy new features in Cortana, We have already signed the SA agreement with Microsoft, we already know the domain to connect to, and our company has only one locale and keyboard type. IT departments would much rather skip all that, and get the user to their machine. So the thought of re-sealing a machine and going *back* to OOBE when we just finished joining to the domain and installing apps is silly.

But there are some Possibilities here. Turns out, that when Windows Setup is running, it will look for an Unattend.xml file and try to use it.

Methods for running Windows Setup

MDT uses an Unattend.xml file on the local machine so that we can skip over the settings we know about, and re-launch MDT LiteTouch when finished. What about this process? If we place the Unattend.xml file on the root of a removable USB drive, the Windows version on the hard disk will look there and use these settings. The Lab Techs appeared to have a lot of USB sticks laying around, so using them shouldn’t be a problem.

We can’t use an MDT unattend.xml file as-is, but we can use Audit Mode to get to a command prompt and install our own MDT LiteTouchPE_x64.wim file.

  1. Boot into Audit Mode.
  2. While in Audit Mode, auto login using the Administrator Account.
  3. Find our PowerShell script and run it!

PowerShell script

Once we are in PowerShell, we have full access to the system, and can modify it in any way we choose. In this case, I have copied a LiteTouchPE_x64.wim file to the USB Stick, and we can force the Hard Drive to boot from that instead, continuing our process in MDT LiteTouch. Yea!

Now we have a bridge between the OEM system and our LiteTouch, or any other automated WinPE disk.

Yea! Now for the *REAL* automation to begin… 🙂

-k