Notes on dual-booting Linux/Windows 10 with BitLocker and Secure Boot

Boot menu

These notes are meant to help you set up a dual-booting system on a computer running Windows 10 Professional with BitLocker Device Encryption, Modern Standby (a.k.a. Fast Boot), and Secure Boot enabled.

Linux installation is covered only briefly, as the focus is on preserving the Windows pre-boot UEFI environment in such a setup.


Before proceeding, you should back up all important data to an external disk or your preferred online backup provider. Remember: there is a not-insignificant risk of permanently breaking the Windows 10 installation in a non-recoverable fashion, as you’ll be making changes to the UEFI partition on your computer.

You should also print a copy of your BitLocker recovery key, as it may be needed during this process. This is not your BitLocker PIN or password, but a separate numeric key. Print it from Control Panel: System and Security: BitLocker Drive Encryption.

Please note that the recovery key changes every time you disable and re-enable BitLocker Device Encryption. Be sure you always have several copies of the most recent recovery key, or you may lose access to your encrypted data! I’d recommend creating a script that automatically backs up your key to a secure place in the cloud. A good example is this code from Ammar Hasayen.

Download and prepare Windows 10 installation media (e.g. on a 16 GiB or larger USB stick) for recovery purposes beforehand. Note that you’ll also need separate Linux installation media.

Lastly, you should double-check that you have the latest firmware updates installed — especially your Trusted Platform Module (TPM) firmware. Vendors might not auto-update the TPM using their regular driver and firmware update utilities.


To install a second operating system — whether another copy of Windows, Linux, or something more exotic — you’ll need to free up space on your system drive. You can also use a secondary drive, but this is probably not an option for laptop users and small-form-factor devices.

You’ll need to free up at least 20 GiB for a small Linux installation. Some Linux distribution installers (at least Ubuntu and Fedora) can install themselves alongside Windows with fully guided installation options if you prepare your disk in this way.

Optionally, if your partition layout allows for it, you should also grow your UEFI System Partition to about 1 GiB. Some manufacturers ship partitions of only 100 MiB or smaller. As multiple operating systems will be storing their UEFI blobs there (and possibly multiple versions during system upgrades), it can be beneficial to have more space available on this partition. You may not be able to accomplish this without reinstalling Windows from scratch, and it’s not absolutely necessary, although it might save you some troubleshooting at a later time.

You can resize and manage your partitions with the built-in Disk Management utility in Windows. You can find it by searching for “Create and manage hard disk partitions” in Windows Search or Cortana.

If this is a new device that you’ve never stored any personal data on, I’d recommend that you first temporarily disable BitLocker Device Encryption before making changes to your drive partitions. Windows is fairly decent at self-repairing accidental damage or problems that can occur when you manipulate partitions on a native NTFS drive; the same is not true for a BitLocker-encrypted drive. After disabling BitLocker Device Encryption from Windows Settings, you must wait for the decryption to complete before you shrink the main drive. Both of these operations can take hours, depending on the drive. When you’ve shrunk your partition and freed up space, you can re-enable BitLocker Device Encryption. Reboot your system and wait for encryption to complete before proceeding, to avoid running into issues later.

If you’ve already stored personal data on the drive, you should first back up everything, leave BitLocker Device Encryption enabled, and then just resize the encrypted drive and hope for the best. Don’t format or partition the freed-up space afterwards; leave that to the Linux installer.


Linux installers vary a lot, so I’ll only give some general pointers on the installation process. You shouldn’t need to disable Secure Boot to install a modern Linux distribution. Refer to the documentation for your distribution for specifics. Depending on your device, you may have to boot into your installation media from the Windows Settings app: Update & Security: Recovery: Advanced Startup.

You shouldn’t choose to use the entire drive. The graphical installers for Fedora and Ubuntu will automatically suggest using the space you freed up on your system drive earlier. You should always verify that the installer isn’t going to format your Windows or UEFI partitions before accepting its suggestions.

Windows 10 and Linux share the same partition for their UEFI blobs. However, you cannot install multiple versions of Windows, or multiple versions of the same Linux distribution, on the same UEFI system partition. Each installs into a folder named after the operating system, e.g. “Fedora”, “Microsoft”, and “Ubuntu”, but this naming scheme doesn’t allow for more than one version at a time. If you need to install multiple versions of e.g. Windows, then you also need to create a separate UEFI system partition for each one. This will require disabling BitLocker Device Encryption, as changing the boot partition will upset the TPM.

Older versions of Windows and some Linux installers would sometimes overwrite the entire UEFI partition, but this hasn’t been a problem for some years now. You’ll want to mount the existing Windows UEFI partition to /boot/efi on your Linux system without formatting it. You should use a shared UEFI partition even if you’re installing Linux to a secondary drive, as it will give you an easier time with Secure Boot, BitLocker, and GRUB2.
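A shared UEFI partition mounted this way typically corresponds to an /etc/fstab entry like the following sketch (the UUID below is a placeholder; substitute the value blkid reports for your EFI system partition):

```
# /etc/fstab: mount the shared EFI System Partition at /boot/efi
# UUID=XXXX-XXXX is a placeholder; find yours with `blkid`
UUID=XXXX-XXXX  /boot/efi  vfat  umask=0077  0  2
```

Most distribution installers will create this entry for you when you select the existing EFI partition as /boot/efi and choose not to format it.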

os-prober should auto-detect Windows and create a boot menu entry for it alongside Linux in GRUB2. Windows Update requires multiple reboots, so you’ll want to configure GRUB to remember your last boot menu selection (GRUB_DEFAULT=saved; GRUB_SAVEDEFAULT=true). This allows either operating system to trigger multiple reboots to perform updates and still boot back into the correct operating system. It also gets you back into the same operating system you used the last time you booted your system.
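The two GRUB settings mentioned above go in /etc/default/grub:

```
# /etc/default/grub: boot the most recently chosen menu entry by default
GRUB_DEFAULT=saved
GRUB_SAVEDEFAULT=true
```

After editing, regenerate the GRUB configuration (`sudo update-grub` on Ubuntu; check your distribution’s documentation for the equivalent grub2-mkconfig invocation).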

You may be prompted for your BitLocker Recovery key after completing the installation.

I hope you now feel a little more prepared to create a dual-booting system with Secure Boot and BitLocker Device Encryption. Things really should just work, but the boot process is delicate, so take precautions in case you need to restore your system. Good luck!

Study Notes - Artificial Intelligence (AI)

These are my personal notes, broadly covering the BASICS necessary for machine learning and artificial intelligence.

Some final caveats:

  • This post may not be helpful for your purposes.
  • This is still very much a work in progress and it will be changing a lot.
  • Some content may be out of order or missing. Don’t get upset.
  • The notes are created in (GitHub-flavored) Markdown, so they unfortunately lack snazzy interactivity.
  • Part of this material is adapted, sometimes directly copied from elsewhere. I have tried to give credit where due.

The raw notes are open source; should you encounter errors or have a better way of explaining something, don’t hesitate to submit a pull request.

Table of Contents

  1. What is Machine Learning?
    • 1.1 Functions
    • 1.2 Algorithms - Grouped by Learning Style
    • 1.3 Supervised v. Unsupervised
  2. Techniques
    • 2.1 Regression
    • 2.2 Classification
  3. In Practice

1. What is Machine Learning?

Machine learning provides the foundation for artificial intelligence. We train a software model using data, i.e. the model learns from the training cases, and then we use the trained model to make predictions for new data cases.

Let’s start with a data set that contains historical records, a.k.a. observations. Every record includes numerical features (X) quantifying characteristics of the item we are working with.

There are also values we try to predict (Y). We will use training cases to train the machine learning model so that it calculates a value for (Y) from the features in (X). Simply said, we are creating a function that operates on a set of features, (X), to produce predictions, (Y): f: X → Y.

1.1 Functions

At heart, a function is the mapping from a set in a domain to a set in a codomain. A function can map a set to itself. For example, f(x) = x², also notated f: x ↦ x², is the mapping of all real numbers to all real numbers, or f: R → R.

The range is the subset of the codomain which the function maps to.

Functions don’t necessarily map to every value in the codomain. Where they do, the range equals the codomain.


There are two kinds of functions. Functions that map to R are known as scalar-valued or real-valued. Functions that map to Rⁿ, where n > 1, are known as vector-valued.
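As a quick illustration in Python (the names f and g are just placeholders of my own, not from any particular course material):

```python
# A scalar-valued (real-valued) function: maps R -> R
def f(x: float) -> float:
    return x ** 2

# A vector-valued function: maps R -> R^2
def g(t: float) -> list:
    return [t, t ** 2]

print(f(3.0))  # 9.0
print(g(3.0))  # [3.0, 9.0]
```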

Ref: Web: Mathworld Wolfram - Eric W. Weisstein.

1.2 Algorithms - Grouped by Learning Style

  • Supervised learning - the algorithm is given pre-labeled training examples to learn from.
  • Unsupervised learning - the algorithm is given unlabeled examples.
  • Semi-supervised learning - the algorithm uses a mix of labeled & unlabeled data.
  • Active learning - similar to semi-supervised learning, but the algorithm can “ask” for extra labeled data based on what it needs to improve on.
  • Reinforcement learning - actions are taken and rewarded or penalized; the goal is to maximize lifetime/long-term reward (or minimize long-term penalty).

Ref: Book: Neural Computing: Theory and Practice (1989) - Philip D. Wasserman.

Note: Following course guidelines, we’ll discuss the two most common methods: supervised and unsupervised.

1.3 Supervised v. Unsupervised

In a supervised learning scenario, we start with observations that include known values for the variable we want to predict. We call these labels.

Because we are starting with data that includes the label we are trying to predict, we can train the model using only some of the data and hold the rest back for evaluating our model’s performance.

We’ll then use an algorithm to train a model that fits the features to the known label.

As we started with a known label value, we can validate the model by comparing the value predicted by the function to the actual label value that we knew. Then, when we’re happy that the model works, we can use it with new observations for which the label is unknown, and generate new predicted values.

Typical notation:

  • m = number of training examples
  • x’s = input variables or features
  • y’s = output variables or the “target” variable
  • (x⁽ⁱ⁾, y⁽ⁱ⁾) = the i-th training example
  • h = hypothesis i.e. the function that the algorithm learns, taking x’s as input and outputting y’s

In an unsupervised learning scenario, we don’t have any known label values in our training data set.

We’ll train the model by finding similarities between observations. Once we have trained this model, new observations are added to a cluster of observations with similar characteristics. (Cluster = group.)
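One common way of finding such clusters is k-means. Here is a minimal 1-D sketch in Python; the algorithm choice, initialization, and data are my own illustration, not something prescribed by the course:

```python
def kmeans_1d(values, k=2, iterations=10):
    """Minimal 1-D k-means sketch (illustrative only, not production code)."""
    ordered = sorted(values)
    # Naive initialization: spread the initial centers across the sorted data
    step = (len(ordered) - 1) // (k - 1) if k > 1 else 0
    centers = [float(ordered[i * step]) for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest center
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], k=2)
print(centers)  # [2.0, 11.0]
```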

2. Techniques

2.1 Regression

When we need to predict continuous valued output (i.e. a numeric value), we use a supervised learning technique called regression.

Let’s take one male. We want to model the calories burned while exercising.

First we get some preliminary data (age: 34, gender: 1, weight: 60, height: 165), then put him on a fitness monitor and capture additional information. We then model the calories burned using features from his exercise, like his heart rate: 134, temperature: 37, and duration: 25.

In this case we know all the features and have a known label value of 231 calories. So we need our algorithm to learn a function that operates on all the male’s exercise features to give us a net result of 231.

  • f([34, 1, 60, 165, 134, 37, 25]) = 231

A sample of one person isn’t likely to give a function that generalizes well, so we gather the same data from a large number of participants and then train the model using the bigger data set.

  • f([X1, X2, X3, X4, X5, X6, X7]) = Y

Now that we have a function that can be used to calculate the label (Y), we can plot the values of (Y) calculated for specific values of the features (X) on a chart.

And we can interpolate any new values of (X) to predict an unknown (Y).
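As a sketch of this fitting step, here is a one-feature least-squares fit in plain Python. The duration/calorie numbers are made up for illustration, and a real model would use all the features (and a proper library):

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations: exercise duration (minutes) vs. calories burned
durations = [10, 20, 25, 40]
calories = [90, 180, 231, 360]
slope, intercept = fit_line(durations, calories)
```

Predicting (Y) for a new duration x is then just `slope * x + intercept`, which is exactly the interpolation described above.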

As we started with data that includes the label we are trying to predict, we can train the model using some of the data and keep the rest for evaluating the model’s performance. Then we can use the model to predict f(X) for the evaluation data, and compare the predictions, or scored labels, to the actual labels that we know to be true.

The differences between the predicted and actual labels are called the residuals, and they can tell us something about the level of error in the model.

We can measure the error in the model using root-mean-square error (RMSE) and mean absolute error (MAE).

Both are absolute measures of error in the model. For example, an RMSE value of 5 would mean that the standard deviation of the errors on our test data is 5 calories. An error of 5 calories seems to indicate a reasonably good model, but suppose we were predicting how long an exercise session takes: an error of 5 hours would make for a very bad model.

You might want to evaluate the model using relative metrics, to indicate a more general level of error as a relative value between 0 and 1. Relative absolute error (RAE) and relative squared error (RSE) produce metrics where the closer to 0 the error, the better the model.

The coefficient of determination, or R², is another relative metric, but this time a value closer to 1 indicates a good fit for the model.
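The metrics above are straightforward to write out. A plain-Python sketch (the function names are my own):

```python
import math

def rmse(actual, predicted):
    # Root-mean-square error: square root of the mean squared residual
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

def mae(actual, predicted):
    # Mean absolute error: the average absolute residual
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    # Coefficient of determination: 1 minus (residual sum of squares
    # over total sum of squares); closer to 1 means a better fit
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```

A perfect model has RMSE and MAE of 0 and an R² of exactly 1.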

2.2 Classification

Another kind of supervised learning is called classification.

Classification is the technique that we can use to predict which class or category something belongs to. A simple variant is binary classification, where we predict whether entities belong to one of two classes (true or false).

For example, we’ll take a number of patients in a health clinic, gather some personal details, e.g. age: 23, pregnancy: 1, glucose: 171, BMI: 43.5, run tests, and identify which patients are diabetic and which are not.

We could learn a function that can be applied to the patient features and give us the result 1 for patients that are diabetic:

  • f([23, 1, 171, 43.5]) = 1

and 0 for patients that aren’t.

Generally, a binary classifier is a function that can be applied to the features (X) to produce a (Y) value of 1 or 0. The function won’t actually calculate an absolute value of 1 or 0; instead, it will calculate a value between 0 and 1, and we’ll use a threshold value to decide whether the result should be counted as a 1 or a 0.

When using this model to predict values, the resulting value is classed as 1 or 0, depending on which side of the threshold line it falls.
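This thresholding step can be sketched in Python. Here I use the logistic (sigmoid) function to squash a raw score into the 0-to-1 range, which is one common choice rather than the only one:

```python
import math

def sigmoid(z):
    # Squash a raw score into the (0, 1) range
    return 1 / (1 + math.exp(-z))

def classify(score, threshold=0.5):
    # Count the squashed score as class 1 when it reaches the threshold
    return 1 if sigmoid(score) >= threshold else 0

print(classify(2.0))   # 1
print(classify(-2.0))  # 0
```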

Because classification is a supervised learning technique, we withhold some of the test data to validate the model using known labels.

Cases where the model predicts a 1 for a test observation whose actual label is also 1 are considered true positives.

Cases where the model predicts 0, and the actual label is 0, are true negatives.

If the model predicts 1, but the actual label is a 0, that’s a false positive.

If the model predicts 0, but the value is 1, we have a false negative.

The threshold determines how predicted values are classified. In the case of our diabetes model, allowing more false positives, and thus reducing the number of false negatives, may be better, as more people at risk of diabetes get identified.

The actual numbers of positives and negatives generated by your model are crucial in evaluating its effectiveness. For that purpose we use a confusion matrix, i.e. the basis for calculating performance metrics for the classifier.
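Counting the four cases described above is simple. A minimal sketch (the function name is my own):

```python
def confusion_matrix(actual, predicted):
    """Count true/false positives/negatives for binary labels (0 or 1)."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == 1:
            counts["tp" if a == 1 else "fp"] += 1
        else:
            counts["tn" if a == 0 else "fn"] += 1
    return counts

print(confusion_matrix([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
# {'tp': 2, 'tn': 1, 'fp': 1, 'fn': 1}
```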

Ref: Web: MSXDAT262017 - edX.


3. In Practice

Before you start building your machine learning system, you should:

  • Be explicit about the problem.
  • Start with a specific question. What do you want to predict, and what tools do you have to predict it with?
  • Brainstorm possible strategies: what features might be useful, and do you need to collect more data?
  • Try to find good input data.
  • Randomly split the data into training, testing, and validation samples.
  • Use features of the data, or features built from it, that may help with making predictions.
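The random split mentioned above can be sketched as follows; the function name and the 60/20/20 ratios are my own illustrative choices:

```python
import random

def split_data(records, train_frac=0.6, validation_frac=0.2, seed=42):
    """Randomly split records into training, validation, and testing samples."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_validation = int(len(shuffled) * validation_frac)
    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    testing = shuffled[n_train + n_validation:]
    return training, validation, testing
```

Every record lands in exactly one of the three partitions, which is what keeps the evaluation honest.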

To start:

  • Start with a simple algorithm which can be implemented quickly.
  • Test the simple algorithm on your validation data, evaluate the results.
  • Plot learning curves to decide where things need work. For example, do you need more data or more features?
  • Error analysis: manually examine the examples in the validation set that your algorithm made errors on.

To generate a learning curve, you deliberately shrink the size of the training set and observe how the training and validation errors change as you increase the size.

With smaller training sets, we expect the training error to be low because it is easier to fit less data. As the training set size grows, the average training error is expected to grow. Conversely, we expect the average validation error to decrease as the training set size increases.

If your training and validation error curves flatten out at a high error as set sizes increase, then you have a high bias problem. Adding more training data will not (by itself) help much.

On the other hand, high variance problems are indicated by a large gap between the training and validation error curves as the training set size increases, with a low training error. In this case the curves are still converging, and adding more training data would help.

Ref: Web: Intro to Artificial Intelligence - Udacity.
