Bash on Windows (!) with the Linux Subsystem

Posted by Sonya Sawtelle on Mon 29 August 2016

In which I learn what people are doing with those black terminals, and brainstorm how I too can look like a cool hacker.

So, I have a Windows machine that I'm rather fond of. Maybe you do too. The unfortunate thing about this situation is that in the quantitative computing universe the Unix shell (most commonly bash) is far more common than the Windows system shell. In the World of Windows we don't really have much of a culture around the command line, and so I'm basically a giant command line noob. Once I began poking around in the data science ecosystem it became clear that this was a pretty large gap in my skill set: not just the lack of bash language skills but also the lack of the command line in my workflow, and I was curious what I was missing. Turns out my timing on this issue is great, because the most recent update to Windows 10 rolled out a feature that is going to let me play around with native bash pretty seamlessly - it's called the Windows Subsystem for Linux (WSL).

Before diving in, I'd like to point out that I really like the idea of my work flow being operating system agnostic. And I really really like the idea of minimizing the number of tools and languages I depend on. But at the same time I think I should have some degree of familiarity with bash because it's everywhere, and because I'm sure some things are just easier with the right bash one-liner or the right command line tool. So I'm trying to build a command line work flow that is mostly operating system agnostic (read: Python) while still learning some useful bash with this WSL.

So this post is for Windows users doing scientific computing who are noobs at the Unix command line, who want to fix that by playing with the new WSL, but who still want to do most stuff in Python, not bash. Not counting myself that might be the empty set, I dunno. Anyway, here I'll describe what the hell bash even is and how IPython is actually a viable alternative console. Then I'll outline how to get your WSL up and running, and then how to get your Anaconda toolset up and running on top of it.

I'll gradually build out the specific functionality in Python that can replace a lot of what bash is commonly used for, and I may put that into a later post.

What is the Shell Even For?

To kick off with Wikipedia:

Bash is a command processor that typically runs in a text window, where the user types commands that cause actions. Bash can also read commands from a file, called a script. Like all Unix shells, it supports filename globbing (wildcard matching), piping, ... variables and control structures....

Bash and shells in general are heavily used for stuff in the realm of sys admin, like operating on files such as configs or logs (see this on Quora). But bash is also very useful when most of what you're doing is communicating and piping between various specialized programs or tools (see this on SO), where it acts like a kind of glue language:

Bash epitomises the spirit of UNIX -- combining multiple simple tools to solve a more complex problem, usually using pipes to direct the output of one command to the input of the next.

There is a large set of these specialized utilities that bash integrates - grep, awk, sed, scp... I've heard of most of these super adorable names, though I don't know what they do. (This article summarizes some of the more useful commands and command line tools for doing data science.)

More to the point for my purposes, in contemplating using Python in place of bash, this Stack Overflow response captures the fundamental difference:

...python is a general purpose scripting language, while bash is simply a way to run a myriad of small (and often very fast) programs in a series. Python can do this, but it is not optimized for it. The programs (sort, find, uniq, scp) might do very complex tasks very simply, and bash allows these tasks to interoperate very simply with piping, flushing output in and out from files or devices etc.
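To get a feel for that trade-off, here is a minimal sketch of doing the glue work from Python with the standard-library subprocess module (Python 3.7+ for capture_output and text). The run_pipeline helper is hypothetical, my own name rather than a library function, and it assumes the Linux sort and uniq tools are installed. It also cheats a little: each stage finishes before the next one starts, whereas a real shell pipe streams data between processes running at the same time.

```python
import subprocess


def run_pipeline(commands, input_text=""):
    """Hypothetical helper emulating `cmd1 | cmd2 | ...`.

    Simplification: each stage runs to completion and its whole output is
    handed to the next stage, instead of streaming like a real shell pipe.
    """
    data = input_text
    for cmd in commands:
        result = subprocess.run(
            cmd,                  # a list of arguments, so no shell is involved
            input=data,           # feed the previous stage's output on stdin
            capture_output=True,  # collect stdout/stderr instead of printing
            text=True,            # work with str rather than bytes
            check=True,           # raise CalledProcessError on a non-zero exit
        )
        data = result.stdout
    return data


# Roughly `sort | uniq -c` on a few tool names.
words = "grep\nsed\nawk\nsed\ngrep\nsed\n"
print(run_pipeline([["sort"], ["uniq", "-c"]], words))
```

That's a lot of ceremony for what bash spells `sort | uniq -c`, which is exactly the point of the quote above.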

For data science work rather than hardcore sys admin, I'm optimistic that I can use Python with a few modules to replace a large chunk of the work done by these microtools. I think for my purposes it's mostly going to be working with files and directories, at least to start. So onward to choosing a terminal!
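As a first taste of what that replacement might look like, here is a rough sketch of find and grep using pathlib and re from the standard library. The function names and the mydir folder are just illustrations, and the real grep has far more options (context lines, ignore-case, inverted matches) than this.

```python
import re
from pathlib import Path


def find_files(root, pattern="*.txt"):
    """Roughly `find root -type f -name 'pattern'`, returned in sorted order."""
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())


def grep(regex, paths):
    """Roughly `grep -n regex FILE...`: yield (path, line_number, line) matches."""
    compiled = re.compile(regex)
    for path in paths:
        # errors="replace" keeps odd bytes from crashing the search
        with open(str(path), encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if compiled.search(line):
                    yield path, line_number, line.rstrip("\n")


root = Path("mydir")  # example folder from this post; point it anywhere
if root.is_dir():
    for path, line_number, line in grep(r"TODO", find_files(root)):
        print(path, line_number, line, sep=":")
```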

IPython as System Shell

One of the most popular Python shells is IPython, and it was actually built with usage as a system shell in mind! This guy was the obvious candidate for what I wanted to do, and it conveniently ships standard with the Anaconda distribution of Python. IPython is also the Python engine behind the awesome browser-based Jupyter notebooks that I am literally writing this in right now.

This SE answer outlines the desired features of any system shell:

  • Interactive usability: common commands should be short, text completion
  • Programming: data structures; concurrency (jobs, pipe, ...)
  • System access: working with files, processes, windows, databases, system configuration

IPython was cunningly designed with interactivity features (tab completion, command history), and system access can be provided by a handful of Python modules. As for programming functionality, it's blowing bash out of the water because, umm, you're using Python. And since it was designed with system shell purposes in mind, IPython actually provides a handful of the common bash commands, such as ls (as an alias) and cd (as a built-in "magic" command), meaning you can just type them like you normally would in a shell.

Since I'm writing this in a Jupyter notebook I can show you what I'm on about:

In [6]:
print("This is python...\nWhile this is a useful Bash alias:")
This is python...
While this is a useful Bash alias:
In [7]:
pwd
Out[7]:
'/mnt/c/Users/Sonya/Box Sync/Projects/web-data-science-blog/post-bash/mydir'

And here are all those useful IPython aliased commands I mentioned (plus you can add your own custom ones):

In [8]:
%alias
Total number of aliases: 12
Out[8]:
[('cat', 'cat'),
 ('cp', 'cp'),
 ('ldir', 'ls -F -o --color %l | grep /$'),
 ('lf', 'ls -F -o --color %l | grep ^-'),
 ('lk', 'ls -F -o --color %l | grep ^l'),
 ('ll', 'ls -F -o --color'),
 ('ls', 'ls -F --color'),
 ('lx', 'ls -F -o --color %l | grep ^-..x'),
 ('mkdir', 'mkdir'),
 ('mv', 'mv'),
 ('rm', 'rm'),
 ('rmdir', 'rmdir')]
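For comparison, the file-handling commands in that list all have standard-library counterparts in Python's pathlib and shutil modules. This is a sketch that works inside a throwaway temporary folder, so the file names are made up; the comments show the shell command each line roughly corresponds to.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:           # throwaway scratch folder
    base = Path(tmp)
    (base / "subdir").mkdir()                        # mkdir subdir
    notes = base / "notes.txt"
    notes.write_text("hello\n")                      # echo hello > notes.txt
    print(notes.read_text(), end="")                 # cat notes.txt
    shutil.copy(str(notes), str(base / "copy.txt"))  # cp notes.txt copy.txt
    shutil.move(str(base / "copy.txt"), str(base / "subdir"))  # mv copy.txt subdir/
    # A recursive listing of what's there now, paths relative to base.
    after_move = sorted(p.relative_to(base).as_posix() for p in base.rglob("*"))
    notes.unlink()                                   # rm notes.txt
    (base / "subdir" / "copy.txt").unlink()          # rm subdir/copy.txt
    (base / "subdir").rmdir()                        # rmdir subdir (must be empty)
    remaining = sorted(p.name for p in base.iterdir())  # ls

print(after_move)
print(remaining)
```

For a non-empty folder, shutil.rmtree plays the role of rm -r.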

Another nifty feature is that any line you prefix with ! is automatically shunted to your actual system shell for execution, meaning if you're running IPython on Unix, you can seamlessly switch between executing bash and Python commands without leaving the IPython terminal! That's if you are running on Unix... which on Windows is exactly what you are NOT doing (there, ! hands the line to cmd instead). Which brings me to the next point.
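Outside of IPython you can get much the same effect with the subprocess module (Python 3.7+ for capture_output and text). The sh helper below is a hypothetical convenience wrapper, not part of IPython or the standard library: it runs a command line through the system shell and hands back the output as a list of lines, a bit like IPython's own `files = !ls` capture syntax.

```python
import subprocess


def sh(command):
    """Hypothetical helper: run a command line through /bin/sh, return stdout lines."""
    result = subprocess.run(
        command,
        shell=True,           # let the shell handle pipes, globs and $VARIABLES
        capture_output=True,  # collect output instead of printing it
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError if the command fails
    )
    return result.stdout.splitlines()


print(sh('echo "Is there anybody in here?"'))
print(sh("ls | head -n 3"))
```

Because shell=True hands the whole string to the shell, pipes and wildcards work, but so would anything nasty hidden in the string, so only use it with commands you typed yourself.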

Get Ubuntu on Windows!

The Windows Subsystem for Linux is a new (as in, still in beta) feature of Windows 10 that lets you run Linux tools on a Windows machine. It translates Linux system calls into Windows ones, so unmodified Ubuntu binaries run directly, which means you can work with native Linux command line tools - and that includes native bash! Check out the official WSL FAQ for more information. Following this quick guide, here is the gist of what you need to do to get it up and running:

  • Do you have Windows 10? If not, get it. Else, proceed!
  • From cmd run winver to see if you have at least Version 1607 of Windows 10, and if not do:
    • Check for updates in Settings > Update & Security
    • Install the one called "Feature update to Windows 10, version 1607".
    • Snack mindlessly while updates happen forever. It feels weird when you can't use your computer.
  • Go to Settings > Update & Security > For developers and activate Developer mode (this may require a restart).
  • Go to Control Panel > Programs > Turn Windows features on or off and turn on "Windows Subsystem for Linux (Beta)"
  • Run cmd and execute lxrun /install /y which installs Ubuntu on your subsystem
  • Run the new app called Bash on Ubuntu on Windows. You are now using Bash! On Windows! Is your mind blown?

In bash, run cd /mnt/c/ to get to the Windows filesystem you know and love. As a side note, if you want to use Windows to access the files inside the WSL, open cmd and run cd %LocalAppData%\lxss, which will take you to the WSL files. Once you know where this is you can go there in the File Explorer GUI, but keep in mind the folder is hidden, so you will need to show hidden files.
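Translating between the two views of the filesystem is mechanical enough to script. Here is a hypothetical helper that maps a Windows drive-letter path onto the /mnt/<drive> layout described above; it assumes the default mount points, and it deliberately rejects network (UNC) paths and relative ones rather than guessing.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_wsl(win_path):
    """Hypothetical helper: map e.g. C:\\Users\\Sonya to /mnt/c/Users/Sonya."""
    p = PureWindowsPath(win_path)
    # Only absolute drive-letter paths like "C:\..." have an obvious /mnt twin.
    if len(p.drive) != 2 or p.drive[1] != ":" or not p.root:
        raise ValueError("not an absolute drive-letter path: %r" % win_path)
    drive_letter = p.drive[0].lower()
    # parts[0] is the "C:\" anchor; everything after it is kept as-is.
    return str(PurePosixPath("/mnt", drive_letter, *p.parts[1:]))


print(windows_to_wsl(r"C:\Users\Sonya\Box Sync\Projects"))
```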

Grab the Anaconda Python Distro for Linux

Ubuntu comes with Python, so you could start using it right off the bat in your new bash terminal. But being aspiring data scientists we want all sorts of scientific computing goodies like Jupyter notebooks. An easy way to get said goodies is to install the (most recent) Anaconda distro of Python for Linux following this blog post. In your shiny new bash shell:

  • wget http://repo.continuum.io/archive/Anaconda3-4.0.0-Linux-x86_64.sh (downloads a shell script for installation)
    • This initially gave an error that I didn't have permission to write. You apparently can't just sudo in WSL, so close and then restart bash by right-clicking on the app and choosing to "Run as Administrator".
  • bash Anaconda3-4.0.0-Linux-x86_64.sh (runs the above shell script)
  • Follow the prompts and definitely say yes to prepend Anaconda root to your PATH.
  • Now restart your Bash terminal

You just used your first native Linux utility by the way - wget is a native Linux command line utility for pulling files down from the web!
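For comparison, a bare-bones wget can be written with Python's urllib.request. This is a sketch, not a replacement: the download helper is a made-up name, and real wget adds retries, resuming, progress bars and much more. The installer URL from above is left commented out, since running it needs network access.

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path


def download(url, dest_dir="."):
    """Hypothetical bare-bones `wget URL`: save the resource under its own name."""
    name = urllib.parse.unquote(Path(urllib.parse.urlparse(url).path).name)
    dest = Path(dest_dir) / (name or "index.html")  # wget's default name for "/"
    with urllib.request.urlopen(url) as response, dest.open("wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk in chunks
    return dest


# download("http://repo.continuum.io/archive/Anaconda3-4.0.0-Linux-x86_64.sh")
```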

Get Jupyter Working on Linux on Windows

If you are full of naive optimism then you can now try to get a notebook running on your bash terminal by:

  • jupyter notebook --no-browser (WSL doesn't support Linux graphical applications, just the command line for now)
  • Open your browser on Windows and point it to http://localhost:8888
  • Open or create a new notebook in the browser
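If the page won't load, a quick way to tell whether anything is actually listening on that port is a tiny socket check from Python. This sketch assumes the default port 8888 from the steps above; if Jupyter ends up on a different port, the address it's using is printed in the terminal when it starts.

```python
import socket


def port_is_open(host="localhost", port=8888, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable...
        return False


print(port_is_open())  # True only while a server is listening on 8888
```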

You probably see a dead kernel in your notebook and a bunch of failed connection errors in the terminal? Perfect. But, actually no, that's not how it's supposed to work. This bug is documented here and here, with two different workarounds depending on whether your Jupyter is coming via Anaconda or not. For more clarification from kickass user @aseering:

This bug is not a bug in Python. IPython (and Jupyter) Notebook are not pure Python; they contain some native code as well, and that native code links against some third-party native systemwide C libraries. The problem is a bug in one of those libraries, "libzmq". All versions of Notebook link libzmq. Because it's an independent system library, it doesn't matter what version of iPython/Jupyter you're using, nor what version of Python. But which libzmq do they link? The iPython that comes with Ubuntu uses the libzmq that comes with Ubuntu. I believe pip will try to use the system libzmq as well. My updated patch fixes Ubuntu's version of the library. Anaconda bundles their own copy of libzmq, so you have to use @jzuhone 's solution to get their fix.

The solution by @jzuhone that works for those of us who pulled down Jupyter via Anaconda is to grab his fix by typing conda install -c jzuhone zeromq=4.1.dev0 in our bash terminal. (Conda is the package manager that comes in the Anaconda bundle.)

These people are seriously heroes.

Running IPython on Linux on Windows

Jupyter notebooks are a great interface for certain tasks, but for daily command-line work I want to just be using the trusty old console. To enter the IPython console from within the bash shell, just type ipython3 (lowercase: Linux commands are case-sensitive). Since we're actually running on Linux, the ! prefix now runs real lines of bash rather than Windows commands. Let us experiment within this notebook front-end:

In [9]:
! echo $PATH
/root/anaconda3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

In [10]:
! echo "Is there anybody in here?"
Is there anybody in here?
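The same PATH information is also available from plain Python, no shell required. This sketch prints each PATH entry on its own line and uses shutil.which to see where (or whether) a few commands are found; the three command names are just examples, and path_entries is a made-up helper name.

```python
import os
import shutil


def path_entries():
    """Python counterpart of `echo $PATH`, split into one directory per entry."""
    return os.environ.get("PATH", "").split(os.pathsep)


for directory in path_entries():
    print(directory)

# shutil.which searches PATH like the `which` command does,
# returning None when a command isn't found.
for command in ("python", "jupyter", "tree"):
    print(command, "->", shutil.which(command))
```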

Super nifty! As a final exercise we'll install a command line microtool called tree that prints out a nice hierarchical picture of directory structures - just type ! apt-get install tree and it should install itself. (apt-get is another native Linux utility; it's the main command-line tool for package management on Ubuntu.)

Usage of tree is pretty self-explanatory...

In [20]:
cd mydir
/mnt/c/Users/Sonya/Box Sync/Projects/web-data-science-blog/post-bash/mydir
In [7]:
! tree
.
├── mydir
│   ├── subdir1
│   │   └── txt3.txt
│   ├── subdir2
│   │   ├── subsubdir
│   │   │   └── txt1.txt
│   │   └── txt2.txt
│   └── txt4.txt
├── mydir_clean_copy
│   ├── subdir1
│   │   └── txt3.txt
│   ├── subdir2
│   │   ├── subsubdir
│   │   │   └── txt1.txt
│   │   └── txt2.txt
│   └── txt4.txt
├── post-bash.ipynb
└── sys_shell.txt

8 directories, 10 files
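And if you'd rather stay in Python, the heart of tree is only a few lines with pathlib. This is a rough sketch rather than a clone: it sorts entries by plain name, shows hidden files, doesn't print the summary counts, and doesn't descend into symlinked folders (to avoid loops). mydir is the example folder from this post.

```python
from pathlib import Path


def tree(root, prefix=""):
    """Yield the lines of a `tree`-style drawing of everything under root."""
    entries = sorted(Path(root).iterdir(), key=lambda p: p.name)
    for index, entry in enumerate(entries):
        is_last = index == len(entries) - 1
        # The last entry at each level gets a corner, the rest get a tee.
        yield prefix + ("└── " if is_last else "├── ") + entry.name
        # Recurse into real folders only; following symlinked ones could loop.
        if entry.is_dir() and not entry.is_symlink():
            # Keep drawing the vertical rule only while siblings remain below.
            yield from tree(entry, prefix + ("    " if is_last else "│   "))


root = Path("mydir")  # example folder from this post; point it anywhere
if root.is_dir():
    print(root)
    for line in tree(root):
        print(line)
```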

You have now unlocked the full power of Bash... in a Python Terminal... on Linux... on Windows. Rejoice!

Sadly, people, that was definitely the easy part of what I'm trying to do. The hard part is learning the Python modules that can replicate some common bash functionality, and as I mentioned that might be part of a later post. I also want to customize the IPython configuration so that when it launches it is all set up for this kind of work.