In which I learn what people are doing with those black terminals, and brainstorm how I too can look like a cool hacker.¶
So, I have a Windows machine that I'm rather fond of. Maybe you do too. The unfortunate thing about this situation is that the Unix shell (commonly bash) is much more common than the Windows system shell in the quantitative computing universe. In the World of Windows we don't really have much of a culture around the command line, and so I'm basically a giant command line noob. Once I began poking around in the data science ecosystem it became clear that this was a pretty large gap in my skill set. Not just the lack of bash language skills but also the lack of command line in my workflow, and I was curious what I was missing. Turns out my timing on this issue is great, because the most recent update to Windows 10 rolled out a feature that is going to let me play around with native bash pretty seamlessly - it's called the Windows Subsystem for Linux (WSL).
Before diving in, I'd like to point out that I really like the idea of my workflow being operating system agnostic. And I really really like the idea of minimizing the number of tools and languages I depend on. But at the same time I think I should have some degree of familiarity with bash because it's everywhere, and because I'm sure some things are just easier with the right bash one-liner or the right command line tool. So I'm trying to build a command line workflow that is mostly operating system agnostic (read: Python) while still learning some useful bash with this WSL.
So this post is for Windows users doing scientific computing who are noobs at the Unix command line and want to fix that by playing with the new WSL, but who still want to do most stuff in Python, not bash. Not counting myself that might be the empty set, I dunno. Anyway, here I'll describe what the hell bash even is and how IPython is actually a viable alternative console. Then I'll outline how to get your WSL up and running, and then how to get your Anaconda toolset up and running on top of it.
I'll gradually build out the specific functionality in Python that can replace a lot of what bash is commonly used for, and I may put that into a later post.
What is the Shell Even For?¶
To kick off with Wikipedia:
Bash is a command processor that typically runs in a text window, where the user types commands that cause actions. Bash can also read commands from a file, called a script. Like all Unix shells, it supports filename globbing (wildcard matching), piping, ... variables and control structures....
Bash and shells in general are heavily used for stuff in the realm of sys admin, like operating on files and commands such as configs or logs (see this on quora). But bash is also very useful when most of what you're doing is communicating and piping between various specialized programs or tools (see this on SO) where it acts like a kind of glue language:
Bash epitomises the spirit of UNIX -- combining multiple simple tools to solve a more complex problem, usually using pipes to direct the output of one command to the input of the next.
There is a large set of these specialized utilities that bash integrates - grep, awk, sed, scp... I've heard of most of these super adorable names, though I don't know what they do. (This article summarizes some of the more useful commands and command line tools for doing data science.)
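To make the "glue" idea a bit more concrete, here is a minimal sketch of a pipeline in the spirit of grep ERROR | sort | uniq -c, written with plain Python functions instead of the real tools. The log lines and the helper names (grep, count_unique) are made up for illustration and are not part of any library.

```python
from collections import Counter

def grep(pattern, lines):
    """Yield only the lines containing the substring `pattern` (a crude grep)."""
    for line in lines:
        if pattern in line:
            yield line

def count_unique(lines):
    """Count identical lines and return (line, count) pairs sorted by line,
    roughly what `sort | uniq -c` produces."""
    return sorted(Counter(lines).items())

# Hypothetical log lines, invented for this example
log = [
    "ERROR disk full",
    "INFO started",
    "ERROR disk full",
    "ERROR timeout",
]

# Each stage consumes the output of the previous one, like a shell pipe
result = count_unique(grep("ERROR", log))
print(result)
```

The real utilities are separate programs connected by the shell, so this only mimics the shape of a pipeline, not how bash runs one.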
More to the point for my purposes, in contemplating using Python in place of bash, this stackoverflow response captures the fundamental difference:
...python is a general purpose scripting language, while bash is simply a way to run a myriad of small (and often very fast) programs in a series. Python can do this, but it is not optimized for it. The programs (sort, find, uniq, scp) might do very complex tasks very simply, and bash allows these tasks to interoperate very simply with piping, flushing output in and out from files or devices etc.
For data science work rather than hardcore sys admin, I'm optimistic that I can use Python with a few modules to replace a large chunk of the work done by these microtools. I think for my purposes it's mostly going to be working with files and directories, at least to start. So onward to choosing a terminal!
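As a rough idea of what "working with files and directories" might look like, here is a small sketch using only the standard library (pathlib, shutil, tempfile). The shell command each step loosely corresponds to is in the comment above it. The directory layout and file names are invented for the example, and everything happens in a throwaway temp directory.

```python
import shutil
import tempfile
from pathlib import Path

# Work in a throwaway directory so the example cannot touch real files
workdir = Path(tempfile.mkdtemp())

# mkdir -p data/raw
raw = workdir / "data" / "raw"
raw.mkdir(parents=True, exist_ok=True)

# touch data/raw/a.csv data/raw/b.csv data/raw/notes.txt
for name in ["a.csv", "b.csv", "notes.txt"]:
    (raw / name).touch()

# ls data/raw/*.csv
csv_files = sorted(p.name for p in raw.glob("*.csv"))
print(csv_files)

# cp data/raw/a.csv data/a_copy.csv
shutil.copy(str(raw / "a.csv"), str(workdir / "data" / "a_copy.csv"))

# mv data/raw/notes.txt data/notes.txt
shutil.move(str(raw / "notes.txt"), str(workdir / "data" / "notes.txt"))

# rm data/raw/b.csv
(raw / "b.csv").unlink()

# ls data/raw
remaining = sorted(p.name for p in raw.iterdir())
print(remaining)
```

This is wordier than the bash equivalents, but the same lines run unchanged on Windows and Linux, which is the operating-system-agnostic part I'm after.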
IPython as System Shell¶
One of the most dominant Python terminals is IPython and it was actually built with usage as system shell in mind! This guy was the obvious candidate for what I wanted to do, and it conveniently ships standard with the Anaconda distribution of Python. IPython is also the backend for the awesome browser-based jupyter notebooks that I am literally writing this in right now.
This SE answer outlines the desired features of any system shell:
- Interactive usability: common commands should be short, text completion
- Programming: data structures; concurrency (jobs, pipe, ...)
- System access: working with files, processes, windows, databases, system configuration
IPython was cunningly designed with interactivity features (tab completion, command history), and system access can be provided by a handful of Python modules. As for programming functionality, it's blowing bash out of the water because, umm, you're using Python. And since it was designed with system shell purposes in mind, IPython actually aliases a handful of the common bash commands (meaning you can just type them like you normally would in shell) such as cd and ls.
Since I'm writing this in jupyter notebook I can show you what I'm on about:
print("This is python...\nWhile this is a useful Bash alias:")
pwd
And here are all those useful IPython aliased commands I mentioned (plus you can add your own custom ones):
%alias
Another nifty feature is that any line you preface with a ! is automatically shunted to your actual system shell for execution, meaning if you're running IPython on Unix, you can seamlessly switch between executing bash and Python commands without leaving the IPython terminal! That's if you are running on Unix... which on Windows is exactly what you are NOT doing. Which brings me to the next point.
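Outside of IPython, a rough stand-in for the ! prefix is the standard subprocess module. The sketch below only illustrates the idea of handing a line to the system shell and getting its output back as lines; it is not how IPython implements it, and the helper name sh is hypothetical.

```python
import subprocess

def sh(command):
    """Run `command` in the system shell and return its output as a list of lines."""
    completed = subprocess.run(
        command,
        shell=True,                # hand the whole string to the system shell
        stdout=subprocess.PIPE,    # capture what the command prints
        stderr=subprocess.STDOUT,  # fold error output into the same stream
        universal_newlines=True,   # decode bytes to str
    )
    return completed.stdout.splitlines()

lines = sh("echo hello")
print(lines)
```

Which shell actually runs the command depends on the operating system, which is exactly the Windows-versus-Unix catch described above.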
Get Ubuntu on Windows!¶
The Windows Subsystem for Linux (WSL) is a new (as in, still in Beta) feature for Windows 10 that provides a way to run Linux tools on a Windows machine. This feature defines a mapping between the Ubuntu operating system and Windows, which allows you to work with native Linux command line tools - that includes native bash! Check out the official FAQ for WSL for more information. Following this quick guide, here is the gist of what you need to do to get it up and running:
- Do you have Windows 10? If not, get it. Else, proceed!
- From cmd run winver to see if you have at least Version 1607 of Windows 10, and if not do:
  - Check for updates in Settings > Update & Security
  - Install the one called "Feature update to Windows 10, version 1607".
  - Snack mindlessly while updates happen forever. It feels weird when you can't use your computer.
- Go to Settings > for Developers and activate developer mode (this requires a restart).
- Go to Control Panel > Programs > Turn Windows Features On or Off and turn on "Windows Subsystem for Linux (Beta)"
- Run cmd and execute lxrun /install /y which installs Ubuntu on your subsystem
- Run the new app called Bash on Ubuntu on Windows. You are now using Bash! On Windows! Is your mind blown?
In bash execute cd /mnt/c/ to get to the Windows filesystem you know and love. As a side note, if you want to use your Windows system to access the files on the WSL then you need to open cmd and use cd %LocalAppData%\lxss which will navigate you to the WSL files. Once you know where this is you can go there in the file explorer GUI, but keep in mind you will need to show hidden files.
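Since the C: drive shows up under /mnt/c/, translating a Windows path into its WSL form is mostly string shuffling. Here is a minimal sketch using pathlib, assuming the /mnt/<drive letter>/ convention described above holds for every drive. The function name windows_to_wsl is hypothetical, and network (UNC) paths are not handled.

```python
from pathlib import PurePosixPath, PureWindowsPath

def windows_to_wsl(path):
    """Translate a Windows path with a drive letter into its /mnt/<drive>/... form."""
    win = PureWindowsPath(path)
    drive = win.drive.rstrip(":").lower()  # "C:" -> "c"
    if not drive:
        raise ValueError("expected a path with a drive letter")
    # win.parts[0] is the drive anchor; the remaining parts are folder names
    return str(PurePosixPath("/mnt", drive, *win.parts[1:]))

print(windows_to_wsl(r"C:\Users\me\data"))
```

The Pure* path classes only manipulate strings, so this runs the same on Windows or inside WSL and never touches the filesystem.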
Grab the Anaconda Python Distro for Linux¶
Ubuntu comes with Python, so you could start using it right off the bat in your new bash terminal. But being aspiring data scientists we want all sorts of scientific computing goodies like jupyter notebooks. An easy way to get said goodies is to install the (most recent) Anaconda distro of Python for Linux following this blog post. In your shiny new bash shell:
- wget http://repo.continuum.io/archive/Anaconda3-4.0.0-Linux-x86_64.sh (downloads a shell script for installation)
  - This initially gave an error that I didn't have permission to write. You apparently can't just sudo in WSL, so close and then restart bash by right-clicking on the app and choosing to "Run as Administrator".
- bash Anaconda3-4.0.0-Linux-x86_64.sh (runs the above shell script)
  - Follow the prompts and definitely say yes to prepend Anaconda root to your PATH.
- Now restart your Bash terminal
You just used your first native Linux utility by the way - wget is a native Linux command line utility for pulling files down from the web!
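For comparison, a bare-bones Python take on "pull a file down from a URL" can be sketched with urllib.request from the standard library. The helper name download is hypothetical, and to keep the example runnable without a network it fetches a local file:// URL instead of the Anaconda installer.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path

def download(url, dest):
    """Stream the resource at `url` into the file `dest` and return its size in bytes."""
    dest = Path(dest)
    with urllib.request.urlopen(url) as response, open(str(dest), "wb") as out:
        shutil.copyfileobj(response, out)  # copy in chunks rather than all at once
    return dest.stat().st_size

# Demonstrate with a local file:// URL so no network access is needed
tmp = Path(tempfile.mkdtemp())
source = tmp / "source.txt"
source.write_bytes(b"hello wget\n")

size = download(source.as_uri(), tmp / "copy.txt")
print(size)
```

Passing an http:// URL should work the same way, though wget adds niceties like progress bars and retries that this sketch leaves out.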
Get Jupyter Working on Linux on Windows¶
If you are full of naive optimism then you can now try to get a notebook running on your bash terminal by:
- jupyter notebook --no-browser (WSL doesn't support Linux graphical applications, just command line for now)
- Open your browser on Windows and point it to http://localhost:8888
- Open or create a new notebook in the browser
You probably see a dead kernel in your notebook and a bunch of failed connection errors in terminal? Perfect. But, actually no, that's not how it's supposed to work. This bug is documented here and here, with two different workarounds depending on whether your jupyter is coming via Anaconda or not. For more clarification from kickass user @aseering:
This bug is not a bug in Python. IPython (and Jupyter) Notebook are not pure Python; they contain some native code as well, and that native code links against some third-party native systemwide C libraries. The problem is a bug in one of those libraries, "libzmq". All versions of Notebook link libzmq. Because it's an independent system library, it doesn't matter what version of iPython/Jupyter you're using, nor what version of Python. But which libzmq do they link? The iPython that comes with Ubuntu uses the libzmq that comes with Ubuntu. I believe pip will try to use the system libzmq as well. My updated patch fixes Ubuntu's version of the library. Anaconda bundles their own copy of libzmq, so you have to use @jzuhone 's solution to get their fix.
The solution from @jzuhone that works for those of us who pulled down jupyter via Anaconda is to grab his fix by typing conda install -c jzuhone zeromq=4.1.dev0 in our bash terminal. (Conda is the package manager that comes in the Anaconda bundle.)
These people are seriously heroes.
Running IPython on Linux on Windows¶
Jupyter notebooks are a great interface for certain tasks, but for daily command-line work I want to just be using the trusty old console. To enter the IPython console environment within the bash shell, just type ipython3 (lowercase, since Linux commands are case sensitive). Since we're actually running on Linux we can use the ! prefix with real lines of bash rather than Windows commands. Let us experiment within this notebook front-end:
! echo $PATH
! echo "Is there anybody in here?"
Super nifty! As a final exercise we'll install a command line microtool called tree that prints out a nice hierarchical picture of directory structures - just type ! apt-get install tree and it should install itself. (apt-get is another native Linux utility, it's the main tool for package management on Ubuntu.)
Usage of tree is pretty self-explanatory...
cd mydir
! tree
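In the spirit of replacing microtools with Python, a rough stand-in for tree can be sketched with pathlib. This is a minimal sketch under my own assumptions, not a reimplementation of the real tool: the function name tree, the ASCII connectors, and the sample files are chosen purely for illustration.

```python
import tempfile
from pathlib import Path

def tree(root, prefix=""):
    """Return the lines of an indented listing of everything below `root`."""
    lines = []
    entries = sorted(Path(root).iterdir(), key=lambda p: p.name)
    for i, entry in enumerate(entries):
        last = i == len(entries) - 1
        lines.append(prefix + ("`-- " if last else "|-- ") + entry.name)
        if entry.is_dir():
            # Indent children; keep a vertical bar while siblings remain
            lines.extend(tree(entry, prefix + ("    " if last else "|   ")))
    return lines

# Build a small throwaway directory to show the output shape
root = Path(tempfile.mkdtemp())
(root / "data").mkdir()
(root / "data" / "raw.csv").touch()
(root / "notes.txt").touch()

listing = tree(root)
print("\n".join(listing))
```

Returning a list of lines instead of printing directly means the result can be filtered or searched in Python afterwards, which is the kind of thing that would need another pipe in bash.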
You have now unlocked the full power of Bash... in a Python terminal... on Linux... on Windows. Rejoice!
Sadly, people, that was definitely the easy part of what I'm trying to do. The hard part is learning the Python modules that can replicate some common bash functionality, and as I mentioned that might be part of a later post. I also want to customize the IPython configuration so that when it launches it is all set up for this kind of work.