In which the simplest possible web app workflow proves almost-but-not-quite too challenging.¶
This is part 3 of a series of posts about a recommender system deployed as a web app. In part 1 I scrape and fetch all the data that my app will rely on. In part 2 I build out the actual recommender algorithm from my scraped data.
Boardgamegeek.com (BGG) is an online community for board game enthusiasts that provides social network features as well as a huge database of board games with ratings, comments, statistics, and other features of each game. Previously I wrote some code to collect data on user ratings of games in the BGG database, and then more code to take those ratings and calculate similarities between users in order to recommend new "buddies". The next step was to take all that code and deploy it as a web app!
The problem was that I had literally zero experience with web apps or interactive visualizations, and only noob-level experience with HTML and CSS. But it turns out the web2py
web app framework and the kick ass javascript library D3.js were more than a match for my staggering ignorance. This post summarizes the simple setup and workflow to build this thing and take it live.
Tools Covered:¶
web2py
pure-python framework for developing web applicationspythonanywhere
web hosting service that integrates tightly withweb2py
virtualenv
andconda
environments- using
git
to deploy from your machine to pythonanywhere servers
What is a Web App? (in approximately 30 seconds)¶
The internet is a series of tubes. Wait that's not it. The internet is the huge set of networked computers and a connection on the internet takes place between a "client" computer that wants some file or information (like you, wanting to view a web page), and a "server" computer that has it.
Every computer on the internet is addressed by a unique IP number, and domain names, like "www.wikipedia.com", just map IP addresses to a more human-readable format. When you point your browser to a URL your computer sends out a packet of information containing that URL to your Internet Service Provider (ISP). Your ISP wants to route this packet to the physical computer associated with the domain name in the URL. To do this is has to check with the Domain Name Service (DNS) which maintains a catalogue of which domain names are associated with which physical IP addresses. When your packet arrives safely at the correct server, the server looks at the other parts of the URL (tacked on to the domain name) in order to figure out what it's supposed to send back to you. Anyone can rent available domain names from companies who manage them (domain registrars), and then let the DNS know what physical server (IP addres) should be associated with that domain for routing.
You can think of a web app as a controlling script for a server, which tells it what to do and how to reply, when it receives various URLs. So according to the above, to deploy a web app you need:
- A server that has instructions for sending back various stuff depending on what URL it receives
- A domain name that you legally rented, and which you have told the DNS to associate with the above server
Now for some terminology: a web framework is some software that makes it easier for you to tell a server how it should behave upon receiving different URLs, and to set up all the necessary files for having the server do various things. A web hosting service is a company that owns a bunch of servers and will let you put your instructions on one of them. You can buy your own domain and associate it with this company's server, or often the company will provide a domain if you want. If you are a python user then the simplest solution to deploying a web app is the combo web2py
framework + pythonanywhere
host. Read on...
Get Web2py Running¶
You have the option of either downloading a bundle for your operating system that includes a web2py.exe
executable here, or pulling the source directly from git which will instead include web2py.py
. If you use the executable bundle it will include it's own python interpreter and will not have access to your local installation of python, including installed packages! That means, when using the executable, to import a python module in any of your web app files you will need to first copy that package folder from wherever it lives in your filesystem into /web2py/site-packages. Since this was kind of a deal breaker for me we're going the github route. (For the clone make sure to use the --recursive
switch to also grab all the submodule dependencies.)
git clone https://github.com/web2py/web2py.git --recursive
In the folder you just cloned you should see a web2py.py
python file. web2py
doesn't need to install anything on your computer, instead you just run web2py.py
to start running a server locally on your computer so that you can build and test your app without having to actually deploy it. Each time you run this module you have to give it a password - just use whatever you feel like. From command line do
cd web2py
python web2py.py --ip=127.0.0.1 --port=8000 --password="password"
Now whenever you point your browser to the "domain" http://127.0.0.1:8000/ your browser will be communicating with this local server! Check it out - by default it is serving out the "welcome" example that game with the git repo.
You can make a new app just by making a new folder in /web2py/applications/
. Name your app folder "init" so that web2py
will serve it out by default instead of the "welcome" example. The details of what should go in this folder can be had from the web2py documentation, but a good place to start is just copying the contents of their example app /applications/welcome
. Side note, apparently the name of your web app folder can't have dashes in it? Now try pointing your browser to http://localhost:8000/init/default/index
. You can go find this specifc index.html
page in the "Views" folder of your app: /applications/init/views/default/index.html
- try making some changes to it and refreshing the page!
The code that tells the server what HTML file it should return when a URL is visited is located in the "Controllers" folder of your app at /applications/init/controllers/default.py
. You can open this up in your favorite text editor or python IDE. Exactly how this script defines the mapping between the URL received by the server and the resource it should send back can be had at the web2py
documentation.
You can actually edit all your web app files in web2py's nice browser interface, which also has a quick button for creating a new app. Just point your browser to http://localhost:8000/admin/default/index and log in using the password you chose when you started your local server.
Sidenote that I don't really understand: sometimes after making changes to controller python files you will need to recompile the app for those changes to take effect:
import gluon.compileapp
gluon.compileapp.remove_compiled_application("applications/my_new_app")
gluon.compileapp.compile_application("applications/my_new_app")
Git Tracking Details¶
Remember how your web2py came from a cloned git repo? Probably you want to maintain that link so that you can easily pull new versions when they update web2py. On the other hand, you probably want your app folders that live in /web2py/applications to exist as their own git repos so that you can easily push them to github, then pull them down on the server where they will be hosted. There is a git tool called submodules for dealing with situations of independently tracking a repo that lives inside a larger git repo but its probably not necessary here since you'll never be pushing the web2py stuff back to github. From this google groups discussion you will probably be fine ignoring the fact that your apps live inside a larger repo. So go ahead and create a new repo on your github and then:
cd /applications/init
git init
git remote add origin <https to your github repo>
git push -u origin master
One thing that you should do here is add a .gitignore
to this folder that will prevent tracking some run-time generated stuff (see here). Try something like the following template:
*.pyc
cache/
databases/
errors/
sessions/
private/
uploads/
.hg/
Make a Python 2.7 Environment for Web2py¶
A virtual environment is a fancy name for a folder holding specific versions of python and packages. When you work on a project it's nice to have all of it's dependencies frozen in time so that you can always recreate it on another machine, and so that it won't break the first time something upates :) The standard way of managing virtual environments in Python is with the package manager pip
and a 3rd party tool virtualenv
(in Python 3 there is built-in support). You make a virtual environment by indicating which version of python should be copied into your new directory (or specify the path from which to grab python) like virtualenv -p /path/to/python /path/to/new/virtualenv
(it will also install the pip package into the environment by default). After this you can enter the virtual environment by running the activate script inside the environment like source /path/to/new/virtualenv/bin/activate
. Inside the environment you then use pip
like normal to add packages, but they will be installed into your new virtual environment directory. Being inside a virtual environment is just like telling your OS that your python executable and your PATH (where it looks for modules) have moved to the new virtual environment directory. To leave a virtual environment you just deactivate myenv
at command line. Finally, the super popular 3rd party tool virtualenvwrapper
provides, unsurprisingly, a wrapper around virtualenv
for some easier syntax.
Conda Environments Instead of Virtualenv¶
Conda, the package management solution that ships with the Anaconda distribution of Python has it's own approach for virtual environments that doesn't use virtualenv
tool. To make a conda environment with Python 2.7 and all the standard Anaconda packages with it you do conda create -n myapp python=2.7 anaconda
(notice the anaconda
at the end). To make a bare virtual environment just with Python 2.7 just do conda create -n myapp python=2.7
. Conda by default puts your virtual environment folders inside /Anaconda3/envs
.
Since web2py
doesn't support Python 3, we need to make a Python 2.7 virtual environment to work in. First move to the folder for your app ("myapp") inside web2py/applications/init
. Create and then enter a new virtual environment for the app with
conda create -n myenv python=2.7
actvate myenv
Now start installing the packages you need with conda! When you're done it's nice to freeze all the package dependencies of the environment into a text "spec-file." which can be used by conda to recreate the environment exactly - do this by echoing the output of conda list
into a file. For people using virtualenv
the corresponding convention is to make a "requirements.txt" file which you can do with the output of the pip command pip freeze
inside our environment. Note that inside your virtual environment conda will install the most appropriate version of each package based on the version of python your enivronment is running. For instance, if your new app just needs Beautiful Soup do
conda install beautiful-soup
conda list --explicit > spec-file.txt
conda env export > myenv.yml # Another file that conda can use to recreate the environment
pip freeze > requirements.txt
deactivate
So whenever you want to run web2py and work on your app just make sure you first activate the virtual environment for it.
Resources on Environments¶
- A guide to virtualenv with pip in python 2.7
- A general overview of virtual environments in python
- Documentation on conda environments
- Short stackoverflow answer and a longer official article about how conda environments are different (better) than virtualenv.
Hosting Your App on PythonAnywhere¶
Create a (free!) pythonanywhere account, log in to your account and find the "Web" tab. Follow the steps to create a new app with their wizard, choosing the web2py
option which will cause a fresh download of the web2py
source into your home directory on the pythonanywhere servers. Once your app has been created it will have its own page in the "Web" tab - make sure the python version here is set to 2.7. Go to the "consoles" tab and start a bash console (this is a shell that is actually running on the pythonanywhere servers, and you're being given a browser interface to it... cool!). Navigate into the web2py
folder that was just created, it should look super familiar since you just pulled the exact same thing onto your own PC from git! Of course your web app isn't there yet, so we need to do:
cd web2py/applications
git clone <https to your github repo>
Now if you go to yourname.pythonanywhere.com it should server out your app by default! If you are running into issues with some of the modules that your app uses then you can recreate your virtual environment on the pythonanywhere server and then on the page for your app in the "Web" tab you can specify the location of this virtual environment that should be used for the app. See the documentation for using virtual environments on pythonanywhere.