Erik Reinertsen     Blog     Research     Tech     Teaching     Talks

Set up data science stack on ERISOne

ERISOne is a “computing cluster with a job scheduling system for batch jobs, remote desktops for graphical applications and GPUs running on a Linux OS” maintained by Partners Healthcare.

Ideally, users do not need to purchase and maintain compute workstations. Rather, we use Macbooks to connect to a cluster behind the Partners firewall, This is ideal for working on a shared data repository, especially since we analyze PHI.

Unfortunately, setting up a modern data science stack on ERISOne is challenging. Many software packages are old, and obtaining desired memory, GPU, and CPU resources on a compute node requires additional steps.

This guide walks you through setup of Zsh, git, and more. It assumes you requested and have an ERISOne account, and connect via SSH (see RISC guide here).

Follow the steps in order. For example, you should install git before oh-my-zsh because the commands for oh-my-zsh require a recent version of git.

Install zsh shell

ERISOne comes with zsh 4.3.11, but the current version is 5.8. Users lack root access on ERISOne and cannot install zsh via yum.

Note: you may need to first install ncurses from source.

This means we have to install zsh from source:

(base) -bash-4.1$ curl -L > zsh.tar.xz
(base) -bash-4.1$ tar xf zsh.tar.xz
(base) -bash-4.1$ cd zsh
(base) -bash-4.1$ ./configure --prefix="$HOME/local" \
CPPFLAGS="-I$HOME/local/include" \
(base) -bash-4.1$ make -j && make install

Now that zsh is installed, let’s change our default logging shell. However, users on ERISOne do not have root, and thus cannot use chsh.

Add the following to .bash_profile:

export PATH=$HOME/local/bin:$PATH
export SHELL=`which zsh`
[ -f "$SHELL" ] && exec "$SHELL" -l

source ~.bash_profile to make the changes take effect. Now zsh will be activated each time you log in to ERISOne.

The next time you log in, you will be prompted to set up the zsh shell. The prompt should look different:


Delete source files from your home directory:

[ab123@eris1n3]~% rm -rf ~/zsh*

If you log out and back in, your shell should look much improved:

Load git, tmux, & vim via module

ERISOne has git 1.7.1 by default, which was released in 2010. Subsequent installation of oh-my-zsh involves a shell script that invokes -c in git clone, which was introduced in git 1.7.7. Thus, we must upgrade git.

You can install git from source, but there are many other dependencies, so doing this would take a lot of effort.

Instead we can get a sufficiently recent (albeit not most current) version of git using the load command of module:

module load git/2.17.0

Check if a more recent version of MODULE_NAME is available via module avail MODULE_NAME.

My .zshrc loads newer versions of git, tmux, and vim.

Install Anaconda

The newest version of Anaconda on ERISOne (via module) is 4.4.0-p3, whereas the current Linux version is 4.8.3. Ideally, one would use the existing Anaconda module on ERISOne and set up a new environment with the desired Python version and packages. Unfortunately, there is an error with older versions of Conda that prevents users from even creating new environments:

RemoveError: 'setuptools' is a dependency of conda and cannot be removed from conda's operating environment.

To set up the desired Python environment, install Miniconda following the instructions in my Linux setup post.

Like most Linux machines, ERISOne uses the bash shell by default. Therefore Conda will modify /.bashrc with the appropriate path for you to call conda. However, only /.bash_profile is called when you log in to ERISOne.

To ensure Conda is on your path so you can call conda, source /.bashrc in /.bash_profile:

echo 'export source ~/.bashrc' >> ~/.bash_profile

After you install Anaconda, log out, and log back in ERISOne, your command line prompt should reflect that you are now using the base environment:

(base) -bash-4.1$ 

Don’t forget to delete the Anaconda installer.

Interactive session on compute node

Request a compute node on interact-big instead of running jobs on the login nodes (eris1n2 or eris1n3):

bsub -q interact-big -Is -XF -n 4 $HOME/local/bin/zsh

Read more about compute nodes on ERISOne here.