UPDATED: Google Cloud AI Platform Notebooks and Cloud Source Repositories

[Update 1/19/21: Added section “Automating Notebook Checkout at Instance Startup”]
[Note: I work at Google Cloud.]

Overview

As part of an effort at “New Year’s Cleaning”, I decided I should organize my Jupyter notebooks in git. If you’re like me, you have multiple Jupyter notebooks running on multiple different managed JupyterLab servers under Google Cloud’s AI Platform Notebooks. While it’s really easy to turn off a server until it’s needed again, I keep finding reasons to spin up new instances, which leaves notebooks scattered across them. I endlessly postpone deleting the underlying VMs because I don’t want to lose whatever work I left off with.

My cadre of notebook instances prior to cleanup

Enter Cloud Source Repositories! With Cloud Source Repositories, I can create a git repository to store and version my Jupyter notebooks. I can push new and updated versions of my notebooks back to the repository and tear down my infrastructure without fear of losing my work. Then, the next time I launch an AI Platform Notebook server, I can clone the repository and have all my notebooks ready to go.

Creating a Repository

First, we’ll need to create a repository in Cloud Source Repositories (if you don’t have a Google Cloud account, you can sign up here). There are multiple ways to do this, but we’ll do it via the browser interface.

From the drop-down menu, select “Source Repositories” under TOOLS (my menu might look slightly different from yours):


From the Cloud Source Repositories dashboard, make sure you have the right Project selected in the drop-down. Take note of its name. Then, create a new repository by clicking the “Add repository” button in the upper right:


Go through the wizard and give your repository a meaningful name and select the appropriate project.
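If you prefer the command line to the wizard, the gcloud CLI has a `gcloud source repos create` command that should do the same job. Here is a minimal sketch, assuming gcloud is installed and authenticated; the project and repository names (and the DRY_RUN switch) are hypothetical placeholders of mine, not values from this walkthrough:

```shell
# Hypothetical names; substitute your own project and repository.
PROJECT_ID="${PROJECT_ID:-my-project}"
REPO_NAME="${REPO_NAME:-my-notebooks}"

# Prints the command by default; set DRY_RUN=0 to actually create the repo.
create_repo() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "gcloud source repos create $REPO_NAME --project=$PROJECT_ID"
  else
    gcloud source repos create "$REPO_NAME" --project="$PROJECT_ID"
  fi
}

create_repo
```

Running it as-is only echoes the command, so you can check the names before creating anything.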

Cloning and Adding Code

Once you’ve created the (empty) repository, you’ll be prompted to add code to it. From this menu, select the “Manually generated credentials” authentication method:

Then click “1. Generate and store your Git credentials”. Authenticate with the correct credentials, and Allow.

You’ll then be presented with a short script that will authenticate you with Cloud Source Repositories.

To run the script, we’re going to copy and paste it into a terminal on the VM running JupyterLab. Head back over to your running JupyterLab instance (from the list of managed notebook instances, click “OPEN JUPYTERLAB”).

From here, open a Terminal tab by going to File > New > Terminal:

This opens a new terminal tab in our home directory. If you issue an ls command, you should see your .ipynb files.

Paste in the contents of the short script. This will create a ~/.gitcookies file with your credentials.

You can now clone your repository via the Terminal tab with the git clone command (the command is back on the “Manually generated credentials” tab).

$ git clone https://source.developers.google.com/p/[YOUR_PROJECT]/r/[YOUR_REPOSITORY]

You can also clone it via JupyterLab by going to Git > Clone a Repository and providing the same https://source.developers.google.com/p/[YOUR_PROJECT]/r/[YOUR_REPOSITORY] URI.
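Since the clone URI always follows the same pattern, a tiny helper can build it from the project and repository names. This is just a sketch; `csr_clone_url` and the example names are hypothetical:

```shell
# Hypothetical helper: builds the Cloud Source Repositories clone URI.
# $1 = project ID, $2 = repository name
csr_clone_url() {
  printf 'https://source.developers.google.com/p/%s/r/%s\n' "$1" "$2"
}

csr_clone_url "my-project" "my-notebooks"
# prints https://source.developers.google.com/p/my-project/r/my-notebooks
```

You could then clone with something like `git clone "$(csr_clone_url my-project my-notebooks)"`.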


Now, you can move and organize your notebooks within the newly-created local git repository.

Once you’re happy, make sure to add them ($ git add your_notebook.ipynb), commit them ($ git commit -m "Useful message"), and push them back to the main source repository ($ git push or “Git > Push to Remote” from the GUI).

Note: before your initial commit, you will likely need to run:

$ git config --global user.email "you@example.com"
$ git config --global user.name "Your Name"

Note 2: You may want to create a .gitignore file and add .ipynb_checkpoints to it.
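One way to add that .gitignore entry without duplicating it on repeat runs is sketched below. The `ensure_ignored` helper is hypothetical, and it assumes any existing .gitignore ends with a newline:

```shell
# Hypothetical helper: append a pattern to a .gitignore only if it is not
# already listed. $1 = pattern, $2 = file (defaults to ./.gitignore)
ensure_ignored() {
  pattern="$1"
  file="${2:-.gitignore}"
  # -x matches whole lines, -F treats the pattern as a literal string
  if ! grep -qxF -- "$pattern" "$file" 2>/dev/null; then
    printf '%s\n' "$pattern" >> "$file"
  fi
}

# Usage (run from the root of your cloned repository):
#   ensure_ignored ".ipynb_checkpoints"
```

Remember to add and commit the .gitignore file itself so the rule follows the repository to new instances.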

Once you’ve successfully pushed the notebooks, they will show up when browsing the repository in Cloud Source Repositories.

Onward!

The next time you launch a new AI Notebook instance, you can check out your repository from Cloud Source and have all your Jupyter notebooks readily available!

To do so, repeat the steps documented above:

  1. Re-generate the authentication script at https://source.developers.google.com/auth/start?scopes=https://www.googleapis.com/auth/cloud-platform&state=
  2. Launch the Terminal
  3. Paste in the authorization script
  4. Clone the repository
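Before step 4, it can help to confirm that the authorization script actually left a credentials file behind. A minimal sketch, assuming the script writes to ~/.gitcookies as described earlier; the `have_git_credentials` helper is hypothetical:

```shell
# Hypothetical helper: succeeds only if the credentials file exists and is
# non-empty. $1 = path to check (defaults to ~/.gitcookies)
have_git_credentials() {
  [ -s "${1:-$HOME/.gitcookies}" ]
}

if have_git_credentials; then
  echo "Credentials found; ready to clone."
else
  echo "No credentials yet; paste in the authorization script first."
fi
```

This only checks that the file is present, not that the credentials inside it are still valid.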

Now, with your clean and organized AI Platform Notebooks environment, you’re all set to create those new TensorFlow models or tweak those matplotlib visualizations in the New Year!

Update 1/19/21: Automating Notebook Checkout at Instance Startup

Manually cutting and pasting a script into a terminal each time you launch a new instance is less than ideal. Fortunately, it’s fairly easy to configure an AI Platform Notebooks instance at launch to automatically check out your Jupyter notebook repository from Cloud Source Repositories.

To do so, we’ll create a bash script to run at instance launch and store it (along with some git configuration files) in a Cloud Storage bucket. When launching a notebook instance, we can have it download and execute the script to automatically retrieve the git configuration files and retrieve our notebooks.

Notes:

First, create a new Cloud Storage Bucket to hold your git configuration files and startup script. If you decide to use an existing bucket, ensure that the permissions are set appropriately to not allow public access to your files.

From the terminal on your running AI Notebook instance, copy the ~/.gitcookies file (this was created when you ran the authentication script) and the ~/.gitconfig file (created by running the git config commands) into the bucket:

$ gsutil cp ~/.gitcookies ~/.gitconfig gs://[CONFIG_BUCKET]/

Next we’ll create a script called startup_config_git.sh and upload it to the bucket. Using Cloud Shell or your local machine, create a startup_config_git.sh file with the following contents:

#!/bin/bash -ex
# Startup scripts are run as root, so run the rest of the script as the jupyter user
sudo -i -u jupyter bash -ex << EOF
cd /home/jupyter
# Replace CONFIG_BUCKET with your Cloud Storage bucket
gsutil cp gs://[CONFIG_BUCKET]/.gitcookies ./
gsutil cp gs://[CONFIG_BUCKET]/.gitconfig ./
# Edit the following to be the cloud source repo with your notebooks
git clone https://source.developers.google.com/p/[YOUR_PROJECT]/r/[YOUR_REPOSITORY]
EOF

Make sure you replace CONFIG_BUCKET, YOUR_PROJECT, and YOUR_REPOSITORY with the appropriate values. Once complete, upload the file to your CONFIG_BUCKET using gsutil or via the console.
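If you would rather not edit the placeholders by hand, a sed filter can fill them in. This is only a sketch: `fill_placeholders` and the example values are hypothetical, and it assumes your values contain no `|` or `&` characters, which sed would treat specially:

```shell
# Hypothetical helper: reads a template on stdin and substitutes the three
# bracketed placeholders. $1 = bucket, $2 = project, $3 = repository
fill_placeholders() {
  sed -e "s|\[CONFIG_BUCKET\]|$1|g" \
      -e "s|\[YOUR_PROJECT\]|$2|g" \
      -e "s|\[YOUR_REPOSITORY\]|$3|g"
}

echo 'git clone https://source.developers.google.com/p/[YOUR_PROJECT]/r/[YOUR_REPOSITORY]' \
  | fill_placeholders "my-config-bucket" "my-project" "my-notebooks"
# prints git clone https://source.developers.google.com/p/my-project/r/my-notebooks
```

To process the whole file, you could redirect a template copy of the script into the function and write the result to startup_config_git.sh, then upload it with something like `gsutil cp startup_config_git.sh gs://[CONFIG_BUCKET]/`.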

Great! Now when you go to launch an AI Notebooks instance, we’ll tell it to run this script at startup. To do so, when you launch a New Instance, select “Advanced Options” at the bottom of the dialog:

On the next page, expand the “Environment” section, find the “Select a script to run after creation” box and click “Browse”:

In the dialog that pops up, navigate to the bucket you created and select the startup_config_git.sh script:


After selecting the startup script, click Create at the bottom of the configuration section:

Now, once your instance loads, your notebooks will be checked out and your git credentials will be ready to go!
