CSE 40657/60657
Using Kaggle

Kaggle is an online community for data science and machine learning, probably best known for its competitions. It offers free access to GPUs, which are needed for some of the assignments in this class. You are not required to use Kaggle; you are welcome to use whatever GPUs you want.

If you don't already have a Kaggle account, visit https://www.kaggle.com and follow the instructions there. Kaggle only allows one account per email address.

Creating a notebook

Programs in Kaggle are called "notebooks," even when they aren't Jupyter-style notebooks.

To create a new notebook, click on "+" → "New Notebook" or on "<> Code" → "New Notebook." For work done for this class, notebooks must be private. To open an existing notebook, click on "<> Code" → "Your work" and select the notebook.

The editor has two views: script view shows your code as one continuous file, and notebook view, similar to a Jupyter notebook, divides your code into cells and shows the value of each cell. To switch between the two views, use "File" → "Editor Type".

Either way, your code is stored in a single file, and multiple files are not allowed. If you want multiple files, you'll have to put them into a dataset and add the dataset's directory (see below) to sys.path.
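As a minimal sketch of the sys.path approach, assuming your extra files live in a dataset named my-helper-code (a made-up name; substitute your own), something like the following could go at the top of your notebook. The helper add_to_path is also made up for this illustration.

```python
import sys
from pathlib import Path

# Hypothetical dataset name; replace it with the name of your own dataset.
DATASET_DIR = Path("/kaggle/input/my-helper-code")

def add_to_path(directory):
    """Put directory at the front of sys.path unless it is already listed."""
    directory = str(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory

add_to_path(DATASET_DIR)

# After this, a module stored as /kaggle/input/my-helper-code/utils.py
# (a hypothetical file) could be loaded with: import utils
```

Inserting at the front of sys.path means your own modules take precedence over installed packages with the same name, so choose module names that don't clash.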

Running a notebook

A session is a connection to a computer in the cloud. It can be either an interactive session, meaning you can type Python expressions and start and stop runs of the notebook, or a batch session, which is a single, non-interactive run of your notebook.

Interactive sessions

To start an interactive session, open a notebook editor and click on "Run" → "⏻ Start Session." In the lower-left corner (next to "⧉ View Active Events"), you'll see a yellow ① appear. The session has started, but no code is running yet.

To start a run, click on "⏩ Run All". The button will change to "□ Cancel Run." To see the output of the run, click on ">_" to make the console appear.

To stop the run, click on "□ Cancel Run." The button will change back to "⏩ Run All", indicating that the run has ended. However, the session is still running: you can still see it listed under "⧉ View Active Events".

You can close the window and come back to it again (unlike Colab) by clicking "⧉ View Active Events" and then your session. But after one hour of inactivity, the session will be ended automatically and its log deleted.

To end the session, select "Run" → "⏻ Stop Session" or click "⧉ View Active Events", then the "⋯" next to the session, then "□ Stop Session".

Batch sessions

A batch session is non-interactive and not subject to the one-hour inactivity limit. Note that the Kaggle UI and documentation usually call batch sessions "versions."

To start one, click on "Save Version" in the upper-right corner, make sure the version type is "✓ Save and Run All (Commit)", then click "Save". Next to "⧉ View Active Events", you'll see a yellow ① appear (or increment to ② if you already started an interactive session). You'll also see the number next to "Save Version" increment by one.

You can view the output of a running batch session by clicking "⧉ View Active Events", the "⋯" next to the session, then "↗ Open Logs in Viewer". You can view the output of any version, running or not, by clicking the number next to "Save Version", then the "⋯" next to the session, then "↗ Open in Viewer".

Batch sessions are limited to 12 hours.

Ending a batch session is the same as ending an interactive session: click "⧉ View Active Events", then the "⋯" next to the session, then "□ Stop Session".

If your notebook has at least one version, then every time you open the notebook, the first thing you will see is the most recent successful version. To get to the editor, click "Edit".

Inputs and outputs

If you link a dataset to your notebook, then its files will be visible in /kaggle/input/dataset-name.
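To check that a dataset is linked and see what it contains, you could list its files from inside the notebook. This is only a sketch: dataset-name is the placeholder used above, and list_files is a helper made up for this example.

```python
from pathlib import Path

def list_files(root):
    """Return the sorted paths of all regular files under root.

    Returns an empty list if root does not exist, for example
    when no dataset has been linked yet.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*") if p.is_file())

# Replace dataset-name with the name of the dataset you linked.
for path in list_files("/kaggle/input/dataset-name"):
    print(path)
```

If nothing is printed, the dataset is probably not linked to the notebook, or its name differs from the one in the path.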

In an interactive session, your script's stdout and stderr are shown in the console at the bottom of the screen. I find that if you print to both stdout and stderr, some lines can get lost, but scrolling up and down recovers them. When you save a version, its stdout and stderr are both saved in the log. Again, if you mix stdout and stderr, some lines can get lost, and unfortunately it seems that the lost lines won't appear until the run is finished.
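One possible way to avoid mixing the two streams is to send your own diagnostic messages to stdout rather than stderr. The sketch below is a suggestion, not documented Kaggle behavior, and the helper name log is made up. It also does not affect libraries that write to stderr on their own (warnings, some progress bars).

```python
import sys

def log(*args):
    """Print a diagnostic message to stdout (not stderr) and flush right away."""
    print(*args, file=sys.stdout, flush=True)

log("starting training")
```

Flushing after every message keeps the console or log close to what the program has actually done so far, at a small cost in speed if you log very frequently.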

Your script should put its output files into /kaggle/working. In an interactive session, output files will be viewable on the right-hand side of the editor. When you save a version, the output files will be saved with it.
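For example, a script might save its results like this. This is a sketch under assumptions: the file name results.json and the helper save_results are made up, and the fallback to the current directory is only there so that the same code also runs outside Kaggle.

```python
import json
import os
from pathlib import Path

# Use Kaggle's output directory if it exists; otherwise fall back to the
# current directory so the script can also run on another machine.
OUTPUT_DIR = Path("/kaggle/working") if os.path.isdir("/kaggle/working") else Path(".")

def save_results(results, filename="results.json", output_dir=OUTPUT_DIR):
    """Write results (a JSON-serializable object) to output_dir/filename."""
    path = Path(output_dir) / filename
    with open(path, "w") as f:
        json.dump(results, f, indent=2)
    return path
```

Calling save_results(my_results) at the end of a run, where my_results is whatever dictionary your program has built, would leave results.json among the notebook's output files.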

GPUs

To enable the GPU, under "Accelerator", change "None" to "GPU P100" ("GPU T4×2" should be fine too). Also add the following line to your program (after import torch but before any tensors are created):
torch.set_default_device('cuda')
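That line assumes a GPU is attached to the session. If you want the same program to run with or without an accelerator, a more defensive variant might look like the sketch below. It assumes PyTorch 2.0 or later, where torch.set_default_device is available; the helper name pick_device is made up for this illustration.

```python
import importlib.util

def pick_device():
    """Return 'cuda' if PyTorch is installed and can see a GPU, else 'cpu'."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed at all
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
if device == "cuda":
    import torch
    # Newly created tensors will live on the GPU by default.
    torch.set_default_device(device)
print("using device:", device)
```

If this prints "using device: cpu" on Kaggle, the accelerator is most likely still set to "None".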
GPU usage is subject to limitations, including a weekly quota of 30 hours.

It seems that these limits are not always strictly enforced. The weekly 30-hour quota resets every Friday night at 8pm EDT / 7pm EST.

Important: When an interactive session is active, it counts towards your 30 hours, even when no process is running.