Getting Started With fast.ai

My experience with the first lesson of the excellent “Practical Deep Learning for Coders” by fast.ai

For the last six months, I’ve been working to transition from being an adjunct math instructor at a university to being a data scientist (or perhaps a machine learning engineer), and the next leg of my journey is taking me through the excellent “Practical Deep Learning for Coders” course by fast.ai. I actually attempted this course in May, but got bogged down in Lesson 2, where we’re supposed to productionize a model and make a web app. I also felt like I wasn’t making progress in my understanding, possibly because I didn’t yet have faith in the power of top-down learning. Then I read a tweet by Christine McLeavey Payne saying she’d done Andrew Ng’s Deep Learning Specialization on Coursera before doing fast.ai, and I decided to try that. It went fine, and having completed the five-course specialization over the summer, I’m excited to get back to fast.ai.

Photo by Crew on Unsplash

I’m starting over completely, and I didn’t keep notes when getting started the first time in May, but following the advice of fast.ai co-founder Rachel Thomas and others, I’m writing “the blog I needed to read last week.” I first planned to use Medium, then thought that using Jupyter notebooks on GitHub Pages might teach me more useful skills, and promptly got waylaid by the terminal. I do need to learn the terminal anyway, but I can learn it while blogging rather than waiting to blog until I’ve learned it, and published is better than not published.

As I begin the course, I’m following the advice of Jeremy Howard, co-founder of fast.ai and instructor of the course: watch each lecture straight through, then watch it again while working through the corresponding notebook. My first task after watching Lesson 1 is setting up a GPU, and I’ve chosen to use Google Cloud Platform. The course provides detailed instructions for this, and they work nearly flawlessly for me, with only one exception: when I try to connect to my instance through my terminal, I get an error.

So I Google the error message + “fast.ai,” and find a thread on the fast.ai forums with instructions that solve my problem.

Now, on to the Lesson 1 notebook! I’m trying to find a balance between just pressing Shift+Enter on everything and trying to understand every detail as I go. The balance I’ve chosen is to read everything and make sure it makes sense, but not to try to memorize the code. As a math instructor, I know that memorizing symbols feels like making headway, but it’s a false sense of progress. Instead, I’m trusting that the code will solidify in my mind as I practice it in my personal projects.

Everything makes sense until the resnet50 section, where we run the learning rate finder and get this graph:

Figure 1: learning rate finder graph
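For context, here is roughly what the notebook has run by this point, ending with the learning rate finder. This is a sketch from memory, assuming fastai v1 as used in the course; the image size, batch size, and filename regex may differ slightly from the notebook’s.

```python
from fastai.vision import *

# Build the Oxford-IIIT Pet DataBunch, as in the Lesson 1 notebook
path = untar_data(URLs.PETS)
fnames = get_image_files(path/'images')
pat = r'/([^/]+)_\d+.jpg$'  # extract the breed label from the filename
data = (ImageDataBunch.from_name_re(path/'images', fnames, pat,
                                    ds_tfms=get_transforms(), size=299, bs=48)
        .normalize(imagenet_stats))

# Pretrained resnet50 learner (create_cnn was later renamed cnn_learner)
learn = create_cnn(data, models.resnet50, metrics=error_rate)

# The learning rate finder: train briefly while sweeping the learning rate,
# then plot loss against learning rate -- this produces the Figure 1 graph
learn.lr_find()
learn.recorder.plot()
```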

Looking at the graph above, I feel like the optimal learning rate, the one with the lowest loss, should be between 1e-2 and 1e-1, but next, when we fine-tune with the code below, we set the learning rate range to 1e-6 through 1e-4.
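As best I can reconstruct it, that fine-tuning cell looks like the following; it’s a sketch continuing from the code above, and the epoch count is my guess.

```python
# Unfreeze the whole network, then fine-tune with discriminative learning
# rates: 1e-6 for the earliest layers up to 1e-4 for the final layers
learn.unfreeze()
learn.fit_one_cycle(3, max_lr=slice(1e-6, 1e-4))
```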

But the validation loss and error rate with the default learning rate of 1e-3 are quite low: 0.18 and 0.06, respectively (see chart below).

Figure 2: results of fitting one cycle with the default learning rate
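For reference, the Figure 2 numbers come from the earlier one-cycle run at the default rate, something like this (again a sketch; the epoch count is a guess):

```python
# One-cycle training at fastai's default learning rate;
# this is the run behind Figure 2
learn.fit_one_cycle(5)
```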

I’m not really expecting to improve those, but to solidify my understanding of the learning rate finder graph, I’m going to rerun the fine-tuning, setting the learning rate range to 1e-2 through 1e-1, just to see if anything changes.
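Concretely, that experiment is the same fine-tuning call with my range swapped in:

```python
# Refit with the range I read off the graph: 1e-2 up to 1e-1
learn.fit_one_cycle(3, max_lr=slice(1e-2, 1e-1))
```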

Something changes, and not for the better! Compare the chart below to the one above: my validation loss is a whopping 7319, and my error rate is 0.97!

Figure 3: results of fitting one cycle with a learning rate range of 1e-2 to 1e-1

I clearly have some misconception in my interpretation of the graph generated by the learning rate finder, so I’ll go back and watch that bit of the lesson again. Okay: Jeremy refers to choosing a range that ends “well before” the loss starts to get worse, and referring back to the learning rate finder graph, I see that the loss starts to get worse at about 1e-1, which is exactly where I’d set the end of my range. It looks like maybe a range of 1e-4 to 1e-2 might work? The default rate of 1e-3, which got us great results previously, is in that range, of course.

I’ll try one more time with a range of 1e-4 to 1e-2.
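Continuing the sketch above, that call would be:

```python
# Try a range that ends "well before" the loss starts climbing
learn.fit_one_cycle(3, max_lr=slice(1e-4, 1e-2))
```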

Nope, still no good, though at least this time it improved from epoch 0 to epoch 1, unlike the previous attempt:

Figure 4: results of fitting one cycle with a learning rate range of 1e-4 to 1e-2

I’m going to stop fiddling with the learning rate finder, whose nuances I obviously don’t quite get yet; Jeremy says we’ll learn more about it in Lesson 2. I move on, and everything else goes smoothly and makes sense.

When I first completed Lesson 1 in May, I tried to do the Lesson 2 notebook without watching the Lesson 2 lecture, as seems to be suggested at the bottom of the wiki. The wiki doesn’t explicitly say not to watch Lesson 2 before trying the notebook, but why else would the link be on the Lesson 1 wiki? Maybe it’s there for people with more experience?

Anyway, on to Lesson 2: Classifying Pregnancy Test Results!

Lesson 2 (the sequel): Can Deep Learning Perform Better than Pigeons?

Lesson 3: 10,000 Ways that Won’t Work

Lesson 4: Predicting a Waiter’s Tips

Lesson 5: But Where Does the Pickle Go?

Lesson 6: Everybody Wants to be a Cat

I’m a mathematics lecturer at CSU East Bay and an aspiring data scientist. Connect with me on LinkedIn, or say hi on Twitter.

Thanks to Vera Chernova for instructions on embedding Jupyter code on Medium!