This tutorial is based on Lecture 4 of the fast.ai course Deep Learning for Coders. I will go through, step by step, how to build an image classifier using fastai and PyTorch.

Image Classification

from fastai.vision.all import *  # provides untar_data, URLs, DataBlock, cnn_learner, etc.
path = untar_data(URLs.PETS)

Why are we using BASE_PATH?

We want our data paths displayed nicely, relative to the dataset path. Look at path.ls():

Path.BASE_PATH = path
path.ls()
(#4) [Path('annotations'),Path('images'),Path('models'),Path('crappy')]
(path/"images").ls()
(#7394) [Path('images/Sphynx_245.jpg'),Path('images/miniature_pinscher_55.jpg'),Path('images/havanese_20.jpg'),Path('images/miniature_pinscher_34.jpg'),Path('images/samoyed_91.jpg'),Path('images/chihuahua_123.jpg'),Path('images/yorkshire_terrier_155.jpg'),Path('images/Egyptian_Mau_79.jpg'),Path('images/scottish_terrier_23.jpg'),Path('images/basset_hound_198.jpg')...]

Remember

Most of the functions we use in fastai return objects of class "L" rather than a plain Python list. L is an enhanced list: it shows the number of items, and extra items are denoted by "...".
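As a quick, hedged illustration of how L behaves (not from the lecture; the exact printed items depend on what you pass in):

from fastcore.foundation import L   # L ships with fastcore, which fastai imports for you

xs = L(range(20))
print(xs)              # (#20) [0,1,2,3,4,5,6,7,8,9...] -- shows the count and truncates long lists
print(xs[0], len(xs))  # indexing and len() work just like a normal Python list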

Last time, if the first letter of the file name was a capital it was a cat, otherwise it was a dog.

Here our case is different.

Regular expressions help us extract the labels.

Please look up Python's re module if you haven't gone through it before.

The fast.ai NLP course has a couple of lessons on regex.

Regex can be a bit hard to grasp sometimes.

Let's pick a file name and see what it looks like.

fname = (path/"images").ls()[0]
fname.name
'Sphynx_245.jpg'

A Little Experiment with re (Regular Expressions)

  1. re is Python's regular-expression module.
  2. findall grabs all the parts of the string matched by the pieces of the regular expression that have parentheses around them (the capture groups).
  3. The r prefix makes a raw string, which tells Python not to treat backslashes specially (remember that in an ordinary Python string "\n" means a newline).
  4. r'(.+)_\d+.jpg$' means: "(.+)" captures any character repeated one or more times, followed by an underscore "_", then "\d+" one or more digits, then "." any single character, then "jpg", and finally "$" marks the end of the string.
re.findall(r'(.+)_\d+.jpg$',fname.name)
['Sphynx']

DataBlock

  1. blocks expects the types of the independent and dependent variables (here ImageBlock and CategoryBlock).
  2. get_items gets the image files.
  3. splitter randomly splits the data into training and validation sets.
  4. get_y uses using_attr, which takes the RegexLabeller function and applies it to the "name" attribute of each path.
  5. aug_transforms, which we saw in the lesson 2 section on augmentation, generates synthetic variations of each image.
  6. Resize first to a very large image (460 pixels), then aug_transforms produces the smaller final size (the full DataBlock call appears further below).

Why?

This is called presizing.

The detailed steps are below:

  1. First, Resize grabs a square crop at the large size: if the image is portrait it grabs the full width and a random vertical position (and the full height with a random horizontal position for landscape images).
  2. Second, aug_transforms takes a random warped crop, possibly rotated (rotation, warping, zooming), and turns it into a square at the smaller size of 224 by 224.

Note: the first step, turning the image into a square, happens per item on the CPU, but the second step can happen on the GPU. Normally operations like rotating and cropping are pretty slow.

Rotation, warping and zooming are actually destructive to the image, because each one requires an interpolation step, which is not just slow but also lowers the image quality.

What's unique in fastai

We keep track of the changing coordinate values in a non-lossy way, as full floating-point values, and only once at the very end do we do the interpolation.

Look at the teddy bears:

Left: the presizing approach. Right: the traditional approach using other Python libraries.

There are some weird things in the right-hand image. Flaws:

  1. Less nicely focused
  2. Artifacts in the grass
  3. Distortion on the sides of the legs

Presizing in Detail

We need our images to have the same dimensions, so that they can collate into tensors to be passed to the GPU. We also want to minimize the number of distinct augmentation computations we perform. The performance requirement suggests that we should, where possible, compose our augmentation transforms into fewer transforms (to reduce the number of computations and the number of lossy operations) and transform the images into uniform sizes (for more efficient processing on the GPU).

The challenge is that, if performed after resizing down to the augmented size, various common data augmentation transforms might introduce spurious empty zones, degrade data, or both. For instance, rotating an image by 45 degrees fills corner regions of the new bounds with emptiness, which will not teach the model anything. Many rotation and zooming operations will require interpolating to create pixels. These interpolated pixels are derived from the original image data but are still of lower quality.

To work around these challenges, presizing adopts two strategies, shown in the figure below:

  1. Resize images to relatively "large" dimensions—that is, dimensions significantly larger than the target training dimensions.
  2. Compose all of the common augmentation operations (including a resize to the final target size) into one, and perform the combined operation on the GPU only once at the end of processing, rather than performing the operations individually and interpolating multiple times.

The first step, the resize, creates images large enough that they have spare margin to allow further augmentation transforms on their inner regions without creating empty zones. This transformation works by resizing to a square, using a large crop size. On the training set, the crop area is chosen randomly, and the size of the crop is selected to cover the entire width or height of the image, whichever is smaller.

In the second step, the GPU is used for all data augmentation, and all of the potentially destructive operations are done together, with a single interpolation at the end.

<img alt="Presizing on the training set" width="600" caption="Presizing on the training set" id="presizing" src="images/att_00060.png">

This picture shows the two steps:

  1. Crop full width or height: This is in item_tfms, so it's applied to each individual image before it is copied to the GPU. It's used to ensure all images are the same size. On the training set, the crop area is chosen randomly. On the validation set, the center square of the image is always chosen.
  2. Random crop and augment: This is in batch_tfms, so it's applied to a batch all at once on the GPU, which means it's fast. On the validation set, only the resize to the final size needed for the model is done here. On the training set, the random crop and any other augmentations are done first.

To implement this process in fastai you use Resize as an item transform with a large size, and RandomResizedCrop as a batch transform with a smaller size. RandomResizedCrop will be added for you if you include the min_scale parameter in your aug_transforms function, as was done in the DataBlock call in the previous section. Alternatively, you can use pad or squish instead of crop (the default) for the initial Resize.
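As a small, hedged sketch (argument names follow the fastai docs as I recall them), the alternative initial Resize methods look roughly like this:

item_tfms = Resize(460)                                      # default: crop
item_tfms = Resize(460, ResizeMethod.Squish)                 # squish the whole image into a square
item_tfms = Resize(460, ResizeMethod.Pad, pad_mode='zeros')  # pad to a square with black borders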

The teddy bear comparison shown earlier illustrates the difference between an image that has been zoomed, interpolated, rotated, and then interpolated again (the approach used by all other deep learning libraries), shown on the right, and an image that has been zoomed and rotated as one operation and then interpolated just once (the fastai approach), shown on the left.

pets = DataBlock( blocks =(ImageBlock, CategoryBlock),
                get_items = get_image_files,
                splitter= RandomSplitter(seed=42),
                get_y = using_attr(RegexLabeller(r'(.+)_\d+.jpg$'),'name'),
                item_tfms=Resize(460),
                batch_tfms=aug_transforms(size=224, min_scale=0.75))
dls = pets.dataloaders(path/"images")

Let's Debug the DataLoaders

show_batch displays the items of a mini-batch, so you can check that the data was loaded properly.

dls.show_batch(nrows=1,ncols=3)

Let's Debug the Augmentations

Pass unique=True to show the same image repeated with different augmentations applied.

dls.show_batch(nrows=1,unique=True,ncols=3)
learn = cnn_learner(dls,resnet34, metrics= error_rate)
learn.fine_tune(2)
epoch train_loss valid_loss error_rate time
0 1.530299 0.348531 0.112991 00:42
epoch train_loss valid_loss error_rate time
0 0.494089 0.284470 0.086604 00:52
1 0.339563 0.229090 0.071042 00:53

Model Interpretation

Having trained the model, let's interpret it.

We will use a confusion matrix.

The diagonal shows the images that were classified correctly.

Let's see, for example, how often Siamese is misclassified.

When you have a lot of classes, the confusion matrix becomes hard to read.

Switch to most_confused instead, which picks out the cells with the biggest numbers.

That will tell you which pairs of classes are confused most often.

interpret = ClassificationInterpretation.from_learner(learn)
interpret.plot_confusion_matrix(figsize =(12,12),dpi=20)

Most Confused

Here you can see that 'american_pit_bull_terrier' and 'staffordshire_bull_terrier' are confused 10 times.

I am not a dog and cat expert.

Google it: I found that the two breeds look very similar.

When your model makes the same mistakes a human would make, that's a good sign.

interpret.most_confused(min_val=5)
[('american_pit_bull_terrier', 'staffordshire_bull_terrier', 10),
 ('beagle', 'basset_hound', 7),
 ('Bengal', 'Egyptian_Mau', 6),
 ('staffordshire_bull_terrier', 'american_pit_bull_terrier', 5)]

Improve Model

We will try a couple of different techniques:

  1. Improving the learning rate

  2. Unfreezing and transfer learning

Fine-Tuning the Learning Rate

Last time, if you look back, we used the default learning rate (1e-2).

Unfortunately, when we use a high learning rate we get a higher error.

Let's run it and compare.

Why is this?

See lesson 4, where we discussed how too high a learning rate makes the loss jump around.

Leslie Smith's research will help us find a good learning rate.

Remember: when we do SGD we look at one mini-batch at a time (a batch of images in this case), find the gradient for that mini-batch, and then take a step based on the learning rate and the gradient.

Leslie Smith's idea: start the first mini-batch with a very low learning rate, then increase it a little, say 25 percent higher, and take another step; then 25 percent higher again, and another step, and so on. There are no multiple epochs here, just a single pass over successive mini-batches, and then we can plot a chart.

Look at the plot: at first the steps are very small and the loss barely changes; gradually we reach the point where the steps are big enough to make a difference and the loss comes down, until we get to the point where the learning rate is too high.

We want a point on the steep, downward part of the curve.

In our case, a good range is (minimum point / 10, steepest point).
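Here is a conceptual sketch of that loop, not fastai's actual implementation; it assumes you already have a model, loss_fn, and train_dl defined:

import torch

# Conceptual learning-rate-finder loop (assumed names: model, loss_fn, train_dl).
lr, lrs, losses = 1e-7, [], []
opt = torch.optim.SGD(model.parameters(), lr=lr)
for xb, yb in train_dl:
    loss = loss_fn(model(xb), yb)
    loss.backward()
    opt.step(); opt.zero_grad()
    lrs.append(lr); losses.append(loss.item())
    lr *= 1.25                                # raise the learning rate ~25% each mini-batch
    for g in opt.param_groups: g['lr'] = lr
    if loss.item() > 4 * min(losses): break   # stop once the loss blows up
# Plotting losses against lrs (log x-axis) reproduces the chart described above.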

Question: Is the learning rate plot computed against just one mini-batch?

No, it's not. It is the standard walk through the dataloader, and it is actually training; the only change is that the learning rate is tweaked after each mini-batch.

Question: Is the network reset to its initial state after each trial?

Certainly not: we want to see how it changes as we go. What we are seeing is actual learning happening while the learning rate increases.

Question: Why would an ideal learning rate found with a single pass at the start of training keep being a good learning rate even after several epochs of further loss reduction?

It absolutely would not.

Question: Why the steepest point?

Because the minimum is a point where we won't learn anymore; the weights will just oscillate.

learn = cnn_learner(dls,resnet34, lr=1e-1,metrics= error_rate)
learn.fine_tune(2)
epoch train_loss valid_loss error_rate time
0 1.559530 0.325268 0.107578 00:44
epoch train_loss valid_loss error_rate time
0 0.509674 0.300865 0.100135 00:54
1 0.338566 0.227000 0.077808 00:54
learn = cnn_learner(dls,resnet34,metrics= error_rate)
lr_min, lr_steep=learn.lr_find(suggest_funcs=(minimum, steep))
print(f'Minimum/10 : {lr_min:.2e} , "steepest point":{lr_steep:.2e}')
Minimum/10 : 8.32e-03 , "steepest point":4.37e-03

Let's Choose a Learning Rate

Each time you run it you get slightly different values, so you can pick something like 3e-3.

The learning rate finder plot is on a logarithmic scale.

Now let's experiment.

Summary

It just shows how simple the idea is.

Part of the reason it took a while to appear is that engineers love using lots of computers: they like to run lots of experiments on big clusters to find out which learning rate is best, rather than looking one mini-batch at a time.

fastai was the first library to include a learning rate finder.

learn= cnn_learner(dls,resnet34,metrics=error_rate, lr=3e-3)
learn.fit_one_cycle(3)
epoch train_loss valid_loss error_rate time
0 1.104831 0.339296 0.116373 00:45
1 0.526981 0.215112 0.066982 00:45
2 0.324713 0.208224 0.069689 00:45

Unfreezing

What's Inside Transfer Learning

Let's remind ourselves what transfer learning does.

A model is a bunch of linear layers with activation functions between them, usually ReLU.

Each of those linear layers has a bunch of parameters. After training on ImageNet, those parameters are not random anymore. We also saw that early layers learn general features like edges, while later layers learn sophisticated features like eyes and so on.

We throw away the last layer, because the last layer is the bit that specifically says which of the 1,000 categories (in the case of ImageNet) an image belongs to. We replace it with random weights, sometimes more than one layer, and train that.
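A hedged way to see this in fastai (the exact module layout may vary between versions): a cnn_learner model is a Sequential made of the pretrained body followed by the new head.

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.model[0]   # pretrained body: ImageNet weights; early layers detect edges and gradients
learn.model[1]   # new head: randomly initialised, ending in a Linear layer with dls.c outputs (one per breed)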

Discussion between Rachel and Jeremy Howard:

I think the learning rate finder, once you learn about it, seems like an obvious approximate method that would work.

I have noticed that a lot of my students at USF have a tendency to jump in and try to account for every possible imperfection from the start, and that is very rarely needed. One of the cool things is to try the easier things first.

This was a big innovation and it is super helpful, yet researchers are ignoring it; a lot of people don't know about the learning rate finder.

Okay, so back to transfer learning: a pretrained network comes with its architecture. Now we want to fine-tune it, in our case for breed classification.

Please run this:

learn.fine_tune??

Steps Inside fine_tune

  1. freeze: only the last layer's weights get stepped.
  2. freeze_epochs is 1 by default.
  3. fit: train the randomly added weights (all layers except the last are pretrained).
  4. Divide the base learning rate by 2.
  5. unfreeze: now all parameters get stepped and gradients are calculated for all of them.
  6. Then we fit for some more epochs.

A rough sketch of these steps is shown below.
We can do it by hand: cnn_learner freezes the model for us by default, so we can just call fit_one_cycle and train the last layer.

 
learn = cnn_learner(dls,resnet34,metrics = error_rate)
learn.fit_one_cycle(3,3e-3)
epoch train_loss valid_loss error_rate time
0 1.157931 0.296587 0.100812 00:44
1 0.548078 0.263024 0.080514 00:44
2 0.327115 0.235120 0.072395 00:44

Recheck the Learning Rate

Now let's unfreeze.

Run lr_find again; you will not see the same rapid drop as before, because the model is already partly trained.

Now we are improving: the error rate comes down to about 0.06 (see the results below).

Can we do better?

  1. At this point we are training the whole model with the same learning rate (1e-5), which doesn't make sense, because the last layer is still not that great: it started at random and has only had three epochs of training, so it probably needs more work.
  2. We know the later layers were probably specialised to ImageNet rather than pet breeds, so we need to train them more, while the early layers (gradients and edges) don't need to change much. What we would like is a small learning rate for the early layers and a bigger one for the later layers.
learn.unfreeze()
learn.lr_find(suggest_funcs=(minimum, steep))
SuggestedLRs(minimum=1.318256749982538e-07, steep=1.0964781722577754e-06)
learn.fit_one_cycle(6,lr_max=1e-5)
epoch train_loss valid_loss error_rate time
0 0.263316 0.216748 0.070365 00:52
1 0.240422 0.206543 0.067659 00:54
2 0.215897 0.193572 0.061570 00:54
3 0.200803 0.197216 0.068336 00:55
4 0.193693 0.193816 0.062923 00:55
5 0.193264 0.192747 0.062246 00:56

Let's Learn About slice

This is something called discriminative learning rates.

Jason Yosinski's research supports this idea.

So we pass a slice as the learning rate.

e.g. slice(1e-6, 1e-4)

The very first layers get 1e-6 and the last layers get 1e-4; the layer groups in between get rates that are equal multiples apart, so they are equally spaced on a log scale (see the sketch below).
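For intuition, here is a hedged sketch (fastai computes the per-group values internally; the choice of 3 groups is just an assumption for illustration):

import numpy as np

# Hypothetical example: 3 parameter groups spread between 1e-6 and 1e-4 on a log scale.
print(np.logspace(np.log10(1e-6), np.log10(1e-4), 3))
# [1.e-06 1.e-05 1.e-04] -- each group's learning rate is 10x the previous one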

We overshoot a bit here (as we will see in the results below).

How fit_one_cycle Works

It is a bit different: it starts at a low learning rate and gradually increases it for roughly the first 1/3 of the batches, until it reaches the highest learning rate, which is why the parameter is called lr_max.

For the remaining 2/3 or so of the batches it gradually decreases the learning rate again. The reason is simply that, empirically, researchers found this works best. It was developed, again, by Leslie Smith, the same person who created the learning rate finder, and it dramatically increased the speed at which we can train networks.
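If you want to see this schedule for yourself after a training run, a hedged one-liner (assuming fastai's Recorder callback is active, as it is by default):

learn.recorder.plot_sched()   # plots the learning rate (and momentum) schedule from the last fit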

The academic community largely ignored it; in fact, the key publication that developed the idea was not even peer reviewed. The reason I mention this now is that we cannot always rely on the peer-reviewed literature to tell us what works best in practice.

So we don't just want to go back and pick the model that was saved at that earlier epoch, because we could probably do better: we really want a model whose final epochs were trained at a low learning rate.

So I would change the number of epochs to 8, because at around epoch 8 we get the best result and after that it gets worse (see the results below).

learn = cnn_learner(dls,resnet34,metrics=error_rate)
learn.fit_one_cycle(3,3e-3)
learn.unfreeze()
learn.fit_one_cycle(12,lr_max=slice(1e-6,1e-4))
epoch train_loss valid_loss error_rate time
0 1.157091 0.375448 0.116373 00:43
1 0.537694 0.227430 0.077808 00:44
2 0.333000 0.212604 0.068336 00:44
epoch train_loss valid_loss error_rate time
0 0.243977 0.204482 0.064953 00:54
1 0.249572 0.199334 0.069689 00:55
2 0.251869 0.192973 0.069012 00:55
3 0.210270 0.182762 0.062246 00:56
4 0.189316 0.180381 0.056834 00:55
5 0.173813 0.183509 0.065629 00:55
6 0.151790 0.176850 0.061570 00:56
7 0.145880 0.178531 0.061570 00:56
8 0.133386 0.174536 0.052774 00:55
9 0.134584 0.179717 0.058187 00:55
10 0.129982 0.174135 0.058863 00:56
11 0.125701 0.175962 0.056157 00:56