4. Deep Dive into Breed Classification
A tutorial for anyone interested in a deeper dive into image classification and fine-tuning a model with PyTorch and fastai.
- Image Classification
- Why We Are Using BASE_PATH
- Remember
- Little Experiment With RE (Regular Expressions)
- DataBlock
- This Is Called Presizing
- Presizing Details
- Let's Debug the DataLoader
- Let's Debug Augmentation
- Model Interpretation
- Most Confused
- Improve Model
- Fine-Tune the Learning Rate
- Question: Is the learning rate plot against one mini-batch?
- Question: Is the network reset to its initial state after each trial?
- Question: Why would an ideal learning rate found with a single mini-batch at the start of training keep being a good learning rate even after several epochs of further loss reduction?
- Question: Why the steepest point?
- Let's Choose a Learning Rate
This tutorial is based on Lecture 4 of the fast.ai course "Deep Learning for Coders". I will go through, step by step, how to build a classifier using PyTorch and fastai.
Image Classification
from fastai.vision.all import *   # assumed import: this notebook uses the fastai vision API throughout

path = untar_data(URLs.PETS)      # download and extract the Oxford-IIIT Pet dataset

Why We Are Using BASE_PATH
Setting Path.BASE_PATH makes path listings print relative to the dataset root, which keeps the output short.

Path.BASE_PATH = path
path.ls()
(path/"images").ls()
Remember
Most of the functions we use in fastai return objects of class "L" rather than a plain list. L is an enhanced list: it shows the number of items, and extra items are denoted with "...".
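For instance, here is a minimal illustration of L (the exact display may vary slightly between fastcore versions):

from fastcore.foundation import L   # the enhanced list class used throughout fastai

items = L(range(20))
print(items)
# prints the item count first, e.g. (#20) [0,1,2,3,4,5,6,7,8,9...]
# long lists are truncated with "..." rather than printing every element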
Last time, the labelling rule was: if the first letter of the file name is a capital, it is a cat, otherwise a dog. Here our case is different: a regular expression helps us extract the labels from the file names. Please look up Python's re module if you haven't come across it before; the fastai NLP course also has a couple of lessons on regular expressions. They can be a bit hard to get used to at first.
Let's pick a file name and see what it looks like:
fname = (path/"images").ls()[0]
fname.name
Little Experiment With RE (Regular Expressions)
- re is Python's regular expression module
- findall grabs all the parts of the regular expression that have parentheses around them (the capture groups)
- the r prefix marks a raw string, which tells Python not to treat backslashes as escape characters (for example, "\n" in a normal string means a newline)
- r'(.+)_\d+.jpg$' means: "(.+)" captures any character repeated one or more times, followed by an underscore "_", followed by "\d+" (one or more digits), followed by "." (any single character, here the dot), followed by "jpg", with "$" anchoring the end of the string
re.findall(r'(.+)_\d+.jpg$',fname.name)
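To make the pattern concrete, here is a quick check on a typical file name from this dataset (the specific name below is just an illustrative example):

import re

# the breed name is everything before the final "_<digits>.jpg"
re.findall(r'(.+)_\d+.jpg$', 'great_pyrenees_173.jpg')
# -> ['great_pyrenees']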
DataBlock
- blocks: the types of the independent and dependent variables (here ImageBlock and CategoryBlock)
- get_items: how to get the image files (get_image_files)
- splitter: how to split the data (a random split)
- get_y: how to label; using_attr applies the RegexLabeller function to the "name" attribute of each file
- aug_transforms: the augmentation transforms we saw in lesson 2; they create synthetic variations of each image
- item_tfms first resizes to a very large image (460), then aug_transforms produces a smaller size (224)
Why resize to a large image first and only then shrink?
This Is Called Presizing

Presizing Details
The steps are:
- First, resize grabs a square crop at random: if the image is a portrait, it grabs the full width and a random position from top to bottom.
- Second, the augmentation transforms grab a random, possibly warped and rotated crop (rotation, warping, zooming) and turn it into a smaller square, 224 by 224.

Note: the first step turns the image into a square, but the second step can happen on the GPU. Normally operations like rotating and cropping are pretty slow, and rotation, warping and zooming are actually destructive to the image, because each one requires an interpolation step, which is not just slow but also lowers the image quality.

What is unique in fastai: we keep track of the changing coordinate values in a non-lossy way, as full floating-point values, and only do a single interpolation at the very end.

Look at the teddy bears in the comparison image from the lecture: on the left is the presizing approach, on the right the result of the traditional approach using other Python libraries. There are some weird things in the right-hand image. Flaws:
- less nicely focused
- the grass region looks wrong
- distortion on the sides of the legs
We need our images to have the same dimensions, so that they can collate into tensors to be passed to the GPU. We also want to minimize the number of distinct augmentation computations we perform. The performance requirement suggests that we should, where possible, compose our augmentation transforms into fewer transforms (to reduce the number of computations and the number of lossy operations) and transform the images into uniform sizes (for more efficient processing on the GPU).
The challenge is that, if performed after resizing down to the augmented size, various common data augmentation transforms might introduce spurious empty zones, degrade data, or both. For instance, rotating an image by 45 degrees fills corner regions of the new bounds with emptiness, which will not teach the model anything. Many rotation and zooming operations will require interpolating to create pixels. These interpolated pixels are derived from the original image data but are still of lower quality.
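To see why doing all the destructive operations together, with one interpolation at the end, helps, here is a small illustrative sketch in plain NumPy (not fastai's implementation): a rotation and a zoom can be composed into a single affine matrix, so the pixel values only need to be resampled once instead of once per operation.

import numpy as np

def rotation(deg):
    # 2D rotation as a 3x3 homogeneous (affine) matrix
    t = np.deg2rad(deg)
    return np.array([[np.cos(t), -np.sin(t), 0.],
                     [np.sin(t),  np.cos(t), 0.],
                     [0.,         0.,        1.]])

def zoom(scale):
    # uniform scaling as a 3x3 homogeneous (affine) matrix
    return np.diag([scale, scale, 1.0])

# Composing the coordinate transforms is exact, pure floating-point maths;
# only the final resampling of pixel values needs a (lossy) interpolation.
combined = zoom(1.1) @ rotation(30)

# Naive pipeline: interpolate after the rotation, then again after the zoom
# (two lossy resampling steps). Presizing-style pipeline: track coordinates
# with `combined` and interpolate the image once, at the very end.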
To work around these challenges, presizing adopts two strategies, shown in the figure below. The first step, the resize, creates images large enough that they have spare margin to allow further augmentation transforms on their inner regions without creating empty zones. This transformation works by resizing to a square, using a large crop size. On the training set, the crop area is chosen randomly, and the size of the crop is selected to cover the entire width or height of the image, whichever is smaller. In the second step, the GPU is used for all data augmentation, and all of the potentially destructive operations are done together, with a single interpolation at the end.
"""
<img alt="Presizing on the training set" width="600" caption="Presizing on the training set" id="presizing" src="images/att_00060.png">
"""
This picture shows the two steps:

1. The resize is in item_tfms, so it's applied to each individual image before it is copied to the GPU. It's used to ensure all images are the same size. On the training set, the crop area is chosen randomly. On the validation set, the center square of the image is always chosen.
2. The random crop and augmentation step is in batch_tfms, so it's applied to a batch all at once on the GPU, which means it's fast. On the validation set, only the resize to the final size needed for the model is done here. On the training set, the random crop and any other augmentations are done first.

To implement this process in fastai you use Resize as an item transform with a large size, and RandomResizedCrop as a batch transform with a smaller size. RandomResizedCrop will be added for you if you include the min_scale parameter in your aug_transforms function, as is done in the DataBlock call below. Alternatively, you can use pad or squish instead of crop (the default) for the initial Resize.
pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(seed=42),
                 get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),
                 item_tfms=Resize(460),
                 batch_tfms=aug_transforms(size=224, min_scale=0.75))
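The text above also mentions that you can use pad or squish instead of crop (the default) for the initial Resize. A quick sketch of what that looks like, reusing the same pets DataBlock (the variable names here are just for illustration):

# squish the whole image into a square instead of cropping it
pets_squish = pets.new(item_tfms=Resize(460, ResizeMethod.Squish))

# or pad the image out to a square with black borders
pets_pad = pets.new(item_tfms=Resize(460, ResizeMethod.Pad, pad_mode='zeros'))

dls_squish = pets_squish.dataloaders(path/"images")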
Let's Debug the DataLoader
dls = pets.dataloaders(path/"images")
dls.show_batch(nrows=1, ncols=3)                # look at a few training images with their labels

Let's Debug Augmentation
dls.show_batch(nrows=1, unique=True, ncols=3)   # the same image shown with different augmentations
learn = cnn_learner(dls,resnet34, metrics= error_rate)
learn.fine_tune(2)
Model Interpretation
interpret = ClassificationInterpretation.from_learner(learn)
interpret.plot_confusion_matrix(figsize=(12,12), dpi=20)

Most Confused
interpret.most_confused(min_val=5)   # the category pairs the model confuses most often (at least 5 times)
Improve Model
We will try different techniques to improve the model: a better learning rate, unfreezing, and transfer learning.

Fine-Tune the Learning Rate
Last time we used the default learning rate of 1e-2. Unfortunately, when we use a learning rate that is too high we get a high error; let's run that experiment and compare (the code follows after the questions below). Why is this? See lesson 4 on how too high a learning rate makes the loss jump around.

Leslie Smith's research helps us find a good learning rate. Remember that when we do SGD we look at one mini-batch at a time (a batch of images, in this case), find the gradient for that mini-batch, and then take a step based on the learning rate and the gradient. Leslie Smith's idea: start the first mini-batch with a very low learning rate, then make it a little higher (say 25 percent higher) and do another step, then another 25 percent higher and do another step, and so on. So there are no epochs; it is a single pass over successive mini-batches, and then we can plot a chart. Looking at the plot: at first the steps are very small and the loss barely changes; gradually we reach the point where the steps are big enough to make a difference and the loss comes down, until we get to the point where the learning rate is too high. We want a point on the steep part; in our case we use the range (minimum/10, steepest point).
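Here is a minimal sketch of that idea in plain PyTorch (not fastai's actual learn.lr_find() implementation): increase the learning rate a little after every mini-batch, record the loss, then plot loss against learning rate.

import torch

def lr_finder_sketch(model, loss_fn, dataloader, start_lr=1e-7, mult=1.25):
    # Start from a tiny learning rate and grow it ~25% after every mini-batch,
    # recording the loss, exactly as described above.
    lrs, losses, lr = [], [], start_lr
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for xb, yb in dataloader:
        for g in opt.param_groups:
            g['lr'] = lr
        loss = loss_fn(model(xb), yb)
        loss.backward()
        opt.step()
        opt.zero_grad()
        lrs.append(lr)
        losses.append(loss.item())
        if losses[-1] > 4 * min(losses):   # stop once the loss blows up
            break
        lr *= mult                         # 25 percent higher for the next step
    return lrs, losses                     # plot loss vs lr on a log-scale x axis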
Question: Is the learning rate plot against one mini-batch?
No, it's not; it is just the standard walk through the DataLoaders. It is actually training; the only change is that the learning rate is tweaked after each mini-batch.

Question: Is the network reset to its initial state after each trial?
Certainly not; we want to see how it changes. It keeps going until we are done, so what we are seeing is actual learning happening at the same time as the learning rate is increased.

Question: Why would an ideal learning rate found with a single mini-batch at the start of training keep being a good learning rate even after several epochs of further loss reduction?
It absolutely would not, which is why we recheck the learning rate again later on (see "Recheck Learning Rate" below).

Question: Why the steepest point?
Because the minimum is a point where we won't learn any more; the weights will just be oscillating.

First, let's see what happens with a learning rate that is too high:

learn = cnn_learner(dls, resnet34, lr=1e-1, metrics=error_rate)
learn.fine_tune(2)
learn = cnn_learner(dls,resnet34,metrics= error_rate)
lr_min, lr_steep=learn.lr_find(suggest_funcs=(minimum, steep))
print(f'Minimum/10 : {lr_min:.2e} , "steepest point":{lr_steep:.2e}')
Let's Choose a Learning Rate
Each time you run the finder you get slightly different values, so here we can pick 3e-3. Note that the learning rate finder plot is on a logarithmic scale. Now let's experiment:

learn = cnn_learner(dls, resnet34, metrics=error_rate, lr=3e-3)
learn.fit_one_cycle(3)
Unfreezing

What's Inside Transfer Learning
Let's remind ourselves what transfer learning does. The network is a bunch of linear layers with an activation function (usually ReLU) between them, and each of those linear layers has a bunch of parameters. After training on ImageNet, we have a bunch of parameters that are not random any more. We also saw that early layers learn general features like edges, while later layers learn sophisticated features like eyes and so on. We throw away the last layer, because the last layer is the bit that specifically says which of the 1,000 ImageNet categories an image belongs to; we replace it with random weights (sometimes more than one layer) and train that.

Discussion (Rachel and Jeremy Howard):
I think the learning rate finder, once you learn about it, is an approximate method that just works. I have noticed that a lot of my students at USF have a tendency to jump in and try to account for every possible imperfection at the start, and it is very rare that this is needed; one of the cool things is to try the easier things first. The learning rate finder is a big innovation and super helpful, yet researchers are ignoring it and a lot of people don't know about it. Part of the reason it took a while to catch on is that engineers kind of love using lots of computers: they like to run lots of experiments on big clusters to find out which learning rate is best, rather than one batch at a time. I think fastai was the first library to ship it.

Methods Inside Fine Tune
Okay, so transfer learning: a pretrained network comes with the architecture, and now we want to fine-tune it, in our case for breed classification. Do have a look at learn.fine_tune?? for the details. All layers except the last are pretrained. Unfreezing means all parameters are now being stepped: gradients are calculated for all of them, and then we fit for some epochs. We can do it by hand:
learn = cnn_learner(dls,resnet34,metrics = error_rate)
learn.fit_one_cycle(3,3e-3)
Recheck Learning Rate
cnn_learner freezes the model for us by default, so we could just call fit_one_cycle and train only the last layer. Now let's unfreeze and run lr_find again; you will not see the rapid drop this time. We are now improving, at around 0.05 error rate. Let's see how much better we can do.
learn.unfreeze()
learn.lr_find(suggest_funcs=(minimum, steep))
learn.fit_one_cycle(6,lr_max=1e-5)
Let's Learn About Slice
This is something called discriminative learning rates; Jason Yosinski's research shows the theory behind it. We pass a slice, e.g. slice(1e-6, 1e-4): the very first layers get 1e-6, the last layers get 1e-4, and the layer groups in between get learning rates spaced by equal multiples between the two, so they are equally spaced on a log scale.
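As an illustrative sketch (not fastai's exact code): spacing learning rates by equal multiples between the two ends of the slice means spacing them geometrically, for example across five hypothetical layer groups:

import numpy as np

# equal multiples between slice(1e-6, 1e-4), here for 5 illustrative groups
group_lrs = np.geomspace(1e-6, 1e-4, num=5)
print(group_lrs)
# -> [1.00e-06 3.16e-06 1.00e-05 3.16e-05 1.00e-04]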
Fit One Cycle Working
We overshoot here, but fit_one_cycle works a bit differently. It actually starts at a low learning rate and increases it gradually for roughly the first 1/3 of the batches until it gets to the high learning rate (the highest one, which is why the parameter is called lr_max); then, for the remaining 2/3 or so of the batches, it gradually decreases it again. The reason is simply that, empirically, researchers have found this works best. It was developed again by Leslie Smith, the same person who created the learning rate finder, and it dramatically increased the speed at which we can train networks. The academic community largely ignored it; in fact the key publication that developed the idea was not even peer reviewed. The reason I mention this now is that we don't just want to go back and pick the model that was trained earlier in the run, because we could probably do better: we really want to pick a model that finished training at a low learning rate. So I want to change the number of epochs to 8, because at epoch 8 we ... (A rough sketch of the schedule shape appears after the code below.)

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fit_one_cycle(3,3e-3)
learn.unfreeze()
learn.fit_one_cycle(12,lr_max=slice(1e-6,1e-4))
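As promised above, here is a rough sketch of the one-cycle schedule shape (fastai's actual schedule uses its own cosine segments and defaults, so treat this purely as an illustration of the warm-up-then-decay idea):

import numpy as np

def one_cycle_sketch(n_steps, lr_max, pct_warmup=1/3, start_div=25, end_div=1e4):
    # warm up from a low lr to lr_max for the first ~1/3 of the batches ...
    warm = int(n_steps * pct_warmup)
    up = np.linspace(lr_max / start_div, lr_max, warm)
    # ... then decay back towards ~0 for the remaining ~2/3 (cosine-shaped here)
    t = np.linspace(0, np.pi, n_steps - warm)
    down = lr_max / end_div + (lr_max - lr_max / end_div) * (1 + np.cos(t)) / 2
    return np.concatenate([up, down])

lrs = one_cycle_sketch(n_steps=300, lr_max=3e-3)
# plotting lrs shows the rise over the first third and the long decay afterwards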