3. Breed Classification Using PyTorch/FASTAI
A beginner tutorial: with PyTorch and fastai you can create your own image classifier.
- 3. Neural Network
- Image Classification
- Why we are using BASE_PATH
- Remember
- Little Experiment with re (Regular Expressions)
- DataBlock
- This is called Presizing
- Presizing Details
- Let's Debug the DataLoader
- Let's Debug Augmentation
- Failure in DataBlock
- Question
- Train Model helps Clean Data
- Cross Entropy
- Let's see what's inside a Batch
- View the Predictions
- How do we go about this prediction?
- Taking the Sigmoid
- Solution
- More than 2 Categories
- Interesting thing
- Log Likelihood
- Replace torch.where
- How to make it work for more than 2 Columns
- Negative Log Likelihood
- Let's Talk about Logs
- Two Ways for Cross-Entropy Loss
- Why does the loss function need the negative?
This tutorial is based on Lecture 4 of the fast.ai Deep Learning for Coders course. I will go through, step by step, how to build a classifier using PyTorch from scratch.
from fastai.vision.all import *
path = untar_data(URLs.PETS)
# Setting BASE_PATH makes paths display relative to `path`, so ls() output is easier to read
Path.BASE_PATH = path
path.ls()
(path/"images").ls()
Remember
Most of the functions we use in fastai return objects of class L rather than a plain Python list. L is an enhanced list: it shows the number of items, and extra items beyond the first few are denoted with "...".
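For example (a minimal sketch; L comes from fastcore, which fastai re-exports):

from fastcore.foundation import L
t = L(range(20))
t
# The repr shows the item count and truncates long lists, e.g. (#20) [0,1,2,3,...]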
Last time, the rule was: if the first letter of the filename is a capital, it is a cat, otherwise a dog. Here our case is different: we want the breed, and a regular expression helps us get the labels from the filenames. Please look up Python's re module if you haven't gone through it before. The fast.ai NLP course also has a couple of lessons on regular expressions. They can be a bit hard to get sometimes.
Let's pick a file name and see what it looks like:
fname = (path/"images").ls()[0]
fname.name
Little Experiment with re (Regular Expressions)
- re is Python's regular expression module.
- findall grabs all the parts of the regular expression that have parentheses around them (the capture groups).
- The r prefix creates a raw string, which tells Python not to treat backslashes specially (in a normal string, for example, \n is a newline).
- r'(.+)_\d+.jpg$' means: capture any character "." repeated one or more times "+", followed by an underscore "_", followed by one or more digits "\d+", followed by any character "." (ideally this would be \. for a literal dot), followed by "jpg", followed by the end of the string "$".
re.findall(r'(.+)_\d+.jpg$',fname.name)
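For example (a small sketch using made-up filenames that follow the dataset's pattern):

import re
# The capture group (.+) grabs everything before the trailing _<digits>.jpg
re.findall(r'(.+)_\d+.jpg$', 'great_pyrenees_173.jpg')   # ['great_pyrenees']
re.findall(r'(.+)_\d+.jpg$', 'Abyssinian_12.jpg')        # ['Abyssinian'] (capitalised = cat breed)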
DataBlock
- blocks expects the types of the independent and dependent variables (image and category).
- get_items gets the image files.
- splitter randomly splits the data into training and validation sets.
- get_y uses using_attr, which takes the RegexLabeller function and applies it to the "name" attribute of each path.
- aug_transforms we saw in the lesson 2 augmentation section; it is basically synthetic data augmentation.
- We Resize to a very large image (460) first, then use aug_transforms to get to a smaller size.
Why?
This is called Presizing
The detailed steps:
- First, Resize grabs a square crop at a random position: for a portrait image it grabs the full width and a random slice from top to bottom (and the full height with a random horizontal position for a landscape image).
- Second, aug_transforms grabs a random warped crop, possibly rotated, and turns it into a square at the smaller size of 224 by 224 (rotation, warping, zooming).
Note: the first step turns the image into a large square, but the second step can happen on the GPU. Normally things like rotating and cropping are pretty slow, and rotation, warping, and zooming are actually destructive to the image, because each one requires an interpolation step, which is not just slow but also lowers image quality.
What's unique in fastai: we keep track of the changing coordinate values in a non-lossy way, at full floating-point precision, and only do a single interpolation once at the very end.
Look at the teddy bears below: left is the presizing approach, right is the standard approach using external Python libraries. There are some weird things in the right-hand image. Flaws:
- less nicely focused
- the grass is distorted
- distortion on the sides of the leg
We need our images to have the same dimensions, so that they can collate into tensors to be passed to the GPU. We also want to minimize the number of distinct augmentation computations we perform. The performance requirement suggests that we should, where possible, compose our augmentation transforms into fewer transforms (to reduce the number of computations and the number of lossy operations) and transform the images into uniform sizes (for more efficient processing on the GPU).
The challenge is that, if performed after resizing down to the augmented size, various common data augmentation transforms might introduce spurious empty zones, degrade data, or both. For instance, rotating an image by 45 degrees fills corner regions of the new bounds with emptiness, which will not teach the model anything. Many rotation and zooming operations will require interpolating to create pixels. These interpolated pixels are derived from the original image data but are still of lower quality.
To work around these challenges, presizing adopts two strategies, shown in the presizing figure further below. The first step, the resize, creates images large enough that they have spare margin to allow further augmentation transforms on their inner regions without creating empty zones. This transformation works by resizing to a square, using a large crop size. On the training set, the crop area is chosen randomly, and the size of the crop is selected to cover the entire width or height of the image, whichever is smaller. In the second step, the GPU is used for all data augmentation, and all of the potentially destructive operations are done together, with a single interpolation at the end. The figure shows the two steps, and the fastai implementation follows after it.
Why train a model early? The initial model will help you clean the data. Remember:
- interp.plot_top_losses helps us identify mislabeled images,
- the confusion matrix shows us where the model is confused,
- ImageClassifierCleaner lets us inspect the top confused categories (for example, the two kinds of bears).
The model helps you, and then you go ahead and retrain after cleaning the data.
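A minimal sketch of that cleaning workflow (assuming a trained learn, like the Learner created further below; ImageClassifierCleaner comes from fastai.vision.widgets):

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(12,12), dpi=60)   # where is the model confused?
interp.plot_top_losses(5, nrows=1)                      # likely mislabelled / hardest images
from fastai.vision.widgets import ImageClassifierCleaner
cleaner = ImageClassifierCleaner(learn)                 # widget to relabel or delete items
cleaner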
Notebook 4 included writing a loss function by hand; here fastai automatically picks a good loss function, so let's look at which one it actually picks (learn.loss_func, shown below). Recall the 3s-versus-7s example: taking the sigmoid of the difference of the two activations gives the probability of it being a 3, and the second column (the probability of it being a 7) will then just be that value subtracted from 1. Now we need a way to do all this that also works for more than two columns. That function is softmax, which is built on the exponential function (exp), and e to the power of something grows really fast.
See below: if one activation is even a bit bigger than the others, softmax makes it much bigger still, so softmax really tries to pick one class. That is not always what you want; at inference time you may want the model to be a bit more cautious. But it is the default and it is what you do most of the time, so that is softmax (try math.exp(6) below to see how fast the exponential grows).
The indexing expression sm_acts[idx, targ] used below is super nifty and worth playing with: the first index says which row to return, the second says which column, and it works for more than two columns too.
Problem: the function we saw in the previous section works quite well as a loss function, but we can make it a bit better. The problem is that we are using probabilities, and probabilities cannot be smaller than 0 or greater than 1. That means our model will not care whether it predicts 0.99 or 0.999. Those numbers are very close together, but in another sense 0.999 is 10 times more confident than 0.99; if we have 1,000 things, 0.999 really is better than 0.99. So we want to transform our numbers between 0 and 1 to instead be between negative infinity and positive infinity. There is a mathematical function that does exactly this: the logarithm (available as torch.log). As the input gets closer to zero, the log goes down towards negative infinity; at 1 it is zero; and we cannot take the log of zero itself. This is also where the negative in the loss function comes from: the log of a probability is negative, so we take its negative to get a loss that is positive and goes down as the model gets more confident in the correct label.
The logarithm is defined so that if y = b**a then a = log(y, b), i.e. log base b of y. What is interesting is that log(a*b) = log(a) + log(b). The product a*b can get very, very big or very small, but the sum of the logs stays under control. When we take probabilities such as sm_acts, take their log, negate, and take the mean, that is called the negative log likelihood.
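A quick numeric check of that log rule (a small sketch with arbitrary numbers):

a, b = tensor(0.2), tensor(0.3)
torch.log(a * b), torch.log(a) + torch.log(b)   # both come out to about -2.81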
If we take the softmax, then the log, and pass the result to nll_loss, we get cross-entropy loss. Why doesn't nll_loss take the log itself? The reason is that it is more convenient (and more accurate) to take the log back at the softmax step, so PyTorch has the function log_softmax. Since that is the easier path, PyTorch assumes you have already applied log_softmax and you then pass the result to the nll_loss function.
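A small self-contained sketch of that equivalence (the tensors here are made up purely for illustration):

import torch
import torch.nn.functional as F

demo_acts = torch.randn(3, 2)           # hypothetical activations: 3 items, 2 classes
demo_targ = torch.tensor([0, 1, 1])     # hypothetical targets
two_step = F.nll_loss(F.log_softmax(demo_acts, dim=1), demo_targ)
one_step = F.cross_entropy(demo_acts, demo_targ)   # does log_softmax + nll_loss in one go
two_step, one_step                      # identical up to floating point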
"""
<img alt="Presizing on the training set" width="600" caption="Presizing on the training set" id="presizing" src="images/att_00060.png">
"""
- item_tfms: applied to each individual image before it is copied to the GPU. It's used to ensure all images are the same size. On the training set, the crop area is chosen randomly. On the validation set, the center square of the image is always chosen.
- batch_tfms: applied to a batch all at once on the GPU, which means it's fast. On the validation set, only the resize to the final size needed for the model is done here. On the training set, the random crop and any other augmentations are done first.
To implement this process in fastai you use Resize as an item transform with a large size, and RandomResizedCrop as a batch transform with a smaller size. RandomResizedCrop will be added for you if you include the min_scale parameter in your aug_transforms function, as is done in the DataBlock call below. Alternatively, you can use pad or squish instead of crop (the default) for the initial Resize.
pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
get_items = get_image_files,
splitter= RandomSplitter(seed=42),
get_y = using_attr(RegexLabeller(r'(.+)_\d+.jpg$'),'name'),
item_tfms=Resize(460),
batch_tfms=aug_transforms(size=224, min_scale=0.75))
dls = pets.dataloaders(path/"images")
dls.show_batch(nrows=1,ncols=3)
# unique=True repeats the same image so we can see different augmentations applied to it
dls.show_batch(nrows=1,unique=True,ncols=3)
pets1 = DataBlock(blocks = (ImageBlock,CategoryBlock),
get_items= get_image_files,
splitter=RandomSplitter(seed=42),
get_y = using_attr(RegexLabeller(r'(.+)_\d+.jpg$'),"name"))
# pets1 has no item_tfms, so the differently sized images cannot be collated into a batch; summary() shows where it fails
pets1.summary(path/"images")
learn = cnn_learner(dls,resnet34, metrics= error_rate)
learn.fine_tune(2)
Train Model helps Clean Data
fastai picked a loss function for us automatically; let's see which one:
learn.loss_func
x,y = dls.one_batch()
y
dls.vocab
dls.vocab[16]
preds,_ = learn.get_preds(dl= [(x,y)])
preds[0]
len(preds[0]), preds[0].sum()
torch.random.manual_seed(42)
acts = torch.randn((6,2))*2
acts
# How likely is each item to be a 3, and how likely to be a 7?
acts.sigmoid()
diff = acts[:,0]- acts[:,1]
diff
torch.stack([diff.sigmoid(),1-diff.sigmoid()],dim=1)
The function that does this, and also works for more than two columns, is called softmax, and it is exactly that:
def softmax(x): return exp(x) / exp(x).sum(dim=1, keepdim=True)
jargon: Exponential function (exp): literally defined as e**x, where e is a special number approximately equal to 2.718. It is the inverse of the natural logarithm function. Note that exp is always positive, and it increases very rapidly!
Let's check that softmax returns the same values as sigmoid for the first column, and those values subtracted from 1 for the second column:
sm_acts = torch.softmax(acts, dim=1)
sm_acts
softmax is the multi-category equivalent of sigmoid: we have to use it any time we have more than two categories and the probabilities of the categories must add to 1, and we often use it even when there are just two categories, just to make things a bit more consistent. We could create other functions that have the properties that all activations are between 0 and 1, and sum to 1; however, no other function has the same relationship to the sigmoid function, which we've seen is smooth and symmetric. Also, we'll see shortly that the softmax function works well hand-in-hand with the loss function we will look at in the next section.
If we have three output activations, such as in our bear classifier, calculating softmax for a single bear image would look like this:
<img alt="Bear softmax example" width="280" id="bear_softmax" caption="Example of softmax on the bear classifier" src="images/att_00062.png">
Interesting thing
math.exp(4)
math.exp(6)
targ = tensor([0,1,0,1,1,0])
sm_acts
Replace torch.where
idx = range(6)
sm_acts[idx,targ]
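As a cross-check (a small sketch using the sm_acts and targ defined above), this indexing gives exactly the same values as the torch.where approach it replaces; with two columns, picking column targ for each row is equivalent to:

torch.where(targ==1, sm_acts[:,1], sm_acts[:,0])   # column 1 where targ is 1, else column 0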
from IPython.display import HTML
df = pd.DataFrame(sm_acts , columns=["3","7"])
df["target"] = targ
df["idx"] = idx
df["loss"] = sm_acts[range(6), targ]
t= df.style.hide_index()
# Strip the generated <style> block and table id so the HTML renders cleanly here
html = t._repr_html_().split('</style>')[1]
html = re.sub(r'<table id="([^"]+)"\s*>', r'<table >', html)
display(HTML(html))
# nll_loss gives the same result: despite its name it does not take the log, it just picks out the target column and negates it
-sm_acts[idx, targ]
F.nll_loss(sm_acts,targ,reduction="none")
Let's Talk about Logs
The logarithm is available as torch.log. It is not defined for numbers less than 0, and looks like this:
torch.log(tensor(0.01)), torch.log(tensor(1)), torch.log(tensor(0.))
loss_func=nn.CrossEntropyLoss()
loss_func(acts,targ)
F.cross_entropy(acts,targ)
nn.CrossEntropyLoss(reduction="none")(acts,targ)
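To tie it all together (a small check using the tensors from above): the per-item cross-entropy values are just the negative log of the probabilities we picked out with the indexing trick:

-torch.log(sm_acts[idx, targ])   # matches nn.CrossEntropyLoss(reduction="none")(acts, targ) up to floating point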