3. Neural Network

A tutorial for beginners: with PyTorch and fastai you can create your own classifier.

  • toc:true - badges: true
  • comments: true
  • author: HAFIZ AHMAD HASSAN & Jeremy Howard
  • categories: [jupyter]
  • image: images/chart-preview.png

This tutorial is based on Lecture 4 of the fast.ai course Deep Learning for Coders. I will go through, step by step, how to build a classifier using PyTorch from scratch.

Image Classification

from fastai.vision.all import *
path = untar_data(URLs.PETS)

Why we are using BASE_PATH

We want our data paths to be displayed relative to the dataset path, which is much easier to read. Look at path.ls():

Path.BASE_PATH = path
path.ls()
(#4) [Path('annotations'),Path('images'),Path('models'),Path('crappy')]
(path/"images").ls()
(#7394) [Path('images/Sphynx_245.jpg'),Path('images/miniature_pinscher_55.jpg'),Path('images/havanese_20.jpg'),Path('images/miniature_pinscher_34.jpg'),Path('images/samoyed_91.jpg'),Path('images/chihuahua_123.jpg'),Path('images/yorkshire_terrier_155.jpg'),Path('images/Egyptian_Mau_79.jpg'),Path('images/scottish_terrier_23.jpg'),Path('images/basset_hound_198.jpg')...]

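Remember

Most fastai functions that return a collection return an instance of the class L instead of a plain list. L is an enhanced list: it shows the number of items first, and any items beyond the first few are shown as "...".

Last time the label was in the first letter of the file name: a capital letter meant cat, otherwise dog.

Here our case is different: the breed is the part of the file name before the trailing number.

Regular expressions help us to get the labels.

Please search for Python's re module if you have not gone through it before.

The fast.ai NLP course has about two lessons on regular expressions.

They can be a bit hard to get sometimes.

Let's pick a file name and see what it looks like: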

fname = (path/"images").ls()[0]
fname.name
'Sphynx_245.jpg'

Little Experiment with RE (Regular Expressions)

  1. re is Python's regular expression module.
  2. findall grabs all the parts of the string matched by the regular expression that have parentheses around them.
  3. The r prefix makes a raw string, which says: do not treat backslashes specially. Remember that in a normal Python string a backslash starts an escape sequence, for example \n is a newline.
  4. r'(.+)_\d+.jpg$' means: "(.+)" picks any character ("."), repeated one or more times ("+"); this is followed by an underscore "_"; "\d+" is followed by one or more digits; "." is followed by any single character (here the literal "."); this is followed by "jpg"; and "$" is followed by the end of the string.
re.findall(r'(.+)_\d+.jpg$',fname.name)
['Sphynx']
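To make the pattern easier to play with, here is a minimal sketch that applies it to a few of the file names listed earlier. The helper name label_from_name is hypothetical, and the dot before "jpg" is escaped here so that it only matches a literal "." (an assumption about the intent; the unescaped pattern gives the same result on these names).

```python
import re

# Same idea as the pattern above; "\." matches only a literal dot.
PATTERN = re.compile(r'(.+)_\d+\.jpg$')

def label_from_name(file_name):
    """Return the breed part of a pet file name, or None if it does not match."""
    match = PATTERN.search(file_name)
    return match.group(1) if match else None

for name in ['Sphynx_245.jpg', 'miniature_pinscher_55.jpg', 'Egyptian_Mau_79.jpg']:
    print(name, '->', label_from_name(name))

# A raw string keeps the backslash as an ordinary character:
# '\n' is one character (a newline), r'\n' is two (backslash and n).
print(len('\n'), len(r'\n'))
```

Because "(.+)" is greedy, breed names that contain underscores themselves (such as miniature_pinscher) are still captured whole; only the last "_digits.jpg" part is cut off.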

DataBlock

  1. blocks: the types of the independent and dependent variables (images and categories).
  2. get_items: gets the image files.
  3. splitter: splits the data randomly into training and validation sets.
  4. get_y: using_attr takes the RegexLabeller function and applies it to the "name" attribute of each file path.
  5. aug_transforms: the augmentation transforms we saw in Lesson 2; they basically create synthetic variations of the data.
  6. Resize to a fairly large image (460) first, then use aug_transforms to bring it down to a smaller size (224).

Why?

This is called presizing.

The details are in the steps below:

  1. Resize grabs a square at random: if the image is portrait, it takes the full width and a random position from top to bottom.
  2. The augmentation transforms then grab a random warped crop, possibly rotated (rotation, warping, zooming), and turn it into a smaller square of 224 by 224.

Note: the first step only turns the image into a square. The second step can happen on the GPU, which matters because things like rotating and cropping are normally pretty slow.

Rotation, warping and zooming are actually destructive to the image, because each one requires an interpolation step, which is not just slow but also lowers the image quality.

What is unique in fastai

fastai keeps track of the changing coordinate values in a non-lossy way, as full floating point values, and does the interpolation only once at the very end.

Look at the teddy bears

Left: the presizing approach. Right: the approach used by other Python libraries.

There are weird things in the image on the right. Flaws:

  1. less nicely focused
  2. the grass is rendered incorrectly
  3. distortion along the sides of the leg

Details Presizing

We need our images to have the same dimensions, so that they can collate into tensors to be passed to the GPU. We also want to minimize the number of distinct augmentation computations we perform. The performance requirement suggests that we should, where possible, compose our augmentation transforms into fewer transforms (to reduce the number of computations and the number of lossy operations) and transform the images into uniform sizes (for more efficient processing on the GPU).

The challenge is that, if performed after resizing down to the augmented size, various common data augmentation transforms might introduce spurious empty zones, degrade data, or both. For instance, rotating an image by 45 degrees fills corner regions of the new bounds with emptiness, which will not teach the model anything. Many rotation and zooming operations will require interpolating to create pixels. These interpolated pixels are derived from the original image data but are still of lower quality.

To work around these challenges, presizing adopts two strategies, shown in the presizing figure below:

  1. Resize images to relatively "large" dimensions—that is, dimensions significantly larger than the target training dimensions.
  2. Compose all of the common augmentation operations (including a resize to the final target size) into one, and perform the combined operation on the GPU only once at the end of processing, rather than performing the operations individually and interpolating multiple times.

The first step, the resize, creates images large enough that they have spare margin to allow further augmentation transforms on their inner regions without creating empty zones. This transformation works by resizing to a square, using a large crop size. On the training set, the crop area is chosen randomly, and the size of the crop is selected to cover the entire width or height of the image, whichever is smaller.

In the second step, the GPU is used for all data augmentation, and all of the potentially destructive operations are done together, with a single interpolation at the end.
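The idea of composing transforms before interpolating can be illustrated on plain coordinates. The sketch below is not fastai's implementation; it only assumes that rotation and zoom can each be written as a 2x2 matrix (the function names are hypothetical) and shows that one combined matrix moves a point to the same place as the two separate steps. With images, the combined matrix would be used for a single interpolation instead of one per step.

```python
import math

def rotation(degrees):
    """2x2 matrix that rotates coordinates counter-clockwise by `degrees`."""
    t = math.radians(degrees)
    return [[math.cos(t), -math.sin(t)],
            [math.sin(t),  math.cos(t)]]

def zoom(factor):
    """2x2 matrix that scales coordinates by `factor`."""
    return [[factor, 0.0], [0.0, factor]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(m, point):
    """Apply a 2x2 matrix to an (x, y) point."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y, m[1][0] * x + m[1][1] * y)

point = (10.0, 5.0)

# Two separate steps: rotate first, then zoom.
step_by_step = apply(zoom(1.2), apply(rotation(15), point))

# One combined transform, applied once.
combined = matmul(zoom(1.2), rotation(15))
one_shot = apply(combined, point)

print(step_by_step, one_shot)
```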

<img alt="Presizing on the training set" width="600" caption="Presizing on the training set" id="presizing" src="images/att_00060.png">

This picture shows the two steps:

  1. Crop full width or height: This is in item_tfms, so it's applied to each individual image before it is copied to the GPU. It's used to ensure all images are the same size. On the training set, the crop area is chosen randomly. On the validation set, the center square of the image is always chosen.
  2. Random crop and augment: This is in batch_tfms, so it's applied to a batch all at once on the GPU, which means it's fast. On the validation set, only the resize to the final size needed for the model is done here. On the training set, the random crop and any other augmentations are done first.
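The first step can be sketched in a few lines. This is only an illustration of the rule described above (a square covering the shorter edge, random position for training, centered for validation); the function name square_crop_box is hypothetical and fastai's Resize does more than this.

```python
import random

def square_crop_box(width, height, train=True, rng=random):
    """Return (left, top, side) of a square crop covering the shorter edge."""
    side = min(width, height)
    if train:
        # Training set: random position along the longer edge.
        left = rng.randint(0, width - side)
        top = rng.randint(0, height - side)
    else:
        # Validation set: always the center square.
        left = (width - side) // 2
        top = (height - side) // 2
    return left, top, side

print(square_crop_box(500, 375, train=False))  # (62, 0, 375)
print(square_crop_box(375, 500, train=True))   # full width, random top
```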

To implement this process in fastai you use Resize as an item transform with a large size, and RandomResizedCrop as a batch transform with a smaller size. RandomResizedCrop will be added for you if you include the min_scale parameter in your aug_transforms function, as was done in the DataBlock call in the previous section. Alternatively, you can use pad or squish instead of crop (the default) for the initial Resize.

The teddy bear comparison shows the difference between an image that has been zoomed, interpolated, rotated, and then interpolated again (which is the approach used by all other deep learning libraries), shown here on the right, and an image that has been zoomed and rotated as one operation and then interpolated just once (the fastai approach), shown here on the left.

pets = DataBlock( blocks =(ImageBlock, CategoryBlock),
                get_items = get_image_files,
                splitter= RandomSplitter(seed=42),
                get_y = using_attr(RegexLabeller(r'(.+)_\d+.jpg$'),'name'),
                item_tfms=Resize(460),
                batch_tfms=aug_transforms(size=224, min_scale=0.75))
dls = pets.dataloaders(path/"images")

Let's Debug the DataLoaders

show_batch displays the items of a mini-batch, so you can check whether the data was loaded properly:

dls.show_batch(nrows=1,ncols=3)

Let's Debug the Augmentation

Pass unique=True to see the same image with different augmentations applied:

dls.show_batch(nrows=1,unique=True,ncols=3)

Failure in a DataBlock

Issues

  1. different images have different sizes
  2. so they cannot be collated into a batch

summary runs the pipeline step by step, so you can see everything that happens and where it fails:

pets1 = DataBlock(blocks = (ImageBlock,CategoryBlock),
                 get_items= get_image_files,
                 splitter=RandomSplitter(seed=42),
                 get_y = using_attr(RegexLabeller(r'(.+)_\d+.jpg$'),"name"))
pets1.summary(path/"images")
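The collation problem can be reproduced in isolation with plain PyTorch. This is a minimal sketch with made-up image sizes, not what summary does internally: stacking tensors of different shapes raises an error, while tensors that were all resized to one shape stack into a batch.

```python
import torch

# Two fake RGB "images" with different heights and widths (made-up sizes).
small = torch.zeros(3, 300, 400)
large = torch.zeros(3, 500, 375)

try:
    torch.stack([small, large])
    stacked_ok = True
except RuntimeError as err:
    stacked_ok = False
    print('Cannot collate:', err)

# Once every item has the same size (what Resize guarantees), stacking works.
same_size = [torch.zeros(3, 460, 460), torch.zeros(3, 460, 460)]
batch = torch.stack(same_size)
print(batch.shape)
```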

Question

What if your image is smaller than the Resize size?

Answer: if you remember, in an earlier lesson we looked at different ways to resize:

  • squish
  • pad
  • etc.

Squish and pad will help you here.

Your model can teach you about the problems in your data

We are getting an error rate of about 7.6 percent:

learn = cnn_learner(dls,resnet34, metrics= error_rate)
learn.fine_tune(2)
epoch train_loss valid_loss error_rate time
0 1.499157 0.335323 0.113667 00:42
epoch train_loss valid_loss error_rate time
0 0.497216 0.289698 0.098106 00:52
1 0.333692 0.221010 0.076455 00:53

A Trained Model Helps You Clean the Data

Why?

Even an initial model will help you clean the data.

Remember we have:

  • plot_top_losses, which helps us identify mislabelled items
  • the confusion matrix, which shows us where the model is confused
  • ImageClassifierCleaner, which lets us review, for example, the two bear classes that are confused most often

Let the model help you, then go ahead and train again after cleaning the data.

Notebook 4 covered loss functions. fastai automatically picks a good loss function.

Let's look at what it actually picked:

learn.loss_func
FlattenedLoss of CrossEntropyLoss()

Cross Entropy

It is the same idea as the MNIST loss, in a kind of extended version.

torch.where only works when you have a binary outcome.

We want to create something just like that, but make it work for more than two categories.

Let's see what is inside a batch

Destructure one batch into x and y (the batch size is 64):

dls.vocab

x,y = dls.one_batch()
y
TensorCategory([ 0, 25, 33, 19, 12, 22, 35, 17, 23, 32,  8,  5,  3, 33, 36, 34, 32,  7,  1, 14,  5, 14,  8, 36, 18, 13, 22,  5,  0, 20, 18, 33, 28, 28, 19, 33, 26,  0, 30, 25, 27, 23, 31,  1, 17, 13,  8, 23,
        34, 24, 28,  7, 13, 12, 31, 10, 29, 33, 22,  0, 21, 20,  3,  6], device='cuda:0')
dls.vocab
['Abyssinian', 'Bengal', 'Birman', 'Bombay', 'British_Shorthair', 'Egyptian_Mau', 'Maine_Coon', 'Persian', 'Ragdoll', 'Russian_Blue', 'Siamese', 'Sphynx', 'american_bulldog', 'american_pit_bull_terrier', 'basset_hound', 'beagle', 'boxer', 'chihuahua', 'english_cocker_spaniel', 'english_setter', 'german_shorthaired', 'great_pyrenees', 'havanese', 'japanese_chin', 'keeshond', 'leonberger', 'miniature_pinscher', 'newfoundland', 'pomeranian', 'pug', 'saint_bernard', 'samoyed', 'scottish_terrier', 'shiba_inu', 'staffordshire_bull_terrier', 'wheaten_terrier', 'yorkshire_terrier']
dls.vocab[16]
'boxer'

View the predictions

These are just the final activations of the model, one per category:

preds,_ = learn.get_preds(dl= [(x,y)])
preds[0]
tensor([9.5199e-01, 1.0111e-02, 6.5786e-05, 2.1975e-04, 7.8775e-04, 3.6360e-03, 3.0038e-03, 1.0312e-04, 2.4044e-02, 5.6874e-04, 5.1095e-05, 3.3239e-03, 2.5059e-05, 5.7563e-06, 1.0179e-05, 9.1572e-06,
        8.6417e-06, 2.3385e-04, 1.0258e-05, 1.3129e-05, 7.4432e-06, 8.3646e-06, 5.2068e-05, 1.6303e-05, 4.9455e-06, 6.5756e-06, 6.5956e-05, 9.1565e-06, 3.6239e-05, 1.1460e-05, 1.4704e-05, 3.3519e-05,
        9.6280e-04, 3.4956e-04, 7.9019e-07, 1.6408e-05, 1.7885e-04])
len(preds[0]), preds[0].sum()
(37, tensor(1.0000))

How do we get to this prediction?

Softmax is an extension of sigmoid that handles more than two categories.

What if we want 37 categories?

We need one activation per category. For example, in the 3 versus 7 case there are two activations.

Below, the first column is the activation for the first category (3) and the second column is the activation for 7,

that is, how much the item looks like a 3 and how much it looks like a 7.

torch.random.manual_seed(42)
acts = torch.randn((6,2))*2
# How much each item looks like a 3 (first column) and like a 7 (second column)
acts
tensor([[ 0.6734,  0.2576],
        [ 0.4689,  0.4607],
        [-2.2457, -0.3727],
        [ 4.4164, -1.2760],
        [ 0.9233,  0.5347],
        [ 1.0698,  1.6187]])

Taking the Sigmoid

If we take the sigmoid, the values will be between 0 and 1,

but the two values in a row do not add up to one,

so they do not make sense as probabilities of the two categories.

acts.sigmoid()
tensor([[0.6623, 0.5641],
        [0.6151, 0.6132],
        [0.0957, 0.4079],
        [0.9881, 0.2182],
        [0.7157, 0.6306],
        [0.7446, 0.8346]])

Solution:

  1. Take the difference between the two activations: it expresses the relative confidence of 3 versus 7.

  2. Then take the sigmoid of that difference.

diff = acts[:,0]- acts[:,1]
diff
tensor([ 0.4158,  0.0083, -1.8731,  5.6924,  0.3886, -0.5489])
torch.stack([diff.sigmoid(),1-diff.sigmoid()],dim=1)
tensor([[0.6025, 0.3975],
        [0.5021, 0.4979],
        [0.1332, 0.8668],
        [0.9966, 0.0034],
        [0.5959, 0.4041],
        [0.3661, 0.6339]])
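The same check can be done by hand for a single row. The sketch below uses only the standard library and the first row of acts shown above (rounded to four decimals); it compares the two separate sigmoids with the sigmoid of the difference, and with the normalised-exponential form exp(a3) / (exp(a3) + exp(a7)).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

a3, a7 = 0.6734, 0.2576   # first row of acts above (rounded)

# Sigmoid of each activation on its own: does not sum to 1.
separate = (sigmoid(a3), sigmoid(a7))

# Sigmoid of the difference, and 1 minus it: sums to 1.
p3 = sigmoid(a3 - a7)
relative = (p3, 1.0 - p3)

# The same number written as a normalised exponential.
softmax_p3 = math.exp(a3) / (math.exp(a3) + math.exp(a7))

print(separate, sum(separate))
print(relative, sum(relative))
print(softmax_p3)
```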

More than 2 categories:

Use softmax.

In the binary case it is equal to the sigmoid of the difference.

The second column (the probability of it being a 7) will then just be that value subtracted from 1. Now, we need a way to do all this that also works for more than two columns. It turns out that this function, called softmax, is exactly that:

def softmax(x): return torch.exp(x) / torch.exp(x).sum(dim=1, keepdim=True)

Jargon: Exponential function (exp). Literally defined as e**x, where e is a special number approximately equal to 2.718. It is the inverse of the natural logarithm function. Note that exp is always positive, and it increases very rapidly!

Let's check that softmax returns the same values as sigmoid for the first column, and those values subtracted from 1 for the second column:

sm_acts = torch.softmax(acts, dim=1)
sm_acts
tensor([[0.6025, 0.3975],
        [0.5021, 0.4979],
        [0.1332, 0.8668],
        [0.9966, 0.0034],
        [0.5959, 0.4041],
        [0.3661, 0.6339]])

softmax is the multi-category equivalent of sigmoid—we have to use it any time we have more than two categories and the probabilities of the categories must add to 1, and we often use it even when there are just two categories, just to make things a bit more consistent. We could create other functions that have the properties that all activations are between 0 and 1, and sum to 1; however, no other function has the same relationship to the sigmoid function, which we've seen is smooth and symmetric. Also, we'll see shortly that the softmax function works well hand-in-hand with the loss function we will look at in the next section.

If we have three output activations, such as in our bear classifier, calculating softmax for a single bear image would then look something like the figure below.

<img alt="Bear softmax example" width="280" id="bear_softmax" caption="Example of softmax on the bear classifier" src="images/att_00062.png">

Interesting thing

e to the power of something grows really fast, see below.

If one activation is just a bit bigger than the others,

then its softmax output is much bigger,

so softmax really tries to pick one category.

That is not always what you want: sometimes, at inference time, you want the model to be a bit more cautious.

But it is the default, and what you want most of the time.

So that is softmax.

math.exp(4)
54.598150033144236

math.exp(6)
403.4287934927351
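To see what this growth does to the probabilities, here is a small pure-Python sketch of softmax for a single list of activations (the activation values are made up for illustration). A gap of 2 between two activations already gives the larger one about 88% of the probability.

```python
import math

def softmax(activations):
    """Softmax of a plain list of numbers."""
    exps = [math.exp(a) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([4.0, 6.0]))        # roughly [0.12, 0.88]
print(softmax([4.0, 4.5, 6.0]))   # the largest activation still dominates
```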

Log Likelihood

In the binary case we did this:

def mnist_loss(inputs, targets):
    inputs = inputs.sigmoid()
    return torch.where(targets==1, 1-inputs, inputs).mean()

That was fine and it worked, so we would like to do exactly the same thing here.

But with more than two categories the targets are not just 0 or 1, so torch.where is not enough.

targ = tensor([0,1,0,1,1,0])
sm_acts
tensor([[0.6025, 0.3975],
        [0.5021, 0.4979],
        [0.1332, 0.8668],
        [0.9966, 0.0034],
        [0.5959, 0.4041],
        [0.3661, 0.6339]])

Replacing torch.where

  1. idx grabs all the row numbers from 0 to 5.
  2. targ holds my targets.
  3. For each row number, the index picks the particular column given by the target.
  4. For example, it picks column 0 for the first row.

This is a super nifty indexing expression that you should play with.

The first index says which rows to return,

the second says which column to take from each row.

We can use this with more than two columns.

idx = range(6)
sm_acts[idx,targ]
tensor([0.6025, 0.4979, 0.1332, 0.0034, 0.4041, 0.3661])
from IPython.display import HTML
df = pd.DataFrame(sm_acts , columns=["3","7"])
df["target"] = targ
df["idx"] = idx
df["loss"] = sm_acts[range(6), targ]
t= df.style.hide_index()

#To have html code compatible with our script
html = t._repr_html_().split('</style>')[1]
html = re.sub(r'<table id="([^"]+)"\s*>', r'<table >', html)
display(HTML(html))
3 7 target idx loss
0.602469 0.397531 tensor(0) 0 tensor(0.6025)
0.502065 0.497935 tensor(1) 1 tensor(0.4979)
0.133188 0.866811 tensor(0) 2 tensor(0.1332)
0.996640 0.003360 tensor(1) 3 tensor(0.0034)
0.595949 0.404051 tensor(1) 4 tensor(0.4041)
0.366118 0.633882 tensor(0) 5 tensor(0.3661)

How to Make It Work for More than 2 Columns

  1. Full MNIST: we will have 10 columns, one per digit.
  2. The same indexer works unchanged.
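As a quick check that the indexer does not care about the number of columns, here is a sketch with three classes. The probabilities and targets are made up for illustration; the list comprehension spells out what the indexing expression does row by row.

```python
import torch

# Hypothetical softmax outputs for 4 items and 3 classes (each row sums to 1).
probs = torch.tensor([[0.70, 0.20, 0.10],
                      [0.10, 0.80, 0.10],
                      [0.30, 0.30, 0.40],
                      [0.50, 0.25, 0.25]])
targ = torch.tensor([0, 1, 2, 1])

# One row index per item, and the target as the column index.
rows = range(len(targ))
picked = probs[rows, targ]
print(picked)   # tensor([0.7000, 0.8000, 0.4000, 0.2500])

# The same thing written as an explicit loop over (row, target) pairs.
same = torch.stack([probs[i, t] for i, t in enumerate(targ)])
```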

Negative Log Likelihood

Despite the name, there is no log in nll_loss itself, as we will see:

-sm_acts[idx, targ]
tensor([-0.6025, -0.4979, -0.1332, -0.0034, -0.4041, -0.3661])
F.nll_loss(sm_acts,targ,reduction="none")
tensor([-0.6025, -0.4979, -0.1332, -0.0034, -0.4041, -0.3661])

Lets Talk about Logs

Problem: with raw probabilities, the model barely cares whether it predicts 0.99 or 0.999, although in another sense 0.999 is 10 times more confident than 0.99.

The function we saw in the previous section works quite well as a loss function, but we can make it a bit better. The problem is that we are using probabilities, and probabilities cannot be smaller than 0 or greater than 1. That means that our model will not care whether it predicts 0.99 or 0.999. Indeed, those numbers are so close together, but in another sense, 0.999 is 10 times more confident than 0.99. So, we want to transform our numbers between 0 and 1 to instead be between negative infinity and 0. There is a mathematical function that does exactly this: the logarithm (available as torch.log). It is not defined for numbers less than 0, and it falls steeply toward negative infinity as its input approaches 0.

So this is the log function.

We can actually make the loss better with it.

What if the model has to decide between 0.99 and 0.999?

If we have 1000 items, 0.99 means about 10 mistakes while 0.999 means about 1, so the second is much better.

So we would really like to transform the numbers between 0 and 1 to the range from negative infinity to 0.

The log helps us in this case.

As the numbers get closer to zero, the log goes down to negative infinity; at 1 it is zero.

torch.log(tensor(0.01)), torch.log(tensor(1)),torch.log(tensor(0.))
(tensor(-4.6052), tensor(0.), tensor(-inf))
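The 0.99 versus 0.999 point can be made concrete with the standard library. In this small sketch the probabilities are illustrative values; the negative log turns the tiny gap between 0.99 and 0.999 into a roughly tenfold difference.

```python
import math

# Negative log of a few "confidence" values.
for p in (0.9, 0.99, 0.999):
    print(p, -math.log(p))

# How much smaller is the loss at 0.999 than at 0.99?
ratio = math.log(0.99) / math.log(0.999)
print(ratio)   # close to 10
```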

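The probability cannot go below zero, where the log is negative infinity.

The log of a probability is negative (or zero), so for our loss function we take the negative of it.

If y = b**a, then a = log(y, b).

What is interesting:

log(a*b) = log(a) + log(b)

a*b can become very, very big or very, very small,

but adding the logs does not get out of control.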
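Here is a small sketch of why adding logs is safer than multiplying. The 1000 probabilities of 0.01 are made up for illustration: their product is far too small for a floating point number and becomes exactly 0.0, while the sum of their logs is an ordinary number.

```python
import math

probs = [0.01] * 1000

# Multiplying: the running product underflows to 0.0 long before the end.
product = 1.0
for p in probs:
    product *= p

# Adding logs: stays a perfectly ordinary number.
log_sum = sum(math.log(p) for p in probs)

print(product, log_sum)
```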

When we take probabilities such as sm_acts:

  1. we take the log,

  2. we take the negative mean,

and that is called the negative log likelihood.

If we take the softmax, then the log,

and pass the result to nll_loss,

we get the cross-entropy loss.

Why doesn't nll_loss take the log?

The reason is that it is more convenient to take the log back at the softmax step.

So PyTorch has a function log_softmax, which does both in one step. nll_loss assumes you have already applied log_softmax and passed its output in.
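The relationship between these functions can be checked directly. The sketch below re-enters the acts and targ values printed earlier (rounded to four decimals, so results match only approximately) and compares three routes: indexing into log_softmax by hand, log_softmax followed by nll_loss, and cross_entropy in one step.

```python
import torch
import torch.nn.functional as F

# Activations and targets copied from the outputs above (rounded).
acts = torch.tensor([[ 0.6734,  0.2576],
                     [ 0.4689,  0.4607],
                     [-2.2457, -0.3727],
                     [ 4.4164, -1.2760],
                     [ 0.9233,  0.5347],
                     [ 1.0698,  1.6187]])
targ = torch.tensor([0, 1, 0, 1, 1, 0])

# 1. By hand: log_softmax, pick the target column, negate.
by_hand = -torch.log_softmax(acts, dim=1)[range(6), targ]

# 2. Two steps: log_softmax, then nll_loss (which does the pick-and-negate).
two_step = F.nll_loss(F.log_softmax(acts, dim=1), targ, reduction='none')

# 3. One step: cross_entropy does all of it.
one_step = F.cross_entropy(acts, targ, reduction='none')

print(one_step)
print(one_step.mean(), F.cross_entropy(acts, targ))
```

All three routes give the same per-item losses, and their mean is the single number returned when no reduction is specified.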

 

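Two Ways to Compute Cross-Entropy Loss

It is available as a class (nn.CrossEntropyLoss) and as a function (F.cross_entropy).

By default we get a single number, because the mean is taken.

Pass reduction="none"

to look at the loss of every item.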

loss_func=nn.CrossEntropyLoss()
loss_func(acts,targ)
tensor(1.8045)
F.cross_entropy(acts,targ)
tensor(1.8045)
nn.CrossEntropyLoss(reduction="none")(acts,targ)
tensor([0.5067, 0.6973, 2.0160, 5.6958, 0.9062, 1.0048])

Why does the loss need the negative sign?

Because for a loss, lower is better.

We had to cut off here.

Next week: Data Ethics
