3. Neural Network

A tutorial for beginners: with PyTorch and fastai you can create your own classifier.

toc: true
badges: true
comments: true
author: HAFIZ AHMAD HASSAN & Jeremy Howard
categories: [jupyter]
image: images/chart-preview.png

This tutorial is based on Lecture 4 of the fast.ai course Deep Learning for Coders. I will go through, step by step, how to build a classifier using PyTorch from scratch.

Image Classification

from fastai.vision.all import *
path = untar_data(URLs.PETS)

Why we are using BASE_PATH

We want our data paths displayed relative to the dataset root rather than as full absolute paths. Look at the output of path.ls() below.

Path.BASE_PATH = path

path.ls()

(#4) [Path('annotations'),Path('images'),Path('models'),Path('crappy')]

(path/"images").ls()

(#7394) [Path('images/Sphynx_245.jpg'),Path('images/miniature_pinscher_55.jpg'),Path('images/havanese_20.jpg'),Path('images/miniature_pinscher_34.jpg'),Path('images/samoyed_91.jpg'),Path('images/chihuahua_123.jpg'),Path('images/yorkshire_terrier_155.jpg'),Path('images/Egyptian_Mau_79.jpg'),Path('images/scottish_terrier_23.jpg'),Path('images/basset_hound_198.jpg')...]

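Remember

Most of the functions we use in fastai return an object of class L instead of a plain list. L is an enhanced list: it shows the number of items first, and when there are more items than it prints, the rest are denoted by "...".

Last time the label came from the first letter of the file name: capital meant cat, otherwise dog.

Here our case is different: the breed is the part of the file name before the trailing number.

A regular expression helps us get the labels.

Please look up Python's re module if you have not gone through it yet.

The fast.ai NLP course also has 2 regex lessons.

Regular expressions can be a bit hard to get right sometimes.

Let's pick a file name and see what it looks like.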

fname = (path/"images").ls()[0]

fname.name

'Sphynx_245.jpg'

A Little Experiment with re (Regular Expressions)

re is a module in the Python standard library.
findall returns all the parts of the string that match the parts of the regular expression
that have parentheses around them.
The r prefix makes a raw string, which tells Python not to treat backslashes specially. Remember that in a normal Python string a backslash starts an escape sequence; for example, \n is a newline.
r'(.+)_\d+.jpg$' means: "(.+)" captures any character ("."), repeated one or more times ("+"); "_" is a literal underscore; "\d+" is a digit, repeated one or more times; "." is any single character (here it is meant to match the dot before the extension); "jpg" is those three letters; and "$" is the end of the string.

re.findall(r'(.+)_\d+.jpg$',fname.name)

['Sphynx']
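To see how the pattern behaves on names with several underscores, here is a small sketch using only the standard library. The first three file names come from the directory listing above; the last one is made up to show that the unescaped "." also accepts characters other than a dot.

```python
import re

# Pattern from the tutorial: capture everything before the final "_<digits>.jpg".
pattern = r'(.+)_\d+.jpg$'

# File names taken from the directory listing shown earlier.
names = ['Sphynx_245.jpg', 'miniature_pinscher_55.jpg', 'Egyptian_Mau_79.jpg']
labels = [re.findall(pattern, n)[0] for n in names]
print(labels)  # ['Sphynx', 'miniature_pinscher', 'Egyptian_Mau']

# The unescaped "." matches any character, so this made-up name also matches.
# Escaping it as "\." makes the pattern stricter.
loose = re.findall(pattern, 'Sphynx_245xjpg')
strict = re.findall(r'(.+)_\d+\.jpg$', 'Sphynx_245xjpg')
print(loose, strict)  # ['Sphynx'] []
```

Because "(.+)" is greedy, it keeps every underscore except the last one before the digits, which is why multi-word breed names survive intact.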

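DataBlock

The DataBlock below is built from these pieces:
blocks -- the types of the independent and dependent variables (an image and a category)
get_items -- gets the image files
splitter -- splits the data randomly into training and validation sets
get_y -- uses using_attr to apply a RegexLabeller to the "name" attribute of each file path
aug_transforms -- we saw these in lesson 2, in the augmentation section; they basically create synthetic variations of the images
item_tfms resizes each image to a fairly large size, 460, and then aug_transforms brings it down to a smaller size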

Why?

This is called presizing.

The steps are detailed below:

First, the resize step grabs a square at random: if the image is portrait, it takes the full width and picks a random position between top and bottom.
Second, the augmentation transforms take a random warped crop, possibly rotated, and turn it into a smaller 224 by 224 square (rotation, warping and zooming all happen here).

Note: the first step turns every image into a square of the same size, so the second step can happen on the GPU. Normally, things like rotating and cropping are pretty slow.

Rotation, warping and zooming are actually destructive to the image, because each one requires an interpolation step, which is not just slow but also lowers the image quality.

What is unique in fastai

fastai keeps track of the changing coordinate values in a non-lossy way, as full floating-point values, and then does the interpolation only once, at the very end.
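The idea of composing coordinate changes first and interpolating only once can be sketched with plain 3x3 affine matrices. This is an illustration only, not fastai's implementation; the helper names (matmul, rotation, zoom, apply) and the sample point are made up for the example.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation(deg):
    """Rotation about the origin by `deg` degrees."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def zoom(factor):
    """Uniform scaling about the origin."""
    return [[factor, 0.0, 0.0], [0.0, factor, 0.0], [0.0, 0.0, 1.0]]

def apply(m, point):
    """Apply an affine matrix to an (x, y) coordinate."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

# Compose "rotate, then zoom" into one matrix (the zoom is applied last).
combined = matmul(zoom(1.2), rotation(10))

p = (0.5, -0.25)
step_by_step = apply(zoom(1.2), apply(rotation(10), p))
in_one_go = apply(combined, p)
print(step_by_step, in_one_go)  # the same coordinates, up to rounding
```

With real images, every separate transform would resample the pixels. Working on coordinates until the end means the pixels are resampled a single time, using the combined mapping.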

Look at the teddy bears

Left: the presizing approach. Right: the usual approach with standard Python libraries.

There are weird flaws in the image on the right:

it is less nicely focused
the grass looks wrong
there is distortion at the sides of the legs

Details of Presizing

"""
<img alt="Presizing on the training set" width="600" caption="Presizing on the training set" id="presizing" src="images/att_00060.png">
"""

pets = DataBlock( blocks =(ImageBlock, CategoryBlock),
                get_items = get_image_files,
                splitter= RandomSplitter(seed=42),
                get_y = using_attr(RegexLabeller(r'(.+)_\d+.jpg$'),'name'),
                item_tfms=Resize(460),
                batch_tfms=aug_transforms(size=224, min_scale=0.75))

dls = pets.dataloaders(path/"images")

Let's Debug the DataLoaders

show_batch displays the items of a mini-batch, so we can check that the data was loaded properly.

dls.show_batch(nrows=1,ncols=3)

Let's Debug the Augmentation

Pass unique=True to show the same image with different augmentations applied.

dls.show_batch(nrows=1,unique=True,ncols=3)

Failure in DataBlock

Issues

the images have different sizes
so they cannot be collated into a batch

With summary you can see everything that happens, step by step, up to the point of failure.

pets1 = DataBlock(blocks = (ImageBlock,CategoryBlock),
                 get_items= get_image_files,
                 splitter=RandomSplitter(seed=42),
                 get_y = using_attr(RegexLabeller(r'(.+)_\d+.jpg$'),"name"))

pets1.summary(path/"images")

Question

What if your image is smaller than the resize target?

Answer: if you remember the earlier lesson, we looked at different ways to do the resize:

squish

pad

etc.

Squish and pad will help you here.

Your model can teach you about problems in your data

We are getting about a 7 percent error rate.

learn = cnn_learner(dls,resnet34, metrics= error_rate)
learn.fine_tune(2)

A Trained Model Helps Clean the Data

Why?

An initial model will help you clean the data.

Remember that we have:

plot_top_losses on the interpretation object, which helps us identify mislabeled items

the confusion matrix, which shows us where the model is confused

ImageClassifierCleaner, which let us review, for example, the bear images the model was most confused about

The model helps you, and after cleaning you go ahead and train again.

Notebook 4 covers the loss function. fastai automatically picks a good loss function.

Let's look at what it actually picked.

learn.loss_func

FlattenedLoss of CrossEntropyLoss()

Cross Entropy

It is the same idea as the MNIST loss, a kind of extended version.

torch.where only works when you have a binary outcome.

We want to create something just like that, but working for more than two categories.

Let's see what is inside a batch

We destructure one batch into x and y; the batch size is 64.

dls.vocab

x,y = dls.one_batch()

y

TensorCategory([ 0, 25, 33, 19, 12, 22, 35, 17, 23, 32,  8,  5,  3, 33, 36, 34, 32,  7,  1, 14,  5, 14,  8, 36, 18, 13, 22,  5,  0, 20, 18, 33, 28, 28, 19, 33, 26,  0, 30, 25, 27, 23, 31,  1, 17, 13,  8, 23,
        34, 24, 28,  7, 13, 12, 31, 10, 29, 33, 22,  0, 21, 20,  3,  6], device='cuda:0')

dls.vocab

['Abyssinian', 'Bengal', 'Birman', 'Bombay', 'British_Shorthair', 'Egyptian_Mau', 'Maine_Coon', 'Persian', 'Ragdoll', 'Russian_Blue', 'Siamese', 'Sphynx', 'american_bulldog', 'american_pit_bull_terrier', 'basset_hound', 'beagle', 'boxer', 'chihuahua', 'english_cocker_spaniel', 'english_setter', 'german_shorthaired', 'great_pyrenees', 'havanese', 'japanese_chin', 'keeshond', 'leonberger', 'miniature_pinscher', 'newfoundland', 'pomeranian', 'pug', 'saint_bernard', 'samoyed', 'scottish_terrier', 'shiba_inu', 'staffordshire_bull_terrier', 'wheaten_terrier', 'yorkshire_terrier']

dls.vocab[16]

'boxer'

View the Predictions

The predictions are just the activations of the last layer.

preds,_ = learn.get_preds(dl= [(x,y)])
preds[0]

tensor([9.5199e-01, 1.0111e-02, 6.5786e-05, 2.1975e-04, 7.8775e-04, 3.6360e-03, 3.0038e-03, 1.0312e-04, 2.4044e-02, 5.6874e-04, 5.1095e-05, 3.3239e-03, 2.5059e-05, 5.7563e-06, 1.0179e-05, 9.1572e-06,
        8.6417e-06, 2.3385e-04, 1.0258e-05, 1.3129e-05, 7.4432e-06, 8.3646e-06, 5.2068e-05, 1.6303e-05, 4.9455e-06, 6.5756e-06, 6.5956e-05, 9.1565e-06, 3.6239e-05, 1.1460e-05, 1.4704e-05, 3.3519e-05,
        9.6280e-04, 3.4956e-04, 7.9019e-07, 1.6408e-05, 1.7885e-04])

len(preds[0]), preds[0].sum()

(37, tensor(1.0000))

How do we get to this prediction?

Softmax is an extension of sigmoid that handles more than two categories.

What if we want 37 categories?

We need one activation per category. For example, in the 3 versus 7 case there are two activations.

Below, the first column is the activation for 3 and the second is for 7,

i.e. how much the image looks like a 3 and how much like a 7.

torch.random.manual_seed(42)
acts = torch.randn((6,2))*2

# Column 0: how much each item looks like a 3; column 1: how much like a 7
acts

tensor([[ 0.6734,  0.2576],
        [ 0.4689,  0.4607],
        [-2.2457, -0.3727],
        [ 4.4164, -1.2760],
        [ 0.9233,  0.5347],
        [ 1.0698,  1.6187]])

Taking the Sigmoid

If we take the sigmoid, the values will be between 0 and 1,

but the two columns do not add up to 1,

so they do not make sense as probabilities of two mutually exclusive classes.

acts.sigmoid()

tensor([[0.6623, 0.5641],
        [0.6151, 0.6132],
        [0.0957, 0.4079],
        [0.9881, 0.2182],
        [0.7157, 0.6306],
        [0.7446, 0.8346]])
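A quick check of that claim, as a plain-Python sketch without PyTorch. The activations are copied from the output above (rounded to 4 decimals), and sigmoid is written out by hand.

```python
import math

def sigmoid(v):
    return 1 / (1 + math.exp(-v))

# Activations copied from the tutorial output (rounded to 4 decimals).
acts = [[0.6734, 0.2576], [0.4689, 0.4607], [-2.2457, -0.3727],
        [4.4164, -1.2760], [0.9233, 0.5347], [1.0698, 1.6187]]

# Apply sigmoid to each column independently and add the two results per row.
row_sums = [sigmoid(a) + sigmoid(b) for a, b in acts]
print([round(s, 2) for s in row_sums])  # none of these is 1
```

Each value is a valid number between 0 and 1 on its own, but nothing ties the two columns together, so the rows do not form a probability distribution.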

Solution:

Take the difference between the two activations, which is the relative confidence,
and then take the sigmoid of that.

diff = acts[:,0]- acts[:,1]

diff

tensor([ 0.4158,  0.0083, -1.8731,  5.6924,  0.3886, -0.5489])

torch.stack([diff.sigmoid(),1-diff.sigmoid()],dim=1)

tensor([[0.6025, 0.3975],
        [0.5021, 0.4979],
        [0.1332, 0.8668],
        [0.9966, 0.0034],
        [0.5959, 0.4041],
        [0.3661, 0.6339]])

More than 2 categories:

Use softmax.

In the binary case it is equal to the sigmoid of the difference.

The second column (the probability of it being a 7) will then just be that value subtracted from 1. Now, we need a way to do all this that also works for more than two columns. It turns out that this function, called softmax, is exactly that:

def softmax(x): return x.exp() / x.exp().sum(dim=1, keepdim=True)

Jargon: Exponential function (exp): Literally defined as e**x, where e is a special number approximately equal to 2.718. It is the inverse of the natural logarithm function. Note that exp is always positive, and it increases very rapidly!

Let's check that softmax returns the same values as sigmoid for the first column, and those values subtracted from 1 for the second column:

sm_acts = torch.softmax(acts, dim=1)
sm_acts

tensor([[0.6025, 0.3975],
        [0.5021, 0.4979],
        [0.1332, 0.8668],
        [0.9966, 0.0034],
        [0.5959, 0.4041],
        [0.3661, 0.6339]])
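The reason the two agree is that exp(a) / (exp(a) + exp(b)) can be rewritten as 1 / (1 + exp(-(a - b))), which is the sigmoid of the difference. A minimal plain-Python sketch (no PyTorch; the softmax and sigmoid helpers are written out by hand, and the activations are the rounded values printed earlier):

```python
import math

def sigmoid(v):
    return 1 / (1 + math.exp(-v))

def softmax(row):
    """Softmax over one row of activations."""
    exps = [math.exp(v) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

# Activations copied from the tutorial output (rounded to 4 decimals).
acts = [[0.6734, 0.2576], [0.4689, 0.4607], [-2.2457, -0.3727],
        [4.4164, -1.2760], [0.9233, 0.5347], [1.0698, 1.6187]]

sm_rows = [softmax(row) for row in acts]
for (a, b), sm in zip(acts, sm_rows):
    # First column equals sigmoid of the difference; each row sums to 1.
    assert abs(sm[0] - sigmoid(a - b)) < 1e-12
    assert abs(sum(sm) - 1) < 1e-12

print([[round(p, 3) for p in row] for row in sm_rows])
```

The same softmax function works unchanged for rows with 3, 10 or 37 activations, which is what the two-column sigmoid trick cannot do.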

"""
<img alt="Bear softmax example" width="280" id="bear_softmax" caption="Example of softmax on the bear classifier" src="images/att_00062.png">
"""

An Interesting Thing

e to the power of something grows really fast; see below.

If one activation is just a bit bigger than the others,

then its softmax output is much bigger.

So softmax really tries to pick one class.

That is not always what you want: at inference time you sometimes want the model to be a bit more cautious.

But it is the default, and what you want most of the time.

So that is softmax.

math.exp(4)

54.598150033144236

math.exp(6)

403.4287934927351
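To connect these two numbers back to softmax, here is a small plain-Python sketch. The activations 4 and 6 are simply the exponents used above, and the softmax helper is written by hand.

```python
import math

def softmax(row):
    exps = [math.exp(v) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

# The activations differ by only 2, but exp(6) is about 7.4 times exp(4).
probs = softmax([4.0, 6.0])
print([round(p, 4) for p in probs])  # [0.1192, 0.8808]
```

A modest gap between activations turns into a lopsided split of the probability, which is the "tries to pick one" behaviour described above.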

Log Likelihood

In the binary case, we did this:

def mnist_loss(inputs, targets):
    inputs = inputs.sigmoid()
    return torch.where(targets==1, 1-inputs, inputs).mean()

That was fine and it worked, but we cannot do exactly the same thing here,

because with more than two categories the targets are not just 0 or 1.

targ = tensor([0,1,0,1,1,0])

sm_acts

tensor([[0.6025, 0.3975],
        [0.5021, 0.4979],
        [0.1332, 0.8668],
        [0.9966, 0.0034],
        [0.5959, 0.4041],
        [0.3661, 0.6339]])

Replacing torch.where

idx grabs all the row numbers from 0 to 5
targ holds my targets
for each row number, the expression picks the column given by the target
for example, it picks column 0 for the first row

This is a super nifty indexing expression that you should play with.

The first index says which rows to return.

The second says which column to take from each row.

We can use that with more than two columns.

idx = range(6)
sm_acts[idx,targ]

tensor([0.6025, 0.4979, 0.1332, 0.0034, 0.4041, 0.3661])
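For readers who find the tensor indexing opaque, the same selection can be sketched with plain Python lists. The values are the rounded softmax outputs and targets shown above; the three-class rows at the end are made up to show that nothing changes with more columns.

```python
# Softmax outputs and targets copied from the tutorial (rounded to 4 decimals).
sm_acts = [[0.6025, 0.3975], [0.5021, 0.4979], [0.1332, 0.8668],
           [0.9966, 0.0034], [0.5959, 0.4041], [0.3661, 0.6339]]
targ = [0, 1, 0, 1, 1, 0]

# Equivalent of sm_acts[idx, targ]: for each row, take the target's column.
picked = [sm_acts[row][col] for row, col in zip(range(6), targ)]
print(picked)  # [0.6025, 0.4979, 0.1332, 0.0034, 0.4041, 0.3661]

# Made-up three-class example: the same expression still works.
three = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
picked3 = [three[r][c] for r, c in enumerate([2, 0])]
print(picked3)  # [0.2, 0.5]
```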

import re
import pandas as pd
from IPython.display import HTML, display

df = pd.DataFrame(sm_acts, columns=["3","7"])
df["target"] = targ
df["idx"] = idx
df["loss"] = sm_acts[range(6), targ]
# hide_index() was removed in pandas 2.0; there, use df.style.hide(axis="index")
t = df.style.hide_index()

# Strip the <style> block and the table id so the HTML fits our page
html = t._repr_html_().split('</style>')[1]
html = re.sub(r'<table id="([^"]+)"\s*>', r'<table >', html)
display(HTML(html))

How to Make It Work with More Than 2 Columns

For full MNIST we would have 10 columns, one per digit,
and the same indexer would work.

Negative Log Likelihood

Despite the name, there is no log in nll_loss, as we will see.

-sm_acts[idx, targ]

tensor([-0.6025, -0.4979, -0.1332, -0.0034, -0.4041, -0.3661])

F.nll_loss(sm_acts,targ,reduction="none")

tensor([-0.6025, -0.4979, -0.1332, -0.0034, -0.4041, -0.3661])

Let's Talk about Logs

The function we saw in the previous section works quite well as a loss function, but we can make it a bit better. The problem is that we are using probabilities, and probabilities cannot be smaller than 0 or greater than 1. That means that our model will not care whether it predicts 0.99 or 0.999. Indeed, those numbers are so close together, but in another sense, 0.999 is 10 times more confident than 0.99. So, we want to transform our numbers between 0 and 1 to instead be between negative infinity and 0. There is a mathematical function that does exactly this: the logarithm (available as torch.log). It is not defined for numbers less than 0.

So, about this log function:

We can actually make the loss better.

What if the model predicts 0.99 versus 0.999?

Over 1,000 items, 0.999 means about 1 mistake where 0.99 means about 10, so 0.999 is much better.

So we would really like to transform numbers between 0 and 1 into numbers between negative infinity and 0.

The log will help us in this case.

As the input gets closer to 0, the log goes down to negative infinity; at 1 it is 0.

torch.log(tensor(0.01)), torch.log(tensor(1)),torch.log(tensor(0.))

(tensor(-4.6052), tensor(0.), tensor(-inf))

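We cannot go all the way to zero, since log(0) is negative infinity.

The log of a probability is never positive, so for our loss function we take the negative of it.

If y = b**a, then a = log(y, b).

What is interesting:

log(a*b) = log(a) + log(b)

a*b can become very, very big or small,

while adding the logs does not get out of control.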
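A small standard-library sketch of why sums of logs are easier to handle than products. The 1000 probabilities of 0.01 are made up for the illustration.

```python
import math

# The identity itself, on two arbitrary positive numbers.
a, b = 0.25, 0.04
assert abs(math.log(a * b) - (math.log(a) + math.log(b))) < 1e-12

# Made-up example: 1000 predictions, each with probability 0.01.
probs = [0.01] * 1000

product = math.prod(probs)                 # too small for a float: becomes 0.0
log_sum = sum(math.log(p) for p in probs)  # stays an ordinary number

print(product)            # 0.0
print(round(log_sum, 2))  # -4605.17
```

The product underflows to zero and all information is lost, while the sum of logs is just a moderately large negative number.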

When we take probabilities such as sm_acts,

take the log of the probability of the target, negate it,
and take the mean,

that is called negative log likelihood.

If we take softmax and then log,

and pass the result to nll_loss,

that is cross-entropy loss.

Why doesn't nll_loss take the log?

The reason is that it is more convenient to take the log back at the softmax step.

So PyTorch has a function, log_softmax. Since that is much easier, PyTorch's nll_loss assumes you have already done log_softmax and passed the result to it.
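One commonly cited reason for fusing the log into the softmax step is numerical stability. The notes above do not spell this out, so treat the following as background rather than a description of PyTorch's internals. The sketch uses plain Python; both helper functions and the extreme activations are made up for the illustration.

```python
import math

def naive_log_softmax(row):
    """Softmax first, then log: math.exp overflows for large activations."""
    exps = [math.exp(v) for v in row]
    total = sum(exps)
    return [math.log(e / total) for e in exps]

def stable_log_softmax(row):
    """Log-sum-exp form: subtract the max before exponentiating."""
    m = max(row)
    log_total = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_total for v in row]

small = [0.6734, 0.2576]  # first row of acts from the tutorial
print(naive_log_softmax(small), stable_log_softmax(small))  # these agree

big = [1000.0, 0.0]       # made-up extreme activations
try:
    naive_log_softmax(big)
    naive_failed = False
except OverflowError as err:
    naive_failed = True
    print("naive version failed:", err)

stable_big = stable_log_softmax(big)
print(stable_big)  # [0.0, -1000.0]
```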

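Two Ways to Get Cross-Entropy Loss

nn.CrossEntropyLoss (a class) and F.cross_entropy (a function) give the same result.

By default the result is a single number, because of the mean.

Pass reduction="none"

to look at the loss of every item.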

loss_func=nn.CrossEntropyLoss()

loss_func(acts,targ)

tensor(1.8045)

F.cross_entropy(acts,targ)

tensor(1.8045)

nn.CrossEntropyLoss(reduction="none")(acts,targ)

tensor([0.5067, 0.6973, 2.0160, 5.6958, 0.9062, 1.0048])
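To tie the pieces together, here is a from-scratch sketch in plain Python: log-softmax, then pick the target column, then negate, then take the mean. It is not how PyTorch implements the loss internally. The inputs are the rounded acts and targ values from above, so the results match the tensors only to about three decimals.

```python
import math

# Activations and targets copied from the tutorial (acts rounded to 4 decimals).
acts = [[0.6734, 0.2576], [0.4689, 0.4607], [-2.2457, -0.3727],
        [4.4164, -1.2760], [0.9233, 0.5347], [1.0698, 1.6187]]
targ = [0, 1, 0, 1, 1, 0]

def log_softmax(row):
    """Log of softmax, computed in the log-sum-exp form."""
    m = max(row)
    log_total = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_total for v in row]

# Negative log-probability of the target class, one value per row
# (the counterpart of reduction="none").
losses = [-log_softmax(row)[t] for row, t in zip(acts, targ)]

# The default reduction is the mean over the batch.
mean_loss = sum(losses) / len(losses)

print([round(l, 2) for l in losses])
print(round(mean_loss, 2))
```

The fourth item dominates the mean: the model gave its target a probability of only about 0.0034, and the log turns that into a loss above 5.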

Why do we need the negative in the loss function?

Because a loss should be better when it is lower.

We needed to cut off here.

Next week: Data Ethics.

Training output from learn.fine_tune(2):

epoch	train_loss	valid_loss	error_rate	time
0	0.497216	0.289698	0.098106	00:52
1	0.333692	0.221010	0.076455	00:53

Table displayed by the DataFrame cell:

3	7	target	idx	loss
0.602469	0.397531	tensor(0)	0	tensor(0.6025)
0.502065	0.497935	tensor(1)	1	tensor(0.4979)
0.133188	0.866811	tensor(0)	2	tensor(0.1332)
0.996640	0.003360	tensor(1)	3	tensor(0.0034)
0.595949	0.404051	tensor(1)	4	tensor(0.4041)
0.366118	0.633882	tensor(0)	5	tensor(0.3661)