Deep Learning — neural network python — Dog or Cat?

Deep learning is such an interesting technology, and I have heard so much about it from many sources.

I have always wanted to understand how machine learning can figure out things that, until recently, only the complicated minds of humans could.

For a long time it seemed that only we humans could figure things out. That started to change in 1943, when Warren McCulloch, a neurophysiologist, and Walter Pitts, a young mathematician, wrote a paper on how neurons might work.

It is a complicated subject, and I cannot even begin to explain how it works from a mathematical perspective. On the other hand, it is a relief to know that as a Python coder (or a coder in any other programming language) you don’t have to know exactly how deep learning works internally in order to use it. Let’s demonstrate that with a well-known example.

This demonstration is Python code that predicts whether a given picture shows a CAT or a DOG.

Shall we begin coding?

Download the data set of dog and cat pictures: download link

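Install the following Python packages (the ones imported in the script below): numpy, opencv-python (which provides cv2), tensorflow and keras.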


Import the following at the beginning of your Python script:

import numpy as np
import os
import cv2
import random
import time
import pickle
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.callbacks import TensorBoard

Data Preprocessing

In the following code we go over all the pictures in the data set, read each one as an array of pixel color values (OpenCV loads the channels in BGR order) and resize it to the dimensions (100, 100, 3). We then append this array to the data list. Each element of the data list is itself a list of two elements: the first is the image array and the second is the label. The label is the index of the category the picture was taken from, in our case 0 for cat and 1 for dog.

directory = "/Users/username/Documents/PetImages"
categories = ["cat", "dog"]

data = []
IMG_SIZE = 100
for category in categories:
    category_folder = os.path.join(directory, category)
    label = categories.index(category)  # 0 for cat, 1 for dog
    for img in os.listdir(category_folder):
        img_path = os.path.join(category_folder, img)
        try:
            arr = cv2.imread(img_path)  # returns None if the file cannot be read
            new_arr = cv2.resize(arr, (IMG_SIZE, IMG_SIZE))
            data.append([new_arr, label])  # only keep pictures that loaded correctly
        except Exception:
            print("{0} corrupted".format(img_path))

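We need to shuffle the data so that the training model sees pictures of cats and dogs mixed together. Without shuffling, all the cats would come before all the dogs, and because validation_split (used later) takes the last part of the data, the validation set would contain only dogs.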
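The shuffling code itself is not shown in this article. Here is a minimal sketch of how it could look, assuming the standard library's random.shuffle is applied to the data list. The tiny data list below is a hypothetical stand-in, not the real pictures:

```python
import random

# Hypothetical stand-in for the real list of [image_array, label] pairs:
# every cat (label 0) comes before every dog (label 1), as after loading.
data = [["cat_image_{}".format(i), 0] for i in range(5)]
data += [["dog_image_{}".format(i), 1] for i in range(5)]

random.seed(0)        # fixed seed only so that this sketch is repeatable
random.shuffle(data)  # shuffles the list in place and returns None

labels = [label for _, label in data]
print(labels)  # the same five 0s and five 1s, now in mixed order
```

In the actual script only the random.shuffle(data) line would be needed, right after the loading loop.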


Next we split the data into X and y. X holds the input for the network, that is, all the picture arrays, and y holds the expected output. Our output is the label that we stored in the previous code section, which tells us whether the picture is a cat or a dog.

X = []
y = []

for features, label in data:
    X.append(features)  # the image array
    y.append(label)     # 0 for cat, 1 for dog

Convert X and y to numpy arrays

X = np.array(X)
y = np.array(y)
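To see what shapes come out of this step, here is a small sketch. It assumes a hypothetical stand-in data list of four blank pictures instead of the real data set:

```python
import numpy as np

IMG_SIZE = 100

# Hypothetical stand-in for the real data list: four blank "pictures"
# with alternating labels 0 (cat) and 1 (dog).
data = [[np.zeros((IMG_SIZE, IMG_SIZE, 3), dtype=np.uint8), i % 2]
        for i in range(4)]

X = []
y = []
for features, label in data:
    X.append(features)
    y.append(label)

X = np.array(X)  # stacks the pictures along a new first axis
y = np.array(y)

print(X.shape)  # (4, 100, 100, 3)
print(y.shape)  # (4,)
```

The first dimension of X is the number of pictures, and it must match the length of y.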

Save the data arrays to files. We don’t want to rebuild these arrays every time we run the training, so to save time we write them to files and load them whenever we need them.

with open("X.pkl", "wb") as f:  # "with" closes the file when the block ends
    pickle.dump(X, f)
with open("y.pkl", "wb") as f:
    pickle.dump(y, f)

Building ANN (Artificial Neural Network)

Load the data we saved in the previous section

with open("X.pkl", "rb") as f:
    X = pickle.load(f)
with open("y.pkl", "rb") as f:
    y = pickle.load(f)
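If you want to convince yourself that saving and loading gives back exactly the same array, a quick round trip is enough. This is only a sketch: the tiny array and the temporary folder are assumptions made for illustration, not part of the original script:

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical tiny array standing in for the real X
X = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)

# Write into a temporary folder so the sketch leaves the project untouched
path = os.path.join(tempfile.mkdtemp(), "X.pkl")
with open(path, "wb") as f:
    pickle.dump(X, f)

with open(path, "rb") as f:
    X_loaded = pickle.load(f)

print(np.array_equal(X, X_loaded))  # True
```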

Scaling X to be from 0 to 1 instead of from 0 to 255

X = X/255 # Scaling X to be from 0 to 1
print(X[0]) # to see example of X element
print(X.shape) # to see the shape of X, should be (number of pictures, IMG_SIZE, IMG_SIZE, 3)
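Here is what the scaling does to the numbers, shown on a hypothetical two-pixel picture instead of the real data:

```python
import numpy as np

# Hypothetical 1x2 "picture" with 3 color channels, stored as 8-bit integers
pixels = np.array([[[0, 128, 255], [64, 32, 16]]], dtype=np.uint8)

scaled = pixels / 255  # true division turns the integers into floats

print(scaled.min(), scaled.max())  # 0.0 1.0
print(scaled.dtype)                # float64
```

Note that the scaled array uses floats, so it takes more memory than the original 8-bit array.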

Creating the training model

model = Sequential()

Adding the feature detectors (convolution filters, which are small matrices) with the relu activation function. To understand the following code, I would recommend reading more about Convolution and Feature Maps.

model.add(Conv2D(64, (3, 3), activation="relu"))
model.add(MaxPooling2D(2, 2))
model.add(Conv2D(64, (3, 3), activation="relu"))
model.add(MaxPooling2D(2, 2))
model.add(Conv2D(64, (3, 3), activation="relu"))
model.add(MaxPooling2D(2, 2))
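It helps to follow how the picture shrinks as it passes through these layers. The sketch below does the arithmetic by hand. It assumes the Keras defaults for the layers above (no padding and stride 1 for Conv2D, a 2x2 window with stride 2 for MaxPooling2D); the helper names conv_out and pool_out are made up for this example:

```python
def conv_out(size, kernel=3):
    """Width/height after a convolution with no padding and stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Width/height after max pooling with a pool x pool window."""
    return size // pool


size = 100  # IMG_SIZE
sizes = [size]
for _ in range(3):  # three Conv2D + MaxPooling2D pairs
    size = conv_out(size)
    sizes.append(size)
    size = pool_out(size)
    sizes.append(size)

flattened = size * size * 64  # 64 feature maps come out of the last Conv2D

print(sizes)      # [100, 98, 49, 47, 23, 21, 10]
print(flattened)  # 6400
```

So under these assumptions each picture reaches the dense layers as 6400 numbers.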

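Flattening the feature maps and adding the first fully connected hidden layer. No input_shape is needed on this layer, because Keras works out its input from the output of the layer before it.

model.add(Flatten()) # Flatten the feature maps into one long vector before the dense layers
model.add(Dense(128, activation="relu")) # first fully connected hidden layer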

One more hidden layer. There is no need for input_shape here, because the layer takes its input from the previous layer

model.add(Dense(128, activation="relu"))

Output layer

model.add(Dense(2, activation="softmax"))
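The softmax activation turns the two raw scores of the output layer into two probabilities that add up to 1. Here is a hand-written sketch in numpy, not the Keras implementation, with made-up scores:

```python
import numpy as np


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtracting the max avoids overflow
    exps = np.exp(shifted)
    return exps / np.sum(exps)


scores = np.array([2.0, 0.0])  # hypothetical raw scores for [cat, dog]
probs = softmax(scores)

print(np.round(probs, 3))  # [0.881 0.119]
```

The larger score gets the larger probability, so in this made-up case the answer would be cat.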

Notice that this whole section follows the same procedure for every ANN you build, only with different parameters: a different number of hidden layers, different layer sizes, different activation functions. Keep in mind that you need a basic understanding of how these parameters affect your training model, and of what each activation function does, in order to decide what to choose for each layer. As I heard from one of the resources, it is an absolute art.


There are several optimization algorithms you can compile the ANN with. One of the most powerful is “adam”.

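The loss function measures how far the predictions are from the true labels, and it is the quantity that the optimizer tries to minimize during training. Here we use sparse_categorical_crossentropy, which fits integer labels such as our 0 and 1.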

The metrics are the criteria that you choose to evaluate your model

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
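To get a feeling for what this loss measures, here is a hand-written numpy sketch of sparse categorical cross-entropy. It is an illustration with made-up probabilities, not the code Keras runs internally:

```python
import numpy as np


def sparse_categorical_crossentropy(y_true, y_prob):
    """Mean of -log(probability the model gave to the correct class)."""
    picked = y_prob[np.arange(len(y_true)), y_true]
    return float(np.mean(-np.log(picked)))


y_true = np.array([0, 1])        # integer labels: cat, dog
y_prob = np.array([[0.9, 0.1],   # hypothetical softmax outputs
                   [0.2, 0.8]])

loss = sparse_categorical_crossentropy(y_true, y_prob)
print(round(loss, 4))  # 0.1643

# A confident but wrong prediction is punished much harder
worse = sparse_categorical_crossentropy(np.array([0]), np.array([[0.1, 0.9]]))
print(worse > loss)  # True
```

The closer the probability of the correct class is to 1, the closer the loss gets to 0.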


Now we train the model that we created, using the data that we built from the dog and cat pictures

epochs: Integer. Number of epochs to train the model. An epoch is an iteration over the entire ‘x’ and ‘y’ data provided

validation_split: Float between 0 and 1. The fraction of the entire data that is held back to validate the training. In our case 0.1 means that 10% of the data is not trained on and is used only to measure how the model performs on pictures it has not seen. That accuracy is reported as val_accuracy, while the accuracy on the training data is reported simply as accuracy

batch_size: Integer or ‘None’. Number of samples per batch of computation

model.fit(X, y, epochs=5, validation_split=0.1, batch_size=32)

The output log will look something like this:

Epoch 1/5
702/702 [==============================] - 199s 282ms/step - loss: 0.6831 - accuracy: 0.5448 - val_loss: 0.5817 - val_accuracy: 0.7034
Epoch 2/5
702/702 [==============================] - 175s 250ms/step - loss: 0.5306 - accuracy: 0.7287 - val_loss: 0.4928 - val_accuracy: 0.7663
Epoch 3/5
702/702 [==============================] - 173s 246ms/step - loss: 0.4334 - accuracy: 0.7961 - val_loss: 0.4399 - val_accuracy: 0.7928
Epoch 4/5
702/702 [==============================] - 164s 234ms/step - loss: 0.3557 - accuracy: 0.8411 - val_loss: 0.3887 - val_accuracy: 0.8293
Epoch 5/5
702/702 [==============================] - 167s 239ms/step - loss: 0.2958 - accuracy: 0.8697 - val_loss: 0.4119 - val_accuracy: 0.8208

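From the log we can see that after five epochs the training accuracy is 87% and the validation accuracy is 82%, which is a very good validation accuracy. Notice also that in the last epoch val_loss went up (from 0.3887 to 0.4119) while the training loss kept going down, which is an early sign that the model is starting to overfit.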
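The 702 at the start of each log line is the number of batches per epoch, and it follows from validation_split and batch_size. Here is a rough sketch of the arithmetic. It assumes that Keras keeps the first 90% of the samples for training and counts the last partial batch as a full step. The sample count 24950 is a hypothetical number picked for illustration, not the real size of the data set, and the function name is made up:

```python
import math


def steps_per_epoch(n_samples, validation_split, batch_size):
    """Return (training samples, batches per epoch) for a fit call."""
    n_train = math.floor(n_samples * (1 - validation_split))
    return n_train, math.ceil(n_train / batch_size)


# 24950 is a hypothetical number of usable pictures
n_train, steps = steps_per_epoch(24950, 0.1, 32)
print(n_train, steps)  # 22455 702
```

Any training set of 22433 to 22464 pictures gives 702 batches of 32, so the log tells us roughly how many pictures survived the loading step.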

For single prediction:

Create an image array for a picture and give it to the predict method of the model. The result is an array of dimension 1*2 holding two probabilities: [0][0] is the probability that the picture is a cat and [0][1] is the probability that it is a dog. The larger of the two is the model’s prediction

Single Prediction

# Single prediction
arr = cv2.imread("/Users/username/PycharmProjects/deep-learning/src/new_pet/dog/1.jpg")
new_arr = cv2.resize(arr, (100, 100))
new_prediction = np.array([new_arr]) / 255  # same scaling as the training data
new_prediction_array = model.predict(new_prediction)
# index 0 is the cat probability, index 1 is the dog probability
if np.argmax(new_prediction_array[0]) == 0:
    print("Single prediction CAT")
else:
    print("Single prediction DOG")
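Because the output layer uses softmax, predict returns probabilities rather than exact zeros and ones, so the safest way to read the answer is to take the index of the largest value. Here is a small sketch with a made-up prediction array standing in for the real model output:

```python
import numpy as np

categories = ["cat", "dog"]

# Hypothetical output of model.predict for a single picture
prediction = np.array([[0.08, 0.92]])

index = int(np.argmax(prediction[0]))  # position of the largest probability
print(categories[index], prediction[0][index])  # dog 0.92
```

The index doubles as the label, so it can be looked up directly in the categories list.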

Thanks for reading, and enjoy deep learning!


