The identification and classification of objects in the world around us has long been a core task in machine learning and a primary element of image recognition. Computers can be trained to recognise visual patterns in images, so that a trained neural network can extract the features needed for this demanding task. In our project, we trained multinomial classifiers and used convolutional neural networks (CNNs) to learn image features, then analysed the outputs to improve accuracy. We built our dataset by combining widely used, publicly available datasets. Our initial attempt at training with the VGG16 model did not yield satisfactory results, so we assessed the performance of three chosen CNN architectures, namely VGG16, InceptionV3, and MobileNet, to compare their outputs and select the best-performing model. We obtained an accuracy of around 78% for VGG16, around 87% for MobileNet, and around 90% for InceptionV3. Since InceptionV3 performed best, we selected it for further processing, trained it on an NVIDIA DGX system, and achieved an accuracy of 97.88%. The results obtained on a dataset with four classes suggest that this is close to the maximum accuracy attainable with the currently available dataset and techniques. To study the effect of dataset size, we enlarged the dataset from roughly 3,300 images to 5,100 and observed that accuracy improved. The results indicate that the performance and accuracy of the model can be further enhanced by using more advanced architectures and by providing a larger number of training images.
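The model-selection step described above, comparing the validation accuracies of the candidate architectures and keeping the best performer, can be sketched as follows. This is an illustrative outline only: the accuracy values are the ones reported above, and the `select_best_model` helper is a hypothetical name, not part of the actual training pipeline.

```python
# Illustrative sketch of the architecture-comparison step: each candidate
# backbone is trained and evaluated, and the model with the highest
# validation accuracy is selected for further training.
# Accuracies below are the approximate values reported in the abstract.
results = {
    "VGG16": 0.78,
    "MobileNet": 0.87,
    "InceptionV3": 0.90,
}

def select_best_model(accuracies):
    """Return the (name, accuracy) pair with the highest validation accuracy."""
    return max(accuracies.items(), key=lambda kv: kv[1])

best_name, best_acc = select_best_model(results)
print(best_name, best_acc)  # InceptionV3 0.9
```

In the project itself, InceptionV3 was selected in exactly this way and then retrained on the full dataset on an NVIDIA DGX system.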