Gesture recognition helps computers understand human body language. It builds a richer link between humans and machines than basic text user interfaces or graphical user interfaces (GUIs) alone. In this gesture recognition project, the motions of the human body are captured by a computer camera, and the computer uses this data as input to control applications. The objective of this project is to develop an interface that captures human hand gestures dynamically and uses them to control the volume level. To this end, four deep learning approaches were studied and their hand detection results compared: a YOLO model, an Inception Net + LSTM model, a 3-D CNN + LSTM model, and a Time Distributed CNN + LSTM model. The YOLO model outperforms the other three models. The models were trained using Kaggle and 20% of the videos available in the 20BN-Jester dataset. Once the hand has been detected in the captured frames, the next step is to control the system volume according to the direction of hand movement. The direction of hand movement is determined by generating and locating the bounding box on the detected hand.
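The last step, mapping bounding-box motion to a volume change, can be illustrated with a minimal sketch. It is not the project's implementation: the box format `(x_min, y_min, x_max, y_max)`, the function names, the `min_shift` threshold, the 0 to 100 volume scale, and the choice that upward motion raises the volume are all assumptions made for illustration. The detector and the operating-system volume call are left out; the sketch only shows how a direction could be derived from the box centers of successive frames.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_center(box: Box) -> Tuple[float, float]:
    """Return the (x, y) center of a bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def movement_direction(boxes: List[Box], min_shift: float = 20.0) -> str:
    """Classify hand motion from the first to the last detected box.

    Returns "up", "down", "left", "right", or "none" when the center
    moved less than min_shift pixels (hypothetical threshold).
    """
    if len(boxes) < 2:
        return "none"
    x0, y0 = box_center(boxes[0])
    x1, y1 = box_center(boxes[-1])
    dx, dy = x1 - x0, y1 - y0
    if max(abs(dx), abs(dy)) < min_shift:
        return "none"
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    # Image coordinates grow downward, so a negative dy means upward motion.
    return "down" if dy > 0 else "up"


def adjust_volume(volume: int, direction: str, step: int = 10) -> int:
    """Raise the volume on "up", lower it on "down", clamp to 0..100."""
    if direction == "up":
        volume += step
    elif direction == "down":
        volume -= step
    return max(0, min(100, volume))


if __name__ == "__main__":
    # Three frames in which the detected hand moves toward the top of the image.
    frames = [(100, 300, 200, 400), (100, 250, 200, 350), (100, 200, 200, 300)]
    direction = movement_direction(frames)
    print(direction, adjust_volume(50, direction))
```

Comparing only the first and last box centers keeps the sketch short; a real pipeline would likely smooth over several frames to tolerate missed or jittery detections.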