In this project, we will develop a video prediction model that generates future frames from short input sequences drawn from UCF101, a dataset of realistic videos of human actions spanning 101 action categories. The primary goal is to predict multiple consecutive frames that together form a coherent video clip.
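The task setup can be made concrete with a minimal NumPy sketch. It assumes each decoded clip is an array of shape `(T, H, W, C)`. `split_clip` slides a window over the clip to build (input, target) training pairs. `rollout` shows one common way to produce several consecutive frames: predict one frame at a time and feed each prediction back in as context. The function names, the `n_input`/`n_future` parameters, and the autoregressive strategy are illustrative assumptions, not the project's confirmed design. The placeholder `predict_next` stands in for the trained model.

```python
import numpy as np

def split_clip(clip: np.ndarray, n_input: int, n_future: int):
    """Split a clip of shape (T, H, W, C) into sliding (input, target) pairs.

    n_input and n_future are hypothetical hyperparameters: the number of
    observed frames and the number of frames the model must predict.
    """
    total = n_input + n_future
    if clip.shape[0] < total:
        raise ValueError("clip is shorter than n_input + n_future")
    inputs, targets = [], []
    for start in range(clip.shape[0] - total + 1):
        inputs.append(clip[start:start + n_input])           # observed frames
        targets.append(clip[start + n_input:start + total])  # frames to predict
    return np.stack(inputs), np.stack(targets)

def rollout(context: np.ndarray, predict_next, n_future: int) -> np.ndarray:
    """Autoregressively generate n_future frames from a context window.

    predict_next is a hypothetical callable mapping a (n_input, H, W, C)
    window to a single (H, W, C) frame; each prediction is appended and
    becomes part of the next window.
    """
    window = len(context)
    frames = list(context)
    for _ in range(n_future):
        frames.append(predict_next(np.stack(frames[-window:])))
    return np.stack(frames[window:])  # only the generated frames
```

For a 20-frame clip with `n_input=10` and `n_future=5`, `split_clip` yields 6 overlapping pairs. A trivial "copy the last frame" baseline such as `lambda w: w[-1]` can serve as `predict_next` when testing the pipeline before a real model is trained.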