Head pose estimation is an important problem, as it facilitates tasks such as gaze estimation and attention modeling. In the automotive context, head pose provides crucial information about the driver’s mental state, including drowsiness, distraction and attention. It can also be used for interaction with in-vehicle infotainment systems. While computer vision algorithms using RGB cameras are reliable in controlled environments, head pose estimation in the car remains challenging due to the sudden illumination changes, occlusions and large head rotations that are common in a vehicle. These issues can be partially alleviated by using depth cameras. Head rotation trajectories are continuous, with strong temporal dependencies. Our study leverages this observation, proposing a novel temporal deep learning model for head pose estimation from point cloud data. The approach extracts discriminative feature representations directly from the point cloud, leveraging the 3D spatial structure of the face. The frame-based representations are then combined with bidirectional long short-term memory (BLSTM) layers. We train this model on the newly collected multimodal driver monitoring (MDM) dataset, achieving better results than non-temporal algorithms using point cloud data and state-of-the-art models using RGB images. We further show, quantitatively and qualitatively, that incorporating temporal information yields large improvements not only in accuracy but also in the smoothness of the predictions.
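The sketch below illustrates the general shape of the pipeline described above: a per-frame point cloud encoder whose frame-level features are combined with BLSTM layers to regress head rotation angles. It is a minimal PyTorch sketch, not the paper's actual implementation; the PointNet-style encoder, layer sizes, and the (yaw, pitch, roll) output parameterization are all assumptions made for illustration.

```python
# Minimal sketch of a temporal head pose model over point cloud sequences.
# The encoder architecture and dimensions below are illustrative assumptions,
# not the paper's reported configuration.
import torch
import torch.nn as nn


class PointCloudFrameEncoder(nn.Module):
    """Encodes one point cloud frame (N points, xyz) into a fixed-size vector
    via shared per-point MLPs followed by order-invariant max pooling."""

    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, feat_dim, 1),
        )

    def forward(self, points):                 # points: (B, N, 3)
        x = self.mlp(points.transpose(1, 2))   # (B, feat_dim, N)
        return x.max(dim=2).values             # (B, feat_dim)


class TemporalHeadPoseNet(nn.Module):
    """Per-frame point cloud features fed through BLSTM layers, regressing
    a head pose (here assumed to be yaw, pitch, roll) for every frame."""

    def __init__(self, feat_dim=256, hidden=128):
        super().__init__()
        self.encoder = PointCloudFrameEncoder(feat_dim)
        self.blstm = nn.LSTM(feat_dim, hidden, num_layers=2,
                             batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 3)   # 3 rotation angles

    def forward(self, clouds):                 # clouds: (B, T, N, 3)
        B, T, N, _ = clouds.shape
        feats = self.encoder(clouds.reshape(B * T, N, 3)).reshape(B, T, -1)
        temporal, _ = self.blstm(feats)        # (B, T, 2 * hidden)
        return self.head(temporal)             # (B, T, 3) angles per frame


# Usage: a batch of 4 sequences, 16 frames each, 1024 points per frame.
model = TemporalHeadPoseNet()
poses = model(torch.randn(4, 16, 1024, 3))     # -> shape (4, 16, 3)
```

Because the BLSTM conditions each frame's prediction on both past and future frames, the per-frame estimates are implicitly smoothed across the trajectory, which is consistent with the smoothness gains reported above.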