Summary: Facial expressions are generated by the actions of the facial muscles located at different facial regions. The spatial dependencies of different spatial facial regions are worth exploring and can improve the performance of facial expression recognition. In this letter we propose a joint convolutional bidirectional long short-term memory (JCBLSTM) framework to model the discriminative facial textures and spatial relations between different regions jointly. We treat each row or column of feature maps output from CNN as individual ordered sequence and employ LSTM to model the spatial dependencies within it. Moreover, a shortcut connection for convolutional feature maps is introduced for joint feature representation. We conduct experiments on two databases to evaluate the proposed JCBLSTM method. The experimental results demonstrate that the JCBLSTM method achieves state-of-the-art performance on Multi-PIE and very competitive result on FER-2013.