Image-Based Food Calorie Estimation Using Recipe Information

Takumi EGE  Keiji YANAI  

IEICE TRANSACTIONS on Information and Systems   Vol.E101-D   No.5   pp.1333-1341
Publication Date: 2018/05/01
Online ISSN: 1745-1361
DOI: 10.1587/transinf.2017MVP0027
Type of Manuscript: Special Section PAPER (Special Section on Machine Vision and its Applications)
Category: Machine Vision and its Applications
Keywords: food image recognition, image-based food calorie estimation, convolutional neural network, multi-task CNN

Mobile applications for recording everyday meals have recently attracted much attention as tools for dietary self-management. However, most of these applications either return a calorie value simply associated with the estimated food category or require users to indicate the rough amount of food manually. Estimating food calories from a food photo with practical accuracy has not yet been achieved and remains an unsolved problem. In this paper, we propose estimating food calories from a food photo by simultaneously learning food calories, categories, ingredients, and cooking directions using deep learning. Because food calories are in general strongly correlated with food categories, ingredients, and cooking directions, we expect that training on them simultaneously will improve performance compared with training each task independently. To this end, we use a multi-task CNN. In addition, we construct two datasets: a dataset of calorie-annotated recipes collected from Japanese recipe sites on the Web and a dataset collected from an American recipe site. In the experiments, we trained both multi-task and single-task CNNs and compared them. The multi-task CNN achieved better performance than the single-task CNNs on both food category estimation and food calorie estimation. On the Japanese recipe dataset, introducing the multi-task CNN improved the correlation coefficient by 0.039, and on the American recipe dataset it improved the correlation coefficient by 0.090, compared with the single-task CNN. We also showed that the proposed multi-task CNN-based method outperformed previously proposed search-based methods.
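
The two ideas the abstract relies on, a joint training objective and evaluation by correlation coefficient, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the loss functions, the task weights, or which correlation coefficient is reported. The sketch assumes an absolute-error calorie term, a softmax cross-entropy category term, equal weights, and the Pearson coefficient. All function and parameter names are hypothetical, and the ingredient and cooking-direction tasks are omitted for brevity.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy for one example, computed from raw class scores."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target_index]


def multi_task_loss(pred_kcal, true_kcal, category_logits, true_category,
                    w_calorie=1.0, w_category=1.0):
    """Weighted sum of a calorie regression loss and a category loss.

    Assumption: the L1 calorie term and the weights are illustrative only.
    """
    calorie_loss = abs(pred_kcal - true_kcal)
    category_loss = softmax_cross_entropy(category_logits, true_category)
    return w_calorie * calorie_loss + w_category * category_loss


def pearson_correlation(xs, ys):
    """Pearson correlation between estimated and ground-truth values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

In a multi-task setting, a single shared network would be trained to minimize a combined objective of this kind, so that gradients from the category task also shape the features used for calorie regression. The correlation function would then be applied to the estimated and annotated calorie values of a held-out set.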