Abstract
We present a novel approach for the visual prediction of human-object interactions in videos. Rather than forecasting the human and object motion or the future hand-object contact points, we aim at predicting (a) the class of the on-going human-object interaction and (b) the class(es) of the next active object(s) (NAOs), i.e., the object(s) that will be involved in the interaction in the near future, as well as the time at which the interaction will occur. The observed activity is represented as a graph, and graph matching relies on the efficient Graph Edit Distance (GED) method. The experimental evaluation of the proposed approach was conducted on two well-established video datasets containing human-object interactions, namely MSR Daily Activities and CAD-120. High prediction accuracy was obtained for both action prediction and NAO forecasting.
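To make the graph-matching step concrete, the sketch below compares two toy activity graphs with Graph Edit Distance using networkx's built-in graph_edit_distance. The graph structure, node labels, and interaction triplets are illustrative assumptions, not the paper's actual representation or GED solver; a lower GED indicates more similar activities.

```python
# A minimal sketch of activity-graph matching via Graph Edit Distance (GED).
# All labels and interactions below are hypothetical examples.
import networkx as nx

def activity_graph(interactions):
    """Build a small activity graph: nodes are entities (human, objects)
    labeled with their class; edges are the observed interactions."""
    g = nx.Graph()
    for subj, verb, obj in interactions:
        g.add_node(subj, label=subj)
        g.add_node(obj, label=obj)
        g.add_edge(subj, obj, label=verb)
    return g

# Two hypothetical observed activities.
g1 = activity_graph([("human", "reach", "cup"), ("human", "hold", "kettle")])
g2 = activity_graph([("human", "reach", "cup"), ("human", "hold", "bowl")])

# GED is the cost of the cheapest sequence of node/edge insertions,
# deletions, and substitutions turning g1 into g2; here nodes and edges
# match only when their class labels agree.
ged = nx.graph_edit_distance(
    g1, g2,
    node_match=lambda a, b: a["label"] == b["label"],
    edge_match=lambda a, b: a["label"] == b["label"],
)
print(f"GED between activity graphs: {ged}")  # smaller = more similar
```

Note that exact GED computation is exponential in graph size, which is why an efficient GED method matters for this kind of matching; the toy graphs above are small enough for the exact computation to be immediate.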
Acknowledgements
This research was co-financed by Greece and the European Union (European Social Fund-ESF) through the Operational Programme “Human Resources Development, Education and Lifelong Learning” in the context of the Act “Enhancing Human Resources Research Potential by undertaking a Doctoral Research” Sub-action 2: IKY Scholarship Programme for PhD candidates in the Greek Universities. The research work was also supported by the Hellenic Foundation for Research and Innovation (HFRI) under the HFRI PhD Fellowship grant (Fellowship Number: 1592) and by HFRI under the “1st Call for HFRI Research Projects to support Faculty members and Researchers and the procurement of high-cost research equipment”, project I.C.Humans, number 91.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Manousaki, V., Papoutsakis, K., Argyros, A. (2022). Graphing the Future: Activity and Next Active Object Prediction Using Graph-Based Activity Representations. In: Bebis, G., et al. (eds.) Advances in Visual Computing. ISVC 2022. Lecture Notes in Computer Science, vol. 13598. Springer, Cham. https://doi.org/10.1007/978-3-031-20713-6_23
DOI: https://doi.org/10.1007/978-3-031-20713-6_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20712-9
Online ISBN: 978-3-031-20713-6
eBook Packages: Computer Science, Computer Science (R0)