Task Graph Based Mistake Detection in Instructional Videos

Open Access

Author: Sell, Parker
Area of Honors: Computer Science
Degree: Bachelor of Science
Document Type: Thesis
Thesis Supervisors: Huijuan Xu, Thesis Supervisor; John Joseph Hannan, Thesis Honors Advisor
Keywords: Mistake Recognition; Machine Learning; Instructional Videos; Assembly Videos; Task Graph; Visual Recognition Network
Abstract: This thesis examines the impact of integrating a structural task graph into a visual recognition network to identify and segment errors in the assembly of toy cars. We introduce enhancements to two baseline networks that specifically encode the structural and sequential intricacies of assembly processes. These enhancements achieve state-of-the-art performance on the visual-only mistake recognition task, with a 3.7% increase in F1-score over existing benchmarks on the Assembly101 dataset. Our work also pioneers the temporal mistake segmentation task, which does not rely on ground-truth action segments at test time. On this task, our methods improve over baseline models by 5% in F1@10, 3.8% in F1@25, and 1.8% in F1@50. Our results indicate that graph construction and attention-based mechanisms play a significant role in mistake recognition and temporal mistake segmentation, setting a precedent for future research in mistake detection.
