Evaluating SOMA and MoSh++ Using PSU-TMM100 Marker Data

Restricted (Penn State Only)
- Author:
- Johri, Divyesh
- Area of Honors:
- Computer Science
- Degree:
- Bachelor of Science
- Document Type:
- Thesis
- Thesis Supervisors:
- Yanxi Liu, Thesis Supervisor
- John Joseph Hannan, Thesis Honors Advisor
- Keywords:
- mocap, markers, labels, labeling, auto-labeler, labeler, automatic labeling, soma, mosh, tmm100, taiji, movement, 3d body, motion capture, python, matlab
- Abstract:
- Motion capture (mocap) is a technology widely used across numerous industries to provide digital representations of humans, animals, or objects. Subjects in motion are captured using cameras/sensors that detect reflective/infrared markers on bodies as three-dimensional (3D) marker points. To be useful for almost any visualization or animation task, the marker points must be labeled with their locations on the captured body. These labels are short alphanumeric phrases that represent specific locations on the human body, such as “LANK” and “RANK” for the left and right ankles. Labeling, the mapping of marker points to labels, is difficult due to occlusions and ghost points, and typically requires manual corrections even when using paid commercial solutions. Related works in the field provide solutions that typically impose strict conditions, such as specific marker layouts, initializing poses, manually labeled frames, or poses that resemble the training data. “Solving Optical Marker-Based MoCap Automatically” (SOMA), a neural network approach that leverages self-attention and inexact graph matching, is a novel approach to labeling that includes a post-processing step, “Motion and Shape Capture” (MoSh++), to visualize labeled markers with fitted 3D bodies. SOMA can accept sparse mocap data with any marker layout, pose, or body shape and generally produces results with over 80% accuracy using a model trained on the “Archive of Motion Capture as Surface Shapes” (AMASS), a large repository of mocap datasets. The Motion Capture Lab for Smart Health and the Laboratory for Perception, Action, and Cognition (LPAC) at Penn State have jointly created and released a new, large mocap dataset of Taiji movements called “Penn State University Taiji Multimodal” (PSU-TMM100), which includes mocap joints, videos, foot pressure data, and labeled markers for 100 human motion sequences. In this thesis, TMM100 is used to verify SOMA’s accuracy and functionality, and the following contributions are made:
1. Detailed information on using SOMA and MoSh++, and on setting them up for TMM100 or any other dataset.
2. Detailed discussions of the results from applying SOMA to TMM100, including an analysis of SOMA’s accuracy and the quality of solved body renders from MoSh++.
3. Videos that showcase generated renders of bodies solved on labeled data using MoSh++, alongside captured videos of 24-form Taiji from ten subjects in TMM100, Sequence #2.
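
To illustrate the labeling task and the accuracy figure mentioned in the abstract, below is a minimal Python sketch (not taken from the thesis or the SOMA/MoSh++ code releases). The label names and the `labeling_accuracy` function are hypothetical; it only shows one plausible way to score per-point label predictions against ground truth.

```python
# Illustrative sketch only: labeling assigns each 3D marker point in a frame
# to an anatomical label such as "LANK" (left ankle) or "RANK" (right ankle).

def labeling_accuracy(predicted, ground_truth):
    """Fraction of marker points whose predicted label matches ground truth.

    `predicted` and `ground_truth` are lists of per-frame label lists,
    one label (or None for unlabeled/ghost points) per marker point.
    """
    correct = total = 0
    for pred_frame, gt_frame in zip(predicted, ground_truth):
        for pred_label, gt_label in zip(pred_frame, gt_frame):
            total += 1
            correct += int(pred_label == gt_label)
    return correct / total if total else 0.0

# Toy example: two frames with four marker points each; one point in the
# second frame is mislabeled, giving 7/8 = 87.5% accuracy.
gt   = [["LANK", "RANK", "LKNE", "RKNE"], ["LANK", "RANK", "LKNE", "RKNE"]]
pred = [["LANK", "RANK", "LKNE", "RKNE"], ["LANK", "RANK", "RKNE", "RKNE"]]
print(f"labeling accuracy: {labeling_accuracy(pred, gt):.1%}")
```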