COMPARISON OF SIMILARITY FUNCTIONS FOR VECTOR AGGREGATION IN NETWORK REPRESENTATION LEARNING MODELS
Open Access
Author:
Colom, Daniel A
Area of Honors:
Computer Engineering
Degree:
Bachelor of Science
Document Type:
Thesis
Thesis Supervisors:
Dr. Wang-Chien Lee, Thesis Supervisor
Dr. Jesse Louis Barlow, Thesis Honors Advisor
Keywords:
Machine Learning; Neural Network; Information Network; Representation Learning; Similarity Functions
Abstract:
This research investigates the effects of different similarity functions on representation learning models for heterogeneous information networks. The similarity function adopted in a model determines how the learned latent vectors of two nodes in an information network are aggregated to meet the model's learning objective based on the similarity between those nodes; the model thereby outputs a vectorized representation of the nodes in the network. This output, i.e., the learned representation of nodes in the form of latent feature vectors, has been shown to be important and useful for many network mining tasks on large datasets. A good choice of the similarity function used in training models for network representation learning therefore has a critical impact on the results of those network mining tasks. Two recent representation learning models for information networks, DeepWalk and HIN2Vec, use the dot product as the similarity function for vector aggregation. In this paper, we explore other similarity functions, including cosine similarity and Euclidean distance, to compare their effects on several network mining tasks. To account for variation in network complexity and structure, we use several distinct datasets in an experimental evaluation. We compare the effectiveness of the similarity functions by using the representations generated by DeepWalk and HIN2Vec to perform prediction and classification tasks.
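The three similarity functions compared in this work can be sketched as follows; this is a minimal illustration, and the embedding vectors below are hypothetical rather than taken from any model in the thesis. Note that dot product and cosine similarity grow with similarity, while Euclidean distance is a dissimilarity measure that shrinks toward zero as vectors agree.

```python
import math

def dot_product(u, v):
    """Dot product similarity: larger when vectors align and have large norms."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Cosine similarity: dot product normalized by magnitudes, in [-1, 1]."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot_product(u, v) / (norm_u * norm_v)

def euclidean_distance(u, v):
    """Euclidean distance: smaller means more similar (a dissimilarity)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical 4-dimensional latent vectors for two nodes.
u = [0.2, 0.5, -0.1, 0.4]
v = [0.1, 0.4, 0.0, 0.5]

print(dot_product(u, v))         # 0.42
print(euclidean_distance(u, v))  # 0.2
print(cosine_similarity(u, v))
```

Cosine similarity differs from the dot product only by normalization, so it is insensitive to vector magnitude; Euclidean distance would typically be negated or inverted before being used where a model expects a similarity score.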