DeepMind and Google researchers have proposed a powerful new graph matching network (GMN) model for the retrieval and matching of graph structured objects. GMN uses similarity learning for graph structured objects and outperforms graph neural network (GNN) models on graph similarity learning (GSL) tasks.

Solving GSL problems has many important applications, especially similarity-based search in graph databases. The researchers studied the use of GNNs for this purpose and demonstrated how to train a GNN to produce graph embeddings in a vector space that enable efficient similarity reasoning.
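The embedding approach can be illustrated with a toy sketch. It assumes an unlearned sum-over-neighbors propagation as a stand-in for a trained GNN, sum pooling for the graph vector, and Euclidean distance for comparison; the names `embed_graph` and `euclidean_distance` are hypothetical and not taken from the paper. The point it shows is that each graph is mapped to a vector on its own, so embeddings can be computed once and compared cheaply.

```python
import math

def embed_graph(node_features, edges, num_steps=2):
    """Map ONE graph to a vector, independently of any other graph.

    Stand-in for a trained GNN: each step adds the neighbors' vectors to
    every node vector, and the graph vector is the sum over nodes.
    """
    h = [list(f) for f in node_features]
    neighbors = {i: [] for i in range(len(h))}
    for u, v in edges:  # undirected edges
        neighbors[u].append(v)
        neighbors[v].append(u)
    for _ in range(num_steps):
        # The right-hand side reads the previous h for every node.
        h = [
            [h[i][d] + sum(h[j][d] for j in neighbors[i]) for d in range(len(h[i]))]
            for i in range(len(h))
        ]
    dim = len(h[0])
    return [sum(node[d] for node in h) for d in range(dim)]

def euclidean_distance(u, v):
    """Smaller distance means the two graphs are judged more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

ones = [[1.0], [1.0], [1.0]]
path = embed_graph(ones, [(0, 1), (1, 2)])
relabeled_path = embed_graph(ones, [(1, 0), (0, 2)])
triangle = embed_graph(ones, [(0, 1), (1, 2), (0, 2)])
```

In this toy setup the two differently labeled path graphs receive the same embedding, while the triangle lands farther away.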

Researchers further proposed GMN, a GNN extension for performing similarity learning. Instead of calculating each graph representation separately, this new GMN model calculates similarity scores through a cross-graph attention mechanism to correlate the nodes between the graphs and identify the differences. The model relies on paired graphs to compute graph representations. It is more powerful than embedding models and provides a good trade-off between accuracy and computation.
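A minimal sketch of the cross-graph attention step may make this concrete. It assumes dot-product similarity between node vectors (one of several possible choices) and omits the learned propagation layers entirely; the function names are hypothetical. For each node in one graph, attention weights over the other graph's nodes are computed with a softmax, and the resulting message is the difference between the node's vector and the attention-weighted average of the other graph's node vectors.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross_graph_messages(h1, h2):
    """For each node vector in h1, return its difference from the
    attention-weighted average of the node vectors in h2.

    Assumes dot-product similarity; a message near zero means the node
    has a close match in the other graph.
    """
    messages = []
    for h_i in h1:
        weights = softmax([dot(h_i, h_j) for h_j in h2])
        attended = [
            sum(w * h_j[d] for w, h_j in zip(weights, h2))
            for d in range(len(h_i))
        ]
        messages.append([a - b for a, b in zip(h_i, attended)])
    return messages

# A node whose counterparts in the other graph are identical to it
# produces a zero message.
matched = cross_graph_messages([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]])
```

Because every node attends over all nodes of the other graph, the cost of this step grows with the product of the two graphs' node counts, which is the extra computation the matching model pays for.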

Researchers evaluated the performance of the GSL framework and compared the GNN embedding model and the GMN on three tasks.

Learning Graph Edit Distances

The researchers found that GSL models trained on graphs from specific distributions performed better than generic baselines, and that the GMN models consistently outperformed the GNN embedding models, as expected.

Comparing graph embedding (GNN) and matching (GMN) models trained on graphs from different distributions with the baseline, measuring pair AUC / triplet accuracy (×100).
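The two metrics named in the caption can be computed as in the sketch below. It assumes the usual definitions: pair AUC is the area under the ROC curve for classifying pairs as similar or dissimilar from a similarity score, computed here as the fraction of (similar, dissimilar) pairs ranked correctly, and triplet accuracy is the fraction of triplets in which the similar graph is closer to the anchor than the dissimilar one. The function names and toy numbers are illustrative only.

```python
def pair_auc(scores, labels):
    """AUC via pairwise ranking: the probability that a similar pair
    (label 1) scores higher than a dissimilar pair (label 0); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one similar and one dissimilar pair")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

def triplet_accuracy(dist_to_similar, dist_to_dissimilar):
    """Fraction of triplets where the anchor is closer to the similar
    graph than to the dissimilar graph."""
    correct = sum(
        1 for dp, dn in zip(dist_to_similar, dist_to_dissimilar) if dp < dn
    )
    return correct / len(dist_to_similar)

auc = pair_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
acc = triplet_accuracy([0.2, 0.5, 0.9], [0.4, 0.3, 1.0])
```

Multiplying either value by 100 gives numbers on the scale used in the reported results.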

Visualization of cross-graph attention for GMNs after 5 propagation layers. In each pair of graphs the left figure shows the attention from left graph to the right, the right figure shows the opposite.

Control Flow Graph Based Binary Function Similarity Search

Performance (×100) of different models on the binary function similarity search task.

Control flow graph based binary function similarity search plays a key role in identifying vulnerabilities in software systems. The figure above shows the performance of different models with different numbers of propagation steps and different data settings on the binary function similarity search task. The GMN outperforms the GNN embedding model across all settings and propagation steps.

More Baselines and Ablation Studies

More results on the function similarity search task and the extra COIL-DEL dataset.

The researchers examined the effects of different components in the GMN and compared the GMN model with Siamese versions of the Graph Convolutional Network (GCN) and GNN, as well as with GNN and GCN embedding models. The GMN proved superior to the Siamese models, indicating the importance of communicating cross-graph information early in the computation.

The experimental results show that the GMN model can not only exploit structure in the context of similarity learning, but also outperform domain-specific baseline systems that were hand-engineered for these problems.

GMNs can be a more powerful alternative to GNN embedding models because, in addition to computing embeddings, they compare the pair of graphs at all levels, rather than independently mapping each graph to an embedding. The trade-off is added computation cost, especially for large graphs. Moreover, GMNs cannot be used directly for indexing and searching through large graph databases.

The researchers conclude that GMNs are best used when the only concern is the similarity between individual pairs, or “in a retrieval setting together with a faster filtering model like the graph embedding model or standard graph similarity search methods, to narrow down the search to a smaller candidate set, and then use the more expensive matching model to rerank the candidates to improve precision.”
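The filter-then-rerank setting described in the quote can be sketched as follows. The scoring functions are deliberately crude stand-ins and are not the paper's models: graphs are represented as sets of edges, the cheap filter compares edge counts, and the expensive scorer is Jaccard similarity over edge sets. All names are hypothetical.

```python
def retrieve_then_rerank(query, database, cheap_distance, expensive_score, k):
    """Keep the k candidates the cheap model ranks closest to the query,
    then order only those candidates by the expensive model's score."""
    candidates = sorted(database, key=lambda g: cheap_distance(query, g))[:k]
    return sorted(candidates, key=lambda g: expensive_score(query, g), reverse=True)

# Stand-in for a fast embedding-based filter: difference in edge count.
def edge_count_gap(a, b):
    return abs(len(a) - len(b))

# Stand-in for the slower pairwise matching model: Jaccard similarity.
def jaccard(a, b):
    return len(a & b) / len(a | b)

query = frozenset({(0, 1), (1, 2), (2, 3)})
database = [
    frozenset({(5, 6), (6, 7), (7, 8)}),  # same size, no shared edges
    frozenset({(0, 1), (1, 2), (0, 3)}),  # same size, two shared edges
    frozenset({(0, 1), (1, 2), (2, 3)}),  # identical to the query
    frozenset({(0, 1)}),                  # much smaller
    frozenset({(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)}),  # much larger
]

ranked = retrieve_then_rerank(query, database, edge_count_gap, jaccard, k=3)
```

The expensive scorer runs on only `k` candidates rather than on the whole database, which is what makes the combination practical for large collections.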

The paper Graph Matching Networks for Learning the Similarity of Graph Structured Objects has been accepted by ICML 2019 and is on arXiv.