Recent Advances in Kernel-Based Graph Classification

ECML PKDD 2017, Nectar Track
Nils Kriege, Christopher Morris
June 20, 2017
TU Dortmund University, Algorithm Engineering Group

Motivation
Question
How similar are two graphs?
[Figure: molecular graphs of (a) Sildenafil and (b) Vardenafil]

High-level View: Supervised Graph Classification
[Figure: a feature map φ embeds the graphs into a subset of a feature space H]

Primer on Graph Kernels
Question
How similar are two graphs?
Definition (Graph Kernel)
Let 𝒢 be a non-empty set of graphs and let k: 𝒢 × 𝒢 → ℝ. Then k is
a graph kernel if there is a real Hilbert space ℋ and a feature map
𝜑: 𝒢 → ℋ such that k(G, H) = ⟨𝜑(G), 𝜑(H)⟩ for all G, H ∈ 𝒢.
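As a minimal sketch of the definition, the simplest explicit feature map counts vertex labels and the kernel is the inner product of the two count vectors. The function names and the toy data are assumptions for illustration and do not come from the talk.

```python
from collections import Counter


def label_histogram(vertex_labels):
    """Explicit feature map phi: count how often each vertex label occurs."""
    return Counter(vertex_labels)


def dot(phi_g, phi_h):
    """Inner product of two sparse feature vectors stored as Counters."""
    return sum(count * phi_h[label] for label, count in phi_g.items())


def vertex_label_kernel(labels_g, labels_h):
    """k(G, H) = <phi(G), phi(H)> for the label-histogram feature map."""
    return dot(label_histogram(labels_g), label_histogram(labels_h))


# Two toy graphs, represented only by their vertex labels (hypothetical data).
G = ["C", "C", "N", "O"]
H = ["C", "N", "N"]
k_gh = vertex_label_kernel(G, H)
```

Because the kernel is an inner product of explicit vectors, it is symmetric and positive semidefinite by construction. This particular feature map ignores the edges entirely, so it only illustrates the definition and is not a useful graph kernel on its own.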
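Explicit vs. Implicit
[Figure: two routes from a pair of graphs G, H to k(G, H). Explicit (EX): compute 𝜑(G) and 𝜑(H), then take their inner product. Implicit (IM): evaluate a PSD function on G and H directly.]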

Talk Structure
1 Explicit vs. Implicit Graph Kernels, IEEE ICDM 2014
2 Fast Kernels for Graphs with Continuous Labels, IEEE ICDM 2016
3 Graph Kernels Based on Optimal Assignments, NIPS 2016
4 Outlook/What’s next?

Part I: Explicit vs. Implicit Graph Kernels
Challenge
Investigate the benefits of explicit and implicit graph kernels.
N. Kriege, M. Neumann, K. Kersting, and P. Mutzel. “Explicit versus
Implicit Graph Feature Maps: A Computational Phase Transition for
Walk Kernels”. In: IEEE International Conference on Data Mining.
2014, pp. 881–886
N. M. Kriege, M. Neumann, C. Morris, K. Kersting, and P. Mutzel. “A
Unifying View of Explicit and Implicit Feature Maps for Structured
Data: Systematic Studies of Graph Kernels”. In: CoRR abs/1703.00676
(2017). url: http://arxiv.org/abs/1703.00676

Kernel                                           𝜑    Cont. Labels   Run time
Random Walk [Gärtner et al., 2003]               IM                  𝒪(n^{2𝜔})
Shortest-Path [Borgwardt et al., 2005]           IM                  𝒪(n^4)
Subgraph Matching [Kriege, Mutzel, 2012]         IM                  𝒪(k n^{2k+2})
GraphHopper [Feragen et al., 2013]               IM                  𝒪(n^2 m)
Graphlet [Shervashidze et al., 2009]             EX
NSPDK [Costa et al., 2010]                       EX
Weisfeiler-Lehman [Shervashidze et al., 2011]    EX                  𝒪(hm)
Propagation [Neumann et al., 2016]               EX
Implicit vs. Explicit
• Implicit kernels: do not scale, but are extendable to continuous labels
• Explicit kernels: do scale, but handle only discrete labels
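To make the explicit (EX) route concrete, here is a small sketch of a Weisfeiler-Lehman style feature map: vertex labels are refined for h rounds using the neighbours' labels, every label seen is counted, and the kernel is the inner product of the counts. It assumes graphs are given as an adjacency dictionary plus a label dictionary, and it uses nested tuples as refined labels for readability instead of the label compression needed for the 𝒪(hm) run time. All names are hypothetical.

```python
from collections import Counter


def wl_feature_map(adjacency, labels, h):
    """Count all labels seen in h rounds of Weisfeiler-Lehman refinement.

    adjacency: dict mapping each vertex to a list of its neighbours
    labels:    dict mapping each vertex to its initial discrete label
    """
    features = Counter(labels.values())
    current = dict(labels)
    for _ in range(h):
        refined = {}
        for v, neighbours in adjacency.items():
            # New label: own label plus the sorted multiset of neighbour labels.
            refined[v] = (current[v], tuple(sorted(current[u] for u in neighbours)))
        current = refined
        features.update(current.values())
    return features


def wl_kernel(graph_g, graph_h, h=2):
    """Inner product of the explicit WL feature vectors of two graphs."""
    phi_g = wl_feature_map(*graph_g, h)
    phi_h = wl_feature_map(*graph_h, h)
    return sum(count * phi_h[feature] for feature, count in phi_g.items())


# Two labeled paths on three vertices (hypothetical toy data).
path = {0: [1], 1: [0, 2], 2: [1]}
graph_aba = (path, {0: "A", 1: "B", 2: "A"})
graph_aab = (path, {0: "A", 1: "A", 2: "B"})
```

With explicit vectors, a Gram matrix over a whole data set needs one feature-map computation per graph followed by inner products, whereas an implicit kernel has to be evaluated once per pair of graphs.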

Contribution
• Conditions under which the computation of a finite-dimensional explicit mapping is possible
• Explicit feature maps for convolution kernels
• Weighted vertex kernels: derived approximate finite-dimensional explicit feature maps
• Validated theoretical results in experimental study

[Figure: runtime in seconds (0 to 10) of implicit and explicit computation as a function of data set size (100 to 300) and label diversity (0 to 60)]
Experimental Results
Discrete Labels: explicit feature maps outperform implicit kernels
(for most kernels and benchmark data sets)
Continuous Labels: approximation by explicit feature maps not
competitive for complex kernels

Part II: Hash Graph Kernel Framework
Challenge
Design fast, explicit graph kernels that can handle continuous
labels.
C. Morris, N. M. Kriege, K. Kersting, and P. Mutzel. “Faster Kernel for
Graphs with Continuous Attributes via Hashing”. In: IEEE
International Conference on Data Mining. 2016, pp. 1095–1100
[Figure: a graph whose vertices carry continuous attribute vectors (1.2, 0.3), (9.1, 0.9), (1.6, 0.7), (5.2, 1.0), (5.1, 0.2), and (1.0, 0.2)]

Kernel                                           𝜑    Cont. Labels   Run time
Random Walk [Gärtner et al., 2003]               IM                  𝒪(n^{2𝜔})
Shortest-Path [Borgwardt et al., 2005]           IM                  𝒪(n^4)
Subgraph Matching [Kriege, Mutzel, 2012]         IM                  𝒪(k n^{2k+2})
GraphHopper [Feragen et al., 2013]               IM                  𝒪(n^2 m)
Graphlet [Shervashidze et al., 2009]             EX
NSPDK [Costa et al., 2010]                       EX
Weisfeiler-Lehman [Shervashidze et al., 2011]    EX                  𝒪(hm)
Propagation [Neumann et al., 2016]               EX
HGK Framework [Morris et al., 2016]              EX                  Linear in BK

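[Figure: Hash Graph Kernel pipeline. An attributed graph (G, a) is hashed I times into labeled graphs (G, l1), (G, l2), ..., (G, lI). Each labeled graph yields a feature vector φ(G, li), and the feature vectors are combined into 1/I [φ(G, l1), ..., φ(G, lI)].]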
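The pipeline above can be sketched in a few lines. This is only an illustration under several assumptions: the hash is a random projection followed by bucketing, the explicit base feature map is a plain label histogram (a structure-aware kernel such as Weisfeiler-Lehman would take its place in practice), and the per-iteration kernels are averaged. The function names, the parameters, and the split of the attribute vectors into two graphs are hypothetical.

```python
import math
import random
from collections import Counter


def make_hash(dim, width, rng):
    """Draw one random hash: project onto a random direction, then bucket."""
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    offset = rng.uniform(0.0, width)

    def hash_attribute(attribute):
        projection = sum(d * a for d, a in zip(direction, attribute))
        return math.floor((projection + offset) / width)

    return hash_attribute


def hash_graph_kernel(attrs_g, attrs_h, iterations=20, width=1.0, seed=0):
    """Average an explicit base kernel over several random discretizations."""
    rng = random.Random(seed)
    dim = len(attrs_g[0])
    total = 0.0
    for _ in range(iterations):
        h = make_hash(dim, width, rng)          # the same hash for both graphs
        phi_g = Counter(h(a) for a in attrs_g)  # explicit base feature map
        phi_h = Counter(h(a) for a in attrs_h)
        total += sum(c * phi_h[label] for label, c in phi_g.items())
    return total / iterations


# Vertex attributes of two toy graphs (values borrowed from the figure above).
G_attrs = [(1.2, 0.3), (9.1, 0.9), (1.6, 0.7)]
H_attrs = [(5.2, 1.0), (5.1, 0.2), (1.0, 0.2)]
value = hash_graph_kernel(G_attrs, H_attrs)
```

Vertices with nearby attribute vectors tend to fall into the same bucket in many iterations, so they contribute more to the averaged kernel than vertices that are far apart.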

Contribution: Hash Graph Kernel Framework
• Use explicit instead of implicit kernels, i.e., avoid the kernel trick
• Applicable to a wide range of graph kernel functions
(Weisfeiler-Lehman, Shortest-Path, Graphlet, ...)
• Theoretical approximation bounds
• State-of-the-art classification accuracies but orders of
magnitude faster than implicit kernels
Question
Is there no benefit from employing the kernel trick at all on the
graph domain?

Part III: Positive-semidefinite Optimal Assignments
Challenge
Design a valid graph kernel that is based on optimal assignments.
[Figure: two sets X and Y with elements labeled a, b, and c]
N. M. Kriege, P.-L. Giscard, and R. C. Wilson. “On Valid Optimal
Assignment Kernels and Applications to Graph Classification”. In:
Advances in Neural Information Processing Systems. 2016,
pp. 1615–1623

Intuition
Optimal Assignments are a “natural” measure of similarity.
Definition (Optimal Assignment Kernel)
Let ℬ(X, Y) be the set of bijections between X and Y in [𝒮]^n. The optimal
assignment kernel on [𝒮]^n is defined as
K_ℬ^k(X, Y) = max_{B ∈ ℬ(X,Y)} W(B), where W(B) = ∑_{(x,y) ∈ B} k(x, y)
and k is a base kernel on 𝒮.
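The definition can be evaluated directly on small inputs by trying every bijection. The sketch below does exactly that with a brute-force search over permutations, which takes factorial time and is meant only to make the formula concrete. Elements are represented by their labels, the function names are hypothetical, and the base kernel values are the ones from the example matrix shown with the strong kernel definition.

```python
from itertools import permutations


def optimal_assignment_kernel(X, Y, k):
    """Maximum total base-kernel value over all bijections between X and Y."""
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same number of elements")
    return max(
        sum(k(x, y) for x, y in zip(X, perm))  # W(B) for one bijection B
        for perm in permutations(Y)
    )


# Base kernel as a symmetric lookup table over the labels a, b, c.
BASE = {
    ("a", "a"): 4, ("a", "b"): 3, ("a", "c"): 1,
    ("b", "b"): 5, ("b", "c"): 1, ("c", "c"): 2,
}


def base_kernel(x, y):
    return BASE[(x, y)] if (x, y) in BASE else BASE[(y, x)]


value = optimal_assignment_kernel(["a", "b", "c"], ["a", "a", "c"], base_kernel)
```

Here the best bijection pairs a with a, b with a, and c with c, giving 4 + 3 + 2 = 9.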

Previous Work:
• Optimal assignment kernels for attributed molecular graphs
[Fröhlich, Wegner, Sieker, Zell, 2005], ICML
• The optimal assignment kernel is not positive definite
[Vert, 2008], CoRR, abs/0801.4061
Problem
Optimal assignments yield indefinite functions.

Definition (Strong Kernel)
A function k : 𝒳 × 𝒳 → ℝ is a strong kernel if for all x, y, z ∈ 𝒳
k(x, y) ≥ min{k(x, z), k(z, y)}.
Example:
    a  b  c
a   4  3  1
b   3  5  1
c   1  1  2
• every object is most similar to itself
• strong kernels are indeed PSD
• strong kernels give rise to hierarchies
Contribution
• Strong base kernels that guarantee PSD optimal assignment
kernels
• Linear time computation of optimal assignment kernels
• Weisfeiler-Lehman optimal assignment kernels
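The strong kernel condition can be checked mechanically for a kernel given as a matrix. The following sketch tests the inequality for every triple of indices and applies it to the example matrix over a, b, c. The function name and the perturbed second matrix are assumptions added for illustration.

```python
from itertools import product


def is_strong_kernel(K):
    """Check K[x][y] >= min(K[x][z], K[z][y]) for all index triples (x, y, z)."""
    n = len(K)
    return all(
        K[x][y] >= min(K[x][z], K[z][y])
        for x, y, z in product(range(n), repeat=3)
    )


# Example matrix over a, b, c from the slide.
K_example = [
    [4, 3, 1],
    [3, 5, 1],
    [1, 1, 2],
]

# Hypothetical variant: lowering k(a, c) to 0 breaks the condition,
# because min{k(a, b), k(b, c)} = 1 is then larger than k(a, c).
K_broken = [
    [4, 3, 0],
    [3, 5, 1],
    [0, 1, 2],
]
```

Setting y = x in the condition gives k(x, x) ≥ min{k(x, z), k(z, x)} = k(x, z) for a symmetric k, which is the statement that every object is most similar to itself.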

Outlook/What’s next?
Classical Graph Kernels
C. Morris, K. Kersting, and P. Mutzel. “Glocalized Weisfeiler-Lehman
Graph Kernels: Global-Local Feature Maps of Graphs”. In: IEEE
International Conference on Data Mining. 2017

Optimization Based Graph Feature Maps
Classical:
Phase I: Feature Engineering
Phase II: Classifier
End-to-End:
Phase I + Phase II: Feature Engineering + Classifier (Optimize Parameters)

Conclusion
1 Explicit vs. Implicit Kernels
2 Hash Graph Kernel Framework
3 Valid Kernels from Optimal Assignments
Collection of Graph Classification Benchmarks
graphkernels.cs.tu-dortmund.de

References
Kriege, N. M., P.-L. Giscard, and R. C. Wilson. “On Valid Optimal
Assignment Kernels and Applications to Graph Classification”. In:
Advances in Neural Information Processing Systems. 2016,
pp. 1615–1623.
Kriege, N. M. et al. “A Unifying View of Explicit and Implicit Feature
Maps for Structured Data: Systematic Studies of Graph Kernels”. In:
CoRR abs/1703.00676 (2017). url:
http://arxiv.org/abs/1703.00676.
Kriege, N. et al. “Explicit versus Implicit Graph Feature Maps: A
Computational Phase Transition for Walk Kernels”. In: IEEE
International Conference on Data Mining. 2014, pp. 881–886.
Morris, C., K. Kersting, and P. Mutzel. “Glocalized Weisfeiler-Lehman
Graph Kernels: Global-Local Feature Maps of Graphs”. In: IEEE
International Conference on Data Mining. 2017.
Morris, C. et al. “Faster Kernel for Graphs with Continuous Attributes
via Hashing”. In: IEEE International Conference on Data Mining.
2016, pp. 1095–1100.
