Recent Advances in Kernel-Based Graph Classification
ECML PKDD 2017, Nectar Track
Nils Kriege, Christopher Morris
June 20, 2017
TU Dortmund University, Algorithm Engineering Group
Motivation
Question
How similar are two graphs?
(a) Sildenafil (b) Vardenafil
High-level View: Supervised Graph Classification
[Figure: a feature map φ embeds the graphs into a subset of a Hilbert space ℋ]
Primer on Graph Kernels
Question
How similar are two graphs?
Definition (Graph Kernel)
Let 𝒢 be a non-empty set of graphs and let k: 𝒢 × 𝒢 → R. Then k is
a graph kernel if there is a real Hilbert space ℋ and a feature map
𝜑: 𝒢 → ℋ such that k(G, H) = ⟨𝜑(G), 𝜑(H)⟩.
Explicit vs. Implicit
• Explicit (EX): compute 𝜑(G) and 𝜑(H), then take their inner product to obtain k(G, H)
• Implicit (IM): evaluate a PSD function on G and H directly to obtain k(G, H)
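The two routes can be compared on a toy kernel. The sketch below is an illustration only and is not taken from the talk: it assumes a simple vertex-label histogram kernel, and the function names and example labels are hypothetical. The explicit route builds 𝜑(G) and takes an inner product; the implicit route sums a Dirac kernel over all vertex pairs. Both return the same value.

```python
from collections import Counter


def feature_map(labels):
    """Explicit map: phi(G) = vertex-label counts, stored sparsely."""
    return Counter(labels)


def kernel_explicit(labels_g, labels_h):
    """k(G, H) = <phi(G), phi(H)> computed from explicit feature vectors."""
    phi_g, phi_h = feature_map(labels_g), feature_map(labels_h)
    return sum(count * phi_h[label] for label, count in phi_g.items())


def kernel_implicit(labels_g, labels_h):
    """Same kernel without phi: a Dirac kernel summed over all vertex pairs."""
    return sum(1 for u in labels_g for v in labels_h if u == v)


# Hypothetical vertex labels of two small graphs (edges are ignored here).
g = ["C", "C", "N", "O"]
h = ["C", "N", "N", "S"]
print(kernel_explicit(g, h), kernel_implicit(g, h))
```

The explicit version touches each vertex once per graph, while the implicit version compares every pair of vertices, which hints at why explicit maps tend to scale better when they exist.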
Talk Structure
1 Explicit vs. Implicit Graph Kernels, IEEE ICDM 2014
2 Fast Kernels for Graphs with Continuous Labels, IEEE ICDM 2016
3 Graph Kernels Based on Optimal Assignments, NIPS 2016
4 Outlook/What’s next?
Part I: Explicit vs. Implicit Graph Kernels
Challenge
Investigate the benefits of explicit and implicit graph kernels.
N. Kriege, M. Neumann, K. Kersting, and P. Mutzel. “Explicit versus
Implicit Graph Feature Maps: A Computational Phase Transition for
Walk Kernels”. In: IEEE International Conference on Data Mining.
2014, pp. 881–886
N. M. Kriege, M. Neumann, C. Morris, K. Kersting, and P. Mutzel. “A
Unifying View of Explicit and Implicit Feature Maps for Structured
Data: Systematic Studies of Graph Kernels”. In: CoRR abs/1703.00676
(2017). url: http://arxiv.org/abs/1703.00676
Part I: Explicit vs. Implicit Graph Kernels
Kernel                                           𝜑    Cont. Labels   Run time
Random Walk [Gärtner et al., 2003]               IM                  𝒪(n^{2𝜔})
Shortest-Path [Borgwardt et al., 2005]           IM                  𝒪(n^4)
Subgraph Matching [Kriege, Mutzel, 2012]         IM                  𝒪(k n^{2k+2})
GraphHopper [Feragen et al., 2013]               IM                  𝒪(n^2 m)
Graphlet [Shervashidze et al., 2009]             EX
NSPDK [Costa et al., 2010]                       EX
Weisfeiler-Lehman [Shervashidze et al., 2011]    EX                  𝒪(hm)
Propagation [Neumann et al., 2016]               EX
Implicit vs. Explicit
• Implicit Kernels: do not scale, extendable to continuous labels
• Explicit Kernels: do scale, only discrete labels
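To make the explicit side concrete, here is a minimal sketch of a Weisfeiler-Lehman subtree feature map with h refinement rounds. It is a simplified illustration, not the reference implementation: the graph representation (adjacency dict plus a dict of string labels), the shared compression table, and all names are assumptions.

```python
from collections import Counter


def wl_feature_map(adj, labels, h, table):
    """Count all labels seen during h rounds of Weisfeiler-Lehman refinement.

    adj: dict vertex -> list of neighbours.
    labels: dict vertex -> discrete label (strings assumed).
    table: compression dict shared by all graphs, so equal neighbourhoods
           receive equal compressed labels across graphs.
    """
    features = Counter(labels.values())  # round 0: original labels
    current = dict(labels)
    for _ in range(h):
        refined = {}
        for v in adj:
            # Signature = own label plus sorted multiset of neighbour labels.
            signature = (current[v], tuple(sorted(current[u] for u in adj[v])))
            refined[v] = table.setdefault(signature, f"wl{len(table)}")
        current = refined
        features.update(current.values())
    return features


def wl_kernel(phi_g, phi_h):
    """Inner product of two sparse feature vectors."""
    return sum(count * phi_h[f] for f, count in phi_g.items())


# Hypothetical example: a path and a triangle, all vertices labelled "A".
path = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
uniform = {0: "A", 1: "A", 2: "A"}
table = {}
phi_path = wl_feature_map(path, uniform, 1, table)
phi_triangle = wl_feature_map(triangle, uniform, 1, table)
print(wl_kernel(phi_path, phi_triangle))
```

The refinement relies on testing discrete labels for equality, which is why kernels of this kind do not directly apply to continuous labels.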
Part I: Explicit vs. Implicit Graph Kernels
Challenge
Investigate the benefits of explicit and implicit graph kernels.
Contribution
• Conditions under which the computation of a
finite-dimensional explicit mapping is possible
• Explicit feature maps for convolution kernels
• Weighted vertex kernels: derived approximate finite-dimensional
explicit feature maps
• Validated theoretical results in experimental study
Part I: Explicit vs. Implicit Graph Kernels
[Figure: runtime in seconds (0 to 10) of implicit and explicit computation as a function of data set size (100 to 300) and label diversity (0 to 60)]
Experimental Results
Discrete Labels: explicit feature maps outperform implicit kernels
(for most kernels and benchmark data sets)
Continuous Labels: approximation by explicit feature maps not
competitive for complex kernels
Part II: Hash Graph Kernel Framework
Challenge
Design fast, explicit graph kernels that can handle continuous
labels.
C. Morris, N. M. Kriege, K. Kersting, and P. Mutzel. “Faster Kernels for
Graphs with Continuous Attributes via Hashing”. In: IEEE
International Conference on Data Mining. 2016, pp. 1095–1100
[Figure: example graph whose vertices carry continuous labels (1.2, 0.3), (9.1, 0.9), (1.6, 0.7), (5.2, 1.0), (5.1, 0.2), (1.0, 0.2)]
Part II: Hash Graph Kernel Framework
Challenge
Design fast, explicit graph kernels that can handle continuous
labels.
Kernel                                           𝜑    Cont. Labels   Run time
Random Walk [Gärtner et al., 2003]               IM                  𝒪(n^{2𝜔})
Shortest-Path [Borgwardt et al., 2005]           IM                  𝒪(n^4)
Subgraph Matching [Kriege, Mutzel, 2012]         IM                  𝒪(k n^{2k+2})
GraphHopper [Feragen et al., 2013]               IM                  𝒪(n^2 m)
Graphlet [Shervashidze et al., 2009]             EX
NSPDK [Costa et al., 2010]                       EX
Weisfeiler-Lehman [Shervashidze et al., 2011]    EX                  𝒪(hm)
Propagation [Neumann et al., 2016]               EX
HGK Framework [Morris et al., 2016]              EX                  Linear in BK
Part II: Hash Graph Kernel Framework
(G, a) → Hash → (G, l1), (G, l2), ..., (G, lI)
       → φ(G, l1), φ(G, l2), ..., φ(G, lI)
Feat. Vectors: 1/I [φ(G, l1), ..., φ(G, lI)]
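The pipeline above can be sketched in a few lines. This is an illustration under several assumptions that the slides do not confirm: the hash family (a random projection followed by bucketing), the stand-in base feature map (a histogram of hashed vertex labels, which ignores edges), and all names and parameter values are hypothetical. The 1/I factor is taken literally from the slide.

```python
import math
import random
from collections import Counter


def hash_labels(attrs, direction, offset, width):
    """Map each continuous attribute vector to an integer bucket."""
    return [
        math.floor((sum(a * d for a, d in zip(x, direction)) + offset) / width)
        for x in attrs
    ]


def hgk_feature_map(attrs, hashes):
    """Concatenate base feature maps over all I hash functions, scaled by 1/I."""
    features = Counter()
    for i, (direction, offset, width) in enumerate(hashes):
        # Stand-in base feature map: histogram of the hashed vertex labels.
        # Keys carry the iteration index i, which realises the concatenation.
        for bucket in hash_labels(attrs, direction, offset, width):
            features[(i, bucket)] += 1
    scale = 1 / len(hashes)  # scaling factor as written on the slide
    return {key: scale * count for key, count in features.items()}


def dot(phi_g, phi_h):
    """Inner product of two sparse feature vectors."""
    return sum(value * phi_h.get(key, 0.0) for key, value in phi_g.items())


rng = random.Random(0)
dim, width, iterations = 2, 4.0, 20  # hypothetical parameter choices
# The same hash functions must be applied to every graph.
hashes = [
    ([rng.gauss(0, 1) for _ in range(dim)], rng.uniform(0, width), width)
    for _ in range(iterations)
]
g = [(1.2, 0.3), (9.1, 0.9), (1.6, 0.7)]  # vertex attributes of one graph
h = [(5.2, 1.0), (5.1, 0.2), (1.0, 0.2)]  # vertex attributes of another
phi_g, phi_h = hgk_feature_map(g, hashes), hgk_feature_map(h, hashes)
print(dot(phi_g, phi_h), dot(phi_g, phi_g))
```

In a fuller version the stand-in histogram would be replaced by an explicit base kernel on the discretely relabelled graph, such as Weisfeiler-Lehman or shortest-path features.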
Part II: Hash Graph Kernel Framework
Contribution: Hash Graph Kernel Framework
• Use explicit instead of implicit kernels, i.e., avoid kernel trick!
• Applicable to a wide range of graph kernel functions
(Weisfeiler-Lehman, Shortest-Path, Graphlet, ...)
• Theoretical approximation bounds
• State-of-the-art classification accuracies but orders of
magnitude faster than implicit kernels
Question
Is there no benefit from employing the kernel trick at all on the
graph domain?
Part III: Positive-semidefinite Optimal Assignments
Challenge
Design a valid graph kernel that is based on optimal assignments.
[Figure: two sets X and Y whose elements carry the labels a, b, and c]
N. M. Kriege, P.-L. Giscard, and R. C. Wilson. “On Valid Optimal
Assignment Kernels and Applications to Graph Classification”. In:
Advances in Neural Information Processing Systems. 2016,
pp. 1615–1623
Part III: Positive-semidefinite Optimal Assignments
Intuition
Optimal Assignments are a “natural” measure of similarity.
Definition (Optimal Assignment Kernel)
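Let $\mathcal{B}(X, Y)$ be the set of bijections between $X, Y \in [\mathcal{S}]^n$. The optimal
assignment kernel on $[\mathcal{S}]^n$ is defined as
$$K_{\mathcal{B}}^{k}(X, Y) = \max_{B \in \mathcal{B}(X, Y)} W(B), \quad \text{where } W(B) = \sum_{(x, y) \in B} k(x, y)$$
and $k$ is a base kernel on $\mathcal{S}$.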
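The definition can be evaluated directly by enumerating all bijections. The sketch below does exactly that, so it is only meant for tiny inputs; the function names, the Dirac base kernel, and the example sets are illustrative assumptions, with lists standing in for the sets X and Y.

```python
from itertools import permutations


def optimal_assignment_kernel(xs, ys, k):
    """Maximise W(B) = sum of k(x, y) over all bijections B between xs and ys."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same size n")
    # Each permutation of ys paired position-wise with xs is one bijection.
    return max(
        sum(k(x, y) for x, y in zip(xs, perm))
        for perm in permutations(ys)
    )


def dirac(x, y):
    """Base kernel: 1 for equal elements, 0 otherwise."""
    return 1 if x == y else 0


# Hypothetical example sets; elements are identified by their labels.
X = ["a", "a", "a", "b", "c"]
Y = ["a", "b", "b", "c", "c"]
print(optimal_assignment_kernel(X, Y, dirac))
```

Enumeration costs n! evaluations of W(B). A general-purpose alternative would be a polynomial-time assignment solver such as the Hungarian method.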
Part III: Positive-semidefinite Optimal Assignments
Previous Work:
• Optimal assignment kernels for attributed molecular graphs
[Fröhlich, Wegner, Sieker, Zell, 2005], ICML
• The optimal assignment kernel is not positive definite
[Vert, 2008], CoRR, abs/0801.4061
Problem
Optimal assignments yield indefinite functions.
Part III: Positive-semidefinite Optimal Assignments
Definition (Strong Kernel)
A function k : 𝒳 × 𝒳 → R is a strong kernel if for all x, y, z ∈ 𝒳
k(x, y) ≥ min{k(x, z), k(z, y)}.
    a  b  c
a   4  3  1
b   3  5  1
c   1  1  2
• every object is most similar to itself
• strong kernels are indeed PSD
• strong kernels give rise to hierarchies
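The defining inequality is easy to check mechanically for a finite kernel matrix. The following sketch tests it for every triple; the function name and the counterexample matrix are illustrative assumptions, while the first matrix is the one shown above.

```python
from itertools import product


def is_strong_kernel(K):
    """Check k(x, y) >= min(k(x, z), k(z, y)) for all x, y, z."""
    n = len(K)
    return all(
        K[x][y] >= min(K[x][z], K[z][y])
        for x, y, z in product(range(n), repeat=3)
    )


# Example matrix from the slide, rows and columns ordered a, b, c.
K = [[4, 3, 1],
     [3, 5, 1],
     [1, 1, 2]]

# Hypothetical counterexample: k(a, c) = 1 < min(k(a, b), k(b, c)) = 2.
K_bad = [[4, 3, 1],
         [3, 5, 2],
         [1, 2, 2]]

print(is_strong_kernel(K), is_strong_kernel(K_bad))
```

Setting x = y in the inequality gives k(x, x) ≥ k(x, z) for a symmetric k, which is the first property in the list: every object is most similar to itself.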
Contribution
• Strong base kernels that guarantee PSD optimal assignment
kernels
• Linear time computation of optimal assignment kernels
• Weisfeiler-Lehman optimal assignment kernels
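The link between hierarchies and fast computation can be illustrated on the example matrix. The sketch below assumes one hierarchy that reproduces that matrix, worked out by hand for this illustration: a root with weight 1, an inner node above a and b with weight 2, and leaves a, b, c with weights 1, 2, 1. The base kernel of two elements is the total weight of their shared path from the root, and the optimal assignment value is then obtained as a weighted histogram intersection. All names are hypothetical, and the brute-force function is included only as a cross-check.

```python
from collections import Counter
from itertools import permutations

# Assumed hierarchy: path from the root to each leaf, and additive node weights.
PATHS = {"a": ["root", "ab", "a"], "b": ["root", "ab", "b"], "c": ["root", "c"]}
WEIGHTS = {"root": 1, "ab": 2, "a": 1, "b": 2, "c": 1}


def base_kernel(x, y):
    """k(x, y) = total weight of the nodes shared by both root-to-leaf paths."""
    return sum(WEIGHTS[v] for v in PATHS[x] if v in PATHS[y])


def histogram(elements):
    """Count, for each hierarchy node, the elements that lie below it."""
    return Counter(v for e in elements for v in PATHS[e])


def assignment_by_histograms(xs, ys):
    """Weighted histogram intersection over the nodes of the hierarchy."""
    hx, hy = histogram(xs), histogram(ys)
    return sum(WEIGHTS[v] * min(hx[v], hy[v]) for v in WEIGHTS)


def assignment_brute_force(xs, ys):
    """Reference value: maximise over all bijections (tiny inputs only)."""
    return max(
        sum(base_kernel(x, y) for x, y in zip(xs, perm))
        for perm in permutations(ys)
    )


X = ["a", "a", "c"]
Y = ["b", "c", "c"]
print(assignment_by_histograms(X, Y), assignment_brute_force(X, Y))
```

Building the histograms takes time proportional to the number of elements times the depth of the hierarchy, and no bijection is ever enumerated.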
Outlook/What’s next?
Classical Graph Kernels
C. Morris, K. Kersting, and P. Mutzel. “Glocalized Weisfeiler-Lehman
Graph Kernels: Global-Local Feature Maps of Graphs”. In: IEEE
International Conference on Data Mining. 2017
Optimization Based Graph Feature Maps
Classical: Feature Engineering (Phase I) → Classifier (Phase II)
End-to-End: Feature Engineering + Classifier (Phase I + Phase II), Optimize Parameters
Conclusion
1 Explicit vs. Implicit Kernels
2 Hash Graph Kernel Framework
3 Valid Kernels from Optimal Assignments
Collection of Graph Classification Benchmarks
graphkernels.cs.tu-dortmund.de
References I
Kriege, N. M., P.-L. Giscard, and R. C. Wilson. “On Valid Optimal
Assignment Kernels and Applications to Graph Classification”. In:
Advances in Neural Information Processing Systems. 2016,
pp. 1615–1623.
Kriege, N. M. et al. “A Unifying View of Explicit and Implicit Feature
Maps for Structured Data: Systematic Studies of Graph Kernels”. In:
CoRR abs/1703.00676 (2017). url:
http://arxiv.org/abs/1703.00676.
Kriege, N. et al. “Explicit versus Implicit Graph Feature Maps: A
Computational Phase Transition for Walk Kernels”. In: IEEE
International Conference on Data Mining. 2014, pp. 881–886.
Morris, C., K. Kersting, and P. Mutzel. “Glocalized Weisfeiler-Lehman
Graph Kernels: Global-Local Feature Maps of Graphs”. In: IEEE
International Conference on Data Mining. 2017.
References II
Morris, C. et al. “Faster Kernels for Graphs with Continuous Attributes
via Hashing”. In: IEEE International Conference on Data Mining.
2016, pp. 1095–1100.