High-Performance Graph Analysis and Modeling

§ This talk is NOT about graph processing systems
• (e.g., GraphX, Giraph, …).
§ Instead, this talk is about:
(1) Knowledge discovery and extracting insights from graph data
(2) Graph machine learning
(3) High-performance algorithms for solving (1) and (2)

§ Graphs encode dependencies/relationships between entities
• IID classification: p(Y | X)
• Relational/graph classification: p(Y | X, X_R), where X_R denotes the attributes of related (linked) entities
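To make the difference between p(Y | X) and p(Y | X, X_R) concrete, here is a minimal sketch. It assumes that X_R is summarized as the mean attribute vector of a node's neighbors. That is only one simple choice, and the function name `relational_features` is hypothetical. A standard IID classifier trained on these concatenated vectors then conditions on both a node's own attributes and its neighbors' attributes.

```python
from typing import Dict, List, Set


def relational_features(
    adj: Dict[int, Set[int]], x: Dict[int, List[float]]
) -> Dict[int, List[float]]:
    """Return [X_v, mean of X_u over neighbors u] for every node v.

    The neighbor mean is one simple stand-in for X_R (an assumption); a
    classifier trained on these vectors models p(Y | X, X_R), not p(Y | X).
    """
    dim = len(next(iter(x.values())))
    feats: Dict[int, List[float]] = {}
    for v, nbrs in adj.items():
        if nbrs:
            agg = [sum(x[u][j] for u in nbrs) / len(nbrs) for j in range(dim)]
        else:
            agg = [0.0] * dim  # isolated node: no relational evidence
        feats[v] = x[v] + agg  # list concatenation: own attributes, then X_R
    return feats


adj = {0: {1, 2}, 1: {0}, 2: {0}, 3: set()}
x = {0: [1.0], 1: [0.0], 2: [1.0], 3: [0.5]}
feats = relational_features(adj, x)
```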

How to detect abnormal traffic?
[Figure: IP src × IP dest adjacency matrices for port scanning, DDoS, and normal traffic (e.g., ibm.com, google.com)]
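The adjacency-matrix view suggests a simple heuristic. A source that sweeps many hosts shows up as a dense row, and a DDoS victim shows up as a dense column. The sketch below flags both patterns by out-degree and in-degree. The function name and the fixed `threshold` are hypothetical illustrations, not the detection method from the talk. Note that a port scan confined to a single destination host would not appear in an IP-level matrix.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def flag_hosts(flows: Iterable[Tuple[str, str]], threshold: int) -> Dict[str, List[str]]:
    """Flag dense rows (one source -> many destinations) and dense columns
    (many sources -> one destination) of the IP src x IP dest matrix."""
    out_nbrs: Dict[str, Set[str]] = defaultdict(set)  # row of each source
    in_nbrs: Dict[str, Set[str]] = defaultdict(set)   # column of each destination
    for src, dst in flows:
        out_nbrs[src].add(dst)
        in_nbrs[dst].add(src)
    return {
        "possible_scanners": sorted(s for s, d in out_nbrs.items() if len(d) >= threshold),
        "possible_ddos_targets": sorted(d for d, s in in_nbrs.items() if len(s) >= threshold),
    }


flows = [("10.0.0.9", f"10.0.1.{i}") for i in range(5)]     # one source sweeps 5 hosts
flows += [(f"172.16.0.{i}", "10.0.2.1") for i in range(5)]  # 5 sources hit one host
flows += [("10.0.0.1", "10.0.0.2")]                         # ordinary traffic
result = flag_hosts(flows, threshold=5)
```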

How to select k ‘best’ nodes for immunization?
[Figure: example network with nodes numbered 1–34]
• Ebola virus epidemic (costs 9,000+ lives, potentially 32+ bn)
• SARS (costs 700+ lives; $40+ bn)
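A common baseline for choosing k nodes to immunize is a greedy degree heuristic. It repeatedly removes the node with the most remaining links and then checks how much the largest connected component shrinks. The sketch below is that baseline only, with hypothetical function names. It is not the selection method behind this slide, and stronger criteria (for example, eigenvalue-based ones) exist.

```python
from typing import Dict, Iterable, List, Set


def greedy_degree_immunization(adj: Dict[int, Set[int]], k: int) -> List[int]:
    """Pick k nodes by repeatedly removing the current highest-degree node."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    chosen: List[int] = []
    for _ in range(min(k, len(g))):
        v = max(g, key=lambda u: (len(g[u]), -u))  # ties -> smaller node id
        for u in g[v]:
            g[u].discard(v)
        del g[v]
        chosen.append(v)
    return chosen


def largest_component(adj: Dict[int, Set[int]], removed: Iterable[int] = ()) -> int:
    """Size of the largest connected component after deleting `removed`."""
    seen = set(removed)
    best = 0
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        stack, size = [s], 0
        while stack:  # iterative DFS over one component
            v = stack.pop()
            size += 1
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        best = max(best, size)
    return best


# A star (hub 0, leaves 1-4) plus a separate edge 5-6.
adj = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {6}, 6: {5}}
picked = greedy_degree_immunization(adj, k=1)
```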

• Social network
• Human Disease Network [Barabasi 2007]
• Food Web [2007]
• Terrorist Network [Krebs 2002]
• Internet (AS) [2005]
• Gene Regulatory Network [Decourty 2008]
• Protein Interactions [breast cancer]
• Political blogs
• Power grid

[Figure: Data → Graph → New Insights / Knowledge / Reports, via cleaning, selection, processing, modeling, ranking, and querying]

Observation 1: Graphs are never given/observed
Graphs are usually constructed/inferred from input data

How to construct/infer the graph from input data? (Data → Graph)

Social Network
Relationship may represent:
- Friendship
- Email/IM/Communication
- Co-location
- Re-tweet
- Tagging
Biological Network /
Chemoinformatics
Relationship may represent:
- Protein Interaction
- Chemical bonds between atoms

Infrastructure Network
e.g. Power Grid
Web/Information Network

Brain Functional Connectivity Network [Xu et al., Frontiers in Behavioral Neuroscience, 2015]
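Functional connectivity networks like this one are inferred rather than observed. A widespread recipe is to correlate the signals of each pair of regions and keep an edge when the correlation is strong. The sketch below uses absolute Pearson correlation with a hypothetical cutoff of 0.8. Both the cutoff and the choice of Pearson correlation are illustrative assumptions, not the construction used by Xu et al.

```python
import math
from itertools import combinations
from typing import Dict, List, Set


def pearson(a: List[float], b: List[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0  # constant series -> no edge


def correlation_graph(series: Dict[str, List[float]], threshold: float) -> Dict[str, Set[str]]:
    """Undirected graph: connect two signals when |Pearson r| >= threshold."""
    adj: Dict[str, Set[str]] = {k: set() for k in series}
    for u, v in combinations(series, 2):
        if abs(pearson(series[u], series[v])) >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj


series = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 1, 3, 2]}
adj = correlation_graph(series, threshold=0.8)  # r(A,B) = 1.0, r(A,C) = r(B,C) = -0.4
```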

[Figure: Data → Graph Representation → New Insights / Knowledge / Reports]
What’s a node? Attributes? Types?
What’s an edge? Directed? Undirected?
Time-evolving? Dynamic?
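Each of the questions above corresponds to a modeling choice in the data structure. The sketch below records node types and attributes, edge types and attributes, directedness, and edge timestamps, and it extracts a time-window snapshot of a dynamic graph. The class and field names are hypothetical and illustrate the choices only. They do not represent any particular library's API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Edge:
    src: str
    dst: str
    etype: str = "default"      # edge type (heterogeneous graphs)
    t: Optional[float] = None   # timestamp (time-evolving graphs)
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Graph:
    directed: bool = True
    node_types: Dict[str, str] = field(default_factory=dict)
    node_attrs: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, v: str, ntype: str = "default", **attrs: Any) -> None:
        self.node_types[v] = ntype
        self.node_attrs[v] = dict(attrs)

    def add_edge(self, e: Edge) -> None:
        self.edges.append(e)
        if not self.directed:  # undirected: also store the reverse direction
            self.edges.append(Edge(e.dst, e.src, e.etype, e.t, dict(e.attrs)))

    def snapshot(self, t_start: float, t_end: float) -> "Graph":
        """Time-window view: keep only edges with t_start <= t < t_end."""
        kept = [e for e in self.edges if e.t is not None and t_start <= e.t < t_end]
        return Graph(self.directed, dict(self.node_types), dict(self.node_attrs), kept)


g = Graph(directed=False)
g.add_node("alice", "person", age=30)
g.add_node("bob", "person")
g.add_edge(Edge("alice", "bob", "email", t=1.0))
g.add_edge(Edge("bob", "alice", "im", t=5.0))
window = g.snapshot(0.0, 2.0)
```

Storing both directions of an undirected edge doubles memory but keeps neighbor lookups uniform. That is one of the trade-offs this slide asks about.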

Observation 2: Graph Data Management is challenging

[Figure: several compute nodes, each with multicore CPUs, a GPU, system memory, and GPU memory, linked by a node interconnect; vertices v1, …, vn distributed across the nodes (e.g., v1, v4, v7, v10 | v2, v5, v8, v11 | v3, v6, v9, v12); graph snapshots at times t-p, …, t-1, t]
• Large data
• Attributed
• Dynamic
• Heterogeneous
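The vertex layout in the figure appears to be round-robin: vertex i goes to machine (i − 1) mod p, so with p = 3, machine 0 holds v1, v4, v7, v10. That reading is an assumption drawn from the figure residue. The sketch reproduces that layout and counts the edge cut, meaning the edges whose endpoints live on different machines. Every such edge implies communication over the interconnect, which is one reason graph data management is hard. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def round_robin_partition(n: int, p: int) -> Dict[int, List[str]]:
    """Assign vertices v1..vn to machines 0..p-1 in round-robin order."""
    parts: Dict[int, List[str]] = {m: [] for m in range(p)}
    for i in range(1, n + 1):
        parts[(i - 1) % p].append(f"v{i}")
    return parts


def edge_cut(edges: List[Tuple[int, int]], p: int) -> int:
    """Count edges (by 1-based vertex index) whose endpoints sit on different machines."""
    return sum(1 for u, v in edges if (u - 1) % p != (v - 1) % p)


parts = round_robin_partition(12, 3)
cut = edge_cut([(1, 4), (1, 2), (2, 3)], 3)  # (1,4) stays local; the other two cross machines
```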

Graph Mining & ML = Machine learning/Data mining + Statistics, Graph theory/algorithms

How to extract insights from data represented as a graph?
(1) Graph Decomposition
(2) Unsupervised Representation Learning

Network Motifs: Simple Building Blocks of Complex Networks [Milo et al., Science 2002]
The Structure and Function of Complex Networks [Newman, SIAM Review 2003]
[Figure: 2-node, 3-node, and 4-node graphlets, connected and disconnected]

Ex: Given an input graph G
- How many triangles in G?
- How many 4-node cliques in G?
- How many 4-node cycles in G?
→ In practice, we would like to count all k-vertex graphlets

Ranking by graphlet counts
• Nodes are colored/weighted by triangle counts
• Links are colored/weighted by the number of 4-node stars
[Figure: labeled nodes include Leukemia, Colon cancer, Deafness]
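Ranking nodes by a graphlet count needs a per-node count rather than a single global total. The sketch below credits each triangle to its three vertices and returns the top-k nodes, breaking ties by node id. The function names are hypothetical, and triangles stand in for any per-node graphlet statistic.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build(edges: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    adj: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def node_triangles(adj: Dict[int, Set[int]]) -> Dict[int, int]:
    """Number of triangles each node participates in."""
    t = {v: 0 for v in adj}
    for u in adj:
        for v in adj[u]:
            if v <= u:
                continue
            for w in adj[u] & adj[v]:
                if w > v:  # each triangle u < v < w found once, credited to all three
                    t[u] += 1
                    t[v] += 1
                    t[w] += 1
    return t


def rank_by_triangles(adj: Dict[int, Set[int]], k: int) -> List[int]:
    t = node_triangles(adj)
    return sorted(t, key=lambda v: (-t[v], v))[:k]


# Two triangles sharing edge 1-2, plus a pendant node 4.
g = build([(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)])
top = rank_by_triangles(g, k=2)
```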

§ Enumerate all possible graphlets
→ Exhaustive enumeration is too expensive
§ Count graphlets for each node, then combine all node counts
→ Still expensive for relatively large k [Shervashidze et al., AISTATS 2009]
§ Other recent work counts only connected graphlets of size k=4 [Marcus & Shavitt, Computer Networks 2012]
→ Not practical: scales only to small graphs with a few hundred/thousand nodes/edges
- taking 2400 secs for a graph with 26K nodes

[Figure: Graphlet Transition Diagram; graphlets linked by ± 1 edge]
§ Count cliques & cycles ONLY (4-cliques, 4-cycles)
§ Use relationships & transitions to count all other graphlets in constant time
• Figure annotations: maximum no. triangles incident to an edge; maximum no. stars incident to an edge
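The per-edge quantities in the annotations can be computed cheaply. For an edge e = (u, v), the common neighbors give the triangles T_e on e. The remaining neighbors of u (or v) give the stars S_u (or S_v), which equal d_u − 1 − T_e (or d_v − 1 − T_e). Once these are known, some other counts follow in O(1) per edge. One example is the number of 3-edge paths that have e as their middle edge, (d_u − 1)(d_v − 1) − T_e, since x = y cases are exactly the triangles. The sketch below illustrates this kind of combinatorial shortcut under our own formulas. It is not necessarily the exact set of relationships used in the cited work, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build(edges: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    adj: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def edge_local_counts(adj: Dict[int, Set[int]]) -> Dict[Tuple[int, int], Tuple[int, int, int, int]]:
    """For each edge (u, v), u < v: (T_e, S_u, S_v, paths with e in the middle)."""
    out = {}
    for u in adj:
        for v in adj[u]:
            if v <= u:
                continue
            du, dv = len(adj[u]), len(adj[v])
            t = len(adj[u] & adj[v])               # triangles on e (the only set intersection)
            s_u, s_v = du - 1 - t, dv - 1 - t      # stars: neighbors of only u / only v
            paths = (du - 1) * (dv - 1) - t        # x-u-v-y with x != y, derived in O(1)
            out[(u, v)] = (t, s_u, s_v, paths)
    return out


# Diamond: triangles (0,1,2) and (1,2,3) share the chord 1-2.
counts = edge_local_counts(build([(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]))
```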

Relationship between 4-cliques & 4-chordal-cycles
[Figure: a 4-clique and a 4-chordal-cycle built around an edge e and its triangles T]
Relates the no. of 4-chordal-cycles to the no. of 4-cliques; proof in Lemma 1, Ahmed et al., ICDM 2015
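One way to see such a relationship is to look at an edge e and its set T_e of triangle apexes. Any two apexes a and b form either a 4-clique, if a and b are adjacent, or an induced 4-chordal-cycle with e as the chord, if they are not. So for every edge, C(|T_e|, 2) = K_e + D_e, and summing K_e over all edges counts each 4-clique six times (once per edge). The sketch below is our own derivation of this identity and may differ in form from Lemma 1 in the cited paper. The function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import comb
from typing import Dict, Iterable, Set, Tuple


def build(edges: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    adj: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def cliques_and_chordal_cycles(adj: Dict[int, Set[int]]) -> Tuple[int, int]:
    """Return (#4-cliques, #induced 4-chordal-cycles) from per-edge triangle sets."""
    k4_times_6 = 0  # sum over edges of 4-cliques containing that edge
    chordal = 0
    for u in adj:
        for v in adj[u]:
            if v <= u:
                continue
            apexes = adj[u] & adj[v]  # T_e: third vertices of triangles on e
            closed = sum(1 for a, b in combinations(sorted(apexes), 2) if b in adj[a])
            k4_times_6 += closed
            chordal += comb(len(apexes), 2) - closed  # open apex pairs: chord is e
    return k4_times_6 // 6, chordal


k4 = build([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
diamond = build([(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)])
```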

Motif Counting: strong scaling results
[Figure: two panels of speedup (0–16) vs. number of processing units (1, 2, 4, 8, 12, 16) for socfb-MIT, bio-dmela, soc-gowalla, tech-RL-caida, web-wikipedia09]
Using an Intel Xeon E5-2687W server with 16 cores
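Strong scaling fixes the input and grows the number of processing units p. Speedup is then S(p) = T(1) / T(p). The work-partitioning idea behind parallel motif counting can be sketched as follows: split the vertices into interleaved chunks for load balance, let each worker count the triangles it "owns" (u < v < w, rooted at u), and sum the partial counts. This sketch uses `concurrent.futures.ThreadPoolExecutor` only to show the partitioning. In CPython, threads will not give real CPU speedup for this pure-Python loop, so a process pool or native code would be needed for measurements like those in the figure. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set


def count_owned_triangles(adj: Dict[int, Set[int]], nodes: List[int]) -> int:
    """Triangles u < v < w whose smallest vertex u is in `nodes`."""
    local = 0
    for u in nodes:
        for v in adj[u]:
            if v > u:
                local += sum(1 for w in adj[u] & adj[v] if w > v)
    return local


def parallel_triangles(adj: Dict[int, Set[int]], workers: int = 4) -> int:
    nodes = sorted(adj)
    chunks = [nodes[i::workers] for i in range(workers)]  # interleaved for load balance
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return sum(ex.map(lambda chunk: count_owned_triangles(adj, chunk), chunks))


k5 = {v: {u for u in range(5) if u != v} for v in range(5)}  # complete graph K5
total = parallel_triangles(k5, workers=3)  # C(5, 3) = 10 triangles
```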

[Figure: input → feature engineering → features → learning algorithm → model → prediction task (link prediction, classification, anomaly detection)]

[Figure: the same pipeline with automatic feature learning in place of manual feature engineering]

§ Goal: Learn representations (features) for a set of graph elements (nodes, edges, etc.)
§ Key intuition: Map the graph elements (e.g., nodes) to a d-dimensional space while preserving node similarity
§ Use the features for any downstream prediction task

Communities: cohesive subsets of nodes
Roles: represent structural patterns
- Two nodes belong to the same role if they have similar structural patterns
[Figure: communities Ci, Cj, Ck]
[Rossi & Ahmed, TKDE 2015; Ahmed et al., AAAI 2017]
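The distinction between roles and communities shows up even with a crude structural signature. The sketch below groups nodes whose (degree, triangle count) pairs match exactly. That exact-match grouping is a hypothetical simplification chosen for illustration, not the role-discovery method of Rossi & Ahmed, which learns soft roles from richer features. In the example, the three leaves of a star share a role even though their links only go to the hub, while the nodes of a separate triangle form another role.

```python
from typing import Dict, List, Set, Tuple


def structural_roles(adj: Dict[int, Set[int]]) -> List[List[int]]:
    """Group nodes with identical (degree, #triangles) signatures."""
    tri = {v: 0 for v in adj}
    for u in adj:
        for v in adj[u]:
            if v <= u:
                continue
            for w in adj[u] & adj[v]:
                if w > v:
                    tri[u] += 1
                    tri[v] += 1
                    tri[w] += 1
    groups: Dict[Tuple[int, int], List[int]] = {}
    for v in sorted(adj):
        groups.setdefault((len(adj[v]), tri[v]), []).append(v)
    return sorted(groups.values())


# Star (hub 0, leaves 1-3) and a separate triangle 4-5-6.
adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}, 4: {5, 6}, 5: {4, 6}, 6: {4, 5}}
roles = structural_roles(adj)
```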

Goal: Find a mapping of nodes to a d-dimensional space that preserves proximity and node similarity, using structure + attributes (if any)

A (conditional) attributed walk is a finite sequence of adjacent node types (words) in the graph [Ahmed et al. 2017]
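The definition above can be illustrated with a simple, unconditional version. Assume each node already has a type derived from its attributes, given here as a plain lookup table; the type function is a hypothetical stand-in. A random walk is then reported as the sequence of types it visits instead of node ids, so structurally similar regions produce similar "sentences". The conditioning used in the cited work is not reproduced here.

```python
import random
from typing import Dict, List, Set


def attributed_walk(
    adj: Dict[int, Set[int]],
    node_type: Dict[int, str],
    start: int,
    length: int,
    rng: random.Random,
) -> List[str]:
    """Uniform random walk from `start`, emitted as a sequence of node types."""
    walk = [start]
    while len(walk) < length:
        nbrs = sorted(adj[walk[-1]])  # sorted for reproducibility under a fixed seed
        if not nbrs:
            break  # dead end: stop early
        walk.append(rng.choice(nbrs))
    return [node_type[v] for v in walk]


# Path 0-1-2 where the endpoints share type "A" and the middle node is type "B".
adj = {0: {1}, 1: {0, 2}, 2: {1}}
types = {0: "A", 1: "B", 2: "A"}
words = attributed_walk(adj, types, start=1, length=5, rng=random.Random(7))
```

On this path graph, every walk from node 1 alternates B, A, B, … regardless of the random choices, because both neighbors of node 1 have type "A".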

DeepWalk (DW): Perozzi et al., KDD 2014
node2vec (N2V): Grover & Leskovec, KDD 2016
LINE: Tang et al., WWW 2015
[Figure: link prediction results]
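DeepWalk-style methods turn a graph into a text-like corpus. They run short uniform random walks from every node, then collect (center, context) pairs within a window, and a skip-gram model (Mikolov et al.) is trained on those pairs. The sketch below covers only the corpus and pair generation. The embedding training step is omitted, and the parameter names (`num_walks`, `walk_len`, `window`) are illustrative. node2vec differs mainly in using biased rather than uniform walk transitions.

```python
import random
from typing import Dict, List, Set, Tuple


def random_walks(adj: Dict[int, Set[int]], num_walks: int, walk_len: int, seed: int = 0) -> List[List[int]]:
    """Uniform random walks, `num_walks` passes over all nodes in shuffled order."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks: List[List[int]] = []
    for _ in range(num_walks):
        order = nodes[:]
        rng.shuffle(order)
        for s in order:
            walk = [s]
            while len(walk) < walk_len and adj[walk[-1]]:
                walk.append(rng.choice(sorted(adj[walk[-1]])))
            walks.append(walk)
    return walks


def skipgram_pairs(walks: List[List[int]], window: int) -> List[Tuple[int, int]]:
    """(center, context) pairs within `window` steps, as fed to a skip-gram model."""
    pairs: List[Tuple[int, int]] = []
    for w in walks:
        for i, center in enumerate(w):
            for j in range(max(0, i - window), min(len(w), i + window + 1)):
                if j != i:
                    pairs.append((center, w[j]))
    return pairs


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
walks = random_walks(triangle, num_walks=2, walk_len=4)
pairs = skipgram_pairs([[0, 1, 2]], window=1)
```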

Observation 3: Useful insights and accurate modeling
depend on the data representation

§ Open data repository with interactive visual analytics & exploration
§ Largest, with 500+ graphs across 20+ collections
§ Community-oriented
• discuss, post data, comments, vis, etc.
NetworkRepository.com [AAAI’15]

Observation 3: Useful insights and accurate modeling
depend on the data representation
Observation 2: Graph Data Management is challenging
Observation 1: Graphs are never given/observed
Graphs are usually constructed/inferred from input data

§ Efficient estimation of word representations in vector space. ICLR 2013 [Mikolov et al.]
§ A Framework for Generalizing Graph-based Representation Learning Methods. arXiv:1709.04596, 2017 [Ahmed et al.]
§ Role Discovery in Networks. TKDE 2015 [Rossi & Ahmed]
§ A Higher-order Latent Space Network Model. AAAI 2017 [Ahmed, Rossi, Willke, Zhou]
§ node2vec: Scalable Feature Learning for Networks. KDD 2016 [Grover & Leskovec]
§ DeepWalk: online learning of social representations. KDD 2014 [Perozzi, Al-Rfou, Skiena]
§ Efficient Graphlet Counting for Large Networks. ICDM 2015 [Ahmed et al.]
§ Graphlet Decomposition: Framework, Algorithms, and Applications. Knowledge and Information Systems 2016 [Ahmed et al.]
§ Network Motifs: Simple Building Blocks of Complex Networks. Science 2002 [Milo et al.]
§ Uncovering Biological Network Function via Graphlet Degree Signatures. Cancer Informatics 2008 [Milenković & Pržulj]
§ Graph Kernels. JMLR 2010 [Vishwanathan et al.]
§ The Structure and Function of Complex Networks. SIAM Review 2003 [Newman]
§ Biological network comparison using graphlet degree distribution. Bioinformatics 2007 [Pržulj]
§ Efficient Graphlet Kernels for Large Graph Comparison. AISTATS 2009 [Shervashidze et al.]
§ Local structure in social networks. Sociological Methodology 1976 [Holland & Leinhardt]
§ The strength of weak ties: A network theory revisited. Sociological Theory 1983 [Granovetter]

Thank you!
Questions?
nesreen.k.ahmed@intel.com
http://nesreenahmed.com
