Using variable length pattern matching in Neo4j

Question

I've been using Neo4j for the past 3 months.

I've built a 10M node graph database.

I've been reading: http://neo4j.com/docs/stable/introduction-pattern.html

My goal is to look up a single node by its property value (the easy part) and then discover all nodes connected to that node by a specific edge label. My problem is that I get very different behavior depending on the pattern, and I don't understand why.

The bottom line is that I need a pattern that lets me look up Node1, find every node connected to Node1 by a specific edge label, and then assign a single identifier value to all of them (so that I can say this group of 100 nodes is part of ClusterIDGroup1).

Pattern 1

MATCH (l:CodeType { id_value : '050001' })-[:IDENTIFIED_BY*]-(m:CodeType)
WHERE 1=1
RETURN *
LIMIT 10000
;

Returns: 62 nodes

Pattern 2

MATCH (l:CodeType { id_value : '050001' })-[:IDENTIFIED_BY*30]-(m:CodeType)
WHERE 1=1
RETURN *
LIMIT 10000
;

Returns: 90 nodes

Pattern 3

MATCH (l:CodeType { id_value : '050001' })-[:IDENTIFIED_BY*0..30]-(m:CodeType)
WHERE 1=1
RETURN *
LIMIT 10000
;

Returns: 115 nodes

Why would I get 115, 90, and 62 depending on the variable length expression? I would think that 1) * would get me the most nodes 2) *0..30 would get me the second most and 3) *30 would get me the least.

Thanks

Supamiu · Accepted Answer · 2015-11-25 14:11:55Z

According to the Neo4j documentation:

If the distance between two nodes is zero, they are by definition the same node.

So, to answer your questions:

In the first query, * with no bounds means one or more hops, so you get the CodeType nodes at every depth but not the starting node itself (no depth 0).
In the second one, you get only the nodes that are exactly 30 relationships away from your l node.
In the third query, you get more nodes because the depth-0 match includes the starting CodeType node itself, along with every node at a relationship depth from 1 to 30 from your l node.
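
One way to compare the patterns on equal footing is to count distinct end nodes rather than returned rows: each row is one matched path, and the same node can be reached along several paths. A minimal sketch, reusing the label, property, and relationship type from the question; the 1..30 bound is an arbitrary choice for illustration, not a recommendation:

```cypher
// Count distinct nodes reached in 1 to 30 hops, next to the number of
// matched paths, to see how much of the row count is duplication.
MATCH (l:CodeType { id_value: '050001' })-[:IDENTIFIED_BY*1..30]-(m:CodeType)
RETURN count(DISTINCT m) AS reachable_nodes,
       count(*) AS matched_paths;
```

If matched_paths is much larger than reachable_nodes, a LIMIT applied to rows will cut off paths, not nodes, which may explain counts that do not grow with the hop range.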

I think that returning only l should help you a lot; RETURN * is never a good choice, in my opinion.

Also, WHERE 1=1 does nothing in your queries, so you can safely remove it.
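
Putting these suggestions together, the query might look something like the sketch below. It assumes you want each connected node once, and the cluster_id property name is hypothetical, chosen only to illustrate the grouping described in the question. An unbounded traversal can be expensive on a large graph, so an upper bound may be worth adding.

```cypher
// Pattern 1 without the no-op WHERE clause, returning each node once.
MATCH (l:CodeType { id_value: '050001' })-[:IDENTIFIED_BY*]-(m:CodeType)
RETURN DISTINCT m
LIMIT 10000;

// Tag the start node (depth 0) and everything reachable from it.
// cluster_id is an assumed property name, not part of the original data.
MATCH (l:CodeType { id_value: '050001' })-[:IDENTIFIED_BY*0..]-(m:CodeType)
WITH DISTINCT m
SET m.cluster_id = 'ClusterIDGroup1';
```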

edited Nov 25, 2015 at 14:11

answered Nov 25, 2015 at 14:04

6 Comments

DAE Over a year ago

Very useful. Thank you. One more follow-on question: when I use Pattern 3 with *0..30 I get 115 nodes returned, whereas with *0..50 I get 84 nodes returned. Why do I get fewer nodes for 0..50? Shouldn't I get more?

Supamiu Over a year ago

Can you please provide a dataset using console.neo4j.org? It would be nice to try it myself and figure out the problem.

DAE Over a year ago

I'm trying to think how to do that, because my dataset is 10M nodes and since each pattern match is not reproducible (e.g. how do I get all of the nodes in the cluster), how do I do that? Would it be valuable if I provided all of the nodes that appear to be connected by variable length (e.g. 150 nodes) and just provide that example?

DAE Over a year ago

I see that I get more than 10k nodes back in the rows result window in Neo4j, but I'm uncertain why I would get back any more than 150...? There are tons of duplicates... THANKS SO MUCH, I owe you!

Supamiu Over a year ago

Don't forget to tag as answer, to help the other devs who will maybe, one day, have the same problem :)
