I've been using Neo4j for the past 3 months.

I've built a 10M node graph database.

I've been reading: http://neo4j.com/docs/stable/introduction-pattern.html

My goal is to look up a single node by its property value (the easy part) and then discover all nodes connected to that node through a specific edge label. My problem is that similar patterns give me very different results, and I don't understand why.

The bottom line is that I need a pattern that looks up Node1, finds every node connected to Node1 through a specific edge label, and then assigns a single identifier value to each of them (so that I can say this group of 100 nodes is part of ClusterIDGroup1).

Pattern 1

MATCH (l:CodeType { id_value : '050001' })-[:IDENTIFIED_BY*]-(m:CodeType)
WHERE 1=1
RETURN *
LIMIT 10000
;

Returns: 62 nodes

Pattern 2

MATCH (l:CodeType { id_value : '050001' })-[:IDENTIFIED_BY*30]-(m:CodeType)
WHERE 1=1
RETURN *
LIMIT 10000
;

Returns: 90 nodes

Pattern 3

MATCH (l:CodeType { id_value : '050001' })-[:IDENTIFIED_BY*0..30]-(m:CodeType)
WHERE 1=1
RETURN *
LIMIT 10000
;

Returns: 115 nodes

Why would I get 115, 90, and 62 depending on the variable-length expression? I would think that 1) * would get me the most nodes, 2) *0..30 would get me the second most, and 3) *30 would get me the fewest.

Thanks

1 Answer

According to the Neo4j documentation:

If the distance between two nodes is zero, they are by definition the same node.

So, to answer your questions:

  • In the first query, * with no bounds matches paths of one or more relationships, so you get the CodeType nodes at every depth, but not the zero-length match of your l node itself (no 0 depth).
  • In the second one, *30 matches only paths of exactly 30 relationships, so you get only the nodes that sit at the end of a 30-relationship path from your l node.
  • In the third query, *0..30 matches paths of 0 to 30 relationships, so you get your l node itself (depth 0) plus every CodeType node reachable within 30 relationships.
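To compare the three patterns on equal footing, it may help to count distinct end nodes instead of looking at the returned rows. This is only a sketch, not something tested against your data. It assumes that part of the variation comes from RETURN * producing one row per matching path, so that LIMIT 10000 can cut the result off before every node has appeared. The aliases (nodes_any_depth and so on) are made up for illustration.

```cypher
// One or more hops: every CodeType node reachable from l at any depth.
MATCH (l:CodeType { id_value : '050001' })-[:IDENTIFIED_BY*]-(m:CodeType)
RETURN count(DISTINCT m) AS nodes_any_depth;

// Exactly 30 hops: only end nodes of paths with 30 relationships.
MATCH (l:CodeType { id_value : '050001' })-[:IDENTIFIED_BY*30]-(m:CodeType)
RETURN count(DISTINCT m) AS nodes_at_30;

// Zero to 30 hops: includes l itself, because depth 0 is allowed.
MATCH (l:CodeType { id_value : '050001' })-[:IDENTIFIED_BY*0..30]-(m:CodeType)
RETURN count(DISTINCT m) AS nodes_up_to_30;
```

If the distinct counts nest the way you expected (any depth at least as large as exactly 30), the differences you saw were probably an effect of the row limit rather than of the pattern. Be aware that an unbounded * can be expensive on a large, well-connected graph, because every distinct path is enumerated.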

I think that returning only l should help you a lot; in my opinion, RETURN * is never a good choice.

Also, WHERE 1=1 does nothing in your queries; you can safely remove it.
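For the stated goal of tagging the whole group with one identifier, something along these lines might work. It is a minimal sketch under assumptions: the property name cluster_id is hypothetical, the upper bound of 30 is taken from your third pattern and may be too small or too large for your data, and 'ClusterIDGroup1' is the example value from the question.

```cypher
// Collect l itself (depth 0) and every CodeType node within 30 hops,
// collapse duplicate paths to distinct nodes, then tag each node once.
MATCH (l:CodeType { id_value : '050001' })-[:IDENTIFIED_BY*0..30]-(m:CodeType)
WITH DISTINCT m
SET m.cluster_id = 'ClusterIDGroup1'   // cluster_id is an assumed property name
RETURN count(m) AS nodes_tagged;
```

The WITH DISTINCT m step is there so that a node reached by many different paths is written only once, and so that nodes_tagged counts nodes rather than paths.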

6 Comments

Very useful. Thank you. One more follow-on question: when I use Pattern 3 with *0..30 I get 115 nodes returned, whereas with *0..50 I get 84 nodes returned. Why do I get fewer nodes for 0..50? Shouldn't I get more nodes?
Can you please provide a dataset using console.neo4j.org? It would be nice to try it myself and figure out the problem.
I'm trying to think how to do that, because my dataset is 10M nodes and since each pattern match is not reproducible (e.g. how do I get all of the nodes in the cluster), how do I do that? Would it be valuable if I provided all of the nodes that appear to be connected by variable length (e.g. 150 nodes) and just provide that example?
I see that I get more than 10k nodes back in the rows result window in Neo4j, but I'm uncertain why I would get back any more than 150...? There are tons of duplicates... THANKS SO MUCH, I owe you!
Don't forget to tag as answer, to help the other devs who will maybe, one day, have the same problem :)