How to join two RDDs in PySpark with nested tuples

Question

I need to join two RDDs as part of my programming assignment. The problem is that the first RDD is nested, while the second is flat. I have tried several approaches, but none of them worked. Can anyone experienced with PySpark help?

First RDD is:

[(('brand', 1), ('queen', 1), ('elizabeth', 1), ...),
 (('50', 1), ('worst', 1), ('habit', 2), ...),
 (('cost', 1), ('trump', 1), ('aid', 1), ..., ('hole', 1))]

Second RDD is:

[('brand', 1), ('queen', 3), ('elizabeth', 2), ...]

Please trim your code to make it easier to find your problem. Follow these guidelines to create a minimal reproducible example.
– Community Bot, Commented Dec 1 at 6:18
2026 is about to start. Who is still using raw RDDs? Please use DataFrames.
– Steven, Commented Dec 1 at 14:51
@DerekO Oops, sorry, I did not realise that SO was the right website to get all the answers to your assignments without doing anything by yourself.
– Steven, Commented Dec 3 at 8:58

AtilaSol · Accepted Answer · 2025-12-02 00:30:38Z

First flatten the nested RDD with flatMap, so that every record is a single (word, count) pair, then join it with the second RDD. I also merged duplicate words with reduceByKey; you can skip that step if you don't need it.

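joined = (
    rdd1
        .flatMap(lambda group: group)       # one (word, count) pair per record
        .reduceByKey(lambda a, b: a + b)    # sum the counts of duplicate words
        .join(rdd2)                         # -> (word, (summed_count, rdd2_value))
)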
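
To see what this produces, here is a small sketch that wraps the same pipeline in a function and runs it on made-up sample data. The sample values and the name `flatten_and_join` are hypothetical and only shaped like the RDDs in the question; the elided parts of the original data are not reproduced. It assumes you already have a SparkContext `sc`.

```python
# Hypothetical sample data, shaped like the two RDDs in the question.
docs = [
    (("brand", 1), ("queen", 1), ("elizabeth", 1)),
    (("50", 1), ("worst", 1), ("habit", 2)),
    (("cost", 1), ("queen", 2), ("hole", 1)),
]
totals = [("brand", 1), ("queen", 3), ("elizabeth", 2)]


def flatten_and_join(sc, docs, totals):
    """Flatten the nested documents, merge duplicate words, join with totals.

    `sc` is an existing SparkContext. Returns a list of
    (word, (summed_count, totals_value)) tuples in no particular order.
    """
    rdd1 = sc.parallelize(docs)
    rdd2 = sc.parallelize(totals)
    return (
        rdd1
        .flatMap(lambda group: group)       # one (word, count) pair per record
        .reduceByKey(lambda a, b: a + b)    # sum the counts of duplicate words
        .join(rdd2)                         # inner join on the word
        .collect()
    )
```

With a context from `SparkContext.getOrCreate()`, `dict(flatten_and_join(sc, docs, totals))` should contain only the words present in both inputs: `brand` with `(1, 1)`, `queen` with `(3, 3)` and `elizabeth` with `(1, 2)`. Note that the document boundaries are gone after flatMap, so this only fits if you don't need to know which document a word came from.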


1 Comment

Ahmed Sohail Aslam PhDCS 2025 Dec 5 at 2:35

Your comments are valuable, but my problem is a little different. I need to keep the nested structure, because each nested tuple represents one document and I want to see how many words of that document overlap with the overall set of words. I tried something like rdd1.join(rdd2.map(lambda x: x[0:-1])), but it gives me an empty list.
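
For the follow-up in the comment above, one possible approach that keeps each document intact is to skip the join entirely. join works on (key, value) pairs, and a record of rdd1 is a whole document rather than a single pair, so its keys never line up with the words in rdd2. The sketch below instead collects the words of rdd2 into a set, broadcasts it, and filters each document with map. It assumes rdd2 is small enough to collect on the driver; the names `overlap_with_vocabulary` and `per_document_overlap` are hypothetical.

```python
def overlap_with_vocabulary(doc, vocabulary):
    """Keep the (word, count) pairs of one document whose word is in vocabulary."""
    return tuple((word, count) for word, count in doc if word in vocabulary)


def per_document_overlap(sc, rdd1, rdd2):
    """Return an RDD with one (overlapping_pairs, overlap_size) record per document.

    Assumption: rdd2 is small enough for its keys to be collected on the
    driver and broadcast to the executors.
    """
    # Set of all words in the flat RDD, shipped once to every executor.
    vocabulary = sc.broadcast(set(rdd2.keys().collect()))
    return (
        rdd1
        # Filter each document separately, so the nesting is preserved.
        .map(lambda doc: overlap_with_vocabulary(doc, vocabulary.value))
        # Attach the number of overlapping words for that document.
        .map(lambda pairs: (pairs, len(pairs)))
    )
```

Calling `per_document_overlap(sc, rdd1, rdd2).collect()` then gives one record per document, in the same order as rdd1. If rdd2 is too large to broadcast, an alternative is to tag each document with an id (for example via zipWithIndex), flatten to (word, doc_id) pairs, join with rdd2, and group by doc_id afterwards.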
