
I have two data tables, each with three columns: a pair of diseases and a measure for that pair. Below is sample data for the first one, disease_table1:

**d1**     **d2**     **Value**
Disease1   Disease2   3.5
Disease3   Disease4   5
Disease5   Disease6   1.1
Disease1   Disease3   2.4
Disease6   Disease2   6.7

The real dataset 1 (disease_table1) looks like this:

 Bladder cancer                         X-linked ichthyosis (XLI)        3.5
 Leukocyte adhesion deficiency (LAD)    Aldosterone synthase Deficiency  1.8
 Leukocyte adhesion deficiency (LAD)    Brain Cancer                     1.5
 Tangier disease                        Pancreatic cancer                0.66

I want to show the difference between these two data tables by plotting the disease pairs and their values for both tables. I used the plot and lines functions, but the result is too simple and does not distinguish the two tables well. I would also like the plot to show the names of the disease pairs.

   plot(density(disease_table1$value))
   lines(density(disease_table2$value))

Thanks

  • Could you provide us with a reproducible example? Commented Jan 28, 2014 at 18:55
  • I have added the real dataset and code as an example. Commented Jan 28, 2014 at 19:39
  • With 400,000+ disease pairs you probably need a clustering approach. Can you post a link to your data, or a more representative subset, say a few thousand records? Commented Jan 28, 2014 at 21:09

1 Answer


Some sample code:

# create the data frames (I made up the second one)
df1 <- read.table(text = "d1   d2 x
Disease1 Disease2  3.5
Disease3 Disease4  5
Disease5 Disease6  1.1
Disease1 Disease3  2.4
Disease6 Disease2  6.7", header = TRUE, strip.white = TRUE)

df2 <- read.table(text = "d1   d2 y
Disease1 Disease2  4.5
Disease3 Disease4  2
Disease5 Disease6  3.1
Disease1 Disease3  1.4
Disease6 Disease2  5.7", header = TRUE, strip.white = TRUE)

# needed libraries
library(reshape2)
library(ggplot2)

# merge the data frames on the disease pair & create a unique identifier
data <- merge(df1, df2, by = c("d1", "d2"))
data$diseasepair <- paste0(data$d1, "-", data$d2)

# reshape to long format: one row per disease pair and group (x or y)
data.long <- melt(data, id.vars = "diseasepair", measure.vars = c("x", "y"),
                  variable.name = "group")

# make the plot
ggplot(data.long) +
  geom_bar(aes(x = diseasepair, y = value, fill = group), 
           stat="identity", position = "dodge", width = 0.7) +
  scale_fill_manual("Group\n", values = c("red","blue"), 
                    labels = c(" X", " Y")) +
  labs(x="\nDisease pair",y="Value\n") +
  theme_bw()

The result:

[Image: grouped bar chart showing the X and Y values side by side for each disease pair]

Is this what you're looking for?
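One caveat about the merge step: merge() on c("d1", "d2") only matches rows where the two diseases appear in the same order in both tables. If the pairs are meant to be unordered (an assumption, the question does not say), a pair stored as (A, B) in one table and (B, A) in the other is silently dropped. A minimal sketch of one way to handle that follows; normalise_pairs, t1 and t2 are made-up names for illustration.

```r
# Hypothetical helper: store each pair in alphabetical order so that
# (A, B) and (B, A) end up with the same key.
normalise_pairs <- function(df) {
  a <- as.character(df$d1)
  b <- as.character(df$d2)
  df$d1 <- pmin(a, b)  # alphabetically first disease of the pair
  df$d2 <- pmax(a, b)  # alphabetically last disease of the pair
  df
}

# Made-up tables in which the same pairs are stored in opposite order
t1 <- data.frame(d1 = c("Disease1", "Disease6"),
                 d2 = c("Disease2", "Disease2"),
                 x  = c(3.5, 6.7),
                 stringsAsFactors = FALSE)
t2 <- data.frame(d1 = c("Disease2", "Disease2"),
                 d2 = c("Disease1", "Disease6"),
                 y  = c(4.5, 5.7),
                 stringsAsFactors = FALSE)

# A plain merge finds no common pairs because the order differs
plain <- merge(t1, t2, by = c("d1", "d2"))

# After normalising the order, both pairs match
merged <- merge(normalise_pairs(t1), normalise_pairs(t2), by = c("d1", "d2"))
```

If the order of d1 and d2 carries meaning in your data, skip this step and merge as shown above.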


4 Comments

I have 400k pairs of this kind, so I don't think this would work. It would have worked great for a smaller dataset, though. Could a curve or a heat map work instead?
For 400k pairs a heat map won't work either, IMHO. Do you want to compare the values for each pair, or just for specific pairs?
Basically I want to show enrichment of disease pairs using the values in one dataset versus the other. So I want to compare the values for each pair.
It's probably better to make subsets of your dataset for groups or for specific combinations. Putting all 400k pairs in one plot won't produce a plot of any value (at least that's what I think). First decide what you're looking for, then create subsets and plot those.
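
The suggestion in the last comment could be sketched roughly as follows: first get an overview of all pairs in a binned scatter plot, then label only a small subset. This is only an illustration under stated assumptions. The data are simulated (the real tables are not available here), sim, diff, p_all and p_top are made-up names, and picking the 20 largest absolute differences is an arbitrary choice, not something the thread specifies.

```r
library(ggplot2)

# Simulated stand-in for the merged table: one row per disease pair,
# x = value in table 1, y = value in table 2 (assumed layout)
set.seed(42)
n <- 5000
sim <- data.frame(
  diseasepair = paste0("Pair", seq_len(n)),
  x = rexp(n, rate = 0.5),
  y = rexp(n, rate = 0.5),
  stringsAsFactors = FALSE
)
sim$diff <- sim$x - sim$y

# Overview: count pairs per 2-D bin instead of drawing one mark per pair;
# the dashed line marks pairs with equal values in both tables
p_all <- ggplot(sim, aes(x = x, y = y)) +
  geom_bin2d(bins = 60) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  labs(x = "Value in table 1", y = "Value in table 2", fill = "Pairs") +
  theme_bw()

# Detail: keep only the 20 pairs with the largest absolute difference,
# which is few enough to label by name
top <- head(sim[order(-abs(sim$diff)), ], 20)
p_top <- ggplot(top, aes(x = reorder(diseasepair, diff), y = diff)) +
  geom_col() +
  coord_flip() +
  labs(x = "Disease pair", y = "Value in table 1 minus value in table 2") +
  theme_bw()
```

Calling print(p_all) and print(p_top) draws the two plots. Other subset criteria, such as all pairs involving one disease of interest, would slot into the same place as the top-20 filter.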
