
I have a dataset of countries' health expenditure and life expectancy and wish to plot it.

I currently have the code:

    dd = data.frame(Series_Name = "Health expenditure per capita (current US$) Australia",
                    Year = c(2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014),
                    Value = c(1665.200,1883.316,2370.881,2933.229,3214.031,3421.908,4077.852,4410.438,4256.641,5324.517,6368.424,6543.524,6258.467,6031.107))

Which I am then plotting with:

library(ggplot2)
## The aesthetics Year, Value and Series_Name are
## inherited by the geoms
ggplot(dd, aes(Year, Value,colour=Series_Name)) + 
    geom_line() + 
    geom_point()

This displays the graph the way I would like. The issue is that I would like to be able to specify which data series is used for the Value variable, to avoid entering the numbers manually. The Year column does not need to change and can stay as it is.

The data has been read in from a csv file and saved to the variable 'statistics'. The data looks like this:

Series Name 2001    2002    2003    2004    2005    2006    2007    2008    2009    2010    2011    2012    2013    2014
Health expenditure per capita (current US$) Australia   1665.200    1883.316    2370.881    2933.229    3214.031    3421.908    4077.852    4410.438    4256.641    5324.517    6368.424    6543.524    6258.467    6031.107

If I wished to change the data from Australia to Japan, how would I go about doing so? The series names all follow the same pattern; only the country name differs.

Thanks for your help!

EDIT: Thought it may be beneficial to add an image of the data layout.

image

The statistics.csv file - https://ufile.io/ocynw

  • One option is to melt() your data frame (from wide to long format). See ?reshape2::melt. Then you could plot all the countries or just select some of them. If you add the csv dataset we could illustrate how it works. Note that a pic of the data is not useful. Commented May 28, 2017 at 8:19
  • @ed_sans Thanks, I have added a link to the file. That would be much appreciated! Commented May 28, 2017 at 8:33
  • Try tidyr::gather(df, key = year, value = expenditure, -`Series Name`) to reshape your data (the backticks are needed because the column name contains a space). Commented May 28, 2017 at 9:07
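A minimal sketch of the reshape suggested in the last comment, assuming the column is literally named `Series Name` (which requires reading the file with check.names = FALSE). The data frame `wide` is a stand-in holding only the first three Australia values from the question. Newer versions of tidyr mark gather() as superseded by pivot_longer(), but gather() still works:

```r
library(tidyr)

# Stand-in for the wide table: first three Australia values from the question
wide <- data.frame(
  `Series Name` = "Health expenditure per capita (current US$) Australia",
  `2001` = 1665.200, `2002` = 1883.316, `2003` = 2370.881,
  check.names = FALSE
)

# One row per (series, year); convert = TRUE turns the year labels into integers
long <- gather(wide, key = "year", value = "expenditure",
               -`Series Name`, convert = TRUE)
```

The result has one row per year, with the columns `Series Name`, year and expenditure, which is the shape ggplot2 expects.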

1 Answer


You could use the following approach. If your data frame is called dd:

names(dd) <- c("Series_Name", seq(2001,2014,1))
library(reshape2)
library(tidyverse)
library(stringr)

We first convert your data frame from wide to long format:

dd2 <- melt(dd, id.vars=c("Series_Name"), value.name = c("value"))

Selecting only the 'Health expenditure per capita' series:

dd2 <- dd2[startsWith(as.character(dd2$Series_Name), prefix = "Health expenditure per capita"), ]

Creating a column with the name of the country that will appear in the legend:

dd2$country <- as.factor(word(dd2$Series_Name,-1) )
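As a small illustration of what word(..., -1) returns, and of one possible limitation: it keeps only the last word, so a country name made of two words would be cut short. The "United States" series name below is a hypothetical example, not taken from the dataset. Stripping the fixed prefix is one alternative sketch:

```r
library(stringr)

series <- "Health expenditure per capita (current US$) Australia"
last_au <- word(series, -1)            # last word of the series name

# Hypothetical series name with a two-word country
series_us <- "Health expenditure per capita (current US$) United States"
last_us <- word(series_us, -1)         # only the final word survives

# Alternative: remove the known prefix instead of taking the last word.
# fixed = TRUE treats "(", ")" and "$" literally rather than as regex syntax.
prefix <- "Health expenditure per capita (current US$) "
country_us <- sub(prefix, "", series_us, fixed = TRUE)
```

Here last_us is "States" while country_us is "United States", so the prefix approach may be safer if the file contains multi-word country names.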

Sorting your data:

dd2 <- arrange(dd2, country)

and plotting all the countries:

ggplot(dd2, aes(x = variable, y = value, group=country, color=country)) + geom_line() + 
  geom_point()


If you want just Japan:

filter(dd2, country == "Japan") %>%
ggplot(aes(x = variable, y = value, group=country, color=country)) + 
  geom_line() +   geom_point()
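To avoid editing the filter by hand each time, the same steps could be wrapped in a small function. This is only a sketch: `plot_country` and `dd2_demo` are hypothetical names, and the function assumes a long-format data frame with the columns variable, value and country as built above. It also converts the year from the factor that melt() produces to a number, so the x axis is continuous:

```r
library(ggplot2)

# Hypothetical helper: plot one country's series from a long-format data frame
# with columns `variable` (year), `value` and `country`
plot_country <- function(long_df, country_name) {
  one <- long_df[long_df$country == country_name, ]
  # melt() stores the year as a factor; convert it so the x axis is numeric
  one$year <- as.numeric(as.character(one$variable))
  ggplot(one, aes(x = year, y = value, colour = country)) +
    geom_line() +
    geom_point()
}

# Small stand-in for dd2, using the first three Australia values from the question
dd2_demo <- data.frame(
  variable = factor(c(2001, 2002, 2003)),
  value = c(1665.200, 1883.316, 2370.881),
  country = "Australia"
)
p <- plot_country(dd2_demo, "Australia")
```

With the full data, a call such as plot_country(dd2, "Japan") would then switch countries without touching the plotting code.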

3 Comments

I like this idea. I may need to change a few things due to requirements, but this is pretty much what I wanted. I haven't loaded the dataset into a data frame yet; as you can see, I was just experimenting and doing it manually before. Can you give me any tips on how exactly you loaded the csv into the data frame dd?
Using dd <- read.csv("C:\\path\\statistics.csv"). Also please see stackoverflow.com/help/someone-answers
Thank you very much for your help!
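A note on why the answer renames the columns after loading: by default read.csv() rewrites headers that are not valid R names, so "Series Name" becomes "Series.Name" and "2001" becomes "X2001". The sketch below uses an inline two-year excerpt (`csv_text`, assumed to mirror a comma-separated file) instead of the real statistics.csv:

```r
# Inline stand-in for the file: header plus the first two Australia values
csv_text <- paste(
  "Series Name,2001,2002",
  "Health expenditure per capita (current US$) Australia,1665.200,1883.316",
  sep = "\n"
)

# Default behaviour: headers are converted to syntactically valid names
default_names <- names(read.csv(text = csv_text))

# check.names = FALSE keeps the headers exactly as written in the file
kept_names <- names(read.csv(text = csv_text, check.names = FALSE))
```

Either route works; with check.names = FALSE the names(dd) <- ... step from the answer is not needed, but the columns then have to be referred to with backticks.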
