We use cookies and other tracking technologies to improve your browsing experience on our website, to show you personalized content and targeted ads, to analyze our website traffic, and to understand where our visitors are coming from.
Quick note - if you’re reading this before April 1 it’s because due to something I need to do on April 2 that might keep me away from the computer for a day or two, I’m getting ahead of the challenge by doing some charts in late March. I’ll have at least 3 done, if not more. To at least play within the spirit of the challenge I won’t post them to social media until April 1 and I’m putting the first publish date on blog posts as April 1.
It’s time again for the 30 Day Chart Challenge, a social-media-driven data viz challenge to create and share a data visualization on any topic, but somehow corresponding to the daily prompt as suggested by the organizers.
I did participate last year, though started late and did not get to every prompt. Like last year I am going to keep all the prompt posts in one mega-post, adding to it regularly. Perhaps not each day, but I will try to get to every prompt. Like last year:
I will post contributions to social media via my Bluesky, and LinkedIn accounts.
I will add contributions to this post, meaning the posting date will refresh to the date I’ve added the chart. So If I get to a lot or even all of them, this will be a long post.
Use the table of contents to the right to navigate to a specific chart.
Images are “lightboxed”, so click on them to enlarge. Click outside the image to return.
One major change is that this year I am going to only pull data from Danmarks Statistik (Statistics Danmark, the national statistics agency) via the danstat package for r. To the extent possible I’ll be focusing on education data. These constraints will help me by not having to figure out which data source to use each day, plus I’ll hopefully be telling a story of the coming weeks.
It may not be a straight linear narrative and not every post will necessarily connect to the whole. I may not even know exactly what I have until I’m done. I have sketched out ideas for many of the prompts, but things may change as I go along. If the daily prompt just doesn’t lend itself to Danish education data, or Danish data at all, the subject for that chart will be something else.
As with all of my posts, I’m offering a tutorial/how-to as much as I am trying to convey insights from the data. If a topic or approach interests you, hopefully there’s enough explanation in the writing or the code so that you can do something similar or use it to hone your r skills.
Let’s get to the charts.
Prompt #1 - Fractions
The first week of prompts is devoted to comparisons, and the first day is fractions. I cheated a bit and replicated last year’s 1st prompt “Part-to-Whole”. But where last year I looked at educational attainment in the UK, this year it’s Denmark.
I won’t do too much explaining about education levels in Denmark, you can read up on them on the ministry’s page.
We’ll do a horizontal stacked bar chart displaying the highest education attained by Danes aged 25-69 as of 2023, with this population divided into four age groups.
To pull data from danstat, which accesses the Danmarks Statistik API, you have to know at least the table name and number. So it’s necessary to poke around Danmarks Statistik’s StatBank page and figure out which table has the data you need. The value added of the API and package is not having to download and keep flat files.
The table_meta command produces a nested tibble with values for each field in the table. To get the data you first create a list of fields and values (see the varaibles_ed object) then use the get_data command to actually fetch the data.
Show code for getting and cleaning data
library(tidyverse) # to do tidyverse thingslibrary(tidylog) # to get a log of what's happening to the datalibrary(janitor) # tools for data cleaninglibrary(danstat) # package to get Danish statistics via apilibrary(ggtext) # enhancements for text in ggplot# some custom functionssource("~/Data/r/basic functions.R")# metadata for table variables, click thru nested tables to find variables and ids for filterstable_meta <- danstat::get_table_metadata(table_id ="hfudd11", variables_only =TRUE)# create variable list using the ID value in the variablevariables_ed <-list(list(code ="bopomr", values =c("000", "081", "082", "083", "084", "085")),list(code ="hfudd", values =c("H10", "H20", "H30", "H35","H40", "H50", "H60", "H70", "H80", "H90")),list(code ="køn", values =c("TOT")),list(code ="alder", values =c("25-29", "30-34", "35-39", "40-44", "45-49","50-54", "55-59", "60-64", "65-69")),list(code ="tid", values =2023))# past variable list along with table name.# note that in package, table name is lower case, though upper case on statbank page.edattain1 <-get_data("hfudd11", variables_ed, language ="da") %>%as_tibble() %>%select(region = BOPOMR, age = ALDER, edlevel = HFUDD, n = INDHOLD)# arrange data for chart# collapse age and education levels into groupsedattain <- edattain1 %>%mutate(age_group =case_when(age %in%c("25-29 år", "30-34 år", "35-39 år") ~"25-39", age %in%c("40-44 år","45-49 år") ~"40-49", age %in%c("50-54 år","55-59 år") ~"50-59", age %in%c("60-64 år","65-69 år") ~"60-69")) %>%mutate(ed_group =case_when(edlevel =="H10 Grundskole"~"Grundskole/Primary", edlevel %in%c("H20 Gymnasiale uddannelser","H30 Erhvervsfaglige uddannelser","H35 Adgangsgivende uddannelsesforløb") ~"Secondary", edlevel =="H40 Korte videregående uddannelser, KVU"~"Tertiary - 2yr", edlevel %in%c("H50 Mellemlange videregående uddannelser, MVU","H60 Bacheloruddannelser, BACH") ~"Tertiary - Bachelor", edlevel =="H70 Lange videregående uddannelser, LVU"~"Tertiary - Masters", edlevel =="H80 Ph.d. og forskeruddannelser"~"Tertiary - PhD", edlevel =="H90 Uoplyst mv."~"Not stated")) %>%group_by(region, age_group, ed_group) %>%mutate(n2 =sum(n)) %>%ungroup() %>%select(-n, -age) %>%distinct(region, age_group, ed_group, .keep_all = T) %>%rename(n = n2) %>%mutate(ed_group =factor(ed_group,levels =c("Grundskole/Primary", "Secondary", "Tertiary - 2yr","Tertiary - Bachelor", "Tertiary - Masters", "Tertiary - PhD", "Not stated")))
Now that we have a nice tidy tibble, let’s make a chart. The approach is fairly straight-forward ggplot. I used tricks like fct_rev to order the age groups on the chart, and the scales and ggtext packages to make things look nicer.
Show code for making the chart
## chart - all DK, horizontal bar age groups, percent stacks percent by deg leveledattain %>%filter(region =="Hele landet") %>%group_by(age_group) %>%mutate(age_total =sum(n)) %>%mutate(age_pct =round(n/age_total, 2)) %>%select(age_group, ed_group, age_pct) %>%# the line below passes temporary changes to the data object through to the chart commands {. ->> tmp} %>%ggplot(aes(age_pct, fct_rev(age_group), fill =fct_rev(ed_group))) +geom_bar(stat ="identity") +scale_x_continuous(expand =c(0,0),breaks =c(0, 0.25, 0.50, 0.75, 1),labels =c("0", "25%", "50%", "75%", "100%")) +geom_text(data =subset(tmp, age_pct >0.025),aes(label = scales::percent(round(age_pct , 2))),position =position_stack(vjust =0.5),color="white", vjust =0.5, size =12) +scale_fill_brewer(palette ="Set3") +labs(title ="Danes under 40 have a higher rate of post-Bachelor educational attainment than other age groups",subtitle ="Highest education level attained by age groups",caption ="*Data from Danmarks Statistik via danstat package*",x ="", y ="Age Group") +theme_minimal() +theme(legend.position ="bottom", legend.spacing.x =unit(0, 'cm'),legend.key.width =unit(4, 'cm'), legend.margin=margin(-10, 0, 0, 0),legend.text =element_text(size =12), legend.title =element_text(size =16),plot.title =element_text(hjust = .5, size =20),plot.subtitle =element_text(size =16),plot.caption =element_markdown(size =12, face ="italic"),axis.text.x =element_text(size =14),axis.text.y =element_text(size =14),panel.grid.major =element_blank(), panel.grid.minor =element_blank()) +guides(fill =guide_legend(label.position ="bottom", reverse =TRUE, direction ="horizontal", nrow =1, title ="Highest Educational Attainment", title.position ="top")) rm(tmp)
As you might expect, younger Danes tend to have higher levels of educational attainment than Danes over 60. That’s not uncommmon in countries with post-secondary education policies designed to increase access and attainment. As in the US this is a post-WWII development, though accelerated in the 1960s and after.
It’s been successful public policy in Denmark for many decades to increase access to schooling beyond the Grundskole (primary years…to about grade 9 in a US context). More than 45% of Danes between 25-40 have at least a Bachelor’s degree, and 20% have a Master’s. There would of course be more nuance in disaggregated age groups, but I wanted to keep this first story a bit more simple.
created and posted March 24, 2025
Prompt #2 - Slope
Building on what we saw in Prompt 1 for educational attainment by age group in 2023, let’s use the slope prompt to compare 2023 to 2005, the furthest back we can get educational attainment by age statistics via Danmarks Statistik’s StatBank.
The process for getting data is more or less the same as in Prompt 1, so I won’t repeat the code here. The only thing I did differently was collapse the 2-year and Bachelor’s degrees into one “college” category and collapsed Master’s and PhD into one post-bacc group.
For this chart I want to see if there are differences in highest educational attainment level by age group in two distinct years. Instead of one chart with too many lines, essentially a spaghetti graph, I made four charts and used the patchwork package (created by a Dane!) to facet out the individual charts into a (hopefully) coherent whole.
Keep in mind that while of course many people will be represented in both 2005 and 2023, this is not a longitudinal study. It’s a snapshot of the Danish population by age groups at two distinct points in time.
We have the data, and it’s time to build the charts. But instead of copy-pasting the ggplot code to build each charts, let’s create a function and use that four times.
Show code for the chart function
# plotdf is the placeholder name in the function for the data you pass to run itslope_graph <-function(plotdf) { plotdf %>%ggplot(aes(x = year, y = age_ed_pct, group = age_group, color = age_group)) +geom_line(size =1) +geom_point(size = .2) +scale_x_continuous(breaks =c(2005, 2023),labels =c("2005", "2023")) +scale_y_continuous(limits =c(0, .5),labels = scales::label_percent()) +scale_color_brewer(palette ="Set2") +labs(x ="", y ="") +theme_minimal() +theme(legend.position ="bottom", legend.spacing.x =unit(0, 'cm'),legend.key.width =unit(3, 'cm'), legend.margin=margin(-10, 0, 0, 0),legend.text =element_text(size =10), legend.title =element_text(size =12),plot.title =element_text(hjust = .5, size =16),plot.subtitle =element_markdown(size =14, vjust =-.5),plot.caption =element_markdown(size =12, face ="italic"),axis.text.x =element_text(size =11),axis.text.y =element_text(size =11),panel.grid.major =element_blank(), panel.grid.minor =element_blank(),strip.background =element_blank()) +guides(color =guide_legend(label.position ="bottom", reverse =FALSE, direction ="horizontal", nrow =1,title ="Age Group", title.position ="top"))}
Now let’s put the function to use. We’ll create four plots, then stitch them together. Each plot will have its own descriptive title and subtitle, and the final chart will have one as well. Notice that we’re using the titles to tell a quick story or share an insight, not to dryly name what the chart is doing. The ggtext package gives us markdown enhancements for chart elements like the title and subtitle.
Show code for making the final chart
# create individual plots with titles and annotationsplot_grundsk <- edattain2 %>%filter(ed_group2 =="Grundskole/Primary") %>%slope_graph() +labs(title ="Primary school (thru grade 10)",subtitle ="*In 2023 Danes of all ages were much less likely to have stopped their education at<br>primary school than they were in 2005.<br>*")plot_hs <- edattain2 %>%filter(ed_group2 =="Secondary") %>%slope_graph() +labs(title ="Gymnasium & Vocational (High School)",subtitle ="*From 2005 to 2023, there was a steep decline in the percentage of Danes<br> aged 25-49 who were finished with education at the high school level, especially<br> Danes under 40. For Danes older than 50 there was a very slight increase.*")plot_colldegs <- edattain2 %>%filter(ed_group2 =="Tertiary - 2yr/Bach") %>%slope_graph() +labs(title ="2-year & Bachelor's Degrees",subtitle ="*For Danes of all age groups, but especially those under 50, there was a<br> noticable increase in the percentage earning 2 or 4 year degrees.*")plot_masters <- edattain2 %>%filter(ed_group2 =="Tertiary - Masters+") %>%slope_graph() +labs(title ="Master's & PhD Degrees",subtitle ="*The percentage of Danes earning a Master's or PhD increased across all age groups between 2005 and 2025;<br> the increase was strongest among Danes under 50.*")plot_grundsk + plot_hs + plot_colldegs + plot_masters +plot_annotation(title ="Danes of all ages have become more likely to continue their education beyond primary level. Danes under 50 have over time become likely to earn a Master's.",subtitle ="*Highest level of education earned by age groups, in 2005 and 2023.*",caption ="*Data from Danmarks Statistik via danstat package. Groups are not longitudinal - age is for the person in the year of data collection.*") &theme(plot.title =element_text(size =16), plot.subtitle =element_markdown(),plot.caption =element_markdown(),# plot.background add the grey lines between plotsplot.background =element_rect(colour ="grey", fill=NA))
The story here is hopefully clear. In 2005, more Danes, especially Danes 60+ years of age, had ended their education at primary level. By 2023 many more Danes were likely to have progressed beyond the primary level, and younger Danes especially were finishing Bachelor’s degrees and earning Master’s degrees.
It’s been government policy to support furthering education levels via free tuition and living stipends. This is made possible by a democratic-socialism approach to the economy, with higher tax rates on some earners than you might see in the US or UK, but for those taxes you received and your kids get at least 4 years of college paid for, and a small living stipend to boot.
In the US public higher education affordability models vary from state to state. Some states try to keep cost-of-attendance low via more direct state support. But not all students get full rides, even with need-based aid. I am much more a fan of the socialized approach to college affordability. the data shows it works in terms of increasing overall educational attainment.
created and posted March 24, 2025
Prompt #3 - Circular
For this post I’ll make a simple circle packing chart, sort of a heat map or tree map but with circles, the size of the circles corresponding to the variable value. I’m using the code-through from the R Graph Gallery to get going, as I’ve never done this type of chart before. The packcircles package does the computational work to create the circle objects.
Keeping with the thematic constraint I’ve set, I’ll once again be using Danish education data from Danmarks Statistik via the danstat package.
This chart will display Bachelor’s degrees awarded in 2023, by top-line degree field (Science, Humanities, Education, etc). For instance, Law is a sub-field in Social Sciences, one of many. To make it a bit easier I’ll limit this chart to academic Bachelor’s, not the vocational degrees. There’s definitely a more focused project worth doing that displays the sub-fields within the top-line areas, shown as nested smaller circles within larger circles, or something interactive. for now this will suffice.
To start, let’s get the data. It’s from a different table than the last chart, so I’ll show the work.
Show code for getting and cleaning data
library(tidyverse) # to do tidyverse thingslibrary(tidylog) # to get a log of what's happening to the datalibrary(janitor) # tools for data cleaninglibrary(danstat) # package to get Danish statistics via apilibrary(packcircles) # makes the circles!library(ggtext) # enhancements for text in ggplotlibrary(ggrepel) # helps with label offsets# some custom functionssource("~/Data/r/basic functions.R")# this table is UDDAKT60, set to all lower-case in the table_id calltable_meta <- danstat::get_table_metadata(table_id ="uddakt60", variables_only =TRUE)# create variable list using the ID value in the variable# the uddanelse codes are the top-line degree fields.variables_ed <-list(list(code ="uddannelse", values =c("H6020", "H6025", "H6030", "H6035","H6039", "H6059", "H6075", "H6080", "H6090")),list(code ="fstatus", values =c("F")),list(code ="tid", values =2023))# start with a larger dataset in case we want to do more analysis or viz with itdegs1 <-get_data("uddakt60", variables_ed, language ="en") %>%as_tibble() %>%# shorten the given degree field namesmutate(deg_field =case_when(UDDANNELSE =="H6020 Educational, BACH"~"Educ.", UDDANNELSE =="H6025 Humanities and theological, BACH"~"Humanities", UDDANNELSE =="H6030 Arts, BACH"~"Arts", UDDANNELSE =="H6035 Science, BACH"~"Science", UDDANNELSE =="H6039 Social Sciences, BACH"~"Social Science", UDDANNELSE =="H6059 Technical sciences, BACH"~"Tech Science", UDDANNELSE =="H6075 Food, biotechnology and laboratory technology, BACH"~"Food/Biotech/LabTech", UDDANNELSE =="H6080 Agriculture, nature and environment, BACH"~"Agricultural Science", UDDANNELSE =="H6090 Health science, BACH"~"Health Sciences")) # keep only what we ween to do the circlesdegs2 <- degs1 %>%select(group = deg_field, value = INDHOLD, text)# this creates x & y coordinates and circle radius values.degs_packing <-circleProgressiveLayout(degs2$value, sizetype='area')# new tibble putting it togetherdegs3 <-cbind(degs2, degs_packing)# not sure exactly what this does, but it's an important stepdegs.gg <-circleLayoutVertices(degs_packing, npoints=50)
Ok, we now have a nice clean tibble, time for the plot. What took the most time here was working on the labels…getting them to the correct size, adjusting for circle size. You’ll notice one label set off to the side thanks to the ggrepel package.
Show code for making the chart
ggplot() +geom_polygon_interactive(data = degs.gg,aes(x, y, group = id, fill=id, tooltip = data$text[id], data_id = id),colour ="black", alpha =0.6) +scale_fill_viridis() +geom_text(data = degs3 %>%filter(value >6000),aes(x, y, label =paste0(group, "\n", format(value, big.mark=","))),size=7, color="black") +geom_text(data = degs3 %>%filter(between(value, 2410, 3350)),aes(x, y, label =paste0(group, "\n", format(value, big.mark=","))),size=7, color="black") +geom_text(data = degs3 %>%filter(between(value, 600, 2410)),aes(x, y, label =paste0(group, "\n", format(value, big.mark=","))),size=6, color="black") +geom_text_repel(data = degs3 %>%filter(between(value, 200, 400)),aes(x, y, label =paste0(group, "\n", format(value, big.mark=","))),size=4, color="black",max.overlaps =Inf, nudge_x =-110, nudge_y =50,segment.curvature =0,segment.ncp =8,segment.angle =30) +labs(x ="", y ="",title ="Social Sciences were the most popular Bachelor's degrees awarded by Danish universities in 2023",subtitle ="*Labels not diplayed: Education = 134, Food Science = 61*",caption ="*Data from Danmarks Statistik via danstat package*") +theme_void() +theme(legend.position="none", plot.margin=unit(c(0,0,0,0),"cm"),plot.title =element_markdown(size =16),plot.subtitle =element_markdown(size =12),plot.caption =element_markdown(size =8)) +coord_equal()
There you have it, academic Bachelor’s degrees awarded by Danish universities in 2023. The Social Sciences category accounts for more than twice as many degrees as the next field. Again, the next interesting step would be to visualize the disaggregation of the main fields, and see what subfields are the most popular. It would also be interesting to track the numbers over time, see which majors rose or fell.