Americans Have Eight Kinds of Days

R
Clustering
Open Data
Author

Louis Teitelbaum

Published

August 23, 2023

The American Time Use Survey takes a wide sample of Americans and interviews them in great detail about what they did the day before. The survey has been conducted each year since 2003. With a little discretion, I can group the activities into broader categories and produce the following graph:

There is a lot to unpack in this graph. I may come back to some of the trends here in future posts, but right now I want to take an entirely different approach to the data: cluster analysis.

Tip

Code for this article is available here.

Looking at average daily hours spent is enlightening, but it obscures the fundamental experience of time-use: What kinds of days do Americans have? A 1 hour-a-day average for socializing could mean that every American spends one hour each day socializing, or it could mean that 5% of Americans spend 20 hours a day socializing and the rest don’t socialize at all. It could also mean that half of Americans spend 7 hours a day socializing on weekends, and the rest don’t socialize at all. My point: Averages don’t tell us much about the ways in which individual Americans spend their days.
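The arithmetic behind these three scenarios is easy to check. The sketch below uses a hypothetical population of 1,000 people observed over one week (illustrative numbers, not ATUS data) and shows that all three patterns give the same one-hour daily average, even though the share of person-days with any socializing differs widely.

```r
# Hypothetical illustration, not ATUS data: three ways to average
# one hour of socializing per person per day.
n_people <- 1000
n_days   <- 7  # one week; columns 6 and 7 stand for the weekend

# Scenario A: everyone socializes one hour every day.
a <- matrix(1, nrow = n_people, ncol = n_days)

# Scenario B: 5% of people socialize 20 hours every day; the rest never do.
b <- matrix(0, nrow = n_people, ncol = n_days)
b[seq_len(n_people / 20), ] <- 20

# Scenario C: half of people socialize 7 hours on each weekend day only.
c_mat <- matrix(0, nrow = n_people, ncol = n_days)
c_mat[seq_len(n_people / 2), 6:7] <- 7

# Average hours per person-day in each scenario.
scenario_means <- c(A = mean(a), B = mean(b), C = mean(c_mat))
print(scenario_means)

# Share of person-days that include any socializing at all.
share_social <- c(A = mean(a > 0), B = mean(b > 0), C = mean(c_mat > 0))
print(share_social)
```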

Cluster analysis can solve this problem. Clustering is a family of machine learning techniques that ask the question: What types of cases are there? It can help us say things like “There are two kinds of people in the world…” or “There are three types of days that Americans have”.

I will be using the k-means clustering algorithm - probably the most popular clustering method. Given the number of clusters k, it finds a “center” for each cluster - the average of all cases in the cluster - so that each case is assigned to its nearest center. This makes it very easy to visualize the clusters, since each cluster’s center is the paradigmatic example for all of its members. But k-means does have one tricky aspect: The algorithm only figures out what the clusters are like, not how many of them there should be. The right number of clusters needs to be figured out in advance.
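As a minimal sketch of what k-means returns, the example below runs base R's `kmeans()` on simulated data. The two columns, the group means, and the choice of k = 3 are assumptions made for illustration, not the article's data or settings. It then checks the two properties described above: each center is the average of its members, and each case belongs to its nearest center.

```r
# Simulated data (an assumption for illustration, not the ATUS data):
# hours of work and hours of leisure for 300 hypothetical days.
set.seed(42)
days <- rbind(
  cbind(work = rnorm(100, 8, 1),             leisure = rnorm(100, 3, 1)),
  cbind(work = pmax(rnorm(100, 0, 0.5), 0),  leisure = rnorm(100, 9, 1)),
  cbind(work = pmax(rnorm(100, 0, 0.5), 0),  leisure = rnorm(100, 4, 1))
)

# k must be supplied up front; kmeans() does not choose it.
km <- kmeans(days, centers = 3, nstart = 25, iter.max = 50)
print(round(km$centers, 2))

# Each center is the mean of the cases assigned to it.
centers_by_hand <- apply(days, 2, function(col) tapply(col, km$cluster, mean))

# Squared Euclidean distance from every case to every center.
dist_to_centers <- sapply(seq_len(3), function(j) {
  rowSums(sweep(days, 2, km$centers[j, ])^2)
})

# Index of the nearest center for each case.
nearest <- max.col(-dist_to_centers, ties.method = "first")
print(table(assigned = km$cluster, nearest = nearest))
```

Reading the rows of `km$centers` as typical days is what makes the clusters easy to name.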

A popular, sophisticated method for finding the optimal number of clusters (the optimal k) is the gap statistic method, introduced by Tibshirani et al. (Stanford University, 2001). This metric compares the log of the total within-cluster dispersion for each k (how spread out the cluster members are) with the value that would be expected if the data were totally random and unclustered. The “gap” is the difference between these two numbers. The chosen number of clusters is the smallest k for which adding one more cluster would not raise the gap by more than one standard error.
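To make the computation concrete, here is a rough base-R sketch of the gap statistic on simulated data. The data, the range of k, and the number of reference datasets are assumptions for illustration, and this is not necessarily how the article's own figure was produced. The reference data are drawn uniformly over the range of each column, and the standard error uses R's `sd()`, a small simplification.

```r
# Simulated data with three well-separated groups (illustrative only).
set.seed(1)
x <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2),
  cbind(rnorm(50, 10, 0.5), rnorm(50, 0, 0.5)),
  cbind(rnorm(50, 5, 0.5),  rnorm(50, 9, 0.5))
)

# Log of the total within-cluster sum of squares for a k-means fit.
log_wk <- function(data, k) {
  log(kmeans(data, centers = k, nstart = 10, iter.max = 50)$tot.withinss)
}

k_max <- 6   # largest k to try (assumed)
n_ref <- 20  # number of random reference datasets (assumed)

# Observed log dispersion for each k.
obs <- sapply(seq_len(k_max), function(k) log_wk(x, k))

# Reference log dispersion: uniform draws over the range of each column.
ref <- replicate(n_ref, {
  u <- apply(x, 2, function(col) runif(length(col), min(col), max(col)))
  sapply(seq_len(k_max), function(k) log_wk(u, k))
})  # a k_max x n_ref matrix

# Gap = expected log dispersion under randomness minus observed log dispersion.
gap <- rowMeans(ref) - obs
se  <- apply(ref, 1, sd) * sqrt(1 + 1 / n_ref)

# Smallest k with gap(k) >= gap(k + 1) - se(k + 1); fall back to k_max.
ok    <- gap[-k_max] >= gap[-1] - se[-1]
k_hat <- if (any(ok)) which(ok)[1] else k_max

print(data.frame(k = seq_len(k_max), gap = round(gap, 3), se = round(se, 3)))
print(k_hat)
```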

Running this algorithm on a subset of the time use data results in the following graph:

$data
       logW   E.logW      gap      SE.sim clusters     ymin     ymax
1  13.50450 14.78657 1.282068 0.001480213        1 1.280588 1.283548
2  13.30652 14.74339 1.436868 0.001392369        2 1.435476 1.438261
3  13.21941 14.72478 1.505371 0.001388561        3 1.503982 1.506759
4  13.19028 14.70927 1.518987 0.001386684        4 1.517600 1.520373
5  13.16623 14.69873 1.532501 0.001375256        5 1.531126 1.533877
6  13.14326 14.68896 1.545699 0.001373610        6 1.544325 1.547073
7  13.12145 14.68063 1.559187 0.001364544        7 1.557822 1.560551
8  13.09773 14.67253 1.574803 0.001375587        8 1.573427 1.576178
9  13.07577 14.66646 1.590693 0.001347867        9 1.589345 1.592041
10 13.05627 14.66081 1.604540 0.001351497       10 1.603188 1.605891
11 13.03194 14.65612 1.624181 0.001339475       11 1.622842 1.625521
12 13.01210 14.65159 1.639487 0.001369429       12 1.638118 1.640857
13 12.99600 14.64733 1.651325 0.001376263       13 1.649949 1.652702
14 12.98763 14.64309 1.655458 0.001348115       14 1.654110 1.656806
15 12.97476 14.63887 1.664107 0.001363974       15 1.662743 1.665471
16 12.96563 14.63474 1.669111 0.001412159       16 1.667699 1.670523
17 12.95959 14.63177 1.672184 0.001374198       17 1.670809 1.673558
18 12.94847 14.62906 1.680590 0.001350035       18 1.679240 1.681940
19 12.94091 14.62653 1.685616 0.001341744       19 1.684274 1.686958
20 12.93651 14.62405 1.687544 0.001363016       20 1.686181 1.688907

[Figure: "Optimal number of clusters" (Gap Statistic). Gap statistic (k) plotted against the number of clusters k as a line with points and error bars, plus a vertical line marking the selected k.]

The maximum gap statistic is identified here at k = 20, the maximum k tested. Nevertheless, it is clear from the full graph that 3 is the lowest reasonable number of clusters. A pattern similar to this, in which the gap statistic continues to rise after an initial local maximum, is discussed by Tibshirani et al. in the original paper:
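Both observations can be checked against the numbers in the output above. The sketch below copies the `gap` and `SE.sim` columns, applies the one-standard-error rule from Tibshirani et al., and then looks at how much the gap grows from each k to the next. The variable names are my own.

```r
# Gap values and simulation standard errors copied from the output above.
gap <- c(1.282068, 1.436868, 1.505371, 1.518987, 1.532501, 1.545699,
         1.559187, 1.574803, 1.590693, 1.604540, 1.624181, 1.639487,
         1.651325, 1.655458, 1.664107, 1.669111, 1.672184, 1.680590,
         1.685616, 1.687544)
se <- c(0.001480213, 0.001392369, 0.001388561, 0.001386684, 0.001375256,
        0.001373610, 0.001364544, 0.001375587, 0.001347867, 0.001351497,
        0.001339475, 0.001369429, 0.001376263, 0.001348115, 0.001363974,
        0.001412159, 0.001374198, 0.001350035, 0.001341744, 0.001363016)
k_max <- length(gap)

# Rule: smallest k with gap(k) >= gap(k + 1) - se(k + 1).
# If no k qualifies, fall back to the largest k tested.
ok    <- gap[-k_max] >= gap[-1] - se[-1]
k_hat <- if (any(ok)) which(ok)[1] else k_max
print(k_hat)

# Increase in the gap from each k to the next.
gain <- diff(gap)
print(round(gain[1:5], 4))
```

Every step up in k raises the gap by more than one standard error, so the rule never stops early and lands on the largest k tested. The gains themselves shrink sharply after k = 3, which is the bend visible in the graph.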

In cases where there are smaller subclusters within larger well-separated clusters, it can exhibit non-monotone behaviour. Hence it is important to examine the entire gap curve rather than simply to find the position of its maximum.

In our case, this means that there are a lot of fine distinctions in the ways people spend their days, but k = 3 may be a good start for a very general breakdown. For now then, there are three types of days that Americans have!

I’ve named the three clusters “Leisure With Responsibilities”, “Leisure Without Responsibilities”, and “Work Day.” A day of leisure with responsibilities has basically no work and a fair amount of relaxing, watching TV, or surfing the internet, but it also has time spent caring for children, preparing food, and doing housework. It also has a notable amount of socializing. A day of leisure without responsibilities is similar, but with none of the household responsibilities and none of the socializing. Almost all time not spent sleeping or eating is spent relaxing and in front of the TV or computer. A work day has about 8 hours of work, but otherwise looks more similar to the day of leisure with responsibilities, minus the housework and home maintenance.

How are the clusters distributed over the week?

Sure enough, workdays in America mostly happen from Monday through Friday. On weekends, Americans tend to have days of leisure with responsibilities. The proportion of leisure without responsibilities also goes up a bit on weekends, but not dramatically. I get the sense that the kind of people who have days of leisure without responsibilities are often not the kind of people who have day jobs. Can we see this with a breakdown by age group?
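A breakdown like this comes down to a table of shares within each day of the week. The sketch below shows one way to compute it in base R. The `diary` data frame and the probabilities used to fill it are hypothetical, chosen only so that the example has a visible weekday/weekend pattern; they are not ATUS results.

```r
# Hypothetical diary-day records (not the ATUS data): one row per
# respondent-day, with the day of the week and an assigned cluster label.
set.seed(7)
weekdays_order <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
diary <- data.frame(
  day = factor(sample(weekdays_order, 2000, replace = TRUE),
               levels = weekdays_order)
)
is_weekend <- diary$day %in% c("Sat", "Sun")
labels <- c("Work Day", "Leisure With Responsibilities",
            "Leisure Without Responsibilities")

# Assumed probabilities, picked only to give the example some structure.
diary$cluster <- ifelse(
  is_weekend,
  sample(labels, nrow(diary), replace = TRUE, prob = c(0.15, 0.55, 0.30)),
  sample(labels, nrow(diary), replace = TRUE, prob = c(0.50, 0.30, 0.20))
)

# Share of each day type within each day of the week (each row sums to 1).
shares <- prop.table(table(diary$day, diary$cluster), margin = 1)
print(round(shares, 2))
```

The same pattern works for age group, year, income, or sex by swapping the grouping column.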

Sure enough, the most avid practitioners of leisure without responsibilities are senior citizens. Children under 18 also have a few more of those than the working-age adults, but not many. This suggests to me that the algorithm has identified school days as leisure with responsibilities. To investigate this possibility further, it may be worthwhile to try out a breakdown into more clusters. Before I do that, though, I’d like to recreate the first graph in this post, but now with types of days instead of averages for individual activities.

If you squint, the 2008 financial crisis is recognizable as a drop in workdays and rise in days of leisure without responsibilities. COVID is even more dramatic. The 2020 fall in days of leisure with responsibilities among children under 18 confirms my suspicion that school days count.

Interestingly, there looks to be a steady rise in days of leisure without responsibilities, especially among the elderly. This is accompanied by a steady decline in days of leisure with responsibilities across all age groups.

To see some finer detail, I’d like to re-cluster with a higher k and see what it comes up with. I’m not sure I could make sense of 18 different categories, so I’ll choose k = 8.

Here is the school day in its full glory! Now we also have two types of leisure days without responsibilities, a parenting day, a non-parental household responsibilities day, a day of socializing, and a day with extra sleep.

We can now see that school days and work days both tend to happen on Monday through Friday. Days of extra sleeping happen most often on Sundays, likely reflecting a sleep deficit from the work week. Social days are most common on Saturdays.

Again we see the prominent 2020 drop in both school days (for children) and work days (for working-age people), corresponding with a rise in days of leisure of all sorts. It looks like the younger people in 2020 spiked more dramatically toward days of TV/Radio/Computer use, whereas older people found other ways to kick back. The slow trends are more interesting though. Days of household responsibilities are clearly on a long and slow decline, as are days of socializing. Days of sleeping in look to be on the rise. Again the slow rise in days of leisure without responsibilities is most evident among the elderly, but not all days of leisure without responsibilities - only the ones spent entirely on the TV/Radio/Computer. The other sort of leisure days are on the decline.

I’m curious to see the same graph, but broken up by income rather than age.

This looks like all the trends observed above are true of all income brackets. Looking at these together with the individual-activity graph from the beginning of this post, the most dramatic long-term trends are:

Finally, I wonder how these clusters are distributed between sexes.

Males have more workdays, and more days spent entirely on the TV/Radio/Computer. Females have more parenting days and more days of household responsibilities. Females look to have slightly more social days than males do, but not by much. In 2020, males increased their days doing household responsibilities more than did females, presumably since household responsibilities replaced their workdays. This is all basically what I expected. Interestingly, it does look like females are increasing their days sleeping in faster than are males. I don’t have a good explanation for why that might be.