{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "tags": [
     "remove-input"
    ]
   },
   "outputs": [],
   "source": [
    "from datascience import *\n",
    "import numpy as np\n",
    "path_data = 'https://raw.githubusercontent.com/ChemeketaCS/datasci-textbook/main/assets/data/'\n",
    "np.set_printoptions(threshold=50)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Example: Population Trends\n",
    "\n",
    "We are now ready to work with large tables of data. The file below contains \"Annual Estimates of the Resident Population by Single Year of Age and Sex for the United States.\" Notice that `read_table` can read data directly from a URL."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>SEX</th> <th>AGE</th> <th>CENSUS2010POP</th> <th>ESTIMATESBASE2010</th> <th>POPESTIMATE2010</th> <th>POPESTIMATE2011</th> <th>POPESTIMATE2012</th> <th>POPESTIMATE2013</th> <th>POPESTIMATE2014</th> <th>POPESTIMATE2015</th> <th>POPESTIMATE2016</th> <th>POPESTIMATE2017</th> <th>POPESTIMATE2018</th> <th>POPESTIMATE2019</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>0   </td> <td>3944153      </td> <td>3944160          </td> <td>3951430        </td> <td>3963092        </td> <td>3926570        </td> <td>3931258        </td> <td>3954787        </td> <td>3983981        </td> <td>3954773        </td> <td>3893990        </td> <td>3815343        </td> <td>3783052        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>1   </td> <td>3978070      </td> <td>3978090          </td> <td>3957730        </td> <td>3966225        </td> <td>3977549        </td> <td>3942698        </td> <td>3948891        </td> <td>3973133        </td> <td>4002903        </td> <td>3972711        </td> <td>3908830        </td> <td>3829599        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>2   </td> <td>4096929      </td> <td>4096939          </td> <td>4090621        </td> <td>3970654        </td> <td>3978925        </td> <td>3991740        </td> <td>3958711        </td> <td>3966321        </td> <td>3991349        </td> <td>4020045        </td> <td>3987032        </td> <td>3922044        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>3   </td> <td>4119040      </td> <td>4119051          </td> <td>4111688        </td> <td>4101644        </td> <td>3981531        </td> <td>3991017        </td> <td>4005928        </td> <td>3974351        </td> <td>3982984        </td> <td>4006946        </td> <td>4033038        </td> <td>3998665        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>4   </td> <td>4063170      </td> <td>4063186          </td> <td>4077346        </td> <td>4121488        </td> <td>4111490        </td> <td>3992502        </td> <td>4004032        </td> <td>4020292        </td> <td>3989750        </td> <td>3997280        </td> <td>4018719        </td> <td>4043323        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>5   </td> <td>4056858      </td> <td>4056872          </td> <td>4064521        </td> <td>4087054        </td> <td>4131049        </td> <td>4121876        </td> <td>4004576        </td> <td>4017589        </td> <td>4035033        </td> <td>4003452        </td> <td>4008443        </td> <td>4028281        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>6   </td> <td>4066381      </td> <td>4066412          </td> <td>4072904        </td> <td>4074531        </td> <td>4096631        </td> <td>4141126        </td> <td>4133372        </td> <td>4017388        </td> <td>4031568        </td> <td>4048018        </td> <td>4014057        </td> <td>4017227        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>7   </td> <td>4030579      </td> <td>4030594          </td> <td>4042990        </td> <td>4082821        </td> <td>4084175        </td> <td>4106756        </td> <td>4152666        </td> <td>4145872        </td> <td>4030888        </td> <td>4044139        </td> <td>4058370        </td> <td>4022319        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>8   </td> <td>4046486      </td> <td>4046497          </td> <td>4025501        </td> <td>4052773        </td> <td>4092559        </td> <td>4094513        </td> <td>4118349        </td> <td>4165033        </td> <td>4158848        </td> <td>4042924        </td> <td>4054236        </td> <td>4066194        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>9   </td> <td>4148353      </td> <td>4148369          </td> <td>4125312        </td> <td>4035319        </td> <td>4062726        </td> <td>4103052        </td> <td>4106068        </td> <td>4130887        </td> <td>4177895        </td> <td>4170813        </td> <td>4053179        </td> <td>4061874        </td>\n",
       "        </tr>\n",
       "    </tbody>\n",
       "</table>\n",
       "<p>... (296 rows omitted)</p>"
      ],
      "text/plain": [
       "SEX  | AGE  | CENSUS2010POP | ESTIMATESBASE2010 | POPESTIMATE2010 | POPESTIMATE2011 | POPESTIMATE2012 | POPESTIMATE2013 | POPESTIMATE2014 | POPESTIMATE2015 | POPESTIMATE2016 | POPESTIMATE2017 | POPESTIMATE2018 | POPESTIMATE2019\n",
       "0    | 0    | 3944153       | 3944160           | 3951430         | 3963092         | 3926570         | 3931258         | 3954787         | 3983981         | 3954773         | 3893990         | 3815343         | 3783052\n",
       "0    | 1    | 3978070       | 3978090           | 3957730         | 3966225         | 3977549         | 3942698         | 3948891         | 3973133         | 4002903         | 3972711         | 3908830         | 3829599\n",
       "0    | 2    | 4096929       | 4096939           | 4090621         | 3970654         | 3978925         | 3991740         | 3958711         | 3966321         | 3991349         | 4020045         | 3987032         | 3922044\n",
       "0    | 3    | 4119040       | 4119051           | 4111688         | 4101644         | 3981531         | 3991017         | 4005928         | 3974351         | 3982984         | 4006946         | 4033038         | 3998665\n",
       "0    | 4    | 4063170       | 4063186           | 4077346         | 4121488         | 4111490         | 3992502         | 4004032         | 4020292         | 3989750         | 3997280         | 4018719         | 4043323\n",
       "0    | 5    | 4056858       | 4056872           | 4064521         | 4087054         | 4131049         | 4121876         | 4004576         | 4017589         | 4035033         | 4003452         | 4008443         | 4028281\n",
       "0    | 6    | 4066381       | 4066412           | 4072904         | 4074531         | 4096631         | 4141126         | 4133372         | 4017388         | 4031568         | 4048018         | 4014057         | 4017227\n",
       "0    | 7    | 4030579       | 4030594           | 4042990         | 4082821         | 4084175         | 4106756         | 4152666         | 4145872         | 4030888         | 4044139         | 4058370         | 4022319\n",
       "0    | 8    | 4046486       | 4046497           | 4025501         | 4052773         | 4092559         | 4094513         | 4118349         | 4165033         | 4158848         | 4042924         | 4054236         | 4066194\n",
       "0    | 9    | 4148353       | 4148369           | 4125312         | 4035319         | 4062726         | 4103052         | 4106068         | 4130887         | 4177895         | 4170813         | 4053179         | 4061874\n",
       "... (296 rows omitted)"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# As of August 2021, this census file is online here: \n",
    "data = 'http://www2.census.gov/programs-surveys/popest/technical-documentation/file-layouts/2010-2019/nc-est2019-agesex-res.csv'\n",
    "\n",
    "# A local copy can be accessed here in case census.gov moves the file:\n",
    "# data = path_data + 'nc-est2019-agesex-res.csv'\n",
    "\n",
    "full_census_table = Table.read_table(data)\n",
    "full_census_table"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Only the first 10 rows of the table are displayed. Later we will see how to display the entire table; however, this is typically not useful with large tables.\n",
    "\n",
    "A [description of the table](https://www2.census.gov/programs-surveys/popest/technical-documentation/file-layouts/2010-2019/nc-est2019-agesex-res.pdf) appears online. \n",
    "\n",
    "The `SEX` column contains numeric codes: `0` stands for the total, `1` for male, and `2` for female. The assumptions underlying this binary categorization are questionable. We will discuss them in the next section.\n",
    "\n",
    "In this section we will focus on age and population size.\n",
    "\n",
    "The `AGE` column contains ages in completed years, but the special value `999` is a sum of the total population. The \"age\" `100` too has a special status. In that category, the Census Bureau includes all people aged 100 or more.\n",
    "\n",
    "The other columns contain estimates of the US population in each category of sex and age in the years 2010 through 2019. The Census is decennial: it takes place every 10 years. The most recent Census was held in 2020 and the one before that in 2010. The Census Bureau also estimates population changes each year. As explained in the Bureau's description of its [methodology](https://www2.census.gov/programs-surveys/popest/technical-documentation/methodology/2010-2020/methods-statement-v2020-final.pdf), it \"adds [the estimated changes] to the last decennial census to produce updated population estimates every year.\"\n",
    "\n",
    "Typically, a publicly available table will contain more information than necessary for a particular investigation or analysis. To get the large table into a more usable form, we have to do some *data cleaning*.\n",
    "\n",
    "Suppose we are only interested in the population changes from 2014 to 2019. Let's `select` the relevant columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>SEX</th> <th>AGE</th> <th>POPESTIMATE2014</th> <th>POPESTIMATE2019</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>0   </td> <td>3954787        </td> <td>3783052        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>1   </td> <td>3948891        </td> <td>3829599        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>2   </td> <td>3958711        </td> <td>3922044        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>3   </td> <td>4005928        </td> <td>3998665        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>4   </td> <td>4004032        </td> <td>4043323        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>5   </td> <td>4004576        </td> <td>4028281        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>6   </td> <td>4133372        </td> <td>4017227        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>7   </td> <td>4152666        </td> <td>4022319        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>8   </td> <td>4118349        </td> <td>4066194        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>9   </td> <td>4106068        </td> <td>4061874        </td>\n",
       "        </tr>\n",
       "    </tbody>\n",
       "</table>\n",
       "<p>... (296 rows omitted)</p>"
      ],
      "text/plain": [
       "SEX  | AGE  | POPESTIMATE2014 | POPESTIMATE2019\n",
       "0    | 0    | 3954787         | 3783052\n",
       "0    | 1    | 3948891         | 3829599\n",
       "0    | 2    | 3958711         | 3922044\n",
       "0    | 3    | 4005928         | 3998665\n",
       "0    | 4    | 4004032         | 4043323\n",
       "0    | 5    | 4004576         | 4028281\n",
       "0    | 6    | 4133372         | 4017227\n",
       "0    | 7    | 4152666         | 4022319\n",
       "0    | 8    | 4118349         | 4066194\n",
       "0    | 9    | 4106068         | 4061874\n",
       "... (296 rows omitted)"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "partial_census_table = full_census_table.select('SEX', 'AGE', 'POPESTIMATE2014', 'POPESTIMATE2019')\n",
    "partial_census_table"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can simplify the labels of the selected columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>SEX</th> <th>AGE</th> <th>2014</th> <th>2019</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>0   </td> <td>3954787</td> <td>3783052</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>1   </td> <td>3948891</td> <td>3829599</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>2   </td> <td>3958711</td> <td>3922044</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>3   </td> <td>4005928</td> <td>3998665</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>4   </td> <td>4004032</td> <td>4043323</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>5   </td> <td>4004576</td> <td>4028281</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>6   </td> <td>4133372</td> <td>4017227</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>7   </td> <td>4152666</td> <td>4022319</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>8   </td> <td>4118349</td> <td>4066194</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>9   </td> <td>4106068</td> <td>4061874</td>\n",
       "        </tr>\n",
       "    </tbody>\n",
       "</table>\n",
       "<p>... (296 rows omitted)</p>"
      ],
      "text/plain": [
       "SEX  | AGE  | 2014    | 2019\n",
       "0    | 0    | 3954787 | 3783052\n",
       "0    | 1    | 3948891 | 3829599\n",
       "0    | 2    | 3958711 | 3922044\n",
       "0    | 3    | 4005928 | 3998665\n",
       "0    | 4    | 4004032 | 4043323\n",
       "0    | 5    | 4004576 | 4028281\n",
       "0    | 6    | 4133372 | 4017227\n",
       "0    | 7    | 4152666 | 4022319\n",
       "0    | 8    | 4118349 | 4066194\n",
       "0    | 9    | 4106068 | 4061874\n",
       "... (296 rows omitted)"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "us_pop = partial_census_table.relabeled('POPESTIMATE2014', '2014').relabeled('POPESTIMATE2019', '2019')\n",
    "us_pop"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Ages 97-100\n",
    "As a warm-up, let's examine the total population, labeled by `SEX` code 0. Since all these rows will have the same value 0 in the `SEX` column, we will drop that column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>AGE</th> <th>2014</th> <th>2019</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>3954787</td> <td>3783052</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>1   </td> <td>3948891</td> <td>3829599</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>2   </td> <td>3958711</td> <td>3922044</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>3   </td> <td>4005928</td> <td>3998665</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>4   </td> <td>4004032</td> <td>4043323</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>5   </td> <td>4004576</td> <td>4028281</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>6   </td> <td>4133372</td> <td>4017227</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>7   </td> <td>4152666</td> <td>4022319</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>8   </td> <td>4118349</td> <td>4066194</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>9   </td> <td>4106068</td> <td>4061874</td>\n",
       "        </tr>\n",
       "    </tbody>\n",
       "</table>\n",
       "<p>... (92 rows omitted)</p>"
      ],
      "text/plain": [
       "AGE  | 2014    | 2019\n",
       "0    | 3954787 | 3783052\n",
       "1    | 3948891 | 3829599\n",
       "2    | 3958711 | 3922044\n",
       "3    | 4005928 | 3998665\n",
       "4    | 4004032 | 4043323\n",
       "5    | 4004576 | 4028281\n",
       "6    | 4133372 | 4017227\n",
       "7    | 4152666 | 4022319\n",
       "8    | 4118349 | 4066194\n",
       "9    | 4106068 | 4061874\n",
       "... (92 rows omitted)"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "us_pop_by_age = us_pop.where('SEX', are.equal_to(0)).drop('SEX')\n",
    "us_pop_by_age"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's look at the population in the highest ages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>AGE</th> <th>2014</th> <th>2019</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        <tr>\n",
       "            <td>97  </td> <td>82948</td> <td>116969</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>98  </td> <td>59546</td> <td>86150 </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>99  </td> <td>41277</td> <td>57124 </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>100 </td> <td>70685</td> <td>100322</td>\n",
       "        </tr>\n",
       "    </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "AGE  | 2014  | 2019\n",
       "97   | 82948 | 116969\n",
       "98   | 59546 | 86150\n",
       "99   | 41277 | 57124\n",
       "100  | 70685 | 100322"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "us_pop_by_age.where('AGE', are.between(97, 101))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Not surprisingly, the numbers of people are smaller at higher ages. For example, there are fewer 99-year-olds than 98-year-olds. \n",
    "\n",
    "But the numbers for `AGE` 100 are quite a bit larger than those for age 99. That is because the row with `AGE` 100 doesn't just represent 100-year-olds. It also includes those who are older than 100. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Percent Change\n",
    "\n",
    "Each column of the table `us_pop_by_age` is an array of the same length, and so columns can be combined using arithmetic. The array below contains the change in population between 2014 and 2019. There is one entry corresponding to each row of `us_pop_by_age`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([-171735, -119292,  -36667, ...,   15847,   29637, 9938515])"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "change = us_pop_by_age.column('2019') - us_pop_by_age.column('2014')\n",
    "change"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can augment `us_pop_by_age` with a column that contains these changes, both in absolute terms and as percents relative to the value in 2014."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>AGE</th> <th>2014</th> <th>2019</th> <th>Change</th> <th>Percent Change</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        <tr>\n",
       "            <td>0   </td> <td>3954787</td> <td>3783052</td> <td>-171735</td> <td>-4.34%        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>1   </td> <td>3948891</td> <td>3829599</td> <td>-119292</td> <td>-3.02%        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>2   </td> <td>3958711</td> <td>3922044</td> <td>-36667 </td> <td>-0.93%        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>3   </td> <td>4005928</td> <td>3998665</td> <td>-7263  </td> <td>-0.18%        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>4   </td> <td>4004032</td> <td>4043323</td> <td>39291  </td> <td>0.98%         </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>5   </td> <td>4004576</td> <td>4028281</td> <td>23705  </td> <td>0.59%         </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>6   </td> <td>4133372</td> <td>4017227</td> <td>-116145</td> <td>-2.81%        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>7   </td> <td>4152666</td> <td>4022319</td> <td>-130347</td> <td>-3.14%        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>8   </td> <td>4118349</td> <td>4066194</td> <td>-52155 </td> <td>-1.27%        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>9   </td> <td>4106068</td> <td>4061874</td> <td>-44194 </td> <td>-1.08%        </td>\n",
       "        </tr>\n",
       "    </tbody>\n",
       "</table>\n",
       "<p>... (92 rows omitted)</p>"
      ],
      "text/plain": [
       "AGE  | 2014    | 2019    | Change  | Percent Change\n",
       "0    | 3954787 | 3783052 | -171735 | -4.34%\n",
       "1    | 3948891 | 3829599 | -119292 | -3.02%\n",
       "2    | 3958711 | 3922044 | -36667  | -0.93%\n",
       "3    | 4005928 | 3998665 | -7263   | -0.18%\n",
       "4    | 4004032 | 4043323 | 39291   | 0.98%\n",
       "5    | 4004576 | 4028281 | 23705   | 0.59%\n",
       "6    | 4133372 | 4017227 | -116145 | -2.81%\n",
       "7    | 4152666 | 4022319 | -130347 | -3.14%\n",
       "8    | 4118349 | 4066194 | -52155  | -1.27%\n",
       "9    | 4106068 | 4061874 | -44194  | -1.08%\n",
       "... (92 rows omitted)"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "us_pop_change = us_pop_by_age.with_columns(\n",
    "    'Change', change,\n",
    "    'Percent Change', change/us_pop_by_age.column('2014')\n",
    ")\n",
    "us_pop_change.set_format('Percent Change', PercentFormatter)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Almost all the entries displayed in the `Percent Change` column are negative, demonstrating a drop in population at the youngest ages. However, the overall population grew by about 9.9 million people, a percent change of just over 3%."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>AGE</th> <th>2014</th> <th>2019</th> <th>Change</th> <th>Percent Change</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        <tr>\n",
       "            <td>999 </td> <td>318301008</td> <td>328239523</td> <td>9938515</td> <td>3.12%         </td>\n",
       "        </tr>\n",
       "    </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "AGE  | 2014      | 2019      | Change  | Percent Change\n",
       "999  | 318301008 | 328239523 | 9938515 | 3.12%"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "us_pop_change.where('AGE', are.equal_to(999))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let us compare this to the change at each age. For ease of interpretation, we will sort the table in decreasing order of the absolute change in population, contained in the column `Change`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>AGE</th> <th>2014</th> <th>2019</th> <th>Change</th> <th>Percent Change</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        <tr>\n",
       "            <td>72  </td> <td>2191642</td> <td>3191048</td> <td>999406</td> <td>45.60%        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>68  </td> <td>2567511</td> <td>3345475</td> <td>777964</td> <td>30.30%        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>69  </td> <td>2530460</td> <td>3252423</td> <td>721963</td> <td>28.53%        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>70  </td> <td>2461426</td> <td>3136704</td> <td>675278</td> <td>27.43%        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>71  </td> <td>2516392</td> <td>3083083</td> <td>566691</td> <td>22.52%        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>76  </td> <td>1692960</td> <td>2222392</td> <td>529432</td> <td>31.27%        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>62  </td> <td>3677408</td> <td>4156645</td> <td>479237</td> <td>13.03%        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>28  </td> <td>4345247</td> <td>4818725</td> <td>473478</td> <td>10.90%        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>64  </td> <td>3481789</td> <td>3950578</td> <td>468789</td> <td>13.46%        </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>38  </td> <td>3848856</td> <td>4305576</td> <td>456720</td> <td>11.87%        </td>\n",
       "        </tr>\n",
       "    </tbody>\n",
       "</table>\n",
       "<p>... (91 rows omitted)</p>"
      ],
      "text/plain": [
       "AGE  | 2014    | 2019    | Change | Percent Change\n",
       "72   | 2191642 | 3191048 | 999406 | 45.60%\n",
       "68   | 2567511 | 3345475 | 777964 | 30.30%\n",
       "69   | 2530460 | 3252423 | 721963 | 28.53%\n",
       "70   | 2461426 | 3136704 | 675278 | 27.43%\n",
       "71   | 2516392 | 3083083 | 566691 | 22.52%\n",
       "76   | 1692960 | 2222392 | 529432 | 31.27%\n",
       "62   | 3677408 | 4156645 | 479237 | 13.03%\n",
       "28   | 4345247 | 4818725 | 473478 | 10.90%\n",
       "64   | 3481789 | 3950578 | 468789 | 13.46%\n",
       "38   | 3848856 | 4305576 | 456720 | 11.87%\n",
       "... (91 rows omitted)"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "us_pop_change.where(\n",
    "    'AGE', are.below(999)\n",
    ").sort('Change', descending=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Take a look at the top few rows. While the percent change is about 3% for the overall population, it jumps to well over 20% for the people in their late sixties and early seventies. This stunning change contributes to what is known as the greying of America.\n",
    "\n",
    "What could explain this large increase? We can explore this question by examining the years in which the relevant groups were born.\n",
    "\n",
    "- Those who were in the age group 69 to 72 in 2014 were born in the years 1942 to 1945. The attack on Pearl Harbor was in late 1941, and by 1942 U.S. forces were heavily engaged in a massive war that ended in 1945. \n",
    "\n",
    "- Those who were 69 to 72 years old in 2019 were born in the years 1947 to 1950, at the height of the post-WWII baby boom in the United States. \n",
    "\n",
    "The post-war jump in births is a major reason for the large changes that we have observed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}