{
"cells": [
{
"cell_type": "markdown",
"id": "d1fa173c",
"metadata": {},
"source": [
"# The basics "
]
},
{
"cell_type": "markdown",
"id": "5b8b91e6",
"metadata": {},
"source": [
"## Introduction\n",
"\n",
"Python gives powerful tools for data analysis - among the main ones we find [Polars](https://www.pola.rs/), which gives fast and flexible data structures, especially for interactive data analysis.\n",
"\n",
"Let's start off by importing it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cb38eed2",
"metadata": {},
"outputs": [],
"source": [
"import polars as pl"
]
},
{
"cell_type": "markdown",
"id": "aff29c1d",
"metadata": {},
"source": [
"## A dataframe?"
]
},
{
"cell_type": "markdown",
"id": "27fe412c",
"metadata": {},
"source": [
"A dataframe is basically a table, that is a collection of columns, also called `Series`, which all contain the same number of cells -- meaning they're all of the same height. Generally, each row can be thought of as corresponding to one unit of study (one measurement, one individual, one geographical area, you name it), while each column corresponds to a specific attribute, or variable (name, size, age, description, creation date...). \n",
"\n",
"\n",
"\n",
"In Polars, each column has a specific type (integer, string, date, but also list or dictionary!), similarly to NumPy arrays. And, [similarly to NumPy arrays](numpy_nan), they can take \"null\" values in certain rows, to represent an absence of data."
]
},
{
"cell_type": "markdown",
"id": "0803d07a",
"metadata": {},
"source": [
"But how to create one in practice? Let's start with a simple example. The concept of a DataFrame is materialised in Polars in the `DataFrame` class, which we can instantiate (create) simply as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b8957768",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
"
],
"text/plain": [
"shape: (0, 0)\n",
"┌┐\n",
"╞╡\n",
"└┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pl.DataFrame()"
]
},
{
"cell_type": "markdown",
"id": "e830c07b",
"metadata": {},
"source": [
"Not much to be seen here, because we didn't pass it any data!\n",
"\n",
"```{admonition} Question\n",
"Before running the cells below, try to guess what their output will look like. What are the column names? The values in each row?\n",
"```\n",
"\n",
"Let's add some data in the form of a dictionary of lists:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "55d99d1d",
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (2, 2)name | age |
---|
str | i64 |
"Alice" | 78 |
"Bob" | null |
"
],
"text/plain": [
"shape: (2, 2)\n",
"┌───────┬──────┐\n",
"│ name ┆ age │\n",
"│ --- ┆ --- │\n",
"│ str ┆ i64 │\n",
"╞═══════╪══════╡\n",
"│ Alice ┆ 78 │\n",
"│ Bob ┆ null │\n",
"└───────┴──────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pl.DataFrame(\n",
" {\n",
" \"name\": [\"Alice\", \"Bob\"],\n",
" \"age\": [78, None],\n",
" },\n",
" orient=\"col\",\n",
")\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "2fad4e9e",
"metadata": {},
"source": [
"Or, equivalently, with a list of dictionaries:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e9b77425",
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (2, 2)name | age |
---|
str | i64 |
"Alice" | 78 |
"Bob" | null |
"
],
"text/plain": [
"shape: (2, 2)\n",
"┌───────┬──────┐\n",
"│ name ┆ age │\n",
"│ --- ┆ --- │\n",
"│ str ┆ i64 │\n",
"╞═══════╪══════╡\n",
"│ Alice ┆ 78 │\n",
"│ Bob ┆ null │\n",
"└───────┴──────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pl.DataFrame(\n",
" [\n",
" {\"name\": \"Alice\", \"age\": 78},\n",
" {\"name\": \"Bob\"},\n",
" ],\n",
" orient=\"row\",\n",
")\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "f5d2e5aa",
"metadata": {},
"source": [
"```{admonition} Question\n",
"In the second case, adding the `null` entry to the output enables Polars to comply with an important property of dataframes. Which one?\n",
"```\n",
"\n",
"```{admonition} Answer\n",
":class: tip dropdown\n",
"\n",
"That columns -- or, equivalently, the rows -- should all have the same number of cells!\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "d51fd90f",
"metadata": {},
"source": [
"## Ingesting data\n",
"\n",
"\n",
"In most cases, you actually want to deal with an existing dataset, that you'd like to analyse.\n",
"For this course, let's try analyzing data recording protests from around the globe, [collected as part of the Mass Mobilization project](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/HTTWYL). No need to download anything, the data are already in the repository in the form of a CSV file. So let's read our data, using `pl.read_csv`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e60b7bf0",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (17_145, 30)id | country | ccode | year | region | protest | protestnumber | startday | startmonth | startyear | endday | endmonth | endyear | protesterviolence | location | participants_category | participants | protesteridentity | protesterdemand1 | protesterdemand2 | protesterdemand3 | protesterdemand4 | stateresponse1 | stateresponse2 | stateresponse3 | stateresponse4 | stateresponse5 | stateresponse6 | stateresponse7 | sources |
---|
i64 | str | i64 | i64 | str | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str |
201990001 | "Canada" | 20 | 1990 | "North America" | 1 | 1 | 15 | 1 | 1990 | 15 | 1 | 1990 | 0 | "national" | null | "1000s" | "unspecified" | "political behavior, process" | "labor wage dispute" | null | null | "ignore" | null | null | null | null | null | null | "1. great canadian train journe… |
201990002 | "Canada" | 20 | 1990 | "North America" | 1 | 2 | 25 | 6 | 1990 | 25 | 6 | 1990 | 0 | "Montreal, Quebec" | null | "1000" | "unspecified" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "1. autonomy s cry revived in q… |
201990003 | "Canada" | 20 | 1990 | "North America" | 1 | 3 | 1 | 7 | 1990 | 1 | 7 | 1990 | 0 | "Montreal, Quebec" | null | "500" | "separatist parti quebecois" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "1. quebec protest after queen … |
201990004 | "Canada" | 20 | 1990 | "North America" | 1 | 4 | 12 | 7 | 1990 | 6 | 9 | 1990 | 1 | "Montreal, Quebec" | null | "100s" | "mohawk indians" | "land farm issue" | null | null | null | "accomodation" | null | null | null | null | null | null | "1. indians gather as siege int… |
201990005 | "Canada" | 20 | 1990 | "North America" | 1 | 5 | 14 | 8 | 1990 | 15 | 8 | 1990 | 1 | "Montreal, Quebec" | null | "950" | "local residents" | "political behavior, process" | null | null | null | "crowd dispersal" | "arrests" | "accomodation" | null | null | null | null | "1. dozens hurt in mohawk block… |
… | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
9102017002 | "Papua New Guinea" | 910 | 2017 | "Oceania" | 1 | 2 | 15 | 7 | 2017 | 15 | 7 | 2017 | 1 | "Mount Hagen" | "50-99" | "50+" | "protesters opposed to counting… | "political behavior, process" | null | null | null | "crowd dispersal" | null | null | null | null | null | null | "violence, chaos and fraud: fra… |
9102017003 | "Papua New Guinea" | 910 | 2017 | "Oceania" | 1 | 3 | 31 | 10 | 2017 | 31 | 10 | 2017 | 0 | "Lorengau" | "100-999" | "About 100" | "locals" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "refugees dig in as camp closes… |
9102018000 | "Papua New Guinea" | 910 | 2018 | "Oceania" | 0 | 0 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null |
9102019000 | "Papua New Guinea" | 910 | 2019 | "Oceania" | 0 | 0 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | "." | null | null | null | null | "." | null | null | null |
9102020000 | "Papua New Guinea" | 910 | 2020 | "Oceania" | 0 | 0 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null |
"
],
"text/plain": [
"shape: (17_145, 30)\n",
"┌────────────┬─────────────┬───────┬──────┬───┬─────────────┬────────────┬────────────┬────────────┐\n",
"│ id ┆ country ┆ ccode ┆ year ┆ … ┆ staterespon ┆ staterespo ┆ staterespo ┆ sources │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ se5 ┆ nse6 ┆ nse7 ┆ --- │\n",
"│ i64 ┆ str ┆ i64 ┆ i64 ┆ ┆ --- ┆ --- ┆ --- ┆ str │\n",
"│ ┆ ┆ ┆ ┆ ┆ str ┆ str ┆ str ┆ │\n",
"╞════════════╪═════════════╪═══════╪══════╪═══╪═════════════╪════════════╪════════════╪════════════╡\n",
"│ 201990001 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ null ┆ null ┆ 1. great │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ canadian │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ train │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ journe… │\n",
"│ 201990002 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ null ┆ null ┆ 1. │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ autonomy s │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ cry │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ revived in │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ q… │\n",
"│ 201990003 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ null ┆ null ┆ 1. quebec │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ protest │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ after │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ queen … │\n",
"│ 201990004 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ null ┆ null ┆ 1. indians │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ gather as │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ siege int… │\n",
"│ 201990005 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ null ┆ null ┆ 1. dozens │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ hurt in │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ mohawk │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ block… │\n",
"│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │\n",
"│ 9102017002 ┆ Papua New ┆ 910 ┆ 2017 ┆ … ┆ null ┆ null ┆ null ┆ violence, │\n",
"│ ┆ Guinea ┆ ┆ ┆ ┆ ┆ ┆ ┆ chaos and │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ fraud: │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ fra… │\n",
"│ 9102017003 ┆ Papua New ┆ 910 ┆ 2017 ┆ … ┆ null ┆ null ┆ null ┆ refugees │\n",
"│ ┆ Guinea ┆ ┆ ┆ ┆ ┆ ┆ ┆ dig in as │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ camp │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ closes… │\n",
"│ 9102018000 ┆ Papua New ┆ 910 ┆ 2018 ┆ … ┆ null ┆ null ┆ null ┆ null │\n",
"│ ┆ Guinea ┆ ┆ ┆ ┆ ┆ ┆ ┆ │\n",
"│ 9102019000 ┆ Papua New ┆ 910 ┆ 2019 ┆ … ┆ . ┆ null ┆ null ┆ null │\n",
"│ ┆ Guinea ┆ ┆ ┆ ┆ ┆ ┆ ┆ │\n",
"│ 9102020000 ┆ Papua New ┆ 910 ┆ 2020 ┆ … ┆ null ┆ null ┆ null ┆ null │\n",
"│ ┆ Guinea ┆ ┆ ┆ ┆ ┆ ┆ ┆ │\n",
"└────────────┴─────────────┴───────┴──────┴───┴─────────────┴────────────┴────────────┴────────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"protests = pl.read_csv(\"data/protests.csv\")\n",
"protests"
]
},
{
"cell_type": "markdown",
"id": "345df9c6",
"metadata": {},
"source": [
"Here it is!\n",
"\n",
"Note that here, we dealt with a simple, well-formatted file. In the wild (aka, \"the Internet\"), you will not always meet such well-behaved data. In that case you'll probably need to specify one of the (many) arguments of `read_csv`. Inspect the function to take a quick look! "
]
},
{
"cell_type": "markdown",
"id": "caef30c9",
"metadata": {},
"source": [
"Not all data are in CSV format (and they shouldn't, it's a very slow-to-read and heavy file format!), so Polars provides a number of other functions to ingest data. They are all in the form `pl.read_(path)`. Symmetrically, to write from a `DataFrame` to a file, the equivalent `.write_(path)` are available.\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"id": "51889c0f",
"metadata": {},
"source": [
"```{tip}\n",
"In case of doubt, choose the `.parquet` format to save your dataframes. It strikes a very good balance between file size and reading / writing time.\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "53a5d9a9",
"metadata": {},
"source": [
"## Taking a glimpse"
]
},
{
"cell_type": "markdown",
"id": "6ec4eab9",
"metadata": {},
"source": [
"The one DataFrame method you will probably use the most is `.head()`. It allows you to see the first 5 rows of the DataFrame, which is very useful to have a quick look at its current state:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "25568534",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 30)id | country | ccode | year | region | protest | protestnumber | startday | startmonth | startyear | endday | endmonth | endyear | protesterviolence | location | participants_category | participants | protesteridentity | protesterdemand1 | protesterdemand2 | protesterdemand3 | protesterdemand4 | stateresponse1 | stateresponse2 | stateresponse3 | stateresponse4 | stateresponse5 | stateresponse6 | stateresponse7 | sources |
---|
i64 | str | i64 | i64 | str | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str |
201990001 | "Canada" | 20 | 1990 | "North America" | 1 | 1 | 15 | 1 | 1990 | 15 | 1 | 1990 | 0 | "national" | null | "1000s" | "unspecified" | "political behavior, process" | "labor wage dispute" | null | null | "ignore" | null | null | null | null | null | null | "1. great canadian train journe… |
201990002 | "Canada" | 20 | 1990 | "North America" | 1 | 2 | 25 | 6 | 1990 | 25 | 6 | 1990 | 0 | "Montreal, Quebec" | null | "1000" | "unspecified" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "1. autonomy s cry revived in q… |
201990003 | "Canada" | 20 | 1990 | "North America" | 1 | 3 | 1 | 7 | 1990 | 1 | 7 | 1990 | 0 | "Montreal, Quebec" | null | "500" | "separatist parti quebecois" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "1. quebec protest after queen … |
201990004 | "Canada" | 20 | 1990 | "North America" | 1 | 4 | 12 | 7 | 1990 | 6 | 9 | 1990 | 1 | "Montreal, Quebec" | null | "100s" | "mohawk indians" | "land farm issue" | null | null | null | "accomodation" | null | null | null | null | null | null | "1. indians gather as siege int… |
201990005 | "Canada" | 20 | 1990 | "North America" | 1 | 5 | 14 | 8 | 1990 | 15 | 8 | 1990 | 1 | "Montreal, Quebec" | null | "950" | "local residents" | "political behavior, process" | null | null | null | "crowd dispersal" | "arrests" | "accomodation" | null | null | null | null | "1. dozens hurt in mohawk block… |
"
],
"text/plain": [
"shape: (5, 30)\n",
"┌───────────┬─────────┬───────┬──────┬───┬──────────────┬──────────────┬─────────────┬─────────────┐\n",
"│ id ┆ country ┆ ccode ┆ year ┆ … ┆ staterespons ┆ staterespons ┆ staterespon ┆ sources │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ e5 ┆ e6 ┆ se7 ┆ --- │\n",
"│ i64 ┆ str ┆ i64 ┆ i64 ┆ ┆ --- ┆ --- ┆ --- ┆ str │\n",
"│ ┆ ┆ ┆ ┆ ┆ str ┆ str ┆ str ┆ │\n",
"╞═══════════╪═════════╪═══════╪══════╪═══╪══════════════╪══════════════╪═════════════╪═════════════╡\n",
"│ 201990001 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ null ┆ null ┆ 1. great │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ canadian │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ train │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ journe… │\n",
"│ 201990002 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ null ┆ null ┆ 1. autonomy │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ s cry │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ revived in │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ q… │\n",
"│ 201990003 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ null ┆ null ┆ 1. quebec │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ protest │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ after queen │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ … │\n",
"│ 201990004 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ null ┆ null ┆ 1. indians │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ gather as │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ siege int… │\n",
"│ 201990005 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ null ┆ null ┆ 1. dozens │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ hurt in │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ mohawk │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ block… │\n",
"└───────────┴─────────┴───────┴──────┴───┴──────────────┴──────────────┴─────────────┴─────────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"protests.head()"
]
},
{
"cell_type": "markdown",
"id": "0e59c7de",
"metadata": {},
"source": [
"```{note}\n",
"In the rest of the course, whenever we wish to show the result of some computation, we'll append `.head()` to these computations to limit the size of the output. Remember to remove it if you wish to get the full result!\n",
"```\n",
"\n",
"Also, you can get the list of column names using the `.columns` attribute:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6e82ead0",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['id',\n",
" 'country',\n",
" 'ccode',\n",
" 'year',\n",
" 'region',\n",
" 'protest',\n",
" 'protestnumber',\n",
" 'startday',\n",
" 'startmonth',\n",
" 'startyear',\n",
" 'endday',\n",
" 'endmonth',\n",
" 'endyear',\n",
" 'protesterviolence',\n",
" 'location',\n",
" 'participants_category',\n",
" 'participants',\n",
" 'protesteridentity',\n",
" 'protesterdemand1',\n",
" 'protesterdemand2',\n",
" 'protesterdemand3',\n",
" 'protesterdemand4',\n",
" 'stateresponse1',\n",
" 'stateresponse2',\n",
" 'stateresponse3',\n",
" 'stateresponse4',\n",
" 'stateresponse5',\n",
" 'stateresponse6',\n",
" 'stateresponse7',\n",
" 'sources']"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"protests.columns"
]
},
{
"cell_type": "markdown",
"id": "845863fc",
"metadata": {},
"source": [
"And the shape of the frame (number of rows and columns) using `.shape`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4d7b09d5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(17145, 30)"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"protests.shape"
]
},
{
"cell_type": "markdown",
"id": "42ee1cec",
"metadata": {},
"source": [
"Finally, you can take a quick glimpse at it, featuring the two pieces of info we got above, plus the data types and a few values, with `.glimpse()`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "39cbd29e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Rows: 17145\n",
"Columns: 30\n",
"$ id 201990001, 201990002, 201990003, 201990004, 201990005, 201990006, 201991001, 201991002, 201992001, 201993001\n",
"$ country 'Canada', 'Canada', 'Canada', 'Canada', 'Canada', 'Canada', 'Canada', 'Canada', 'Canada', 'Canada'\n",
"$ ccode 20, 20, 20, 20, 20, 20, 20, 20, 20, 20\n",
"$ year 1990, 1990, 1990, 1990, 1990, 1990, 1991, 1991, 1992, 1993\n",
"$ region 'North America', 'North America', 'North America', 'North America', 'North America', 'North America', 'North America', 'North America', 'North America', 'North America'\n",
"$ protest 1, 1, 1, 1, 1, 1, 1, 1, 1, 1\n",
"$ protestnumber 1, 2, 3, 4, 5, 6, 1, 2, 1, 1\n",
"$ startday 15, 25, 1, 12, 14, 19, 10, 28, 4, 16\n",
"$ startmonth 1, 6, 7, 7, 8, 9, 9, 9, 5, 5\n",
"$ startyear 1990, 1990, 1990, 1990, 1990, 1990, 1991, 1991, 1992, 1993\n",
"$ endday 15, 25, 1, 6, 15, 19, 17, 2, 5, 16\n",
"$ endmonth 1, 6, 7, 9, 8, 9, 9, 10, 5, 5\n",
"$ endyear 1990, 1990, 1990, 1990, 1990, 1990, 1991, 1991, 1992, 1993\n",
"$ protesterviolence 0, 0, 0, 1, 1, 0, 0, 0, 1, 0\n",
"$ location 'national', 'Montreal, Quebec', 'Montreal, Quebec', 'Montreal, Quebec', 'Montreal, Quebec', 'Kahnawake Reservation near Montreal, Quebec', 'national', 'national', 'Toronto', 'Ottawa'\n",
"$ participants_category None, None, None, None, None, None, None, None, None, None\n",
"$ participants '1000s', '1000', '500', '100s', '950', '200', '110000', '110000', '1000', '10000s'\n",
"$ protesteridentity 'unspecified', 'unspecified', 'separatist parti quebecois', 'mohawk indians', 'local residents', 'mohawk indians', 'public service alliance of canada', 'public service alliance of canada', 'gangs of black and white youths', 'unspecified'\n",
"$ protesterdemand1 'political behavior, process', 'political behavior, process', 'political behavior, process', 'land farm issue', 'political behavior, process', 'police brutality', 'labor wage dispute', 'labor wage dispute', 'police brutality', 'political behavior, process'\n",
"$ protesterdemand2 'labor wage dispute', None, None, None, None, None, None, None, None, None\n",
"$ protesterdemand3 None, None, None, None, None, None, None, None, None, None\n",
"$ protesterdemand4 None, None, None, None, None, None, None, None, None, None\n",
"$ stateresponse1 'ignore', 'ignore', 'ignore', 'accomodation', 'crowd dispersal', 'crowd dispersal', 'ignore', 'ignore', 'arrests', 'ignore'\n",
"$ stateresponse2 None, None, None, None, 'arrests', 'shootings', None, None, None, None\n",
"$ stateresponse3 None, None, None, None, 'accomodation', None, None, None, None, None\n",
"$ stateresponse4 None, None, None, None, None, None, None, None, None, None\n",
"$ stateresponse5 None, None, None, None, None, None, None, None, None, None\n",
"$ stateresponse6 None, None, None, None, None, None, None, None, None, None\n",
"$ stateresponse7 None, None, None, None, None, None, None, None, None, None\n",
"$ sources '1. great canadian train journeys into history; passenger rail system cut the times london , january 16 1990, tuesday, issue 63604., 427 words, from john best, ottawa;', '1. autonomy s cry revived in quebec the new york times, june 25, 1990, monday, late edition final, section a; page 3, column 4; foreign desk, 1014 words, by alessandra stanley, special to the new york times;', '1. quebec protest after queen calls for unity the times, july 2, 1990, monday, overseas news, 703 words, from john best in ottawa 2. in canada, royal rejection ; quebecois protesters turn out for queen elizabeth s visit the washington post, july 2, 1990, monday, final edition, style; page b1, 633 words, william claiborne, washington post foreign service;', '1. indians gather as siege intensifies; armed confrontation in canada reflects growing militancy the washington post, july 19, 1990, thursday, final edition, first section; page a25, 1330 words, william claiborne, washington post foreign service 2. canada to acquire site of mohawk land dispute the washington post, july 20, 1990, friday, final edition, first section; page a13, 824 words, william claiborne, washington post foreign service 3. mohawks, army still at standoff; bridge to montreal reopens to traffic the washington post, september 7, 1990, friday, final edition, first section; page a17, 1199 words, william claiborne, washington post foreign service;', '1. dozens hurt in mohawk blockade protest the times, august 14, 1990, tuesday, overseas news, 254 words, from john best in ottawa 2. canadians clash with police near a mohawk reservation the new york times, august 15, 1990, wednesday, late edition final, section a; page 16, column 6; foreign desk, 165 words, reuters;', '1. mohawks tear gassed in clash the washington post, september 19, 1990, wednesday, final edition, first section; page a17, 172 words, washington post foreign service;', '1. mulroney acts to crush strike by federal workers the new york times, september 17, 1991, tuesday, late edition final, section a; page 6; column 4; foreign desk, 761 words, by clyde h. farnsworth, special to the new york times 2. canadian strike to end but talks to go on the new york times, september 18, 1991, wednesday, late edition final, section a; page 9; column 2; foreign desk, 188 words, ap;', '1. the washington post, canada seeks to compel end of nationwide federal strike, william claiborne, washington post foreign service, october 1, 1991, tuesday, final edition, first section; page a15, 342 words;', '1. window smashing and looting spree erupts in toronto the new york times, may 5, 1992, tuesday, late edition final, section a; page 3; column 1; foreign desk 2. toronto rioters pelt police the times, may 6, 1992, wednesday, overseas news, 243 words, from john best in ottawa 3. the washington post, toronto debates causes of 2 days of violence; normally peaceful city shaken by gangs, william claiborne, washington post foreign service, may 7, 1992, thursday, final edition, first section; page a27, 685 word;', '1. canadians protest trade pact the new york times, may 16, 1993, sunday, late edition final, section 1; page 7; column 1; foreign desk, 50 words, reuters;'\n",
"\n"
]
}
],
"source": [
"protests.glimpse()"
]
},
{
"cell_type": "markdown",
"id": "8c473ae7",
"metadata": {},
"source": [
"## Operations on columns\n",
"\n",
"Let's see now how to select, add and transform columns.\n",
"\n",
"### Selecting columns\n",
"\n",
"If we want a subset of columns, we pass the name of each column to `.select()`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1b753800",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 3)id | country | year |
---|
i64 | str | i64 |
201990001 | "Canada" | 1990 |
201990002 | "Canada" | 1990 |
201990003 | "Canada" | 1990 |
201990004 | "Canada" | 1990 |
201990005 | "Canada" | 1990 |
"
],
"text/plain": [
"shape: (5, 3)\n",
"┌───────────┬─────────┬──────┐\n",
"│ id ┆ country ┆ year │\n",
"│ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ str ┆ i64 │\n",
"╞═══════════╪═════════╪══════╡\n",
"│ 201990001 ┆ Canada ┆ 1990 │\n",
"│ 201990002 ┆ Canada ┆ 1990 │\n",
"│ 201990003 ┆ Canada ┆ 1990 │\n",
"│ 201990004 ┆ Canada ┆ 1990 │\n",
"│ 201990005 ┆ Canada ┆ 1990 │\n",
"└───────────┴─────────┴──────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"protests.select('id', 'country', 'year').head() "
]
},
{
"cell_type": "markdown",
"id": "88b6a64d",
"metadata": {},
"source": [
"```{note}\n",
"This does not change the original DataFrame, to save your selection, you need to assign to a new variable!\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "2119d9c6",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"id": "4e99652c",
"metadata": {},
"source": [
"### Selecting and modifying"
]
},
{
"cell_type": "markdown",
"id": "396ef547",
"metadata": {},
"source": [
"Selecting columns is nice, but once they are selected, we will certainly want to do something with them, right? Like, how can we modify the values in them?\n",
"\n",
"But before going into that, let's see a slightly different way of selecting columns. The previous code is actually equivalent to:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "da3d2ab6",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 3)id | country | year |
---|
i64 | str | i64 |
201990001 | "Canada" | 1990 |
201990002 | "Canada" | 1990 |
201990003 | "Canada" | 1990 |
201990004 | "Canada" | 1990 |
201990005 | "Canada" | 1990 |
"
],
"text/plain": [
"shape: (5, 3)\n",
"┌───────────┬─────────┬──────┐\n",
"│ id ┆ country ┆ year │\n",
"│ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ str ┆ i64 │\n",
"╞═══════════╪═════════╪══════╡\n",
"│ 201990001 ┆ Canada ┆ 1990 │\n",
"│ 201990002 ┆ Canada ┆ 1990 │\n",
"│ 201990003 ┆ Canada ┆ 1990 │\n",
"│ 201990004 ┆ Canada ┆ 1990 │\n",
"│ 201990005 ┆ Canada ┆ 1990 │\n",
"└───────────┴─────────┴──────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"protests.select(pl.col(\"id\", \"country\", \"year\")).head()"
]
},
{
"cell_type": "markdown",
"id": "c946a10f",
"metadata": {},
"source": [
"But why bother calling `pl.col()` here, and what does it even do? To answer these questions, for now let's go back to our data and the glimpse we had above. See the `protesterviolence` column, that, as far as we can see, is only filled with zeros and ones? Would be nice to check if it's actually the case, because if it is, each value in this column can probably be treated as a boolean telling us whether part of the protesters used violence.\n",
"\n",
"Well this can simply be done by selecting the column with `pl.col()`, and then calling the `.unique()` method on it, as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c7774781",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (3, 1)protesterviolence |
---|
i64 |
null |
0 |
1 |
"
],
"text/plain": [
"shape: (3, 1)\n",
"┌───────────────────┐\n",
"│ protesterviolence │\n",
"│ --- │\n",
"│ i64 │\n",
"╞═══════════════════╡\n",
"│ null │\n",
"│ 0 │\n",
"│ 1 │\n",
"└───────────────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"protests.select(pl.col(\"protesterviolence\").unique())"
]
},
{
"cell_type": "markdown",
"id": "e6244daf",
"metadata": {},
"source": [
"Calling `pl.col()` therefore enabled us to perform an operation on the selected column! And about the latter, it indeed turns out we only have zeros and ones in there, save some nulls that denote an absence of data."
]
},
{
"cell_type": "markdown",
"id": "2837963c",
"metadata": {},
"source": [
"```{note}\n",
"Putting zeros and ones to encode a boolean is actually a pretty common practice, you might come across this again!\n",
"```\n",
"\n",
"```{note}\n",
"It is usually better to start reading any documentation you can find about the dataset before making tests like these. In some situations though, there is no documentation, so you _have to_ make this kind of tests to validate your assumptions.\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "212c89ed",
"metadata": {},
"source": [
"We can subsequently safely transform this integer column into a boolean one, using the `.cast()` method, and (optionally) renaming it using `.alias()`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a782b744",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 1)There was protester violence |
---|
bool |
false |
false |
false |
true |
true |
"
],
"text/plain": [
"shape: (5, 1)\n",
"┌──────────────────────────────┐\n",
"│ There was protester violence │\n",
"│ --- │\n",
"│ bool │\n",
"╞══════════════════════════════╡\n",
"│ false │\n",
"│ false │\n",
"│ false │\n",
"│ true │\n",
"│ true │\n",
"└──────────────────────────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"protests.select(pl.col(\"protesterviolence\").cast(bool).alias(\"There was protester violence\")).head()"
]
},
{
"cell_type": "markdown",
"id": "28c83a88",
"metadata": {},
"source": [
"````{admonition} Question\n",
"Why does the following code have absolutely no chance of producing the desired behaviour?\n",
"\n",
"```python\n",
"protests.select(\"protesterviolence\".cast(bool))\n",
"```\n",
"````"
]
},
{
"cell_type": "markdown",
"id": "7bd513f7",
"metadata": {
"tags": [
"answer",
"hide-input"
]
},
"source": [
"```{admonition} Answer\n",
":class: tip dropdown\n",
"\n",
"Because `\"protesterviolence\"` is a string, and the `cast` method is not defined on strings. And even if it was, this string object does not hold the data inside the column, it is just a string, so no string method could ever transform the data inside the column! That's why this `pl.col()` abstraction exists: its methods allows us to access column data from the DataFrame under consideration.\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "f4751ef7",
"metadata": {},
"source": [
"So, to wrap this up, the `pl.col()` function of Polars is absolutely central: it creates an [`Expression`](https://docs.pola.rs/api/python/stable/reference/expressions/index.html), allowing you to call one of its many methods to perform computations on the selected column.\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"id": "fa938218",
"metadata": {},
"source": [
"### Adding columns\n",
"\n",
"To keep existing columns and add new ones, the `.with_columns()` method will come in handy. \n",
"\n",
"Let's first see how to add a new column using data from outside the dataframe:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c424191a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 32)id | country | ccode | year | region | protest | protestnumber | startday | startmonth | startyear | endday | endmonth | endyear | protesterviolence | location | participants_category | participants | protesteridentity | protesterdemand1 | protesterdemand2 | protesterdemand3 | protesterdemand4 | stateresponse1 | stateresponse2 | stateresponse3 | stateresponse4 | stateresponse5 | stateresponse6 | stateresponse7 | sources | from_scalar | from_series |
---|
i64 | str | i64 | i64 | str | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | i64 |
201990001 | "Canada" | 20 | 1990 | "North America" | 1 | 1 | 15 | 1 | 1990 | 15 | 1 | 1990 | 0 | "national" | null | "1000s" | "unspecified" | "political behavior, process" | "labor wage dispute" | null | null | "ignore" | null | null | null | null | null | null | "1. great canadian train journe… | "a" | 0 |
201990002 | "Canada" | 20 | 1990 | "North America" | 1 | 2 | 25 | 6 | 1990 | 25 | 6 | 1990 | 0 | "Montreal, Quebec" | null | "1000" | "unspecified" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "1. autonomy s cry revived in q… | "a" | 1 |
201990003 | "Canada" | 20 | 1990 | "North America" | 1 | 3 | 1 | 7 | 1990 | 1 | 7 | 1990 | 0 | "Montreal, Quebec" | null | "500" | "separatist parti quebecois" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "1. quebec protest after queen … | "a" | 2 |
201990004 | "Canada" | 20 | 1990 | "North America" | 1 | 4 | 12 | 7 | 1990 | 6 | 9 | 1990 | 1 | "Montreal, Quebec" | null | "100s" | "mohawk indians" | "land farm issue" | null | null | null | "accomodation" | null | null | null | null | null | null | "1. indians gather as siege int… | "a" | 3 |
201990005 | "Canada" | 20 | 1990 | "North America" | 1 | 5 | 14 | 8 | 1990 | 15 | 8 | 1990 | 1 | "Montreal, Quebec" | null | "950" | "local residents" | "political behavior, process" | null | null | null | "crowd dispersal" | "arrests" | "accomodation" | null | null | null | null | "1. dozens hurt in mohawk block… | "a" | 4 |
"
],
"text/plain": [
"shape: (5, 32)\n",
"┌───────────┬─────────┬───────┬──────┬───┬──────────────┬──────────────┬─────────────┬─────────────┐\n",
"│ id ┆ country ┆ ccode ┆ year ┆ … ┆ staterespons ┆ sources ┆ from_scalar ┆ from_series │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ e7 ┆ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ str ┆ i64 ┆ i64 ┆ ┆ --- ┆ str ┆ str ┆ i64 │\n",
"│ ┆ ┆ ┆ ┆ ┆ str ┆ ┆ ┆ │\n",
"╞═══════════╪═════════╪═══════╪══════╪═══╪══════════════╪══════════════╪═════════════╪═════════════╡\n",
"│ 201990001 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ 1. great ┆ a ┆ 0 │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ canadian ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ train ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ journe… ┆ ┆ │\n",
"│ 201990002 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ 1. autonomy ┆ a ┆ 1 │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ s cry ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ revived in ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ q… ┆ ┆ │\n",
"│ 201990003 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ 1. quebec ┆ a ┆ 2 │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ protest ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ after queen ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ … ┆ ┆ │\n",
"│ 201990004 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ 1. indians ┆ a ┆ 3 │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ gather as ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ siege int… ┆ ┆ │\n",
"│ 201990005 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ 1. dozens ┆ a ┆ 4 │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ hurt in ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ mohawk ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ block… ┆ ┆ │\n",
"└───────────┴─────────┴───────┴──────┴───┴──────────────┴──────────────┴─────────────┴─────────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"protests.with_columns(\n",
" from_scalar=pl.lit(\"a\"),\n",
" from_series=pl.Series(list(range(protests.shape[0]))),\n",
").head()"
]
},
{
"cell_type": "markdown",
"id": "b6b33d81",
"metadata": {},
"source": [
"We've covered two cases here:\n",
"- when you'd like to have a column filled with the same value, you will need to specify you're providing the literal value you want in each row with `pl.lit()`,\n",
"- when you want to provide directly all the distinct values to be put in rows, you will need `pl.Series()`."
]
},
{
"cell_type": "markdown",
"id": "b9bc5dec",
"metadata": {},
"source": [
"Now, let's see how to add columns using the values from existing ones. We'll try to get the duration in days of all the protests in our dataset. To that end, let's first find out what columns could help us here:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "372ee15f",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 30)id | country | ccode | year | region | protest | protestnumber | startday | startmonth | startyear | endday | endmonth | endyear | protesterviolence | location | participants_category | participants | protesteridentity | protesterdemand1 | protesterdemand2 | protesterdemand3 | protesterdemand4 | stateresponse1 | stateresponse2 | stateresponse3 | stateresponse4 | stateresponse5 | stateresponse6 | stateresponse7 | sources |
---|
i64 | str | i64 | i64 | str | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str |
201990001 | "Canada" | 20 | 1990 | "North America" | 1 | 1 | 15 | 1 | 1990 | 15 | 1 | 1990 | 0 | "national" | null | "1000s" | "unspecified" | "political behavior, process" | "labor wage dispute" | null | null | "ignore" | null | null | null | null | null | null | "1. great canadian train journe… |
201990002 | "Canada" | 20 | 1990 | "North America" | 1 | 2 | 25 | 6 | 1990 | 25 | 6 | 1990 | 0 | "Montreal, Quebec" | null | "1000" | "unspecified" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "1. autonomy s cry revived in q… |
201990003 | "Canada" | 20 | 1990 | "North America" | 1 | 3 | 1 | 7 | 1990 | 1 | 7 | 1990 | 0 | "Montreal, Quebec" | null | "500" | "separatist parti quebecois" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "1. quebec protest after queen … |
201990004 | "Canada" | 20 | 1990 | "North America" | 1 | 4 | 12 | 7 | 1990 | 6 | 9 | 1990 | 1 | "Montreal, Quebec" | null | "100s" | "mohawk indians" | "land farm issue" | null | null | null | "accomodation" | null | null | null | null | null | null | "1. indians gather as siege int… |
201990005 | "Canada" | 20 | 1990 | "North America" | 1 | 5 | 14 | 8 | 1990 | 15 | 8 | 1990 | 1 | "Montreal, Quebec" | null | "950" | "local residents" | "political behavior, process" | null | null | null | "crowd dispersal" | "arrests" | "accomodation" | null | null | null | null | "1. dozens hurt in mohawk block… |
"
],
"text/plain": [
"shape: (5, 30)\n",
"┌───────────┬─────────┬───────┬──────┬───┬──────────────┬──────────────┬─────────────┬─────────────┐\n",
"│ id ┆ country ┆ ccode ┆ year ┆ … ┆ staterespons ┆ staterespons ┆ staterespon ┆ sources │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ e5 ┆ e6 ┆ se7 ┆ --- │\n",
"│ i64 ┆ str ┆ i64 ┆ i64 ┆ ┆ --- ┆ --- ┆ --- ┆ str │\n",
"│ ┆ ┆ ┆ ┆ ┆ str ┆ str ┆ str ┆ │\n",
"╞═══════════╪═════════╪═══════╪══════╪═══╪══════════════╪══════════════╪═════════════╪═════════════╡\n",
"│ 201990001 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ null ┆ null ┆ 1. great │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ canadian │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ train │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ journe… │\n",
"│ 201990002 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ null ┆ null ┆ 1. autonomy │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ s cry │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ revived in │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ q… │\n",
"│ 201990003 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ null ┆ null ┆ 1. quebec │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ protest │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ after queen │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ … │\n",
"│ 201990004 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ null ┆ null ┆ 1. indians │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ gather as │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ siege int… │\n",
"│ 201990005 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ null ┆ null ┆ 1. dozens │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ hurt in │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ mohawk │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ block… │\n",
"└───────────┴─────────┴───────┴──────┴───┴──────────────┴──────────────┴─────────────┴─────────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"protests.head()"
]
},
{
"cell_type": "markdown",
"id": "36e32f10",
"metadata": {},
"source": [
"It does seem that the start and end day of each protest are each encoded in 3 columns of integers, respectively: `[\"startyear\", \"startmonth\", \"startday\"]`, and `[\"endyear\", \"endmonth\", \"endday\"]`. This makes it very annoying to compute the duration of protests, because, for instance, if a protest spans more than a month, we can't just subtract `\"endday\"` with `\"startday\"`, which are day numbers within their month. So let's first compute two new columns for the start and end date, with an actual `date` data type, which will save us a headache. Here we'll use the `pl.date()` function:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fb835762",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 32)id | country | ccode | year | region | protest | protestnumber | startday | startmonth | startyear | endday | endmonth | endyear | protesterviolence | location | participants_category | participants | protesteridentity | protesterdemand1 | protesterdemand2 | protesterdemand3 | protesterdemand4 | stateresponse1 | stateresponse2 | stateresponse3 | stateresponse4 | stateresponse5 | stateresponse6 | stateresponse7 | sources | start_date | end_date |
---|
i64 | str | i64 | i64 | str | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | date | date |
201990001 | "Canada" | 20 | 1990 | "North America" | 1 | 1 | 15 | 1 | 1990 | 15 | 1 | 1990 | 0 | "national" | null | "1000s" | "unspecified" | "political behavior, process" | "labor wage dispute" | null | null | "ignore" | null | null | null | null | null | null | "1. great canadian train journe… | 1990-01-15 | 1990-01-15 |
201990002 | "Canada" | 20 | 1990 | "North America" | 1 | 2 | 25 | 6 | 1990 | 25 | 6 | 1990 | 0 | "Montreal, Quebec" | null | "1000" | "unspecified" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "1. autonomy s cry revived in q… | 1990-06-25 | 1990-06-25 |
201990003 | "Canada" | 20 | 1990 | "North America" | 1 | 3 | 1 | 7 | 1990 | 1 | 7 | 1990 | 0 | "Montreal, Quebec" | null | "500" | "separatist parti quebecois" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "1. quebec protest after queen … | 1990-07-01 | 1990-07-01 |
201990004 | "Canada" | 20 | 1990 | "North America" | 1 | 4 | 12 | 7 | 1990 | 6 | 9 | 1990 | 1 | "Montreal, Quebec" | null | "100s" | "mohawk indians" | "land farm issue" | null | null | null | "accomodation" | null | null | null | null | null | null | "1. indians gather as siege int… | 1990-07-12 | 1990-09-06 |
201990005 | "Canada" | 20 | 1990 | "North America" | 1 | 5 | 14 | 8 | 1990 | 15 | 8 | 1990 | 1 | "Montreal, Quebec" | null | "950" | "local residents" | "political behavior, process" | null | null | null | "crowd dispersal" | "arrests" | "accomodation" | null | null | null | null | "1. dozens hurt in mohawk block… | 1990-08-14 | 1990-08-15 |
"
],
"text/plain": [
"shape: (5, 32)\n",
"┌───────────┬─────────┬───────┬──────┬───┬───────────────┬───────────────┬────────────┬────────────┐\n",
"│ id ┆ country ┆ ccode ┆ year ┆ … ┆ stateresponse ┆ sources ┆ start_date ┆ end_date │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ 7 ┆ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ str ┆ i64 ┆ i64 ┆ ┆ --- ┆ str ┆ date ┆ date │\n",
"│ ┆ ┆ ┆ ┆ ┆ str ┆ ┆ ┆ │\n",
"╞═══════════╪═════════╪═══════╪══════╪═══╪═══════════════╪═══════════════╪════════════╪════════════╡\n",
"│ 201990001 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ 1. great ┆ 1990-01-15 ┆ 1990-01-15 │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ canadian ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ train journe… ┆ ┆ │\n",
"│ 201990002 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ 1. autonomy s ┆ 1990-06-25 ┆ 1990-06-25 │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ cry revived ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ in q… ┆ ┆ │\n",
"│ 201990003 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ 1. quebec ┆ 1990-07-01 ┆ 1990-07-01 │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ protest after ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ queen … ┆ ┆ │\n",
"│ 201990004 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ 1. indians ┆ 1990-07-12 ┆ 1990-09-06 │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ gather as ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ siege int… ┆ ┆ │\n",
"│ 201990005 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ null ┆ 1. dozens ┆ 1990-08-14 ┆ 1990-08-15 │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ hurt in ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ mohawk block… ┆ ┆ │\n",
"└───────────┴─────────┴───────┴──────┴───┴───────────────┴───────────────┴────────────┴────────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"protests = protests.with_columns(\n",
" pl.date(\"startyear\", \"startmonth\", \"startday\").alias(\"start_date\"),\n",
" pl.date(\"endyear\", \"endmonth\", \"endday\").alias(\"end_date\"),\n",
")\n",
"protests.head()"
]
},
{
"cell_type": "markdown",
"id": "24e32bd5",
"metadata": {},
"source": [
"And now we just have to subtract `\"start_date\"` from `\"end_date\"` to get our duration! There is one small problem though. As you can see from the result above, when a protest occurred over one day, the `start_date` and `end_date` are identical. There is a simple solution to that: we can add a duration of one day to every entry by adding `pl.duration(days=1)`. That works exactly like adding a scalar to a NumPy array: the value is added to each element. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "136d41ca",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 33)id | country | ccode | year | region | protest | protestnumber | startday | startmonth | startyear | endday | endmonth | endyear | protesterviolence | location | participants_category | participants | protesteridentity | protesterdemand1 | protesterdemand2 | protesterdemand3 | protesterdemand4 | stateresponse1 | stateresponse2 | stateresponse3 | stateresponse4 | stateresponse5 | stateresponse6 | stateresponse7 | sources | start_date | end_date | duration |
---|
i64 | str | i64 | i64 | str | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | date | date | duration[ms] |
201990001 | "Canada" | 20 | 1990 | "North America" | 1 | 1 | 15 | 1 | 1990 | 15 | 1 | 1990 | 0 | "national" | null | "1000s" | "unspecified" | "political behavior, process" | "labor wage dispute" | null | null | "ignore" | null | null | null | null | null | null | "1. great canadian train journe… | 1990-01-15 | 1990-01-15 | 1d |
201990002 | "Canada" | 20 | 1990 | "North America" | 1 | 2 | 25 | 6 | 1990 | 25 | 6 | 1990 | 0 | "Montreal, Quebec" | null | "1000" | "unspecified" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "1. autonomy s cry revived in q… | 1990-06-25 | 1990-06-25 | 1d |
201990003 | "Canada" | 20 | 1990 | "North America" | 1 | 3 | 1 | 7 | 1990 | 1 | 7 | 1990 | 0 | "Montreal, Quebec" | null | "500" | "separatist parti quebecois" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "1. quebec protest after queen … | 1990-07-01 | 1990-07-01 | 1d |
201990004 | "Canada" | 20 | 1990 | "North America" | 1 | 4 | 12 | 7 | 1990 | 6 | 9 | 1990 | 1 | "Montreal, Quebec" | null | "100s" | "mohawk indians" | "land farm issue" | null | null | null | "accomodation" | null | null | null | null | null | null | "1. indians gather as siege int… | 1990-07-12 | 1990-09-06 | 57d |
201990005 | "Canada" | 20 | 1990 | "North America" | 1 | 5 | 14 | 8 | 1990 | 15 | 8 | 1990 | 1 | "Montreal, Quebec" | null | "950" | "local residents" | "political behavior, process" | null | null | null | "crowd dispersal" | "arrests" | "accomodation" | null | null | null | null | "1. dozens hurt in mohawk block… | 1990-08-14 | 1990-08-15 | 2d |
"
],
"text/plain": [
"shape: (5, 33)\n",
"┌───────────┬─────────┬───────┬──────┬───┬────────────────┬────────────┬────────────┬──────────────┐\n",
"│ id ┆ country ┆ ccode ┆ year ┆ … ┆ sources ┆ start_date ┆ end_date ┆ duration │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ str ┆ i64 ┆ i64 ┆ ┆ str ┆ date ┆ date ┆ duration[ms] │\n",
"╞═══════════╪═════════╪═══════╪══════╪═══╪════════════════╪════════════╪════════════╪══════════════╡\n",
"│ 201990001 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ 1. great ┆ 1990-01-15 ┆ 1990-01-15 ┆ 1d │\n",
"│ ┆ ┆ ┆ ┆ ┆ canadian train ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ journe… ┆ ┆ ┆ │\n",
"│ 201990002 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ 1. autonomy s ┆ 1990-06-25 ┆ 1990-06-25 ┆ 1d │\n",
"│ ┆ ┆ ┆ ┆ ┆ cry revived in ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ q… ┆ ┆ ┆ │\n",
"│ 201990003 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ 1. quebec ┆ 1990-07-01 ┆ 1990-07-01 ┆ 1d │\n",
"│ ┆ ┆ ┆ ┆ ┆ protest after ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ queen … ┆ ┆ ┆ │\n",
"│ 201990004 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ 1. indians ┆ 1990-07-12 ┆ 1990-09-06 ┆ 57d │\n",
"│ ┆ ┆ ┆ ┆ ┆ gather as ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ siege int… ┆ ┆ ┆ │\n",
"│ 201990005 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ 1. dozens hurt ┆ 1990-08-14 ┆ 1990-08-15 ┆ 2d │\n",
"│ ┆ ┆ ┆ ┆ ┆ in mohawk ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ block… ┆ ┆ ┆ │\n",
"└───────────┴─────────┴───────┴──────┴───┴────────────────┴────────────┴────────────┴──────────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"protests = protests.with_columns(\n",
" duration=pl.col(\"end_date\") - pl.col(\"start_date\") + pl.duration(days=1)\n",
")\n",
"protests.head()"
]
},
{
"cell_type": "markdown",
"id": "0c474668",
"metadata": {},
"source": [
"```{note}\n",
"Here we gave our column a name through the name of the keyword argument passed to `with_columns`. This is just another way to `.alias()`.\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "37608c3a",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"id": "ce171c5c",
"metadata": {},
"source": [
"### Exercise: years of protest \n",
"\n",
"In the `protests` dataframe, create the two following integer columns:\n",
"- `nr_years_calendar` indicating how many different calendar years each protest spanned,\n",
"- `nr_years` indicating how many (actual) years the protest lasted "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e13b5710",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (17_145, 1)duration |
---|
duration[ms] |
1d |
1d |
1d |
57d |
2d |
… |
1d |
1d |
null |
null |
null |
"
],
"text/plain": [
"shape: (17_145, 1)\n",
"┌──────────────┐\n",
"│ duration │\n",
"│ --- │\n",
"│ duration[ms] │\n",
"╞══════════════╡\n",
"│ 1d │\n",
"│ 1d │\n",
"│ 1d │\n",
"│ 57d │\n",
"│ 2d │\n",
"│ … │\n",
"│ 1d │\n",
"│ 1d │\n",
"│ null │\n",
"│ null │\n",
"│ null │\n",
"└──────────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"protests.select(pl.col('duration'))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6e4d82bc",
"metadata": {
"tags": [
"answer",
"hide-input"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 35)id | country | ccode | year | region | protest | protestnumber | startday | startmonth | startyear | endday | endmonth | endyear | protesterviolence | location | participants_category | participants | protesteridentity | protesterdemand1 | protesterdemand2 | protesterdemand3 | protesterdemand4 | stateresponse1 | stateresponse2 | stateresponse3 | stateresponse4 | stateresponse5 | stateresponse6 | stateresponse7 | sources | start_date | end_date | duration | nr_years_calendar | nr_years |
---|
i64 | str | i64 | i64 | str | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | date | date | duration[ms] | i64 | i64 |
201990001 | "Canada" | 20 | 1990 | "North America" | 1 | 1 | 15 | 1 | 1990 | 15 | 1 | 1990 | 0 | "national" | null | "1000s" | "unspecified" | "political behavior, process" | "labor wage dispute" | null | null | "ignore" | null | null | null | null | null | null | "1. great canadian train journe… | 1990-01-15 | 1990-01-15 | 1d | 1 | 0 |
201990002 | "Canada" | 20 | 1990 | "North America" | 1 | 2 | 25 | 6 | 1990 | 25 | 6 | 1990 | 0 | "Montreal, Quebec" | null | "1000" | "unspecified" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "1. autonomy s cry revived in q… | 1990-06-25 | 1990-06-25 | 1d | 1 | 0 |
201990003 | "Canada" | 20 | 1990 | "North America" | 1 | 3 | 1 | 7 | 1990 | 1 | 7 | 1990 | 0 | "Montreal, Quebec" | null | "500" | "separatist parti quebecois" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "1. quebec protest after queen … | 1990-07-01 | 1990-07-01 | 1d | 1 | 0 |
201990004 | "Canada" | 20 | 1990 | "North America" | 1 | 4 | 12 | 7 | 1990 | 6 | 9 | 1990 | 1 | "Montreal, Quebec" | null | "100s" | "mohawk indians" | "land farm issue" | null | null | null | "accomodation" | null | null | null | null | null | null | "1. indians gather as siege int… | 1990-07-12 | 1990-09-06 | 57d | 1 | 0 |
201990005 | "Canada" | 20 | 1990 | "North America" | 1 | 5 | 14 | 8 | 1990 | 15 | 8 | 1990 | 1 | "Montreal, Quebec" | null | "950" | "local residents" | "political behavior, process" | null | null | null | "crowd dispersal" | "arrests" | "accomodation" | null | null | null | null | "1. dozens hurt in mohawk block… | 1990-08-14 | 1990-08-15 | 2d | 1 | 0 |
"
],
"text/plain": [
"shape: (5, 35)\n",
"┌───────────┬─────────┬───────┬──────┬───┬────────────┬──────────────┬──────────────────┬──────────┐\n",
"│ id ┆ country ┆ ccode ┆ year ┆ … ┆ end_date ┆ duration ┆ nr_years_calenda ┆ nr_years │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ r ┆ --- │\n",
"│ i64 ┆ str ┆ i64 ┆ i64 ┆ ┆ date ┆ duration[ms] ┆ --- ┆ i64 │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ i64 ┆ │\n",
"╞═══════════╪═════════╪═══════╪══════╪═══╪════════════╪══════════════╪══════════════════╪══════════╡\n",
"│ 201990001 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ 1990-01-15 ┆ 1d ┆ 1 ┆ 0 │\n",
"│ 201990002 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ 1990-06-25 ┆ 1d ┆ 1 ┆ 0 │\n",
"│ 201990003 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ 1990-07-01 ┆ 1d ┆ 1 ┆ 0 │\n",
"│ 201990004 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ 1990-09-06 ┆ 57d ┆ 1 ┆ 0 │\n",
"│ 201990005 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ 1990-08-15 ┆ 2d ┆ 1 ┆ 0 │\n",
"└───────────┴─────────┴───────┴──────┴───┴────────────┴──────────────┴──────────────────┴──────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"protests.with_columns(\n",
" nr_years_calendar=pl.col('endyear') - pl.col('startyear') + 1,\n",
" nr_years=pl.col('duration').dt.total_days() // 365,\n",
").head()"
]
},
{
"cell_type": "markdown",
"id": "ef63e5a2",
"metadata": {},
"source": [
"```{hint}\n",
"The second one is tricky: take into account that the column \"duration\" is a `Duration` data type. You'll need to head over to [the documentation on temporal expressions](https://docs.pola.rs/api/python/stable/reference/expressions/temporal.html), and search in particular for those that apply to a `Duration` to find a method that can help you out.\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "a3a93d8c",
"metadata": {},
"source": [
"## Operations on rows\n",
"\n",
"If we consider the rows of a dataset, typically we may wish to select only some of them, change their order or add some new rows."
]
},
{
"cell_type": "markdown",
"id": "530541fa",
"metadata": {},
"source": [
"### Selecting by row numbers\n",
"\n",
"The simplest way to select only certain rows is through their corresponding number, i.e., their vertical position in the frame. "
]
},
{
"cell_type": "markdown",
"id": "810a3dd8",
"metadata": {},
"source": [
"For instance, you can select the first two or the three last rows in the corresponding following ways:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c348a746",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (2, 33)id | country | ccode | year | region | protest | protestnumber | startday | startmonth | startyear | endday | endmonth | endyear | protesterviolence | location | participants_category | participants | protesteridentity | protesterdemand1 | protesterdemand2 | protesterdemand3 | protesterdemand4 | stateresponse1 | stateresponse2 | stateresponse3 | stateresponse4 | stateresponse5 | stateresponse6 | stateresponse7 | sources | start_date | end_date | duration |
---|
i64 | str | i64 | i64 | str | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | date | date | duration[ms] |
201990001 | "Canada" | 20 | 1990 | "North America" | 1 | 1 | 15 | 1 | 1990 | 15 | 1 | 1990 | 0 | "national" | null | "1000s" | "unspecified" | "political behavior, process" | "labor wage dispute" | null | null | "ignore" | null | null | null | null | null | null | "1. great canadian train journe… | 1990-01-15 | 1990-01-15 | 1d |
201990002 | "Canada" | 20 | 1990 | "North America" | 1 | 2 | 25 | 6 | 1990 | 25 | 6 | 1990 | 0 | "Montreal, Quebec" | null | "1000" | "unspecified" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "1. autonomy s cry revived in q… | 1990-06-25 | 1990-06-25 | 1d |
"
],
"text/plain": [
"shape: (2, 33)\n",
"┌───────────┬─────────┬───────┬──────┬───┬────────────────┬────────────┬────────────┬──────────────┐\n",
"│ id ┆ country ┆ ccode ┆ year ┆ … ┆ sources ┆ start_date ┆ end_date ┆ duration │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ str ┆ i64 ┆ i64 ┆ ┆ str ┆ date ┆ date ┆ duration[ms] │\n",
"╞═══════════╪═════════╪═══════╪══════╪═══╪════════════════╪════════════╪════════════╪══════════════╡\n",
"│ 201990001 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ 1. great ┆ 1990-01-15 ┆ 1990-01-15 ┆ 1d │\n",
"│ ┆ ┆ ┆ ┆ ┆ canadian train ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ journe… ┆ ┆ ┆ │\n",
"│ 201990002 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ 1. autonomy s ┆ 1990-06-25 ┆ 1990-06-25 ┆ 1d │\n",
"│ ┆ ┆ ┆ ┆ ┆ cry revived in ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ q… ┆ ┆ ┆ │\n",
"└───────────┴─────────┴───────┴──────┴───┴────────────────┴────────────┴────────────┴──────────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"protests.head(2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1fde7d07",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (3, 33)id | country | ccode | year | region | protest | protestnumber | startday | startmonth | startyear | endday | endmonth | endyear | protesterviolence | location | participants_category | participants | protesteridentity | protesterdemand1 | protesterdemand2 | protesterdemand3 | protesterdemand4 | stateresponse1 | stateresponse2 | stateresponse3 | stateresponse4 | stateresponse5 | stateresponse6 | stateresponse7 | sources | start_date | end_date | duration |
---|
i64 | str | i64 | i64 | str | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | date | date | duration[ms] |
9102018000 | "Papua New Guinea" | 910 | 2018 | "Oceania" | 0 | 0 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null |
9102019000 | "Papua New Guinea" | 910 | 2019 | "Oceania" | 0 | 0 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | "." | null | null | null | null | "." | null | null | null | null | null | null |
9102020000 | "Papua New Guinea" | 910 | 2020 | "Oceania" | 0 | 0 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null |
"
],
"text/plain": [
"shape: (3, 33)\n",
"┌────────────┬───────────┬───────┬──────┬───┬─────────┬────────────┬──────────┬──────────────┐\n",
"│ id ┆ country ┆ ccode ┆ year ┆ … ┆ sources ┆ start_date ┆ end_date ┆ duration │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ str ┆ i64 ┆ i64 ┆ ┆ str ┆ date ┆ date ┆ duration[ms] │\n",
"╞════════════╪═══════════╪═══════╪══════╪═══╪═════════╪════════════╪══════════╪══════════════╡\n",
"│ 9102018000 ┆ Papua New ┆ 910 ┆ 2018 ┆ … ┆ null ┆ null ┆ null ┆ null │\n",
"│ ┆ Guinea ┆ ┆ ┆ ┆ ┆ ┆ ┆ │\n",
"│ 9102019000 ┆ Papua New ┆ 910 ┆ 2019 ┆ … ┆ null ┆ null ┆ null ┆ null │\n",
"│ ┆ Guinea ┆ ┆ ┆ ┆ ┆ ┆ ┆ │\n",
"│ 9102020000 ┆ Papua New ┆ 910 ┆ 2020 ┆ … ┆ null ┆ null ┆ null ┆ null │\n",
"│ ┆ Guinea ┆ ┆ ┆ ┆ ┆ ┆ ┆ │\n",
"└────────────┴───────────┴───────┴──────┴───┴─────────┴────────────┴──────────┴──────────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"protests.tail(3)"
]
},
{
"cell_type": "markdown",
"id": "a6121f1e",
"metadata": {},
"source": [
"```{note}\n",
"As shown earlier, if you call `.head()` or `.tail()` without any argument, they will show you 5 rows.\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "fd244a1e",
"metadata": {},
"source": [
"It's also possible to select contiguous rows by using _slicing_, as we already did for [strings](https://en.softpython.org/strings/strings2-sol.html#Slices), [lists](https://en.softpython.org/lists/lists2-sol.html#Slices) and [arrays](array_select).\n",
"\n",
"For example, here we select two rows starting from the fifth (_included_):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ea89e51e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (2, 33)id | country | ccode | year | region | protest | protestnumber | startday | startmonth | startyear | endday | endmonth | endyear | protesterviolence | location | participants_category | participants | protesteridentity | protesterdemand1 | protesterdemand2 | protesterdemand3 | protesterdemand4 | stateresponse1 | stateresponse2 | stateresponse3 | stateresponse4 | stateresponse5 | stateresponse6 | stateresponse7 | sources | start_date | end_date | duration |
---|
i64 | str | i64 | i64 | str | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | date | date | duration[ms] |
201990006 | "Canada" | 20 | 1990 | "North America" | 1 | 6 | 19 | 9 | 1990 | 19 | 9 | 1990 | 0 | "Kahnawake Reservation near Mon… | null | "200" | "mohawk indians" | "police brutality" | null | null | null | "crowd dispersal" | "shootings" | null | null | null | null | null | "1. mohawks tear gassed in clas… | 1990-09-19 | 1990-09-19 | 1d |
201991001 | "Canada" | 20 | 1991 | "North America" | 1 | 1 | 10 | 9 | 1991 | 17 | 9 | 1991 | 0 | "national" | null | "110000" | "public service alliance of can… | "labor wage dispute" | null | null | null | "ignore" | null | null | null | null | null | null | "1. mulroney acts to crush stri… | 1991-09-10 | 1991-09-17 | 8d |
"
],
"text/plain": [
"shape: (2, 33)\n",
"┌───────────┬─────────┬───────┬──────┬───┬────────────────┬────────────┬────────────┬──────────────┐\n",
"│ id ┆ country ┆ ccode ┆ year ┆ … ┆ sources ┆ start_date ┆ end_date ┆ duration │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ str ┆ i64 ┆ i64 ┆ ┆ str ┆ date ┆ date ┆ duration[ms] │\n",
"╞═══════════╪═════════╪═══════╪══════╪═══╪════════════════╪════════════╪════════════╪══════════════╡\n",
"│ 201990006 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ 1. mohawks ┆ 1990-09-19 ┆ 1990-09-19 ┆ 1d │\n",
"│ ┆ ┆ ┆ ┆ ┆ tear gassed in ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ clas… ┆ ┆ ┆ │\n",
"│ 201991001 ┆ Canada ┆ 20 ┆ 1991 ┆ … ┆ 1. mulroney ┆ 1991-09-10 ┆ 1991-09-17 ┆ 8d │\n",
"│ ┆ ┆ ┆ ┆ ┆ acts to crush ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ stri… ┆ ┆ ┆ │\n",
"└───────────┴─────────┴───────┴──────┴───┴────────────────┴────────────┴────────────┴──────────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"protests.slice(5, 2)"
]
},
{
"cell_type": "markdown",
"id": "7116e678",
"metadata": {},
"source": [
"### Selecting rows fulfilling some conditions"
]
},
{
"cell_type": "markdown",
"id": "9c51819a",
"metadata": {},
"source": [
"Selecting by row numbers is simple, but most of the time you do not know in which position the relevant rows are, as their relevance depends on one or more specific conditions.\n",
"So let's see now how to filter data according to a condition they should satisfy.\n",
"\n",
"This can be done using the `.filter()` method of DataFrames. As argument, it takes an expression evaluated for each row. All rows for which the expression evaluates to `True` are then kept.\n",
"\n",
"So, for instance, to only keep rows corresponding to protests that happened in Italy:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d62e3253",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 33)id | country | ccode | year | region | protest | protestnumber | startday | startmonth | startyear | endday | endmonth | endyear | protesterviolence | location | participants_category | participants | protesteridentity | protesterdemand1 | protesterdemand2 | protesterdemand3 | protesterdemand4 | stateresponse1 | stateresponse2 | stateresponse3 | stateresponse4 | stateresponse5 | stateresponse6 | stateresponse7 | sources | start_date | end_date | duration |
---|
i64 | str | i64 | i64 | str | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | date | date | duration[ms] |
3251990001 | "Italy" | 325 | 1990 | "Europe" | 1 | 1 | 13 | 7 | 1990 | 13 | 7 | 1990 | 1 | "Sicilian town of Ribera" | "100-999" | "> 50" | "farmers" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "sicilians riot over water shor… | 1990-07-13 | 1990-07-13 | 1d |
3251991001 | "Italy" | 325 | 1991 | "Europe" | 1 | 1 | 6 | 6 | 1991 | 6 | 6 | 1991 | 0 | "the Italian port of Brindisi" | "100-999" | "hundreds" | "albanians" | "political behavior, process" | null | null | null | "accomodation" | null | null | null | null | null | null | "refugees riot. the times. june… | 1991-06-06 | 1991-06-06 | 1d |
3251991002 | "Italy" | 325 | 1991 | "Europe" | 1 | 2 | 22 | 10 | 1991 | 22 | 10 | 1991 | 0 | "major cities" | "5000-10000" | "Thousands " | "people" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "1. italians strike to protest … | 1991-10-22 | 1991-10-22 | 1d |
3251992001 | "Italy" | 325 | 1992 | "Europe" | 1 | 1 | 28 | 6 | 1992 | 28 | 6 | 1992 | 0 | "Palermo" | ">10000" | "100000" | "people" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "1. anti mafia protest. the was… | 1992-06-28 | 1992-06-28 | 1d |
3251992002 | "Italy" | 325 | 1992 | "Europe" | 1 | 2 | 21 | 7 | 1992 | 21 | 7 | 1992 | 0 | "Milan, Rome,Palermo, and in ot… | ">10000" | "> 15000" | "people and police officers" | "political behavior, process" | null | null | null | "ignore" | "beatings" | null | null | null | null | null | "1. sicilians jeer italian lead… | 1992-07-21 | 1992-07-21 | 1d |
"
],
"text/plain": [
"shape: (5, 33)\n",
"┌────────────┬─────────┬───────┬──────┬───┬───────────────┬────────────┬────────────┬──────────────┐\n",
"│ id ┆ country ┆ ccode ┆ year ┆ … ┆ sources ┆ start_date ┆ end_date ┆ duration │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ str ┆ i64 ┆ i64 ┆ ┆ str ┆ date ┆ date ┆ duration[ms] │\n",
"╞════════════╪═════════╪═══════╪══════╪═══╪═══════════════╪════════════╪════════════╪══════════════╡\n",
"│ 3251990001 ┆ Italy ┆ 325 ┆ 1990 ┆ … ┆ sicilians ┆ 1990-07-13 ┆ 1990-07-13 ┆ 1d │\n",
"│ ┆ ┆ ┆ ┆ ┆ riot over ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ water shor… ┆ ┆ ┆ │\n",
"│ 3251991001 ┆ Italy ┆ 325 ┆ 1991 ┆ … ┆ refugees ┆ 1991-06-06 ┆ 1991-06-06 ┆ 1d │\n",
"│ ┆ ┆ ┆ ┆ ┆ riot. the ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ times. june… ┆ ┆ ┆ │\n",
"│ 3251991002 ┆ Italy ┆ 325 ┆ 1991 ┆ … ┆ 1. italians ┆ 1991-10-22 ┆ 1991-10-22 ┆ 1d │\n",
"│ ┆ ┆ ┆ ┆ ┆ strike to ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ protest … ┆ ┆ ┆ │\n",
"│ 3251992001 ┆ Italy ┆ 325 ┆ 1992 ┆ … ┆ 1. anti mafia ┆ 1992-06-28 ┆ 1992-06-28 ┆ 1d │\n",
"│ ┆ ┆ ┆ ┆ ┆ protest. the ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ was… ┆ ┆ ┆ │\n",
"│ 3251992002 ┆ Italy ┆ 325 ┆ 1992 ┆ … ┆ 1. sicilians ┆ 1992-07-21 ┆ 1992-07-21 ┆ 1d │\n",
"│ ┆ ┆ ┆ ┆ ┆ jeer italian ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ lead… ┆ ┆ ┆ │\n",
"└────────────┴─────────┴───────┴──────┴───┴───────────────┴────────────┴────────────┴──────────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"protests.filter(pl.col(\"country\") == \"Italy\").head()"
]
},
{
"cell_type": "markdown",
"id": "7cdba9e4",
"metadata": {},
"source": [
"Notice here the usage of `pl.col()` that we've seen earlier! Again, this allows us to access the values contained in a column, here to make an equality test and return a boolean.\n",
"\n",
"And again, as seen previously when we computed the protests' durations, values from different columns can be compared. To illustrate that, here below we'll select all protests that spanned more than one calendar year:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "38ba0787",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 33)id | country | ccode | year | region | protest | protestnumber | startday | startmonth | startyear | endday | endmonth | endyear | protesterviolence | location | participants_category | participants | protesteridentity | protesterdemand1 | protesterdemand2 | protesterdemand3 | protesterdemand4 | stateresponse1 | stateresponse2 | stateresponse3 | stateresponse4 | stateresponse5 | stateresponse6 | stateresponse7 | sources | start_date | end_date | duration |
---|
i64 | str | i64 | i64 | str | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | date | date | duration[ms] |
701991004 | "Mexico" | 70 | 1991 | "North America" | 1 | 4 | 23 | 11 | 1991 | 15 | 1 | 1992 | 0 | "Tabasco to Mexico City" | null | "100s" | "democratic revolutionary party… | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "1. the washington post, mexica… | 1991-11-23 | 1992-01-15 | 54d |
1001994005 | "Colombia" | 100 | 1994 | "South America" | 1 | 5 | 22 | 12 | 1994 | 11 | 1 | 1995 | 0 | "southern Colombia" | null | "5000" | "putumayo peasant movement" | "political behavior, process" | null | null | null | "accomodation" | "killings" | "crowd dispersal" | null | null | null | null | "xinhua news agency, january 3,… | 1994-12-22 | 1995-01-11 | 21d |
1011996013 | "Venezuela" | 101 | 1996 | "South America" | 1 | 13 | 27 | 12 | 1996 | 15 | 1 | 1997 | 0 | "national level" | null | "50" | "health workers" | "labor wage dispute" | null | null | null | "ignore" | null | null | null | null | null | null | "the associated press, january … | 1996-12-27 | 1997-01-15 | 20d |
1012002010 | "Venezuela" | 101 | 2002 | "South America" | 1 | 10 | 3 | 12 | 2002 | 4 | 2 | 2003 | 0 | "national level" | null | "10000" | "workers" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "the washington post, december … | 2002-12-03 | 2003-02-04 | 64d |
1012004016 | "Venezuela" | 101 | 2004 | "South America" | 1 | 16 | 16 | 12 | 2004 | 22 | 1 | 2005 | 0 | "national level 4 different pr… | null | "2500" | "inmates" | "political behavior, process" | null | null | null | "accomodation" | null | null | null | null | null | null | "associated press worldstream, … | 2004-12-16 | 2005-01-22 | 38d |
"
],
"text/plain": [
"shape: (5, 33)\n",
"┌────────────┬───────────┬───────┬──────┬───┬──────────────┬────────────┬────────────┬─────────────┐\n",
"│ id ┆ country ┆ ccode ┆ year ┆ … ┆ sources ┆ start_date ┆ end_date ┆ duration │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ str ┆ i64 ┆ i64 ┆ ┆ str ┆ date ┆ date ┆ duration[ms │\n",
"│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ] │\n",
"╞════════════╪═══════════╪═══════╪══════╪═══╪══════════════╪════════════╪════════════╪═════════════╡\n",
"│ 701991004 ┆ Mexico ┆ 70 ┆ 1991 ┆ … ┆ 1. the ┆ 1991-11-23 ┆ 1992-01-15 ┆ 54d │\n",
"│ ┆ ┆ ┆ ┆ ┆ washington ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ post, ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ mexica… ┆ ┆ ┆ │\n",
"│ 1001994005 ┆ Colombia ┆ 100 ┆ 1994 ┆ … ┆ xinhua news ┆ 1994-12-22 ┆ 1995-01-11 ┆ 21d │\n",
"│ ┆ ┆ ┆ ┆ ┆ agency, ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ january 3,… ┆ ┆ ┆ │\n",
"│ 1011996013 ┆ Venezuela ┆ 101 ┆ 1996 ┆ … ┆ the ┆ 1996-12-27 ┆ 1997-01-15 ┆ 20d │\n",
"│ ┆ ┆ ┆ ┆ ┆ associated ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ press, ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ january … ┆ ┆ ┆ │\n",
"│ 1012002010 ┆ Venezuela ┆ 101 ┆ 2002 ┆ … ┆ the ┆ 2002-12-03 ┆ 2003-02-04 ┆ 64d │\n",
"│ ┆ ┆ ┆ ┆ ┆ washington ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ post, ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ december … ┆ ┆ ┆ │\n",
"│ 1012004016 ┆ Venezuela ┆ 101 ┆ 2004 ┆ … ┆ associated ┆ 2004-12-16 ┆ 2005-01-22 ┆ 38d │\n",
"│ ┆ ┆ ┆ ┆ ┆ press ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ worldstream, ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ … ┆ ┆ ┆ │\n",
"└────────────┴───────────┴───────┴──────┴───┴──────────────┴────────────┴────────────┴─────────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"protests.filter(pl.col(\"startyear\") != pl.col(\"endyear\")).head()"
]
},
{
"cell_type": "markdown",
"id": "c9e67d1b",
"metadata": {},
"source": [
"It's also possible to combine conditions like we already did in [Numpy filtering](array_select): for example by using the special operator conjunction `&` :"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2c1f87a9",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (6, 33)id | country | ccode | year | region | protest | protestnumber | startday | startmonth | startyear | endday | endmonth | endyear | protesterviolence | location | participants_category | participants | protesteridentity | protesterdemand1 | protesterdemand2 | protesterdemand3 | protesterdemand4 | stateresponse1 | stateresponse2 | stateresponse3 | stateresponse4 | stateresponse5 | stateresponse6 | stateresponse7 | sources | start_date | end_date | duration |
---|
i64 | str | i64 | i64 | str | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | date | date | duration[ms] |
3251992001 | "Italy" | 325 | 1992 | "Europe" | 1 | 1 | 28 | 6 | 1992 | 28 | 6 | 1992 | 0 | "Palermo" | ">10000" | "100000" | "people" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "1. anti mafia protest. the was… | 1992-06-28 | 1992-06-28 | 1d |
3251992002 | "Italy" | 325 | 1992 | "Europe" | 1 | 2 | 21 | 7 | 1992 | 21 | 7 | 1992 | 0 | "Milan, Rome,Palermo, and in ot… | ">10000" | "> 15000" | "people and police officers" | "political behavior, process" | null | null | null | "ignore" | "beatings" | null | null | null | null | null | "1. sicilians jeer italian lead… | 1992-07-21 | 1992-07-21 | 1d |
3251992003 | "Italy" | 325 | 1992 | "Europe" | 1 | 3 | 3 | 10 | 1992 | 3 | 10 | 1992 | 1 | "Rome" | "100-999" | "hundreds" | "left wing italian demonstrator… | "political behavior, process" | null | null | null | "beatings" | "crowd dispersal" | null | null | null | null | null | "police attack rome rioters. th… | 1992-10-03 | 1992-10-03 | 1d |
3251992004 | "Italy" | 325 | 1992 | "Europe" | 1 | 4 | 13 | 10 | 1992 | 13 | 10 | 1992 | 0 | "Across Italy" | ">10000" | "millions" | "workers" | "political behavior, process" | null | null | null | "ignore" | "accomodation" | null | null | null | null | null | "1. millions in italy march in … | 1992-10-13 | 1992-10-13 | 1d |
3251992005 | "Italy" | 325 | 1992 | "Europe" | 1 | 5 | 26 | 10 | 1992 | 26 | 10 | 1992 | 0 | "Rome" | ">10000" | "10000" | "shopkeepers" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "italy shopowners protest tax p… | 1992-10-26 | 1992-10-26 | 1d |
3251992006 | "Italy" | 325 | 1992 | "Europe" | 1 | 6 | 16 | 12 | 1992 | 16 | 12 | 1992 | 0 | "Rome" | ">10000" | "20000" | "doctors" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "rome doctors demonstrate to pr… | 1992-12-16 | 1992-12-16 | 1d |
"
],
"text/plain": [
"shape: (6, 33)\n",
"┌────────────┬─────────┬───────┬──────┬───┬───────────────┬────────────┬────────────┬──────────────┐\n",
"│ id ┆ country ┆ ccode ┆ year ┆ … ┆ sources ┆ start_date ┆ end_date ┆ duration │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ str ┆ i64 ┆ i64 ┆ ┆ str ┆ date ┆ date ┆ duration[ms] │\n",
"╞════════════╪═════════╪═══════╪══════╪═══╪═══════════════╪════════════╪════════════╪══════════════╡\n",
"│ 3251992001 ┆ Italy ┆ 325 ┆ 1992 ┆ … ┆ 1. anti mafia ┆ 1992-06-28 ┆ 1992-06-28 ┆ 1d │\n",
"│ ┆ ┆ ┆ ┆ ┆ protest. the ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ was… ┆ ┆ ┆ │\n",
"│ 3251992002 ┆ Italy ┆ 325 ┆ 1992 ┆ … ┆ 1. sicilians ┆ 1992-07-21 ┆ 1992-07-21 ┆ 1d │\n",
"│ ┆ ┆ ┆ ┆ ┆ jeer italian ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ lead… ┆ ┆ ┆ │\n",
"│ 3251992003 ┆ Italy ┆ 325 ┆ 1992 ┆ … ┆ police attack ┆ 1992-10-03 ┆ 1992-10-03 ┆ 1d │\n",
"│ ┆ ┆ ┆ ┆ ┆ rome rioters. ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ th… ┆ ┆ ┆ │\n",
"│ 3251992004 ┆ Italy ┆ 325 ┆ 1992 ┆ … ┆ 1. millions ┆ 1992-10-13 ┆ 1992-10-13 ┆ 1d │\n",
"│ ┆ ┆ ┆ ┆ ┆ in italy ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ march in … ┆ ┆ ┆ │\n",
"│ 3251992005 ┆ Italy ┆ 325 ┆ 1992 ┆ … ┆ italy ┆ 1992-10-26 ┆ 1992-10-26 ┆ 1d │\n",
"│ ┆ ┆ ┆ ┆ ┆ shopowners ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ protest tax ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ p… ┆ ┆ ┆ │\n",
"│ 3251992006 ┆ Italy ┆ 325 ┆ 1992 ┆ … ┆ rome doctors ┆ 1992-12-16 ┆ 1992-12-16 ┆ 1d │\n",
"│ ┆ ┆ ┆ ┆ ┆ demonstrate ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ to pr… ┆ ┆ ┆ │\n",
"└────────────┴─────────┴───────┴──────┴───┴───────────────┴────────────┴────────────┴──────────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"protests.filter(\n",
" (pl.col(\"country\") == \"Italy\")\n",
" & (pl.col(\"year\") == 1992)\n",
")"
]
},
{
"cell_type": "markdown",
"id": "a9778a28",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"id": "798e1b57",
"metadata": {},
"source": [
"### Sorting\n",
"\n",
"To obtain a new dataframe sorted according to one or more columns, we can use the `.sort()` method.\n",
"For instance, to see the longest protests, we'd do:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "26ee6091",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 33)id | country | ccode | year | region | protest | protestnumber | startday | startmonth | startyear | endday | endmonth | endyear | protesterviolence | location | participants_category | participants | protesteridentity | protesterdemand1 | protesterdemand2 | protesterdemand3 | protesterdemand4 | stateresponse1 | stateresponse2 | stateresponse3 | stateresponse4 | stateresponse5 | stateresponse6 | stateresponse7 | sources | start_date | end_date | duration |
---|
i64 | str | i64 | i64 | str | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | date | date | duration[ms] |
201998000 | "Canada" | 20 | 1998 | "North America" | 0 | 0 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null |
201999000 | "Canada" | 20 | 1999 | "North America" | 0 | 0 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null |
202001000 | "Canada" | 20 | 2001 | "North America" | 0 | 0 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null |
202002000 | "Canada" | 20 | 2002 | "North America" | 0 | 0 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null |
202004000 | "Canada" | 20 | 2004 | "North America" | 0 | 0 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null |
"
],
"text/plain": [
"shape: (5, 33)\n",
"┌───────────┬─────────┬───────┬──────┬───┬─────────┬────────────┬──────────┬──────────────┐\n",
"│ id ┆ country ┆ ccode ┆ year ┆ … ┆ sources ┆ start_date ┆ end_date ┆ duration │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ str ┆ i64 ┆ i64 ┆ ┆ str ┆ date ┆ date ┆ duration[ms] │\n",
"╞═══════════╪═════════╪═══════╪══════╪═══╪═════════╪════════════╪══════════╪══════════════╡\n",
"│ 201998000 ┆ Canada ┆ 20 ┆ 1998 ┆ … ┆ null ┆ null ┆ null ┆ null │\n",
"│ 201999000 ┆ Canada ┆ 20 ┆ 1999 ┆ … ┆ null ┆ null ┆ null ┆ null │\n",
"│ 202001000 ┆ Canada ┆ 20 ┆ 2001 ┆ … ┆ null ┆ null ┆ null ┆ null │\n",
"│ 202002000 ┆ Canada ┆ 20 ┆ 2002 ┆ … ┆ null ┆ null ┆ null ┆ null │\n",
"│ 202004000 ┆ Canada ┆ 20 ┆ 2004 ┆ … ┆ null ┆ null ┆ null ┆ null │\n",
"└───────────┴─────────┴───────┴──────┴───┴─────────┴────────────┴──────────┴──────────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"protests.sort(by='duration', descending=True).head()"
]
},
{
"cell_type": "markdown",
"id": "14b276b4",
"metadata": {},
"source": [
"Oops, that gives us only rows with a `null` duration, let's put them last!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0108206c",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 33)id | country | ccode | year | region | protest | protestnumber | startday | startmonth | startyear | endday | endmonth | endyear | protesterviolence | location | participants_category | participants | protesteridentity | protesterdemand1 | protesterdemand2 | protesterdemand3 | protesterdemand4 | stateresponse1 | stateresponse2 | stateresponse3 | stateresponse4 | stateresponse5 | stateresponse6 | stateresponse7 | sources | start_date | end_date | duration |
---|
i64 | str | i64 | i64 | str | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | date | date | duration[ms] |
6002011005 | "Morocco" | 600 | 2011 | "MENA" | 1 | 5 | 1 | 7 | 2011 | 24 | 1 | 2014 | 0 | "Imider" | "50-99" | ">50" | "protesters, students, locals, … | "political behavior, process" | "price increases, tax policy" | null | null | "killings" | null | null | null | null | null | null | "on moroccan hill, villagers ma… | 2011-07-01 | 2014-01-24 | 939d |
7502014011 | "India" | 750 | 2014 | "Asia" | 1 | 11 | 3 | 6 | 2014 | 3 | 6 | 2016 | 1 | "Jawahar Bagh, a public garden … | "2000-4999" | "About 2,000 people" | "a group that was calling for h… | "political behavior, process" | "price increases, tax policy" | null | null | "crowd dispersal" | "killings" | "shootings" | null | null | null | null | "24 killed as armed protesters … | 2014-06-03 | 2016-06-03 | 732d |
3502013013 | "Greece" | 350 | 2013 | "Europe" | 1 | 13 | 22 | 7 | 2013 | 11 | 5 | 2015 | 0 | "Athens" | "100-999" | "100s" | "state paid cleaning ladies" | "labor wage dispute" | null | null | null | "accomodation" | null | null | null | null | null | "." | "germany urges greece to vote o… | 2013-07-22 | 2015-05-11 | 659d |
6522011003 | "Syria" | 652 | 2011 | "MENA" | 1 | 3 | 18 | 3 | 2011 | 3 | 12 | 2012 | 1 | "National" | null | "10000s " | null | "political behavior, process" | "removal of politician" | "police brutality" | null | "killings" | "shootings" | "beatings" | "crowd dispersal" | "arrests" | "accomodation" | null | "security forces restrained as … | 2011-03-18 | 2012-12-03 | 627d |
1552016001 | "Chile" | 155 | 2016 | "South America" | 1 | 1 | 15 | 5 | 2016 | 17 | 5 | 2017 | 0 | "Chiloe Island" | "50-99" | ">50" | "fishermen" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "1. toxic red tide in chile pro… | 2016-05-15 | 2017-05-17 | 368d |
"
],
"text/plain": [
"shape: (5, 33)\n",
"┌────────────┬─────────┬───────┬──────┬───┬───────────────┬────────────┬────────────┬──────────────┐\n",
"│ id ┆ country ┆ ccode ┆ year ┆ … ┆ sources ┆ start_date ┆ end_date ┆ duration │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ str ┆ i64 ┆ i64 ┆ ┆ str ┆ date ┆ date ┆ duration[ms] │\n",
"╞════════════╪═════════╪═══════╪══════╪═══╪═══════════════╪════════════╪════════════╪══════════════╡\n",
"│ 6002011005 ┆ Morocco ┆ 600 ┆ 2011 ┆ … ┆ on moroccan ┆ 2011-07-01 ┆ 2014-01-24 ┆ 939d │\n",
"│ ┆ ┆ ┆ ┆ ┆ hill, ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ villagers ma… ┆ ┆ ┆ │\n",
"│ 7502014011 ┆ India ┆ 750 ┆ 2014 ┆ … ┆ 24 killed as ┆ 2014-06-03 ┆ 2016-06-03 ┆ 732d │\n",
"│ ┆ ┆ ┆ ┆ ┆ armed ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ protesters … ┆ ┆ ┆ │\n",
"│ 3502013013 ┆ Greece ┆ 350 ┆ 2013 ┆ … ┆ germany urges ┆ 2013-07-22 ┆ 2015-05-11 ┆ 659d │\n",
"│ ┆ ┆ ┆ ┆ ┆ greece to ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ vote o… ┆ ┆ ┆ │\n",
"│ 6522011003 ┆ Syria ┆ 652 ┆ 2011 ┆ … ┆ security ┆ 2011-03-18 ┆ 2012-12-03 ┆ 627d │\n",
"│ ┆ ┆ ┆ ┆ ┆ forces ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ restrained as ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ … ┆ ┆ ┆ │\n",
"│ 1552016001 ┆ Chile ┆ 155 ┆ 2016 ┆ … ┆ 1. toxic red ┆ 2016-05-15 ┆ 2017-05-17 ┆ 368d │\n",
"│ ┆ ┆ ┆ ┆ ┆ tide in chile ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ pro… ┆ ┆ ┆ │\n",
"└────────────┴─────────┴───────┴──────┴───┴───────────────┴────────────┴────────────┴──────────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"protests.sort(by='duration', descending=True, nulls_last=True).head()"
]
},
{
"cell_type": "markdown",
"id": "1178b04b",
"metadata": {},
"source": [
"### Exercise - Longest Italian protests\n",
"\n",
"✪ Analyze data from `protests` to find the 5 longest protests in Italy:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5ddcfb06",
"metadata": {
"tags": [
"answer",
"hide-input"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 33)id | country | ccode | year | region | protest | protestnumber | startday | startmonth | startyear | endday | endmonth | endyear | protesterviolence | location | participants_category | participants | protesteridentity | protesterdemand1 | protesterdemand2 | protesterdemand3 | protesterdemand4 | stateresponse1 | stateresponse2 | stateresponse3 | stateresponse4 | stateresponse5 | stateresponse6 | stateresponse7 | sources | start_date | end_date | duration |
---|
i64 | str | i64 | i64 | str | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | date | date | duration[ms] |
3252002008 | "Italy" | 325 | 2002 | "Europe" | 1 | 8 | 20 | 5 | 2002 | 20 | 7 | 2002 | 0 | "Sicily" | "100-999" | "100s" | "peasants, farmers" | "land farm issue" | null | null | null | "crowd dispersal" | "arrests" | "accomodation" | null | null | null | null | "mafia sells water in a land hi… | 2002-05-20 | 2002-07-20 | 62d |
3252010003 | "Italy" | 325 | 2010 | "Europe" | 1 | 3 | 2 | 9 | 2010 | 23 | 10 | 2010 | 1 | "Terzigno and Boscotrecase" | "100-999" | "100s" | "residents" | "land farm issue" | "political behavior, process" | null | null | "crowd dispersal" | "arrests" | "accomodation" | null | null | null | null | "1 why italy s trash wars are s… | 2010-09-02 | 2010-10-23 | 52d |
3252017001 | "Italy" | 325 | 2017 | "Europe" | 1 | 1 | 24 | 3 | 2017 | 6 | 4 | 2017 | 0 | "Rome" | "100-999" | "100" | "residents" | "land farm issue" | null | null | null | "accomodation" | null | null | null | null | null | null | "new bulbs turn romans into zom… | 2017-03-24 | 2017-04-06 | 14d |
3252018001 | "Italy" | 325 | 2018 | "Europe" | 1 | 1 | 1 | 1 | 2018 | 9 | 1 | 2018 | 0 | "national" | "100-999" | "100" | "protesters" | "social restrictions" | null | null | null | "ignore" | null | null | null | null | null | null | "biodegradable bags cause outra… | 2018-01-01 | 2018-01-09 | 9d |
3252017006 | "Italy" | 325 | 2017 | "Europe" | 1 | 6 | 15 | 7 | 2017 | 22 | 7 | 2017 | 0 | "Sicily" | "50-99" | "50" | "five star movement" | "land farm issue" | null | null | null | "ignore" | null | null | null | null | null | null | "italian town loses patience wi… | 2017-07-15 | 2017-07-22 | 8d |
"
],
"text/plain": [
"shape: (5, 33)\n",
"┌────────────┬─────────┬───────┬──────┬───┬───────────────┬────────────┬────────────┬──────────────┐\n",
"│ id ┆ country ┆ ccode ┆ year ┆ … ┆ sources ┆ start_date ┆ end_date ┆ duration │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ str ┆ i64 ┆ i64 ┆ ┆ str ┆ date ┆ date ┆ duration[ms] │\n",
"╞════════════╪═════════╪═══════╪══════╪═══╪═══════════════╪════════════╪════════════╪══════════════╡\n",
"│ 3252002008 ┆ Italy ┆ 325 ┆ 2002 ┆ … ┆ mafia sells ┆ 2002-05-20 ┆ 2002-07-20 ┆ 62d │\n",
"│ ┆ ┆ ┆ ┆ ┆ water in a ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ land hi… ┆ ┆ ┆ │\n",
"│ 3252010003 ┆ Italy ┆ 325 ┆ 2010 ┆ … ┆ 1 why italy s ┆ 2010-09-02 ┆ 2010-10-23 ┆ 52d │\n",
"│ ┆ ┆ ┆ ┆ ┆ trash wars ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ are s… ┆ ┆ ┆ │\n",
"│ 3252017001 ┆ Italy ┆ 325 ┆ 2017 ┆ … ┆ new bulbs ┆ 2017-03-24 ┆ 2017-04-06 ┆ 14d │\n",
"│ ┆ ┆ ┆ ┆ ┆ turn romans ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ into zom… ┆ ┆ ┆ │\n",
"│ 3252018001 ┆ Italy ┆ 325 ┆ 2018 ┆ … ┆ biodegradable ┆ 2018-01-01 ┆ 2018-01-09 ┆ 9d │\n",
"│ ┆ ┆ ┆ ┆ ┆ bags cause ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ outra… ┆ ┆ ┆ │\n",
"│ 3252017006 ┆ Italy ┆ 325 ┆ 2017 ┆ … ┆ italian town ┆ 2017-07-15 ┆ 2017-07-22 ┆ 8d │\n",
"│ ┆ ┆ ┆ ┆ ┆ loses ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ patience wi… ┆ ┆ ┆ │\n",
"└────────────┴─────────┴───────┴──────┴───┴───────────────┴────────────┴────────────┴──────────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The two following ways are equivalent, the first might save you a few keystrokes.\n",
"protests.filter(pl.col(\"country\") == \"Italy\").sort(by=\"duration\").tail().reverse()\n",
"\n",
"protests.filter(pl.col(\"country\") == \"Italy\").sort(\n",
" by=\"duration\", descending=True, nulls_last=True\n",
").head()"
]
},
{
"cell_type": "markdown",
"id": "531b32db",
"metadata": {},
"source": [
"## Operations on strings\n",
"\n",
"To perform operations on `string` columns, we need to use the `.str` accessor after selecting the right column. This gives you access to a number of methods, some similar to known Python method to manipulate strings, some not. For more details, head over to [the documentation](https://docs.pola.rs/api/python/stable/reference/expressions/string.html)."
]
},
{
"cell_type": "markdown",
"id": "e2cfe917",
"metadata": {},
"source": [
"### Filter by textual values\n",
"\n",
"When we want to filter by text values, we can use `.str.contains`, here for example we select all the protests in which the source article mentioned a \"riot\":"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c2f2d672",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 33)id | country | ccode | year | region | protest | protestnumber | startday | startmonth | startyear | endday | endmonth | endyear | protesterviolence | location | participants_category | participants | protesteridentity | protesterdemand1 | protesterdemand2 | protesterdemand3 | protesterdemand4 | stateresponse1 | stateresponse2 | stateresponse3 | stateresponse4 | stateresponse5 | stateresponse6 | stateresponse7 | sources | start_date | end_date | duration |
---|
i64 | str | i64 | i64 | str | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | str | date | date | duration[ms] |
201992001 | "Canada" | 20 | 1992 | "North America" | 1 | 1 | 4 | 5 | 1992 | 5 | 5 | 1992 | 1 | "Toronto" | null | "1000" | "gangs of black and white youth… | "police brutality" | null | null | null | "arrests" | null | null | null | null | null | null | "1. window smashing and looting… | 1992-05-04 | 1992-05-05 | 2d |
202008001 | "Canada" | 20 | 2008 | "North America" | 1 | 1 | 10 | 8 | 2008 | 10 | 8 | 2008 | 1 | "Montreal, Quebec" | null | "100s" | "unspecified" | "police brutality" | null | null | null | "crowd dispersal" | null | null | null | null | null | null | "1. canada: rioting in montreal… | 2008-08-10 | 2008-08-10 | 1d |
202012001 | "Canada" | 20 | 2012 | "North America" | 1 | 1 | 10 | 2 | 2012 | 6 | 6 | 2012 | 1 | "Quebec" | null | "1000s" | "university students" | "price increases, tax policy" | null | null | null | "crowd dispersal" | "arrests" | null | null | null | null | null | "1. quebec plans to suspend cla… | 2012-02-10 | 2012-06-06 | 118d |
401994001 | "Cuba" | 40 | 1994 | "North America" | 1 | 1 | 5 | 8 | 1994 | 5 | 8 | 1994 | 1 | "Havana" | null | "1000" | "unspecified" | "political behavior, process" | null | null | null | "arrests" | null | null | null | null | null | null | "1. protesters battle police in… | 1994-08-05 | 1994-08-05 | 1d |
411995002 | "Haiti" | 41 | 1995 | "North America" | 1 | 2 | 23 | 11 | 1995 | 23 | 11 | 1995 | 1 | "Cite Soleil" | null | "50" | "residents of cite soleil" | "police brutality" | null | null | null | "ignore" | null | null | null | null | null | null | "1. fatal clashes in haiti slum… | 1995-11-23 | 1995-11-23 | 1d |
"
],
"text/plain": [
"shape: (5, 33)\n",
"┌───────────┬─────────┬───────┬──────┬───┬────────────────┬────────────┬────────────┬──────────────┐\n",
"│ id ┆ country ┆ ccode ┆ year ┆ … ┆ sources ┆ start_date ┆ end_date ┆ duration │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ str ┆ i64 ┆ i64 ┆ ┆ str ┆ date ┆ date ┆ duration[ms] │\n",
"╞═══════════╪═════════╪═══════╪══════╪═══╪════════════════╪════════════╪════════════╪══════════════╡\n",
"│ 201992001 ┆ Canada ┆ 20 ┆ 1992 ┆ … ┆ 1. window ┆ 1992-05-04 ┆ 1992-05-05 ┆ 2d │\n",
"│ ┆ ┆ ┆ ┆ ┆ smashing and ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ looting… ┆ ┆ ┆ │\n",
"│ 202008001 ┆ Canada ┆ 20 ┆ 2008 ┆ … ┆ 1. canada: ┆ 2008-08-10 ┆ 2008-08-10 ┆ 1d │\n",
"│ ┆ ┆ ┆ ┆ ┆ rioting in ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ montreal… ┆ ┆ ┆ │\n",
"│ 202012001 ┆ Canada ┆ 20 ┆ 2012 ┆ … ┆ 1. quebec ┆ 2012-02-10 ┆ 2012-06-06 ┆ 118d │\n",
"│ ┆ ┆ ┆ ┆ ┆ plans to ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ suspend cla… ┆ ┆ ┆ │\n",
"│ 401994001 ┆ Cuba ┆ 40 ┆ 1994 ┆ … ┆ 1. protesters ┆ 1994-08-05 ┆ 1994-08-05 ┆ 1d │\n",
"│ ┆ ┆ ┆ ┆ ┆ battle police ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ in… ┆ ┆ ┆ │\n",
"│ 411995002 ┆ Haiti ┆ 41 ┆ 1995 ┆ … ┆ 1. fatal ┆ 1995-11-23 ┆ 1995-11-23 ┆ 1d │\n",
"│ ┆ ┆ ┆ ┆ ┆ clashes in ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ haiti slum… ┆ ┆ ┆ │\n",
"└───────────┴─────────┴───────┴──────┴───┴────────────────┴────────────┴────────────┴──────────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"protests.filter(pl.col(\"sources\").str.contains(\"riot\")).head()"
]
},
{
"cell_type": "markdown",
"id": "9102e1e0",
"metadata": {},
"source": [
"### Extracting strings\n",
"\n",
"We can also use it, for instance, to extract a rough estimate of the number of participants as an integer, as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5d6c1233",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 33)id | country | ccode | year | region | protest | protestnumber | startday | startmonth | startyear | endday | endmonth | endyear | protesterviolence | location | participants_category | participants | protesteridentity | protesterdemand1 | protesterdemand2 | protesterdemand3 | protesterdemand4 | stateresponse1 | stateresponse2 | stateresponse3 | stateresponse4 | stateresponse5 | stateresponse6 | stateresponse7 | sources | start_date | end_date | duration |
---|
i64 | str | i64 | i64 | str | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | i64 | str | str | i64 | str | str | str | str | str | str | str | str | str | str | str | str | str | date | date | duration[ms] |
201990001 | "Canada" | 20 | 1990 | "North America" | 1 | 1 | 15 | 1 | 1990 | 15 | 1 | 1990 | 0 | "national" | null | 1000 | "unspecified" | "political behavior, process" | "labor wage dispute" | null | null | "ignore" | null | null | null | null | null | null | "1. great canadian train journe… | 1990-01-15 | 1990-01-15 | 1d |
201990002 | "Canada" | 20 | 1990 | "North America" | 1 | 2 | 25 | 6 | 1990 | 25 | 6 | 1990 | 0 | "Montreal, Quebec" | null | 1000 | "unspecified" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "1. autonomy s cry revived in q… | 1990-06-25 | 1990-06-25 | 1d |
201990003 | "Canada" | 20 | 1990 | "North America" | 1 | 3 | 1 | 7 | 1990 | 1 | 7 | 1990 | 0 | "Montreal, Quebec" | null | 500 | "separatist parti quebecois" | "political behavior, process" | null | null | null | "ignore" | null | null | null | null | null | null | "1. quebec protest after queen … | 1990-07-01 | 1990-07-01 | 1d |
201990004 | "Canada" | 20 | 1990 | "North America" | 1 | 4 | 12 | 7 | 1990 | 6 | 9 | 1990 | 1 | "Montreal, Quebec" | null | 100 | "mohawk indians" | "land farm issue" | null | null | null | "accomodation" | null | null | null | null | null | null | "1. indians gather as siege int… | 1990-07-12 | 1990-09-06 | 57d |
201990005 | "Canada" | 20 | 1990 | "North America" | 1 | 5 | 14 | 8 | 1990 | 15 | 8 | 1990 | 1 | "Montreal, Quebec" | null | 950 | "local residents" | "political behavior, process" | null | null | null | "crowd dispersal" | "arrests" | "accomodation" | null | null | null | null | "1. dozens hurt in mohawk block… | 1990-08-14 | 1990-08-15 | 2d |
"
],
"text/plain": [
"shape: (5, 33)\n",
"┌───────────┬─────────┬───────┬──────┬───┬────────────────┬────────────┬────────────┬──────────────┐\n",
"│ id ┆ country ┆ ccode ┆ year ┆ … ┆ sources ┆ start_date ┆ end_date ┆ duration │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ str ┆ i64 ┆ i64 ┆ ┆ str ┆ date ┆ date ┆ duration[ms] │\n",
"╞═══════════╪═════════╪═══════╪══════╪═══╪════════════════╪════════════╪════════════╪══════════════╡\n",
"│ 201990001 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ 1. great ┆ 1990-01-15 ┆ 1990-01-15 ┆ 1d │\n",
"│ ┆ ┆ ┆ ┆ ┆ canadian train ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ journe… ┆ ┆ ┆ │\n",
"│ 201990002 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ 1. autonomy s ┆ 1990-06-25 ┆ 1990-06-25 ┆ 1d │\n",
"│ ┆ ┆ ┆ ┆ ┆ cry revived in ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ q… ┆ ┆ ┆ │\n",
"│ 201990003 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ 1. quebec ┆ 1990-07-01 ┆ 1990-07-01 ┆ 1d │\n",
"│ ┆ ┆ ┆ ┆ ┆ protest after ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ queen … ┆ ┆ ┆ │\n",
"│ 201990004 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ 1. indians ┆ 1990-07-12 ┆ 1990-09-06 ┆ 57d │\n",
"│ ┆ ┆ ┆ ┆ ┆ gather as ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ siege int… ┆ ┆ ┆ │\n",
"│ 201990005 ┆ Canada ┆ 20 ┆ 1990 ┆ … ┆ 1. dozens hurt ┆ 1990-08-14 ┆ 1990-08-15 ┆ 2d │\n",
"│ ┆ ┆ ┆ ┆ ┆ in mohawk ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ block… ┆ ┆ ┆ │\n",
"└───────────┴─────────┴───────┴──────┴───┴────────────────┴────────────┴────────────┴──────────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"protests.with_columns(pl.col('participants').str.extract(\"([0-9]+)\").cast(int)).head()"
]
},
{
"cell_type": "markdown",
"id": "71acd988",
"metadata": {},
"source": [
"```{note}\n",
"`\"([0-9]+)\"` is a regular expression. A whole course could be dedicated to them, so for now, you should just know that this particular one extracts any sequence of integers from a string. They are a very practical tool, so I wanted to make you aware of their existence, but you don't need to know more than that for this course.\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "f81dac71",
"metadata": {},
"source": [
"```{note}\n",
"There is a bunch of other data types with a special accessor, but we won't cover them here.\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "f86ebf19",
"metadata": {},
"source": [
"### Exercise - Game of Thrones\n",
"\n",
"Open with Polars the file [game-of-thrones.csv](data/game-of-thrones.csv) which holds episodes from various years.\n",
"\n",
"- **IMPORTANT**: place the dataframe into the variable `game`, so not to confuse it with previous dataframes\n",
"\n",
"Data source: [Kaggle](https://www.kaggle.com/datasets/bakar31/game-of-thronesgot) - License: [CC0: Public Domain](https://creativecommons.org/publicdomain/zero/1.0/)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "be2687b8",
"metadata": {
"tags": [
"answer",
"hide-input"
]
},
"outputs": [],
"source": [
"game = pl.read_csv('data/game-of-thrones.csv')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ff3ecec9",
"metadata": {
"tags": [
"remove-cell"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (5, 10)No. overall | No. in season | Season | Title | Directed by | Written by | Novel(s) adapted | Original air date | U.S. viewers(millions) | Imdb rating |
---|
i64 | i64 | i64 | str | str | str | str | str | f64 | f64 |
1 | 1 | 1 | ""Winter Is Coming"" | "Tim Van Patten" | "David Benioff & D. B. Weiss" | "A Game of Thrones" | "17-Apr-11" | 2.22 | 9.1 |
2 | 2 | 1 | ""The Kingsroad"" | "Tim Van Patten" | "David Benioff & D. B. Weiss" | "A Game of Thrones" | "24-Apr-11" | 2.2 | 8.8 |
3 | 3 | 1 | ""Lord Snow"" | "Brian Kirk" | "David Benioff & D. B. Weiss" | "A Game of Thrones" | "1-May-11" | 2.44 | 8.7 |
4 | 4 | 1 | ""Cripples, Bastards, and Broke… | "Brian Kirk" | "Bryan Cogman" | "A Game of Thrones" | "8-May-11" | 2.45 | 8.8 |
5 | 5 | 1 | ""The Wolf and the Lion"" | "Brian Kirk" | "David Benioff & D. B. Weiss" | "A Game of Thrones" | "15-May-11" | 2.58 | 9.1 |
"
],
"text/plain": [
"shape: (5, 10)\n",
"┌─────────┬────────┬────────┬──────────────┬───┬──────────────┬─────────────┬─────────────┬────────┐\n",
"│ No. ┆ No. in ┆ Season ┆ Title ┆ … ┆ Novel(s) ┆ Original ┆ U.S. viewer ┆ Imdb │\n",
"│ overall ┆ season ┆ --- ┆ --- ┆ ┆ adapted ┆ air date ┆ s(millions) ┆ rating │\n",
"│ --- ┆ --- ┆ i64 ┆ str ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ i64 ┆ ┆ ┆ ┆ str ┆ str ┆ f64 ┆ f64 │\n",
"╞═════════╪════════╪════════╪══════════════╪═══╪══════════════╪═════════════╪═════════════╪════════╡\n",
"│ 1 ┆ 1 ┆ 1 ┆ \"Winter Is ┆ … ┆ A Game of ┆ 17-Apr-11 ┆ 2.22 ┆ 9.1 │\n",
"│ ┆ ┆ ┆ Coming\" ┆ ┆ Thrones ┆ ┆ ┆ │\n",
"│ 2 ┆ 2 ┆ 1 ┆ \"The ┆ … ┆ A Game of ┆ 24-Apr-11 ┆ 2.2 ┆ 8.8 │\n",
"│ ┆ ┆ ┆ Kingsroad\" ┆ ┆ Thrones ┆ ┆ ┆ │\n",
"│ 3 ┆ 3 ┆ 1 ┆ \"Lord Snow\" ┆ … ┆ A Game of ┆ 1-May-11 ┆ 2.44 ┆ 8.7 │\n",
"│ ┆ ┆ ┆ ┆ ┆ Thrones ┆ ┆ ┆ │\n",
"│ 4 ┆ 4 ┆ 1 ┆ \"Cripples, ┆ … ┆ A Game of ┆ 8-May-11 ┆ 2.45 ┆ 8.8 │\n",
"│ ┆ ┆ ┆ Bastards, ┆ ┆ Thrones ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ and Broke… ┆ ┆ ┆ ┆ ┆ │\n",
"│ 5 ┆ 5 ┆ 1 ┆ \"The Wolf ┆ … ┆ A Game of ┆ 15-May-11 ┆ 2.58 ┆ 9.1 │\n",
"│ ┆ ┆ ┆ and the ┆ ┆ Thrones ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ Lion\" ┆ ┆ ┆ ┆ ┆ │\n",
"└─────────┴────────┴────────┴──────────────┴───┴──────────────┴─────────────┴─────────────┴────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"game.head()"
]
},
{
"cell_type": "markdown",
"id": "ecc8dc3d",
"metadata": {},
"source": [
"#### Favorite episodes \n",
"\n",
"You are given a dictionary `favorite` with the most liked episodes of a group of people, who unfortunately don't remember exactly the various titles which are often incomplete: Select the favorite episodes of Paolo and Chiara.\n",
"\n",
"- assume the capitalization in `favorite` is the correct one\n",
"- **NOTE**: the dataset contains insidious double quotes around the titles, but if you write the code in the right way it shouldn't be a problem"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ec39a0dd",
"metadata": {},
"outputs": [],
"source": [
"favorite = {\n",
" \"Paolo\": \"Winter Is\",\n",
" \"Chiara\": \"Wolf and the Lion\",\n",
" \"Anselmo\": \"Fire and\",\n",
" \"Letizia\": \"Garden of\",\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2395c320",
"metadata": {
"tags": [
"hide-input",
"answer"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (2, 10)No. overall | No. in season | Season | Title | Directed by | Written by | Novel(s) adapted | Original air date | U.S. viewers(millions) | Imdb rating |
---|
i64 | i64 | i64 | str | str | str | str | str | f64 | f64 |
1 | 1 | 1 | ""Winter Is Coming"" | "Tim Van Patten" | "David Benioff & D. B. Weiss" | "A Game of Thrones" | "17-Apr-11" | 2.22 | 9.1 |
5 | 5 | 1 | ""The Wolf and the Lion"" | "Brian Kirk" | "David Benioff & D. B. Weiss" | "A Game of Thrones" | "15-May-11" | 2.58 | 9.1 |
"
],
"text/plain": [
"shape: (2, 10)\n",
"┌─────────┬────────┬────────┬──────────────┬───┬──────────────┬─────────────┬─────────────┬────────┐\n",
"│ No. ┆ No. in ┆ Season ┆ Title ┆ … ┆ Novel(s) ┆ Original ┆ U.S. viewer ┆ Imdb │\n",
"│ overall ┆ season ┆ --- ┆ --- ┆ ┆ adapted ┆ air date ┆ s(millions) ┆ rating │\n",
"│ --- ┆ --- ┆ i64 ┆ str ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ i64 ┆ ┆ ┆ ┆ str ┆ str ┆ f64 ┆ f64 │\n",
"╞═════════╪════════╪════════╪══════════════╪═══╪══════════════╪═════════════╪═════════════╪════════╡\n",
"│ 1 ┆ 1 ┆ 1 ┆ \"Winter Is ┆ … ┆ A Game of ┆ 17-Apr-11 ┆ 2.22 ┆ 9.1 │\n",
"│ ┆ ┆ ┆ Coming\" ┆ ┆ Thrones ┆ ┆ ┆ │\n",
"│ 5 ┆ 5 ┆ 1 ┆ \"The Wolf ┆ … ┆ A Game of ┆ 15-May-11 ┆ 2.58 ┆ 9.1 │\n",
"│ ┆ ┆ ┆ and the ┆ ┆ Thrones ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ Lion\" ┆ ┆ ┆ ┆ ┆ │\n",
"└─────────┴────────┴────────┴──────────────┴───┴──────────────┴─────────────┴─────────────┴────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fav_episodes = game.filter(\n",
" pl.col(\"Title\").str.contains(favorite[\"Paolo\"])\n",
" | pl.col(\"Title\").str.contains(favorite[\"Chiara\"])\n",
")\n",
"fav_episodes"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "86d941ff",
"metadata": {
"tags": [
"remove-cell"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (2, 10)No. overall | No. in season | Season | Title | Directed by | Written by | Novel(s) adapted | Original air date | U.S. viewers(millions) | Imdb rating |
---|
i64 | i64 | i64 | str | str | str | str | str | f64 | f64 |
1 | 1 | 1 | ""Winter Is Coming"" | "Tim Van Patten" | "David Benioff & D. B. Weiss" | "A Game of Thrones" | "17-Apr-11" | 2.22 | 9.1 |
5 | 5 | 1 | ""The Wolf and the Lion"" | "Brian Kirk" | "David Benioff & D. B. Weiss" | "A Game of Thrones" | "15-May-11" | 2.58 | 9.1 |
"
],
"text/plain": [
"shape: (2, 10)\n",
"┌─────────┬────────┬────────┬──────────────┬───┬──────────────┬─────────────┬─────────────┬────────┐\n",
"│ No. ┆ No. in ┆ Season ┆ Title ┆ … ┆ Novel(s) ┆ Original ┆ U.S. viewer ┆ Imdb │\n",
"│ overall ┆ season ┆ --- ┆ --- ┆ ┆ adapted ┆ air date ┆ s(millions) ┆ rating │\n",
"│ --- ┆ --- ┆ i64 ┆ str ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ i64 ┆ ┆ ┆ ┆ str ┆ str ┆ f64 ┆ f64 │\n",
"╞═════════╪════════╪════════╪══════════════╪═══╪══════════════╪═════════════╪═════════════╪════════╡\n",
"│ 1 ┆ 1 ┆ 1 ┆ \"Winter Is ┆ … ┆ A Game of ┆ 17-Apr-11 ┆ 2.22 ┆ 9.1 │\n",
"│ ┆ ┆ ┆ Coming\" ┆ ┆ Thrones ┆ ┆ ┆ │\n",
"│ 5 ┆ 5 ┆ 1 ┆ \"The Wolf ┆ … ┆ A Game of ┆ 15-May-11 ┆ 2.58 ┆ 9.1 │\n",
"│ ┆ ┆ ┆ and the ┆ ┆ Thrones ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ Lion\" ┆ ┆ ┆ ┆ ┆ │\n",
"└─────────┴────────┴────────┴──────────────┴───┴──────────────┴─────────────┴─────────────┴────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fav_episodes"
]
},
{
"cell_type": "markdown",
"id": "e7598c5b",
"metadata": {},
"source": [
"#### First airing\n",
"\n",
"Write a function `first_aired` that returns a dataframe containing all the episodes which have been aired the first time in a given `year` (`Original air date` column)\n",
"\n",
"- **NOTE**: `year` is given as an `int`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5b7f5d5e",
"metadata": {
"tags": [
"answer",
"hide-input"
]
},
"outputs": [],
"source": [
"def first_aired(df, year):\n",
" return df.filter(pl.col(\"Original air date\").str.slice(-2) == str(year)[2:])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e8e9a1d7",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"
shape: (6, 10)No. overall | No. in season | Season | Title | Directed by | Written by | Novel(s) adapted | Original air date | U.S. viewers(millions) | Imdb rating |
---|
i64 | i64 | i64 | str | str | str | str | str | f64 | f64 |
62 | 2 | 7 | ""Stormborn"" | "Mark Mylod" | "Bryan Cogman" | "Outline from A Dream of Spring… | "23-Jul-17" | 9.27 | 8.9 |
63 | 3 | 7 | ""The Queen's Justice"" | "Mark Mylod" | "David Benioff & D. B. Weiss" | "Outline from A Dream of Spring… | "30-Jul-17" | 9.25 | 9.2 |
64 | 4 | 7 | ""The Spoils of War"" | "Matt Shakman" | "David Benioff & D. B. Weiss" | "Outline from A Dream of Spring… | "6-Aug-17" | 10.17 | 9.8 |
65 | 5 | 7 | ""Eastwatch"" | "Matt Shakman" | "Dave Hill" | "Outline from A Dream of Spring… | "13-Aug-17" | 10.72 | 8.8 |
66 | 6 | 7 | ""Beyond the Wall"" | "Alan Taylor" | "David Benioff & D. B. Weiss" | "Outline from A Dream of Spring… | "20-Aug-17" | 10.24 | 9.0 |
67 | 7 | 7 | ""The Dragon and the Wolf"" | "Jeremy Podeswa" | "David Benioff & D. B. Weiss" | "Outline from A Dream of Spring… | "27-Aug-17" | 12.07 | 9.4 |
"
],
"text/plain": [
"shape: (6, 10)\n",
"┌─────────┬────────┬────────┬───────────────┬───┬──────────┬───────────────┬──────────────┬────────┐\n",
"│ No. ┆ No. in ┆ Season ┆ Title ┆ … ┆ Novel(s) ┆ Original air ┆ U.S. viewers ┆ Imdb │\n",
"│ overall ┆ season ┆ --- ┆ --- ┆ ┆ adapted ┆ date ┆ (millions) ┆ rating │\n",
"│ --- ┆ --- ┆ i64 ┆ str ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n",
"│ i64 ┆ i64 ┆ ┆ ┆ ┆ str ┆ str ┆ f64 ┆ f64 │\n",
"╞═════════╪════════╪════════╪═══════════════╪═══╪══════════╪═══════════════╪══════════════╪════════╡\n",
"│ 62 ┆ 2 ┆ 7 ┆ \"Stormborn\" ┆ … ┆ Outline ┆ 23-Jul-17 ┆ 9.27 ┆ 8.9 │\n",
"│ ┆ ┆ ┆ ┆ ┆ from A ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ Dream of ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ Spring… ┆ ┆ ┆ │\n",
"│ 63 ┆ 3 ┆ 7 ┆ \"The Queen's ┆ … ┆ Outline ┆ 30-Jul-17 ┆ 9.25 ┆ 9.2 │\n",
"│ ┆ ┆ ┆ Justice\" ┆ ┆ from A ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ Dream of ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ Spring… ┆ ┆ ┆ │\n",
"│ 64 ┆ 4 ┆ 7 ┆ \"The Spoils ┆ … ┆ Outline ┆ 6-Aug-17 ┆ 10.17 ┆ 9.8 │\n",
"│ ┆ ┆ ┆ of War\" ┆ ┆ from A ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ Dream of ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ Spring… ┆ ┆ ┆ │\n",
"│ 65 ┆ 5 ┆ 7 ┆ \"Eastwatch\" ┆ … ┆ Outline ┆ 13-Aug-17 ┆ 10.72 ┆ 8.8 │\n",
"│ ┆ ┆ ┆ ┆ ┆ from A ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ Dream of ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ Spring… ┆ ┆ ┆ │\n",
"│ 66 ┆ 6 ┆ 7 ┆ \"Beyond the ┆ … ┆ Outline ┆ 20-Aug-17 ┆ 10.24 ┆ 9.0 │\n",
"│ ┆ ┆ ┆ Wall\" ┆ ┆ from A ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ Dream of ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ Spring… ┆ ┆ ┆ │\n",
"│ 67 ┆ 7 ┆ 7 ┆ \"The Dragon ┆ … ┆ Outline ┆ 27-Aug-17 ┆ 12.07 ┆ 9.4 │\n",
"│ ┆ ┆ ┆ and the Wolf\" ┆ ┆ from A ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ Dream of ┆ ┆ ┆ │\n",
"│ ┆ ┆ ┆ ┆ ┆ Spring… ┆ ┆ ┆ │\n",
"└─────────┴────────┴────────┴───────────────┴───┴──────────┴───────────────┴──────────────┴────────┘"
]
},
"execution_count": null,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"first_aired(game, 2017)"
]
},
{
"cell_type": "markdown",
"id": "896fd8bb",
"metadata": {},
"source": [
"## A parallel with spreadsheets\n",
"\n",
"You're probably familiar with some spreadsheets software, such as Excel or Google Sheets.\n",
"To make an analogy, you can picture a dataframe as a single sheet, which in Python is stored as a variable.\n",
"Then, if you do not reassign this variable, the dataframe stays the same -- so in the analogy, the sheet is untouched.\n",
"So while in a spreadsheet, you'd compute for instance the sum of a column in a cell on the same sheet, in Python, you'd assign the result of the sum to a new variable, that is, to a new sheet.\n",
"Same goes for filtering or reordering the rows: if you do not explicitly override the original variable, the original data stays intact."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {
"height": "calc(100% - 180px)",
"left": "10px",
"top": "150px",
"width": "192.188px"
},
"toc_section_display": true,
"toc_window_display": true
}
},
"nbformat": 4,
"nbformat_minor": 5
}