About¶

CensusWrangler is the fast python API to your local copy of the Australian Census data.

The use-case¶

If you have any interest or career involving Australian data, you’ve likely had to deal with the challenging Census data structures.

In the government’s interest of maintaining privacy for all involved, they provide data as a series of hundreds of tables each at different levels of geographic aggregation.

My experience has been that getting the right slice of data out this structure can be tedious and time-consuming. It drags out what would otherwise be a quick piece of adhoc analysis.

If you’re lucky - your organisation already has a paid API, or database of the census data. But if not?

You can speed up the process with the CensusWrangler library. With quick & templatable configurations it helps you efficiently pull data out of the downloadable census datapacks.

Features¶

Configuration templates: Deploy and quickly customise configuration csvs - then let CensusWrangler instally find and merge the data
Re-use configs across geographies: Change a single argument to re-pull data from a different geography without any additional work
Validation & Checks: Your config is checked against census metadata, with detailed errors letting you know what went wrong
Built-in Grouping: Make easier for yourself down the line by setting groups and your own column naming directly in the config file
Customisable output: Select from several convenient output methods, accessing the output quickly in your desired format
On you local machine: Once you have downloaded the datapack, the library requires no access to anything other than what is on your local machine

Set-up¶

Download a census datapack¶

Visit the the ABS Census DataPacks page & download a datapack .zip
Extract the .zip into a single folder, containing nothing else

When you are ready, your datapack folder should look something like this, with the name of the downloaded pack:

/2021_GCP_all_for_AUS_short-header
    ├── 2021 Census GCP All Geographies for AUS/
    ├── Metadata/
    ├── Readme/

Note

CensusWrangler is fully tested on the 2021 census datapacks, and basic testing shows it working with previous years as well

Install¶

pip install censuswrangler

Import¶

import censuswrangler as cw

Usage¶

Preparing a Config Template¶

We tell CensusWrangler what data to grab by preparing a config csv:

Generate a config template¶

You can quickly generate a template to get started:

cw.create_config_template(r"C:\Config Folder\")

Config fields¶

Below is an sample example of the config csv`

The first 3 fields come straight from the metadata:

SHORT - Short descriptor
LONG - Long descriptor
DATAPACKFILE - A code indicating which datapack file contains the field

The other 2 are used to customise, group & simplify the names of fields in the CensusWrangler output:

CUSTOM_DESCRIPTION - Describes the data subset represented, Like ‘Male’ & ‘Female’
CUSTOM_GROUP - Describes the subsets, like ‘Gender’

Put whatever you need in the custom fields, just make sure that each row is unique.

SHORT	LONG	DATAPACKFILE	CUSTOM_DESCRIPTION	CUSTOM_GROUP
Tot_P_M	Total_Persons_Males	G01	Male	Gender
Tot_P_F	Total_Persons_Females	G01	Female	Gender
P_Tot_Marrd_reg_marrge	PERSONS_Total_Married_in_a_registered_marriage	G06	Married	Relationship Type
P_Tot_Married_de_facto	PERSONS_Total_Married_in_a_de_facto_marriage	G06	Couple	Relationship Type
P_Tot_Not_married	PERSONS_Total_Not_married	G06	No relationship	Relationship Type

Referencing Metadata¶

It’s super easy to copy what you want out of the metadata file that comes with each datapack.

In the datapack folder, look in the /Metadata/ folder for a file like Metadata_2021_GCP_DataPack_R1_R2.xlsx
Go to the Cell Descriptors Information sheet
Browse the fields, and copy over the SHORT, LONG & DATAPACKFILE columns for your fields you want, into the config file
Fill in the custom fields in the remaining columns of the config file

This file is also used to validate your config selections, so try & avoiding changing it as you go.

Select a Census Geography¶

Visit the ABS Census Geography Glossary.

Determine the shortcode for the geography you are after. For example, ‘Statistical Area Level 1’ has code ‘SA1’.

This is also reflected by the folder names in the datapack - look for the name like \2021 Census GCP All Geographies for AUS\.

Usage¶

Once the config file is ready, you can run CensusWrangler with just a few lines of code:

import censuswrangler as cw

# Intialise the Census object
census = cw.Census(
    datapack_path=r"E:/Data/2021_GCP_all_for_AUS_short-header/",  # Datapack folder path
    config_path=r"censuswrangler/config_template.csv",  # Config file path
    geo_type="LGA",  # The geotype code to pull the data for
    year=2021,  # The census year
)

# Gather and prepare the data from the datapack
census.wrangle("all")  # "merge" | "pivot" | "all"

# Access the output dataframes in the desired format
print(census.merged_df)
print(census.pivoted_df)

# Or output directly to csv
census.to_csv(
    "all",  # "merge" | "pivot" | "all"
    r"F:/Github/censuswrangler/test_output",  # Output folder
)

More details are available in the documentation.

Example Output¶

You can see example output over in the repository’s sample folder.

Good luck - and don’t forget to give the repository a star if this helped you out (it all helps!).