Census

The census module provides the Census class, which encapsulates common configuration and datapack objects, and provides methods to gather, filter, join, and output census data based on specified configurations.

class census.Census

Bases: object

The Census class combines a configuration object and a datapack description so that census data can be gathered, filtered, joined, and written out from a single object. It supports merging and pivoting dataframes, and provides methods to validate and output the processed data.

datapack_path

Path to the folder containing the census datapack.

Type:

str

config_path

Path to the configuration file.

Type:

str

geo_type

The spatial aggregation sub-folder to target (e.g., LGA, SA2).

Type:

str

year

The census year used to identify columns in the datapack.

Type:

int

col_type

The type of column output to use, either ‘short’ or ‘long’. Defaults to ‘short’.

Type:

str

affix_type

Specifies whether to add a ‘prefix’, ‘suffix’, or ‘none’ to column names. Defaults to ‘prefix’.

Type:

str

config

An instance of the Config class, representing the configuration file.

Type:

Config

pack

A dictionary containing all the information needed to work on the datapack.

Type:

dict

data

An instance of the Data class, built from the target geo and configuration.

Type:

Data

merged_df

Holds the merged data once the wrangle method has been called with mode ‘merge’ or ‘all’.

Type:

Optional[pd.DataFrame]

pivoted_df

Holds the pivoted data once the wrangle method has been called with mode ‘pivot’ or ‘all’.

Type:

Optional[pd.DataFrame]

__init__(
datapack_path: str,
config_path: str,
geo_type: str,
year: int,
col_type: str = 'short',
affix_type: str = 'prefix',
)

Initializes a Census instance with the specified datapack and configuration paths, geographic type, year, column type, and affix type. It also validates col_type and affix_type and prepares the objects needed for data processing (config, pack, and data).

Parameters:
  • datapack_path (str) – Path to the folder containing the census datapack.

  • config_path (str) – Path to the configuration file.

  • geo_type (str) – The spatial aggregation sub-folder to target (e.g., LGA, SA2).

  • year (int) – The census year used to identify columns in the datapack.

  • col_type (str, optional) – The type of column output to use, either ‘short’ or ‘long’. Defaults to ‘short’.

  • affix_type (str, optional) – Specifies whether to add a ‘prefix’, ‘suffix’, or ‘none’ to column names. Defaults to ‘prefix’.

Raises:
  • AssertionError – If col_type is not one of the allowed values (‘short’, ‘long’).

  • AssertionError – If affix_type is not one of the allowed values (‘prefix’, ‘suffix’, ‘none’).
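
The checks listed under Raises can be sketched as follows. This is a minimal illustration and not the library's own code: check_options and the two tuples are hypothetical names introduced here. Only the allowed values, the defaults, and the AssertionError behaviour are taken from the description above.

```python
# Hypothetical stand-in for the option checks __init__ is documented to perform.
ALLOWED_COL_TYPES = ("short", "long")
ALLOWED_AFFIX_TYPES = ("prefix", "suffix", "none")


def check_options(col_type: str = "short", affix_type: str = "prefix") -> None:
    """Raise AssertionError when an option is outside its allowed values."""
    assert col_type in ALLOWED_COL_TYPES, (
        f"col_type must be one of {ALLOWED_COL_TYPES}, got {col_type!r}"
    )
    assert affix_type in ALLOWED_AFFIX_TYPES, (
        f"affix_type must be one of {ALLOWED_AFFIX_TYPES}, got {affix_type!r}"
    )


check_options()                # the documented defaults pass
check_options("long", "none")  # explicit valid values pass
```

Because the documented checks are assertions, a caller who wants to handle a bad option would catch AssertionError rather than ValueError at construction time.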

to_csv(
mode: str,
output_folder: str,
)

Outputs the processed census data to CSV files in the specified output folder.

This method generates CSV files based on the specified mode (‘merge’, ‘pivot’, or ‘all’) and saves them to the provided output folder. The filenames include metadata such as geographic type, column type, affix type, and the current timestamp.

Parameters:
  • mode (str) – The output mode, which determines the type of data to export (‘merge’, ‘pivot’, ‘all’).

  • output_folder (str) – The directory where the CSV files will be saved. Must be an existing directory.

Raises:
  • AssertionError – If the specified output_folder does not exist or is not a directory.

  • ValueError – If the mode argument is not one of the allowed values (‘merge’, ‘pivot’, ‘all’).
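
The validation and file naming described for to_csv can be sketched as below. The function build_output_paths, the filename layout, the timestamp format, and the choice of one file per output kind are all assumptions made for illustration. The documentation only states that filenames include the geographic type, column type, affix type, and a timestamp, and which exceptions are raised.

```python
import os
import tempfile
from datetime import datetime


def build_output_paths(output_folder, mode, geo_type, col_type, affix_type, now=None):
    """Return candidate CSV paths for a mode (hypothetical naming scheme)."""
    # Documented: AssertionError when the folder is missing or not a directory.
    assert os.path.isdir(output_folder), f"{output_folder!r} is not an existing directory"
    # Documented: ValueError when mode is not 'merge', 'pivot', or 'all'.
    if mode not in ("merge", "pivot", "all"):
        raise ValueError(f"mode must be 'merge', 'pivot', or 'all', got {mode!r}")
    # Assumption: 'all' writes both kinds of output, one file each.
    kinds = ["merge", "pivot"] if mode == "all" else [mode]
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return [
        os.path.join(output_folder, f"{kind}_{geo_type}_{col_type}_{affix_type}_{stamp}.csv")
        for kind in kinds
    ]


with tempfile.TemporaryDirectory() as folder:
    for path in build_output_paths(folder, "all", "LGA", "short", "prefix"):
        print(os.path.basename(path))
```

With the real class, the corresponding call would be to_csv(mode, output_folder) on a Census instance, with output_folder created beforehand since the method requires an existing directory.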

wrangle(
mode,
)

Processes census data by gathering, filtering, and joining specified census files based on the configuration and datapack objects in the Census class.

This method performs the following steps:

  1. Validates the mode argument to ensure it is one of the allowed values (‘merge’, ‘pivot’, ‘all’).

  2. Reads and filters data from census files based on the configuration.

  3. Renames and prepares columns according to the specified column type (col_type) and affix type (affix_type).

  4. Merges the prepared dataframes if the mode is ‘merge’ or ‘all’.

  5. Creates pivoted dataframes grouped by specified column groups if the mode is ‘pivot’ or ‘all’.
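
Steps 1 and 3 to 5 above can be sketched in plain Python as follows. The helpers apply_affix and planned_operations are hypothetical, as are the separator, the label G01, and the sample column names. How the library actually builds the label and renames columns is not specified here.

```python
ALLOWED_MODES = ("merge", "pivot", "all")


def apply_affix(columns, label, affix_type="prefix", sep="_"):
    """Attach a label to each column name as a prefix or suffix, or leave names as-is."""
    if affix_type == "prefix":
        return [f"{label}{sep}{col}" for col in columns]
    if affix_type == "suffix":
        return [f"{col}{sep}{label}" for col in columns]
    if affix_type == "none":
        return list(columns)
    # Mirrors the documented ValueError for an invalid affix_type.
    raise ValueError(f"invalid affix_type: {affix_type!r}")


def planned_operations(mode):
    """Map a mode to the operations that steps 4 and 5 would run."""
    # Mirrors the documented AssertionError for an invalid mode.
    assert mode in ALLOWED_MODES, f"mode must be one of {ALLOWED_MODES}"
    return {"merge": mode in ("merge", "all"), "pivot": mode in ("pivot", "all")}


# Illustrative values only: the label and column names are made up for this sketch.
print(apply_affix(["Tot_P_M", "Tot_P_F"], "G01"))
print(planned_operations("all"))
```

The sketch keeps the mode check separate from the renaming so that each documented failure (AssertionError for mode, ValueError for affix_type) can be seen in isolation.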

Parameters:
  • mode (str) – The processing mode, which determines the type of operation to perform (‘merge’, ‘pivot’, ‘all’).

Raises:
  • AssertionError – If the mode argument is not one of the allowed values.

  • ValueError – If invalid values are provided for col_type or affix_type.