Parameterizing simulations
Table of contents
- Epidemiology
- Synthetic population generation
- Synthetic agenda
- To link building types and activities
- Redefining parameter values
Epidemiology
There are two different ways to set up epidemiological parameters in COMOKIT:
- by modifying the parameters variables in Model/Parameters.gaml (setting the variable
load_epidemiological_parameter_from_file
tofalse
), - by giving the path of a CSV file containing different values for the parameters (setting the variable
epidemiological_parameters
to the path of the file).
In both cases, a map of parameters (map_epidemiological_parameters
, defined in Model/Global.gaml
) will be generated, using different ages (int) as key of the map, and the value of the parameters.
Modifying the parameters in Model/Parameters.gaml
The first method is easier, as the parameters are already defined and just their values have to be changed. However, it lacks flexibility and heterogeneity compared to the second method.
Note that this method is not the one used by default in COMOKIT, but the second one, providing more realistic values.
Environmental contamination dynamics
The list of the epidemiological parameters presented in Model/Parameters.gaml for environmental contamination dynamics is:
allow_transmission_building
, allowing environmental contamination dynamics, set by default totrue
,basic_viral_decrease
, the decrease over time of the viral load in the environment, set by default to0.33
,basic_viral_release
, the release over time of the viral load in the environment by one infected Individual, set by default to3.0
,successful_contact_rate_building
, the successful contact rate for environment to human transmission derivated from the R0 and the mean infectious period, set by default to2.5 * 1/(14.69973*nb_step_for_one_day)
,
Human to human transmission dynamics
The list of the epidemiological parameters presented in Model/Parameters.gaml for human to human transmission dynamics is:
allow_transmission_human
, allowing human to human transmission dynamics, set by default totrue
,init_all_ages_factor_contact_rate_asymptomatic
, the non-age-dependent factor of the successful contact rate for asymptomatic/presymptomatic human to human transmission, set by default to0.55
,init_all_ages_proportion_asymptomatic
, the non-age-dependent proportion of asymptomatic individuals, set by default to0.3
,init_all_ages_successful_contact_rate_human
, the non-age-dependent successful contact rate for human to human transmission derivated from the R0 and the mean infectious period, set by default to2.5 * 1/(14.69973)
, -reduction_coeff_all_buildings_individuals
, the reduction of the contact rate for individuals belonging to different households but living in the same building, set by default to0.05
,reduction_coeff_all_buildings_inhabitants
, the reduction of the contact rate for individuals belonging to different households but living in the same building, set by default to0.01
,init_all_ages_distribution_type_incubation_period_symptomatic
, the non-age-dependent distribution for the incubation period of symptomatic Individuals, set by default toLognormal
,init_all_ages_parameter_1_incubation_period_symptomatic
, the first parameter for the non-age-dependent distribution for the incubation period of symptomatic Individuals, set by default to1.57
,init_all_ages_parameter_2_incubation_period_symptomatic
, the second parameter for the non-age-dependent distribution for the incubation period of symptomatic Individuals, set by default to0.65
,init_all_ages_distribution_type_incubation_period_asymptomatic
, the non-age-dependent distribution for the incubation period of asymptomatic Individuals, set by default toLognormal
,init_all_ages_parameter_1_incubation_period_asymptomatic
, the first parameter for the non-age-dependent distribution for the incubation period of asymptomatic Individuals, set by default to1.57
,init_all_ages_parameter_2_incubation_period_asymptomatic
, the second parameter for the non-age-dependent distribution for the incubation period of asymptomatic Individuals, set by default to0.65
,init_all_ages_distribution_type_serial_interval
, the non-age-dependent distribution for the serial interval, set by default toNormal
,init_all_ages_parameter_1_serial_interval
, the first parameter for the non-age-dependent distribution for the incubation period of asymptomatic Individuals, set by default to3.96
,init_all_ages_parameter_2_serial_interval
, the second parameter for the non-age-dependent distribution for the incubation period of asymptomatic Individuals, set by default to3.75
,init_all_ages_distribution_type_infectious_period_symptomatic
, the non-age-dependent distribution for the infectious period of symptomatic Individuals, set by default toLognormal
,init_all_ages_parameter_1_infectious_period_symptomatic
, the first parameter for the non-age-dependent distribution for the infectious period of symptomatic Individuals, set by default to3.034953
,init_all_ages_parameter_2_infectious_period_symptomatic
, the second parameter for the non-age-dependent distribution for the infectious period of symptomatic Individuals, set by default to0.34
,init_all_ages_distribution_type_infectious_period_asymptomatic
, the non-age-dependent distribution for the infectious period of asymptomatic Individuals, set by default toLognormal
,init_all_ages_parameter_1_infectious_period_asymptomatic
, the first parameter for the non-age-dependent distribution for the infectious period of asymptomatic Individuals, set by default to3.034953
,init_all_ages_parameter_2_infectious_period_asymptomatic
, the second parameter for the non-age-dependent distribution for the infectious period of asymptomatic Individuals, set by default to0.34
.
Hospitalisation and severity
The list of the hospitalisation and severity related parameters presented in Model/Parameters.gaml for human to human transmission dynamics is:
init_all_ages_proportion_hospitalisation
, the non-age-dependent proportion of symptomatic Individuals that will need hospitalisation, set by default to0.2
,init_all_ages_distribution_type_onset_to_hospitalisation
, the non-age-dependent distribution of the time between symptom onset and hospital admission for symptomatic Individuals, set by default toLognormal
,init_all_ages_parameter_1_onset_to_hospitalisation
, the first parameter of the non-age-dependent distribution of the time between symptom onset and hospital admission for symptomatic Individuals, set by default to3.034953
,init_all_ages_parameter_2_onset_to_hospitalisation
, the second parameter of the non-age-dependent distribution of the time between symptom onset and hospital admission for symptomatic Individuals, set by default to0.34
,init_all_ages_proportion_icu
, the non-age-dependent proportion of hospitalised Individuals that will need to go in ICU, set by default to0.1
,init_all_ages_distribution_type_hospitalisation_to_ICU
, the non-age-dependent distribution of the time between hospitalisation and ICU admission for hospitalised Individuals, set by default toLognormal
,init_all_ages_parameter_1_hospitalisation_to_ICU
, the first parameter of the non-age-dependent distribution of the time between hospitalisation and ICU admission for hospitalised Individuals, set by default to3.034953
,init_all_ages_parameter_2_hospitalisation_to_ICU
, the second parameter of the non-age-dependent distribution of the time between hospitalisation and ICU admission for hospitalised Individuals, set by default to0.34
,init_all_ages_distribution_type_stay_ICU
, the non-age-dependent distribution of the duration of stay in ICU for Individuals, set by default toLognormal
,init_all_ages_parameter_1_stay_ICU
, the first parameter of the non-age-dependent distribution of the duration of stay in ICU for Individuals, set by default to3.034953
,init_all_ages_parameter_2_stay_ICU
, the second parameter of the non-age-dependent distribution of the duration of stay in ICU for Individuals, set by default to0.34
,init_all_ages_proportion_dead_symptomatic
, the non-age-dependent proportion of symptomatic individuals that will die even if treated, set by default to0.01
,
Testing and mask wearing
The list of the test and mask related parameters presented in Model/Parameters.gaml for human to human transmission dynamics is:
init_all_ages_probability_true_negative
, the non-age-dependent probability of being tested positive when infected, set by default to0.92
,init_all_ages_probability_true_positive
, the non-age-dependent probability of being tested positive when infected, set by default to0.89
,init_all_ages_proportion_wearing_mask
, the non-age-dependent proportion of Individuals wearing mask, set by default to0.0
,init_all_ages_factor_contact_rate_wearing_mask
, the non-age-dependent factor of successful transmission for an infectious individual wearing a mask, set by default to0.5
,
Modifying the parameters in the CSV file
In the CSV file, it is possible to define multiple values for the same parameter taking into account the age of the individuals. In addition, it is possible to not just provide a fixed value, but also a distribution in order to generate different values for each Individual (among Weibull, Gamma, Uniform, Normal and Lognormal distributions). In the case of providing a CSV file to set the parameters, it is still possible to force parameter values (for one experiment, for instance) by using the force_parameters
map, expecting the string ID of a parameters (defined in Model/Constants.gaml
) and a given value. Particular attention should be given to the names of the parameters provided in the CSV file, as they must be exactly the same as the string ID of the parameters. In order to define a parameter as age-depedent, multiple rows in the CSV file must begin with the same parameter string ID, and different age values. The age category will be defined with the lower bound equals to the value of the column age for the given row, and the upper bound the value of the column age for the next row with the same parameter string ID (if there are none, there will be no upper bound, including every Individuals having at least an age equals to the lower bound). The column detail in the CSV is expecting either the string FIXED
(for a parameter not following a distribution), or one of the string ID of the distributions (Weibull
, Gamma
, Normal
, Lognormal
, Uniform
). Parameter_1 is expecting the value of the parameter (if FIXED
) or the first parameter of the gama function for the given distribution. Parameter_2 is used only if Detail is a distribution, and provides the second parameter of the gama function for the given distribution.
All the parameters can have different values for given age categories but:
Transmission_human
(corresponding toallow_transmission_human
inModel/Parameters.gaml
), allowing human to human transmission dynamics,Transmission_building
(corresponding toallow_transmission_building
inModel/Parameters.gaml
), allowing environmental contamination dynamics,Successful_contact_rate_building
(corresponding tosuccessful_contact_rate_building
inModel/Parameters.gaml
), the transmission building to human occuring due to environmental contamination,Basic_viral_decrease
(corresponding tobasic_viral_decrease
inModel/Parameters.gaml
), the decrease over time in viral load in the environment, can’t be age-dependent for obvious reasons.
The default values for the different parameters are stored in the file Parameters/Epidemiological Parameters.csv
.
Synthetic population generation
As mentioned in its ODD description, COMOKIT proposes two ways to generate a synthetic population of agents: either from a csv file where each line corresponds to an individual or using a built-in generator. This paragraph details how both can be parameterized. The following sub-section shows how to customize the built-in generator and how to harmonize file based synthetic records
Built-in generator parameterization
The default generator provided with COMOKIT makes use of a configuration file in order to define key variables of the algorithm. In order to define them according to a case study, it should be created and placed in the case study folder, and named Population parameter.csv
, with a first column corresponding to the parameter name and a second column the provided value. The list below details all the parameters that can be defined in this file with their default value when they are omitted:
male_ratio
: the probability to be a male (1 - male_ratio
gives the probability to be a female). Default is21/41
proba_active_family
: the probability to create a two or more persons household (1 - proba_active_family
being the probability to create a lone individual household). Default is0.95
number_children_mean
: the mean (expected value) number of children per household. Default is2.0
number_children_std
: the standard deviation (variance) to determine the number of children per household. Default is0.5
number_children_max
: the maximum number of children per household. Default is3
proba_grandfather
: the probability to include the grandfather in the household. Default is0.2
proba_grandmother
: the probability to include the grandmother in the household. Default is0.3
retirement_age
: the age threshold for retirement. Default is55
max_age
: the maximum age an agent can have at initialization. Default is100
nb_friends_mean
: the mean (expected value of a Gaussian) number of friends per individual agent. Default is5.0
nb_friends_std
: the standard deviation (variance of a Gaussian) to determine the number of friends per individual. Default is3.0
nb_classmates_mean
: mean number of classmates with which an Individual will have close contact. Default is10.0
nb_classmates_std
: stand deviation of the number of classmates with which an Individual will have close contact. Default is5.0
nb_work_colleagues_mean
mean number of work colleagures with which an Individual will have close contact. Default is5.0
nb_work_colleagues_std
: stand deviation of the number of work colleagures with which an Individual will have close contact. Default is3.0
proba_work_at_home
: the probability to do remote work. Default is0.05
proba_unemployed_M
: the probability for a male individual agent to be unemployed. Default is0.03
proba_unemployed_F
: the probability for a female individual agent to be unemployed. Default is0.03
File based data harmonization
In order to use microdata to initialize the synthetic population of agents in COMOKIT, users should provide attribute names and values for age
, sex
, is_unemployed
and household_id
Individual
attributes. In Version 1.0.1 of COMOKIT, these information must be provided directly in the global
section in GAML. Below are listed the variables that must be overwritten to harmonize the micro-data with COMOKIT:
age
: assigns the name of micro-data age variable toage_var
(i.e.age_var <- "age_var_name_in_csv"
) and the mapping of values inage_map
(e.g.age_map <- ["0-14"::[0,14],"15-45"::[15,45],"46+"::[46,120]]
)sex
: assigns the name of micro-data sex variable tosex_var
(i.e.sex_var <- "sex_var_name_in_csv"
) and the mapping of values insex_map
(e.g.sex_map <- ["man"::0,"woman"::1]
)is_unemployed
: assigns the name of micro-data employement status tounemployed_var
(i.e.unemployed_var <- "employement_var_name_in_csv"
) and the mapping of values inunemployed_map
(e.g.unemployed_map <- ["active"::false,"inactive"::true,"unemployed"::true]
)household_id
: assigns the name of micro-data household identifier tohouseholdID
(i.e.householdID <- "household_id_name_in_csv"
)
Each one has a built-in method to get a default value when data is missing or missencoding. The age will be uniformly drawn in [0,max_age]
(with max_age
being a variable of COMOKIT that can be found in Parameters.gaml
), Gender chosen according to a probability of 0.512 (ratio of 105 males for 100 females) and an unemployed rate of 0.03 (i.e. same rate as in the built-in generator). When no household identifier is provided in the micro-data, COMOKIT builds households like the built-in generator does, except it selects relevant individuals from the sample rather than creating them.
Synthetic agenda
The synthetic agenda generator provided with COMOKIT makes use of two files: a configuration file and an activity file.
Synthetic agenda configuration file
The configuration file is used to define the key variables of the algorithm. In order to define them according to a case study, it should be created and placed in the case study folder, and named Agenda parameter.csv
, with a first column corresponding to the parameter name and a second column the provided value. The list below details all the parameters that can be defined in this file with their default value when they are omitted:
non_working_days
: List of non working days (1: Monday, 7: Sunday). Default is[7]
work_hours_begin_min
: the minimal value for the beginning working hour (between 0 and 23). Default is6
work_hours_begin_max
: the maximal value for the beginning working hour (between 0 and 23). Default is10
work_hours_end_min
: the minimal value for the ending working hour (between 0 and 23). Default is15
work_hours_end_max
: the maximal value for the ending working hour (between 0 and 23). Default is18
school_hours_begin_min
: the minimal value for the beginning school hour (between 0 and 23). Default is7
school_hours_begin_max
: the maximal value for the beginning school hour (between 0 and 23). Default is9
school_hours_end_min
: the minimal value for the ending school hour (between 0 and 23). Default is15
school_hours_end_max
: the maximal value for the ending school hour (between 0 and 23). Default is18
first_act_hour_non_working_min
: For non working day (or people who are not working), minimal hour for the beginning of the first activity (between 0 and 23). Default is7
first_act_hour_non_working_min
: For non working day (or people who are not working), maximal hour for the beginning of the first activity (between 0 and 23). Default is10
lunch_hours_min
: the minimal value for the beginning of the lunch time (between 0 and 23). Default is11
lunch_hours_max
: the maximal value for the beginning of the lunch time (between 0 and 23). Default is13
max_duration_lunch
: the maximal duration (in hours) of the lunch time. Default is2
max_duration_default
: the maximal duration (in hours) of other activities. Default is3
min_age_for_evening_act
: the minimal age of individual to have an activity after school. Default is13
nb_activity_fellows_mean
: the mean number of fellows per activity. Default is3.0
nb_activity_fellows_std
: the standard deviation of the number of fellows per activity. Default is2.0
max_num_activity_for_non_working_day
: the maximal number of activities for non working day. Default is4
max_num_activity_for_unemployed
: the maximal number of activities per day for unemployed individual. Default is3
max_num_activity_for_old_people
: the maximal number of activities per day for retired individual. Default is3
proba_activity_evening
: the probability for people to have an activity after work. Default is0.7
proba_lunch_outside_workplace
: the probability to have lunch outside the working place (home or restaurant). Default is0.5
proba_lunch_at_home
: if lunch taken outside the working place, the probability of having lunch at home. Default is0.5
proba_work_outside
: the probabilty for an individual to work outside the considered area. Default is0.0
proba_go_outside
: the probabilty for an individual to do an activity outside the considered area. Default is0.0
building_neighbors_dist
: the maximal distance (in meter) of the considered neigbhorhood for the “visit to neighbors” activity. Default is500.0
Activity file
The activity file is used to define, according to the Individual’s age and the sex category, the weights of the different types of activity (the higher the weight, the higher the chance to carried out this activity). In order to define it according to a case study, it should be created and placed in the case study folder, and named Activity type weights.csv
.The table below shows an example of file
Age min | Age max | sex | visiting neighbor | visiting friend | eating | shopping | leisure | outside activity | sport | other activity |
---|---|---|---|---|---|---|---|---|---|---|
0 | 10 | 0 | 1.0 | 1.0 | 0.5 | 0.5 | 1.0 | 2.0 | 1.0 | 0.1 |
0 | 10 | 1 | 1.0 | 1.0 | 0.5 | 0.5 | 1.0 | 2.0 | 1.0 | 0.1 |
11 | 18 | 0 | 0.2 | 0.5 | 1.0 | 0.5 | 2.0 | 2.0 | 1.0 | 0.1 |
11 | 18 | 1 | 0.2 | 0.5 | 1.0 | 0.5 | 2.0 | 2.0 | 1.0 | 0.1 |
19 | 60 | 0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 1.0 | 1.0 | 1.0 |
19 | 60 | 1 | 1.0 | 1.0 | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 | 1.0 |
61 | 120 | 0 | 2.0 | 2.0 | 0.5 | 1.0 | 0.5 | 0.5 | 0.2 | 2.0 |
61 | 120 | 1 | 2.0 | 2.0 | 0.5 | 2.0 | 0.5 | 0.5 | 0.1 | 2.0 |
To link building types and activities
Synthetic agenda of agents provides time use according to a sequence of activites to be carried out in buildings. In order to choose appropriate building corresponding to their activities, it is necessary to determine case study related building types and to attached these types to an activity.
Building types
COMOKIT provides built-in types of buildings, most of which have been taken from the OSM nomenclature. They are stored in the Paramaeters/Building type per activity type.csv
file, with first column standing for activity type, and following ones to corresponding building types. For example, default provided type of building for staying_at_home
activity are: " "
(empty string), yes
, house
, manor
, apartments
, caravan
, hostel
, home
and Tent Shelter
. For any missing type, i.e. case study buildings.shp
file where home place type is not in the previous list, you may have to add in a new column the corresponding type name. Hence, COMOKIT will be able to link activity types to custom building types.
Building file
The building file is used to define the agent preferences for building types according to age and sex. The higher the weigths are the higher individual agent will tends to carried out activities within this particular type of building. For example, agents can perform a leisure activity in several types of buildings, including cinema, karaoke, coffeshops and many more defined in Building per type activity type.csv
(see Activity to building section). In that respect, the weights attached to building type will determine the corresponding probability to carry out a leisure activity in a particular building. Any missing building type will be assigned a weight equal to 1.0
.
As for the activity file, the building file should be created and placed in a case study folder, and named Building type weights.csv
. The table below show a snippet example of a file content:
Age min | Age max | sex | playground | park | cinema | place_of_worship |
---|---|---|---|---|---|---|
0 | 10 | 0 | 1.0 | 1.0 | 0.5 | 0.5 |
0 | 10 | 1 | 1.0 | 1.0 | 0.5 | 0.5 |
11 | 18 | 0 | 0.2 | 0.5 | 1.0 | 0.5 |
11 | 18 | 1 | 0.2 | 0.5 | 1.0 | 0.5 |
19 | 60 | 0 | 1.0 | 1.0 | 1.0 | 1.0 |
19 | 60 | 1 | 1.0 | 1.0 | 1.0 | 2.0 |
61 | 120 | 0 | 2.0 | 2.0 | 0.5 | 1.0 |
61 | 120 | 1 | 2.0 | 2.0 | 0.5 | 2.0 |
Redefining parameter values
As a general advice, parameters in Parameters.gaml
and variables (e.g. in Global.gaml
) of COMOKIT should be redefined in user models: it is good practice to redefine them in a global
section of a user created GAML file. For example, to redefine the number of infected and recovered individuals at the beginning of a simulation, open or create a new GAML file, import all required files from COMOKIT and assign new values as follow:
global {
int num_infected_init <- 10;
int num_recovered_init <- 500;
}
As a general guideline, do not change the values of parameters directly in the COMOKIT code, but rather in your own model (that imports COMOKIT). This advice is not mandatory at all, and modifying your local version of COMOKIT is functional. However, for the sake of clarity, it is recommended to redefine the parameters/variables of interest for you, in a model specific user created file.
Calibration of the transmission rate value (β)
The parameter β, referred as Successful_contact_rate_human
in the file Epidemiological Parameters.csv
, which stands for the transmission rate over a day of Covid-19 in COMOKIT determines the probability of any infectious agent to infect a susceptible one during a contact. It is splited into β_1 and β_2 with indexes representing transmission rate for symptomatic and asymptomatic respectively (see Complete O.D.D). The parameter Factor_asymptomatic
define the weight applied to β_1 in order to obtain β_2; meaning that we can focus on β as an autonomous parameter (i.e. as it is β_1 from which β_2 is derived from). Its value can be calibrated to reproduce a given R0, using the BetaCalibration.gaml
file. The process is twofold: first it is required to know the actual contact rate within your model and then, the second step makes it possible to elicit the best β value to reproduce a given R0 factor, using an approximation of the contact rate produced in the first step.
The experiment batch_Contact_Tracking
makes use of the No Containment.gaml
model to summarize the contact pattern of agent: how many contacts per days of week and weekends, average and median values. The only required parameter is the duration_xp
float variable that gives the length of one run in simulated time (default is one month). Values are estimated over replications of the model (no containment, unhindered outbreak), that can be changes (12 by default). Beside the number of contacts the experiment also estimates a β (let’s call it β’) value based on average/median contact per step and a given R0: β’ is equal to R0 divided by the length of the infectious period, divided by the number of contact per day. Those values are stored in a csv file COMOKIT/beta_output/Contact_and_Beta.csv
and can be used to further calibrate β
The experiment estimate_beta
makes it possible to further refine the value of β according to contact pattern within COMOKIT. In fact the first computed estimate of β does not take into account the type of agents that are in contacts, although $\beta$ is only significant when an infectious agent meets a susceptible one. The dynamic of the model behind the calibration of β is a very simplified setup with no explicit spatial representation and controled number of contact based on previously estimated value: agents might be in contact with any other agent, pretty much like in a compartement model, and elicit uniformaly a n number of agent to meet during one step of simulation. By default n will be the average number of contact per step computed using the batch_Contact_Tracking
experiment. By the end of one simulation actual reproduction number (called Re) is computed as the mean of the number of infected agent by any infectious agent excluding those that do not spread the desease. The fitness of the calibration process is simply the euclidian distance between R0 and Re.