Calculate error statistics from population model (more memory efficient)

We require bilby to load the population hyperposterior samples and gwpopulation to create the population model

[1]:

import gwpopulation
gwpopulation.set_backend('jax')
import h5py
from bilby.core.result import read_in_result
from bilby.hyper.model import Model
# load in population-error package
from population_error import error_statistics

/home/jack.heinzel/.conda/envs/gwjax311/lib/python3.11/site-packages/bilby/core/likelihood.py:7: UserWarning: A NumPy version >=1.23.5 and <2.3.0 is required for this version of SciPy (detected version 2.3.5)
  from scipy.special import gammaln, xlogy

Load in injection set and posterior set

[2]:

# dictionary of injections, containing arrays of shape (Ninj). Must contain 'prior' key and should also contain 'total_generated'
# adapted from publicly available GW injections at https://zenodo.org/records/7890398
with h5py.File('data/GWTC3_injections.h5') as f:
    injections = {k: f[k][()] for k in f.keys()}

# dictionary of gw samples, containing arrays of shape (Nobs, NPE). Must contain 'prior' key
# adapted from publicly available GW posteriors at https://zenodo.org/records/6513631 and https://zenodo.org/records/8177023
with h5py.File('data/GWTC3_posteriors.h5') as f:
    event_posteriors = {k: f[k][()] for k in f.keys()}

Create gwpopulation model

[3]:

model_list = [
    gwpopulation.models.mass.SinglePeakSmoothedMassDistribution(),
    gwpopulation.models.spin.iid_spin,
    gwpopulation.models.redshift.PowerLawRedshift()
    ]

model_function = Model(model_list, cache=False)

vt_model_list = [
    gwpopulation.models.mass.SinglePeakSmoothedMassDistribution(),
    gwpopulation.models.spin.iid_spin,
    gwpopulation.models.redshift.PowerLawRedshift()
    ]

vt_model_function = Model(vt_model_list, cache=False)

Load in hyperposterior samples

[4]:

# copied from GWTC-3 data release at https://zenodo.org/records/11254021 (analyses_PowerLawPeak.tar.gz)
gwtc3_result = read_in_result('data/o1o2o3_mass_c_iid_mag_iid_tilt_powerlaw_redshift_result.json')
hyperposterior = gwtc3_result.posterior

Calculate error statistics

[5]:

statistics = error_statistics(
    model_function,
    injections,
    event_posteriors,
    hyperposterior,
    vt_model_function=vt_model_function,
    include_likelihood_correction=True,
    conversion_function=gwpopulation.conversions.convert_to_beta_parameters
    )

Nobs not provided, assuming Nobs = 69

Computing single event covariance weights integrated over hyperposterior samples: 100%|██████████| 11469/11469 [00:09<00:00, 1168.25it/s]
Computing selection covariance weights integrated over hyperposterior samples: 100%|██████████| 11469/11469 [00:07<00:00, 1450.43it/s]
For each posterior sample, average single event covariance with another posterior sample: 100%|██████████| 11469/11469 [00:09<00:00, 1210.25it/s]
For each posterior sample, average selection covariance with another posterior sample: 100%|██████████| 11469/11469 [00:07<00:00, 1459.19it/s]


Your inference loses approximately 0.263 bits of information to Monte Carlo approximations.
Of the total information loss
 * 0.255 bits is from uncertainty in the posterior. Of this
    * 11.0% is from the single-event Monte Carlo integration
    * 89.0% is from the selection Monte Carlo integration
 * 0.00827 bits is from bias in the posterior. Of the total bias
    * 3.0% is from the single-event Monte Carlo integration
    * 101.3% is from the selection Monte Carlo integration
    * -4.3% is from correlations in the uncertainty of the single-event and selection MC integrals

Note the uncertainty dominates the total information loss, which is generically true. There is a small additional information loss in the posterior due to bias in the estimate of the likelihood and posterior probability density. In GWTC-3, the systematic uncertainty was mostly due to the selection function estimate.

Note the contribution to the bias from the correlations in the uncertainty of the single-event and selection MC integrals is negative. This is not a bug! Since the bias is a directional quantity, the overall information loss will be positive, but the contributions to the total bias can be in different directions, meaning they will have a relative minus sign.

Rate-explicit likelihood

gwpopulation samples the hyperposterior using the rate-marginalized likelihood, and then can produce samples from the rate posterior in post processing. The ‘rate’ key in the posterior refers to the comoving merger rate density at redshift \(z=0\), whereas population-error expects the total merger number \(N = \int dt_{\rm obs}dz \frac{dV_c}{dz}\frac{1}{1+z}\mathcal{R}(z)\) where \(\mathcal{R}(z)\) is the comoving merger rate density as a function of redshift.

gwpopulation outputs this integral by hyperposterior['surveyed_hypervolume'] * hyperposterior['rate'].

[6]:

hyperposterior['total_merger_number'] = hyperposterior['surveyed_hypervolume']*hyperposterior['rate']

rate_explicit_statistics = error_statistics(
    model_function,
    injections,
    event_posteriors,
    hyperposterior,
    vt_model_function=vt_model_function,
    include_likelihood_correction=True,
    conversion_function=gwpopulation.conversions.convert_to_beta_parameters,
    rate=True,
    rate_key='total_merger_number'
    )

Nobs not provided, assuming Nobs = 69

Computing single event covariance weights integrated over hyperposterior samples: 100%|██████████| 11469/11469 [00:09<00:00, 1273.39it/s]
Computing selection covariance weights integrated over hyperposterior samples: 100%|██████████| 11469/11469 [00:08<00:00, 1350.61it/s]
For each posterior sample, average single event covariance with another posterior sample: 100%|██████████| 11469/11469 [00:09<00:00, 1247.26it/s]
For each posterior sample, average selection covariance with another posterior sample: 100%|██████████| 11469/11469 [00:08<00:00, 1299.13it/s]


Your inference loses approximately 0.268 bits of information to Monte Carlo approximations.
Of the total information loss
 * 0.259 bits is from uncertainty in the posterior. Of this
    * 10.8% is from the single-event Monte Carlo integration
    * 89.2% is from the selection Monte Carlo integration
 * 0.00926 bits is from bias in the posterior. Of the total bias
    * 2.7% is from the single-event Monte Carlo integration
    * 100.5% is from the selection Monte Carlo integration
    * -3.2% is from correlations in the uncertainty of the single-event and selection MC integrals

Note the numbers are very similar to the rate-marginalized likelihood. That should usually be the case! The uncertainty once again dominates the total information loss, and though there is some additional information loss in the posterior due to bias in the estimate of the likelihood and posterior probability density, it is very small.

Some might be concerned about the negative contribution to the bias from the correlations in the uncertainty of the single-event and selection MC integrals is negative. This is not a bug! Since the bias is a directional quantity, the overall information loss will be positive, but the contributions to the total bias can be in different directions, meaning they can (and often do) have a relative minus sign.