{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "843a7445",
   "metadata": {},
   "source": [
    "### Calculate error statistics from population model (more memory efficient)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "15a3d4cf",
   "metadata": {},
   "source": [
    "We require bilby to load the population hyperposterior samples and gwpopulation to create the population model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "c3d0c6d5",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/jack.heinzel/.conda/envs/gwjax311/lib/python3.11/site-packages/bilby/core/likelihood.py:7: UserWarning: A NumPy version >=1.23.5 and <2.3.0 is required for this version of SciPy (detected version 2.3.5)\n",
      "  from scipy.special import gammaln, xlogy\n"
     ]
    }
   ],
   "source": [
    "import gwpopulation\n",
    "gwpopulation.set_backend('jax')\n",
    "import h5py\n",
    "from bilby.core.result import read_in_result\n",
    "from bilby.hyper.model import Model\n",
    "# load in population-error package\n",
    "from population_error import error_statistics"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ee09302",
   "metadata": {},
   "source": [
    "Load in injection set and posterior set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "8bfac1c7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# dictionary of injections, containing arrays of shape (Ninj). Must contain 'prior' key and should also contain 'total_generated'\n",
    "# adapted from publicly available GW injections at https://zenodo.org/records/7890398\n",
    "with h5py.File('data/GWTC3_injections.h5') as f:\n",
    "    injections = {k: f[k][()] for k in f.keys()}\n",
    "    \n",
    "# dictionary of gw samples, containing arrays of shape (Nobs, NPE). Must contain 'prior' key\n",
    "# adapted from publicly available GW posteriors at https://zenodo.org/records/6513631 and https://zenodo.org/records/8177023\n",
    "with h5py.File('data/GWTC3_posteriors.h5') as f:\n",
    "    event_posteriors = {k: f[k][()] for k in f.keys()}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1b7d0391",
   "metadata": {},
   "source": [
    "Create gwpopulation model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "1f40f685",
   "metadata": {},
   "outputs": [],
   "source": [
    "model_list = [\n",
    "    gwpopulation.models.mass.SinglePeakSmoothedMassDistribution(), \n",
    "    gwpopulation.models.spin.iid_spin,\n",
    "    gwpopulation.models.redshift.PowerLawRedshift()\n",
    "    ]\n",
    "\n",
    "model_function = Model(model_list, cache=False)\n",
    "\n",
    "vt_model_list = [\n",
    "    gwpopulation.models.mass.SinglePeakSmoothedMassDistribution(), \n",
    "    gwpopulation.models.spin.iid_spin,\n",
    "    gwpopulation.models.redshift.PowerLawRedshift()\n",
    "    ]\n",
    "\n",
    "vt_model_function = Model(vt_model_list, cache=False)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "93d872c2",
   "metadata": {},
   "source": [
    "Load in hyperposterior samples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "3e84fb32",
   "metadata": {},
   "outputs": [],
   "source": [
    "# copied from GWTC-3 data release at https://zenodo.org/records/11254021 (analyses_PowerLawPeak.tar.gz)\n",
    "gwtc3_result = read_in_result('data/o1o2o3_mass_c_iid_mag_iid_tilt_powerlaw_redshift_result.json')\n",
    "hyperposterior = gwtc3_result.posterior"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23124402",
   "metadata": {},
   "source": [
    "Calculate error statistics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "2d47e4f8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Nobs not provided, assuming Nobs = 69\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Computing single event covariance weights integrated over hyperposterior samples: 100%|██████████| 11469/11469 [00:09<00:00, 1168.25it/s]\n",
      "Computing selection covariance weights integrated over hyperposterior samples: 100%|██████████| 11469/11469 [00:07<00:00, 1450.43it/s]\n",
      "For each posterior sample, average single event covariance with another posterior sample: 100%|██████████| 11469/11469 [00:09<00:00, 1210.25it/s]\n",
      "For each posterior sample, average selection covariance with another posterior sample: 100%|██████████| 11469/11469 [00:07<00:00, 1459.19it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Your inference loses approximately 0.263 bits of information to Monte Carlo approximations.\n",
      "Of the total information loss\n",
      " * 0.255 bits is from uncertainty in the posterior. Of this\n",
      "    * 11.0% is from the single-event Monte Carlo integration\n",
      "    * 89.0% is from the selection Monte Carlo integration\n",
      " * 0.00827 bits is from bias in the posterior. Of the total bias\n",
      "    * 3.0% is from the single-event Monte Carlo integration\n",
      "    * 101.3% is from the selection Monte Carlo integration\n",
      "    * -4.3% is from correlations in the uncertainty of the single-event and selection MC integrals\n"
     ]
    }
   ],
   "source": [
    "statistics = error_statistics(\n",
    "    model_function, \n",
    "    injections, \n",
    "    event_posteriors, \n",
    "    hyperposterior, \n",
    "    vt_model_function=vt_model_function,\n",
    "    include_likelihood_correction=True, \n",
    "    conversion_function=gwpopulation.conversions.convert_to_beta_parameters\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e6e9e38",
   "metadata": {},
   "source": [
    "Note the uncertainty dominates the total information loss, which is generically true. There is a small additional information loss in the posterior due to bias in the estimate of the likelihood and posterior probability density. In GWTC-3, the systematic uncertainty was mostly due to the selection function estimate.\n",
    "\n",
    "Note the contribution to the bias from the correlations in the uncertainty of the single-event and selection MC integrals is negative. This is not a bug! Since the bias is a _directional_ quantity, the overall information loss will be positive, but the contributions to the total bias can be in different directions, meaning they will have a relative minus sign."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c6210701",
   "metadata": {},
   "source": [
    "### Rate-explicit likelihood"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0916f93d",
   "metadata": {},
   "source": [
    "gwpopulation samples the hyperposterior using the rate-marginalized likelihood, and then can produce samples from the rate posterior in post processing. The 'rate' key in the posterior refers to the comoving merger rate density at redshift $z=0$, whereas population-error expects the total merger number $N = \\int dt_{\\rm obs}dz \\frac{dV_c}{dz}\\frac{1}{1+z}\\mathcal{R}(z)$ where $\\mathcal{R}(z)$ is the comoving merger rate density as a function of redshift. \n",
    "\n",
    "gwpopulation outputs this integral by `hyperposterior['surveyed_hypervolume'] * hyperposterior['rate']`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "809f1fbb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Nobs not provided, assuming Nobs = 69\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Computing single event covariance weights integrated over hyperposterior samples: 100%|██████████| 11469/11469 [00:09<00:00, 1273.39it/s]\n",
      "Computing selection covariance weights integrated over hyperposterior samples: 100%|██████████| 11469/11469 [00:08<00:00, 1350.61it/s]\n",
      "For each posterior sample, average single event covariance with another posterior sample: 100%|██████████| 11469/11469 [00:09<00:00, 1247.26it/s]\n",
      "For each posterior sample, average selection covariance with another posterior sample: 100%|██████████| 11469/11469 [00:08<00:00, 1299.13it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Your inference loses approximately 0.268 bits of information to Monte Carlo approximations.\n",
      "Of the total information loss\n",
      " * 0.259 bits is from uncertainty in the posterior. Of this\n",
      "    * 10.8% is from the single-event Monte Carlo integration\n",
      "    * 89.2% is from the selection Monte Carlo integration\n",
      " * 0.00926 bits is from bias in the posterior. Of the total bias\n",
      "    * 2.7% is from the single-event Monte Carlo integration\n",
      "    * 100.5% is from the selection Monte Carlo integration\n",
      "    * -3.2% is from correlations in the uncertainty of the single-event and selection MC integrals\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "hyperposterior['total_merger_number'] = hyperposterior['surveyed_hypervolume']*hyperposterior['rate']\n",
    "\n",
    "rate_explicit_statistics = error_statistics(\n",
    "    model_function, \n",
    "    injections, \n",
    "    event_posteriors, \n",
    "    hyperposterior, \n",
    "    vt_model_function=vt_model_function,\n",
    "    include_likelihood_correction=True, \n",
    "    conversion_function=gwpopulation.conversions.convert_to_beta_parameters,\n",
    "    rate=True,\n",
    "    rate_key='total_merger_number'\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1603630a",
   "metadata": {},
   "source": [
    "Note the numbers are very similar to the rate-marginalized likelihood. That should usually be the case! The uncertainty once again dominates the total information loss, and though there is some additional information loss in the posterior due to bias in the estimate of the likelihood and posterior probability density, it is very small.\n",
    "\n",
    "Some might be concerned about the negative contribution to the bias from the correlations in the uncertainty of the single-event and selection MC integrals is negative. This is not a bug! Since the bias is a _directional_ quantity, the overall information loss will be positive, but the contributions to the total bias can be in different directions, meaning they can (and often do) have a relative minus sign."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "gwjax311",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}