Keywords - Random Forest Regression, K-Nearest Neighbors, Gradient Boosting, XGBoost, CatBoost, Neural Networks, Multi-Layer Perceptron, Keras, Artificial Neural Networks, Gaussian Process Regression
In this project we model a subset of 80,952 galaxies to predict their redshift, using 22 features in total: 20 numerical (g, r, z, w1 and w2 magnitudes; g, r, z fibre magnitudes; delta chi-square; extinction; Sérsic index; colors; and shape) and 2 categorical (1. light profile morphology: radial, Sérsic, exponential, de Vaucouleurs or point spread function; 2. sky survey region: north or south).
We apply quality cuts that remove objects with any missing feature (NaN or infinite values), objects of spectral type 'STAR', and objects with redshift below 0 (mostly stars) or above 0.8 (mostly quasars). After these selection cuts, a total of 76,609 objects are fed to the ML algorithms.
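The selection cuts described above can be sketched with pandas as follows. This is a minimal illustration, not the project's actual code: the function name `apply_quality_cuts` and the column names `redshift` and `spectype` are assumptions, and the boundary values 0 and 0.8 are treated as inclusive.

```python
import numpy as np
import pandas as pd


def apply_quality_cuts(df, feature_cols, z_col="redshift",
                       spectype_col="spectype", z_min=0.0, z_max=0.8):
    """Return the rows of df that survive the selection cuts.

    Column names are hypothetical; adapt them to the actual catalogue.
    """
    # No feature may be missing (NaN / None).
    present = df[feature_cols].notna().all(axis=1)

    # Numerical features must also be finite (no +inf / -inf).
    numeric = df[feature_cols].select_dtypes(include="number")
    finite = np.isfinite(numeric.to_numpy(dtype=float)).all(axis=1)

    # Drop objects spectroscopically classified as stars.
    not_star = df[spectype_col] != "STAR"

    # Keep redshifts in [z_min, z_max]; between() is inclusive by default.
    in_range = df[z_col].between(z_min, z_max)

    return df[present & finite & not_star & in_range]
```

Calling `apply_quality_cuts(catalogue, feature_cols)` on the full table, with `feature_cols` listing the 22 feature columns, would return the filtered sample passed on to the models.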
We use 4 metrics to quantify the performance and accuracy of the models -
Normalized median absolute deviation (NMAD)
Root mean square error (RMSE)
Bias %
Outliers %
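The four metrics are named here but not defined, so the sketch below uses conventions that are common in photometric-redshift work; all of them are assumptions rather than the project's confirmed definitions. The residual is normalized as dz = (z_pred - z_true) / (1 + z_true), NMAD is 1.4826 times the median absolute deviation of dz, bias is the mean of dz, outliers are objects with |dz| above 0.15, and RMSE is computed on the raw redshift residuals. The function name `photoz_metrics` is hypothetical.

```python
import numpy as np


def photoz_metrics(z_true, z_pred, outlier_threshold=0.15):
    """Compute NMAD, RMSE, bias % and outlier % for redshift predictions.

    The definitions are assumed conventions, not taken from the project.
    """
    z_true = np.asarray(z_true, dtype=float)
    z_pred = np.asarray(z_pred, dtype=float)

    # Normalized residual, assumed convention: (z_pred - z_true) / (1 + z_true).
    dz = (z_pred - z_true) / (1.0 + z_true)

    # NMAD: median absolute deviation scaled to match a Gaussian sigma.
    nmad = 1.4826 * np.median(np.abs(dz - np.median(dz)))

    # RMSE on the raw (un-normalized) redshift residuals.
    rmse = np.sqrt(np.mean((z_pred - z_true) ** 2))

    # Bias and outlier fraction, both expressed as percentages.
    bias_pct = 100.0 * np.mean(dz)
    outlier_pct = 100.0 * np.mean(np.abs(dz) > outlier_threshold)

    return {"nmad": nmad, "rmse": rmse,
            "bias_pct": bias_pct, "outlier_pct": outlier_pct}
```

For a fitted model this would be called as `photoz_metrics(z_test, model.predict(X_test))`, which gives one row of a model comparison table.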
Comments