I am starting to think that the long two-steps process might be the more rigorous one (compared to find a “magic function” that is stable for all regimes: quasi-linear, exponential, s-shaped)

run one by one each gene with two models (40K total runs, that I need to run in parallel maybe with just 2 chains each)
1.1 the simpler exp(beta * X)
1.2 the more complex plateau * inv_logit(beta * X)

compare the models and chose the most parsimonious model for each gene ( LOO might be the right tool?)

Run a hierarchical model including all genes using a regularised_horseshoe on the slopes using 2 groups specific slab_scale

Just to make it clear, the generalised sigmoid is frequently used in the lit and it is not an ad-hoc choice. The model is partially justified from molecular biology, I’ve seen a proof (that I didn’t check) that under OK-ish assumptions the full reaction kinetics of mRNA synthesis (polymerase binding to DNA, transcription start, …) can be approximated as this generalized sigmoid. Assuming a constant mRNA decay you get:

Where k, \alpha and \beta arise from the constants found in the full equations. When you don’t have time-course measurements, people usually assume \frac{dy}{dt} = 0 which gives rise to the linear model @stemangiola is using.

If there is a better model, I’d be super happy to learn about it, but this is what people have been using for quite a while and it is connected to theory - it is not an ad-hoc assumption. I might however have a conflict of interest here, as my boss was probably one of the first people to use this model to describe mRNA synthesis so it is not really up to discussion in my lab :-).

Also, I assume that the reparametrizations we discussed previously (GLM with growth function - problem of indetermiinability when slope = 0) didn’t help - do you have any insights on what was the problem? In the time-series case with a small number of regulators it seems to work OK across all the regimes, but I will hopefully try to extend to the steady-state case in near future, so any insight would be valuable.

One more guess: with this amount of data (and the examples you’ve shown), it wouldn’t be such a stretch to just fit a linear model, possibly with a bit flexible error model (e.g. allow increased variability at the extremes of data). The directions of the associations would be very likely preserved and you’ll get rid of all the identifiability problems :-) I assume you will not be directly interpreting the intercept or plateau parameters anyway… A sensitivity analysis with a subset of the genes that have clear sigmoid/exp shapes could easily show/disprove whether the simplification is reasonable.

Out of interest is this RNA-seq or microarray data?

Could you hierarchical prior for these parameters? It seems now intercept and k don’t have prior at all if they tend to go to infinity?

I may be missing something but, I think in the plots you have shown only max one parameter is not informed by data, and even weak prior should help that it doesn’t go to infinity
1.

_
_/

all parameters are informed by data
2.

_/

Plateau is only partially informed by data, Data informs about the minimum value, and maximum range should come from the prior.
3.

_
/

Baseline is only partially informed by data. Data informs about the maximum baseline value, and minimum range should come from the prior.
4.

_

Baseline and plateu are informed by data. Slope is not informed by data, and the information needs to come from the prior.

@avehtari I agree, indeed with hierarchical prior is what I do (axcept the horseshoe becomes less effective and I could not re-tune the parameters to increase its effect, see below)

I think this might be slope == 0, plateau informed, baseline/intercept not informed and informed -> 0 (in theory) with the Andrew suggestion of putting intercept and slope under the same hierarchical prior (I fail to do it for the horseshoe prior, since has so many parameters)

I want to present you the full model with results so you can have an idea.

Simulated data: 70 genes with 0 change, 30 genes with positive change.

With horseshoe prior and just linear equation exp(beta*X) I am able to normalise the data as expected, and have a good FP/FN rate. This is great.

functions{
vector reg_horseshoe(
vector zb,
vector beta1_z,
real aux1_global ,
real aux2_global,
vector aux1_local ,
vector aux2_local ,
real caux,
real scale_global ,
real slab_scale
) {
int K = rows(zb);
// Horseshoe variables
real tau ; // global shrinkage parameter
vector [ K] lambda ; // local shrinkage parameter
vector [ K] lambda_tilde ; // ’ truncated ’ local shrinkage parameter
real c; // slab scale
// Horseshoe calculation
lambda = aux1_local .* sqrt ( aux2_local );
tau = aux1_global * sqrt ( aux2_global ) * scale_global * 1 ;
c = slab_scale * sqrt ( caux );
lambda_tilde = sqrt ( c ^2 * square ( lambda ) ./ (c ^2 + tau ^2* square ( lambda )) );
return zb .* lambda_tilde * tau ;
}
}
data {
int<lower = 0> G; // all genes
int<lower = 0> T; // tube
int<lower=0> R;
int<lower = 0> y[T, G]; // RNA-seq counts
matrix[T,R] X;
// Horseshoe
real < lower =0 > par_ratio ; // proportion of 0s
real < lower =1 > nu_global ; // degrees of freedom for the half -t prior
real < lower =1 > nu_local ; // degrees of freedom for the half - t priors
real < lower =0 > slab_scale ; // slab scale for the regularized horseshoe
real < lower =0 > slab_df ; // slab degrees of freedom for the regularized
}
transformed data{
real < lower =0 > scale_global = par_ratio / sqrt(1.0 * T); // scale for the half -t prior for tau
}
parameters {
// Linear model
row_vector[G] beta0;
// Horseshoe
vector [ G] beta1_z;
real < lower =0 > aux1_global ;
real < lower =0 > aux2_global ;
vector < lower =0 >[ G] aux1_local ;
vector < lower =0 >[ G] aux2_local ;
real < lower =0 > caux ;
}
transformed parameters {
row_vector[G] beta1 = to_row_vector(reg_horseshoe(beta1_z, beta1_z, aux1_global , aux2_global, aux1_local , aux2_local , caux, scale_global, slab_scale ));
matrix[R, G] beta = append_row(beta0, beta1);
}
model {
// Horseshoe
beta1_z ~ normal (0 , 1);
aux1_local ~ normal (0 , 1);
aux2_local ~ inv_gamma (0.5* nu_local , 0.5* nu_local );
aux1_global ~ normal (0 , 1);
aux2_global ~ inv_gamma (0.5* nu_global , 0.5* nu_global );
caux ~ inv_gamma (0.5* slab_df , 0.5* slab_df );
// Lnear system
beta0 ~ normal(0,5);
sum(beta0) ~ normal(0, 0.01 * G);
for (t in 1:T)
y[t] ~ multinomial( softmax(to_vector(X[t] * beta) ) );
}

Now I would hope that changing the link function for k * inv_logit(beta * X) I obtain similar results, with the sparsity and normalization behaving well.

functions{
vector log_gen_inv_logit(row_vector y, row_vector k) {
return to_vector( k - log1p_exp(- ( y ) ) );
}
vector reg_horseshoe(
vector zb,
vector beta1_z,
real aux1_global ,
real aux2_global,
vector aux1_local ,
vector aux2_local ,
real caux,
real scale_global ,
real slab_scale
) {
int K = rows(zb);
// Horseshoe variables
real tau ; // global shrinkage parameter
vector [ K] lambda ; // local shrinkage parameter
vector [ K] lambda_tilde ; // ’ truncated ’ local shrinkage parameter
real c; // slab scale
// Horseshoe calculation
lambda = aux1_local .* sqrt ( aux2_local );
tau = aux1_global * sqrt ( aux2_global ) * scale_global * 1 ;
c = slab_scale * sqrt ( caux );
lambda_tilde = sqrt ( c ^2 * square ( lambda ) ./ (c ^2 + tau ^2* square ( lambda )) );
return zb .* lambda_tilde * tau ;
}
}
data {
int<lower = 0> G; // all genes
int<lower = 0> T; // tube
int<lower=0> R;
int<lower = 0> y[T, G]; // RNA-seq counts
matrix[T,R] X;
// Horseshoe
real < lower =0 > par_ratio ; // proportion of 0s
real < lower =1 > nu_global ; // degrees of freedom for the half -t prior
real < lower =1 > nu_local ; // degrees of freedom for the half - t priors
real < lower =0 > slab_scale ; // slab scale for the regularized horseshoe
real < lower =0 > slab_df ; // slab degrees of freedom for the regularized
}
transformed data{
real < lower =0 > scale_global = par_ratio / sqrt(1.0 * T); // scale for the half -t prior for tau
}
parameters {
// Linear model
row_vector[G] beta0;
row_vector[G] k;
// Horseshoe
vector [ G] beta1_z;
real < lower =0 > aux1_global ;
real < lower =0 > aux2_global ;
vector < lower =0 >[ G] aux1_local ;
vector < lower =0 >[ G] aux2_local ;
real < lower =0 > caux ;
}
transformed parameters {
row_vector[G] beta1 = to_row_vector(reg_horseshoe(beta1_z, beta1_z, aux1_global , aux2_global, aux1_local , aux2_local , caux, scale_global, slab_scale ));
matrix[R, G] beta = append_row(beta0, beta1);
}
model {
// Horseshoe
beta1_z ~ normal (0 , 1);
aux1_local ~ normal (0 , 1);
aux2_local ~ inv_gamma (0.5* nu_local , 0.5* nu_local );
aux1_global ~ normal (0 , 1);
aux2_global ~ inv_gamma (0.5* nu_global , 0.5* nu_global );
caux ~ inv_gamma (0.5* slab_df , 0.5* slab_df );
// Lnear system
beta0 ~ normal(0,1);
k ~ normal(0,2);
sum(k) ~ normal(0, 0.01 * G);
for (t in 1:T)
y[t] ~ multinomial( softmax( log_gen_inv_logit(X[t] * beta, k) ) );
}

However I loose the normalization property of the horseshoe

But whatever scale I choose for beta[1] (intercept), keeping all the other parameters of the horseshoe in common with the slope (for determinablity of the intercept), I get worse results.

Here pretty much the whole story.

Just to give a laugh, for desperation I started to experiment with the trick of common SD for determinability, applying in obviously the wrong way.

// trick
for(g in 1:G) beta[,g] ~ normal(0,sigma_trick[g]); //beta[2] is already transformed from horseshoe!!
sigma_trick ~ normal(0,.2);

And is the only was I could achieve decent results.

functions{
vector log_gen_inv_logit(row_vector y, row_vector k) {
return to_vector( k - log1p_exp(- ( y ) ) );
}
vector reg_horseshoe(
vector zb,
vector beta1_z,
real aux1_global ,
real aux2_global,
vector aux1_local ,
vector aux2_local ,
real caux,
real scale_global ,
real slab_scale
) {
int K = rows(zb);
// Horseshoe variables
real tau ; // global shrinkage parameter
vector [ K] lambda ; // local shrinkage parameter
vector [ K] lambda_tilde ; // ’ truncated ’ local shrinkage parameter
real c; // slab scale
// Horseshoe calculation
lambda = aux1_local .* sqrt ( aux2_local );
tau = aux1_global * sqrt ( aux2_global ) * scale_global * 1 ;
c = slab_scale * sqrt ( caux );
lambda_tilde = sqrt ( c ^2 * square ( lambda ) ./ (c ^2 + tau ^2* square ( lambda )) );
return zb .* lambda_tilde * tau ;
}
}
data {
int<lower = 0> G; // all genes
int<lower = 0> T; // tube
int<lower=0> R;
int<lower = 0> y[T, G]; // RNA-seq counts
matrix[T,R] X;
// Horseshoe
real < lower =0 > par_ratio ; // proportion of 0s
real < lower =1 > nu_global ; // degrees of freedom for the half -t prior
real < lower =1 > nu_local ; // degrees of freedom for the half - t priors
real < lower =0 > slab_scale ; // slab scale for the regularized horseshoe
real < lower =0 > slab_df ; // slab degrees of freedom for the regularized
}
transformed data{
real < lower =0 > scale_global = par_ratio / sqrt(1.0 * T); // scale for the half -t prior for tau
}
parameters {
// Linear model
row_vector[G] beta0;
row_vector[G] k;
vector<lower=0>[G] sigma_trick;
// Horseshoe
vector [ G] beta1_z;
real < lower =0 > aux1_global ;
real < lower =0 > aux2_global ;
vector < lower =0 >[ G] aux1_local ;
vector < lower =0 >[ G] aux2_local ;
real < lower =0 > caux ;
}
transformed parameters {
row_vector[G] beta1 = to_row_vector(reg_horseshoe(beta1_z, beta1_z, aux1_global , aux2_global, aux1_local , aux2_local , caux, scale_global, slab_scale ));
matrix[R, G] beta = append_row(beta0, beta1);
}
model {
// Horseshoe
beta1_z ~ normal (0 , 1);
aux1_local ~ normal (0 , 1);
aux2_local ~ inv_gamma (0.5* nu_local , 0.5* nu_local );
aux1_global ~ normal (0 , 1);
aux2_global ~ inv_gamma (0.5* nu_global , 0.5* nu_global );
caux ~ inv_gamma (0.5* slab_df , 0.5* slab_df );
// Lnear system
beta0 ~ normal(0,1);
// trick
for(g in 1:G) beta[,g] ~ normal(0,sigma_trick[g]);
sigma_trick ~ normal(0,.2);
k ~ normal(0,2);
sum(k) ~ normal(0, 0.01 * G);
for (t in 1:T)
y[t] ~ multinomial( softmax( log_gen_inv_logit(X[t] * beta, k) ) );
}

looking good but using nonsensical priors on the transformed parameters

A possibility is just that I implemented your model wrong, since I didn’t understand it fully. I observed that even though the plateau and intercept get confounded in my model giving a fix prior on plateau and non a hierarchical one (which is needed because I wrap the regression with softmax) eliminated divergences even with high parameter correlation. Of course the intercept becomes less defined, but when the slope == 0 I can almost ignore it in further analyses; while the plateau won’t be further analysed anyway.

The majority of the experiments are well far from the level of details of kinetics. They are mainly related to disease, and the gene interactions are likely to be much bigger in magnitude than the chemical dynamics of transcription itself. My attempt is just to not loose genes that reach a plateau just because can’t be explained by a linear function

Well the intercept indicates if the change is likely to happen early or late in cancer development. This is another nice thing of using a sigmoid like function. I want to think that a well regularised and penalised regression even more complex than linear is possible, after all is just another degree of freedom.

RNAseq that’s why I can use multinomial based regression

From the previous post I was thinking that the normalization looses efficacy because I have less mass around 0, the slopes are free to float and thus the “0” is less defined

Somehow this happens because I have 3 parameters (slope, intercept and upper_plateau) instead of 2 (slope and intercept). As result the slope might have more freedom to float because intercept and upper_plateau are themself less defined.

So the question might become: what of those three parameters I have to regularise more, and how?

k == upper_plateau needs to have fixed prior and sum to 0 because is the anchor for using the softmax (like an intercept would be in a linear model)

intercept, should have a hierarchical variance prior with its slope, so it goes to 0 when the slope goes to 0. However doing this

Sorry, I was assuming all the time you are inferring transcriptional regulations, I am starting to understand why I might have not made a lot of sense, my bad!

That is interesting as most of the literature I’ve seen uses over-dispersed Poisson-derived distributions for RNA-seq data (this is supported by theory). The most widely used is negative binomial (Poisson-gamma), used for example by DESeq2, but I’ve also seen Poisson-lognormal (https://www.biorxiv.org/content/early/2018/05/31/335695). Are you sure the problem is not with the likelihood? In particular, if you fit a model with Gaussian likelihood with small sd to data simulated from such a model, do the problems still appear?

One more thing that might be worth trying would be to reparametrize as

k = max(observed_target_gene_expression) * k_raw;
k_raw ~ normal(1, 0.5);

This incorporates the idea that you assume the plateu to likely not be more than twice the maximum observed and more likely somewhere close to the observed maximum. There are some alternatives to this, you can work with the mean of the observed expression, and/or use another distribution for k_raw that is centered at 1 (log normal or gamma could make sense).

Yes I am aware, however talking with @Bob_Carpenter he suggested that multinomial is really the distribution (real information = N-1 ) that the sequenced reads come from. This models naturally the different amount of information that patients sequenced deep have more than patients sequenced shallow.

Then all the game with normalization is played inside, with the assumption that most of the genes are non differentially transcribed

I am not sure (I was using negative binomial in the past), however then you have to insert normalization parameters, that would make less of a generative model.

I am trying to establish this method, it is going to be a nice publication and a new route for many studies. If this overlaps with your interest (I haven’t met many geneticists that use Bayesian statistics!) and if you feel like contributing more substantially, I’m also here being open to collaborations ;)

That is something I’d definitely like to hear more about, seems counter-intuitive, but who am I to judge :-).

It does overlap with my interests, however I am not really a geneticist neither a statistician - I switched to bioinformatics from AI 2 years ago and I still have a lot to learn on both the biology and statistical side of things. But I am also open to some collaborations and this looks like an interesting problem. I’ll be definitely happy to talk.

> print(fit1_horseshoe_linear_C2, pars=c("aux2_global", "caux"))
Inference for Stan model: db730f080581b0f57f1cc197d7fd99ce.
4 chains, each with iter=500; warmup=250; thin=1;
post-warmup draws per chain=250, total post-warmup draws=1000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
aux2_global 3.31 0.04 1.12 1.69 2.50 3.12 3.90 6.08 727 1.00
caux 3.36 0.16 3.03 1.23 1.94 2.65 3.77 9.87 376 1.01
Samples were drawn using NUTS(diag_e) at Fri Jun 8 18:15:41 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

Strange behaviour - sigmoid-like acquires a really long tail for some reason (slope less defined with essentially overparametrization when the curve really resembles a pure exponential).

> print(fit_horseshoe_sigmoid_C2, pars=c("aux2_global", "caux"))
Inference for Stan model: a2478bbfa33caaf9a18be0f4da559fdf.
4 chains, each with iter=500; warmup=250; thin=1;
post-warmup draws per chain=250, total post-warmup draws=1000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
aux2_global 400.21 15.97 188.54 149.00 270.16 359.45 480.69 875.33 139 1.03
caux 13.81 0.64 9.37 4.63 8.37 11.22 16.75 38.71 213 1.02
Samples were drawn using NUTS(diag_e) at Fri Jun 8 17:59:44 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

Chiming in late, and possibly not helpful, but would a GP be of use here? You would lose the explicit parameterization of the curve, but if all you’re interested in is whether there’s change in y of any form over the range of x, that would be straight-forward to compute in the posterior

Plain GP would be too flexible if it’s known that the functions are monotonic and saturating.

With that many genes I suspect there are too many cases where these decision are not easy, and the selection process adds it’s own variance. I would still try to continue with more informative priors, and it’s not a problem if for some genes some parameters are not well identified as long as the functions are well identified and the weak identifiability doesn’t cause computational problems.

What do you mean by normalization?

I’m confused. You have many genes, and you assume some slopes are near 0? Then you set horseshoe prior on slopes? Then you would not have horseshoe but some wide normal distribution for the intercept? Plateu parameter could be modelled as delta so that plateu = intercept + delta. Sorry I didn’t have time to read your model more carefully, but wanted give at least some feedback

Thanks for all the support so far, especially @Bob_Carpenter, @avehtari, @martinmodrak , I think I found what was the problem (until I prove myself wrong, as often happens)!

Essentially the model shows good behavior for a exp-linear link function, the question was why it does not show a good behavior for a generalized logit link function?

\frac{plateau}{1 + e^{-(\alpha + \beta * X)}}

While this equation is perfectly suitable for this data:

in case of an quasi-exponential trend,

the plateau swings to exponentially high values and is not a stable anchor anymore for normalization (like it is the intercept of the exp-linear function). This would not be a problem if I wasn’t using a multinomial(softmax( REGRESSION )) framework, where I need normalization action (see below). The solution is to reparametrize using the real intercept (at x = 0; alpha here is just really the point of inversion), which is directly supported by data even for an exponential-like trend.

Now this function behaves better in all scenarios.

Yes I was just testing the above similar reparametrization, but for some reason it was not improving the stability (for maybe some messy tests I was doing… and Saturday starting over it did work!) :)

With this, and applying the hierarchical prior between the inversion parameter alpha and the slope(s) I got the expected results.

With the first 50 genes having slope = 0 and being centered on 0 even with a softmax wrap (normally these would be read as decreasing, just because the change is not symmetric)

I mean this by normalization.

Without the right sparse assumption (using normal prior on beta) this data would have the wrong inference for 0 slopes (using my multinomial(softmax( REGRESSION )) framework)

I was also suggesting that may not be the most efficient way to code them and that we really needed to figure out what @bgoodri was up to with what he was calling the Lancaster reparameterization. I need a case study that breaks things down to Stan code. I think it will let you replace a bunch of the multinomial stuff with Poissons.

Yes I didn’t forget that suggestion, however (as I was expecting) I don’t have enough background to really understand.

If Ben would like to give an example of the reparametrization using my case, I can share my model.

I leave it directly here, this body of code to create simulated data and model it with Stan (the model is quite long, however most of it it’s just the implementation of the regularised horseshoe).

library(tidyverse)
library(foreach)
library(rstan)
#' Simulate data
#'
#' @param delta_magnitude the magnitude of the slope
#' @param n_genes Number of genes to be simulated
#' @param changing_genes Number of changing genes
#' @return A list
#'
#' @importFrom foreach foreach
#' @importFrom foreach %do%
#'
#' @export
#'
simulate_from_sigmoid = function(delta_magnitude = 5, n_genes = 100, changing_genes = round(n_genes*0.3), hkg = round(n_genes*0.3)){
# Custom sigmoid
inv_logit_gen = function(x, k) k / ( exp( - x ) + 1 )
# number of changing genes
#changing_genes = round(n_genes * prop_canging)
# numbe rof samples
n_samples = 13
# Design matrix
X = model.matrix( ~ sort(runif(n_samples, -1, 1)))
# beta
beta =
data.frame(
"(Intercept)" = rnorm(n_genes, 0, 1),
"slope" = c(
rnorm(hkg, 0, 0.001),
rep(0, n_genes - changing_genes - hkg),
rep( (delta_magnitude), changing_genes)
)
)
# Max expression
k = abs(rnorm(n_genes, 7, 2))
# Calculate baselines
y = t( X %*% t(beta) )
y_sigmoid = apply(y, 2, function(mc) inv_logit_gen(mc, k) )
# Produce tissue samples
y_real = foreach(n = 1:n_samples, .combine=bind_cols) %do% {
tibble( rnbinom(n_genes, mu = exp(y_sigmoid[,n]), size = 100) )
} %>%
setNames(sprintf("s%s", 1:n_samples)) %>%
mutate_if(is.numeric, as.integer)
# produce probability values
y_prob = foreach(n = 1:n_samples, .combine=bind_cols) %do% { y_real[,n]/sum(y_real[,n]) } %>% as_tibble()
# sample from multinomial
y_observed = foreach(n = 1:n_samples, .combine=bind_cols) %do% {
tibble(as.numeric(rmultinom(n = 1, prob = y_prob %>% pull(n), size = n_genes * 100)) )
} %>%
setNames(colnames(y_real)) %>%
mutate_if(is.numeric, as.integer)
list(
G = nrow(y_observed),
T = ncol(y_observed),
F = n_genes - changing_genes,
R = ncol(beta),
y = t(y_observed),
cond = cov,
X = X,
y_prob = y_prob,
k = k,
beta = beta,
y_sigmoid = y_sigmoid,
y_real = y_real
)
}
model = "functions{
// Log of the generalised sigmoid function reparametrised
vector log_gen_inv_logit(row_vector y, vector inversion, vector intercept) {
return intercept + log1p_exp(-inversion) - log1p_exp(- to_vector(y) ) ;
}
// Regularised horseshoe
vector reg_horseshoe(
vector zb,
real aux1_global ,
real aux2_global,
vector aux1_local ,
vector aux2_local ,
real caux,
real scale_global ,
real slab_scale
) {
int K = rows(zb);
// Horseshoe variables
real tau ; // global shrinkage parameter
vector [ K] lambda ; // local shrinkage parameter
vector [ K] lambda_tilde ; // ’ truncated ’ local shrinkage parameter
real c; // slab scale
// Horseshoe calculation
lambda = aux1_local .* sqrt ( aux2_local );
tau = aux1_global * sqrt ( aux2_global ) * scale_global * 1 ;
c = slab_scale * sqrt ( caux );
lambda_tilde = sqrt ( c ^2 * square ( lambda ) ./ (c ^2 + tau ^2* square ( lambda )) );
return zb .* lambda_tilde * tau ;
}
}
data {
int<lower = 0> G; // all genes
int<lower = 0> T; // tubes/samples
int<lower=0> R_1; // number of slopes
int<lower = 0> y[T, G]; // RNA-seq counts
matrix[T,R_1+1] X; // Design matrix including first "intercept" term
// Horseshoe
vector < lower =0 >[R_1] par_ratio ; // proportion of 0s
vector < lower =1 >[R_1] nu_global ; // degrees of freedom for the half -t prior
vector < lower =1 >[R_1] nu_local ; // degrees of freedom for the half - t priors
vector < lower =0 >[R_1] slab_scale ; // slab scale for the regularized horseshoe
vector < lower =0 >[R_1] slab_df; // slab degrees of freedom for the regularized
}
transformed data{
vector < lower =0 >[R_1] scale_global = par_ratio / sqrt(1.0 * T); // scale for the half -t prior for tau
}
parameters {
// Linear model
vector[G] inversion_z; // Point of inversion of the sigmoid
vector[G] intercept; // value at x == 0
vector<lower=0>[G] sigma_trick; //Discourse [quote=\"stijn, post:2, topic:4201\"]
// Horseshoe
vector [ G] beta1_z[R_1];
real < lower =0 > aux1_global[R_1] ;
real < lower =0 > aux2_global[R_1] ;
vector < lower =0 >[ G] aux1_local[R_1] ;
vector < lower =0 >[ G] aux2_local[R_1] ;
real < lower =0 > caux[R_1] ;
}
transformed parameters {
vector[G] beta1[R_1];
vector[G] beta1_trick[R_1];
vector[G] inversion;
matrix[R_1+1, G] beta;
matrix[T, G] y_hat;
// Horseshoe calculation
for(r in 1:R_1)
beta1[r] =
reg_horseshoe(
beta1_z[r],
aux1_global[r] ,
aux2_global[r],
aux1_local[r] ,
aux2_local[r] ,
caux[r],
scale_global[r],
slab_scale[r]
);
// trick //Discourse [quote=\"stijn, post:2, topic:4201\"]
for(r in 1:R_1) beta1_trick[r] = beta1[r] .* sigma_trick * 0.5;
inversion = inversion_z .* sigma_trick * 0.5;
// make beta
beta[1] = to_row_vector(inversion);
for(r in 1:R_1) beta[r+1] = to_row_vector(beta1_trick[r]);
// Matrix multiplication for speed up
y_hat = X * beta;
}
model {
// Horseshoe
for(r in 1:R_1) beta1_z[r] ~ normal (0 , 1);
for(r in 1:R_1) aux1_local[r] ~ normal (0 , 1);
for(r in 1:R_1) aux2_local[r] ~ inv_gamma (0.5* nu_local[r] , 0.5* nu_local[r] );
aux1_global ~ normal (0 , 1);
aux2_global ~ inv_gamma (0.5* nu_global , 0.5* nu_global );
caux ~ inv_gamma (0.5* slab_df , 0.5* slab_df );
// Trick //Discourse [quote=\"stijn, post:2, topic:4201\"]
sigma_trick ~ normal(0,1);
// Linear system
inversion_z ~ normal(0 ,1);
intercept ~ normal(0,5);
sum(intercept) ~ normal(0, 0.01 * G);
// Likelihood
for (t in 1:T) y[t] ~ multinomial( softmax( log_gen_inv_logit(y_hat[t], inversion, intercept) ) );
}
"
data_sigma = simulate_from_sigmoid()
data_sigma$R_1 = data_sigma$R - 1
data_sigma$nu_local = array(1, dim=data_sigma$R_1)
data_sigma$nu_global = array(45, dim=data_sigma$R_1)
data_sigma$par_ratio = array(0.05, dim=data_sigma$R_1)
data_sigma$slab_df = array(4, dim=data_sigma$R_1)
data_sigma$slab_scale = array(5, dim=data_sigma$R_1)
fit= rstan:::stan(
model_code = model,
data= data_sigma,
iter= 500, warmup = 250,
chains = 4,
cores = 4
#,control=list(adapt_delta=0.95, stepsize = 0.05, max_treedepth =15)
)

That was really only for situations where you need to have group-wise intercepts but need to make them block orthogonal to the other parameters in the likelihood. Similar idea to

but I don’t know that it is going to fix this thread.