$\begingroup$

This is probably too broad, but it is worth asking: given an unknown distribution that I would like to sample from, is there any benefit in looking at the gradient of the joint density with respect to each parameter? What does each partial derivative even mean?

The reason I am looking at this is that I would like to use the derivatives to initialize a Gibbs sampler over those parameters properly. I was hoping to move close to a high-density region of the distribution and initialize the parameters there (a warm start for the sampler) instead of initializing them randomly. But I have no idea whether this even makes sense.
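
To make the idea concrete, here is a minimal sketch of the kind of warm start described above. Everything in it is an assumption chosen for illustration: the target is a toy bivariate normal (so both the gradient of the log joint and the full conditionals are known in closed form), and the names `grad_log_joint`, `warm_start`, and `gibbs` are hypothetical. It runs plain gradient ascent on the log joint from a poor starting point, then starts the Gibbs sampler from wherever the ascent ends up.

```python
import math
import random

# Assumed toy target: bivariate normal, unit variances, correlation RHO.
MU = (3.0, -2.0)
RHO = 0.8


def grad_log_joint(x, y):
    """Partial derivatives of the log joint density with respect to x and y."""
    dx, dy = x - MU[0], y - MU[1]
    scale = 1.0 / (1.0 - RHO ** 2)
    return (-scale * (dx - RHO * dy), -scale * (dy - RHO * dx))


def warm_start(x, y, step=0.1, n_steps=500):
    """Plain gradient ascent on the log joint; it ends near a local mode."""
    for _ in range(n_steps):
        gx, gy = grad_log_joint(x, y)
        x, y = x + step * gx, y + step * gy
    return x, y


def gibbs(x, y, n_samples, rng):
    """Gibbs sampler that alternates the two exact normal full conditionals."""
    sd = math.sqrt(1.0 - RHO ** 2)
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(MU[0] + RHO * (y - MU[1]), sd)  # draw x given y
        y = rng.gauss(MU[1] + RHO * (x - MU[0]), sd)  # draw y given x
        samples.append((x, y))
    return samples


rng = random.Random(0)
x0, y0 = warm_start(-20.0, 25.0)        # deliberately poor starting point
samples = gibbs(x0, y0, 5000, rng)      # chain started at the warm start
mean_x = sum(s[0] for s in samples) / len(samples)
mean_y = sum(s[1] for s in samples) / len(samples)
```

In this toy case the ascent lands on the single mode, so the chain starts inside the high-density region. For a multimodal target, gradient ascent only finds a local mode near the starting point, so the warm start says nothing about the other modes.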