3/29/2025 at 11:19:30 PM
If you want to get handy with matrix calculus, the real prerequisite is being comfortable with Taylor expansions and linear algebra. In a graduate numerical optimization class I took over a decade ago, the professor spent 10 minutes on the first day deriving some matrix calculus identity by working out the expressions for partial derivatives using simple calculus rules and a lot of manual labor. Then, as the class was winding up, he joked and said "just kidding, don't do that... here's how we can do this with a Taylor expansion", and proceeded to derive the same identity in what felt like 30 seconds.
Also, don't forget the Jacobian and gradient aren't the same thing!
by sfpotter
3/30/2025 at 1:17:55 AM
> Also, don't forget the Jacobian and gradient aren't the same thing!
Every gradient is a Jacobian but not every Jacobian is a gradient.
If you have a map f from R^n to R^m, then the Jacobian at a point x is the m x n matrix of the linear map that best approximates f near x. If m = 1 (namely if f is a scalar function) then the Jacobian is exactly the gradient.
If you already know about gradients (e.g. from physics or ML) and can't quite wrap your head around the Jacobian, the following might help (it's how I first got to understand Jacobians better):
1. write your function f from R^n to R^m as m scalar functions f_1, ..., f_m, namely f(x) = (f_1(x), ..., f_m(x))
2. take the gradient of f_i for each i
3. make an m x n matrix where the i-th row is the gradient of f_i
The matrix you build in step 3 is precisely the Jacobian. This is obvious if you know the definition and it's not a mathematically remarkable fact but for me at least it was useful to demystify the whole thing.
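A tiny numerical sketch of steps 1-3 (just an illustration in plain NumPy, using forward differences; the map f below is an arbitrary example from R^3 to R^2):

    import numpy as np

    def f(x):
        # example map from R^3 to R^2: f(x) = (x0 * x1, sin(x2))
        return np.array([x[0] * x[1], np.sin(x[2])])

    def gradient(fi, x, h=1e-6):
        # step 2: forward-difference gradient of the scalar function fi at x
        g = np.zeros_like(x)
        for j in range(len(x)):
            e = np.zeros_like(x)
            e[j] = h
            g[j] = (fi(x + e) - fi(x)) / h
        return g

    def jacobian(f, x, m, h=1e-6):
        # step 3: stack the gradients of f_1, ..., f_m as the rows of an m x n matrix
        return np.array([gradient(lambda y: f(y)[i], x, h) for i in range(m)])

    x = np.array([1.0, 2.0, 0.5])
    print(jacobian(f, x, m=2))  # 2 x 3 matrix; row i is the gradient of f_i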
by C-x_C-f
3/30/2025 at 2:17:44 AM
For m = 1, the gradient is a "vector" (a column vector). The Jacobian is a functional/a linear map (a row vector, dual to a column vector). They're transposes of one another. For m > 1, I would normally just define the Jacobian as a linear map in the usual way and define the gradient to be its transpose. Remember that these are all just definitions at the end of the day and a little bit arbitrary.
by sfpotter
3/30/2025 at 1:15:36 PM
I'd say a gradient is usually a covector / one-form. It's a map from vector directions to a scalar change. I.e., df = f_x dx + f_y dy is what you can actually compute without a metric; it's in T*M, not TM. If you have a direction vector (e.g. 2 d/dx), you can get from there to a scalar.
by oddthink
3/30/2025 at 3:00:06 PM
I'm not a big Riemannian geometry buff, but I took a look at the definition in Do Carmo's book and it appears that "grad f" actually lies in TM, consistent with what I said above. Would love to learn more if I've got this mixed up.
This would be nice, because it would generalize the "gradient" from vector calculus, which is clearly and unambiguously a vector.
by sfpotter
3/30/2025 at 7:25:35 PM
It's probably just a notation/definition issue. I'm not sure if "grad f" is 100% consistently defined.
I'm a simple-minded physicist. I just know that if you apply the same coordinate transformation to the gradient and to the displacement vector, you get the wrong answer.
My usual reference is Schutz's Geometrical Methods of Mathematical Physics, and he defines the gradient as df, but other sources call that the "differential" and say the gradient is what you get if you use the metric to raise the indices of df.
But that raised-index gradient (i.e. g(df)) is weird and non-physical. It doesn't behave properly under coordinate transformations. So I'm not sure why folks use that definition.
You can see the difference by looking at the differential in polar coordinates. If you have f = x + y, then df = dx + dy = (cos th + sin th) dr + r (cos th - sin th) d th. If you pretend this is instead a vector and transform it, you'd get "df" = (cos th + sin th) dr + (1/r)(cos th - sin th) d th, which just gives the wrong answer.
To be specific, if v = (1,1) in Cartesian (ex, ey), then df(v) = 2. But (1,1) in Cartesian is (1, 1/r) in polar (er, etheta). The "proper" df still gives 2, but the "weird metric one" gives 1 + 1/r^2, since you get the 1/r factor twice instead of a 1/r and a balancing r.
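If it helps, the same check can be run numerically (my own sketch in NumPy at a generic point, so the numbers differ from the 1 + 1/r^2 above, but the conclusion is the same: pairing v with the properly transformed df is coordinate-independent, while pairing it with the "vector-transformed" df is not):

    import numpy as np

    # f = x + y at an arbitrary point, with direction vector v = (1, 1)
    x, y = 1.0, 2.0
    r, th = np.hypot(x, y), np.arctan2(y, x)
    v_cart  = np.array([1.0, 1.0])   # components of v in (ex, ey)
    df_cart = np.array([1.0, 1.0])   # components of df = dx + dy

    # Jacobian of the coordinate map (r, th) -> (x, y)
    J = np.array([[np.cos(th), -r * np.sin(th)],
                  [np.sin(th),  r * np.cos(th)]])

    v_polar       = np.linalg.solve(J, v_cart)   # vector components transform with J^-1
    df_polar_good = J.T @ df_cart                # covector components transform with J^T
    df_polar_bad  = np.linalg.solve(J, df_cart)  # pretending df transforms like a vector

    print(df_cart @ v_cart)         # 2.0
    print(df_polar_good @ v_polar)  # 2.0 again, as it should be
    print(df_polar_bad @ v_polar)   # not 2.0: the "wrong answer"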
by oddthink
3/31/2025 at 6:27:35 PM
And I'm just a simple applied mathematician. For me, the gradient is the vector that points in the direction of steepest increase of a scalar field, and the Jacobian (or indeed, "differential") is the linear map in the Taylor expansion. I'll be curious to take a look at your reference: looks like a good one, and I'm definitely interested in seeing what the physicist's perspective is. Thanks!
by sfpotter
3/30/2025 at 12:40:40 AM
Can you give an example?
by edflsafoiewq
3/30/2025 at 6:40:05 AM
If you mean an example of how to use Taylor expansions and linear algebra, here's one I just made up. Let's say I want to differentiate tr(X^T X), where tr is the trace, X is a matrix, and X^T is its transpose. Expand:
tr((X + dX)^T (X + dX)) = tr(X^T X) + 2 tr(X^T dX) + tr(dX^T dX).
Our knowledge of linear algebra tells us that tr is a linear map, and the tr(dX^T dX) term is second order in dX, so it drops out of the first-order part. Hence, dX -> 2 tr(X^T dX) is the linear mapping corresponding to the Jacobian of tr(X^T X). With a little more work we could figure out how to write it as a matrix.
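A quick finite-difference sanity check of that expansion (my own sketch; equivalently, it confirms that the gradient of tr(X^T X), written as a matrix, is 2X):

    import numpy as np

    rng = np.random.default_rng(0)
    X  = rng.standard_normal((3, 3))
    dX = 1e-6 * rng.standard_normal((3, 3))

    f = lambda A: np.trace(A.T @ A)

    exact_change  = f(X + dX) - f(X)          # includes the second-order tr(dX^T dX) term
    linear_change = 2.0 * np.trace(X.T @ dX)  # the proposed Jacobian applied to dX

    print(exact_change, linear_change)        # differ only by tr(dX^T dX), roughly 1e-11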
by sfpotter
3/30/2025 at 6:21:35 AM
https://math.stackexchange.com/questions/3680708/what-is-the-difference-between-the-jacobian-hessian-and-the-gradient
https://carmencincotti.com/2022-08-15/the-jacobian-vs-the-hessian-vs-the-gradient/
by godelski
3/30/2025 at 5:59:59 AM
Check out this classic from 3b1b - How (and why) to raise e to the power of a matrix: https://youtu.be/O85OWBJ2ayo
by vismit2000
3/30/2025 at 10:33:51 AM
For those who prefer reading (I've not seen the video, but it seems related): https://sassafras13.github.io/MatrixExps/
“Thanks to a fabulous video by 3Blue1Brown [1], I am going to present some of the basic concepts behind matrix exponentials and why they are useful in robotics when we are writing down the kinematics and dynamics of a robot.”
by kgwgk
3/30/2025 at 3:41:13 PM
They didn't show how to actually do it using matrix decomposition!by esafak
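For reference, here's a minimal sketch of the decomposition route (my own example, and it assumes A is diagonalizable; scipy.linalg.expm is only there as a cross-check):

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0.0, -1.0],
                  [1.0,  0.0]])  # generator of 2D rotations

    # If A = V diag(w) V^-1, then exp(A) = V diag(exp(w)) V^-1
    w, V = np.linalg.eig(A)
    expA = (V @ np.diag(np.exp(w)) @ np.linalg.inv(V)).real  # .real drops round-off imaginary parts

    print(expA)     # rotation by 1 radian: [[cos 1, -sin 1], [sin 1, cos 1]]
    print(expm(A))  # same matrix from scipy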