Delta Mush in Unreal Engine with the Deformer Graph

Introduction

Analysis of the asset:

Content/ExampleContent/AnimationDeformer/AnimBlueprints/DG_DeltaMushDeformerDemo.uasset

found in the sample scene provided in the content examples (see Epic Games market place). This asset offers an advanced use case of the deformer graph plugin in UE (which is the tech allowing to implement ML deformation by the way)

Algorithm overview

Let's recall how delta mush works:

Precompute:
- At each vertex, compute tangent space, i.e. a local frame coordinate which origin is the vertex and is aligned with the vertex' normal and tangent. Per vertex TBN (tangent, binormal, normal) can be obtained with UVs, or face center.
- Smooth mesh in rest pose (e.g. Laplacian smoothing)
- Compute and store deltas vectors that encodes the difference between the original rest position and the smooth rest position. The per vertex deltas are represented in tangent space (the UE sample does it at runtime so this is an area for improvement)

Runtime:
- Smooth deformed mesh (deformation can be anything, skinning, lattice etc.)
- compute tangent space
- Apply delta vectors to restore details.

Implementation

Setup Graph

Precomputation is done in the setup graph :

A mesh is simply smoothed out in rest pose (uniform Laplacian smoothing with 8 hard coded iterations)

The result is written in a resource buffer SmoothBasePosition

The setter of SmoothBasePosition is not configured to use any particular mutex and write type is simply set to "Write"

Kernel code for one iteration of smoothing:

if (Index >= ReadNumThreads().x) return;

float3 Position = ReadPosition(Index);
float3 PositionSum = Position;

uint MaxConnected = ReadNumConnectedVertices(Index);
uint NumConnected = 0;

for (; NumConnected < MaxConnected; NumConnected++)
{
    uint ConnectedIndex = ReadConnectedVertex(Index, NumConnected);
    if (ConnectedIndex == 0xFFFFFFFF)
    {
        break;
    }
    PositionSum += ReadPosition(ConnectedIndex);
}

float3 SmoothPosition = PositionSum / (float)(NumConnected + 1);

WriteOutPosition(Index, SmoothPosition);

Update graph

Kernel for rigid skinning (only max influence taken into account at each vertex)

// Kernel name: SingleBoneSkin
if (Index >= ReadNumThreads().x) return;

float3 LocalPosition = ReadPosition(Index);
// Unreal sorts bone influence from max to min:
float3x4 BoneMatrix = ReadBoneMatrix(Index, 0);

float3 SkinnedPosition = mul(BoneMatrix, float4(LocalPosition, 1));
WriteOutPosition(Index, SkinnedPosition);

Delta Mush deformation core kernel code:

/*
    Kernel name: DeltaMush
    Execution domain: Vertex
    Group size: 64 1 1 
*/

// Accumulation resource should be NumVertices * 8.
#define ACCUMULATION_BUFFER_NUM_INTS 8

// For a vertex P0 compute a local frame purely based
// on outward triangle edges.
float3x3 GetVertexMatrix(float3 P0, float3 P1, float3 P2)
{
    float3 Edge1 = normalize(P1 - P0);
	float3 Edge2 = normalize(P2 - P0);
	float3 Normal = normalize(cross(Edge1, Edge2));
	float3 Binormal = normalize(cross(Edge1, Normal));
    return float3x3(Edge1,      // [0]
                    Normal,     // [1]
                    Binormal    // [2]
                   );        
}

KERNEL
{
    if (Index >= ReadNumThreads().x) 
        return;
    
    float BlendFactor = ReadBlendFactor();

    
    float3 BasePosition       = ReadBasePosition(Index);       // Rest Pose 
    float3 SmoothBasePosition = ReadSmoothBasePosition(Index); // Rest Pose smoothed 8 times
    float3 SmoothPosition     = ReadSmoothPosition(Index);     // Rigid skinning smoothed 8 times
    float3 Position           = ReadPosition(Index);           // Rigid skinning
    float3 AdjustedPosition   = Position;                      // Deformed mesh / Output


    /*
        We pick two edges in the list of adjacent vertices:

                ConnectedIndex2
                      *  
                     /
                    / (Edge2)
                   /
            Index *----------* ConnectedIndex1
                    (Edge1)
    
    */
    uint ConnectedIndex1 = ReadConnectedVertex(Index, 0);
    uint ConnectedIndex2 = ReadConnectedVertex(Index, 1);
    if (ConnectedIndex1 != 0xFFFFFFFF && 
        ConnectedIndex1 != 0xFFFFFFFF)
    {

        // Ideally we would want to precompute this part (the Delta) 
        // in the SetupGraph
        float3x3 BaseVertexMatrix = GetVertexMatrix(SmoothBasePosition, 
                                                    ReadSmoothBasePosition(ConnectedIndex1), 
                                                    ReadSmoothBasePosition(ConnectedIndex2));

        float3 GlobalDelta = BasePosition - SmoothBasePosition;
        // BaseVertexMatrix is a rowMajor representation so
        // matrix multiplication is from left to right:
        // vec * Matrix
        //
        // However we want to transform by the inverse 
        // BaseVertexMatrix^-1
        // Well, BaseVertexMatrix is an orthogonal matrix by construction
        // So we only need to transpose it.
        // And actually, multiplying from right to left is the same as doing:
        // vec * BaseVertexMatrix^T
        // Which is why we get the local delta with:
        float3 Delta = mul(BaseVertexMatrix, GlobalDelta);


        // This has to be done in the runtime though:
        // We compute the local frame of the animated and smoothed position:
        float3x3 VertexMatrix = GetVertexMatrix(SmoothPosition,     
                                                ReadSmoothPosition(ConnectedIndex1),     
                                                ReadSmoothPosition(ConnectedIndex2));
        // Transform back the local Delta to global coordinates and apply it to 
        // the current position:
        float3 DeltaMushPosition = mul(Delta, VertexMatrix) + SmoothPosition;

        AdjustedPosition = lerp(Position, DeltaMushPosition, BlendFactor);
    }

    float3 DebugDiffColor = abs(Position - AdjustedPosition) / 1;

    WriteOutPosition(Index, AdjustedPosition);

    float4 AColor = float4(DebugDiffColor, length(DebugDiffColor));
    WriteOutColor( Index, saturate(AColor) );
    
    // Clear the accumulation buffer before use.
    for (int Acc = 0; Acc < ACCUMULATION_BUFFER_NUM_INTS; ++Acc)    
        WriteOutAccumulation(Index * ACCUMULATION_BUFFER_NUM_INTS + Acc, 0);
    
}

Compute Normals and tangents

After deformation we need to compute normal from scratch, this is done by computing the normal at each triangle and summing this result to each vertex that belongs to the triangle. When summing to the corresponding vertex, for better accuracy, each normal is weighted by the angle between the outward edges of that vertex. To understand the code it's best to have knowledge on tangent space computation (OpenGl tutorial, small summary) . Polygon Mesh Processing also covers the logic behind weighting normal by the corner angle of a triangle.

To accumulate the resulting normal and tangents we set the resource buffer Accumulation "write type" to "Write Atomic Add":

Portion of the graph to re-compute tangents (I disconnected "NumThreads" pins for visibility):

Kernel ComputePerTriangleNormals

/*
    Execution domain: triangle
*/
// Accumulation resource should be NumVertices * 8.
#define ACCUMULATION_BUFFER_NUM_INTS 8

#define TANGENT_RANGE 0x7fff

struct FVertexData
{
	uint VertexIndex;
	float3 Position;
	float2 UV;
        /*
            If we name:
            - the current vertex 'a'.
            - the current triangle vertices 'a', 'b' and 'c' 
            Then:
            Angle = AngleBetween( Vec(b-a), Vec(c-a) )
        */
	float Angle;
};

struct FTriangleTangentData
{
	float3 TangentX;   // Tangent T
	float3 TangentZ;   // Normal N
	float Orientation; // Binormal orientation (1: B = cross(T, N); -1: cross(N, T))
};

FTriangleTangentData CalculateTriangleTangentData(in FVertexData Corners[3])
{
    FTriangleTangentData Out;
    
    float3 EdgeA = Corners[1].Position - Corners[0].Position;
    float3 EdgeB = Corners[2].Position - Corners[0].Position;
    float3 TriangleNormal = cross(EdgeB, EdgeA);
    
    float3 TangentZ = normalize(TriangleNormal);
    
    float2 UVEdgeA = Corners[1].UV - Corners[0].UV;
    float2 UVEdgeB = Corners[2].UV - Corners[0].UV;
    
    float CP = UVEdgeA.y * UVEdgeB.x - UVEdgeA.x * UVEdgeB.y;
    float3 UnnormalizedTangentX = (EdgeB * UVEdgeA.y - EdgeA * UVEdgeB.y) / CP;
    if (abs(CP) < 0.000001f)
    {
        // Bad UV assignment? Select some other vector that is orthogonal to TangentZ.
        float3 Other = (abs(normalize(TangentZ).z) < 0.5f) ? float3(0, 0, 1) : float3(1, 0, 0);
        UnnormalizedTangentX = cross(Other, TangentZ);
    }
    
    Out.TangentX = normalize(UnnormalizedTangentX);
    Out.TangentZ = TangentZ;
    Out.Orientation = (CP > 0) ? 1 : -1;
    
    return Out;
}

KERNEL
{
    if (Index > ReadNumThreads().x) 
        return;

    uint TriangleIndex = Index;

    // Gather per vertex information for each triangle corner.
    uint CornerId;
    FVertexData Corners[3];
    UNROLL for (CornerId = 0; CornerId < 3; ++CornerId)
    {
    	uint VertexIndex = ReadIndexBuffer(TriangleIndex * 3 + CornerId);
    	Corners[CornerId].VertexIndex = VertexIndex;
    	Corners[CornerId].Position = ReadPosition(VertexIndex);
    	Corners[CornerId].UV = ReadUV(VertexIndex, 0);
    }
    UNROLL for (CornerId = 0; CornerId < 3; ++CornerId)
    {
    	float3 Position = Corners[CornerId].Position;
    	float3 EdgeA = Corners[(CornerId + 1) % 3].Position - Position;
    	float3 EdgeB = Corners[(CornerId + 2) % 3].Position - Position;
    
    	Corners[CornerId].Angle = AngleBetweenVectorsFast(EdgeA, EdgeB);
    }
    
    // Calculate per triangle information.
    FTriangleTangentData TriangleData = CalculateTriangleTangentData(Corners);
    
    // Accumulate the calculated values to each vertex in the output buffer.
    UNROLL for (CornerId = 0; CornerId < 3; ++CornerId)
    {
    	// Weight and convert to integer range [-TANGENT_RANGE*PI, TANGENT_RANGE*PI]
    	float Weight = Corners[CornerId].Angle;
    	int3 IntTangentZ = (int3)(TriangleData.TangentZ * Weight * TANGENT_RANGE);
    	int3 IntTangentX = (int3)(TriangleData.TangentX * Weight * TANGENT_RANGE);
    	int IntOrientation = (int)(TriangleData.Orientation * Weight * TANGENT_RANGE);
    
    	// Accumulate each original vertex.
    	uint VertexIndex = Corners[CornerId].VertexIndex;
    	uint BaseVertexIndex = VertexIndex * ACCUMULATION_BUFFER_NUM_INTS;
    
        // Buffer set to "Atomic Add" so writing automatically 
        // adds value to existing values.
    	WriteAccumulation(BaseVertexIndex + 0, IntTangentZ.x);
    	WriteAccumulation(BaseVertexIndex + 1, IntTangentZ.y);
    	WriteAccumulation(BaseVertexIndex + 2, IntTangentZ.z);
    	WriteAccumulation(BaseVertexIndex + 3, IntTangentX.x);
    	WriteAccumulation(BaseVertexIndex + 4, IntTangentX.y);
    	WriteAccumulation(BaseVertexIndex + 5, IntTangentX.z);
    	WriteAccumulation(BaseVertexIndex + 6, IntOrientation);
    }
}

ComputePerVertexNormals

/*
    Execution domain: vertex
*/
// Accumulation resource should be NumVertices * 8.
#define ACCUMULATION_BUFFER_NUM_INTS 8

KERNEL
{
    if (Index > ReadNumThreads().x) 
        return;

    uint VertexIndex = Index;

    float3 TangentX, TangentZ;
    uint BaseVertexIndex = VertexIndex * ACCUMULATION_BUFFER_NUM_INTS;
    TangentZ.x = (float)ReadAccumulation(BaseVertexIndex + 0);
    TangentZ.y = (float)ReadAccumulation(BaseVertexIndex + 1);
    TangentZ.z = (float)ReadAccumulation(BaseVertexIndex + 2);
    TangentX.x = (float)ReadAccumulation(BaseVertexIndex + 3);
    TangentX.y = (float)ReadAccumulation(BaseVertexIndex + 4);
    TangentX.z = (float)ReadAccumulation(BaseVertexIndex + 5);
    float Orientation = ReadAccumulation(BaseVertexIndex + 6) <= 0 ? -1 : 1;

    // We don't need to rescale because we normalize.
    TangentX = normalize(TangentX);
    TangentZ = normalize(TangentZ);
	
    // gram-schmidt orthonormalization
    TangentX = normalize(TangentX - (dot(TangentX, TangentZ) * TangentZ));

    // No need to orthogonalize after blending as vertex factory will do so later on.
    TangentX = normalize(TangentX);
    TangentZ = normalize(TangentZ);
	
    WriteOutPosition(VertexIndex, ReadPosition(VertexIndex));
    WriteOutTangentX(VertexIndex, float4(TangentX, 0.0f));
    WriteOutTangentZ(VertexIndex, float4(TangentZ, Orientation));
}

Gram-Schmidt orthonormalization visualized

Since 'n' is a unit vector then 'dot(n,t)' is the length of the projection of 't' onto 'n'

Limitation

Geometry must be manifold

Disconnected geometry won't be processed correctly by the smoothing pass:

More advanced smoothing algorithm with robust Laplacian is needed to handle this corner case. Or just fix the model to be water-tight.

Heavy computation

Direct Delta Mush is a much more efficient method which should be considered in case speed is an issue (most likely it will...)

No comments

(optional field, I won't disclose or spam but it's necessary to notify you if I respond to your comment)
All html tags except <b> and <i> will be removed from your comment. You can make links by just typing the url or mail-address.
Anti-spam question: