I want to implement Spout support directly in FFmpeg (both sender and receiver).
I had a quick look at the SDK, specifically the SendImage() function, which takes as input the image data, width, and height.
In FFmpeg, the width of the image data in bytes and the linesize (also called stride or pitch) may differ. For example, there may be padding at the end of each line (useful for SIMD optimizations that overrun the end of the row), or the data pointer may point to the start of the last line, in which case the linesize is negative.
I could copy the data to an intermediate packed buffer and use that to call SendImage(), but that would slow things down.
Would it be possible to add a SendImage() function with stride to Spout itself?
It depends on whether you are using OpenGL or DirectX.
DirectX would be better if you just need to create a context within FFmpeg. You would use the SpoutDX class with OpenDirectX11() and CloseDirectX11(). There is a Windows example for SendImage.
It would be possible to add an argument at the end of the SendImage function and pass its value on to the row pitch (“SrcRowPitch”) argument of the DirectX “UpdateSubresource” function that updates the shared texture, but it’s not so straightforward for ReceiveImage.
However, SpoutCopy has a “RemovePadding” function with options for RGBA or RGB data. It uses SSE2 when the line length allows it, so its speed could be sufficient.
I’m still wrapping my head around how Spout works. Is it safe to assume that I can hardcode DirectX instead of OpenGL, and that this would generally be better? I’ll do some tests with a modified build of Spout and see if adding a pitch parameter works.
I’m surprised by the implementation of memcpy_sse2. On Linux, memcpy is usually about as optimized as it can be. Is that not the case on Windows? memcpy_sse2 seems to use movdqu instructions, and could be even faster if the data were correctly aligned and movdqa were used instead.
Another thing: I see that Spout has no synchronization mechanism. This means it will probably have to be implemented in ffplay rather than ffmpeg (playing back to a shared texture instead of a screen).
DirectX is better if you don’t need to support OpenGL. It bypasses the GL/DX interop and uses shared textures directly. Use the SpoutDX class and simply call OpenDirectX11() at the start of the session and CloseDirectX11() at the end. The sender example shows the basics.
There is a synchronisation mechanism by way of a mutex so that sender and receiver do not access the shared texture at the same time. Additionally, there is a semaphore that is tested by a receiver to determine when the sender has produced a new frame.
Strict sender/receiver synchronization can be achieved using events but it depends on both the sender and receiver monitoring the events and is probably not appropriate for your project. Here is an OpenGL example to give you an idea.
Timing tests showed memcpy_sse2 to be about 1.7x faster than the standard memcpy. This was done a while ago and there could be other implementations now. This library popped up after a quick GitHub search.
Adding a pitch parameter is straightforward. This is untested but should work in theory:
```cpp
// Send an image (pitch = 0 means tightly packed rows of width*4 bytes)
bool SendImage(const unsigned char * pData, unsigned int width, unsigned int height, unsigned int pitch = 0);

bool spoutDX::SendImage(const unsigned char * pData, unsigned int width, unsigned int height, unsigned int pitch)
{
    // Quit if no data
    if (!pData)
        return false;

    // Create or update the sender
    if (!CheckSender(width, height, m_dwFormat))
        return false;

    // Line length in bytes: 4 bytes per pixel unless the caller supplies a pitch.
    // The pitch is unsigned (as is UpdateSubresource's SrcRowPitch), so a
    // bottom-up buffer with a negative stride must be flipped by the caller.
    unsigned int rowpitch = width*4;
    if (pitch > 0) {
        // A pitch shorter than one row of pixels would make rows overlap
        if (pitch < rowpitch)
            return false;
        rowpitch = pitch;
    }

    // Check the sender mutex for access to the shared texture
    if (frame.CheckTextureAccess(m_pSharedTexture)) {
        // Update the shared texture resource with the pixel buffer
        m_pImmediateContext->UpdateSubresource(m_pSharedTexture, 0, NULL, pData, rowpitch, 0);
        // Flush the command queue because the shared texture has been updated on this device
        m_pImmediateContext->Flush();
        // Signal a new frame while the mutex is locked
        frame.SetNewFrame();
        // Allow access to the shared texture
        frame.AllowTextureAccess(m_pSharedTexture);
    }
    return true;
}
```