WineHQ
Bug Tracking Database – Bug 57442


Last modified: 2024-12-13 21:36:47 UTC  

Several applications: abnormal input delay with Wine

Status: CLOSED FIXED
Product: Wine
Classification: Unclassified
Component: win32u
Version: 9.21
Hardware: x86-64 Linux
Importance: P2 enhancement
Target Milestone: ---
Assigned To: Mr. Bugs
Keywords: regression
Depends on:
Blocks:
Reported: 2024-11-21 19:55 UTC by ksmnvsg
Modified: 2024-12-13 21:36 UTC
CC List: 2 users

See Also:
Regression SHA1: 54ca1ab607d3ff22a1f57a9561430f64c75f0916
Fixed by SHA1: b5a4c2f64ad07b0aaeddc2d8245bc79ddb33b1f5
Distribution: ---
Staged patchset:


Description ksmnvsg 2024-11-21 19:55:11 UTC
I've noticed a slight input delay in one Unreal Engine sample and decided to test things with my custom application (a very simple C++ program that uses SDL2 to poll mouse events and log them). I built this application on Linux with g++ and cross-compiled it for Windows with MinGW. The native Linux build has around 1.05 ms of input delay, while the Windows build running under Wine 9.21 has 9.5 ms. I believe you can build any application that polls events and test it yourself; I observed the same input lag in Unreal Engine samples, though it was less severe there because it's bounded by the Game Thread time.

The way I tested things is a bit complicated, but it's necessary for my goal: I used Sunshine as a server and Moonlight as a client, and logged when Sunshine receives the input from Moonlight. We can think of this as the kernel receiving events, since Sunshine creates a virtual device and sends input there. Then I logged when the application receives the event.
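A minimal sketch of the delay calculation described above, assuming both the Sunshine-side and the application-side timestamps are read from the same monotonic clock (here Linux `CLOCK_MONOTONIC`) and paired by event index. The helpers `now_ms` and `mean_delay_ms` are hypothetical names, not part of Sunshine or SDL2.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Current time in milliseconds from the system-wide monotonic clock.
 * Assumption: both the server and the app read this same clock,
 * otherwise their timestamps cannot be compared. */
double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* Mean delay between when the server saw event i (server_ms[i])
 * and when the application received it (app_ms[i]). */
double mean_delay_ms(const double *server_ms, const double *app_ms, size_t n)
{
    double total = 0.0;
    if (n == 0)
        return 0.0;
    for (size_t i = 0; i < n; i++)
        total += app_ms[i] - server_ms[i];
    return total / (double)n;
}
```

Pairing by index assumes no events are lost between the two logs; if some are dropped or merged, the logs need a per-event identifier instead.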

I checked the source code, and I believe the new input throttling mechanism introduced in version 9.13 is the reason it's so slow. If you re-implement the previous mechanism based on message count, the input delay is almost 0. I wonder whether CPU usage would spike in real games, though.
Comment 1 William Horvath 2024-12-09 08:35:23 UTC
Regression commit:
54ca1ab607d3ff22a1f57a9561430f64c75f0916 "win32u: Simplify the logic for driver messages polling."

I noticed this issue from a PeekMessage event loop like this:

    while (running)
    {
        while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
        {
            if (msg.message == WM_QUIT) { running = false; break; }
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
        if (running)
        {
            QueryPerformanceCounter(&current_time);
            double elapsed = get_ms_delta(last_check, current_time);
            if (elapsed >= 0.1) /* 10kHz */
            {
                process_key_state_changes();
                last_check = current_time;
            }
            Sleep(0);
        }
    }

If driver events (e.g. mouse motion, keypresses) are sent at a frequency greater than 1000 Hz, the processing loop above only receives them at ~10 ms intervals on average, because most of them seem to be dropped entirely. If the events are sent at a frequency less than or equal to 1000 Hz, there's barely any added delay (~0.1 ms), and all of them are received, same as before the regression commit.

I only tested with `GetKeyboardState` and an external program sending X11 key events at a constant rate, but reverting the mentioned commit allows even 0.25 ms keypress intervals (8000 Hz) to all be received with minimal delay.
Comment 2 Rémi Bernon 2024-12-09 10:17:29 UTC
The issue is probably coming from the X11 driver throttle that is in place, which only allows peeking into X11 events once every ms. When events are received faster, the X11 driver will start queuing them, and eventually merge input events together before sending the input through wineserver to the application.
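A back-of-the-envelope sketch of why the 1000 Hz boundary matters: if the driver may only check for host events once per poll interval, any source faster than that must deliver more than one event per check, and the extras have to be queued (or merged) until the next one. The helper `events_per_poll` is a hypothetical name used only for this arithmetic.

```c
/* Average number of host input events that arrive between two driver
 * polls, given the event rate (Hz) and the minimum poll interval (ms). */
double events_per_poll(double event_rate_hz, double poll_interval_ms)
{
    return event_rate_hz * poll_interval_ms / 1000.0;
}
```

For example, `events_per_poll(1000.0, 1.0)` is 1.0 (no backlog), `events_per_poll(8000.0, 1.0)` is 8.0 (seven events wait per check), and `events_per_poll(8000.0, 0.25)` is 2.0.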

The input event merging probably also adds some effective latency, but removing it without checking for X11 events more often will simply cause batches of events to be sent every 1 ms, and won't make much difference.

Removing the throttle would cause applications that use a pattern like this to storm X11 with XCheckIfEvent calls for nothing. This has been causing heavy system loads and we want to prevent that. We could allow slightly more frequent calls, but I don't think we can remove the throttle; we would probably need a different user driver design instead [*].
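A minimal sketch of what merging queued relative motion can look like, assuming a hypothetical `motion_event` type with relative deltas and a timestamp. This is not Wine's X11 driver code; it only illustrates why merging cuts the number of events sent through wineserver while the merged event can carry just one timestamp, so the earlier samples effectively arrive late.

```c
#include <stddef.h>

/* Hypothetical relative mouse-motion event. */
struct motion_event
{
    int dx, dy;       /* relative movement */
    double time_ms;   /* when the host produced the event */
};

/* Merge a queue of pending motion events into queue[0], in place.
 * Returns the number of events left to send (1, or 0 if empty). */
size_t merge_motion(struct motion_event *queue, size_t count)
{
    if (count == 0)
        return 0;
    for (size_t i = 1; i < count; i++)
    {
        queue[0].dx += queue[i].dx;
        queue[0].dy += queue[i].dy;
        queue[0].time_ms = queue[i].time_ms; /* keep the newest timestamp */
    }
    return 1;
}
```

The total movement is preserved, but the intermediate positions and timings are lost.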
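For comparison, a time-based throttle of the kind discussed here can be sketched as a check against a monotonic timestamp: the expensive host query (`XCheckIfEvent` in the X11 driver) is only attempted when at least `interval_us` has passed since the last attempt. The `poll_throttle` type and `throttle_allow` function are hypothetical names for illustration; an interval of 1000 microseconds corresponds to the once-per-millisecond behaviour described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical throttle state for polling the host event queue. */
struct poll_throttle
{
    uint64_t last_us;     /* timestamp of the last allowed poll */
    uint64_t interval_us; /* minimum spacing between polls, e.g. 1000 */
    bool started;         /* false until the first poll has happened */
};

/* Returns true if the caller may query the host for events at now_us. */
bool throttle_allow(struct poll_throttle *t, uint64_t now_us)
{
    if (t->started && now_us - t->last_us < t->interval_us)
        return false; /* polled too recently: skip the host query */
    t->last_us = now_us;
    t->started = true;
    return true;
}
```

However often the application calls into the message loop, the host is queried at most once per interval; the trade-off is that events arriving in between wait for the next allowed poll.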

[*] I had the idea to move the X11 input event polling elsewhere, possibly in wineserver, but this would be a very large architectural change.
Comment 3 ksmnvsg 2024-12-09 14:49:34 UTC
(In reply to Rémi Bernon from comment #2)

I've replaced the tick-based throttling method with the one Wine had before version 9.13 (based on message count), and this latency is gone. It's hard to measure whether my CPU usage increased since my application is pretty small; it didn't skyrocket, at least, but it would be better to test with a CPU-heavy game.

I also wanted to look into the X11 code, but then I figured that even if I did find something, rebuilding X11 would be a nightmare for me.
Comment 4 William Horvath 2024-12-09 17:13:45 UTC
(In reply to ksmnvsg from comment #3)
> I've replaced the throttling method based off ticking with the one that Wine
> had before 9.13 version (based off number of messages) and this latency is
> gone.

If I understand Rémi correctly, the commit that simplified the driver event check also had the same queuing/latency issues, it just started to present itself in a different way.

Using a higher-frequency 4000 Hz counter (e.g. with `NtQueryPerformanceCounter`) instead of the 1000 Hz `NtGetTickCount` used in the commit, the consistent measurements from the old implementation were restored in my test until I started sending events at >4000 Hz, which makes sense.

I think a small frequency increase like this would go a long way toward accommodating quite a few applications/setups where this sort of behavior is noticeable; in any case, though, it doesn't look like an easy problem to fix properly.
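The same check driven by a performance counter instead of a millisecond tick count might look like the sketch below. It assumes only a counter value and its frequency in ticks per second, as `NtQueryPerformanceCounter` provides; the 4000 Hz rate is the one tested above, and `qpc_allow_poll` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Allow a host poll at most poll_rate_hz times per second, measured with a
 * performance counter that ticks `frequency` times per second.
 * poll_rate_hz must be > 0 and no larger than frequency. */
bool qpc_allow_poll(int64_t now, int64_t *last, int64_t frequency,
                    int64_t poll_rate_hz)
{
    int64_t min_ticks = frequency / poll_rate_hz; /* ticks per poll slot */
    if (now - *last < min_ticks)
        return false;
    *last = now;
    return true;
}
```

With an example 10 MHz counter and a 4000 Hz rate, one poll slot is 2500 ticks (0.25 ms), four times finer than a 1 ms tick count can express.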
Comment 5 ksmnvsg 2024-12-09 17:23:40 UTC
(In reply to William Horvath from comment #4)

Sorry, I'm fairly new to all of this and don't quite understand what you mean. Are you saying this latency issue is still there regardless of the throttling method on the Wine side, and that the X11 driver's throttling is to blame?
Comment 6 Rémi Bernon 2024-12-09 17:29:53 UTC
We cannot really use the message count as a throttle anymore because of how peeking for messages has been optimized. Restoring it is pretty much the same as removing the throttle entirely, and it puts the load on the X server.
It may not be that visible, but it definitely adds some load, and makes a difference in various cases.

The calls to peek_message are now extremely fast, since in the most common case it's just a matter of checking bits in memory shared with wineserver, so the 200 message count throttle (really more of a peek_message call counter) isn't very effective.

That limit would need to be increased accordingly, but a time-based throttle is more deterministic, and yes, NtQueryPerformanceCounter could be an option.
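To see why a call-count throttle stops working once each call gets cheap, here is a sketch of the idea: only check the host every N calls. With N = 200 (the figure given above), the effective host polling rate depends entirely on how long each call takes. The `count_throttle` type and both functions are hypothetical, and the per-call cost in the example is an illustrative assumption, not a measurement.

```c
#include <stdbool.h>

/* Hypothetical call-count throttle: let every `threshold`-th call poll. */
struct count_throttle
{
    unsigned calls;
    unsigned threshold; /* e.g. 200 */
};

bool count_allow(struct count_throttle *t)
{
    if (++t->calls < t->threshold)
        return false;
    t->calls = 0;
    return true;
}

/* Host polls per second if each peek call takes call_ns nanoseconds
 * and the application calls it back-to-back. */
double count_poll_rate_hz(unsigned threshold, double call_ns)
{
    return 1e9 / (threshold * call_ns);
}
```

If a peek call costs, say, 100 ns, `count_poll_rate_hz(200, 100.0)` gives 50000 Hz, i.e. fifty times more host queries than a 1 ms time-based throttle allows, and the rate keeps rising as peek_message gets faster.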

The proper way to support such high-frequency X11 input events is to wait for them instead of polling as we do now, but that requires another large architectural redesign.

The input events have to go through wineserver, and with very high-frequency input, receiving them in the application process just to route them through wineserver is inefficient, which is why I think it should be done in wineserver instead. There, they could be waited for, and it would save a lot of IPC (replacing X -> app -> wineserver -> app with X -> wineserver -> app).
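Waiting instead of polling, in its simplest POSIX form, means blocking on the file descriptor that delivers the events until it becomes readable. The sketch below uses `poll()` on an arbitrary descriptor; for an Xlib connection that descriptor would be `ConnectionNumber(display)`, though where Wine would do this (driver or wineserver) is the open design question. The helper `wait_for_input` is hypothetical.

```c
#include <poll.h>
#include <unistd.h> /* pipe()/write() when trying this out with a pipe */

/* Block until `fd` is readable or timeout_ms expires (-1 waits forever).
 * Returns 1 if input is ready, 0 on timeout, and -1 on error or if the
 * descriptor reports something other than readable data (e.g. hang-up). */
int wait_for_input(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret = poll(&pfd, 1, timeout_ms);
    if (ret < 0)
        return -1;
    if (ret == 0)
        return 0;
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

With Xlib specifically, events may already sit in Xlib's own queue, so a real loop would check `XPending()` before blocking on the socket.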
Comment 7 ksmnvsg 2024-12-09 17:56:35 UTC
(In reply to Rémi Bernon from comment #6)
I see, that makes sense. I'm working on a project that requires very low input latency and potentially very high response speed, so I'm trying my best to optimize whatever I can, but it's hard to measure potential drawbacks like CPU usage. I can't control the applications (for my testing I just mimicked a potential application), so my only options are either redesigning Wine to use interrupts instead of polling, or reducing the number of IPCs, right?

Could you also please explain how input is routed from the application to wineserver, and then back to the application? I didn't really understand why or how, and to be fair I don't really understand what wineserver does to begin with.
Comment 8 Rémi Bernon 2024-12-09 18:32:42 UTC
You can see wineserver as the core of Wine's NT kernel implementation; it implements various bits that need to work across processes, although the boundary is not very well defined.

Input that is received from the X server is sent through wineserver because wineserver implements the Win32 hardware input dispatching logic, which might differ from the host dispatching, and because there are plenty of things that can, and are supposed to, happen to an input in relation to other processes, like hooks, capture, or rawinput.

This is done through NtUserSendHardwareMessage calls; it's a generic function for sending "hardware" input to be processed and dispatched. Later on, the input is received back by applications through the expected Win32 calls, which can be window messages (through NtUserPeekMessage / NtUserGetMessage), rawinput buffers (NtUserGetRawInputData / NtUserGetRawInputBuffer), low-level hooks (through the registered LL hook procedures), etc.
Comment 9 ksmnvsg 2024-12-09 18:39:53 UTC
(In reply to Rémi Bernon from comment #8)
I think I got it. So, if I have an application with a separate thread made specifically for input polling (so its input polling rate will be very high), the only way to reduce latency without adding CPU usage is to move input handling to the wineserver side and use interrupts there? That sounds like a lot of work.
Comment 10 Rémi Bernon 2024-12-12 22:23:17 UTC
Should be fixed after b5a4c2f64ad07b0aaeddc2d8245bc79ddb33b1f5
Comment 11 Alexandre Julliard 2024-12-13 21:36:47 UTC
Closing bugs fixed in 10.0-rc2.