Surely schedulers aren't this harmful? Don't we have better APIs?

I'm wondering what APIs are available to avoid the following problem.

Casting my mind back to the Operating Systems lectures on my old CS course, I recall that one topic was multiprocess scheduling and concurrent I/O. Here's the example the lecturer gave of what would happen:

Two processes, X and Y, have some work to do. There's one processor/bus/whatever, and the scheduler naively distributes timeslices between X and Y as follows:

X gets timeslice 1
Y gets timeslice 2
X gets timeslice 3
...

This was described as being "fair", but it seems to me grossly unfair. Consider two cases under this scheme:

If X and Y are both going to take 10 seconds each, now both will take 20 seconds.
If X requires 10 seconds and Y requires 100 seconds, then X will take 20 seconds and Y will take 110 seconds.
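The round-robin arithmetic above can be checked with a minimal simulation. This sketch assumes a single CPU, both jobs ready at time 0, free context switches, and a hypothetical 10 ms timeslice (`SLICES_PER_SECOND = 100`); demands are counted in whole slices to avoid floating-point drift, and the function name `round_robin` is illustrative.

```python
from collections import deque


def round_robin(demands: dict[str, int]) -> dict[str, int]:
    """Completion tick of each job under plain round-robin.

    demands maps job name -> CPU demand in whole timeslices. All jobs arrive
    at tick 0, there is one CPU, each turn is one slice, and context switches
    are assumed to cost nothing.
    """
    remaining = dict(demands)
    queue = deque(demands)          # job names in arrival order: X, Y, ...
    tick = 0
    done: dict[str, int] = {}
    while queue:
        name = queue.popleft()
        tick += 1                   # this job runs for one slice
        remaining[name] -= 1
        if remaining[name] > 0:
            queue.append(name)      # unfinished: back of the line
        else:
            done[name] = tick       # finished at the end of this slice
    return done


SLICES_PER_SECOND = 100             # assumed slice length: 10 ms

for x_sec, y_sec in [(10, 10), (10, 100)]:
    ticks = round_robin({"X": x_sec * SLICES_PER_SECOND,
                         "Y": y_sec * SLICES_PER_SECOND})
    print({k: v / SLICES_PER_SECOND for k, v in ticks.items()})
# {'X': 19.99, 'Y': 20.0}
# {'X': 19.99, 'Y': 110.0}
```

X finishes one slice short of 20 seconds only because it goes first in every round; with finer slices the figures converge on the 20 s / 20 s and 20 s / 110 s quoted above.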

If the scheduler were simply "do all of X then all of Y", then in the first case X would take 10 seconds and Y would take 20 seconds; in the second case X would take 10 and Y would take 110.
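The run-to-completion figures are just running totals of the job sizes. This sketch assumes jobs run back to back with no preemption; the helper name `run_to_completion` is hypothetical. The last line reverses the order to show the drawback raised in the next paragraph: a short X stuck behind a long Y.

```python
from itertools import accumulate


def run_to_completion(order: list[str], demands: dict[str, int]) -> dict[str, int]:
    """Completion time of each job when jobs run back to back, no preemption."""
    finish_times = accumulate(demands[name] for name in order)
    return dict(zip(order, finish_times))


print(run_to_completion(["X", "Y"], {"X": 10, "Y": 10}))    # {'X': 10, 'Y': 20}
print(run_to_completion(["X", "Y"], {"X": 10, "Y": 100}))   # {'X': 10, 'Y': 110}
print(run_to_completion(["Y", "X"], {"X": 10, "Y": 100}))   # {'Y': 100, 'X': 110}
```

When all jobs start together, ordering them shortest first minimises the average completion time, which is why "all of X, then all of Y" looks so attractive here; it only works if the sizes are known in advance.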

How can a system which makes nobody better off and somebody worse off be a good idea? The only argument in the "fair" system's favour is that if we did all of Y before any of X, a small job X would be delayed by a large job Y, and we need to keep both jobs "responsive".

For the second case, part of me sees the natural "best" way as being to say "X is 10 times smaller, therefore absent any explicit preference, it should get 10 times as many timeslices as Y". (It's a bit like giving pedestrians right of way before cars on the grounds that they put less strain on the roads, but I digress.) Under this scheme, X finishes in 11 seconds and Y finishes in 110 seconds. Real world consequence: my mp3 loads and plays without appreciable extra delay even though a massive file copy is happening in the background.
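The 11 s / 110 s figures follow from an idealised "fluid" weighted share, in which each active job continuously receives a fraction of the CPU proportional to its weight. The sketch below assumes that model (real schedulers hand out discrete slices), encodes "10 times as many timeslices" as hypothetical weights of 10 for X and 1 for Y, and uses `Fraction` so the arithmetic is exact. With equal weights it reproduces the 20 s / 110 s round-robin figures.

```python
from fractions import Fraction


def proportional_share(demands: dict[str, int], weights: dict[str, int]) -> dict[str, Fraction]:
    """Completion times under an idealised (fluid) weighted fair share.

    While jobs are active, each gets weight / (total active weight) of the CPU.
    """
    remaining = {name: Fraction(d) for name, d in demands.items()}
    clock = Fraction(0)
    done: dict[str, Fraction] = {}
    while remaining:
        total = sum(weights[n] for n in remaining)
        rate = {n: Fraction(weights[n], total) for n in remaining}
        # advance to the moment the next job finishes at the current rates
        step = min(remaining[n] / rate[n] for n in remaining)
        clock += step
        for n in list(remaining):
            remaining[n] -= rate[n] * step
            if remaining[n] == 0:
                done[n] = clock
                del remaining[n]
    return done


def show(times: dict[str, Fraction]) -> dict[str, float]:
    """Convert exact completion times to floats for printing."""
    return {n: float(t) for n, t in times.items()}


sizes = {"X": 10, "Y": 100}
print(show(proportional_share(sizes, {"X": 10, "Y": 1})))  # {'X': 11.0, 'Y': 110.0}
print(show(proportional_share(sizes, {"X": 1, "Y": 1})))   # {'X': 20.0, 'Y': 110.0}
```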

Obviously there is a whole universe of strategies available, and I don't want to argue the suitability of any particular one. My point is this: all such strategies require knowledge of the size of the job.

So, are there OS APIs (Linux, or even Windows) which allow one to specify hints of the amount of work an operation will take?

(NB: you could claim that disk I/O incorporates this implicitly, but while(not_done){read_chunk();} would render it meaningless. The kind of API I'm thinking of would specify megabytes at file open time, clock cycles at thread creation time, or something along these lines.)

If all tasks represent work that will have no value until they are run to completion, then the best approach is to run the jobs one at a time, in a sequence chosen to minimize the cost of other things (or people) having to wait for them. In practice, many tasks represent a sequence of operations which may have some individual value, so if two tasks will take ten seconds each, having both tasks half done at the ten-second mark may be better than having one task completed and the other not even started. This is especially true if the tasks produce data needed by a downstream process running on another machine, since that process can do useful work whenever it has received more data than it has processed. It is also somewhat true if part of the work entails showing a person that something useful is actually happening: a user who watches a progress bar count up over 20 seconds is less likely to get unhappy than one whose progress bar doesn't even budge for ten seconds.
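The ten-second-mark comparison can be made concrete. This sketch assumes an idealised fluid model of equal sharing (no slice granularity, no switch cost), and both helper names are hypothetical.

```python
def progress_sequential(t: float, order: list[str], demands: dict[str, float]) -> dict[str, float]:
    """Fraction of each job done at time t when jobs run back to back."""
    progress: dict[str, float] = {}
    start = 0.0
    for name in order:
        worked = min(max(t - start, 0.0), demands[name])
        progress[name] = worked / demands[name]
        start += demands[name]
    return progress


def progress_shared(t: float, demands: dict[str, float]) -> dict[str, float]:
    """Fraction of each job done at time t when the CPU is split equally.

    Idealised fluid model; only valid up to the moment the first job finishes.
    """
    n = len(demands)
    if t > n * min(demands.values()):
        raise ValueError("model only covers the period before any job finishes")
    return {name: (t / n) / d for name, d in demands.items()}


jobs = {"X": 10, "Y": 10}                          # two 10-second tasks
print(progress_sequential(10, ["X", "Y"], jobs))   # {'X': 1.0, 'Y': 0.0}
print(progress_shared(10, jobs))                   # {'X': 0.5, 'Y': 0.5}
```

Both schedules have done the same total amount of work at t = 10; they differ only in how that work is spread across the two consumers waiting on it.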

In common operating systems you typically don't care about the delay of any one task; you try to maximize throughput: in 110 seconds both X and Y will be done, period. Of course, some processes are interactive, so the OS accepts the extra overhead of context switches between processes to maintain the illusion of parallel computation.

As you said, any strategy that aims to minimize a task's completion time requires knowing how long the task will take. That is often hard to determine if the task is more than just copying a file, which is why the progress bar in some applications reaches 99% and stays there for a while as the last few things are done.

However, in real-time operating systems you often have to know a task's worst-case execution time, or a deadline by which the task must be finished, and you are obliged to provide such a "hint". The scheduler must then schedule more cleverly (even more so if locks or dependencies are involved); on multiprocessors the scheduling problem is sometimes NP-complete, in which case the scheduler uses heuristics.
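As a concrete example of the kind of "hint" a real-time scheduler works with: for independent periodic tasks on one CPU, each described by a worst-case execution time C and a period T (assuming the deadline equals the period), Liu and Layland's classic results give utilisation-based tests. Under preemptive EDF the set is schedulable exactly when U = sum(C/T) <= 1; under rate-monotonic priorities, U <= n(2^(1/n) - 1) is sufficient but not necessary. The function names and task sets below are illustrative.

```python
def utilization(tasks: list[tuple[float, float]]) -> float:
    """Total CPU utilisation U = sum(C / T) of periodic tasks given as (C, T)."""
    return sum(c / t for c, t in tasks)


def edf_schedulable(tasks: list[tuple[float, float]]) -> bool:
    """Exact test for preemptive EDF on one CPU when deadlines equal periods."""
    return utilization(tasks) <= 1.0


def rm_bound_ok(tasks: list[tuple[float, float]]) -> bool:
    """Liu & Layland bound for rate-monotonic priorities (sufficient only)."""
    n = len(tasks)
    return utilization(tasks) <= n * (2 ** (1 / n) - 1)


# (worst-case execution time, period) pairs, in the same time unit
light = [(1, 4), (2, 6), (1, 8)]   # U is about 0.708
full = [(2, 4), (3, 6)]            # U = 1.0
print(edf_schedulable(light), rm_bound_ok(light))   # True True
print(edf_schedulable(full), rm_bound_ok(full))     # True False
```

A `False` from `rm_bound_ok` means "not guaranteed by this test", not "will miss deadlines"; an exact answer for fixed priorities needs a response-time analysis.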

I suggest reading about RTOSes, Earliest Deadline First scheduling and Rate Monotonic scheduling.
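A unit-time sketch of preemptive Earliest Deadline First, assuming integer worst-case execution times and periods, deadlines equal to periods, and all tasks first released at time 0. At every time unit the ready job with the nearest absolute deadline runs; the function name and timeline format are illustrative only.

```python
import heapq


def edf_simulate(tasks: list[tuple[int, int]], horizon: int) -> list[str]:
    """Unit-step preemptive EDF for periodic tasks given as (wcet, period).

    Returns the index of the task running in each time unit ('-' if idle).
    Raises RuntimeError as soon as a job misses its deadline.
    """
    ready: list[tuple[int, int, int]] = []   # (absolute deadline, task index, remaining work)
    timeline: list[str] = []
    for t in range(horizon):
        for i, (c, p) in enumerate(tasks):
            if t % p == 0:                    # a new job of task i is released
                heapq.heappush(ready, (t + p, i, c))
        if ready and ready[0][0] <= t:        # earliest pending deadline already passed
            raise RuntimeError(f"task {ready[0][1]} missed deadline {ready[0][0]}")
        if not ready:
            timeline.append("-")
            continue
        deadline, i, rem = heapq.heappop(ready)   # nearest deadline runs this unit
        timeline.append(str(i))
        if rem > 1:
            heapq.heappush(ready, (deadline, i, rem - 1))
    return timeline


# Utilisation 2/4 + 3/6 = 1.0: EDF keeps the CPU 100% busy and meets every deadline.
print("".join(edf_simulate([(2, 4), (3, 6)], 12)))   # 001110010011
```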

The only argument in the "fair" system's favour is that if we did all of Y before any of X then a small job X would be delayed by a large job Y and we need to keep both jobs "responsive".

That's exactly the rationale. Fair scheduling is fair in that it tends to distribute computing time, and therefore delays, equally among processes asking for it.

So, are there OS APIs (Linux, or even Windows) which allow one to specify hints of the amount of work an operation will take?

Batch systems do this, but, as you concluded yourself, this requires knowledge of the task at hand. Unix/Linux has the nice command, which gives a process lower priority; it's a good idea to run any long-running, CPU-bound process on a multitasking machine "nice" so that it doesn't hold up short and interactive tasks. ionice does the same for I/O priority.
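A small sketch of applying both from Python on Linux, assuming the standard `nice` tool and util-linux `ionice` are installed. `ionice -c 3` selects the idle I/O class; whether I/O priorities have any effect depends on the kernel's I/O scheduler. The helper `background_command` and the `cp big.iso /backup/` job are hypothetical; `os.nice` is the standard Unix-only call that raises a process's own niceness.

```python
import os
import shlex
import subprocess
import sys


def background_command(cmd: list[str], cpu_nice: int = 19, idle_io: bool = True) -> list[str]:
    """Prefix cmd with `nice` (and optionally Linux `ionice`) so it runs politely.

    nice -n 19   -> lowest CPU scheduling priority
    ionice -c 3  -> idle I/O class: disk time only when nobody else is asking
    Both settings are inherited by the wrapped command.
    """
    argv = ["nice", "-n", str(cpu_nice)]
    if idle_io:
        argv += ["ionice", "-c", "3"]
    return argv + cmd


if __name__ == "__main__":
    # A process can also lower its own CPU priority; os.nice returns the new value.
    print("my niceness is now", os.nice(5))
    # Hypothetical bulk copy that should not get in the way of interactive work.
    argv = background_command(["cp", "big.iso", "/backup/"])
    print(shlex.join(argv))   # nice -n 19 ionice -c 3 cp big.iso /backup/
    if "--run" in sys.argv:
        subprocess.run(argv, check=True)
```

Note that neither tool tells the scheduler how big the job is; they only declare that it may be delayed in favour of everything else.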

(Also, ever since the early 1970s, Unix schedulers have dynamically raised the priority of processes that do not "eat up" their slices, so interactive processes get high CPU priority and stay responsive without CPU-bound ones holding everything up. See Thompson and Ritchie's early papers on Unix.)
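The effect of favouring processes that don't eat up their slices can be illustrated with a toy decay-usage scheduler. This is a simplified model with made-up constants (priority = recent usage / 4, usage halved every 100 ticks), not the actual historical Unix formula: a CPU-bound "hog" competes with a hypothetical interactive "editor" that wakes every 20 ticks needing one tick of CPU.

```python
def decay_usage(ticks: int = 1000, wake_every: int = 20, decay_every: int = 100):
    """Toy decay-usage scheduler; a LOWER priority value means it runs first.

    'hog' always wants the CPU. 'editor' (hypothetical interactive process)
    wakes once every `wake_every` ticks and needs a single tick of CPU.
    The running process accrues one unit of recent usage per tick, and every
    `decay_every` ticks all usage is halved so old consumption is forgotten.
    """
    recent = {"hog": 0, "editor": 0}
    editor_ready_since = None            # tick when editor last became runnable
    editor_waits = []                    # how long editor waited each time
    for t in range(ticks):
        if t % wake_every == wake_every // 2 and editor_ready_since is None:
            editor_ready_since = t
        runnable = ["hog"] if editor_ready_since is None else ["hog", "editor"]
        # priority = recent usage // 4 (assumed scale); min() picks the best one
        current = min(runnable, key=lambda name: recent[name] // 4)
        recent[current] += 1
        if current == "editor":
            editor_waits.append(t - editor_ready_since)
            editor_ready_since = None
        if (t + 1) % decay_every == 0:   # periodic decay of old usage
            for name in recent:
                recent[name] //= 2
    return recent, editor_waits


recent, waits = decay_usage()
print("final usage:", recent)
print("editor ran", len(waits), "times; longest wait:", max(waits), "ticks")
```

Because the hog's accumulated usage keeps its priority value high, the editor runs the moment it wakes, without anyone having declared in advance how much work either process will do.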