Is Python's os.path.join slow?

Developer https://www.devze.com 2023-02-04 10:52 Source: web
I've been told os.path.join is horribly slow in python and I should use string concatenation ('%s/%s' % (x, y)) instead. Is there really that big a difference and if so how can I track it?


$ python -mtimeit -s 'import os.path' 'os.path.join("/root", "file")'
1000000 loops, best of 3: 1.02 usec per loop
$ python -mtimeit '"/root" + "file"'
10000000 loops, best of 3: 0.0223 usec per loop

So yes, it's nearly 50 times slower. 1 microsecond is still nothing though, so I really wouldn't factor the difference in. Use os.path.join: it's cross-platform, more readable and less bug-prone.

EDIT: Two people have now commented that the import explains the difference. This is not true: -s is the setup flag, so the import is not included in the reported runtime. Read the docs.
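The same measurement can be run from inside Python with the timeit module. This is only a sketch written for Python 3: the variable names a and b and the loop count are assumptions made for illustration. It passes the strings in as variables because CPython can fold a literal expression such as "/root" + "file" into a single constant at compile time, which would leave nothing to measure.

```python
import timeit

# Setup code runs once per timing run and is not part of the measurement,
# which is the same role the -s flag plays on the command line.
setup = 'import os.path; a = "/root"; b = "file"'

number = 100_000
join_time = timeit.timeit('os.path.join(a, b)', setup=setup, number=number)
concat_time = timeit.timeit('a + "/" + b', setup=setup, number=number)

print("os.path.join:  %.3f usec per call" % (join_time / number * 1e6))
print("concatenation: %.3f usec per call" % (concat_time / number * 1e6))
```

The absolute numbers depend on the machine and the Python version, so treat them as a rough ratio rather than a fixed cost.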


I don't know who told you not to use it, but they're wrong.

  1. Even if it were slow, it would never be slow to a program-breaking extent. I've never noticed it being remotely slow.
  2. It's key to cross-platform programming. Path separators etc. differ by platform, and os.path.join will always join paths correctly regardless of platform.
  3. Readability. Everyone knows what join is doing. People might have to do a double take for string concatenation for paths.
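To make the cross-platform point concrete, here is a small Python 3 sketch. os.path is an alias for posixpath on Unix-like systems and for ntpath on Windows, and both modules can be imported on any platform, so the two behaviours can be compared side by side. The path components are made up for the example.

```python
import ntpath      # Windows path rules, importable on any platform
import posixpath   # POSIX path rules, importable on any platform

parts = ("data", "logs", "app.txt")  # hypothetical path components

posix_result = posixpath.join(*parts)
windows_result = ntpath.join(*parts)

print(posix_result)    # data/logs/app.txt
print(windows_result)  # data\logs\app.txt
```

Code that calls os.path.join gets whichever of these matches the platform it runs on, while a hard-coded '%s/%s' always produces the first form.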


Also be aware that periods in function calls, that is, attribute lookups such as os.path.join, are known to be slow. Compare:

python -mtimeit -s "import os.path;x=range(10)" "os.path.join(x)"
1000000 loops, best of 3: 0.405 usec per loop

python -mtimeit -s "from os.path import join;x=range(10)" "join(x)"
1000000 loops, best of 3: 0.29 usec per loop

So that's a slowdown of 40% just by having periods in your function invocation syntax.

Curiously, these two are different speeds:

$ python -mtimeit -s "from os.path import sep;join=sep.join;x=map(str,range(10))" "join(x)"
1000000 loops, best of 3: 0.253 usec per loop

$ python -mtimeit -s "from os.path import join;x=map(str,range(10))" "join(x)"
1000000 loops, best of 3: 0.285 usec per loop
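The attribute-lookup effect described above can also be checked from a script. This is a Python 3 sketch, not a definitive benchmark: the strings a and b and the loop count are assumptions, and how large the gap is (or whether it is visible at all) varies between Python versions.

```python
import os.path
import timeit
from os.path import join

a, b = "/root", "file"
number = 200_000

# "os.path.join" needs two attribute lookups per call (os -> path -> join);
# the directly imported name "join" needs none.
dotted = timeit.timeit("os.path.join(a, b)", globals=globals(), number=number)
direct = timeit.timeit("join(a, b)", globals=globals(), number=number)

print("os.path.join(a, b): %.3f usec per call" % (dotted / number * 1e6))
print("join(a, b):         %.3f usec per call" % (direct / number * 1e6))
```

Both spellings call the same function, so the results are identical; only the lookup cost differs.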


String concatenation may be nearly 50 times faster, but unless you're doing it in a CPU-bound tight inner loop, the speed difference isn't going to matter at all. The portability difference, on the other hand, determines whether your program can be easily ported to a non-Unix platform.

So, please use os.path.join unless you've profiled and discovered that it really is a major impediment to your program's performance.
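One way to do that profiling, and to answer the "how can I track it?" part of the question, is the standard cProfile module. The sketch below is for Python 3; build_paths is a hypothetical stand-in for whatever part of your program builds paths, and the count is arbitrary.

```python
import cProfile
import io
import os.path
import pstats


def build_paths(count):
    """Hypothetical workload: build `count` paths with os.path.join."""
    return [os.path.join("/root", "dir%d" % i, "file.txt") for i in range(count)]


profiler = cProfile.Profile()
profiler.enable()
paths = build_paths(50_000)
profiler.disable()

# Collect the report in a string and show the five most expensive entries,
# sorted by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

If the join call does not account for a meaningful share of the cumulative time in a real run, replacing it with string formatting will not noticeably change the program's speed.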


You should use os.path.join simply for portability.

I don't get the point of comparing os.path.join (which works for any number of parts, on any platform) with something as trivial as string formatting two paths.

To answer the question in the title, "Is Python's os.path.join slow?" you have to at least compare it with a remotely similar function to find out what speed you can expect from a function like this.

As you can see below, compared to a similar function, there is nothing slow about os.path.join:

python -mtimeit -s "x = tuple(map(str, range(10)))" "'/'.join(x)"
1000000 loops, best of 3: 0.26 usec per loop

python -mtimeit -s "from os.path import join;x = tuple(range(10))" "join(x)"
1000000 loops, best of 3: 0.27 usec per loop


python -mtimeit -s "x = tuple(range(3))" "('/%s'*len(x)) % x"
1000000 loops, best of 3: 0.456 usec per loop

python -mtimeit -s "x = tuple(map(str, range(3)))" "'/'.join(x)"
10000000 loops, best of 3: 0.178 usec per loop
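Note that in the timings above, join(x) receives the whole tuple as a single argument rather than one argument per component. Assuming the intent is to join the ten components, the tuple has to be unpacked with *. The Python 3 sketch below uses posixpath so that the result is the same on every platform; the name parts is chosen for illustration.

```python
import posixpath

parts = tuple(map(str, range(10)))

joined_with_module = posixpath.join(*parts)  # unpack: one argument per component
joined_with_sep = "/".join(parts)

print(joined_with_module)                     # 0/1/2/3/4/5/6/7/8/9
print(joined_with_module == joined_with_sep)  # True for plain relative components
```

The two results agree here only because none of the components is empty or starts with a separator; those are exactly the cases os.path.join handles differently from a plain string join.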


In this hot controversy, I dare to propose:

(I know, I know, there is timeit, but I'm not that practiced with timeit, and clock() seems to me to be sufficient for this case)

import os
from time import clock

separ = os.sep
ospath = os.path
ospathjoin = os.path.join

A,B,C,D,E,F,G,H = [],[],[],[],[],[],[],[]
n = 1000

for essays in xrange(100):

    te = clock()
    for i in xrange(n):
        xa = os.path.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
    A.append(clock()-te)


    te = clock()
    for i in xrange(n):
        xb = ospath.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
    B.append(clock()-te)


    te = clock()
    for i in xrange(n):
        xc = ospathjoin('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
    C.append(clock()-te)


    te = clock()
    for i in xrange(n):
        xd = 'C:\WINNT\system32'+os.sep+'Microsoft\Crypto'+os.sep+'RSA\MachineKeys'
    D.append(clock()-te)


    te = clock()
    for i in xrange(n):
        xe = '%s\\%s\\%s' % ('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
    E.append(clock()-te)


    te = clock()
    for i in xrange(n):
        xf = 'C:\WINNT\system32'+separ+'Microsoft\Crypto'+separ+'RSA\MachineKeys'
    F.append(clock()-te)


    te = clock()
    for i in xrange(n):
        xg = os.sep.join(('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys'))
    G.append(clock()-te)


    te = clock()
    for i in xrange(n):
        xh = separ.join(('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys'))
    H.append(clock()-te)

print min(A), "os.path.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')"
print min(B), "ospath.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')"
print min(C), "ospathjoin('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')"
print min(D), "'C:\WINNT\system32'+os.sep+'Microsoft\Crypto'+os.sep+'RSA\MachineKeys'"
print min(E), "'%s\\%s\\%s' % ('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')"
print min(F), "'C:\WINNT\system32'+separ+'Microsoft\Crypto'+separ+'RSA\MachineKeys'"
print min(G), "os.sep.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')"
print min(H), "separ.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')"
print 'xa==xb==xc==xd==xe==xf==xg==xh==',xa==xb==xc==xd==xe==xf==xg==xh

Result:

0.0284533369465 os.path.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
0.0277652606686 ospath.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
0.0272489939364 ospathjoin('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
0.00398598145854 'C:\WINNT\system32'+os.sep+'Microsoft\Crypto'+os.sep+'RSA\MachineKeys'
0.00375075603184 '%s\%s\%s' % ('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
0.00330824168994 'C:\WINNT\system32'+separ+'Microsoft\Crypto'+separ+'RSA\MachineKeys'
0.00292467338726 os.sep.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
0.00261401937956 separ.join('C:\WINNT\system32','Microsoft\Crypto','RSA\MachineKeys')
True

with

separ = os.sep
ospath = os.path
ospathjoin = os.path.join
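The script above is Python 2, and time.clock() was removed in Python 3.8. For anyone who wants to repeat the experiment on a current interpreter, here is a shortened sketch. It rests on a few assumptions: it uses time.perf_counter instead of clock(), it uses ntpath so the Windows-style paths join the same way on any platform, and it wraps each variant in a lambda, which adds a constant call overhead to every measurement. The helper best_time and the reduced set of variants are illustrative choices, not part of the original script.

```python
import ntpath  # Windows path rules on any platform; os.path is ntpath on Windows
from time import perf_counter

parts = (r"C:\WINNT\system32", r"Microsoft\Crypto", r"RSA\MachineKeys")
sep = ntpath.sep
n = 1000
repeats = 100


def best_time(func):
    """Return the fastest of `repeats` runs, each calling func n times."""
    timings = []
    for _ in range(repeats):
        start = perf_counter()
        for _ in range(n):
            func()
        timings.append(perf_counter() - start)
    return min(timings)


candidates = {
    "ntpath.join(*parts)": lambda: ntpath.join(*parts),
    "a + sep + b + sep + c": lambda: parts[0] + sep + parts[1] + sep + parts[2],
    "'%s\\%s\\%s' % parts": lambda: "%s\\%s\\%s" % parts,
    "sep.join(parts)": lambda: sep.join(parts),
}

for label, func in candidates.items():
    print("%.6f  %s" % (best_time(func), label))

# Every variant should build the same string.
results = {func() for func in candidates.values()}
print(len(results) == 1)
```

As in the original script, the minimum over many runs is reported because it is the measurement least disturbed by other activity on the machine.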


Everyone should know one non-obvious feature of os.path.join():

os.path.join( 'a', 'b' ) == 'a/b'
os.path.join( 'a', '/b' ) == '/b'
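The behaviour shown above is easy to reproduce. The Python 3 sketch below uses posixpath so that it gives the same output on any platform. The helper join_relative is hypothetical and is only one possible way to force later components to be treated as relative, assuming that is what you want.

```python
import posixpath

print(posixpath.join("a", "b"))             # a/b
print(posixpath.join("a", "/b"))            # /b
print(posixpath.join("a", "b", "/c", "d"))  # /c/d


def join_relative(base, *parts):
    """Hypothetical helper: treat every later component as relative to base."""
    return posixpath.join(base, *(p.lstrip("/") for p in parts))


print(join_relative("a", "/b"))  # a/b
```

An absolute component discards everything before it, which matters when any component comes from user input.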