My python program executes faster than my java ver

2020-07-02 10:01发布

Update: 2009-05-29

Thanks for all the suggestions and advice. I used your suggestions to make my production code execute 2.5 times faster on average than my best result a couple of days ago. In the end I was able to make the java code the fastest.

Lessons:

  • My example code below shows the insertion of primitive ints but the production code is actually storing strings (my bad). When I corrected that the python execution time went from 2.8 seconds to 9.6. So right off the bat, the java was actually faster when storing objects.

  • But it doesn't stop there. I had been executing the java program as follows:

    java -Xmx1024m SpeedTest

But if you set the initial heap size as follows you get a huge improvement:

java -Xms1024m -Xmx1024m SpeedTest

This simple change reduced the execution time by more than 50%. So the final result for my SpeedTest is python 9.6 seconds. Java 6.5 seconds.

Original Question:

I had the following python code:

import time
import sys

def main(args):    
    iterations = 10000000
    counts = set()
    startTime = time.time();    
    for i in range(0, iterations):
        counts.add(i)
    totalTime = time.time() - startTime
    print 'total time =',totalTime
    print len(counts)

if __name__ == "__main__":
    main(sys.argv)

And it executed in about 3.3 seconds on my machine but I wanted to make it faster so I decided to program it in java. I assumed that because java is compiled and is generally considered to be faster than python I would see some big paybacks.

Here is the java code:

import java.util.*;
class SpeedTest
{    
    public static void main(String[] args)
    {        
        long startTime;
        long totalTime;
        int iterations = 10000000;
        HashSet counts = new HashSet((2*iterations), 0.75f);

        startTime = System.currentTimeMillis();
        for(int i=0; i<iterations; i++)
        {
            counts.add(i);
        }
        totalTime = System.currentTimeMillis() - startTime;
        System.out.println("TOTAL TIME = "+( totalTime/1000f) );
        System.out.println(counts.size());
    }
}

So this java code does basically the same thing as the python code. But it executed in 8.3 seconds instead of 3.3.

I have extracted this simple example from a real-world example to simplify things. The critical element is that I have (set or hashSet) that ends up with a lot of members much like the example.

Here are my questions:

  1. How come my python implementation is faster than my java implementation?

  2. Is there a better data structure to use than the hashSet (java) to hold a unique collection?

  3. What would make the python implementation faster?

  4. What would make the java implementation faster?

UPDATE:

Thanks to all who have contributed so far. Please allow me to add some details.

I have not included my production code because it is quite complex. And would generate a lot of distraction. The case I present above is the most simplified possible. By that I mean that the java put call seems to be much slower than the python set`s add().

The java implementation of the production code is also about 2.5 - 3 times slower than the python version -- just like the above.

I am not concerned about vm warmup or startup overhead. I just want to compare the code from my startTime to my totalTime. Please do not concern yourselves with other matters.

I initialized the hashset with more than enough buckets so that it should never have to rehash. (I will always know ahead of time how many elements the collection will ultimately contain.) I suppose one could argue that I should have initialized it to iterations/0.75. But if you try it you will see that execution time is not significantly impacted.

I set Xmx1024m for those that were curious (my machine has 4GB of ram).

I am using java version: Java(TM) SE Runtime Environment (build 1.6.0_13-b03).

In the production version of I am storing a string (2-15 chars) in the hashSet so I cannot use primitives, although that is an interesting case.

I have run the code many, many times. I have very high confidence that the python code is between 2.5 and 3 times faster than the java code.

20条回答
走好不送
2楼-- · 2020-07-02 10:47

There's a number of issues here which I'd like to bring together.

First if it's a program that you are only going to run once, does it matter it takes an extra few seconds?

Secondly, this is just one microbenchmarks. Microbenchmarks are pointless for comparing performance.

Startup has a number of issues.

The Java runtime is much bigger than Python so takes longer to load from disk and takes up more memory which may be important if you are swapping.

If you haven't set -Xms you may be running the GC only to resize the heap. Might as well have the heap properly sized at the start.

It is true that Java starts off interpreting and then compiles. Around 1,500 iterations for Sun client [C1] Hotspot and 10,000 for server [C2]. Server Hotspot will give you better performance eventually, but take more memory. We may see client Hotspot use server for very frequently executed code for best of both worlds. However, this should not usually be a question of seconds.

Most importantly you may be creating two objects per iteration. For most code, you wouldn't be creating these tiny objects for such a proportion of the execution. TreeSet may be better on number of objects score, with 6u14 and Harmony getting even better.

Python may possibly be winning by storing small integer objects in references instead of actually have an object. That is undoubtably a good optimisation.

A problem with a lot of benchmarks is you are mixing a lot of different code up in one method. You wouldn't write code you cared about like that, would you? So why are you attempting to performance test which is unlike code you would actually like to run fast?

Better data structure: Something like BitSet would seem to make sense (although that has ynchronisation on it, which may or may not impact performance).

查看更多
虎瘦雄心在
3楼-- · 2020-07-02 10:48

Well, if you're going to tune the Java program, you might as well tune the Python program too.

>>> import timeit
>>> timeit.Timer('x = set()\nfor i in range(10000000):\n    x.add(i)').repeat(3, 1)
[2.1174559593200684, 2.0019571781158447, 1.9973630905151367]
>>> timeit.Timer('x = set()\nfor i in xrange(10000000):\n    x.add(i)').repeat(3, 1)
[1.8742368221282959, 1.8714439868927002, 1.869229793548584]
>>> timeit.Timer('x = set(xrange(10000000))').repeat(3, 1)
[0.74582195281982422, 0.73061800003051758, 0.73396801948547363]

Just using xrange makes it about 8% faster on my machine. And the expression set(xrange(10000000)) builds exactly the same set, but 2.5x faster (from 1.87 seconds to 0.74).

I like how tuning a Python program makes it shorter. :) But Java can do the same trick. As everyone knows, if you want a dense set of smallish integers in Java, you don't use a hash table. You use a java.util.BitSet!

BitSet bits = new BitSet(iterations);

startTime = System.currentTimeMillis();
bits.set(0, iterations, true);
totalTime = System.currentTimeMillis() - startTime;
System.out.println("TOTAL TIME = "+( totalTime/1000f) );
System.out.println(bits.cardinality());

That should be fairly quick. Unfortunately I don't have the time to test it right now.

查看更多
smile是对你的礼貌
4楼-- · 2020-07-02 10:51

I'd like to dispel a couple myths I saw in the answers:

Java is compiled, yes, to bytecode but ultimately to native code in most runtime environments. People who say C is inherently faster aren't telling the entire story, I could make a case that byte compiled languages are inherently faster because the JIT compiler can make machine-specific optimizations that are unavailable to way-ahead-of-time compilers.

A number of things that could make the differences are:

  • Python's hash tables and sets are the most heavily optimized objects in Python, and Python's hash function is designed to return similar results for similar inputs: hashing an integer just returns the integer, guaranteeing that you will NEVER see a collision in a hash table of consecutive integers in Python.
  • A secondary effect of the above is that the Python code will have high locality of reference as you'll be accessing the hash table in sequence.
  • Java does some fancy boxing and unboxing of integers when you add them to collections. On the bonus side, this makes arithmetic way faster in Java than Python (as long as you stay away from bignums) but on the downside it means more allocations than you're used to.
查看更多
姐就是有狂的资本
5楼-- · 2020-07-02 10:53

You're not really testing Java vs. Python, you're testing java.util.HashSet using autoboxed Integers vs. Python's native set and integer handling.

Apparently, the Python side in this particular microbenchmark is indeed faster.

I tried replacing HashSet with TIntHashSet from GNU trove and achieved a speedup factor between 3 and 4, bringing Java slightly ahead of Python.

The real question is whether your example code is really as representative of your application code as you think. Have you run a profiler and determined that most of the CPU time is spent in putting a huge number of ints into a HashSet? If not, the example is irrelevant. Even if the only difference is that your production code stores other objects than ints, their creation and the computation of their hashcode could easily dominate the set insertion (and totally destroy Python's advantage in handling ints specially), making this whole question pointless.

查看更多
Root(大扎)
6楼-- · 2020-07-02 10:53

You might want to see if you can "prime" the JIT compiler into compiling the section of code you're interested in, by perhaps running it as a function once beforehand and sleeping briefly afterwords. This might allow the JVM to compile the function down to native code.

查看更多
叼着烟拽天下
7楼-- · 2020-07-02 10:55

Are you using the -server flag with the jvm? You can't test for performance without it. (You also have to warm up the jvm before doing the test.)

Also, you probably want to use TreeSet<Integer>. HashSet will be slower in the long run.

And which jvm are you using? The newest I hope.

EDIT

When I say use TreeSet, I mean in general, not for this benchmark. TreeSet handles the real world issue of non even hashing of objects. If you get too many objects in the same bin in a HashSet, you will perform about O(n).

查看更多
登录 后发表回答