-->

Integer Vs Long Confusion

2020-01-25 00:55发布

问题:

I have seen many believe in the following

VBA converts all integer values to type Long

In fact, even the MSDN article says

“In recent versions, however, VBA converts all integer values to type Long, even if they're declared as type Integer.”

How is this possible? Consider this simple example.

Sub Sample()
    Dim I As Integer
    I = 123456789
End Sub

If VBA converts all Integer values to type Long even if they're declared as type Integer, then the above should never give you the Overflow error!

What am I missing here? Or should I take it that the statement is incorrect and pay serious heed to that the link says in the beginning

回答1:

An integer declared as an Integer is still type checked as an Integer. The msdn documentation is referencing how the variable is stored internally. On a 32 bit system, an Integer will be stored in 32 BITS not Bytes, while on a 16 bit system the value is stored in a 16 BIT space or register, it would have been stored in 16. Hence the maximum size.

There is no type conversion going on as far as VBA is concerned. An int is an int and a long is a long, even though they now take up just as much space.



回答2:

I've spent a lot of time working in the VBA environment and have every reason to believe that the claim in this article is at best, misleading.

I've never come across a situation where an automatic unexpected conversion is made. Of course, assignment by value to a larger type (such as a Double or Long) would be implicit.

One specific case where automatic conversion would be a breaking change would be an assignment to a Variant type. Without a conversion, the type would be VT_I2, with conversion VT_I4.

Passing an Integer type ByRef to a function expecting a Long emits a type mismatch in Office 2013.

I suspect they are referring to the internal storage of the Integer: it's very likely that they are not aligned on 16 bit words in memory (cf. a short structure member in C / C++). They are probably talking about that.



回答3:

“In recent versions, however, VBA converts all integer values to type Long, even if they're declared as type Integer.”

I don't believe that documentation. Consider the following simple example (run in Excel 2010):

Sub checkIntegerVsLong()

    'Check the total memory allocation for an array
    Dim bits As Integer 'or long? :)
    Dim arrInteger() As Integer
    ReDim arrInteger(1 To 5)
    arrInteger(1) = 12
    arrInteger(2) = 456
    'get total memory allocation for integer in array
    bits = VarPtr(arrInteger(2)) - VarPtr(arrInteger(1))
    Debug.Print "For integer: " & bits & " bits and " & bits * 8 & " bytes."


    Dim arrLong() As Long
    ReDim arrLong(1 To 5)
    arrLong(1) = 12
    arrLong(2) = 456

    'get memory allocation for long
    bits = VarPtr(arrLong(2)) - VarPtr(arrLong(1))
    Debug.Print "For long: " & bits & " bits and " & bits * 8 & " bytes."

End Sub

This prints:

For integer: 2 bits and 16 bytes.

For long: 4 bits and 32 bytes.

You can also test this on individual variables using the following:

Sub testIndividualValues()

    Dim j As Long
    Dim i As Integer
    Dim bits As Integer

    bits = LenB(i)
    Debug.Print "Length of integer: " & bits & " bits and " & bits * 8 & " bytes."
    bits = LenB(j)
    Debug.Print "Length of long: " & bits & " bits and " & bits * 8 & " bytes."



End Sub

which prints

Length of integer: 2 bits and 16 bytes.

Length of long: 4 bits and 32 bytes.

Last, you can use a type comparison here:

Public Type myIntegerType
    a As Integer
    b As Integer
End Type
Public Type myLongType
    a As Long
    b As Long
End Type

Public Sub testWithTypes()
    Dim testInt As myIntegerType
    Dim testLong As myLongType
    Dim bits As Integer

    bits = VarPtr(testInt.b) - VarPtr(testInt.a)
    Debug.Print "For integer in types: " & bits & " bits and " & bits * 8 & " bytes."

    bits = VarPtr(testLong.b) - VarPtr(testLong.a)
    Debug.Print "For long in types: " & bits & " bits and " & bits * 8 & " bytes."

End Sub

which also prints:

For integer in types: 2 bits and 16 bytes.

For long in types: 4 bits and 32 bytes.

This is pretty compelling evidence to me that VBA actually does treat Integer and Long differently.

If VBA silently converted behind the scenes, you would expect those to return the same number of bits/bytes for each of the pointer allocations locations. But in the first case, with Integer, it is only allocating 16 bits, while for Long variables, it allocates 32 bits.

So what?

So to your question of

If VBA converts all Integer values to type Long even if they're declared as type Integer, then the above should never give you the Overflow error!

It makes complete sense that you would get an Overflow error, as VBA has not actually allocated the memory for a Long to the Integer declaration.

I would be curious too if this returns the same on all versions of Office. I can only test on Office 2010 on 64 bit Windows 7.



回答4:

The conversion is only for memory optimization, not for user code. For the programmer, there is practically no change since the min/max limits of datatypes remain the same.

If you take that para as a whole, you will realize that that statement is in context of performance only, and not otherwise. This is because the default size of numbers is Int32 or Int64 (depending on whether it is 32-bit or 64-bit system). The processor can process upto that big number in one go. If you declare a smaller unit than this, the compiler has to downsize it, and that needs more efforts than simply using the default type. And the processor has really no gain either. So even though you declare your variable as Integer, the compiler allocates it a Long memory, because it knows that it has to do more work without any gain.

As a VBA programmer what is of significance for you is – Declare your variables as LONG instead of INTEGER even if you want to store small numbers in them.



回答5:

So far, I have not seen anyone mention the problem of byte alignment. In order to manipulate the numbers, it needs to be loaded into register and as a rule, a register cannot contain more than one variable. I believe that registers also need to be cleared from the previous instruction, so to ensure the variable is loaded correctly it needs to be re-aligned, which may also involve sign extending or zeroing out the register.

You also can observe the byte alignment using VBA code:

Public Type x
    a As Integer
    b As Integer
    l As Long
End Type

Public Type y
    a As Integer
    l As Long
    b As Integer
End Type

Public Sub test()
    Dim x As x
    Dim y As y

    Debug.Print LenB(x)
    Debug.Print LenB(x.a), LenB(x.b), LenB(x.l)

    Debug.Print LenB(y)
    Debug.Print LenB(y.a), LenB(y.l), LenB(y.b)
End Sub

Even though the UDT x and y contains same number of members and each member are same data type; with the only difference being the ordering of the members, LenB() will give different results; on a 32-bit platform, x consumes only 8 bytes whereas y will need 12 bytes. The high word between the x.a and x.l and after the x.b is simply ignored.

The other point is that the problem is not unique to VBA. For example, C++ has the same considerations as illustrated here and here. So this is actually much lower level and thus why you can't "see" the sign-extending/zero-extending behavior when loading the variables into registers for performing the operation. To see that, you need the disassembly.



回答6:

As far as my testing goes, A VBA integer still takes two bytes (Tested on Access 2016, build 8201).

Implicit casting to a long (and back, if it's a write operation) occurs for operations, not for storage, as far as I can find. E.g. if I do myInt + 1, myInt gets cast to a long, then one gets added to that long, and then the result is cast back to an int, resulting in a performance loss compared to just using a Long. So while it costs less memory to use an integer, all operations will suffer in performance.

As Mathieu Guindon noted under Enderland/Elysian Fields's answer, testing VBA's storage with VBA functions can't prove anything, so let's go more low-level and directly view what's stored in memory, and manipulate that memory.

First, declarations:

Declare PtrSafe Sub CopyMemory Lib "Kernel32.dll" Alias "RtlMoveMemory" (ByVal Destination As LongPtr, ByVal Source As LongPtr, ByVal Length As Long)

Public Function ToBits(b As Byte) As String
    Dim i As Integer
    For i = 7 To 0 Step -1
        ToBits = ToBits & IIf((b And 2 ^ i) = (2 ^ i), 1, 0)
    Next
End Function

Now, I'm going to prove two things:

  1. The memory VarPtr points to contains 16-bits integers
  2. Manipulating this memory manipulates the integers VBA uses, even if you manipulate it outside of VBA

The code:

Dim i(0 To 1) As Integer
'Using negatives to prove things aren't longs, because of the sign bit
i(0) = -2 ^ 15 + (2 ^ 0) '10000000 00000001
i(1) = -2 ^ 15 + (2 ^ 1) '10000000 00000010
Dim bytes(0 To 3) As Byte
CopyMemory VarPtr(bytes(0)), VarPtr(i(0)), 4
Dim l As Long
For l = 3 To 0 Step -1
    Debug.Print ToBits(bytes(l)) & " ";
    'Prints 10000000 00000010 10000000 00000001
Next
'Now, let's write something back
bytes(0) = &HFF '0xFFFF = signed -1
bytes(1) = &HFF
CopyMemory VarPtr(i(0)), VarPtr(bytes(0)), 2
Debug.Print i(0) '-1

So, we can be certain that VBA indeed both writes 2-byte integers to the memory, and reads them back from the memory, when we're declaring things as an integer.



回答7:

Looking at the other answers and the MSDN documentation, I think the phrase "Internally stored" is imprecise and that's what's confusing.

Tl;DR

Integers aren't "stored internally" as Longs, that is they do not require the same amount of memory to save their values as a Long does. Rather they are "used internally" as Longs, meaning their value is temporarily stored in a Long variable whenever it is accessed (e.g incrementing a loop counter) before being copied back, and generally speaking, an array of Integers will require half as much memory as an array of Longs.


@enderland's answer shows that the memory layout of Integers, Integer Arrays and UDTs composed of Integers such as a DWORD all conform to the idea that the value contained in a variable declared as an integer takes up 2 bytes of memory.

This is from the viewpoint of VBA code, meaning it's possible to assume that

  1. The memory locations and sizes given by VarPtr and LenB respectively are incorrect (lies) in the case of Integers to avoid breaking existing code when the switch from 16 to 32 bit systems took place
  2. There is some sort of abstraction layer which means the memory appears as one thing but is actually another

We can rule these both out.

It is possible to use the CopyMemory API with an address given by the VarPtr and a width given by LenB to overwrite values in an array directly. The API is not under VBA's control, and all it does is directly write bits to memory. The fact that this is possible at all means that VarPtr must point to an area in memory where LenB bytes are used to store the value of that Integer; no other way around it, 2 bytes is the amount of space used to encode an Integer's value.

The abstraction layer could still be true though; VBA could hold one array of 2 byte spaced memory (SAFEARRAYS are all consecutive memory, that's why CopyMemory can write 2 entries at once) where VarPtr points to. Meanwhile a separate 4 byte spaced block of memory shadows the 2 byte spaced block, staying constantly in sync so that Integers can be stored as Longs. Sounds weird but could happen right?


It doesn't, and we can see this by looking at the process memory in Task Manager:

Idle, Excel uses 155,860KB of memory (155,860 * 1024 bytes)

Run this:

Sub testLongs()
    Dim longs(500, 500, 500) As Long
    Stop
End Sub

...and it spikes to 647,288KB. Taking the difference and dividing by the number of array elements gives ~4.03 bytes per Long. The same test for Integers:

Sub t()
    Dim ints(500, 500, 500) As Integer
    Stop
End Sub

...gives 401,548, or ~2.01 bytes per Integer

There will be a slight variation in the idle memory usage so the exact numbers don't matter, but clearly the Integer array is using ~ half the memory of the Long array


So my interpretation of the MSDN article is the following:

Memory-wise, Integers are really stored as 2 byte values, not as 4 byte Longs. There is no abstraction or trickery with pointers to hide this from us.

Rather the article tells us that when Integers are used in operations (multiplication/addition etc.) their values are fist copied to the lower half of an int32 / VBA Long, the calculation takes place in an optimized 32-bit friendly way, and then the result is copied back to Integer and overflow errors are raised as necessary. For Longs there is no need to copy forward and back (hence the recommendation).



标签: vba