So, what is a primitive type? Loosely speaking, Gödel’s Incompleteness Theorem tells us that any sufficiently expressive mathematical system, and by analogy any computational system, rests on things that cannot be defined or derived using the rules of that system. Those givens are the axioms of the system.
For Java and C#, the axioms are the rules of the language and runtime, as defined in the respective specifications, and those rules cannot be inferred from within the language itself. They simply exist as a given.
So, what rules are the axioms of Java and C#? There are several possibilities, given the wide scope of both languages. For the purposes of this post, I’m going to concentrate on the type systems of both languages, and use the primitive types as the axioms. So what are those primitive types?
- booleans
- integers
- floats
- arrays of primitives
And that’s it. There are several things to note from this definition:
- Arrays are defined recursively, so you can have an array of arrays of integers.
- Arrays are a reference type, everything else is a value type.
- Objects are not a primitive, as an object can be defined using arrays of primitives. As arrays are a reference type, this gives objects, defined using arrays, the semantics of a reference type.
- Characters are not a primitive either, as those can be defined using integers.
- Strings can be defined as an array of integers.
Also note that this is not a formal definition – I’m using this definition to learn more about Java and C#, and how they use these primitives to define the rest of the language, not to define and analyse C# and Java using formal type theory or strict mathematics.
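To make the idea of building other types out of these primitives a little more concrete, here is a minimal Java sketch that treats a string as nothing more than an array of integers, one per Unicode code point. The class and method names (PrimitiveEncoding, encode, decode) are my own, purely for illustration, and this is not how either runtime actually stores strings.

```java
import java.util.Arrays;

public class PrimitiveEncoding {
    // Represent a string as an array of integers, one per Unicode code point.
    public static int[] encode(String s) {
        return s.codePoints().toArray();
    }

    // Rebuild the string from its integer-array representation.
    public static String decode(int[] codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        int[] encoded = encode("Hi!");
        System.out.println(Arrays.toString(encoded)); // [72, 105, 33]
        System.out.println(decode(encoded));          // Hi!
    }
}
```

Each character is just an integer here, and the string is just an array of them, which is all the definition above needs.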
The primitive value types
In this post, we’ll be starting off with the primitive value types. What are the primitive value types in Java and C#?
Type | Java | C# |
---|---|---|
boolean | boolean | bool |
1-byte signed integer | byte | sbyte |
1-byte unsigned integer | | byte |
2-byte signed integer | short | short |
2-byte unsigned integer | | ushort |
4-byte signed integer | int | int |
4-byte unsigned integer | | uint |
8-byte signed integer | long | long |
8-byte unsigned integer | | ulong |
4-byte float | float | float |
8-byte float | double | double |
Within the runtime, these values all have a predefined representation – for the numbers, simply the byte representation of that number, and for boolean values, a 1-byte value containing zero for false, and non-zero for true. As you can see, C# provides signed and unsigned versions of all the various lengths of integers – 1, 2, 4, and 8 bytes. The C# byte is defined as unsigned, whereas Java only provides signed versions, which means Java’s byte is signed.
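The signed-versus-unsigned difference is easy to see from the Java side. The following is a small sketch (the class and method names are made up for this example) showing the same eight bits read first as Java’s signed byte and then as the unsigned value that a C# byte would hold.

```java
public class SignedByteDemo {
    // Keep only the low 8 bits of the value and read them as Java's signed byte.
    public static byte toSignedByte(int value) {
        return (byte) value;
    }

    // Read the same 8 bits as an unsigned quantity in the range 0..255.
    // Equivalent to Byte.toUnsignedInt(b) on Java 8 and later.
    public static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = toSignedByte(200);
        System.out.println(b);             // -56
        System.out.println(toUnsigned(b)); // 200
    }
}
```

The bits never change; only the interpretation does, and Java leaves the unsigned interpretation to helper code rather than a dedicated type.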
When programming using these types, you need to be able to perform operations on them, such as arithmetic operations or comparisons. Due to the Incompleteness Theorem, these operations cannot be defined using code written in the language itself – these operations are defined outside Java or C#. And so the CLR and JVM can perform mathematical operations and comparisons between instances of the primitive value types without using any external libraries. To accomplish this, there are special commands in IL and Java bytecode to perform these built-in operations.
A selection of these commands:
Operation | Java bytecode | IL |
---|---|---|
Add two 4-byte integers | iadd | add |
Multiply two 8-byte floats | dmul | mul |
Branch if equal | if_icmpeq | beq |
Load a constant 8-byte integer | ldc2_w | ldc.i8 |
To access these built-in runtime instructions from Java or C#, the language has special syntax that compiles to these instructions, primarily the mathematical operators + - * / < > and ==. So, for example, the following expression:
    int i = 10 + 20;
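compiles, conceptually, to the following IL (real compilers fold the constant expression 10 + 20 into a single load of the constant 30; the unfolded form is shown here to make the mapping visible):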
    ldc.i4 10
    ldc.i4 20
    add
    stloc.0
and the following Java bytecode:
    ldc 10
    ldc 20
    iadd
    istore_0
All these language mappings, and instructions, are built-in and predefined as axioms in the language and runtime.
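If you want to see the operator-to-instruction mapping for yourself, a sketch like the one below works well. The class name OperatorDemo is hypothetical, and the bytecode in the comments is what I would expect javac to emit for static methods of this shape; running javap -c OperatorDemo on the compiled class shows what your compiler actually produced.

```java
public class OperatorDemo {
    // Expected body: iload_0, iload_1, iadd, ireturn
    public static int add(int a, int b) {
        return a + b;
    }

    // Expected body: dload_0, dload_2, dmul, dreturn
    // (each double occupies two local variable slots, hence slot 2).
    public static double multiply(double a, double b) {
        return a * b;
    }

    // Compiles to a conditional branch from the if_icmp family;
    // javac usually emits the inverted test, if_icmpne.
    public static boolean same(int a, int b) {
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(add(10, 20));        // 30
        System.out.println(multiply(1.5, 2.0)); // 3.0
        System.out.println(same(3, 3));         // true
    }
}
```

Using method parameters rather than literals stops the compiler from folding the arithmetic away before it ever reaches the runtime.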
Methods on primitive types
However, there are still some operations in the language which don’t map directly onto instructions provided by the runtime: for example, toString, parse, or the implementation of a generic Comparable interface. As Java and C# are both object-oriented languages, these methods need to be defined as part of an object of some kind. For 4-byte integers, these methods are defined on System.Int32 in C#, and on java.lang.Integer in Java.
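On the Java side, those operations look like this. It is only a sketch: the wrapper class IntegerMethodsDemo and its method names are mine, while Integer.toString, Integer.parseInt and Integer.compareTo are the real library methods being called.

```java
public class IntegerMethodsDemo {
    // Static method on Integer that takes an int and returns its text form.
    public static String describe(int value) {
        return Integer.toString(value);
    }

    // Static method on Integer that parses text and returns an int.
    public static int read(String text) {
        return Integer.parseInt(text);
    }

    // Instance method on Integer (from Comparable<Integer>); the int
    // arguments have to be wrapped in Integer objects to use it.
    public static int order(int left, int right) {
        return Integer.valueOf(left).compareTo(right);
    }

    public static void main(String[] args) {
        System.out.println(describe(42));        // 42
        System.out.println(read("42") + 1);      // 43
        System.out.println(order(1, 2) < 0);     // true
    }
}
```

None of these calls corresponds to a single bytecode instruction; they are ordinary method invocations on a library class.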
The difference
It’s these objects and methods that are the key to understanding the differences between primitive types in Java and C#. Let’s start off with Java:
java.lang.Integer
Like every other class in Java, java.lang.Integer is a reference type, in this case one which contains a single field of the primitive type int. It’s this type that contains the various methods that act on an int, like toString, parseInt, and compareTo, implemented either as a static method that takes or returns an int where appropriate, or as an instance method on java.lang.Integer that operates on the instance’s int field.
Prior to Java 1.5, you had to manually convert between int and java.lang.Integer, using the constructor on Integer or calling the instance method Integer.intValue() to get the contained int value. In 1.5, the compiler inserts these conversions where appropriate as part of the autoboxing feature.
The important point is that, in Java, an int is a pure 4-byte number, operated on by instructions built in to the runtime. java.lang.Integer contains all the other operations on integers that can’t be compiled directly to runtime instructions, and it behaves like any other reference type. When necessary, you can create an instance of Integer from an int value to pass an integer value to methods expecting an instance of Object or another reference type.
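The conversions described in this section can be sketched as follows. The class BoxingDemo and its methods are hypothetical names for this post. I’ve used Integer.valueOf for the manual case rather than the constructor, because the Integer constructor has been deprecated since Java 9.

```java
public class BoxingDemo {
    // The pre-1.5 style: every conversion is written out by hand.
    public static int roundTripManual(int value) {
        Integer boxed = Integer.valueOf(value); // int -> Integer
        return boxed.intValue();                // Integer -> int
    }

    // The same round trip with autoboxing: the compiler inserts
    // the valueOf and intValue calls for us.
    public static int roundTripAuto(int value) {
        Integer boxed = value;
        return boxed;
    }

    // A method expecting an Object can only ever receive the boxed form.
    public static String typeOf(Object o) {
        return o.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(roundTripManual(7)); // 7
        System.out.println(roundTripAuto(7));   // 7
        System.out.println(typeOf(7));          // java.lang.Integer
    }
}
```

The last call is the telling one: by the time the value reaches typeOf, it is no longer an int at all, but a reference to an Integer object.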
System.Int32
Similar to java.lang.Integer, System.Int32 is the type containing all the methods on integer values that don’t map directly onto operations provided by the runtime. But, where Integer is a reference type, System.Int32 is a value type. This has some quite fundamental consequences for what an integer value is in C#. To understand what these are, we need to take a digression into how a value type is represented in .NET.
Value types in C#
An instance of a reference type is assigned its own block of memory on the heap. But a value type borrows memory from whatever contains it. If it is declared as a member of a reference type, it will use a section of memory that belongs to the reference type on the heap. If it is being manipulated on the stack, it uses a section of the stack.
If the value type is a member of an outer value type, the inner value type becomes part of the value of the outer value type. For example, the following type definitions:
    public struct Inner1
    {
        int I1;
        short S1;
        short S2;
    }

    public struct Inner2
    {
        double D1;
    }

    public struct ValueA
    {
        int I2;
        Inner1 v1;
        Inner2 v2;
    }

    public class ObjectA
    {
        float F1;
        ValueA A;
        int I3;
    }
will result in a memory layout along the following lines for instances of type ObjectA on the heap (the CLR is free to reorder the fields of a class, so the exact order may differ):
    <object header>
    F1
    ValueA.I2
    Inner1.I1
    Inner1.S1
    Inner1.S2
    Inner2.D1
    I3
Recursive definition?
So, back to System.Int32. If you have a look at this type in a disassembler, you’ll see that its definition is, in IL:
    .class public System.Int32 extends System.ValueType
    {
        .field assembly int32 m_value
    }
This looks like a recursive definition, violating the .NET rule that a struct cannot contain an instance of itself. But it obviously does work, somehow.
The key is the Incompleteness Theorem. int32 is a built-in primitive type that the CLR itself implements using a 4-byte value. The struct System.Int32 is a (more-or-less) standard value type. A value type is composed of the values of its member fields, and System.Int32 is composed of a single 4-byte value. That means that an instance of System.Int32 is also a pure 4-byte value.
This is the key to understanding primitive types in .NET – any 4-byte value in memory can be interpreted either as a primitive int32, which can be manipulated by built-in arithmetic operations, or as an instance of System.Int32, on which the CLR can execute all the methods declared on that type. That change in interpretation can occur without any changes in the program’s memory, or any boxing operations; the CLR simply chooses to see a 4-byte value as a primitive type one instant, and a complex value type the next.
What is a primitive?
While primitive types in Java are simple values, values of primitive types in .NET are both a primitive type value and a complex value type value. Byte values of the correct length can be interpreted either as primitive types or complex types, thanks to the rules determining how value types use memory in the CLR.
As Java does not allow complex value types to be declared, the methods performing operations that aren’t built-in to the runtime must be declared on a separate reference type, and the primitive types converted to and from this representation using autoboxing where needed. The CLR simply reinterprets a value as a primitive or complex value type.
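One visible consequence of Java’s separate reference type is that a boxed integer can be null, which a primitive int never can. The sketch below (ReferenceVsPrimitive and tryUnbox are names invented for this example) shows what happens when the compiler-inserted unboxing meets a null reference.

```java
public class ReferenceVsPrimitive {
    // Attempt to unbox an Integer reference and report what happened.
    public static String tryUnbox(Integer boxed) {
        try {
            int value = boxed; // compiler inserts boxed.intValue() here
            return "unboxed " + value;
        } catch (NullPointerException e) {
            // There was no Integer object behind the reference at all.
            return "no int inside: the reference was null";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryUnbox(7));    // unboxed 7
        System.out.println(tryUnbox(null)); // no int inside: the reference was null
    }
}
```

An Integer variable holds a reference that may or may not point at a 4-byte value, whereas an int variable simply is that value.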
That’s started us off with the primitives. In the next post, we’ll be looking at arrays, and how Java and C# arrays can be used, and what they represent.