A high-level language like Java relies on a compiler to translate source code into compact bytecode, which the JVM interpreter then executes. The javac compiler generates this bytecode when the source code is compiled and writes it to a .class file.
Have you ever wondered whether a .class file contains nothing but hexadecimal code, and which code is used for which purpose? If so, you are in the right place.
Let's write code for a subtraction and generate the bytecode by compiling it.

    public class AddToValue {
        public static void main(String[] args) {
            long maxLong = Long.MAX_VALUE;
            long secondMinimumPositiveLong = 1;
            long secondMaxLong = maxLong - secondMinimumPositiveLong;
            System.out.println(secondMaxLong);
        }
    }

This high-level language hides the complexity of how the JVM or the operating system will read this code. Suppose we declare the maximum long value: to us this is a simple instruction, and how the value of the maxLong variable gets loaded into a CPU register is not our concern. We rely on the abstraction layer.
Even so, high-level code must be converted into lower-level instructions, which for Java are called bytecode.
Let's compile this program.
javac AddToValue.java
This produces the file AddToValue.class. Viewed as a hex dump, its contents look like this:
cafe babe 0000 0037 001f 0a00 0800 1107 0012 057f ffff ffff ffff ff09 0013 0014 0a00 1500 1607 0017 0700 1801 0006 3c69 6e69 743e 0100 0328 2956 0100 0443 6f64 6501 000f 4c69 6e65 4e75 6d62 6572 5461 626c 6501 0004 6d61 696e 0100 1628 5b4c 6a61 7661 2f6c 616e 672f 5374 7269 6e67 3b29 5601 000a 536f 7572 6365 4669 6c65 0100 0f41 6464 546f 5661 6c75 652e 6a61 7661 0c00 0900 0a01 000e 6a61 7661 2f6c 616e 672f 4c6f 6e67 0700 190c 001a 001b 0700 1c0c 001d 001e 0100 0a41 6464 546f 5661 6c75 6501 0010 6a61 7661 2f6c 616e 672f 4f62 6a65 6374 0100 106a 6176 612f 6c61 6e67 2f53 7973 7465 6d01 0003 6f75 7401 0015 4c6a 6176 612f 696f 2f50 7269 6e74 5374 7265 616d 3b01 0013 6a61 7661 2f69 6f2f 5072 696e 7453 7472 6561 6d01 0007 7072 696e 746c 6e01 0004 284a 2956 0021 0007 0008 0000 0000 0002 0001 0009 000a 0001 000b 0000 001d 0001 0001 0000 0005 2ab7 0001 b100 0000 0100 0c00 0000 0600 0100 0000 0100 0900 0d00 0e00 0100 0b00 0000 3c00 0400 0700 0000 1414 0003 400a 421f 2165 3705 b200 0516 05b6 0006 b100 0000 0100 0c00 0000 1600 0500 0000 0300 0400 0400 0600 0500 0b00 0600 1300 0700 0100 0f00 0000 0200 10
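The first bytes of the dump above can already be read by hand. The following is a minimal sketch, assuming the standard class file layout (magic number, minor and major version, constant pool count, then the constant pool entries). The class name `ClassHeaderSketch` and the hand-copied `PREFIX` array are illustrative only and not part of the original program.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: decodes the first 27 bytes of the hex dump above.
class ClassHeaderSketch {
    // Bytes copied by hand from the start of the dump.
    static final byte[] PREFIX = {
        (byte) 0xca, (byte) 0xfe, (byte) 0xba, (byte) 0xbe, // magic number
        0x00, 0x00,                                          // minor version
        0x00, 0x37,                                          // major version
        0x00, 0x1f,                                          // constant pool count
        0x0a, 0x00, 0x08, 0x00, 0x11,                        // entry #1: tag 10 (Methodref)
        0x07, 0x00, 0x12,                                    // entry #2: tag 7 (Class)
        0x05, 0x7f, (byte) 0xff, (byte) 0xff, (byte) 0xff,   // entry #3: tag 5 (Long)
        (byte) 0xff, (byte) 0xff, (byte) 0xff, (byte) 0xff
    };

    // Returns {magic, minor, major, poolCount, tagOfEntry3, valueOfEntry3}.
    static long[] parse(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);  // big-endian by default, like class files
        long magic = in.getInt() & 0xFFFFFFFFL;
        int minor = in.getShort() & 0xFFFF;
        int major = in.getShort() & 0xFFFF;
        int poolCount = in.getShort() & 0xFFFF;
        in.position(in.position() + 5);          // skip entry #1 (tag + two 2-byte indexes)
        in.position(in.position() + 3);          // skip entry #2 (tag + one 2-byte index)
        int tag = in.get() & 0xFF;               // entry #3 tag
        long value = in.getLong();               // entry #3 payload: an 8-byte long
        return new long[] {magic, minor, major, poolCount, tag, value};
    }

    public static void main(String[] args) {
        long[] h = parse(PREFIX);
        System.out.println("magic       = " + Long.toHexString(h[0]));
        System.out.println("version     = " + h[2] + "." + h[1]);
        System.out.println("pool count  = " + h[3]);
        System.out.println("entry #3    = tag " + h[4] + ", value " + h[5]);
    }
}
```

Entry #3 holds Long.MAX_VALUE, which is the constant that `ldc2_w #3` refers to in the disassembly. Major version 55 (0x37) corresponds to Java 11.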
According to The Java Virtual Machine Instruction Set, an opcode is a single byte, so there are 256 possible opcode values, of which roughly 200 are assigned to instructions. The generated bytecode uses a handful of them, and we will find out which instruction corresponds to which byte further down. This instruction set does not vary between x86 and x64: bytecode is platform-independent. The JVM is a stack-based machine, so its instructions work on an operand stack rather than on specific CPU registers, and how that maps onto the registers of a particular processor is left to each JVM implementation.
Let's disassemble this class and print the instructions from the bytecode.
javap -c AddToValue.class

    Compiled from "AddToValue.java"
    public class AddToValue {
      public AddToValue();
        Code:
           0: aload_0
           1: invokespecial #1      // Method java/lang/Object."<init>":()V
           4: return

      public static void main(java.lang.String[]);
        Code:
           0: ldc2_w        #3      // long 9223372036854775807l
           3: lstore_1
           4: lconst_1
           5: lstore_3
           6: lload_1
           7: lload_3
           8: lsub
           9: lstore        5
          11: getstatic     #5      // Field java/lang/System.out:Ljava/io/PrintStream;
          14: lload         5
          16: invokevirtual #6      // Method java/io/PrintStream.println:(J)V
          19: return
    }

We will map the bytecode to these printed instructions. First, we have to understand some terms:
opcode: short for operation code, the one-byte value that specifies the operation to be performed. ex: 0x65
operand stack: a last-in-first-out stack that belongs to each method frame. Instructions push their operands onto it and pop their results from it. It is conceptually made of 32-bit slots, and a long or double value counts as two of them.
mnemonic: a short human-readable name for an instruction. ex: istore_1
rule of thumb: a mnemonic is just the readable form of an opcode, so each mnemonic corresponds to exactly one opcode.
Now let's look at the instructions for the subtraction.

    long maxLong = Long.MAX_VALUE;
    long secondMinimumPositiveLong = 1;
    long secondMaxLong = maxLong - secondMinimumPositiveLong;

For these three human-readable statements, we get 8 bytecode instructions:

    0: ldc2_w        #3      // long 9223372036854775807l
    3: lstore_1
    4: lconst_1
    5: lstore_3
    6: lload_1
    7: lload_3
    8: lsub
    9: lstore        5

Each of these instructions has a corresponding hexadecimal opcode:
Mnemonic | Opcode (hex) |
---|---|
ldc2_w | 14 |
lstore_1 | 40 |
lconst_1 | 0a |
lstore_3 | 42 |
lload_1 | 1f |
lload_3 | 21 |
lsub | 65 |
lstore | 37 |
This series of hexadecimal opcodes appears inside the .class file:
Bytecode |
---|
cafe babe 0000 0037 001f 0a00 0800 1107 |
0012 057f ffff ffff ffff ff09 0013 0014 |
0a00 1500 1607 0017 0700 1801 0006 3c69 |
6e69 743e 0100 0328 2956 0100 0443 6f64 |
6501 000f 4c69 6e65 4e75 6d62 6572 5461 |
626c 6501 0004 6d61 696e 0100 1628 5b4c |
6a61 7661 2f6c 616e 672f 5374 7269 6e67 |
3b29 5601 000a 536f 7572 6365 4669 6c65 |
0100 0f41 6464 546f 5661 6c75 652e 6a61 |
7661 0c00 0900 0a01 000e 6a61 7661 2f6c |
616e 672f 4c6f 6e67 0700 190c 001a 001b |
0700 1c0c 001d 001e 0100 0a41 6464 546f |
5661 6c75 6501 0010 6a61 7661 2f6c 616e |
672f 4f62 6a65 6374 0100 106a 6176 612f |
6c61 6e67 2f53 7973 7465 6d01 0003 6f75 |
7401 0015 4c6a 6176 612f 696f 2f50 7269 |
6e74 5374 7265 616d 3b01 0013 6a61 7661 |
2f69 6f2f 5072 696e 7453 7472 6561 6d01 |
0007 7072 696e 746c 6e01 0004 284a 2956 |
0021 0007 0008 0000 0000 0002 0001 0009 |
000a 0001 000b 0000 001d 0001 0001 0000 |
0005 2ab7 0001 b100 0000 0100 0c00 0000 |
0600 0100 0000 0100 0900 0d00 0e00 0100 |
0b00 0000 3c00 0400 0700 0000 1414 0003 |
400a 421f 2165 3705 b200 0516 05b6 0006 (40 0a 42 1f 21 65 37 05 -> lstore_1 lconst_1 lstore_3 lload_1 lload_3 lsub lstore 5; the preceding ldc2_w #3 is the 14 0003 at the end of the row above) |
b100 0000 0100 0c00 0000 1600 0500 0000 |
0300 0400 0400 0600 0500 0b00 0600 1300 |
0700 0100 0f00 0000 0200 10 |
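To make the mapping concrete, here is a minimal sketch that turns the 20 code bytes of `main` back into mnemonics. It assumes only the opcodes that occur in this method. The class name `MiniDisassembler` and its lookup tables are illustrative and cover nothing beyond this example, so this is not a general disassembler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: decodes only the opcodes used by AddToValue.main.
class MiniDisassembler {
    // The code bytes of main, copied by hand from the dump (14 0003 ... b1).
    static final int[] MAIN_CODE = {
        0x14, 0x00, 0x03, 0x40, 0x0a, 0x42, 0x1f, 0x21, 0x65, 0x37, 0x05,
        0xb2, 0x00, 0x05, 0x16, 0x05, 0xb6, 0x00, 0x06, 0xb1
    };

    // Opcode to mnemonic, limited to the instructions in this method.
    static String mnemonic(int opcode) {
        switch (opcode) {
            case 0x0a: return "lconst_1";
            case 0x14: return "ldc2_w";
            case 0x16: return "lload";
            case 0x1f: return "lload_1";
            case 0x21: return "lload_3";
            case 0x37: return "lstore";
            case 0x40: return "lstore_1";
            case 0x42: return "lstore_3";
            case 0x65: return "lsub";
            case 0xb1: return "return";
            case 0xb2: return "getstatic";
            case 0xb6: return "invokevirtual";
            default:
                throw new IllegalArgumentException(
                        "opcode not covered by this sketch: 0x" + Integer.toHexString(opcode));
        }
    }

    // Number of operand bytes that follow each opcode.
    static int operandBytes(int opcode) {
        switch (opcode) {
            case 0x16: case 0x37:            return 1; // local variable index
            case 0x14: case 0xb2: case 0xb6: return 2; // constant pool index
            default:                         return 0;
        }
    }

    static List<String> disassemble(int[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc];
            StringBuilder line = new StringBuilder(pc + ": " + mnemonic(opcode));
            int n = operandBytes(opcode);
            if (n == 1) {
                line.append(" ").append(code[pc + 1]);
            } else if (n == 2) {
                line.append(" #").append((code[pc + 1] << 8) | code[pc + 2]);
            }
            lines.add(line.toString());
            pc += 1 + n; // advance past the opcode and its operands
        }
        return lines;
    }

    public static void main(String[] args) {
        disassemble(MAIN_CODE).forEach(System.out::println);
    }
}
```

The offsets it prints (0, 3, 4, ... 19) line up with the ones javap shows, because every offset is simply the position of the opcode byte within the method's code.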
What each instruction does to the operand stack:
Mnemonic | Description |
---|---|
ldc2_w | push the long constant 9223372036854775807l from the constant pool (entry #3) onto the stack |
lstore_1 | pop 9223372036854775807l from the stack and store it in local variable 1 |
lconst_1 | push the constant 1l onto the stack |
lstore_3 | pop 1l from the stack and store it in local variable 3 |
lload_1 | push 9223372036854775807l from local variable 1 onto the stack |
lload_3 | push 1l from local variable 3 onto the stack |
lsub | pop both values, compute 9223372036854775807l - 1l, and push the result 9223372036854775806l |
lstore 5 | pop 9223372036854775806l from the stack and store it in local variable 5 |
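The table above can be replayed as a small simulation. This is a sketch under simplifying assumptions: the operand stack is modelled as a `Deque<Long>`, and each long local variable is kept in a single array element, although in the real JVM a long occupies two consecutive local variable slots (which is why the indexes are 1, 3 and 5). The class name `OperandStackSketch` is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the operand stack for the eight subtraction instructions.
class OperandStackSketch {
    static long run() {
        Deque<Long> stack = new ArrayDeque<>();
        long[] locals = new long[7];       // simplified: one element per variable index

        stack.push(Long.MAX_VALUE);        // ldc2_w #3 : push constant from the pool
        locals[1] = stack.pop();           // lstore_1  : pop into local variable 1
        stack.push(1L);                    // lconst_1  : push the constant 1
        locals[3] = stack.pop();           // lstore_3  : pop into local variable 3
        stack.push(locals[1]);             // lload_1   : push local variable 1
        stack.push(locals[3]);             // lload_3   : push local variable 3

        long value2 = stack.pop();         // lsub      : pop the right operand first,
        long value1 = stack.pop();         //             then the left operand,
        stack.push(value1 - value2);       //             and push the difference

        locals[5] = stack.pop();           // lstore 5  : pop into local variable 5
        return locals[5];
    }

    public static void main(String[] args) {
        System.out.println(run());         // prints 9223372036854775806
    }
}
```

The order of the two pops in `lsub` matters: the value loaded last (`lload_3`) is on top of the stack, so it becomes the right-hand operand.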
In summary, we have seen how bytecode holds the instructions that the JVM needs to execute, and how each mnemonic maps to an opcode in hexadecimal form.
References:
- List of Java bytecode instructions
- The Java Virtual Machine Instruction Set
- The Java Virtual Machine
Volunteer at Java User Group Bangladesh (JUGBD)
cmabdullah21@gmail.com @cmabdullah21