Inside code
1. From bytecode to an object
MethodArea manages the complete lifecycle of a JavaClass, from bytecode on disk to an initialized class. Its API is largely self-explanatory:
class MethodArea {
public:
    // Paths of the runtime libraries; they tell the virtual machine
    // where to look up dependent *.class files
    MethodArea(const vector<string>& libPaths);
    ~MethodArea();
    // Look up the class named jcName to check whether it has already been loaded
    JavaClass* findJavaClass(const string& jcName);
    // Load the class named jcName
    bool loadJavaClass(const string& jcName);
    // Remove the class named jcName (used by the garbage collector only)
    bool removeJavaClass(const string& jcName);
    // Link the class named jcName and initialize its static fields
    void linkJavaClass(const string& jcName);
    // Initialize the class named jcName and call its static{} block
    void initJavaClass(Interpreter& exec, const string& jcName);
public:
    // Auxiliary functions
    // Load jcName if it has not been loaded yet
    JavaClass* loadClassIfAbsent(const string& jcName);
    // Link jcName if it has not been linked yet
    void linkClassIfAbsent(const string& jcName);
    // Initialize jcName if it has not been initialized yet
    void initClassIfAbsent(Interpreter& exec, const string& jcName);
};
For example, given a bytecode file named Test.class, the class becomes available to the JVM only after the following steps have finished:
Test.class[on disk]
-> loadJavaClass("Test.class")[in memory]
-> linkJavaClass("Test.class")
-> initJavaClass("Test.class")
Once these steps are complete, we can create objects of the class:
// yrt is a global runtime variable, ma stands for the MethodArea module, jheap stands for the JavaHeap module
JavaClass* testClass = yrt.ma->findJavaClass("Test.class");
JObject* testInstance = yrt.jheap->createObject(*testClass);
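The load, link, init order described in this section, together with the ...IfAbsent helpers, can be pictured as a small state machine. What follows is a minimal sketch under stated assumptions, not the project's implementation: ToyMethodArea, ClassState, and loadCount are hypothetical names introduced for illustration, real class-file parsing and static{} execution are replaced by plain state changes, and chaining each helper to the previous step is an assumption about how such helpers could behave.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical lifecycle states; these names are not taken from the project.
enum class ClassState { Absent, Loaded, Linked, Initialized };

// Toy stand-in for MethodArea that records only how far each class has progressed.
class ToyMethodArea {
public:
    ClassState stateOf(const std::string& name) const {
        auto it = states_.find(name);
        return it == states_.end() ? ClassState::Absent : it->second;
    }

    // Each helper first makes sure the previous step has been done,
    // then performs its own step only if it has not been done yet.
    void loadClassIfAbsent(const std::string& name) {
        if (stateOf(name) == ClassState::Absent) {
            states_[name] = ClassState::Loaded;  // a real VM would parse the *.class file here
            ++loadCount;
        }
    }
    void linkClassIfAbsent(const std::string& name) {
        loadClassIfAbsent(name);
        if (stateOf(name) == ClassState::Loaded) {
            states_[name] = ClassState::Linked;
        }
    }
    void initClassIfAbsent(const std::string& name) {
        linkClassIfAbsent(name);
        if (stateOf(name) == ClassState::Linked) {
            states_[name] = ClassState::Initialized;  // a real VM would run static{} here
        }
    }

    int loadCount = 0;  // number of times a class was actually loaded

private:
    std::unordered_map<std::string, ClassState> states_;
};
```

With this shape, calling initClassIfAbsent("Test.class") on a fresh ToyMethodArea walks the class through all three steps, and calling it again changes nothing.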
2.1 Inside the object
The JVM stack holds only primitive numeric values and object/array references, which we call JObject and JArray. A JObject has the following structure:
struct JObject {
std::size_t offset = 0;
const JavaClass* jc{};
};
offset uniquely identifies an object; every heap operation on the object requires it. jc points to the object's JavaClass.
Every object in the heap is stored as an <offset, fields> pair:
[1] -> [field_a, field_b, field_c]
[2] -> []
[3] -> [field_a,field_b]
[4] -> [field_a]
[..] -> [...]
Given an object's offset, we can look up, add, or remove its fields indirectly.
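The <offset, fields> layout can be sketched as a map from offsets to field vectors. This is only an illustration under assumptions: ToyHeap and Field are hypothetical names, a field slot is reduced to a string, and the project's JavaHeap may store its data quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in: a field slot is just a string in this sketch.
using Field = std::string;

// Toy heap: every object is an entry mapping its offset to its field slots.
class ToyHeap {
public:
    // Reserve a fresh offset with fieldCount empty slots and return it.
    std::size_t createObject(std::size_t fieldCount) {
        std::size_t offset = nextOffset_++;
        objects_[offset] = std::vector<Field>(fieldCount);
        return offset;
    }
    void putField(std::size_t offset, std::size_t index, const Field& value) {
        objects_.at(offset).at(index) = value;  // at() throws on a bad offset or index
    }
    const Field& getField(std::size_t offset, std::size_t index) const {
        return objects_.at(offset).at(index);
    }
    // Drop the object; afterwards its offset no longer resolves to anything.
    void removeObject(std::size_t offset) { objects_.erase(offset); }
    bool contains(std::size_t offset) const { return objects_.count(offset) != 0; }

private:
    std::size_t nextOffset_ = 1;  // offsets start at 1, matching the diagram above
    std::unordered_map<std::size_t, std::vector<Field>> objects_;
};
```

Because callers hold only the offset, the heap is free to move or discard the underlying storage without invalidating anything stored on the stack.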
An array is almost the same as an object, except that it has a length field instead of jc, since an array does not need to hold a reference to a meta class.
struct JArray {
int length = 0;
std::size_t offset = 0;
};
[1] -> <3, [field_a, field_b, field_c]>
[2] -> <0, []>
[3] -> <2, [field_a,field_b]>
[4] -> <1, [field_a]>
[..] -> <..,[...]>
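One reason to keep the length next to the offset is that element access can then be bounds-checked from the handle alone. The sketch below illustrates that idea under assumptions: ToyArray and ToyArrayHeap are hypothetical names, elements are plain ints, and the exception types are chosen for illustration rather than taken from the project.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <unordered_map>
#include <vector>

// Hypothetical handle mirroring the shape of JArray: a length plus a heap offset.
struct ToyArray {
    int length = 0;
    std::size_t offset = 0;
};

// Toy array storage: each offset maps to its elements (plain ints for brevity).
class ToyArrayHeap {
public:
    ToyArray createArray(int length) {
        if (length < 0) throw std::invalid_argument("negative array length");
        ToyArray handle;
        handle.length = length;
        handle.offset = nextOffset_++;
        elements_[handle.offset] = std::vector<int>(static_cast<std::size_t>(length), 0);
        return handle;
    }
    void putElement(const ToyArray& array, std::size_t index, int value) {
        checkIndex(array, index);
        elements_.at(array.offset)[index] = value;
    }
    int getElement(const ToyArray& array, std::size_t index) const {
        checkIndex(array, index);
        return elements_.at(array.offset)[index];
    }

private:
    // The length stored in the handle is what makes this check possible.
    static void checkIndex(const ToyArray& array, std::size_t index) {
        if (index >= static_cast<std::size_t>(array.length)) {
            throw std::out_of_range("array index out of bounds");
        }
    }
    std::size_t nextOffset_ = 1;
    std::unordered_map<std::size_t, std::vector<int>> elements_;
};
```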
2.2 From object creation to extinction
As mentioned above, a JObject holds an offset and a jc. MethodArea is responsible for managing the JavaClass referenced by jc, while the offset is managed by JavaHeap. JavaHeap provides a large number of APIs; the most important ones are listed here:
class JavaHeap {
public:
    // create an object/array
    JObject* createObject(const JavaClass& javaClass);
    JArray* createObjectArray(const JavaClass& jc, int length);
    // get/set a field
    auto getFieldByName(const JavaClass* jc, const string& name,
                        const string& descriptor, JObject* object);
    void putFieldByName(const JavaClass* jc, const string& name,
                        const string& descriptor, JObject* object,
                        JType* value);
    // get/set a specific element of an array
    void putElement(const JArray& array, size_t index, JType* value);
    auto getElement(const JArray& array, size_t index);
    // remove an array/object from the heap
    void removeArray(size_t offset);
    void removeObject(size_t offset);
};
Returning to the example above, assume the corresponding Java class is structured as follows:
public class Test{
public int k;
private String hello;
}
In the first step we obtained testClass and an instance of it; now we can operate on the fields of that instance:
const JavaClass* testClass = yrt.ma->findJavaClass("Test.class");
JObject* testInstance = yrt.jheap->createObject(*testClass);
// get the field hello
JObject* helloField = yrt.jheap->getFieldByName(testClass, "hello", "Ljava/lang/String;", testInstance);
// set the field k; kValue stands for a JType* holding the new int value
// (how it is constructed is not shown here)
yrt.jheap->putFieldByName(testClass, "k", "I", testInstance, kValue);
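Both calls above pass a descriptor along with the field name. In the class file format a field is identified by its name together with its type descriptor ("I" for int, "Ljava/lang/String;" for a String reference), so a lookup has to match both. The following is a minimal sketch of such a lookup, assuming hypothetical ToyFieldInfo and ToyClass types and a findFieldIndex helper that are not part of the project; the real JavaClass keeps richer metadata.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical field metadata for illustration only.
struct ToyFieldInfo {
    std::string name;
    std::string descriptor;  // JVM type descriptor, e.g. "I" or "Ljava/lang/String;"
};

struct ToyClass {
    std::vector<ToyFieldInfo> fields;
};

// Return the slot index of the field matching both name and descriptor,
// or -1 if the class declares no such field.
int findFieldIndex(const ToyClass& jc, const std::string& name,
                   const std::string& descriptor) {
    for (std::size_t i = 0; i < jc.fields.size(); ++i) {
        if (jc.fields[i].name == name && jc.fields[i].descriptor == descriptor) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Field layout of the Test class from the example above.
ToyClass makeTestClass() {
    ToyClass jc;
    jc.fields.push_back({"k", "I"});
    jc.fields.push_back({"hello", "Ljava/lang/String;"});
    return jc;
}
```

The index returned here could then be used to address a slot in the field vector stored at the object's offset.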
Ⅰ. About JDK
No Java virtual machine can run a Java program without the Java libraries. As you may know, some opcodes such as ldc, monitorenter/monitorexit, and athrow internally require the virtual machine to operate on JDK classes (java.lang.Class, java.lang.String, java.lang.Throwable, etc.). Because the original JDK classes are too complicated to be convenient in early development, I rewrote some of them in order to build a runnable VM. Their sources are in the javaclass directory and can be compiled with compilejava.bat; the compiled *.class files are placed in the bytecode directory.
The rewritten JDK classes are as follows:
java.lang.String
java.lang.StringBuilder
java.lang.Throwable
java.lang.Math(::random())
java.lang.Runnable
java.lang.Thread