Inside code

Yi Yang edited this page Dec 2, 2022 · 2 revisions

中文

1. 从字节码到对象

MethodArea负责管理字节码到JavaClass的完整生命周期。MethodArea的方法是自解释的:

class MethodArea {
public:
    // 方法区需要从运行时目录中搜索相关的*.class文件
    MethodArea(const vector<string>& libPaths);
    ~MethodArea();

    // 查看一个类是否存在
    JavaClass* findJavaClass(const string& jcName);
    //加载jcName类
    bool loadJavaClass(const string& jcName);
    //移除jcName(该方法用于垃圾回收器)
    bool removeJavaClass(const string& jcName);
    //链接jcName类,初始化static字段
    void linkJavaClass(const string& jcName);
    //初始化jcName,初始化静态字段,调用static{}
    void initJavaClass(Interpreter& exec, const string& jcName);

public:
    //辅助方法,如果不存在jcName则加载 
    JavaClass* loadClassIfAbsent(const string& jcName);
    //如果未链接jcName则链接
    void linkClassIfAbsent(const string& jcName);
    //如果未初始化jcName则初始化
    void initClassIfAbsent(Interpreter& exec, const string& jcName);
};

假设磁盘存在一个Test.class文件,它会经历如下过程:

Test.class[磁盘中] -> loadJavaClass("Test.class")[内存中] -> linkJavaClass("Test.class") -> initJavaClass(exec, "Test.class")

现在虚拟机就可以使用这个JavaClass创建对应的对象了:

// yrt 是全局运行时对象,ma表示方法区模块,jheap表示堆模块
JavaClass* testClass = yrt.ma->findJavaClass("Test.class");
JObject* testInstance = yrt.jheap->createObject(*testClass);

2.1 对象内部构造

虚拟机执行时栈上存放的都是JObject,它的结构如下:

struct JObject {
    std::size_t offset = 0; 
    const JavaClass* jc{}; 
};

offset唯一代表一个对象,所有在堆上面的操作都需要这个offset。jc指向对象的Class表示。 堆中的对象是按照<offset,fields>方式进行存放的:

[1]  ->  [field_a, field_b, field_c]
[2]  ->  []
[3]  ->  [field_a,field_b]
[4]  ->  [field_a]
[..] ->  [...]

只要我们持有offset,就可以查找/添加/删除对应的field

数组几乎和上面类似,只是多了长度,少了Class指针

struct JArray {
    int length = 0;
    std::size_t offset = 0; 
};

[1]  ->   <3, [field_a, field_b, field_c]>
[2]  ->   <0, []>
[3]  ->   <2, [field_a,field_b]>
[4]  ->   <1, [field_a]>
[..] ->   <..,[...]>

2.2 从对象创建到消亡

上面提到,对象持有一个offset和jc,其中jc表示的JavaClass是由MethodArea负责管理的,offset则是由JavaHeap负责管理。JavaHeap提供了大量API,这里选取的是最重要的:

class JavaHeap {
public:
    //创建对象和数组
    JObject* createObject(const JavaClass& javaClass);
    JArray* createObjectArray(const JavaClass& jc, int length);

    //获取对象字段
    auto getFieldByName(const JavaClass* jc, const string& name,
                        const string& descriptor, JObject* object);
    //设置对象字段
    void putFieldByName(const JavaClass* jc, const string& name,
                        const string& descriptor, JObject* object,
                        JType* value);
    //设置数组元素
    void putElement(const JArray& array, size_t index, JType* value);
    //获取数组元素
    auto getElement(const JArray& array, size_t index);
    
    //移除对象和数组
    void removeArray(size_t offset);
    void removeObject(size_t offset);
};

还是Test.class那个例子,假设对应的Test.java构造如下:

public class Test{
    public int k;
    private String hello;
}

在第一步我们已经获取到了Test类在虚拟机中的类表示以及对象表示,现在就可以对类的字段进行操作了:

const JavaClass* testClass = yrt.ma->findJavaClass("Test.class");
JObject* testInstance = yrt.jheap->createObject(*testClass);
//获取hello字段
JObject*  helloField = yrt.jheap->getFieldByName(testClass,"hello","Ljava/lang/String;",testInstance);
//设置k字段(kValue是持有待写入int值的JType*,其构造此处省略)
yrt.jheap->putFieldByName(testClass,"k","I",testInstance,kValue);

I. 关于JDK

部分JDK类是JVM运行攸关的,但由于JDK比较复杂不便于初期开发,所以这里用重写过的JDK代替,源码参见javaclass目录,可以使用compilejava.bat进行编译,编译后*.class文件位于bytecode. 目前重写过的JDK类有:

  • java.lang.String
  • java.lang.StringBuilder
  • java.lang.Throwable
  • java.lang.Math(::random())
  • java.lang.Runnable
  • java.lang.Thread

English

1. From bytecode to an object

MethodArea manages the complete lifecycle of a JavaClass, from bytecode on disk to a usable class. Its API is largely self-explanatory:

class MethodArea {
public:
    // libPaths lists the runtime library directories that are searched
    // for dependent *.class files
    MethodArea(const vector<string>& libPaths);
    ~MethodArea();

    // look up a class by name to check whether it is already present
    JavaClass* findJavaClass(const string& jcName);
    // load the class specified by jcName
    bool loadJavaClass(const string& jcName);
    // remove the class specified by jcName (used by the GC only)
    bool removeJavaClass(const string& jcName);
    // link the class specified by jcName and initialize its static fields
    void linkJavaClass(const string& jcName);
    // initialize the class specified by jcName and call its static{} block
    void initJavaClass(Interpreter& exec, const string& jcName);

public:
    // auxiliary functions: load, link, or initialize the class
    // only if that step has not been done yet
    JavaClass* loadClassIfAbsent(const string& jcName);
    void linkClassIfAbsent(const string& jcName);
    void initClassIfAbsent(Interpreter& exec, const string& jcName);
};

For example, suppose there is a bytecode file named Test.class on disk. It becomes available to the JVM only after the following steps have finished:

Test.class[on disk] -> loadJavaClass("Test.class")[in memory] -> linkJavaClass("Test.class") -> initJavaClass(exec, "Test.class")
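The ordering of these steps, and the role of the *IfAbsent helpers, can be illustrated with a minimal self-contained sketch. ToyMethodArea, ClassState, and initCount are hypothetical names invented for this illustration; the sketch only tracks which stage a class has reached and does not parse *.class files or run static initializers as the real MethodArea does.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stages a class passes through, in order.
enum class ClassState { Loaded, Linked, Initialized };

// Toy stand-in for MethodArea: it only records the stage of each class.
class ToyMethodArea {
public:
    // True if the class is known at any stage.
    bool isLoaded(const std::string& name) const {
        return states_.count(name) != 0;
    }

    // Registers the class as Loaded; does nothing if it is already known.
    void loadClassIfAbsent(const std::string& name) {
        states_.emplace(name, ClassState::Loaded);
    }

    // Ensures the class is loaded, then links it if it is not linked yet.
    void linkClassIfAbsent(const std::string& name) {
        loadClassIfAbsent(name);
        ClassState& s = states_.at(name);
        if (s == ClassState::Loaded) s = ClassState::Linked;
    }

    // Ensures the class is linked, then initializes it at most once.
    void initClassIfAbsent(const std::string& name) {
        linkClassIfAbsent(name);
        ClassState& s = states_.at(name);
        assert(s != ClassState::Loaded);  // linking has completed by now
        if (s == ClassState::Linked) {
            s = ClassState::Initialized;
            ++initCount_;  // stands in for running static{} exactly once
        }
    }

    ClassState stateOf(const std::string& name) const { return states_.at(name); }
    int initCount() const { return initCount_; }

private:
    std::unordered_map<std::string, ClassState> states_;
    int initCount_ = 0;
};
```

In this sketch, calling initClassIfAbsent("Test.class") on a fresh ToyMethodArea walks the class through all three stages, and calling it a second time changes nothing.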

Once these steps have completed, the virtual machine can create objects of this class:

// yrt is the global runtime object; ma is the MethodArea module and jheap is the JavaHeap module
JavaClass* testClass = yrt.ma->findJavaClass("Test.class");
JObject* testInstance = yrt.jheap->createObject(*testClass);

2.1 Inside the object

The JVM stack holds only primitive numeric values and object/array references, which we call JObject and JArray. A JObject has the following structure:

struct JObject {
    std::size_t offset = 0; 
    const JavaClass* jc{}; 
};

offset uniquely identifies an object, and every heap operation on the object requires it. jc points to the object's JavaClass. Objects are stored in the heap as <offset, fields> pairs:

[1]  ->  [field_a, field_b, field_c]
[2]  ->  []
[3]  ->  [field_a,field_b]
[4]  ->  [field_a]
[..] ->  [...]

As long as we hold an object's offset, we can look up, add, or remove its fields.
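A minimal sketch of such an <offset, fields> table follows. ToyObjectTable and the Field alias are hypothetical names used only for illustration, and fields are modelled as plain integers here, whereas the real heap stores JType* values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical field slot; the real heap stores JType* values instead.
using Field = std::int64_t;

// Toy object table mapping an offset to the object's field slots.
class ToyObjectTable {
public:
    // Allocates fieldCount zeroed slots and returns the new object's offset.
    std::size_t create(std::size_t fieldCount) {
        const std::size_t offset = nextOffset_++;
        table_[offset] = std::vector<Field>(fieldCount, 0);
        return offset;
    }

    // All access goes through the offset; at() throws std::out_of_range
    // if the offset or the field index is invalid.
    Field get(std::size_t offset, std::size_t index) const {
        return table_.at(offset).at(index);
    }
    void put(std::size_t offset, std::size_t index, Field value) {
        table_.at(offset).at(index) = value;
    }

    // Removes the object; returns false if the offset was unknown.
    bool remove(std::size_t offset) { return table_.erase(offset) == 1; }
    bool contains(std::size_t offset) const { return table_.count(offset) != 0; }

private:
    std::size_t nextOffset_ = 1;  // offsets start at 1, as in the listing above
    std::unordered_map<std::size_t, std::vector<Field>> table_;
};
```

Because callers only ever hold the offset, the table is free to store or relocate the field data however it likes, which is one plausible reason for this kind of indirection.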

An array is almost the same as an object, except that it carries a length field instead of jc, since an array does not need to hold a reference to a class:

struct JArray {
    int length = 0;
    std::size_t offset = 0; 
};

[1]  ->   <3, [field_a, field_b, field_c]>
[2]  ->   <0, []>
[3]  ->   <2, [field_a,field_b]>
[4]  ->   <1, [field_a]>
[..] ->   <..,[...]>

2.2 From object creation to extinction

As mentioned above, a JObject holds an offset and a jc. MethodArea manages the JavaClass referenced by jc, while JavaHeap manages the storage identified by offset. JavaHeap provides a large number of APIs; the most important ones are:

class JavaHeap {
public:
    // create an object/array
    JObject* createObject(const JavaClass& javaClass);
    JArray* createObjectArray(const JavaClass& jc, int length);

    // get/set field
    auto getFieldByName(const JavaClass* jc, const string& name,
                        const string& descriptor, JObject* object);
    void putFieldByName(const JavaClass* jc, const string& name,
                        const string& descriptor, JObject* object,
                        JType* value);
    // get/set specific element in the array
    void putElement(const JArray& array, size_t index, JType* value);
    auto getElement(const JArray& array, size_t index);
    
    // remove an array/object from heap
    void removeArray(size_t offset);
    void removeObject(size_t offset);
};
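To make the array half of this API concrete, here is a small sketch of <length, elements> storage with element access and removal. ToyArrayRef and ToyArrayHeap are hypothetical names, elements are plain ints rather than JType*, and the bounds check is an assumption of this sketch rather than documented behaviour of JavaHeap.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <unordered_map>
#include <vector>

// Mirrors JArray: a length plus the offset that identifies the storage.
struct ToyArrayRef {
    int length = 0;
    std::size_t offset = 0;
};

// Toy heap for int arrays only.
class ToyArrayHeap {
public:
    // Allocates a zero-filled array and returns a reference to it.
    ToyArrayRef createArray(int length) {
        if (length < 0) throw std::invalid_argument("negative array length");
        ToyArrayRef ref;
        ref.length = length;
        ref.offset = nextOffset_++;
        storage_[ref.offset] = std::vector<int>(static_cast<std::size_t>(length), 0);
        return ref;
    }

    void putElement(const ToyArrayRef& array, std::size_t index, int value) {
        checkIndex(array, index);
        storage_.at(array.offset)[index] = value;
    }
    int getElement(const ToyArrayRef& array, std::size_t index) const {
        checkIndex(array, index);
        return storage_.at(array.offset)[index];
    }

    // After removal the offset is stale; at() throws on any later access.
    void removeArray(std::size_t offset) { storage_.erase(offset); }
    std::size_t liveArrays() const { return storage_.size(); }

private:
    // Assumed policy: reject indices outside [0, length).
    static void checkIndex(const ToyArrayRef& array, std::size_t index) {
        if (index >= static_cast<std::size_t>(array.length))
            throw std::out_of_range("array index out of bounds");
    }

    std::size_t nextOffset_ = 1;
    std::unordered_map<std::size_t, std::vector<int>> storage_;
};
```

Keeping the length in the reference itself lets the sketch check an index without touching the heap, which matches the shape of JArray shown earlier.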

Returning to the Test.class example, assume the corresponding Test.java is as follows:

public class Test{
    public int k;
    private String hello;
}

In the first step we obtained the class representation of Test and created an instance of it, so we can now operate on the object's fields:

const JavaClass* testClass = yrt.ma->findJavaClass("Test.class");
JObject* testInstance = yrt.jheap->createObject(*testClass);
// get the field hello
JObject*  helloField = yrt.jheap->getFieldByName(testClass,"hello","Ljava/lang/String;",testInstance);
// set the field k (kValue is a JType* holding the int to store; its construction is omitted here)
yrt.jheap->putFieldByName(testClass,"k","I",testInstance,kValue);

Ⅰ. About JDK

No Java virtual machine can run a Java program without the Java class library. Some opcodes, such as ldc, monitorenter/monitorexit, and athrow, require the virtual machine to operate on JDK classes internally (java.lang.Class, java.lang.String, java.lang.Throwable, etc.). The original JDK classes are complicated enough to be inconvenient during early development, so I rewrote a subset of them to get a runnable VM. The sources are in the javaclass directory and can be compiled with compilejava.bat; the compiled *.class files are placed in bytecode. The rewritten JDK classes are:

  • java.lang.String
  • java.lang.StringBuilder
  • java.lang.Throwable
  • java.lang.Math(::random())
  • java.lang.Runnable
  • java.lang.Thread