Java Class Files - a White Paper
A brief technical overview of the Java class file format is required in order to understand the obfuscation process.
Java bytecode for an applet is stored in a set of class files, one class file for each class defined. The class files contain the Java virtual machine instructions (bytecode) as well as all symbol information from the source.
The class file consists of 4 logical parts:
The complete strings for all symbol names in the class file are contained in the constant pool. The names stored are fully qualified names, uniquely identifying the class and member referenced. A constant pool entry is referenced in the class file via its index in the constant pool table.
The Fields and Methods lists each reference their name and type information with constant pool indexes. The Java bytecode instructions reference their operands via constant pool indexes.
When a class is loaded, so is its constant pool. As each Java instruction executes, the constant pool entries it references are "resolved": any other required classes are loaded, and the instruction's reference is linked to the actual field or method that the constant pool entry names.
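The role of the constant pool can be illustrated with a short sketch that walks the constant pool of a class file and collects its string (CONSTANT_Utf8) entries, which is where the symbol names live. This is not JCloak code. The class name ConstantPoolDump and the method utf8Entries are made up for the illustration, and the tag layout follows the published class file format, including tags added after this paper was written.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JCloak: lists the strings held in a constant pool.
class ConstantPoolDump {

    /** Returns the CONSTANT_Utf8 strings of a class file, in constant pool order. */
    static List<String> utf8Entries(byte[] classBytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes));
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file");
        }
        in.readUnsignedShort();             // minor version
        in.readUnsignedShort();             // major version
        int count = in.readUnsignedShort(); // valid indexes are 1 .. count - 1
        List<String> strings = new ArrayList<>();
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1:                      // Utf8: 2-byte length, then the bytes
                    strings.add(in.readUTF());
                    break;
                case 3: case 4:              // Integer, Float
                    in.skipBytes(4);
                    break;
                case 5: case 6:              // Long, Double occupy two pool slots
                    in.skipBytes(8);
                    i++;
                    break;
                case 7: case 8: case 16: case 19: case 20:
                    in.skipBytes(2);         // Class, String, MethodType, Module, Package
                    break;
                case 15:                     // MethodHandle
                    in.skipBytes(3);
                    break;
                case 9: case 10: case 11: case 12: case 17: case 18:
                    in.skipBytes(4);         // member refs, NameAndType, Dynamic, InvokeDynamic
                    break;
                default:
                    throw new IOException("unknown constant pool tag " + tag);
            }
        }
        return strings;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ConstantPoolDump.java <file.class>");
            return;
        }
        for (String s : utf8Entries(Files.readAllBytes(Paths.get(args[0])))) {
            System.out.println(s);
        }
    }
}
```

Running it against any compiled class prints the class, field, and method names together with their type descriptors, which are the strings an obfuscator would rewrite.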
This is enough detail to understand obfuscation. For a more detailed description of the class file format and the Java Virtual Machine, the Java series reference, The Java Virtual Machine Specification, is indispensable.
The Obfuscation Process
JCloak performs the obfuscation across the set of class files referenced from:
The classes specified for 1 through 3 above are termed the root or main class[es].
JCloak resolves class references using the current classpath (java.class.path property). The classpath is divided into two groups, writeable and readonly classes. For example, the java.lang and java.io package classes would be in the readonly classes.
If a writeable class path element (AddWriteablePath) matches a prefix of a classpath element, the packages found on that classpath element belong to the writeable set.
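The prefix rule can be sketched as follows. This is only an illustration of the idea, not JCloak's implementation: the class ClasspathPartition is hypothetical, and comparing whole path components with Path.startsWith (so that /app does not match /apple) is an assumption about what "prefix" means here.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits classpath elements into writeable and readonly groups.
class ClasspathPartition {
    final List<Path> writeable = new ArrayList<>();
    final List<Path> readOnly = new ArrayList<>();

    ClasspathPartition(List<String> classpath, List<String> writeablePrefixes) {
        for (String element : classpath) {
            Path path = Paths.get(element).normalize();
            boolean matches = false;
            for (String prefix : writeablePrefixes) {
                // Path.startsWith compares whole name elements, not raw characters.
                if (path.startsWith(Paths.get(prefix).normalize())) {
                    matches = true;
                    break;
                }
            }
            (matches ? writeable : readOnly).add(path);
        }
    }

    public static void main(String[] args) {
        // The command-line arguments stand in for the AddWriteablePath entries.
        String classpath = System.getProperty("java.class.path", "");
        ClasspathPartition p = new ClasspathPartition(
                Arrays.asList(classpath.split(File.pathSeparator)),
                Arrays.asList(args));
        System.out.println("writeable: " + p.writeable);
        System.out.println("readonly:  " + p.readOnly);
    }
}
```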
JCloak makes a fundamental assumption that any class in the readonly set has no direct knowledge of a class in the writeable set. In other words, a readonly class may only invoke methods of a writeable class by way of inheritance or an interface.
JCloak loads the readonly and writeable class sets by recursively examining the bytecode references of each writeable class, starting with the main class[es] and terminating the recursion at each reference to a readonly class.
For each class in the class set, all of its superclasses and superinterfaces are loaded as readonly or writeable as indicated.
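The loading step described above amounts to a graph traversal that stops at readonly classes. A minimal sketch, assuming the bytecode references and supertypes of each class have already been extracted into two maps (in a real tool they would be read from the class files), might look like this. The class ClassSetLoader and all of its members are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch of the class set traversal; not JCloak's actual code.
class ClassSetLoader {
    final Set<String> writeable = new LinkedHashSet<>();
    final Set<String> readOnly = new LinkedHashSet<>();

    private final Map<String, List<String>> references; // class -> classes its bytecode names
    private final Map<String, List<String>> supertypes; // class -> superclass and superinterfaces
    private final Predicate<String> isWriteable;

    ClassSetLoader(Map<String, List<String>> references,
                   Map<String, List<String>> supertypes,
                   Predicate<String> isWriteable) {
        this.references = references;
        this.supertypes = supertypes;
        this.isWriteable = isWriteable;
    }

    void load(String name) {
        boolean w = isWriteable.test(name);
        if (!(w ? writeable : readOnly).add(name)) {
            return; // already visited
        }
        // Supertypes are loaded for every class in the set, readonly or writeable.
        for (String s : supertypes.getOrDefault(name, Collections.emptyList())) {
            load(s);
        }
        // Bytecode references are followed only out of writeable classes,
        // so the recursion terminates at each readonly class.
        if (w) {
            for (String r : references.getOrDefault(name, Collections.emptyList())) {
                load(r);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("app.Main", Arrays.asList("app.Util", "java.util.Vector"));
        Map<String, List<String>> supers = new HashMap<>();
        supers.put("app.Main", Arrays.asList("java.applet.Applet"));
        ClassSetLoader loader = new ClassSetLoader(refs, supers, n -> n.startsWith("app."));
        loader.load("app.Main"); // app.Main plays the role of the root class
        System.out.println("writeable: " + loader.writeable);
        System.out.println("readonly:  " + loader.readOnly);
    }
}
```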
Next, JCloak examines the symbols defined by the writeable classes to determine whether they may be safely obfuscated. The rules depend on the access level setting (AccessLevel).
For the public access level, any symbol that is not referenced by a readonly class is subject to obfuscation.
The package access level allows obfuscation only of symbols that do not have public or protected access. This setting would generate the desired result when obfuscating a class library package.
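The two access level rules can be condensed into a single predicate. The sketch below is an interpretation, not JCloak's code: the names ObfuscationRule, AccessLevel, and mayObfuscate are hypothetical, and the text does not say whether the readonly-reference check also applies at the package level, so applying it in both cases is a conservative assumption.

```java
import java.lang.reflect.Modifier;

// Hypothetical sketch of the AccessLevel decision; names are illustrative only.
class ObfuscationRule {
    enum AccessLevel { PUBLIC, PACKAGE }

    /**
     * @param accessFlags          the symbol's access flags (same bit values as java.lang.reflect.Modifier)
     * @param referencedByReadOnly true if a readonly class reaches the symbol,
     *                             for example by inheritance or through an interface
     */
    static boolean mayObfuscate(AccessLevel level, int accessFlags, boolean referencedByReadOnly) {
        if (referencedByReadOnly) {
            return false; // assumption: renaming would break the readonly class in either mode
        }
        if (level == AccessLevel.PUBLIC) {
            return true;  // public level: everything else may be renamed
        }
        // package level: leave public and protected symbols untouched
        return !Modifier.isPublic(accessFlags) && !Modifier.isProtected(accessFlags);
    }

    public static void main(String[] args) {
        System.out.println(mayObfuscate(AccessLevel.PACKAGE, Modifier.PUBLIC, false));
        System.out.println(mayObfuscate(AccessLevel.PUBLIC, Modifier.PUBLIC, false));
    }
}
```

Under this reading, an applet would typically be processed at the public level, while a library whose public API must keep its names would use the package level.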
JCloak removes any symbols that are defined but not referenced by any class in the set. This includes removing the code of any method that is unreferenced.