Deobfuscating Name Scrambling as a Natural Language Generation Task
We are interested in data-driven approaches to Natural Language Generation, but semantic representations for human text are difficult and expensive to construct. By considering a method’s implementation as weak semantics for the English terms extracted from the method’s name, we can collect massive da...
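
The abstract treats the English terms in a method’s name as the text to be generated from the method’s implementation. The following is a minimal Python sketch of how such terms might be pulled out of a Java method name. It assumes camelCase or snake_case identifiers. The splitting rule and the names `name_terms` and `first_term` are hypothetical and not taken from the paper.

```python
import re
from typing import List, Optional

# Hypothetical tokenizer (an assumption, not the paper's rule): matches runs of
# letters split at camelCase boundaries, keeping acronym runs together
# ("HTTPResponse" -> "HTTP", "Response"). Underscores and digits are never
# matched, so they act as separators.
_TERM_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+")


def name_terms(method_name: str) -> List[str]:
    """Return the lowercased terms found in a method name."""
    return [term.lower() for term in _TERM_RE.findall(method_name)]


def first_term(method_name: str) -> Optional[str]:
    """Return the first term (the prediction target), or None if there is none."""
    terms = name_terms(method_name)
    return terms[0] if terms else None


if __name__ == "__main__":
    for name in ["getValue", "toString", "parseHTTPResponse", "is_empty"]:
        print(name, name_terms(name), first_term(name))
```

Only the first term is kept as the label because, per the abstract, the model predicts the first term of the name.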
| Main author: | Duboue, Pablo Ariel |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | 2019 |
| Subjects: | Computer Science; random forest model; Natural language; bytecodes |
| Online access: | http://sedici.unlp.edu.ar/handle/10915/135041 https://publicaciones.sadio.org.ar/index.php/EJS/article/view/85 |
| Contributed by: | |

| id | I19-R120-10915-135041 |
|---|---|
| record_format | dspace |
| institution | Universidad Nacional de La Plata |
| institution_str | I-19 |
| repository_str | R-120 |
| collection | SEDICI (UNLP) |
| language | English |
| topic | Computer Science; random forest model; Natural language; bytecodes |
| description | We are interested in data-driven approaches to Natural Language Generation, but semantic representations for human text are difficult and expensive to construct. By considering a method’s implementation as weak semantics for the English terms extracted from the method’s name, we can collect massive datasets, akin to having words and sensor data aligned at a scale never seen before. We applied our learned model to name scrambling, a common technique used to protect intellectual property and increase the effort necessary to reverse engineer Java binary code: replacing all method and class names with random identifiers. Using 5.6M bytecode-compiled Java methods obtained from the Debian archive, we trained a Random Forest model to predict the first term in the method name. As features, we primarily use the opcodes of the bytecodes (that is, the bytecodes without any parameters). Our results indicate that we can distinguish the 15 most popular terms from the others at 78% recall, helping a programmer performing reverse engineering to halve the number of methods in a program that they need to investigate further. We also performed some preliminary experiments using neural machine translation. |
| format | Article |
| author | Duboue, Pablo Ariel |
| title | Deobfuscating Name Scrambling as a Natural Language Generation Task |
| publishDate | 2019 |
| url | http://sedici.unlp.edu.ar/handle/10915/135041 https://publicaciones.sadio.org.ar/index.php/EJS/article/view/85 |
| bdutipo_str | Repositories |
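
The description field above states that the features are primarily the opcodes of each method’s bytecode with their parameters stripped, and that the evaluation separates the 15 most popular first terms from all others. The following is a stdlib-only Python sketch of that feature and label preparation. It assumes `javap -c`-style disassembly as input, and the helper names (`opcodes_from_javap`, `bag_of_opcodes`, `top_k_terms`, `recall`) are hypothetical. The Random Forest itself is not shown (a library such as scikit-learn would normally supply it), so the sketch stops at the feature vectors, the binary target set and the recall metric.

```python
from collections import Counter
from typing import Iterable, List, Set


def opcodes_from_javap(lines: Iterable[str]) -> List[str]:
    """Keep only the opcode mnemonic of each instruction in `javap -c` style output.

    Assumed line format: "<offset>: <opcode> [operands] [// comment]".
    Operands (constant-pool indices, branch targets, comments) are discarded.
    """
    ops = []
    for line in lines:
        head, sep, rest = line.strip().partition(":")
        if not sep or not head.isdigit():
            continue  # not an instruction line (signature, "Code:", tables, ...)
        fields = rest.split()
        # Skip switch-table entries such as "0: 28", whose first field is a number.
        if fields and fields[0][0].isalpha():
            ops.append(fields[0])
    return ops


def bag_of_opcodes(ops: Iterable[str], vocabulary: List[str]) -> List[int]:
    """Count each vocabulary opcode, giving a fixed-length feature vector."""
    counts = Counter(ops)
    return [counts[op] for op in vocabulary]


def top_k_terms(first_terms: Iterable[str], k: int = 15) -> Set[str]:
    """Return the k most frequent first terms in the training data."""
    return {term for term, _ in Counter(first_terms).most_common(k)}


def recall(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of truly positive (top-term) methods that the classifier flags."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(y_true)
    return true_pos / positives if positives else 0.0


if __name__ == "__main__":
    listing = [
        "public int getValue();",
        "  Code:",
        "     0: aload_0",
        "     1: getfield      #2    // Field value:I",
        "     4: ireturn",
    ]
    ops = opcodes_from_javap(listing)
    print(ops)  # ['aload_0', 'getfield', 'ireturn']
    print(bag_of_opcodes(ops, ["aload_0", "getfield", "ireturn", "areturn"]))  # [1, 1, 1, 0]
```

A method would then be labelled 1 when its first name term is in `top_k_terms(...)` and 0 otherwise, and `recall` measures how many of the label-1 methods the classifier recovers.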