My Second Stint at Jagex

I returned to Jagex this summer to keep myself busy doing technical stuff, rather than wasting time sleeping till midday every day.

This time round, I worked as a Data Engineer. My main task was to rewrite and refactor an ageing Bash codebase in Python. The system is fundamentally built around command-line processes, so Bash seemed the natural choice. However, Bash could not load configurations and other files into memory, so the scripts ran slowly. More importantly, the code had become very hard to read. Bash is fine for small scripts, but not for 800-line scripts that were never broken up into smaller pieces.

This let me practise the best parts of Python: lambda functions, filtering and list comprehensions, to name a few. I even helped a game analyst write a "generic" JSON-to-TSV script so that they could convert nested JSON into TSVs (it's on my GitHub as py-json2tsv, so check it out).
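To illustrate the kind of flattening such a script does, here is a minimal Python sketch. It is not the code from py-json2tsv: the function names `flatten` and `json_lines_to_tsv`, the dotted-key column naming and the one-JSON-document-per-line input are all assumptions made for illustration.

```python
import csv
import io
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into one dict with dotted keys (illustrative naming scheme)."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            name = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(value, name))
    elif isinstance(obj, list):
        # List elements become columns named by their index
        for index, value in enumerate(obj):
            name = f"{prefix}.{index}" if prefix else str(index)
            flat.update(flatten(value, name))
    else:
        flat[prefix] = obj
    return flat


def json_lines_to_tsv(lines):
    """Convert an iterable of JSON lines (assumed input format) into TSV text."""
    # Skip blank lines, then parse each remaining line as one JSON document
    records = [json.loads(line) for line in filter(lambda line: line.strip(), lines)]
    rows = [flatten(record) for record in records]
    # Union of all keys in first-seen order, so every row shares the same header
    columns = list(dict.fromkeys(key for row in rows for key in row))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=columns, delimiter="\t", restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


lines = [
    '{"player": {"id": 1, "name": "a"}, "scores": [10, 20]}',
    "",
    '{"player": {"id": 2}, "scores": [5]}',
]
print(json_lines_to_tsv(lines))
```

Because the header is the union of every record's flattened keys, records that lack a field simply get an empty cell in that column.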

I was fortunate enough to be given a cool task that involved Apache Spark, a data-processing engine and one of the most active open-source projects, with a large community of contributors. Since Spark is written in Scala, I used its Scala API so that I could use all of the available functionality with few compromises.

The objective of the task was to compute statistics for each column of a table, with the statistics chosen according to that column's type as specified in a configuration file. The results would then feed into a downstream system that detects abnormal trends, so that analysts and engineers could spot and fix potential issues more quickly.
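As a rough single-machine illustration of choosing statistics by column type, the Python sketch below looks up each column's declared type in a config and dispatches to a matching statistics function. The config layout, the `numeric`/`categorical` type names, the column names and the particular statistics are all hypothetical; the real job was written in Scala on Spark, where the same idea would be expressed as distributed aggregations over a DataFrame rather than over Python lists.

```python
from collections import Counter
from statistics import mean

# Hypothetical config: column name -> declared type
CONFIG = {"player_id": "categorical", "session_minutes": "numeric", "world": "categorical"}


def numeric_stats(values):
    """Null count, min, max and mean of the non-null values."""
    present = [v for v in values if v is not None]
    return {
        "nulls": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": mean(present) if present else None,
    }


def categorical_stats(values):
    """Null count, number of distinct values and the most frequent value."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    top = counts.most_common(1)
    return {
        "nulls": len(values) - len(present),
        "distinct": len(counts),
        "top": top[0][0] if top else None,
    }


# Map each declared type to the function that summarises it
STATS_BY_TYPE = {"numeric": numeric_stats, "categorical": categorical_stats}


def column_stats(rows, config):
    """Compute per-column statistics, picking the function from each column's configured type."""
    result = {}
    for column, col_type in config.items():
        values = [row.get(column) for row in rows]
        result[column] = STATS_BY_TYPE[col_type](values)
    return result


rows = [
    {"player_id": "p1", "session_minutes": 30, "world": "w1"},
    {"player_id": "p2", "session_minutes": None, "world": "w1"},
    {"player_id": "p1", "session_minutes": 90, "world": None},
]
stats = column_stats(rows, CONFIG)
print(stats)
```

Keeping the type-to-function mapping in one dictionary means supporting a new column type only requires adding one function and one entry, with no change to the loop over columns.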

It was a brilliant task as it introduced me to an interesting programming language. Think of Scala as a hybrid of Python, Java and a functional language. It has a steep learning curve, but it was certainly fun. It took me a while to get used to the strict static typing, as I had been programming mostly in Python for a long time.

Ultimately, I was able to write a rather efficient piece of code. Processing and storing the results for a 70-million-row partition with more than 10 columns took only 40 minutes. (It ran on our Hadoop cluster as a YARN application, using at most 500 containers at any one time.)

It was great to see familiar faces back at Jagex and to pick up new practical skills in 3 short months. It showed me that I can actually program, so I should do more of it in the future.