Software Forensics

Software is ubiquitous in systems and devices, and consequently there are numerous opportunities for its misuse and abuse.  Innovative software is also a key asset that organisations and individuals must protect.  These circumstances have led to the rise of software forensics, the field of research and practice concerned with determining the authorship of software artefacts – both the innovative and the malicious. Here in SERL we conduct software forensics research focused on identifying, characterising and discriminating among authors of software artefacts.

This work leverages the extensive background we have in software metrics but also incorporates aspects of NLP and semantic analysis, as well as the use of a range of modelling methods.  Our work to date has led to our being invited to comment in three legal disputes over code authorship, in which we have provided an opinion to counsel based on our empirical analyses.  We have also undertaken extensive tool development under this research theme, producing the IDENTIFIED software tool and the DE/A analysis and modelling software.

Contact Prof. Stephen MacDonell for more information on work in this theme.

Example Project

Authorship Analysis
We have used our IDENTIFIED tool in several authorship analysis investigations, demonstrating high levels of discriminatory power. Under this theme we have also established a strong record of research collaboration with Dr Georgia Frantzeskou and her colleagues at the University of the Aegean in Greece.  Dr Frantzeskou’s work on the SCAP method has been shown to perform very effectively in studies of authorship identification. In recent years we have extended this work to consider the work practices of open source software development teams.

Related publications:
Frantzeskou, G., MacDonell, S.G., Stamatatos, E., & Gritzalis, S. (2008) Examining the significance of high-level programming features in source code author classification, Journal of Systems and Software 81(3), pp.447-460.

MacDonell, S.G., Buckingham, D., Gray, A.R., & Sallis, P.J. (2002) Software forensics: extending authorship analysis techniques to computer programs, Journal of Law and Information Science 13(1), pp.34-69.

Projects Available

Integration of the IDENTIFIED and SCAP approaches to authorship analysis

Testing the sensitivity and robustness of authorship classification methods

Theme Papers

Frantzeskou, G., MacDonell, S.G., Stamatatos, E., Georgiou, S., & Gritzalis, S. (2011) The significance of user-defined identifiers in Java source code authorship identification, International Journal of Computer Systems Science and Engineering 26(2), pp.139-148.

Frantzeskou, G., MacDonell, S.G., & Stamatatos, E. (2010) Source code authorship analysis for supporting the cybercrime investigation process. In Handbook of Research on Computational Forensics, Digital Crime and Investigation: Methods and Solutions. C.-T. Li (ed.), IGI Global, pp.470-495 [ISBN 978-1-60566-836-9].

Frantzeskou, G., Gritzalis, S., & MacDonell, S.G. (2004) Source code authorship analysis for supporting the cybercrime investigation process, in Proceedings of the First International Conference on E-Business and Telecommunication Networks (ICETE'04). Setubal, Portugal, Kluwer Academic Publishers, pp.85-92.

MacDonell, S.G., & Gray, A.R. (2001) Software forensics applied to the task of discriminating between program authors, Journal of Systems Research and Information Systems 10, pp.113-127.

MacDonell, S.G., Gray, A.R., MacLennan, G., & Sallis, P.J. (1999) Software forensics for discriminating between program authors using case-based reasoning, feed-forward neural networks and multiple discriminant analysis, in Proceedings of the Sixth International Conference on Neural Information Processing (ICONIP'99/ANZIIS'99/ANNES'99/ACNN'99). Perth, Australia, IEEE Computer Society Press, pp.66-71.

Gray, A.R., Sallis, P.J., & MacDonell, S.G. (1998) IDENTIFIED (Integrated Dictionary-based Extraction of Non-language-dependent Token Information for Forensic Identification, Examination, and Discrimination): a dictionary-based system for extracting source code metrics for software forensics, in Proceedings of Software Engineering: Education & Practice (SE:E&P'98). Dunedin, New Zealand, IEEE Computer Society Press, pp.252-259.

Kilgour, R.I., Gray, A.R., Sallis, P.J., & MacDonell, S.G. (1998) A fuzzy logic approach to computer software source code authorship analysis, in Proceedings of the Fourth International Conference on Neural Information Processing (ICONIP'97/ANZIIS'97/ANNES'97). Dunedin, New Zealand, Springer-Verlag, pp.865-868.

Sallis, P.J., MacDonell, S.G., MacLennan, G., Gray, A.R., & Kilgour, R.I. (1998) IDENTIFIED: software authorship analysis with case-based reasoning, in Proceedings of the Addendum Session of the Fourth International Conference on Neural Information Processing (ICONIP'97). Dunedin, New Zealand, University of Otago, pp.53-56.