Researchers at Saarland University in Germany have developed a method for detecting similarities between programs, which can be used to identify software piracy.
Called API Birthmark, the tool can scan a program and look for similarities it has with another piece of software based on its behaviour, according to Valentin Dallmeier, one of three researchers at the university’s Software Engineering Chair who developed this software analysis tool. What’s noteworthy about this approach, said Dallmeier, is that it compares and looks for similarities in the behaviour of the programs, rather than in the actual code.
Developers who illegally use code from a licensed program typically employ obfuscation techniques in an effort to evade detection by code-based scanning tools. Obfuscating the code does not necessarily change the functionality of the program, said Dallmeier.
“These obfuscation techniques only change the code in the program, but they cannot alter the behaviour without destroying the program and (losing) its functionality,” the researcher explained.
API Birthmark analyzes the behaviours of a particular program and compares them with those of other programs. The higher the degree of similarity between two pieces of software, the greater the likelihood of code theft.
Dallmeier said API Birthmark looks at the interaction between a program and the operating system or the application programming interface (API), depending on the language in which the program was written. It then captures that interaction and compares it with the interactions captured from other programs, he said.
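The article does not describe how the captured interactions are compared, so the following is only a rough sketch of the general idea, not the researchers' actual method. It assumes each program's behaviour has already been recorded as an ordered list of API call names, and it scores two recordings by how many short runs of consecutive calls they share. The function names, the sample traces and the choice of a Jaccard score over two-call windows are all illustrative assumptions.

```python
from typing import Sequence, Set, Tuple


def call_ngrams(trace: Sequence[str], n: int = 2) -> Set[Tuple[str, ...]]:
    """Return the set of length-n windows of consecutive API calls in a trace."""
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}


def trace_similarity(a: Sequence[str], b: Sequence[str], n: int = 2) -> float:
    """Jaccard similarity of the two traces' n-gram sets, from 0.0 to 1.0."""
    grams_a, grams_b = call_ngrams(a, n), call_ngrams(b, n)
    if not grams_a and not grams_b:
        # Two traces too short to yield any window give no evidence either way.
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Hypothetical traces (assumed for illustration): API calls made, in order.
original = ["open", "read", "parse", "read", "parse", "write", "close"]
# An obfuscated copy has different source code but makes the same calls.
obfuscated_copy = ["open", "read", "parse", "read", "parse", "write", "close"]
unrelated = ["connect", "send", "recv", "send", "recv", "close"]

if __name__ == "__main__":
    print(trace_similarity(original, obfuscated_copy))
    print(trace_similarity(original, unrelated))
```

In this toy setup, renaming variables or reshuffling the source leaves the recorded call sequence untouched, so the copy still scores as identical to the original, while a program that does something different shares no call windows with it. A high score would only be a signal for closer inspection, not proof of copying.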
The API Birthmark can be valuable to big software developers for conducting competitive analysis, said Michelle Warren, Toronto-based senior research analyst at Info-Tech Research Group.
Software developers can use API Birthmark to evaluate other software products for possible copyright violations. They can also use it as an analysis tool in their own development labs to ensure that their code does not infringe on any copyrights, Warren explained.
The tool’s behaviour-based scanning method also makes it more effective than the traditional code-based analysis tools, she said.
“It’s like writing an essay,” said Warren. “Sentences can be created just coincidentally using the same words (as another piece of essay), but if we look at the actual idea and the thought patterns and the beliefs, that is really at the core of any kind of (intellectual property) theft.”
Looking at the behaviour of a program in the context of possible copyright infringement becomes more critical “as software piracy continues on its path of maturity,” she said.
She added, however, that while this is a useful tool for detecting software piracy, it’s still ultimately up to the courts to decide whether a violation occurred.
“There will be a lot of grey areas there in terms of how to define code behaviour and program behaviour,” she said.
Even a scanning tool such as API Birthmark would itself have to be thoroughly evaluated and analyzed before the results of its analysis could be deemed admissible in court, she added.
“And that might even depend on which court and in which country,” Warren said.
Canadian open source expert Russell McOrmond said the API Birthmark “seems to simply automate the most common method to detect this form of copyright infringement.”
He said most copyright infringement investigations involving Free/Libre Open Source Software (FLOSS) are triggered by someone noticing similarities in software behaviour.
“This type of tool only provides automation for that first glance and isn’t going to be helpful in the more thorough investigation,” said McOrmond, who is policy coordinator at Canadian open source group CLUE.
Dallmeier said the researchers plan to release the API Birthmark code to the open source community in two to three months, at which point the methodology will be open for interested developers to take and adapt for their own use.
The API Birthmark will also be presented at the upcoming Automated Software Engineering conference in Atlanta in November.