Google Details its Use of Machine Learning to Identify Intrusive Mobile Apps

All too often, we search for an app and find what looks like the best fit for our needs, only to be put off by the long list of permissions the application thinks it needs to function. Some developers request permissions for functionality that their app clearly does not need, such as an expense tracker asking for the RECORD_AUDIO permission, which suggests a nefarious motive.

Google realizes that many such applications plague the Google Play Store. While technologically adept users may keep a close eye on the permissions they grant to any app, the average user usually just taps “Accept” until they get what they came for. It then becomes Google’s “responsibility” to protect users from such intrusive applications with a solution that scales across the entirety of the Google Play Store and all future uploads.

Google’s approach to fighting this problem relies on machine learning to scale its solution. Google begins by analyzing privacy and security signals for each app in Google Play, and then compares that app to its functional peers, i.e. other apps with similar features. Functional peers set the baseline of behavior expected of that group, which makes it easier to identify apps in the group that go beyond the expected behaviors. For example, a coloring book app does not need access to a user’s precise location, and this can be established by analyzing other coloring book apps. Similarly, a navigation app does need precise location, and looking at other navigation apps would show that requesting location permissions is within expected and accepted behavior.
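The peer comparison described above can be illustrated with a minimal sketch. Google has not published its implementation, so everything here is an assumption made for illustration: the function name rare_permissions, the threshold parameter, and the sample permission sets are all hypothetical. The idea is simply to flag any permission an app requests that only a small share of its functional peers also request.

```python
from collections import Counter


def rare_permissions(app_permissions, peer_permission_sets, threshold=0.1):
    """Return the permissions requested by an app that fewer than
    `threshold` (a fraction between 0 and 1) of its peers request.

    Hypothetical helper for illustration; not Google's actual method.
    """
    if not peer_permission_sets:
        return set()
    # Count how many peers request each permission (once per peer).
    counts = Counter(p for perms in peer_permission_sets for p in set(perms))
    total = len(peer_permission_sets)
    return {p for p in app_permissions if counts[p] / total < threshold}


# Made-up peer groups: what other apps with similar features request.
coloring_peers = [
    {"INTERNET"},
    {"INTERNET", "WRITE_EXTERNAL_STORAGE"},
    {"INTERNET"},
    {"WRITE_EXTERNAL_STORAGE"},
]
navigation_peers = [
    {"INTERNET", "ACCESS_FINE_LOCATION"},
    {"INTERNET", "ACCESS_FINE_LOCATION"},
    {"ACCESS_FINE_LOCATION"},
]

# A coloring book app asking for precise location stands out from its peers.
coloring_flags = rare_permissions(
    {"INTERNET", "ACCESS_FINE_LOCATION"}, coloring_peers, threshold=0.25
)

# The same permission is unremarkable for a navigation app.
navigation_flags = rare_permissions(
    {"INTERNET", "ACCESS_FINE_LOCATION"}, navigation_peers, threshold=0.25
)

print(coloring_flags)
print(navigation_flags)
```

A flag like this would only be a signal that an app deserves a closer look, not proof of bad intent.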

Google uses machine learning to create these peer groups, letting it move beyond methods like manual curation and fixed categories, which have their own drawbacks. Google’s approach uses “deep learning of vector embeddings to identify peer groups of apps with similar functionality”. This draws on app metadata such as text descriptions and on user metrics like the number of installs. Once the peer groups are established, Google looks within each group for anomalous, potentially harmful signals related to privacy and security in each app’s requested permissions and behaviors. The correlation between different peer groups and their security signals helps different teams at Google decide which apps to promote and which apps deserve a more careful look.
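To make the idea of embedding-based peer groups more concrete, here is a rough sketch. It is not Google’s method: Google learns its embeddings with deep learning from app metadata, whereas the vectors below are hand-written stand-ins, and picking the nearest neighbors by cosine similarity is an assumption chosen for simplicity. The names cosine, peer_group, and the app labels are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def peer_group(app, embeddings, k=2):
    """Return the k apps whose embeddings are most similar to `app`.

    Hypothetical helper: nearest neighbors stand in for whatever
    grouping Google actually performs on its learned embeddings.
    """
    target = embeddings[app]
    scored = [
        (cosine(target, vec), name)
        for name, vec in embeddings.items()
        if name != app
    ]
    scored.sort(reverse=True)  # most similar first
    return [name for _, name in scored[:k]]


# Hand-made vectors; in practice these would be learned from app
# descriptions and other metadata rather than written by hand.
embeddings = {
    "coloring_a": [0.9, 0.1, 0.0],
    "coloring_b": [0.8, 0.2, 0.1],
    "coloring_c": [0.85, 0.15, 0.05],
    "nav_a": [0.1, 0.9, 0.2],
    "nav_b": [0.0, 0.95, 0.1],
}

peers = peer_group("coloring_a", embeddings, k=2)
print(peers)
```

Because the grouping depends on what an app does rather than on the store category its developer picked, it can place similar apps together even when they are filed under different categories.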

The results are also used to help app developers improve the privacy and security of their own apps, though Google did not expand on how exactly this is done, or how exactly it separates apps with an honest security oversight from apps with mala fide intentions. Perhaps this part of the process is done after manual inspection of these apps by Google’s security and privacy experts, but we are left to take Google’s word on this.

Nonetheless, it is good to see Google tackling permission issues at the store level as well, with a solution built for the scale of the Android ecosystem.


What are your thoughts on Google’s methodology for identifying intrusive apps? Is employing machine learning the best way to go about it? Let us know in the comments below!

Source: Android Developers Blog