Evaluating Taint Specification Generators for Identifying Taint Sources in Relation to Data Safety Section (MOBILESoft 2025 - Research Track)

Who

Hiroki Inayoshi, Shoichi Saito, Akito Monden

Track

MOBILESoft 2025 Research Track

Time Zone

The program is displayed in the conference time zone, (GMT-04:00) Eastern Time (US & Canada). The GMT offsets shown reflect the offsets in effect at the time of the conference.


When

Sun 27 Apr 2025, 14:45 - 15:00, Room 211, Session 3: Testing and Security

Abstract

Security and privacy researchers use taint analysis to uncover privacy compliance issues in Android apps. Taint analysis requires lists of sensitive source and sink APIs (i.e., a taint specification). Manually crafting a comprehensive, up-to-date taint specification across the tens of thousands of APIs in the Android framework is impractical, so automatic taint specification generators have been developed; two recent approaches are CoDoC and DocFlow. Meanwhile, Google introduced the Google Play Data Safety Section (DSS), in which app developers disclose their apps' privacy practices. Because taint sources vary by data category, DSS-related taint sources (DSSTSs) are essential for researchers analyzing DSS compliance. To date, there are no studies on automatic DSSTS identification and no datasets for evaluating it. As a step toward automatic DSSTS identification, this paper evaluates how well CoDoC and DocFlow identify DSSTSs. We collect taint sources referenced in prior privacy-related studies and in official documentation and map them to 11 DSS data categories, producing a dataset of 505 APIs. Using this dataset, we evaluate CoDoC and DocFlow and find that they perform well in certain data categories, which suggests they are suitable for category-specific investigations. We also apply CoDoC and DocFlow to classify the entire Android framework and find that both label a substantial fraction of APIs as taint sources, ranging from 21% to 57%. Finally, we show that running a taint analyzer with the generated taint source lists yields a large number of reported leaks, which imposes high verification costs. Our findings highlight the limitations of existing generators in identifying DSSTSs. We release the dataset to the community.

File attachments

slides (inayoshi_mobilesoft2025_slides.pdf)	1.29MiB
paper (inayoshi_mobilesoft2025.pdf)	166KiB

Hiroki Inayoshi

Okayama University

Japan

Shoichi Saito

Nagoya Institute of Technology