Field-Sensitive Over-Tainting Reduction in IFDS Taint Analysis via CFL-Reachability
IFDS taint analysis is inherently context- and flow-sensitive, allowing precise encoding of field sensitivity in access-path generation. However, this precision comes at the cost of over-tainting—marking more data facts as tainted than necessary. The root cause is the undecidability of solving two context-free language reachability problems along the same dataflow path, which forces $k$-limiting as an over-approximation of field sensitivity. Consequently, spurious access paths are introduced, increasing analysis time, memory usage, and false positives, especially in large-scale applications.
To address this challenge, we present TnFix, a CFL (Context-Free Language)-reachability-based technique for mitigating over-tainting in IFDS taint analysis. The key insight is that the field sequence of any candidate tainted access path can be checked by a deterministic finite automaton (DFA) that accepts feasible sequences of field accesses. TnFix builds these DFAs by first solving a lightweight field-sensitive CFL-reachability problem to construct a Field Points-to Graph (FPG) that integrates data flows from taint sources and library summaries, and then converting the FPG into per-object DFAs. During taint analysis, TnFix queries these DFAs to prune access paths whose field sequences are rejected, eliminating the spurious paths introduced by $k$-limiting and improving precision without sacrificing scalability.
In a comparative evaluation against FlowDroid on a set of 36 widely used Android apps for taint analysis, TnFix successfully analyzes 7 apps that FlowDroid cannot complete within a three-hour time budget. For the remaining 29 apps, it improves analysis speed by an average of 2.5× and reduces false positives by an average of 12.2%. TnFix thus establishes the first CFL-based optimization framework for reducing over-tainting in IFDS taint analysis, delivering substantial gains in both efficiency and precision for practical use.