On the System Establishment of Algorithm Interpretation
Su Yu
Abstract: The interpretation of algorithms is of paramount importance in the realm of algorithmic governance. It serves to protect rights, enable social interaction, and manage risks. The technical impediments to interpreting algorithms are increasingly being surmounted, and interpretations can now be achieved through a variety of technical mechanisms. Within the sphere of algorithmic governance, the choice of interpretation approaches and technological options should be determined according to routine, critical, and dispute scenarios. Algorithmic interpretations can be fixed through mechanisms such as freezing, sampling, and mirroring, and then undergo external verification and review to ensure their authenticity and efficacy. The array of mechanisms for interpreting algorithms should be further organized into a comprehensive regulatory system, within which the rational setting of requirements such as interpretation paths and precision, time limits, and liability for interpretation defects can balance social benefits against regulatory burdens.
Key words: interpretation of algorithms; verification of algorithms; algorithmic black box; algorithmic transparency; machine learning; algorithmic governance
Algorithmic interpretation is steadily emerging as one of the most compelling topics in both the study of algorithmic governance and legal practice. Theoretically, the legal relationships concerning algorithmic interpretation have long “become one of the core propositions of algorithmic jurisprudence research”; practically, “legislators around the world are increasingly insisting that algorithmic decision-making be explainable, thereby placing it at the forefront of the algorithmic governance agenda.” Artificial intelligence algorithms, exemplified by machine learning, have introduced the novel challenge of the “algorithmic black box,” thrusting the issue of algorithmic interpretation into the spotlight. While traditional automated decision-making—characterized by clear rules and logical structures—has not engendered substantial difficulties regarding damage attribution and redress of rights, algorithmic models possessing an inherent “black box” quality indeed present such challenges. The processes of machine learning and decision-making are concealed behind multiple “hidden layers,” rendering the attribution of responsibility, effective oversight, and the provision of remedies for algorithmic decisions significantly more arduous. Should the actual criteria underpinning algorithmic decisions—such as the range of considered factors and their respective weights—be transparently elucidated, the principal challenges of artificial intelligence-driven decision-making could be surmounted, thereby largely mitigating the associated risks within the framework of existing governance systems. Consequently, in a remarkably brief span, both domestic and international academic communities have witnessed an unprecedented surge in research on issues related to algorithmic interpretation, the right to an interpretation, algorithmic interpretability, as well as algorithmic transparency and disclosure.
Simultaneously, both the institutional practices and technical research related to algorithmic interpretation are continuously advancing. For example, the provisions on the right to an interpretation contained within the European Union’s General Data Protection Regulation (hereinafter “GDPR”) have long attracted significant attention. In the United States, President Biden’s administration—through the Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence—has set forth requirements for the transparency of AI models and for regulated entities to be capable of explaining those models. In China, aside from Article 24 of the Personal Information Protection Law (PIPL), various regulatory bodies have integrated the relevant mechanisms for algorithmic interpretation into the practice of algorithmic governance through normative documents such as the “Administrative Measures for Algorithmic Recommendation in Internet Information Services” (hereinafter “Algorithmic Recommendation Measures”), the “Interim Measures for the Administration of Generative Artificial Intelligence Services” (hereinafter “Interim Measures”), and the “Guiding Opinions on Strengthening Comprehensive Governance of Internet Information Service Algorithms.” On the technical front, research on algorithmic interpretation is thriving, with diverse technological approaches excelling in their respective domains, and the underlying mathematical principles are gradually being unveiled.
However, this does not imply that the research on algorithmic interpretation is “complete,” nor that the scholarship—particularly within legal studies—has adequately met the practical demands and challenges. The mere establishment of principled provisions for algorithmic interpretation or transparency is far from sufficient to regulate and guide the algorithmic interpretation of major platform companies’ primary models. This is because these super-large and complex models often encompass billions of raw and vector features (or data graphs of comparable scale), and the magnitude and specific characteristics of large generative AI models render algorithmic interpretation substantially more challenging. Moreover, enterprises enjoy almost unrestricted discretion in providing algorithmic interpretations; the continual modifications and iterations of algorithmic models may render such interpretations rapidly obsolete; users are often unable to ascertain the accuracy and validity of the provided interpretations; and judicial bodies typically lack the capacity to effectively review algorithmic interpretations. All these factors contribute to significant institutional challenges for algorithmic interpretation—a subject that lies at the core of algorithmic governance research. With the advent of generative AI, the status of algorithmic interpretation and its related mechanisms is further being questioned, necessitating a thorough investigation into whether the existing framework retains sufficient importance and practical relevance.
Ultimately, these challenges can be distilled into four fundamental questions: How is algorithmic interpretation possible? Why is algorithmic interpretation necessary? How can algorithmic interpretation be implemented? And how can algorithmic interpretation be faithfully preserved? By clearly addressing each of these questions, the overall contours of an effective regime for algorithmic interpretation will become distinctly evident.
1. How Is Algorithmic Interpretation Possible? The Technical Principles of Algorithmic Interpretation
1.1 The Logical Premise of Algorithmic Interpretation: “Black-Box” Algorithms Lead to Obscured Decision Rules
“Algorithmic interpretation” is a term of art with a specific connotation. Here, “interpretation” refers to an interactive interface between human users and (machine) decision-makers that functions both as an accurate proxy for the decision-maker and as an interpretable construct for humans. This distinguishes algorithmic interpretation from related concepts such as algorithmic disclosure, algorithmic transparency, or algorithmic openness. Essentially, algorithmic interpretation aims to provide an intermediary mechanism that, in a manner comprehensible to human cognition, reveals which input variables—or combinations thereof—influence the output and to what extent. Users, affected parties, or the public seek to understand why an algorithmic system renders a particular decision, how various factors influence its judgments and determinations, and consequently whether ethical or legal risks such as discrimination or bias may be present.
Algorithmic interpretation presupposes the existence and deployment of “black-box” algorithms. Not all algorithms are inherently black-box, nor do all algorithmic models require interpretation. Traditional automated decision-making typically relies on clearly delineated causal relationships, whereby human agents need only understand the decision rules to discern the causal linkage between inputs and outputs, identify system flaws, or detect factors that might adversely affect them, thereby enabling accountability or rights protection. In contrast, artificial intelligence algorithms—exemplified by deep learning—base their decisions on correlations rather than causality. The relationship between inputs and outputs in such systems is highly nonlinear, obscuring the decision logic and precluding straightforward deduction or quantification of the relationship via simple rules. Consequently, with the advent of machine learning, the “black-box” problem emerged, thereby generating the need for algorithmic interpretation. There exists significant variability among different algorithms regarding their intrinsic interpretability and the consequent demand for interpretation. Algorithms founded on explicit causal rules—such as decision trees or Bayesian inference—are inherently more interpretable and are often regarded as “self-explanatory” models that require little to no additional elucidation. In contrast, models with pronounced black-box characteristics, such as random forests, support vector machines, and most notably deep neural networks, typically necessitate interpretation.
Various machine learning algorithms endeavor to have machines approximate the target function of the decision process as closely as possible. The existence of the “black box” is primarily attributable to a design approach that employs a composite of numerous simple functions to approximate a complex target function. The Kolmogorov–Arnold Representation Theorem demonstrates that any multivariate continuous function can be represented as a finite superposition, through sums and compositions, of continuous univariate functions. Based on this theorem, deep neural networks and similar black-box algorithms can stack and compose such simple functions to approximate the target function, thereby achieving the “universal approximation” capability. This composite structure exhibits quintessential black-box characteristics, entirely detached from conventional human analytical reasoning and decision-making logic, and far removed from intuitive human understanding. In the empirical world, the relationship between input variables and output results is often highly nonlinear; approximating such a relationship necessitates interjecting multiple nonlinear computational processes—such as the nonlinear activation functions in neural networks—into the composite structure. The intricate alternation between linear and nonlinear computations renders the relationship between inputs and outputs difficult to express via a single explicit formula, thereby engendering the so-called “unexplainability” that stands in stark contrast to human logical reasoning. Moreover, the various intermediate variables and parameters in the algorithmic decision-making process do not correspond to any meaningful entities in the real world and cannot be used—as in self-explanatory models like rule-based trees—to articulate the underlying logic in terms of real-world concepts or objects. Even in more recent architectures such as Transformers—which are comparatively more interpretable yet still possess universal approximation capabilities—the vectors and matrices that determine the output (e.g., the Query, Key, and Value projections and the output projection matrix W_O) generally lack directly corresponding real-world concepts, necessitating the establishment of cognitive mappings and connections by human interpreters.
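For reference, the standard statement of the theorem (reproduced here purely for illustration) is that any continuous function of n variables on the unit cube can be written in the form

\[
f(x_1,\dots,x_n)=\sum_{q=0}^{2n}\Phi_q\!\left(\sum_{p=1}^{n}\phi_{q,p}(x_p)\right),
\]

where the outer functions $\Phi_q$ and the inner functions $\phi_{q,p}$ are continuous functions of a single variable. The nested structure of inner sums passed through outer functions parallels the layered composition of a neural network, which is why the theorem is frequently invoked in discussions of universal approximation; crucially, the functions obtained in this way need not correspond to any humanly meaningful concept.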
The “universal approximation” capability has underpinned the success of machine learning, but it has also given rise to the inherent “black-box” nature of artificial intelligence. Consequently, humanity must seek approaches to comprehend these black-box algorithmic models. Fundamentally, algorithmic interpretation is about constructing a framework of concepts, meanings, and structures that aligns with human cognitive schemas in order to “approximate” the state of an algorithmic model at a given moment, thereby enabling the formation of informed expectations regarding its operation.
2. Why Is Algorithmic Interpretation Necessary? The Jurisprudential Significance of Algorithmic Interpretation
The jurisprudential significance of algorithmic interpretation can be summarized in terms of its role in protecting rights, facilitating social interaction, and governing risks. Each of these dimensions contributes to bringing artificial intelligence algorithms back within the realm of human comprehension, communication, and control—thus enabling algorithmic governance to penetrate the technical “black box” in a meaningful manner.
2.1 The Significance for Rights Protection
Given the diverse applications of algorithms in both the public and private spheres, algorithmic interpretation is closely linked to the protection of various legal interests—such as the right to information, equality, the right to be heard, privacy, personal data interests, and even human dignity—thereby implicating new normative demands and legal relationships. In the face of a “black box,” users, affected parties, and the general public primarily seek to understand the specific reasons behind algorithmic judgments or decisions. This inquiry concerns whether the human agents behind these algorithms have properly considered relevant factors in accordance with legal, ethical, and value-based standards, and whether such considerations have been accurately reflected through the design and training of the algorithmic model. In everyday life, numerous rights or interests that have not yet been fully codified as rights urgently require the safeguard provided by algorithmic interpretation. A clear understanding of algorithm-related information assists users in mitigating risks, making informed choices, and seeking legal remedies.
The corresponding right in this context is the right to an algorithmic interpretation, or the right to request one. This right may take the form of a general entitlement to an interpretation of the system’s functionality or a specific right to an interpretation of the decision-making basis in individual cases. At its most fundamental level, this right may be seen as an extension of the “right to know,” which can be further reinforced as the “right to understand.” On an even deeper level, it can be constructed as a due process right that transcends the public–private divide: before one’s fate or interests are determined by an algorithmic decision, the affected party should have a reasonable opportunity to understand the logic and rationale behind that decision and to present arguments or defenses to the automated system or its human decision-makers. Globally, many scholars have explored the concept of “technological due process,” which is increasingly recognized within public law circles. In the context of the algorithmic “black box,” algorithmic interpretation is an essential tool for upholding these due process values—it provides individuals with the opportunity to understand the underlying logic of algorithmic decisions, thereby helping to avert exploitation by opaque technological systems and enabling them to take actions that best protect their rights when confronted with adverse decisions.
Nevertheless, because the technology and practice of algorithmic interpretation are still immature, the institutionalization of a right to an algorithmic interpretation faces significant challenges. Internationally, the formulation of the GDPR has spurred extensive research into such a right. Although the importance of algorithmic explainability was repeatedly emphasized during the drafting of the GDPR, its final text does not explicitly establish a right to an algorithmic interpretation; the provisions most closely related to this right—namely Article 22 and Recital 71—remain subject to debate as to whether they truly constitute a legally enforceable right. In China, Article 24(3) of the Personal Information Protection Law (PIPL) states, “When a decision that significantly affects an individual’s rights is made solely through automated means, the individual shall have the right to request an interpretation from the personal information processor and to refuse decisions made solely by automated means.” Some contend that this provision establishes a right to algorithmic interpretation; however, whether the term “interpretation” here necessarily includes a full algorithmic interpretation remains inconclusive.
In other domestic and international legislative contexts, the adverse action notice provisions of the U.S. Equal Credit Opportunity Act (ECOA) and the Fair Credit Reporting Act (FCRA), as well as the algorithmic interpretation principles referenced in the 2017 “Statement on Algorithmic Transparency and Accountability,” appear to endorse a right to an algorithmic interpretation. Yet, these instruments have not definitively established such a right; at most, they “encourage” institutions that use algorithmic decision systems to voluntarily provide interpretations of their processes and specific decisions. Notably, in the aforementioned statement, this encouragement is the only one among seven principles that is not mandated. Similarly, the U.S. Algorithmic Accountability Act (draft) includes only a limited right to an algorithmic interpretation, and that draft has yet to be enacted into law. In China, the draft version of the “Administrative Measures for Algorithmic Recommendation in Internet Information Services” attempted to impose an obligation to optimize algorithmic explainability and transparency, but the version that eventually came into effect renders this requirement merely advisory.
These legislative examples indicate that although algorithmic interpretation has an irreplaceable role in protecting rights within an algorithm-driven society, its full potential has not yet been realized in legal practice. At present, a variety of regulatory approaches and corresponding legal norms—both domestically and internationally—aim primarily to mandate that developers fulfill obligations of algorithmic transparency. Should institutional mechanisms be developed to set reasonable algorithmic interpretation schemes for different application scenarios—ensuring that interpretations are reliable, feasible, and accessible—the rights-protective significance of algorithmic interpretation would be further enhanced.
2.2 The Significance for Social Interaction
Algorithmic interpretation also bears significant social communicative value by deepening trust in technology and fostering social harmony. Algorithmic decision systems that affect individuals’ vital interests can be viewed as a form of “technology-driven rule-making,” and people are unlikely to comply with rules they do not understand. A lack of knowledge, comprehension, or acceptance of algorithmic decisions can provoke social disputes and conflicts, potentially escalating into mass confrontations against algorithmic systems. Explaining why an artificial intelligence system operates in a certain manner helps dispel misunderstandings about AI and its applications, thereby cultivating an accurate public perception and healthy confidence in the development of the AI industry. On a deeper level, the so-called “algorithmic black box” abstracts the foundational logic of algorithmic decisions away from the semantic and communicative context of human interactions, thereby severing the interpersonal connections essential to social exchange and potentially provoking profound societal discord. Chinese practices in algorithmic governance have recognized this issue; for instance, Article 12 of the draft “Administrative Measures for Algorithmic Recommendation in Internet Information Services” encourages providers to “optimize the transparency and explainability of retrieval, ranking, selection, recommendation, and display rules” in order to “avoid adverse impacts on users and prevent or reduce disputes.” Even if the interpretations produced by algorithms cannot be directly understood by all audiences, as long as they can be comprehended by an educated non-specialist audience, the multi-tiered social dissemination of knowledge—from professional circles to the general public—will help broaden understanding and consensus regarding algorithmic interpretations, thereby enhancing overall societal comprehension of relevant algorithms.
Furthermore, algorithmic interpretation not only improves acceptance of algorithms and their appropriate applications but also helps bridge the “digital divide.” In terms of the digital divide among different social groups, the original theory distinguished between the “access gap” and the “usage gap.” Later, scholars introduced the concept of the “knowledge gap” (first proposed in the 1970s), which is influenced by communication skills, foundational knowledge, and social relationships. Disparities in information supply, differences in information utilization, and divergent strategies for information reception can all lead to or exacerbate a knowledge gap. This gap implies inequality in the ability to acquire and process information, ultimately giving rise to an information crisis characterized by disparities in information, fragmented knowledge, and widening socio-economic divides. Algorithmic interpretation holds the potential to narrow this digital divide because varied methods and levels of algorithmic interpretation can enable the public—even those without specialized knowledge of algorithms—to develop an intuitive understanding of the logic behind algorithm design and the factors that determine algorithmic outcomes, thereby mitigating information asymmetry and bridging the knowledge gap.
2.3 The Significance for Risk Governance
The attention given to algorithmic interpretation by scholars of algorithmic governance is inextricably linked to its role in risk governance. At a fundamental level, there are two justificatory grounds for providing interpretations for artificial intelligence: an intrinsic rationale that focuses on the rights of those affected—recognizing the need for individuals to exercise free will and self-control—and an instrumental rationale, wherein the explainability of AI systems (including the interpretation itself) serves as a tool for improving AI performance and rectifying errors. Consequently, algorithmic interpretation can operate as a risk governance mechanism that reconciles subjective needs with objective requirements. As early as 2019, the European Union’s Algorithmic Accountability and Transparency Regulatory Framework repeatedly stressed the importance of algorithmic interpretation, noting its critical role in risk governance. Furthermore, the Artificial Intelligence Risk Management Framework published by the U.S. National Institute of Standards and Technology in 2023 not only set forth explicit requirements for explainability but also incorporated effective algorithmic interpretation as one of the criteria for risk measurement.
Overall, algorithmic interpretation can defend against, mitigate, and regulate algorithmic risks on three levels. First, it illuminates the logic and influencing factors behind algorithmic decisions, providing users, affected parties, and regulators with essential information to identify, avoid, or control risks—thereby significantly extending risk prevention lead time, reducing regulatory latency, and lowering oversight costs. Second, algorithmic interpretation facilitates the external dissemination of critical information embedded in algorithmic models. This not only enables external experts to promptly detect latent risks but also assists algorithm operators and professional third parties in discerning the actual decision logic and outcomes, thus prompting timely model adjustments and improvements. Third, algorithmic interpretation strengthens accountability; through quantifiable and counterfactual interpretations, it becomes clearer who bears responsibility for algorithmic decisions, which, in turn, compels the involved parties to bolster their risk defense and response mechanisms. For government activities employing algorithmic decision-making, algorithmic interpretation additionally helps curb the abuse of public power by ensuring that AI-driven decisions are subject to adequate oversight. In some U.S. cases, the absence of algorithmic interpretation has been deemed a violation of due process, leading to legal liabilities for administrative bodies; similarly, instances in which reliance on algorithmic systems resulted in unjust reductions of home care hours for Medicaid recipients have been corrected through judicial intervention.
The risk governance significance of algorithmic interpretation fundamentally lies in its capacity to reduce the information asymmetry caused by “black-box” technologies by furnishing essential foundational data to multiple stakeholders, thereby mitigating uncertainty. With sufficient information in hand, uncertainty can be measured and reduced through various approaches based on information theory, thus enabling governance bodies to devise optimal regulatory strategies or management solutions even under uncertain conditions.
The tripartite significance of algorithmic interpretation—in safeguarding rights, enhancing social interaction, and governing risks—renders it an indispensable foundation within algorithmic governance. Yet, legal scholars are chiefly concerned with how to construct algorithmic interpretation as a predictable and effective institutional mechanism that can be embedded within a legal framework, ensuring that its roles in protecting rights, facilitating social communication, and managing risks are reliably and comprehensively realized. Institutions are, after all, human-designed constraints that shape interpersonal interactions and delineate the spectrum of choices available, thereby reducing uncertainty in social life.
3. How Can Algorithmic Interpretation Be Implemented? The Main Implementation Paths and Scenario Adaptations
To date, researchers have explored multiple approaches that, to varying degrees, reveal the decision-making mechanisms of algorithms. Although no single method yet precisely measures human understanding in this domain or establishes a unified evaluation and review mechanism for algorithmic interpretation, these approaches allow for flexible and open exploration of richer ideas and methods. Overall, research on algorithmic interpretation has yielded a wide variety of technical routes and implementation schemes that facilitate the contextualization and institutionalization of algorithmic interpretation.
3.1 Primary Implementation Paths for Algorithmic Interpretation
In the broadest sense, the fundamental approaches to algorithmic interpretation can be divided into two categories. The first category is based on the intrinsic logic of the algorithmic model, offering narrative and graphical interpretations that cannot be precisely verified through numerical metrics. The second category provides a quantitative description of the correlation or causal relationship between “inputs and outputs.” For convenience, we refer to those interpretations that follow specific technical methodologies and permit quantification of accuracy and precision as “hard interpretations,” while those that offer narrative or graphical descriptions that are less amenable to fine quantification and evaluation are termed “soft interpretations.” The content and methods of these two types are clearly different, as are the scenarios in which they are applicable.
3.1.1. Hard Interpretations
Hard interpretations require that the relationship between inputs and outputs be analyzed quantitatively, yielding results that can serve as precise evidence or decision-making references for risk governance and rights protection. Given that the effect of a particular input variable on the output can be strongly nonlinear, its impact may be expressed using a statistical value (e.g., an average), an interval (e.g., upper and lower bounds), or a coarse-grained metric (e.g., an influence rating).
Within the realm of hard interpretations, despite their relatively short development history, the technical routes have become quite diverse. For example, research based on interpretable models has produced primary schemes including decision trees (DT), decision rule sets (DR), feature importance (FI) measures, saliency masks (SM), sensitivity analysis (SA), partial dependence plots (PDP), prototype selection (PS), and activation maximization (AM). Essentially, the basic pathways underlying different interpretation schemes boil down to a combination of two mechanisms: one is reformulation, i.e., using a more interpretable model or method (such as a regularized model or a visualizable approach) to approximate the original model’s “input–output” relationship—as seen in DT, DR, and PS, which articulate the relationship through a series of rules or prototypical instances that are more comprehensible to humans; the other is intervention, i.e., deliberately controlling and adjusting input variables to reveal how variations in these inputs affect outputs, as exemplified by SA and AM, which incorporate strategies for input manipulation. It is worth noting that intervention (or manipulation) is also one of the fundamental approaches to causal discovery. Even though relying solely on intervention may not fully reconstruct the entire causal chain in large-scale algorithmic models, for the interpretation of any specific classification outcome, intervention can effectively establish an attribution between inputs and outputs and further provide a quantifiable report on algorithmic transparency.
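To make the two mechanisms concrete, the following Python sketch (the black-box model, feature names, and data are hypothetical stand-ins, not any operator's actual system) fits a shallow decision tree as a surrogate to a black-box classifier's outputs (reformulation), and then perturbs a single input to observe its effect on the prediction (intervention):

```python
# A minimal sketch of the two basic interpretation mechanisms described above.
# The black-box model, features, and data are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier   # stand-in "black box"
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))                         # hypothetical inputs
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 > 0).astype(int)     # hypothetical target

black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Reformulation: approximate the black box with a shallow, human-readable tree.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))
fidelity = (surrogate.predict(X) == black_box.predict(X)).mean()
print(export_text(surrogate, feature_names=["f0", "f1", "f2", "f3"]))
print(f"surrogate fidelity to the black box: {fidelity:.2%}")

# Intervention: vary one input while holding the others fixed and observe
# how the black box's predicted probability responds (a simple sensitivity probe).
sample = X[:1].copy()
for delta in (-1.0, 0.0, 1.0):
    probed = sample.copy()
    probed[0, 0] += delta
    p = black_box.predict_proba(probed)[0, 1]
    print(f"f0 shifted by {delta:+.1f} -> P(class 1) = {p:.3f}")
```

The surrogate's fidelity score illustrates why reformulation-based interpretations require verification: the interpretable stand-in only approximates, and never fully reproduces, the original model.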
Among the various hard interpretation routes, some techniques have increasingly demonstrated special significance. Take, for example, the Shapley value method, which can clearly display both the direction and magnitude (i.e., the quantitative contribution) of each input variable on any binary classification outcome. The Shapley value method differs from other FI or SA approaches in that it satisfies, in an unbiased manner, the properties of linear additivity, the dummy property, symmetry, and efficiency—qualities that enable it to comprehensively, deterministically, reliably, and effectively reveal the quantitative contributions of all input variables to the output. Although its computational cost is currently relatively high, it has become a key interpretation method that is closely monitored both in theory and in practice. In the future, hard interpretation methods that satisfy these or similar criteria may have the opportunity to play a decisive role in critical scenarios.
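As a minimal sketch of what the Shapley method computes, the following code calculates exact Shapley attributions for a single prediction of a deliberately tiny, hypothetical scoring function; absent features are replaced with a baseline value, which is one common convention. Production tools such as the SHAP library approximate the same quantity far more efficiently for large feature sets.

```python
# A minimal sketch of exact Shapley attribution for a single prediction.
# The scoring function, instance, and baseline below are hypothetical.
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley values: features absent from a coalition are set to the baseline."""
    n = len(instance)
    def value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += weight * (value(set(S) | {i}) - value(set(S)))
    return phi

# Hypothetical scoring model: income and education contribute linearly,
# plus an interaction term between them.
predict = lambda x: 0.9 * x[0] + 0.2 * x[1] + 0.3 * x[0] * x[1]
instance, baseline = [1.0, 1.0], [0.0, 0.0]
print(shapley_values(predict, instance, baseline))   # [1.05, 0.35]
# Efficiency holds: the attributions sum to predict(instance) - predict(baseline) = 1.4
```

The final comment illustrates the efficiency property named in the text: the per-feature contributions always add up exactly to the difference between the explained prediction and the baseline prediction.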
3.1.2. Soft Interpretations
Soft interpretations offer considerable flexibility in terms of their form and level of detail. They are primarily aimed at realizing the social communicative function of algorithmic interpretation while simultaneously addressing rights protection and risk governance needs. In scenarios for rights protection where precise calculation is not required, soft interpretations can, through more vivid and illustrative commentary, explain issues of concern to users, affected parties, or the public—particularly by providing accessible, personalized interpretations that cater to individual characteristics. In contexts where the exercise of the right to an algorithmic interpretation is based on relational communication and trust principles, soft interpretations are indispensable. For instance, in September 2021, Meituan disclosed the calculation rules for delivery times via a combination of text and graphics, elucidating the algorithmic logic behind the estimated delivery time; this constitutes a typical practice of soft interpretation. For many users or affected parties lacking specialized knowledge or quantitative skills, soft interpretations can sometimes be more flexible and practical.
Moreover, soft interpretations can be integrated with hard interpretations to provide more in-depth commentary and elaboration on the quantitative or causal structures furnished by hard interpretations. While soft interpretations do not necessarily exclude the presentation of numerical data, legal norms should also establish baseline requirements for soft interpretations to ensure their overall effectiveness.
3.2 Choice of Algorithmic Interpretation Path
The diversity of artificial intelligence algorithms and their applications is extraordinarily rich, and the requirements for algorithm performance and characteristics vary greatly across different scenarios. Consequently, the selection of an algorithmic interpretation path should be specifically determined based on the principle of scenario-based regulation. From a static perspective of rights protection, in key scenarios that involve significant public interests or the safeguarding of fundamental individual rights (such as those concerning life, health, personality, or sensitive personal data), algorithmic interpretations should, as far as possible, guarantee precision and timeliness. From a dynamic perspective—such as in cases of legal disputes or controversies—algorithmic interpretations should not only provide generalized interpretations but also specifically address points of contention. Therefore, the selection of an algorithmic interpretation path should be differentiated into at least three scenarios, which may be designated as “routine scenarios,” “critical scenarios,” and “dispute scenarios,” each adopting different interpretation approaches.
3.2.1. Routine Scenarios
In routine scenarios that do not involve major public interests or significant individual rights, and where no legal dispute exists, algorithmic interpretations need only meet basic informational requirements. In these cases, operators should be allowed to choose their own algorithmic interpretation path. Even if the interpretation provided is relatively comprehensive, most individuals are unlikely to invest substantial time and effort in understanding it, so imposing an excessive regulatory burden is unnecessary. For example, in the document “Explaining Decisions Made with AI,” prepared jointly by the UK Information Commissioner’s Office and the Alan Turing Institute, the first type of explanation—termed “rationale explanation”—merely requires that the reasons behind an AI decision be explained in an accessible, non-technical manner. In routine scenarios, the social communicative value of algorithmic interpretation is paramount; it is sufficient if the general public can grasp the basic rationale behind algorithmic decisions. Here, soft interpretations may even serve as the primary interpretation method, since the gap between the technical capabilities of interpretation and the effective communication of that interpretation necessitates a comprehensive consideration of users’ capacities. The textual and graphical elements of soft interpretations can perform effectively in this context. In recent years, major international internet platforms such as Amazon, Google, YouTube, and Uber have included a degree of algorithmic disclosure in their privacy policies, which can be regarded as a practice of providing soft interpretations in routine scenarios—a mechanism that can similarly be applied to domestic algorithmic governance.
3.2.2. Critical Scenarios
In critical scenarios involving significant public interests or the protection of substantial individual rights, quantitative information is indispensable for achieving the objectives of both rights protection and risk governance. Accordingly, operators should be required to provide hard interpretations to the fullest extent possible. If the algorithm is not overly complex, even a simplified hard interpretation may be acceptable. For example, Article 5 of Regulation (EU) 2019/1150 on Promoting Fairness and Transparency for Business Users of Online Intermediation Services mandates that providers specify the primary parameters used to determine rankings, along with the rationale for their relative importance; if a merchant’s direct or indirect payment might affect ranking, the provider must also explain these possibilities and their impacts on the ranking. Such hard interpretations constitute an important component of current international practices in algorithmic disclosure, aimed at protecting the legitimate rights of platform merchants or users. As long as the interpretation accurately encompasses the causal or correlational relationships that legal norms emphasize, the operator should have reasonable discretion in selecting the specific method of interpretation.
3.2.3. Dispute Scenarios
In scenarios involving legal disputes or controversies, algorithmic interpretations must be capable of providing detailed and clear elucidation of the key factors at issue. Such interpretations should satisfy the requirements for legal redress by providing evidence with sufficient confidence to determine whether a specific output was caused by a particular factor. Even if technical limitations currently prevent the generation of conclusive proof, the interpretation should at least meet the evidentiary threshold to be admissible. In this regard, algorithmic interpretation should focus on specific causal relationships, and requiring operators to provide counterfactual interpretations is presently the most intuitive option. Counterfactual interpretations attempt to determine how changes in input variables could yield a specific output, without necessitating the opening of the “black box” and thus without incurring the risk of data leakage. In several jurisdictions, legal practitioners are already accustomed to using counterfactual interpretations to construct causal evidence in civil litigation, and Australian regulatory bodies and courts have employed such methods to analyze the algorithmic details of AI tools to protect consumer rights. In the prominent case brought by the Australian Competition and Consumer Commission against Trivago, methods such as Partial Dependence Plots (PDP), Accumulated Local Effects (ALE), and counterfactual interpretations were utilized to investigate the prioritization and impact of various factors within the algorithm, thereby providing a robust basis for the court to determine that the company’s algorithm misled consumers.
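The following sketch illustrates, under assumed conditions (a hypothetical linear scoring model, feature set, and approval threshold), the kind of answer a counterfactual interpretation supplies in such a dispute: the smallest change to the contested variable that would have altered the decision.

```python
# A minimal sketch of a counterfactual interpretation for one contested decision.
# The scoring model, feature names, and threshold are hypothetical placeholders.
import numpy as np

def score(applicant):
    # Hypothetical credit score: weighted sum of income, tenure, and late payments.
    weights = np.array([0.6, 0.3, -0.8])
    return float(weights @ applicant)

def counterfactual_for_feature(applicant, feature_idx, threshold, step=0.01, max_steps=10_000):
    """Search in one direction for the smallest change to one feature that flips the decision."""
    candidate = applicant.astype(float).copy()
    direction = 1.0 if score(applicant) < threshold else -1.0
    original_side = score(applicant) >= threshold
    for _ in range(max_steps):
        if (score(candidate) >= threshold) != original_side:
            return candidate[feature_idx] - applicant[feature_idx]
        candidate[feature_idx] += direction * step
    return None  # no counterfactual found within the search budget

applicant = np.array([0.4, 0.5, 0.6])   # hypothetical normalized features
threshold = 0.5                          # hypothetical approval cut-off
delta = counterfactual_for_feature(applicant, feature_idx=0, threshold=threshold)
print(f"original score: {score(applicant):.3f}")
print(f"income would need to change by {delta:+.3f} to flip the decision")
```

A report of this form ("had income been higher by roughly 0.98 on the normalized scale, the application would have been approved") speaks directly to the contested variable without exposing the model's code or parameters.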
Requiring operators to provide counterfactual interpretations does not preclude them from offering other types of algorithmic interpretation simultaneously; rather, various forms can mutually corroborate one another, providing comprehensive factual support for dispute resolution. However, counterfactual interpretation should be regarded as a mandatory component—especially in scenarios where the effect of a key variable is questioned by the affected party. Such an interpretation can clearly indicate the extent to which modifications in that variable would materially affect the algorithm’s output, thereby more convincingly demonstrating whether the model has incorporated any inappropriate considerations.
Currently, various algorithmic interpretation schemes are continuously maturing on both technical and practical levels, gradually establishing a realistic basis for being recognized as a legal obligation for AI operators. Nevertheless, the actual effectiveness and impact of algorithmic interpretation may be subject to skepticism—“because the algorithm itself continually learns and evolves, an interpretation disclosed at one moment may become outdated almost immediately.” Additional challenges have emerged: once an interpretation is generated, will it quickly become invalid due to ongoing iterations and adjustments in the algorithmic model? How can the phenomenon of “inadequate” or superficial interpretations be avoided? For large or even ultra-large algorithmic models, how can one, under limited time and technical constraints, determine whether the interpretation is accurate? Only by addressing these issues can we ensure that algorithmic interpretations are genuine, effective, and fulfill their foundational role in algorithmic governance.
4. How Can Algorithmic Interpretation Be Faithfully Preserved? Mechanisms for Fixation, Verification, and Review
To address the aforementioned challenges, algorithmic interpretation must be supported by at least three complementary mechanisms: First, because the underlying code of machine learning algorithms is continuously evolving, iterating, and being modified, it is essential to establish a fixation mechanism for algorithmic interpretations—ensuring that once an interpretation is provided, it remains fixed for a certain period, thereby facilitating verification and evidence preservation. Second, in scenarios where precise quantification of rights protection is required, an algorithm verification mechanism should be introduced to assess whether the interpretation is genuine and accurate. Third, for the purposes of regulatory accountability and rights redress, if an algorithmic interpretation causes harm or gives rise to disputes, a fair and professional review mechanism must be established. These mechanisms are particularly indispensable for “hard interpretations,” which play a key role in rights protection and risk governance.
4.1 Fixation of Algorithmic Interpretations in the Pursuit of Justice and Order
Continuous changes in code and parameters have always posed a significant challenge to algorithmic interpretation, which can easily fall into the trap of “carving the boat to seek the sword,” fixing a reference point while the object itself moves on. It is imperative that algorithmic interpretations be fixed in some manner to avoid becoming obsolete immediately after being provided, or to prevent ad hoc modifications to the algorithm that fabricate interpretations. Technically, there are three main approaches to fix algorithmic interpretations: (1) requiring that the algorithm and its parameters be frozen (i.e., no further iteration or human modification allowed) for a predetermined period after an interpretation is provided; (2) mandating that, when an interpretation is produced, certain technical measures be taken to preserve or fully provide a mirror image of the algorithmic model (including its key parameters); (3) providing a sufficiently rich set of input–output samples along with fidelity-preserving measures, so that regulators and users can verify the model’s performance over a specific time period.
4.1.1. Freezing Mechanism
For algorithmic models that do not involve the protection of critical individual rights (such as those used in entertainment content recommendations), if an interpretation is required, operators may be required to restrict or “freeze” changes to the model and its parameters for a certain period after the interpretation is provided. This allows users, affected parties, or regulators sufficient time to effectively verify the interpretation. When technical conditions permit, a “soft freezing mechanism” may be applied to models that are too large or open—meaning that while changes are not completely prohibited, the degree of variation must not be sufficient to undermine the precision of the provided interpretation.
For models fixed via the freezing mechanism, whether an inaccuracy stems from a flawed interpretation itself or from subsequent unauthorized modifications that render the interpretation inconsistent with the frozen model, the operator should bear the same legal responsibility for the inaccuracy. To more rigorously distinguish between these causes of erroneous interpretation, rules emphasizing the retention of a complete “audit trail”—as highlighted in the theory of technological due process—may be employed, requiring that the operator maintain a complete audit trail or system operation log from the time the interpretation is generated until the end of the designated freezing period, so as to demonstrate that the operator has faithfully fulfilled its obligations.
4.1.2. Mirror Mechanism
For algorithmic models that are frequently modified or iterated but are relatively small in scale, operators may be required to provide a mirror image of the algorithmic model at the time of interpretation for regulatory review; where commercial confidentiality or fair competition concerns do not apply, this mirror image can also be made available to users or affected parties for verification. This mirror image should accurately and completely include all the code and parameters of the model. For mature AI enterprises, backing up a mirror image of the model is a fundamental capability, and for containerized cloud platforms, the frameworks, libraries, and dependencies used in training can be integrated into the mirror environment. Although requiring such a mirror image may increase regulatory burdens, it remains a fundamentally feasible measure.
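As one possible technical arrangement (the file names, serialization format, and function names below are illustrative assumptions, not a prescribed standard), an operator might anchor each interpretation to a hashed snapshot of the model, so that a regulator can later confirm that the interpreted model and the preserved model coincide:

```python
# A minimal sketch of anchoring an interpretation to a fixed model snapshot.
# File names and the model object are hypothetical; in practice the snapshot
# could be a full container image or serialized checkpoint.
import hashlib
import json
import pickle
from datetime import datetime, timezone

def snapshot_model(model, interpretation: dict, path: str = "model_snapshot.pkl") -> str:
    """Serialize the model, hash the bytes, and record the hash alongside the interpretation."""
    blob = pickle.dumps(model)
    with open(path, "wb") as f:
        f.write(blob)
    digest = hashlib.sha256(blob).hexdigest()
    record = {
        "snapshot_file": path,
        "sha256": digest,
        "issued_at": datetime.now(timezone.utc).isoformat(),
        "interpretation": interpretation,
    }
    with open("interpretation_record.json", "w") as f:
        json.dump(record, f, indent=2)
    return digest

def verify_snapshot(path: str, expected_sha256: str) -> bool:
    """Later verification: recompute the hash and compare it with the recorded value."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest() == expected_sha256
```

Because the hash changes whenever the preserved model changes, this kind of record makes it evident if the snapshot supplied for review differs from the model that the interpretation was issued for.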
The provision of a mirror image aids in the in-depth and precise verification of the accuracy of the algorithmic interpretation; however, once provided, it also exposes commercial secrets related to the algorithm and data. To address this, aside from emphasizing the confidentiality obligations of regulatory agencies, technical protective measures may be adopted—such as storing the mirror image in a secure environment either by the operator or through a trusted third party with open verification interfaces—to ensure that commercial secrets remain protected.
4.1.3. Sampling Mechanism
If it is inconvenient for the operator to provide a mirror image, then, according to relevant standards or regulations, a professional organization or regulator may continuously collect, sample, and retain a certain number of genuine input–output samples from the model without disrupting its operation. These samples can then be used to verify the interpretation during the relevant time period. If the distribution of relationships between inputs and outputs in the samples significantly deviates from the quantitative contributions provided by the interpretation, or if changes in a particular input variable do not match the overall observed output changes as indicated by counterfactual interpretations, then the interpretation may be considered to lack authenticity or accuracy. More conclusive judgments can be provided by an algorithm audit mechanism.
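A minimal sketch of such a consistency check, under the simplifying assumption that the claimed interpretation takes the form of per-feature contributions plus an intercept (all figures are hypothetical), compares the logged outputs against the predictions implied by the interpretation:

```python
# A minimal sketch of checking logged input-output samples against a claimed
# interpretation. Here the claimed interpretation is a set of per-feature
# contributions (weights) plus an intercept; all numbers are hypothetical.
import numpy as np

def interpretation_fidelity(inputs, outputs, claimed_weights, claimed_intercept):
    """Mean absolute deviation between logged outputs and the interpretation's predictions."""
    predicted = inputs @ claimed_weights + claimed_intercept
    return float(np.mean(np.abs(predicted - outputs)))

# Samples collected during the relevant period (hypothetical values).
rng = np.random.default_rng(1)
inputs = rng.normal(size=(500, 3))
true_weights = np.array([0.85, 0.25, -0.40])
outputs = inputs @ true_weights + 0.1 + rng.normal(scale=0.02, size=500)

# The operator's claimed contributions, as stated in the interpretation.
claimed_weights = np.array([0.80, 0.30, -0.40])
deviation = interpretation_fidelity(inputs, outputs, claimed_weights, claimed_intercept=0.1)
print(f"mean absolute deviation from the claimed interpretation: {deviation:.3f}")
# A standard or regulation could set a tolerance; deviations above it would
# indicate that the interpretation no longer reflects the deployed model.
```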
The sampling mechanism generally has lower operational costs but is technically challenging. If a large sample size is required and the model is modified or iterated during the sampling process, slight adjustments to the precision requirements of the interpretation may be necessary to accommodate minor variations in the model and its parameters over time, or a “soft freezing mechanism” (as previously mentioned) may be combined to ensure the effectiveness of the sampling process. Determining, with minimal data collection, whether the interpretation is authentic and accurate requires sophisticated statistical research—a highly specialized and cutting-edge challenge.
4.2 Verification and Review of Algorithmic Interpretations
Once an algorithmic interpretation is effectively fixed, users, affected parties, stakeholders, and even the general public should be able to verify it under certain conditions, while regulatory or remedial agencies with oversight responsibilities may review the interpretation.
4.2.1. Verification of Algorithmic Interpretations
For AI applications and their interpretations that have a substantial impact on the rights of users, affected parties, or the public interest, it is theoretically essential to allow the parties involved, stakeholders, or the public to verify the authenticity and accuracy of the interpretation. However, there is concern that providing the interpretation might lead to the disclosure of commercial or technological secrets. To protect such secrets and competitive interests, a trusted verification process may be constructed—using the aforementioned “secure environment + open verification interface” approach or similar methods—so that relevant parties can verify the interpretation without accessing the source code or parameters. Nonetheless, even with such precautions, the verification process faces the risk of “model extraction”: attackers, by observing the relationship between inputs and outputs, may train a shadow model that mimics the performance of the target model, thereby extracting key parameters or hyperparameters, or even inferring some of the original training data. In this regard, the implementation of algorithmic verification must proceed cautiously, with certain restrictions—such as requiring real-name registration, limiting the number of operations, and maintaining verification operation logs—to prevent malicious competitors from using the opportunity to extract the model or launch other attacks.
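One possible shape for such a restricted verification interface (the class and parameter names are illustrative assumptions) is a gated query endpoint that ties every verification request to a registered identity, enforces a query budget to hinder model extraction, and keeps a complete operation log:

```python
# A minimal sketch of a restricted verification interface of the kind described
# above: verifiers query the model through a gateway that enforces identity
# registration, a query budget, and a full operation log. Names are hypothetical.
import time
from collections import defaultdict

class VerificationGateway:
    def __init__(self, model_predict, query_budget: int = 1000):
        self.predict = model_predict
        self.budget = query_budget
        self.queries_used = defaultdict(int)   # per registered verifier
        self.audit_log = []

    def query(self, verifier_id: str, inputs):
        """Answer a verification query while enforcing the budget and logging the operation."""
        if self.queries_used[verifier_id] >= self.budget:
            raise PermissionError("query budget exhausted for this verifier")
        self.queries_used[verifier_id] += 1
        output = self.predict(inputs)
        self.audit_log.append({
            "verifier": verifier_id,
            "timestamp": time.time(),
            "inputs": inputs,
            "output": output,
        })
        return output

# Usage (hypothetical): gateway = VerificationGateway(model.predict, query_budget=500)
#                       gateway.query("verifier-001", sample_inputs)
```

The query budget directly limits the number of input-output pairs any single verifier can harvest, which is the raw material of model-extraction attacks, while the log supports after-the-fact accountability for misuse of the interface.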
4.2.2. Review of Algorithmic Interpretations
The review of algorithmic interpretations requires a high level of expertise; reviewers must possess a deep understanding of AI algorithms and statistical principles. In administrative oversight or in administrative reconsideration cases submitted to higher authorities, regulatory agencies such as those responsible for cybersecurity, public security, or industry and information technology should assign specialized personnel to handle the review of algorithmic interpretations. In cases involving the review and litigation of algorithmic interpretations, mechanisms such as expert consultation in administrative review and expert juror systems should be explored—for example, by incorporating professionals who have been rigorously vetted into a specialized pool for selection and employing panel review and collegial decision-making procedures.
The key points for reviewing algorithmic interpretations should vary according to the interpretation path. For “soft interpretations” that do not contain numerical details of the algorithmic model, since they are less capable of providing finely grained explanatory results and lack precise quantitative metrics, the reviewer need only determine whether they meet the fundamental requirements of algorithmic interpretation—that is, whether their content is consistent with the actual logic of the algorithm’s design, free from false or exaggerated statements, and capable of effectively dispelling general doubts and enhancing trust among users or affected parties. For soft interpretations that include numerical data and for various forms of hard interpretations, specific standards and guidelines are needed to direct the review process. Such reviews generally require detailed data from the algorithmic decision process. For instance, in the United States, widely used applicant tracking systems (ATS) and similar tools provide support for compliance audit tracking in algorithmic employment decisions. Companies, using ATS, can lawfully obtain a series of identity and behavioral data on job applicants and make decisions accordingly, undergoing external compliance audits; the detailed data recorded throughout the algorithmic decision process is critical for demonstrating to reviewers that no discrimination has occurred. Given the rich variety of interpretation paths and methods, relevant standards and guidelines should incorporate diversified rules and indicators, similar to the design of Part 7 of the “Artificial Intelligence Algorithm Financial Application Evaluation Standard” (JR/T 0221-2021), which provides a wide array of optional rules covering existing technical routes depending on the type of algorithmic model. Only when these standards or technical specifications establish key review criteria and conformance indicators for the major interpretation methods can the review of algorithmic interpretations be truly evidence-based, predictable, and convincing.
5. Constructing an Algorithmic Interpretation System: Systematic Integration of Mechanisms
The establishment of mechanisms for choosing, fixing, verifying, and reviewing algorithmic interpretations not only helps realize the threefold significance of algorithmic interpretation but also transforms algorithmic interpretation into a central pillar of algorithmic governance. This systematic integration can robustly support other governance mechanisms such as algorithm registration and review, algorithm impact assessment, algorithm auditing, and algorithm accountability.
5.1 Basic Structure of an Algorithmic Interpretation System
An algorithmic interpretation system is based on the technical principles of algorithmic interpretation and embeds both the right to an interpretation and the corresponding obligations within legal relationships. It integrates the requirements for algorithmic explainability and transparency—as found in existing legal instruments like the “Administrative Measures for Algorithmic Recommendation in Internet Information Services”—into the standards for designing and operating algorithmic models. Operators must fulfill their algorithmic interpretation obligations in accordance with legally prescribed content and relevant technical standards. After providing an interpretation, in some cases the interpretation must be fixed using mechanisms such as freezing, sampling, or mirroring, and then be subject to verification and review based on the right to an interpretation or the corresponding obligation.
First, such a system can enrich the content of both the right to an interpretation and the corresponding obligation, linking them to different standards and requirements depending on the scenario, and thereby establishing clear, layered normative expectations. The right to an interpretation and the obligation to provide one do not necessarily have to correspond exactly; for certain algorithmic applications affecting significant public interests, even if legal norms do not explicitly grant users the right to an interpretation, the obligation to provide one can still be imposed directly on operators. When the right is invoked, the fulfillment of the interpretation obligation need not involve a detailed exposition of the system’s overall technical architecture or operational details; rather, it should primarily provide information that is closely related to the interests of the user or affected party. Nevertheless, if the operator is willing to offer a richer or more in-depth interpretation, such voluntary elaboration should be permitted and even encouraged by law.
Second, this system can link the requirements for algorithmic explainability with those for algorithmic transparency, allowing both to be fully realized. Algorithmic explainability refers to the objective attribute of a model—that is, whether a given algorithm is, by its technical design, capable of being explained. Algorithmic transparency, on the other hand, concerns the relationship between the algorithm’s output and subjective expectations—namely, the extent to which the provided interpretation reveals the internal logic of the decision-making process and the actual influence of specific factors, thereby enabling users to form a clear understanding and stable expectations regarding the algorithm’s operation. Clearly, algorithmic transparency can be achieved through algorithmic interpretation, and it can be directly linked with varying degrees of interpretation precision to serve users, affected parties, or the public; whereas the requirement for algorithmic explainability can guide operators to adopt design schemes that facilitate the fulfillment of their interpretation obligations in the earlier stages of governance. Of course, the implementation of algorithmic transparency principles should also be supplemented by ex post regulatory measures such as algorithm impact assessments, and accurate and reliable algorithmic interpretations can serve as important reference points for such assessments.
Third, this system helps clarify algorithmic responsibility. First, a systematic algorithmic interpretation system aids in determining whether operators have provided interpretations accurately, completely, and in a timely manner in accordance with the relevant standards and regulatory requirements, and whether they have fulfilled their compliance obligations in data and algorithmic governance. Second, if an erroneous interpretation leads affected parties to rely on it and consequently suffer losses, the fixation, verification, and review mechanisms within this system will help assess the extent to which the interpretation contributed to the damage and whether there was any culpable fault, thereby determining the operator’s liability. Third, algorithmic interpretations can assist administrative and judicial authorities in ascertaining issues related to the attribution of harm, responsibility, and the allocation of liability in cases involving algorithmic damage.
Finally, this systematic algorithmic interpretation system supports the operation of other governance mechanisms such as algorithm registration and review, algorithm impact assessment, and algorithm auditing. First, it enables some critical algorithmic models to provide an interpretation concurrently with their registration, thereby deepening and enhancing the effectiveness of the registration review process. Second, algorithmic interpretation results that conform to established standards and are supported by evidence are highly beneficial for assessing the actual impact of an algorithm; the accompanying verification reports can become a primary basis for the outcomes of algorithm impact assessments. Third, algorithmic interpretation and verification can facilitate algorithm audits—for example, the verification interface for algorithms can open channels for audit methods such as “grab audits,” “proxy audits,” or “collaborative audits,” especially benefiting third-party audits.
5.2 Value-Balancing Elements of an Algorithmic Interpretation System
5.2.1. Interpretation Path and Precision
The choice of interpretation path and the level of precision are critical factors in balancing regulatory burden and benefits in an algorithmic interpretation system. The path of algorithmic interpretation should not only encompass both soft and hard interpretations, but also consider intermediate pathways between them. In other words, a continuum exists between soft and hard interpretations. In scenarios where precise quantification of rights protection is paramount, algorithmic interpretations should lean more toward hard interpretations; conversely, in less critical situations, soft interpretations may be more appropriate.
This transitional continuum can be manifested as a “relative softening” of hard interpretations: if it is difficult to obtain exact weights or contributions of input variables on the output, or if there is concern that providing such precise details might reveal proprietary information embedded within the algorithm, operators should be allowed to introduce weight intervals or categorical ratings as a means to relax the precision of the quantification. For example, in a credit limit evaluation algorithm, if one assumes that the influence of education is 0.231 and that of income is 0.889, the influence of education might be expressed as ranging from 0.2 to 0.3 and that of income as ranging from 0.8 to 0.9, or alternatively, education may be rated as a “2-star” factor while income is rated as “4.5 stars.” This approach not only reduces the risk of professional attackers reverse-engineering the model through inference, but also makes it relatively simpler and more stable for enterprises to generate and maintain algorithmic interpretations. When employing a “relative softening” approach, the precision of the interpretation can be dynamically adjusted based on the needs for rights protection; in scenarios where the right to know, the right to understand, or due process rights are particularly critical, the precision of the algorithmic interpretation should be correspondingly enhanced.
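As a simple illustration of this “relative softening” (reusing the hypothetical education and income figures above; the interval width and star scale are assumptions, since the text does not prescribe a particular mapping), precise contributions can be coarsened into intervals or ratings as follows:

```python
# A minimal sketch of "relative softening": coarsening precise contributions
# into intervals or star ratings, using the hypothetical figures from the text.
def to_interval(weight: float, width: float = 0.1) -> tuple[float, float]:
    """Round a precise contribution down/up to an interval of the given width."""
    lower = (weight // width) * width
    return round(lower, 2), round(lower + width, 2)

def to_stars(weight: float, max_weight: float = 1.0, max_stars: float = 5.0) -> float:
    """Map a contribution onto a half-star rating scale."""
    return round(weight / max_weight * max_stars * 2) / 2

contributions = {"education": 0.231, "income": 0.889}
for feature, w in contributions.items():
    print(feature, to_interval(w), f"{to_stars(w)} stars")
# education -> (0.2, 0.3), income -> (0.8, 0.9).
# A linear five-star mapping yields 1.0 and 4.5 stars here; the 2-star rating
# for education in the text presumes a different, policy-chosen scale.
```

The choice of interval width and rating scale is itself a regulatory variable: narrower intervals restore precision for rights protection, while wider ones better shield proprietary details of the model.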
5.2.2. Timeliness of Interpretation
Since algorithmic models are in a state of continuous iteration and evolution, algorithmic interpretations must be provided promptly. The longer the delay, the higher the risk that the interpretation becomes distorted and misleading. Although real-time interpretations that adjust alongside model updates are optimal, not all algorithmic models across various industries have the capacity to support real-time interpretation—especially when the non-convexity of the model (i.e., the inability to obtain the global optimum via local solutions) limits the speed of interpretation. With the aid of fixation mechanisms for algorithmic interpretations, a system design may consider offering either “dynamic interpretation” or “periodic interpretation” as a choice for operators to guarantee the timeliness of their interpretations. That is, operators may opt either to provide an interpretation every time the model changes or to provide interpretations at regular intervals, with the frequency determined by the degree to which the timeliness of the interpretation is relevant to public interests and individual rights. When a legal right to request an interpretation is established, the setting of the interpretation time limit should primarily consider rights protection needs, requiring operators to implement technical mechanisms that complete the interpretation within a relatively short timeframe.
5.2.3. Liability for Interpretation Defects
It is necessary to establish liability for defects in algorithmic interpretations if they contain omissions, biases, or errors. For public authorities, the law should set forth appropriate remedial and compensatory mechanisms. Given the special trust placed in public institutions, if defective interpretations lead affected parties to rely on them erroneously—resulting in losses of legally protected interests—public authorities should bear remedial or compensatory responsibilities commensurate with the trust placed. For market entities, defects in interpretations may constitute a violation of consumer rights to be informed; where such defects constitute a warning failure, product liability rules may be applied—especially if the entity is aware of the system’s flaws and risks but fails to disclose them, warranting punitive damages. In other circumstances, it is necessary to distinguish whether the interpretation is contractually governed between the operator and the user, or whether it constitutes an infringement of the user’s rights, with liability forms applied according to either contract law or tort law as appropriate.
The operation of an algorithmic interpretation system inevitably imposes a certain regulatory burden on operators. Current technological developments still struggle to balance model predictive performance with the comprehensive retention and explainability of automated decision outputs; in some instances, a higher degree of interpretability may come at the cost of model performance. If the liability for interpretation defects is set too lightly, operators might prefer to incur the penalty rather than fully fulfill their interpretation obligations; if set too heavily, it may reduce model performance, increase regulatory costs, or even lead to a “chilling effect” that discourages market players from actively developing and utilizing innovative, cutting-edge algorithmic models. The balance must be determined by the weight of the public interests and individual rights at stake in each specific interpretation scenario, ensuring that the liability for interpretation defects is set appreciably above the burden of compliance, so that paying the penalty is never the cheaper option, while remaining proportionate.