China Institute for Socio-Legal Studies, Shanghai Jiao Tong University

2023-12-29

Legal Reflection and Institutional Reconstruction of Fair Use of Data

*Written by Ding Xiaodong

Professor, School of Law, Renmin University of China

Abstract: In current data transaction and utilization practice, data, as a key factor of production, is controlled mainly by a small number of enterprises, and individuals and small and medium-sized businesses find it difficult to use data as users. To achieve fair use of data, the European Union tries to give users the right to access and use data, the United States focuses on market transactions in personal information, and China emphasizes the confirmation of enterprise data rights. However, data is characterized by aggregation, relevance, scenario dependence, non-rivalry and non-exclusivity, and the confirmation of rights does little to resolve disputes arising in the use of data. For the use of commercial entities' data, market autonomy and a fair competitive order should be emphasized. For the use of personal data, data governance rules should be constructed and improved along the two dimensions of "individual-enterprise" and "individual-collective-enterprise". For the use of public data, the data control of platform enterprises should be reduced, businesses on the platform should be given limited rights to access and use data, and the right of individual platform users to the portability of their personal information should be guaranteed, so as to balance the interests of all parties effectively.

Statement of the question

In the era of the digital economy, data has become a key factor of production for data enterprises. Through the systematic collection, value mining and commercial exploitation of data, data enterprises hold the power to use data, while individuals and small and medium-sized business users find it difficult to use data. This phenomenon has prompted academic attention to, and discussion of, the fair use of data. The American scholar Zuboff proposed the concept of "surveillance capitalism", arguing that Internet companies and other data companies have monopolized the right to use data by collecting the data generated by people's online behavior. Cohen pointed out that the free acquisition and exclusive use of user data by data companies is an unfairness built into the system, which should be reflected on at the legal level and addressed by reconstructing the relevant institutions. Some scholars in China believe that it is necessary to "put forward the legislative principle of 'fair use of data' more clearly" and "strike an appropriate balance between the protection of basic rights of individuals, the development of the digital economy and the sharing of data profits".

Achieving fair use of data is also a current focus of legislators in various jurisdictions. The United States has tried to promote the fair use of data through property rights in personal data and the operation of market mechanisms. In 2022, the European Commission published its proposal for a Data Act, setting out harmonised rules on fair access to and use of data (hereinafter the "EU Data Law Proposal"), which proposes that users should have a general right to access and use data. China is following the legislative trend of the European Union. In December 2022, the Central Committee of the Communist Party of China and the State Council jointly issued the "Opinions on Building a Basic Data System to Better Play the Role of Data Elements" (hereinafter the "Data 20 Articles"), proposing to "establish and improve a system for the protection of the legitimate rights and interests of all participants in data elements", which should not only "reasonably protect the rights and interests of data processors to independently control the data held in accordance with laws and regulations", but also "protect the rights and interests of data sources to obtain, copy and transfer the data generated by them". Against this background, several theoretical issues deserve in-depth discussion: first, whether the exclusive use of user data (including personal data and business data) by data enterprises constitutes unfair use of data, and whether it is necessary to give users the right to access and use data; second, whether China's existing legal system is conducive to the fair use of data; third, whether it is necessary for China to draw on EU legislation and formulate special laws and regulations on the fair use of data.

The United States and the European Union have each put forward institutional plans for the fair use of data, and China has adopted the EU's legislative ideas at the policy level and is preparing to introduce into its legal system the right of data sources to "obtain or copy and transfer" data. Answering the above questions is therefore not only necessary but also urgent. Drawing on the relevant legislation of the European Union, the United States and China, and considering the basic characteristics of data, this paper argues that fair use of data cannot be effectively realized by granting data rights to the various parties. Instead, data should be regarded as property with mixed rights and interests, and the relevant systems should be constructed by category through an approach that combines behavior regulation with data governance.

1. Existing institutional solutions to achieve equitable use of data

The pursuit of fairness as a value is embedded in the data laws and systems of every jurisdiction. To resolve the various disputes arising from the use of data, the EU and the US have chosen different institutional paths. In addition to its personal data protection legislation, the European Union is actively exploring new schemes for granting rights, while the United States is committed to improving the market for personal information, hoping to use the power of the free market to promote fairness in data transactions. China has adopted a legislative approach that protects both personal information and enterprise data, and has not yet formulated laws or regulations specifically addressing the fair use of data.

1.1 The right to access and use data in EU legislation

In April 2016, the European Union adopted the General Data Protection Regulation (GDPR), which establishes the right to the protection of personal data. Although the GDPR does not directly address data fairness, it can indirectly promote the fair use of personal data by giving individuals the rights of access and data portability. In October 2017, the European Commission launched the "Building a European Data Economy" initiative. The initiative advocates giving data producers exclusive property rights in order to achieve fairness in the use and transaction of data. For example, if a vehicle is owned by a person or rented by that person for a long time, the data generated by that person's driving of the vehicle should be owned by that person; the data collected by Internet of Things devices in fields such as smart homes, smart agriculture and smart healthcare should be owned by the producer of the data, not by its holder or controller. In 2022, the European Commission published the "EU Data Law Proposal", which goes further and proposes uniform rules for fair access to and use of data. It abandons the concept of data producers and introduces concepts such as "data users", "data holders" and "data recipients", seeking to distinguish the different participants in data transactions and use and to assign rights and obligations to each type of participant, so as to achieve fair use of data through institutional design.

First, to provide users with easy access to data, the EU Data Law Proposal stipulates that data companies should ensure that "the data they generate is easily and securely accessible by users by default, and that users have direct access to the data when relevant and appropriate" when designing and manufacturing products and providing related services. At the same time, enterprises should inform users of information such as "the nature and amount of data", "whether data is likely to be generated continuously and in real time", and "how users can access this data", so that users can effectively exercise their rights. If the user is unable to obtain data directly from the product due to technical and scenario limitations, the data holder shall provide the user with the data generated by the use of the product or related services in a timely manner and free of charge, and where feasible, such provision shall also be continuous and real-time.

Second, to enable users to make fair use of data, the EU Data Law Proposal clarifies that users have the right to provide data directly to third parties or to share data with them. In many cases users rely on third parties to use or develop data, for example for after-sales service or data analysis. According to the proposal, the data holder shall make the data generated by the user's use of the product or related services available to third parties free of charge and without undue delay, and, if the user so requests, shall share data of the same quality and in real time. In addition, data holders shall provide data to data recipients "on fair, reasonable, non-discriminatory terms and in a transparent manner", and "shall not discriminate against similar categories of data recipients". If a data recipient considers the conditions under which the data holder provided the data to be discriminatory, the data holder bears the burden of proving that there has been no discrimination.

Finally, according to the EU Data Law Proposal, the data recipient also has certain obligations towards the user. The data recipient shall process the data it receives "only for the purposes agreed with the user and under the conditions agreed with the user", "subject to the rights of the data subject in the case of personal data, and shall delete the data that are no longer necessary for the agreed purposes". The data recipient shall not "coerce, deceive or manipulate the user, undermine or impair the user's autonomy, decision-making or choice", nor may it transfer the received data to another third party without the user's consent.

1.2 The U.S. personal information market transaction model

The United States views the fair use of data as a question of fairness in a market economy. At the federal level, the United States has enacted a series of statutes protecting personal information in fields such as credit reporting, medical care, education and finance, and many states have introduced relevant legislation in the field of consumer protection. In areas not covered by such legislation, the U.S. Federal Trade Commission maintains order in the personal information market by regulating "deceptive or unfair" market practices. For example, if a business promises in its privacy policy not to collect personal information, but in fact collects personal information and sells it to third parties, this is a deceptive or unfair market practice and is subject to regulation by the Federal Trade Commission.

U.S. legislation does not treat the right to the protection of personal information as a fundamental right, nor does it fully adopt the "purpose limitation principle" and "data minimization principle" of the EU GDPR. This leaves individuals room to trade and exploit their personal information in the information marketplace. For example, many telecommunications companies in the United States price differently according to personal information, charging lower fees to users who are willing to provide more personal information and higher fees to users who are not. Such transactions in personal information would be considered illegal in the European Union, but in the United States they are lawful as long as they do not violate federal or state legislation and are not deceptive or unfair. Market transactions in personal information nevertheless face many obstacles in U.S. practice. Many scholars have pointed out that individuals often have difficulty understanding privacy policies properly and accurately, and lack a real sense of the value of their personal information. Since individuals usually have no substantial bargaining power, companies' collection and use of personal information can hardly be regarded as a fair transaction, and at best amounts to "privacy for convenience". More often than not, the actual effect of such transactions is to hand companies a "free pass" that allows them to obtain personal information easily and monopolize its use.

To further promote fair trade in the personal information market, American scholars have proposed schemes for property rights in personal information. In the 1990s, the American scholar Kenneth Laudon proposed establishing a regulated "national information market". In this market, individuals can sell their data to banks, which aggregate the data and sell it on national exchanges. The proposal for property rights in personal information has been endorsed by many legal scholars. Drawing on Calabresi's theory of property rules and liability rules, Lessig argues that property rules are more conducive to fair transactions than liability rules. He proposed that technical and legal means can be used to strengthen individuals' bargaining power over their information, giving them more bargaining chips when they participate in information market transactions. Paul Schwartz et al. advocate giving information subjects limited property rights and strengthening the protection of personal information as it circulates, so as to achieve a fairer commercial use of personal information.

1.3 China's plans for the protection of personal information and the confirmation of enterprise data rights

China has adopted a dual legislative approach of personal information protection and enterprise data rights confirmation. In terms of personal information protection, after years of academic debate and institutional practice, China has formulated a series of laws and regulations with the Personal Information Protection Law at their core. The Personal Information Protection Law adopts a unified legislative model and regards the right to the protection of personal information as a basic right. Under the Law, information processors must follow the principles of "purpose limitation" and "information minimization" when processing personal information, and processing must not exceed what is necessary to provide the service. Under this model, personal information is protected to the greatest extent, but the use of personal information is also limited. If a company offers a price concession in exchange for collecting personal information beyond the scope necessary to provide its services, the collection may be found illegal for violating the principle of minimum necessity, even if the company has obtained the individual's explicit consent.

China's policies and legal practice have also actively strengthened the protection of enterprise data. In 2020, the Central Committee of the Communist Party of China and the State Council jointly issued the "Opinions on Building a More Perfect System and Mechanism for Market-oriented Allocation of Factors", which pointed out that data is a factor of production similar to land and labor, and that it is necessary to "improve the nature of property rights according to the nature of data". The "Data 20 Articles" propose that "a property rights operation mechanism for the ownership of data resources, the right to process and use data, and the right to operate data products should be established". In judicial practice, Chinese courts mainly apply the Anti-Unfair Competition Law to protect enterprise data. For example, in high-profile cases of recent years such as the Sina Weibo v. Maimai unfair competition dispute, the Dianping v. Baidu data crawler case, and the Taobao v. Meijing Company unfair competition case, the courts ruled in favor of the plaintiffs and found that the use of crawler technology by Internet companies to obtain the other party's data violated the Anti-Unfair Competition Law.

Under a legal system that protects personal information and enterprise data at the same time, the problem of fair use of data is becoming more and more prominent. First, disputes between individuals and data companies over the use of data are inevitable. For example, data on a microblogging platform can be regarded both as enterprise data and as personal information. If it is regarded as enterprise data, control over the data belongs to the platform; if it is regarded as personal information, control belongs to the individual. Can a platform enterprise, then, use its user agreement to restrict individuals from using their own data on the platform, or from entering into commercialization agreements with other platforms to transfer that data? Such problems clearly cannot be resolved from the perspective of personal information protection or enterprise data rights confirmation alone. Second, disputes over the use of data will arise between data companies and data sources that are not individuals. The "Data 20 Articles" borrow from the "EU Data Law Proposal" in proposing to give data sources the right to access, copy and transfer data. Once this right is enacted into law, data sources will be able to transfer their data to third parties, which will inevitably affect the interests of data companies and cause controversy.

2. Problems with the existing system scheme

The institutional explorations of the EU and the United States offer instructive ideas for achieving fair use of data. However, the actual effects of these institutional schemes remain to be tested in practice. China's Personal Information Protection Law focuses on protecting personal information and preventing risks, and whether disputes over data use between users and enterprises can be resolved by resorting to the confirmation of enterprise data property rights still needs to be examined at the theoretical level.

2.1 The dilemma of the right to access and use data

In the European Union, the concept of a data producer was questioned as soon as it was raised. During the European Commission's public consultation, many data companies disagreed with defining users as data producers. From the enterprises' perspective, data is generated as a direct result of the company's investment in and construction of data equipment, and it is unfair to identify users as data producers. If users were confirmed to have property rights in data, a large number of data companies would find it difficult to develop and use that data. Academics have made similar criticisms. Wolfgang Kerber pointed out that data is often produced by multiple actors, and that giving rights to data producers "does not solve the problem of 'unequal bargaining power' related to data, nor does it solve the problem of access in multi-stakeholder situations", but will lead to more unfair results.

The right to access and use data under the EU Data Law Proposal is also problematic. First, data does not come from the user alone, and the user should not claim ownership-like rights over data formed with his or her participation. As early as the 19th century, judicial cases in the United States made it clear that tourists were free to take pictures of private houses, while the information (data) about the house captured in such photos was not protected by law. This is all the more true of modern data production. Although individual or business users leave traces on data platforms, it is difficult to say that individuals or businesses are the only producers or sources of such data. Second, it is difficult to characterize the role users play in the data production process as "labor", and therefore difficult for users to obtain rights on the basis of the theory of digital labor. According to that theory, platforms such as social networking, e-commerce and the sharing economy create a kind of large-scale factory, and users produce data when they use these platforms, which is also a kind of labor. Eric Posner and Glen Weyl point out that user data can become a product of labor, and that users can form "data labor unions" and negotiate with companies. On the whole, however, the concept of digital labor is quite different from the common understanding of labor, and it does not conform to the definition of labor in Marxist labor theory. According to the general perception of society, data can be considered the product of the user's labor only in rare cases, such as when users consciously create and write. In most cases, data is merely a by-product that users generate unconsciously in the course of entertainment, transactions and the like. By contrast, the architecture-building activities carried out by data companies such as platforms are closer to labor. Since data is produced by multiple parties, and enterprises invest substantial resources in the process, there is insufficient justification for giving users the right to access and use data.

The right to access and use data can also have a negative impact on data markets. The EU created this right both to achieve fair use of data and to facilitate the flow of data. EU legislators envisage that the right to access and use data would give both users and third parties the opportunity to participate in data market transactions. However, this assumption is too idealistic and would impose unnecessary costs on all parties. In the real world, the data an enterprise collects is combined with the enterprise's existing data systems and becomes part of its decision-making mechanism. Users and third-party recipients often do not have such data systems. Even if users have the right to access and use the data collected by the enterprise, or to transfer it to a data recipient, users and data recipients will find it difficult to make effective use of the data. If users and data recipients could easily develop and use this data, the various players in the market would presumably already be cooperating of their own accord, for example through open application programming interfaces (APIs). The main reason why users, data companies and third-party companies do not voluntarily share data and cooperate with each other is that such data cannot be effectively embedded in each other's data ecosystems, or that the cost of opening such interfaces is too high. The "EU Data Law Proposal" creates the right to access and use data in the hope of creating a market for seamless and efficient data circulation, but it ignores the prerequisites for data sharing and circulation.

2.2 The dilemma of personal information property rights

In the United States, after the concept of personal information property rights was proposed, a number of companies emerged to act as agents for personal information transactions, but without exception none of them operated successfully. The proponents of personal information property rights envisioned a thriving market for trading personal information and tried to promote the fair use of data by encouraging individuals to participate in market transactions, but this idea did not become a reality. In practice, companies can still obtain personal data relatively easily and use it exclusively, while individuals have difficulty accessing and exploiting it. In recent years, California and other U.S. states have treated personal information as "quasi-property" in their personal information legislation, requiring companies to inform individuals and obtain their consent before collecting personal information, but such legislation has not led individuals to participate effectively in data transactions.

The fundamental reason for the failure of the personal information market is that big data is formed by aggregating massive amounts of personal information, while the value of any single piece of personal information is very limited, so individuals have little motivation to participate in such market transactions. Even for very detailed personal information, "the average person's data typically retails for less than $1," while "general information about a person, such as age, gender, and location, is worth only $0.0005 per person." Faced with such meager gains, rational individuals have little incentive to trade personal information as property. The institutional idea of achieving fair use of data by turning personal information into property therefore cannot be implemented.
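The incentive gap described above can be made concrete with a rough calculation. The sketch below takes the quoted figure of $0.0005 per person for general information and, purely as a hypothetical assumption, a pool of one million individuals; `POOL_SIZE` and `aggregate_value` are illustrative names that do not come from any cited source.

```python
# Illustrative arithmetic only. The per-person price is the figure quoted in
# the text; the pool size is a hypothetical assumption chosen for the example.
PRICE_PER_PERSON_USD = 0.0005   # quoted value of general data (age, gender, location)
POOL_SIZE = 1_000_000           # hypothetical number of people in an aggregated dataset


def aggregate_value(price_per_person: float, pool_size: int) -> float:
    """Total value of a pooled dataset under a flat per-person price."""
    return price_per_person * pool_size


individual_gain = PRICE_PER_PERSON_USD
pool_value = aggregate_value(PRICE_PER_PERSON_USD, POOL_SIZE)

print(f"Each individual could earn about ${individual_gain:.4f}")
print(f"The aggregated pool is worth about ${pool_value:,.2f}")
```

Under these assumptions the pool as a whole is worth about $500, while each contributor's share remains one twentieth of a cent, which is why the value of such data only becomes meaningful at the aggregate level.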

The most successful attempt so far to industrialize and marketize personal information is the "non-fungible token" (NFT). In essence, a non-fungible token uses blockchain technology to form a trusted digital certificate of rights. Non-fungible tokens are generally believed to be unique and non-replicable and therefore highly credible, which is conducive to the asset-based use of data products such as personal information and digital collectibles, and this use is not subject to the centralized control of platform enterprises. However, practice has shown that the most widespread application of non-fungible tokens is still the trading of art collectibles or celebrity information, and personal information or data is likely to become an object of transaction only when it has high asset value and financial attributes. The limited application scenarios of non-fungible tokens show that the industrialization and marketization of personal information can hardly be extended to ordinary individuals, and that such schemes cannot solve the problem of fair use of data in the general sense.

2.3 The dilemma of confirming the ownership of enterprise data assets

From the perspective of the theory of confirming enterprise data property rights, first possession, or the "rule of capture", is itself a fair institutional arrangement. For example, whoever shoots a wild animal should have control over that animal. In the process of collecting data, the enterprise has invested heavily and expended labor. If enterprise data receives no property protection and other entities are allowed to use it for free, this will not only treat the enterprise's labor unfairly, but also indirectly encourage unfair behavior such as "free riding" and getting something for nothing. However, such arguments cannot be sustained. The property protection of enterprise data not only fails to help achieve fair use of data, but also creates more problems.

First, protecting enterprise data as property on the ground that the enterprise has expended labor will lead to excessive protection of those who possess data first. The theory of data rights confirmation can be traced back to the labor theory of property proposed by Locke and even earlier theorists, but that theory applies only to resources that are exclusive in nature. Data is a non-rivalrous and non-exclusive resource. If the first possessor obtains exclusive protection, later comers lose the opportunity to obtain the data, which is not conducive to its fair use. It is for this reason that the intellectual property system imposes additional requirements on data protection. For example, a data product formed by processing and using data can obtain copyright protection only if it is original; a data product can be protected by a patent only if it is novel, inventive and practical; and the law protects data as a trade secret only when it has commercial value and the enterprise has taken measures to keep it confidential. No country has made the existence of labor a sufficient condition for data to be protected by intellectual property rights. For example, the U.S. Supreme Court stated in the Feist case that "sweat of the brow" alone does not make a work copyrightable. Although collecting and compiling the information in a telephone directory takes a great deal of labor, the directory is not protected by intellectual property rights if it is not original. In addition, the sui generis protection that the EU provides for databases rests on the aim of incentivizing investment rather than on labor theory, and EU database rights also differ considerably from traditional property rights or intellectual property rights.

Second, prohibiting all free-riding by giving enterprises property rights in data may harm the rational and fair use of data. Because data is non-rivalrous and non-exclusive, the most reasonable institutional design may be for the law to provide limited and non-exclusive protection for data and to allow members of society to take a free ride so long as they do not harm the rights and interests of others. Free-riding is common in the flow and sharing of information. For example, pleasing sights such as the appearance of a house or the flowers and trees in its yard are shared with passers-by, who do not have to pay the owner as long as they do not invade the owner's privacy. If such access to information were found to be illegal, everyone in society would be an offender. Under the intellectual property system, individuals and business entities can in many cases make reasonable use of the subject matter of intellectual property rights, and the law sets a term of protection for copyright and patents, which is likewise intended to encourage innovation while ensuring that the public can make reasonable use of knowledge and data for free. In fact, because data in the era of big data is aggregative, the law should not merely refrain from prohibiting free-riding but should encourage it. The Data Governance Act, passed by the European Union in 2022, introduced a system of "data altruism", encouraging people to donate their own data for the public good to form data pools with research value.

3. Reconstructing the principle of fair use of data

The fundamental reason why the existing systems and theoretical propositions on fair use of data do not work is that they neither fit the characteristics of data nor conform to the laws by which the digital economy develops. Unlike traditional factors of production, data is characterized by aggregation, relevance, scenario dependence, non-rivalry and non-exclusivity, and it is advisable to treat data as property with mixed rights and interests. The transaction, circulation and use of data is a highly scenario-specific practice involving multiple subjects. To realize fair use of data, different institutional schemes should be designed for different types of data, with the aim of achieving both a fair order of market competition and fair public governance of data.

3.1 The basic characteristics of the data

3.1.1. The aggregate characteristics of data

The aggregation of data mainly refers to the way in which small amounts of data can be gathered into large amounts so as to produce a cumulative effect. Isolated and scattered information or data has existed since ancient times, but it never had the impact that data has today. With the development of information technology, especially Internet technology, it has become possible to collect massive amounts of data, and data has become an important factor of production. An important reason why the value of big data is widely recognized is that the aggregation of data produces scale effects. The big data industry uses "all the data" rather than "random samples". By aggregating complex data, big data can provide more accurate analysis of specific problems. For example, in 2009 Google aggregated information such as users' search histories and whereabouts to predict seasonal influenza, and predicted the scope and spread of the H1N1 influenza outbreak more accurately than the U.S. public health authorities.

3.1.2. Relevance characteristics of data

Data is relational: the criss-crossing relationships among data affect its value. For example, one person's whereabouts may reveal the whereabouts of fellow travelers, and one person's genetic information may help identify other people. The same is true of non-personal information. The sales data and browsing data of a merchant on an online platform may also relate to consumers or to the platform, and data is often generated through the joint action of multiple parties. It is precisely because of this relevance that the EU's plan to empower users unilaterally has been widely questioned. Data is not "produced" by individual or commercial users alone, and whether such users are labeled "data producers" or "data sources", neither label accurately reflects how data is actually generated.

3.1.3. Scenario dependency of data value

The value of data is highly dependent on the specific context in which it is used. For example, data reflecting the health and financial status of the elderly in a community is of great value to insurance companies and is also meaningful to advertising companies that promote health products, but it may be of little value to other businesses. For data to realize its value, it must be effectively integrated into a company's business strategy and decision-making system. If an advertising agency places advertisements at random rather than through personalized recommendation, the data on the elderly described above is of no use to it. Standardized commodities in the traditional sense, such as oil and gold, can form a "thick market" for circulation and can even be traded at high frequency through exchanges. Data, by contrast, has more typical "credit" characteristics, and transactions around data are more a form of collaboration, in which one party uses its data to provide services to the other. Detached from specific scenarios, data can hardly circulate in a standardized way as commodities do, and the parties rarely treat data itself as the direct object of sale.

3.1.4. Non-competitive and non-exclusive data

Data is considered non-competitive and non-exclusive (in economic terms, non-rivalrous and non-excludable) because it can be reused: the use of specific data by one subject does not prevent other subjects from exploiting the same data. This characteristic means that most traditional principles of property and property rights cannot be applied directly to data. Hardin's "tragedy of the commons" hypothesis applies mainly to depletable, rivalrous resources and cannot effectively explain or guide the use of data. Overgrazing a public pasture may degrade the grassland, but public use of data does not "degrade" the data; on the contrary, it allows more of the data's value to be discovered. If data is privatized, its utility as a public good will be inhibited, and a "tragedy of the anti-commons" may even result.

3.2 Reconstruction of the legal attributes of data

Because data is aggregative, relational, and scenario-dependent, it may be more reasonable to treat data as an aggregate property with mixed rights and interests. In civil law, traditional objects are divisible: even if they cannot be physically divided, they can be divided in proportion to "capital contribution" or in "equal shares". The value of data, however, comes from the mixing of information, and it is impossible to distinguish which data is more valuable and which is less. Nor should data be fragmented. To give the fullest play to data as a factor of production and to realize the value of its use, data should be regarded as a whole. Dividing non-integral things (e.g., money, grain, oil) does not extinguish their value, but this is not so for integral things. In daily life, bridges and reservoirs are typical integral things. Once a bridge is dismantled, its value disappears; if a reservoir is dispersed into droplets, it can no longer generate electricity. In the same way, once data is split into discrete items of user information, its overall value largely disappears. For this reason, the institutional norms for data utilization should focus on building a data sharing system rather than relying too heavily on the traditional property rights system.

Given the non-competitive and non-exclusive nature of data, public data on online platforms should be regarded as special property with the characteristics of public property. Public property refers to "items that belong to the public and are open to the public through legal mechanisms"; it differs from private property and collective property, and it is not state-owned property. State ownership of natural resources means that the State has exclusive rights over those resources, and in the sense of exclusivity, state ownership is similar to private ownership. Public ownership of data means that the data is in the public domain and belongs to no one, including the state and the collective. Treating certain data as public property means that companies may control the use of this data but cannot claim property protection over it in the absolute sense. Public property is not uncommon in traditional legal systems. Carol Rose found that Roman law treated roads, waterways, and submerged land as "public property" in order to maximize the value of these resources. Since the beginning of the 21st century, with the rise of the Internet, Rose's theory of public property has been widely applied in the field of network and data law. The view that public data has the characteristics of public property has become the mainstream understanding in U.S. data law. In recent years, many scholars in China have also recognized that open data has the characteristics of public property. Some have pointed out that designating open government data as state-owned will impede the circulation and use of such data. Others have pointed out that the law can give public data on online platforms appropriate protection but cannot treat it as the private property of an enterprise.

3.3 From the path of rights to the regulation of conduct

Given the characteristics of data, any attempt to achieve fair use of data by conferring rights on all parties will face many difficulties. Because data is aggregative and relational, it is difficult to determine the value of isolated user data. Because the value of data is scenario-dependent, isolated user data can hardly circulate freely like standardized goods. In particular, enterprise data is composed of massive amounts of user data in which the value of any single user's data is often very small. Detached from the specific scenarios in which data is aggregated and associated, the value of data can hardly be realized, and fair use of data cannot be achieved through a general confirmation of data rights. In addition, given the non-competitive and non-exclusive nature of publicly available data, a general confirmation of rights over such data is bound to raise problems.

The basic characteristics of data and the laws governing the development of the data industry mean that, to realize the fair use of data, the main path should be the regulation of conduct and data governance, with rules designed from the perspective of integrating public and private law. Purely commercial data processing activities should be adjusted through market mechanisms and governed by competition law, with the aim of maintaining a fair order of market competition. For data aggregated from massive amounts of personal information, in addition to competition law supervision, innovative systems such as public participation and public trusts can be explored from the perspective of democratic governance. Although public data on online platforms exists in a public space, its underlying architecture is often controlled by enterprises, so the public character of such data is not absolute. For all kinds of open data, the path of conduct regulation should be adopted, and a data utilization system should be built that takes into account both the interests of the platform and the public interest.

It should be emphasized that leaving data rights unconfirmed will not impede transactions in the data market. One argument in favor of confirming data rights is that, without property protection, data suppliers will worry that their data will be misappropriated by third parties and data buyers cannot be sure of holding complete property rights in the data they purchase, so that confirming rights would lower transaction costs on both the supply and demand sides and ensure that transactions proceed smoothly. In fact, however, the vast majority of data transactions follow a highly scenario-based contract model rather than a highly standardized, property-based model of commodity circulation. Data trading venues modeled on stock exchanges or shopping malls often have very limited trading volumes. From the perspective of how data is used, data transactions are in essence simply ways for market entities to use data to provide services or to cooperate with one another. For example, an Internet platform that offers traffic entry points pushes merchants who pay higher prices to pages with more user views and clicks. Such activity uses data to provide services to enterprises, and the relevant agreements are service contracts rather than contracts for the transfer of property. For this reason, trading on a stock exchange or in a shopping mall requires confirmation of rights in securities or commodities, whereas the cooperative model of data trading does not require confirmation of data rights.

4. Classified construction of a system for the fair use of data

By source, data can be divided into merchant (non-individual) data and personal information data; by degree of openness, it can be divided into public data and non-public data. To construct a system for the fair use of data based on the idea of conduct regulation, utilization rules must be designed for each type of data.

4.1 The use of merchant data should emphasize market autonomy and fair competition

For the use of commercial (non-individual) user data, competition law norms should be applied on the basis of respect for private autonomy and market regulation. Business entities are usually sufficiently aware of the value of data to make rational decisions. For example, when a merchant cooperates with a small or medium-sized Internet enterprise, the merchant will fully consider the value of the data and negotiate with the Internet company, as data controller, over issues such as who may use the sales data and whether the data may be shared. If the merchant does not fully consider the value of the data during negotiation, this may be because the data is not valuable enough to be worth considering, or because the merchant lacks the business sense to recognize its value. In either case, there is no need to interfere with the merchant's choice, and the merchant is responsible for its own decisions.

In the absence of market dominance, a sound order for the fair use of data can usually form spontaneously. Data companies are willing to negotiate over the use of data and to give merchants consideration that reflects the value of the data. If a data company ignores a merchant's reasonable demands, the merchant can cooperate with other data companies. Because data is non-competitive and non-exclusive, data companies also have an incentive to share data and will even share some data with merchants for free. In practice, many platforms provide sales data and official-account operating data to their merchants free of charge, or open their API interfaces to merchants. By opening their data ecosystems to as many collaborators as possible, data companies not only help partner merchants make profits but also benefit their own development. Through data openness and sharing, data enterprises can help downstream merchants better understand their business conditions and can attract more partners to join and stay, forming a data ecosystem with scale effects. In economic theory, this type of strategy is referred to as "internalizing complementary efficiencies".

Of course, the data market cannot do without the regulation of competition law. However, whether the Anti-Unfair Competition Law or the Anti-Monopoly Law is applied, supervision should take place after the fact. In this sense, the system for the fair use of data constructed by the "EU Data Law Proposal" is clearly unreasonable. The proposal does not distinguish between personal information data and business data, and it not only advocates granting business users the right to access and use data but also imposes various obligations on data holders and data recipients. In addition, the proposal restricts data sharing contracts between enterprises and various data contracts involving micro, small or medium-sized enterprises: it treats conduct by which a data enterprise restricts the access and use rights of such enterprises, whether in the contract terms, during performance, or within a reasonable period after termination, as terms "unilaterally imposed on micro, small or medium-sized enterprises". Such provisions would seriously undermine the fairness and efficiency of the data market. Once the right to access and use data is elevated to a non-waivable and non-tradable legal right, data companies will have to take more measures and incur higher costs to ensure compliance. At the same time, data cooperation that data enterprises undertake voluntarily and reciprocally may also be found illegal.

At present, China's legislation focuses on using the Anti-Unfair Competition Law to safeguard the interests of data enterprises, while policy documents emphasize the need to protect the differing rights and interests of multiple entities at the same time. For example, the "Data 20 Articles" propose that it is necessary not only to "establish and improve a system for protecting the legitimate rights and interests of all participants in data elements" and to "promote modes of data circulation and use based on informed consent or the existence of statutory grounds, and ensure that data sources enjoy the rights and interests to obtain, or to copy and transfer, the data they generate", but also to "reasonably protect the rights and interests of data processors to independently control the data they hold in accordance with laws and regulations", "fully protect the rights of data processors to use data and obtain benefits", and "protect the right to operate data or data-derived products formed through processing, analysis and similar means, regulate in accordance with laws and regulations the right of data processors to license others to use data or data-derived products, and promote the circulation and reuse of data elements". In the future, when policy documents are translated into legal rules, those rules should be designed according to the idea of conduct regulation. In terms of conduct regulation, the focus should be on anti-monopoly enforcement, and the Anti-Unfair Competition Law should be used prudently to regulate acts that disrupt the competition order. If large data companies abuse a dominant market position, small and medium-sized businesses may lose any real choice in negotiation, seriously distorting the competitive order of the data market. Timely and appropriate intervention under the Anti-Monopoly Law is conducive to the healthy operation of the data market.

4.2 The use of personal data should pursue fairness in governance

For data originating from individuals, the relevant rules should be designed along two dimensions: "individual-data enterprise" and "individual collection-data enterprise". In the "individual-data enterprise" dimension, individuals and data enterprises are unequal in their access to information and in their capacity for cognition and decision-making, so a market transaction scheme based on ownership of personal information cannot be expected to achieve fair use of data. As an alternative, the principle of information fiduciary duty can be introduced, and a system of responsibility for information processors can be built around the protection of individual interests. The information fiduciary duty differs from general fiduciary duties and also differs greatly from the trust in property law. Its logical premise is that individuals and information processors are unequal in their informational capabilities. Accordingly, the duty does not emphasize the autonomy of the individual's will or the right to informational self-determination, but rather requires that information processors handle personal information prudently and bear a duty of care. The information fiduciary duty has been widely recognized in the theory and practice of personal information protection in various countries, and it has had a significant impact on the American Data Privacy and Protection Act (a legislative bill) and on China's legislation. Past discussions of the information fiduciary duty emphasized that information processors owe individuals a duty of protection, so that individuals are protected from all kinds of harm after giving authorization. Judging from the theoretical premise and value orientation of the principle, this duty can also be extended to the fair use of personal information. Under the principle of fiduciary duty, data enterprises should be allowed to collect personal information to a certain extent, but whether their processing of personal information benefits individuals must be strictly reviewed, so as to ensure that individuals share more of the benefits brought by data utilization.

In the relationship dimension of "individual collection - data enterprise", the relevant system should be designed based on the aggregation characteristics of data. Data aggregated from a large amount of personal information should be legally characterized as a kind of property with mixed rights and interests, that is, a pooled property that includes a large number of micro rights and interests. The way in which such data works determines that it is not appropriate to give absolute rights to the individuals concerned. Especially on the collection side, empowerment not only does not help individuals make rational decisions based on risks and benefits, but also may hinder the realization of the overall value of data. Once the aggregation of data fails, the overall value of data cannot be brought into play, and the interests of individual collections and data enterprises will be damaged accordingly. For example, the technological improvement of artificial intelligence is highly dependent on training data, and without the aggregation of personal information and the development of the big data industry, the development of artificial intelligence will be impossible, and it will not be able to create more convenience for human life. Therefore, the law should not place too much emphasis on the control of individuals over their information, nor should it strictly restrict the collection of personal information by information processors, but should pay attention to the overall governance of data and prohibit information processors from misusing personal information. In this regard, there are two institutional options to choose from.

The first option is to provide channels for, and to facilitate, public participation in data governance. Ostrom proposed that, for resources with public attributes such as knowledge, sharing and fair use should be achieved through collective self-governance, and that ownership of the resource should not be given to any single subject. The sharing and use of data resources can draw on this model. However, direct public participation in data governance may not be effective. The data rights enjoyed by individuals are often micro-rights, and the vast majority of the public may have no interest in participating directly. In 2012, Facebook invited its users to vote on its data governance and privacy policies, but Facebook, which had more than 1 billion users, received just over 500,000 votes. In fact, data governance by collective voting can hardly obtain a response from the vast majority of users; instead, it may attract a small number of users with extreme preferences, so that the result reflects only the interests and needs of a few. Indirect public participation in data governance is therefore relatively more feasible. In the future, consideration can be given to introducing, at the decision-making level of data enterprises, a certain proportion of professionals who represent the interests of ordinary users, so as to strengthen the representativeness and public character of enterprise data governance.

The second option is a public trust for data. Hook noted that the wealth generated by the aggregation of personal information "flows to the companies that can best exploit this information pool," causing problems such as "privacy violations, economic exploitation, and structural inequalities." The problem of fair use of data can hardly be solved effectively through individual control over personal information alone, or through governance by data enterprises alone; introducing public trust theory allows data to be managed more fairly and effectively. In a public trust for data, data enterprises have partial control over data and may enter the data market to engage in commercial activities such as transactions, but ownership and ultimate management of the data vest in the state, which should police data abuse and similar conduct to ensure that data is used effectively. Originally used in the field of natural resource management, public trusts now also play a role in many data governance scenarios. For example, New York City requires ride-sharing companies such as Uber and Lyft to disclose to city agencies operational data such as the specific time, origin, destination, mileage, fare details, and route of each trip. City agencies use this data to address matters of public interest such as road congestion, ride safety, and traffic light settings. The city of Barcelona in Spain has created a platform called "We Decide". The city asks service companies that collect and use personal location data to share their data with the We Decide platform in order to create a "new type of regional data commons that empower people to collect and share data to address regional issues."

Whichever option is adopted, the key to realizing the fair use of data in the "individual collection - data enterprise" dimension is to make the pool of personal information more representative, as a whole, of the collective interests of users. For a privately built reservoir to be used equitably, its inclusiveness and capacity for public service must be enhanced, and the same is true of data pools formed by aggregating personal information. Large data companies holding massive amounts of personal information can generate enormous social benefits in the course of using data and play a pivotal role in the public interest. To realize the fair use of data, legislators should not consider only the vertical relationship of interests between the collective of individuals and data enterprises, nor rely on dividing rights and interests or confirming rights. Instead, while allowing and promoting the reasonable aggregation of data, they should impose social fairness requirements on how enterprises use data.

4.3 Fair use of open data

The online platform is not only public, but also interconnected. As long as anyone is connected to the Internet, it is equivalent to entering a public domain by default, and in this public domain, data or information has certain characteristics of public property. For this kind of open data, in addition to applying the utilization rules established above, it is necessary to design additional targeted institutional schemes.

First, exclusive rights should not be established over public data, and the control of platform enterprises over public data should be limited. Even for tangible property, a private owner should not have absolute control over resources that are privately controlled but open to the public. For example, shops open to the public in shopping malls must not discriminate against specific groups of consumers or prevent certain customers from entering to make purchases. For non-exclusive and non-competitive public data, the control of data enterprises should be restricted, because ordinary access to public data does not hinder the realization of its value. Although enterprises that control public data cannot assert property-like rights over data in the public domain, they can protect data at the back end, for example by setting up a robots protocol to deter certain malicious crawling.
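The crawler-exclusion mechanism mentioned above is commonly implemented as a robots.txt file. The following is a minimal illustrative sketch, not a description of any particular platform: the domain platform.example, the paths, and the crawler names are all hypothetical. It uses the urllib.robotparser module from Python's standard library to show how a platform might leave public pages accessible while closing off back-end paths and refusing one named crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for an illustrative platform (assumption):
# public pages stay open, the back end is closed, one named crawler is refused.
ROBOTS_TXT = """\
User-agent: *
Disallow: /backend/
Allow: /

User-agent: BadCrawler
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory, without any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# An ordinary crawler may read public listings but not back-end data.
public_ok = parser.can_fetch("ResearchBot", "https://platform.example/listings/123")
backend_ok = parser.can_fetch("ResearchBot", "https://platform.example/backend/orders")

# The named crawler is refused everywhere.
bad_ok = parser.can_fetch("BadCrawler", "https://platform.example/listings/123")

print(public_ok, backend_ok, bad_ok)
```

Note that such a protocol is advisory: it only binds crawlers that choose to honor it, which is consistent with the point that it protects data at the back end without creating an exclusive right over public data.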

Second, where technically feasible, merchants on the platform should be given the right to access and use data. However, this right is not the same as the one in the EU Data Law Proposal. Merchants can already access and use public data on an online platform without barriers, which amounts to already having the right to access and use that data. Explicitly granting such a right to merchants at the institutional level does not require platform enterprises to develop additional technologies or bear additional costs, yet it can help them connect with more merchants and create a larger and more dynamic data ecosystem. What needs discussion is that platforms may, for competitive purposes, enter into agreements with merchants to prevent them from migrating data to competing platforms. In that case, it is disputed whether the merchant's right to access and use data can prevail over the agreement. This article argues that, where there is no monopoly power, merchants should have the right to access and use public data on the platform, but if a merchant migrates data in violation of the agreement, the platform should be allowed to hold the merchant liable for breach of contract. The merchant's access to and migration of data is consistent with the interconnected character of online platforms and helps promote competition among them. Nevertheless, given that the merchant and the platform have signed an agreement and both are capable of rational decision-making, allowing the platform to seek recovery from the merchant does not violate the principle of a fair market. It is in this sense that the right to access and use data advocated in this article differs from the right envisaged in the EU Data Law Proposal. Under the EU Data Law Proposal, all agreements that restrict merchants from migrating data will be deemed invalid, and merchants will be free to migrate data without compensating the platform. While the proposal allows data holders to claim reasonable compensation, it limits the compensation to "the costs incurred in providing access to the data" and explicitly requires that such compensation be "paid by a third party and not by the user".

Third, the implementation of the right to personal information portability must be ensured. At present, China's Personal Information Protection Law provides for the right to personal information portability (the right to transfer): "if an individual requests to transfer personal information to the personal information processor designated by him/her, and meets the conditions stipulated by the national cyberspace administration, the personal information processor shall provide a channel for the transfer". As with a merchant's access to and use of platform data, an individual's access to his or her data on a public online platform places no additional burden on the platform. However, the right to personal information portability confers more than the merchant's right to access and use data. Individuals not only may access and use their personal information but also have the right to request that it be transferred to a third-party platform, and the platform cannot exclude this right through a user agreement. Even if an individual violates the user agreement, the platform has no right to hold the individual liable for breach of contract. The reason is that merchants can negotiate with platforms over the use of their data, while individuals often lack the ability to do so. Protecting the right of individual users to port their personal information can, to a certain extent, correct the inequality between individuals and platforms in their capacity to use information and promote the fair use of data.

Finally, a reflective and dynamic data governance mechanism should be established to lay the foundation for the fair distribution of data income. On how to achieve the fair distribution of data income, the "Data 20 Articles" put forward opinions of principle, including but not limited to: giving full play to the decisive role of the market in resource allocation, and improving the mechanism by which the contribution of data elements is evaluated by the market and remuneration is determined according to contribution; establishing and improving a more reasonable market evaluation mechanism so that workers' contributions match their remuneration; and tilting the income from data elements reasonably toward the creators of data value and use value. Clearly, these opinions are confined to the distribution stage and do not fully consider how the utilization stage affects distribution. The fair distribution of data income and the fair use of data are not two separate issues. In a certain sense the former is a logical extension of the latter, and handling the latter properly will help resolve the former. The equitable distribution of data income is extremely complex, especially in the use of open data, where there are many stakeholders whose respective roles in the utilization process are difficult to characterize and quantify. Therefore, to achieve the fair distribution of data income, it is necessary not only to establish an income distribution system that reflects efficiency and promotes fairness, but also to introduce a reflective and dynamic data governance mechanism at every stage of data utilization. Such a mechanism would provide an institutional platform for timely and full communication among multiple subjects and thereby effectively coordinate the complex interests of merchants, individuals, and data enterprises.

The original article was published in Issue 2 (2023) of "Legal Research". Thanks to the WeChat public account "Legal Research" for authorizing this reprint.
