China Institute for Socio-Legal Studies, Shanghai Jiao Tong University

2024-06-28 Author: Tang Yingmao

Generative AI to help modernise trials

Tang Yingmao

Professor, School of Law, Fudan University

Member of the Planning Committee of the China Institute for Socio-Legal Studies, Shanghai Jiao Tong University

With the rapid development of generative artificial intelligence, and especially the emergence of the video generation model Sora, litigation source governance faces unprecedented opportunities and challenges. Sora's video generation ability can greatly improve the efficiency and effectiveness of legal publicity, case filing, evidence presentation and other aspects of litigation source governance, but it may also give rise to problems such as evidence forgery and privacy infringement. After analysing the challenges that Sora poses to the courts' litigation source governance work, this article proposes three coping strategies: strengthening the courts' information infrastructure, establishing and improving management norms for data security and outsourced services, and adjusting management mechanisms in a timely manner to keep pace with the development of AI.

Introduction

Sora can quickly "turn" text into video, and the resulting video is so lifelike that it is virtually indistinguishable from real footage shot by humans. In the spring of 2024, after the US-based company OpenAI released test videos generated by Sora, Hollywood producer and director Tyler Perry announced that he was halting plans for an $800 million studio expansion. "If I want to be in the snow in Colorado, or I'm writing a scene on the moon, I can easily generate it with text through AI", he said, "and I no longer need to travel to the location". Elon Musk, the head of Tesla and the so-called Iron Man of Silicon Valley, also commented, "Humans are out of luck (GG humans)."

ChatGPT's strength in generating text from text has made it a favourite among scholars and others who work with the written word. Sora, by contrast, is likely to have a greater impact on the general public, and therefore a more far-reaching impact on the courts' litigation source governance work. One reason is that the public pays far more attention to video than to text. An earlier study by the author showed that, by the end of 2016, the average number of views per judgment document in each province across the country ranged from 8 to 20. By comparison, the average number of views per courtroom video on the live trial website in 2016 was as high as 47,398, more than a thousand times the figure for the judgment documents website. Different groups attend to different legal media: academics and legal professionals care about the judgment documents website, whereas the public seems to pay more attention to the live trial website. The resulting problem is that a hearing broadcast on the live trial website can easily draw millions or even tens of millions of online viewers, which is prone to triggering a storm of online public opinion.

Litigation source governance is similar in nature to mass work: it deals with everyday matters such as family and neighbourhood disputes. Sora's ability to generate videos quickly is readily welcomed by the public and can also help the courts. It can help promote the rule of law and clarify the public's rights and obligations, so that people personally "feel" fairness and justice, and it can make litigation source governance more accessible, bringing mediation deeper into communities and to the grassroots. However, it may also bring problems such as evidence forgery, privacy infringement and ethical challenges, which would reduce the fairness of mediation and make litigation source governance more difficult. More importantly, as with other generative AI such as ChatGPT, applying Sora to litigation source governance will most likely depend on enterprises collecting, labelling and training on massive amounts of data; the court system will probably find it difficult to develop and apply such models independently, or will find the cost of independent development commercially prohibitive. In cooperating with enterprises, courts need to plan ahead and respond properly to the questions of how to prevent AI monopoly, or even AI hegemony, while remaining innovative and open, and how to safeguard the data security of the litigation source governance system.

1. Meaning and Benefits of Sora

1.1 What is Sora

Sora is a text-to-video generation model that generates videos based on text instructions (prompts), that is, questions or requests expressed in text. Sora is a product of OpenAI. According to OpenAI, Sora is able to generate complex scenes that include multiple characters, specific types of motion, and accurate details of the subject and background. Sora understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.

As an example, for one of the test videos published by OpenAI, the following text instruction was entered into Sora: "A stylish lady walks through the streets of Tokyo, which are filled with warm neon lights and vivid city signs. She is wearing a black leather jacket, a long red dress and black boots, and is holding a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is wet and reflective, creating a mirrored effect of coloured lights. Many pedestrians walk around." Sora accurately understood this instruction and generated a 60-second video. The video shows the Tokyo street, the neon lights, and the lady's clothes and accessories, and the lady's movements are natural and realistic.

In addition, the camera in the test video keeps moving, shifting between distant and close-up shots, yet the lady's appearance remains consistent. The video is strikingly realistic, as if it were a real film clip shot with a camera on the streets of Tokyo. Judging by the results shown so far, Sora is almost as good at creating videos as a human being. In addition to the "Lady on the streets of Tokyo" video, the published test videos include "Reflections on a Tokyo train window", "A church from a drone's perspective", and even "Bike Racing with Marine Animals", all of which are strikingly realistic.

Of course, such scenes could be filmed on location, or created as virtual scenes using computer-generated effects. However, human production is constrained by the real world. For example, people need to rest and cannot work 24 hours a day. To film a scene, the crew needs to buy camera equipment and the director's team needs to organise extras. All of this costs a great deal of labour, time and money. Even if special effects are used to build virtual scenes, the costs behind them are also very high. Sora is different. As an AI model, it needs no breaks, no camera equipment, no organisation and no management. Sora's ability to produce video "affordably" has had a huge impact on both the tech and video production industries.

1.2 Why Sora is great

Before Sora appeared, there were already generative AI products on the market that could convert text into video, but there was a significant gap between their capabilities and Sora's. First, earlier video generation models had a weak understanding of instructions and could misinterpret them, which in turn led to errors in the generated video. Second, the videos generated by earlier models were not realistic enough: the images, actions and camera movement fell well short of human video production, and viewers could easily tell that these videos were "poor" AI works. Finally, earlier models could only generate short videos of about 3 seconds and struggled to generate longer clips. Sora's excellent performance stems from its leading technical capability, which combines model design capability with engineering capability.

1.2.1 Model Design Capability

From the perspective of model design, Sora uses the Diffusion Transformer architecture, which combines the advantages of the Transformer and Diffusion architectures.

Transformer architectures are widely used in large language models such as ChatGPT. The Transformer is, in effect, Sora's "brain". It evaluates the correlation between the video and the prompt, and effectively captures the contextual relationships in the prompt. This allows Sora to maintain a high degree of consistency in the videos it generates.

The diffusion model is an architecture commonly used in image and video generation. It is Sora's "paintbrush": during training, the diffusion model learns how to generate videos and images step by step from a condition, namely the text describing the video content. With the Transformer as the "brain" and the diffusion model as the "paintbrush", Sora has the basic architecture needed to generate video.

1.2.2 Engineering Capabilities

Model design may not be the key to Sora's success. In fact, the Diffusion Transformer design used by Sora is not entirely original; it is largely based on a paper published in 2023 by William Peebles and Saining Xie. William Peebles later joined OpenAI and was involved in the development of Sora. However, what the model in that paper could generate is a far cry from what Sora generates today. It could generate only images, not videos, and those images were mainly of animals and objects, with very few of people. In other words, the paper was a proof of principle, a step "from 0 to 1": pioneering, but not yet a mature technology product. Turning it into Sora required engineering work in at least three respects.

First, OpenAI needs to collect large amounts of video as data for training the AI model (Sora). Take YouTube, the world's largest video website, as an example: about 500 hours of video are uploaded to YouTube every minute, so the site adds about 22 million hours of new footage every month. These massive amounts of video are potential data for training Sora.

Second, OpenAI needs to annotate the videos and construct training data in which text corresponds to video. Take OpenAI's text-to-image models as an example: the first generation, DALL-E, was trained on 250 million text-image pairs. Put simply, each picture has to be annotated manually (or with the help of technical tools) with its corresponding text, such as whether the picture shows a Starbucks coffee cup or a small car, thereby building a dataset of matched text and pictures. The 250 million text-image pairs were labelled in this way, one by one, for model training. Training, in this sense, means "teaching" the AI model the correspondence between text and pictures. Industry insiders often joke about this labelling work to illustrate the relationship between human labour and machine intelligence: there is only as much intelligence as there is human labour behind it.

Finally, training Sora requires massive amounts of storage space and computing power. Assuming that each hour of video occupies 100 MB of storage and that, as mentioned above, YouTube adds 22 million hours of video per month, storing these videos requires about 2,200 TB. Assuming further that Sora has about 30 billion parameters, the author estimates that the hardware cost alone (the fee paid for using this much computing power) of training on these videos would run to hundreds of millions of dollars.
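The back-of-the-envelope figures above can be reproduced in a few lines. The sketch below uses only the assumptions stated in the text (500 hours uploaded per minute, 100 MB per hour of video); the 30-day month, the decimal units (1 TB = 1,000,000 MB) and all names are assumptions made here for illustration.

```python
# Figures taken from the text; constant and function names are illustrative.
HOURS_UPLOADED_PER_MINUTE = 500
MINUTES_PER_MONTH = 60 * 24 * 30   # assumes a 30-day month
MB_PER_VIDEO_HOUR = 100            # the text's storage assumption
MB_PER_TB = 1_000_000              # decimal units


def monthly_upload_hours() -> int:
    """Hours of new video uploaded in one month."""
    return HOURS_UPLOADED_PER_MINUTE * MINUTES_PER_MONTH


def storage_tb(video_hours: float) -> float:
    """Storage needed, in terabytes, for a given number of video hours."""
    return video_hours * MB_PER_VIDEO_HOUR / MB_PER_TB


if __name__ == "__main__":
    print(f"{monthly_upload_hours():,} hours per month")   # 21,600,000
    print(f"{storage_tb(22_000_000):,.0f} TB")             # 2,200
```

The exact monthly figure is 21.6 million hours, which the text rounds to 22 million.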

Take China's live trial website as a reference: as of February 2024, it had accumulated about 22 million trial videos. At an average of 1 hour per video, that is 22 million hours of video, roughly equivalent to the amount uploaded to YouTube every month. If these 22 million hours of video were used to train Sora, the hardware cost (computing fee) alone could be as high as hundreds of millions of dollars.

2. The Intelligence of Sora

The success of Sora not only marks a significant advance of AI technology in video generation, but also heralds a thorough shift in the AI paradigm. In recent years, generative AI such as ChatGPT has developed rapidly, which has not only expanded the scope, domains and uses of AI but also dramatically changed the way traditional AI is applied. This difference has become even more pronounced since the emergence of Sora.

2.1 Traditional AI: specialised tools

Generative AI was not a mainstream application until the ChatGPT boom in 2023. Over the past two decades, the mainstream forms of AI have been supervised learning, unsupervised learning, and reinforcement learning. Of these, supervised learning is among the most important.

For example, speech recognition, which relies on supervised learning, is already widely used in the judicial field. Specifically, some courts use speech recognition technology to assist clerks in recording court proceedings and producing court transcripts. Training a speech recognition model is like teaching an AI, through a large number of examples, the correspondence between the audio of human speech and words. It is therefore first necessary to construct a dataset as the basis for training. In the speech recognition scenario, the dataset consists of audio of human speech and the corresponding text. After the dataset has been labelled and the model extensively trained, the model learns the relationship between audio and text, its "knowledge". Once trained, if a new piece of audio is input, the model can convert it to text based on that "knowledge", even though it has never "heard" that audio before.

In addition to speech recognition, supervised learning AI has many other application scenarios in the judicial field. For example, the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) system in the United States, also a supervised learning-based AI system, is used to assess a defendant's risk of reoffending after parole and to assist the judge in deciding whether to grant parole, allowing the defendant to rejoin society rather than remain in a detention centre or prison. To train such a model, historical data must be combed through to construct a dataset of "defendant characteristics - recidivism".

Defendant characteristics refer to data such as the defendant's age, gender, employment and criminal history, while recidivism refers to whether the defendant reoffended after being released on parole. The AI model learns from historical data to "understand" the association between defendant characteristics and recidivism. For example, if male defendants in the historical data reoffended more often than female defendants, the AI model will learn a "male - recidivism" association. When faced with a new case, i.e. a new defendant, the AI model can predict the risk of reoffending from the defendant's characteristics and thereby assist the judge in deciding whether to grant parole.
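The supervised learning loop described above (build a labelled "characteristics - outcome" dataset, train, then predict for an unseen case) can be sketched with a tiny logistic regression. This is an illustrative toy only: the records are synthetic and made up for this example, the two features are arbitrary choices, and the code does not reproduce how COMPAS or any deployed system actually works.

```python
import math

# Synthetic, made-up records for illustration only (not real data).
# Each row: (prior_convictions, employed [1 or 0]) -> reoffended [1 or 0]
TRAINING_DATA = [
    ((0, 1), 0), ((0, 1), 0), ((1, 1), 0), ((0, 0), 0), ((1, 0), 0),
    ((3, 0), 1), ((4, 1), 1), ((5, 0), 1), ((2, 0), 1), ((4, 0), 1),
]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_risk(weights, bias, features) -> float:
    """Predicted probability of the positive label for one feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(data, epochs: int = 2000, lr: float = 0.1):
    """Fit a logistic regression by stochastic gradient descent."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in data:
            error = predict_risk(weights, bias, features) - label
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


if __name__ == "__main__":
    weights, bias = train(TRAINING_DATA)
    # A case the model has never seen: 3 prior convictions, unemployed.
    print(round(predict_risk(weights, bias, (3, 0)), 2))
```

The model has no notion of law or fairness; it only reproduces whatever statistical association the labelled examples contain, which is why the choice of features and training data matters so much.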

Traditional AI has outstanding strengths as well as significant weaknesses. In terms of strengths, traditional AI requires little data and relatively low computing costs, which makes it easy to deploy and quick to show results. For example, the COMPAS system described earlier built its recidivism risk assessment using data from just over 7,000 defendants. With data of this size, and with the model architectures involved, users can train and deploy such models on an ordinary home computer. This "small scale" makes traditional AI easy for organisations to train, deploy and use.

In terms of weaknesses, traditional AI is highly specialised, which limits its applicability. As its training method shows, different tasks are handled by different AI models, each of which must "learn" from a dataset built for that task. It is difficult to transfer a model's capability from one task to another. For example, to build an AI model that identifies the points of contention in a judgment, we need to build a dataset of judgments, annotate the points of contention in each, and then use it to train the model. If we then need an AI model that writes summaries of judgments, we need to build a "judgment - summary" dataset and train a new model. Different tasks, whether identifying points of contention or summarising judgments, thus require different labelled datasets and separately trained models. From this perspective, the marginal cost of deploying traditional AI does not fall: every new deployment requires new investment. Applying AI to a new task means developing a new model, often starting from the manual labour of labelling data and constructing a dataset. In the traditional AI era, therefore, AI was "small and beautiful", easy to develop and deploy, but its wider application faced cost constraints.

2.2 Generative artificial intelligence: a general-purpose intelligence tool

The emergence of ChatGPT marked significant progress in generative AI, which broke through the traditional AI paradigm in one fell swoop. Generative AI performs diverse tasks, responds to a wide variety of human instructions, and is highly versatile: the same model can be used for a wide range of tasks. It is therefore also regarded as a precursor of artificial general intelligence (AGI). In the era of generative AI, if you want AI to help process text or generate various types of video, you no longer need to train a model from scratch; you only need to give simple instructions to a large model. For example, if we want a summary of a judgment, we only need to input the judgment into a large language model and instruct it to generate a summary, and the model will output the summary we need.
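The instruction-based workflow described above can be sketched as follows. This is a minimal illustration rather than any vendor's API: `build_summary_prompt`, `summarise` and the `llm` callable are hypothetical names, and the prompt wording is an assumption. In practice `llm` would wrap whichever large language model a court has access to.

```python
from typing import Callable


def build_summary_prompt(judgment_text: str, max_words: int = 150) -> str:
    """Wrap a judgment in a plain-language instruction for a language model."""
    return (
        "You are assisting a court clerk.\n"
        f"Summarise the following judgment in at most {max_words} words, "
        "covering the parties, the dispute, and the outcome.\n\n"
        f"Judgment:\n{judgment_text}"
    )


def summarise(judgment_text: str, llm: Callable[[str], str]) -> str:
    """Send the prompt to any callable that maps a prompt to a reply."""
    return llm(build_summary_prompt(judgment_text))


if __name__ == "__main__":
    # Stand-in for a real model so that the sketch runs offline.
    def fake_llm(prompt: str) -> str:
        return f"[a summary would be generated from a {len(prompt)}-character prompt]"

    print(summarise("The plaintiff lent the defendant 10,000 yuan ...", fake_llm))
```

No dataset is built and no model is trained here: switching to a different task, such as identifying points of contention, only requires changing the instruction text.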

However, the versatility of generative artificial intelligence rests on a significant initial investment. Neither Sora nor ChatGPT represents a significant breakthrough in principle: both are based on statistical learning frameworks, predicting the next likely word from the prompt. It is the enormous amount of engineering behind large models that has allowed astonishing capabilities to "emerge". Take Meta's Llama 2 model as an example: its largest version has 70 billion parameters, and its training corpus contains 2 trillion tokens, with one English word corresponding to roughly 1.3 tokens. A person reading one million tokens every day would need more than 5,000 years to finish. Large language models thus have a huge number of parameters and require huge training sets, which makes training them very expensive, especially in computing costs. For example, training the Llama 2 models took 3,311,616 GPU hours on a graphics processing unit (GPU) cluster, which cost approximately $5 million in hardware (computing power) alone.
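The reading-time comparison above is simple division, sketched here using the text's own figures (2 trillion training tokens, one million tokens read per day). The 365-day year and the names are assumptions made for this illustration.

```python
TRAINING_TOKENS = 2_000_000_000_000   # 2 trillion tokens, as stated in the text
TOKENS_READ_PER_DAY = 1_000_000       # the text's reading-speed assumption
DAYS_PER_YEAR = 365                   # assumption: leap years ignored


def years_to_read(tokens: int, tokens_per_day: int) -> float:
    """Years needed to read a corpus at a fixed number of tokens per day."""
    return tokens / tokens_per_day / DAYS_PER_YEAR


if __name__ == "__main__":
    years = years_to_read(TRAINING_TOKENS, TOKENS_READ_PER_DAY)
    print(f"about {years:,.0f} years")
```

The result is roughly 5,500 years, the same order of magnitude as the figure given in the text.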

Therefore, in the era of generative artificial intelligence, ordinary institutions find it difficult to develop and deploy their own AI independently, as they could in the era of traditional AI. They usually need to collaborate with external parties to develop a specific application. On the one hand, the huge initial investment sets a high threshold for developing generative models, and ordinary institutions are basically unable to develop new ones. On the other hand, even if an institution uses an open-source generative model rather than developing its own, independent deployment is still very expensive because of the hardware required. Developing and deploying Sora's application scenarios, including in litigation source governance (detailed below), may be difficult to achieve with the courts' own investment alone.

2.3 The Integration of Traditional Artificial Intelligence and Generative Artificial Intelligence

The wave of generative artificial intelligence has only just begun, and there may not yet be a clear answer as to where artificial intelligence technology is heading. In the near future, traditional and generative artificial intelligence are likely to play to their respective strengths, complement each other, and become deeply integrated.

On the one hand, generative artificial intelligence can support the application of traditional artificial intelligence by providing it with input data. For example, the author's team took part in designing a traditional AI model to help determine whether parties are willing to mediate. To improve the model's effectiveness, team members tried adding new indicators. One was the emotional state of the parties during mediation, as reflected in their speech in the mediation recordings. On the traditional AI path, we would need to manually construct a dataset linking the audio features of the parties' mediation calls (such as pitch and speaking speed) to emotions (for example, slow speech indicating calm). Instead, team members fed the parties' call records into a large language model, which quickly judged the parties' emotional states (such as happiness or depression). They then input these emotional states into the traditional AI model, and adding this indicator significantly improved the model's ability to judge the parties' willingness to mediate.
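The hand-off described above, in which a generative model's judgment becomes one more input feature for a traditional model, might look roughly like this. It is a sketch under stated assumptions, not the team's actual pipeline: the emotion label set, the one-hot encoding, the use of a text transcript, and every function name are hypothetical.

```python
from typing import Callable, List, Sequence

# Hypothetical label set; the text mentions states such as happiness and depression.
EMOTION_LABELS: List[str] = ["calm", "happy", "depressed", "angry"]


def encode_emotion(label: str) -> List[float]:
    """One-hot encode an emotion label; unknown labels map to all zeros."""
    return [1.0 if label == known else 0.0 for known in EMOTION_LABELS]


def build_features(
    case_features: Sequence[float],
    transcript: str,
    classify_emotion: Callable[[str], str],
) -> List[float]:
    """Append the generative model's emotion judgment to the existing features."""
    emotion = classify_emotion(transcript)
    return list(case_features) + encode_emotion(emotion)


if __name__ == "__main__":
    # Stand-in for a large language model call so that the sketch runs offline.
    def fake_classifier(transcript: str) -> str:
        return "angry" if "!" in transcript else "calm"

    # The resulting vector would be passed to the traditional model.
    print(build_features([12000.0, 18.0], "I will never pay!", fake_classifier))
```

The traditional model itself is unchanged; it simply receives a longer feature vector, which is what makes this kind of combination cheap to try.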

On the other hand, generative artificial intelligence can in turn integrate traditional artificial intelligence by incorporating its outputs into prompts, further expanding the application scenarios of generative AI. For example, if the output of the COMPAS system (traditional AI), i.e. its assessment of whether the defendant should be paroled, is input into a generative AI model together with the judge's final ruling (i.e. whether parole is granted), the generative model can automatically draft the written ruling. Similarly, with the help of video generation models such as Sora, corresponding video records can be generated for archiving or public supervision.

3. The potential application of Sora in the governance of litigation sources

Both traditional and generative artificial intelligence have a wide range of applications in litigation source governance, and in some areas they have already begun to play unexpected roles. As a generative AI, Sora has many potential application scenarios, concentrated in areas of litigation source governance where video has a significant effect, such as legal publicity, assisting case filing, and evidence presentation. In the long run, combining Sora with virtual reality (metaverse) technology may also create realistic virtual mediation scenes and improve the effectiveness of litigation source governance.

3.1 Auxiliary filing

Using generative artificial intelligence in the filing process essentially means using the historical data accumulated by the court to train generative AI models (such as ChatGPT or Sora) so as to improve the accuracy and efficiency of filing.

The classification of causes of action, the matching of causes of action to the factual descriptions in complaints, and the types and contents of evidence required for different causes of action are all historical data that the courts have already accumulated. These data are used to train the generative AI model. Once the model is trained, when parties present their case in the court's filing hall, their statements can be quickly converted into "instructions" (questions), and the model generates corresponding answers: which cause of action the case falls under, what the parties' claims are, what evidence is required, and what the key elements for trial are.

With the modernization of judicial work, many courts have digitized case materials, including complaints, defenses, documentary evidence and physical evidence, and stored them in PDF format on court servers. With the help of technologies such as Optical Character Recognition (OCR), the presiding judge can already search, query and run calculations on electronic documents. These electronically stored case data can be used to train a large language model. Once training is complete, when parties come to the court to file a case, the model can quickly indicate, based on what the parties say, what type of dispute is involved, what the parties are seeking, whether the parties have standing, and whether the case should go to mediation first or be filed directly for trial.

Compared with ChatGPT, Sora's role is to quickly convert certain filing materials, such as the plaintiff's description of what happened (for example, how a traffic accident occurred), into videos that are submitted as auxiliary filing materials to the judge of the court's filing division for review. Alternatively, the filing judge can use Sora, embedded in the court's filing management system, to quickly convert complex textual information into video and conduct further review based on it, so as to understand the case faster and better and to promptly file the case, divert it, or take other measures. As Sora matures, the court filing system might even generate videos automatically, based on the circumstances of the case, to assist the judge's assessment.

In the filing process, Sora's auxiliary role may seem simple, even a bit redundant: why do we need a video version of the complaint and evidence when we already have a written version? However, considering why TikTok is so popular around the world, and how short video has affected and empowered other industries and settings, Sora's potential to improve the efficiency and effectiveness of case filing through video should not be underestimated. For example, product pages on e-commerce platforms now show short videos, or even VR (virtual reality) videos, alongside the original text and image introductions, so that buyers can get a real sense of the texture and size of clothes, or the size and depth of rooms.

From the perspective of litigation source governance, Sora's auxiliary role in filing may be even more prominent. The reason is simple. A considerable portion of the disputes handled by people's mediation, industry mediation, grassroots mediation and other mediation organizations are "family feuds" and "trivial matters". For such disputes, auxiliary filing materials in video form, generated from the parties' accounts, are highly likely to be more vivid than written materials and easier for the filing personnel of mediation organizations to understand. Presenting core evidence as video, such as the scene of a traffic accident, can quickly help filing personnel grasp the specific scene and characteristics of the dispute. Based on those characteristics, they can then search for, push and match the mediation organizations, personnel and resources best suited to resolving the dispute across the "entire network" of the court's online mediation platform. This is one example of how Sora could bring convenience to the public and help modernize litigation source governance.

3.2 Promote reconciliation and mediation

3.2.1 Promoting mediation

Foreign research shows that pushing information about judgments in similar cases to the parties helps them understand the court's position in such cases, which encourages them to reach a settlement or mediation agreement. This approach has achieved some results in labor dispute mediation in Mexican courts. In China, the use of artificial intelligence to encourage parties to mediate is not yet widespread. In some courts, the filing hall reminds parties of litigation risk, with filing equipment displaying the risk of losing the lawsuit in concrete terms; in some social mediation organizations, computer systems send text messages informing parties of litigation risks and encouraging them to choose mediation. There are many similar practices, and many foreign studies of comparable measures to encourage parties to choose mediation or settlement. However, numerous as they are, these practices are all non-intelligent measures. For promoting mediation, various types of artificial intelligence have actual and potential application scenarios. In cases that call for visual presentation, Sora's effect in promoting mediation may be especially evident.

For example, traditional AI models can make some simple predictions based on the personal characteristics of the parties, such as gender, age and educational background, and the characteristics of the case, such as the amount, term and interest rate in a loan dispute. Given a party of a certain gender and a debt of a certain duration, what kind of language should the system (or mediator) use to resolve the dispute more effectively? Is it better to use "risk oriented" language, such as telling the party that the creditor will apply for compulsory enforcement if the money is not repaid, or "incentive oriented" language, such as telling the party how much principal and interest can be waived for a one-time repayment? Guided by the predictions of traditional AI models, people's courts and mediation organizations can use online mediation platforms to push text messages with differently framed wording to the parties. Combining wording with technology in this way helps resolve disputes through non-litigation means.
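The choice between "risk oriented" and "incentive oriented" wording described above can be pictured as picking the framing with the highest predicted chance of success. The sketch below is hypothetical throughout: the message templates, the dictionary fields and the `predict_success` callable (standing in for a trained traditional model) are all assumptions for illustration.

```python
from typing import Callable, Dict

# Illustrative message templates; real wording would come from the court or mediator.
FRAMINGS: Dict[str, str] = {
    "risk": "If the debt remains unpaid, the creditor may apply for compulsory enforcement.",
    "incentive": "A one-time repayment may qualify for a reduction of principal and interest.",
}


def choose_framing(
    party: Dict[str, object],
    case: Dict[str, object],
    predict_success: Callable[[Dict[str, object], Dict[str, object], str], float],
) -> str:
    """Return the framing with the highest predicted probability of settlement."""
    return max(FRAMINGS, key=lambda framing: predict_success(party, case, framing))


if __name__ == "__main__":
    # Stand-in predictor with made-up behaviour so that the sketch runs offline.
    def fake_predictor(party, case, framing) -> float:
        if framing == "incentive" and case["term_months"] <= 12:
            return 0.7
        return 0.4

    framing = choose_framing({"age": 35}, {"amount": 20000, "term_months": 6}, fake_predictor)
    print(framing, "->", FRAMINGS[framing])
```

Keeping the predictor as a separate callable means the same selection logic would work whether the prediction comes from a simple statistical model or from something more elaborate.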

The essence of using generative artificial intelligence to promote mediation is that the system can push mediation information to the parties more intelligently, and that the information pushed is richer and more vivid. For example, traditional AI can usually push only a single piece of information, such as the outcome of a similar case (who won and who lost) or the aggregate win rate across multiple similar cases (the plaintiff's win rate). With generative AI models like ChatGPT, online mediation platforms can obtain multiple types of information and push all of it to the parties at once, such as the plaintiff's win rate, the proportion of defendants who repay in a lump sum or in installments, and the amounts repaid in a lump sum (including whether the principal was discounted or interest reduced), helping the parties obtain more comprehensive and systematic information and reach a mediated agreement.

If Sora is applied to the mediation process, the vividness and realism of video may produce unexpected mediation effects. For example, when "risk oriented" wording is pushed to the parties by text message, the content is plain text. However the text depicts a given risk, such as ending up in prison for failing to repay, the impression it leaves on the parties will not be deep enough. However, if Sora transforms text (such as specific information about judgments and evidence in similar cases) into images and videos and then pushes them to the parties, then whether the wording is "risk oriented" or "incentive oriented", the visual quality (such as prisoners squatting in a cell), the three-dimensional quality (such as the high walls and barred windows of a prison), and the realism (such as a specific prison in the jurisdiction) of the mediation information will greatly enhance its effect on the parties and encourage both sides to resolve the dispute through mediation.

3.2.2 Mediation

Similarly, due to the realism brought by videos, generative artificial intelligence, especially artificial intelligence models like Sora that can generate videos, can greatly improve the visualization of the mediation process, and even create immersive real mediation scenes based on virtual reality technology (metaverse).

For example, during the mediation process, Sora could generate a video of the accident scene based on witness descriptions and simulate how the incident unfolded, helping mediators and parties understand the details of the case, much as e-commerce platforms use VR (virtual reality) to display products. Or, when the victim is unable to be present, Sora could create a virtual victim to recount the victim's experience, reducing the psychological pressure of testifying in person and avoiding further harm. Generative artificial intelligence models could even be used to create virtual intelligent mediators. Virtual mediators would not only hold simple conversations with the parties, as today's voice robots do, but could also engage in complex communication with them with the support of large language models. Sora could even generate a video version of a virtual mediator, which would conduct complex spoken communication with the parties through a specific on-screen persona.

At the most advanced stage, with the help of virtual reality technology, mediators could use Sora to create virtual mediation rooms that reflect their preferences and the virtual mediator image they want to present, and the parties could likewise use Sora to create the virtual image they want to present. These virtual images would then talk and mediate in the virtual mediation room, creating an immersive and realistic mediation scene. In other words, the mediator and the parties may each stay at home, wearing pajamas and slippers and lying on the sofa, and enter the virtual mediation room through their respective accounts to talk and mediate online. In the virtual mediation room, however, they may appear formally dressed in suits and leather shoes, or in an elegant Chinese robe, as if engaged in a solemn conversation.

3.3 Mediation publicity and management

In the filing process, Sora helps the parties file cases conveniently and helps courts and mediation organizations efficiently receive cases, triage them, and match them with mediation resources. This already touches on the potential application of Sora in mediation publicity and mediation management.

From the perspective of mediation publicity, for example, after a new law or regulation, judicial interpretation, or guiding case is issued, Sora's ability to convert text into video allows the court to quickly produce legal publicity videos, improving public legal awareness and lowering the threshold for legal services. For example, Sora can convert the provisions on citizens' rights and obligations in specific laws and judicial interpretations into publicity videos that quickly explain and promote those rights and obligations, which is also a potential application scenario for Sora.

Similarly, for typical cases that arise in people's mediation, industry mediation, social mediation, and court mediation, Sora could quickly generate publicity videos that focus on the key elements of the case, either from the text of mediation records and agreements, or from mediation records, recordings of the mediation process, and even virtual mediation room videos that have already been generated. This is also a potential application of Sora in mediation publicity.

From the perspective of mediation management, as court online mediation platforms expand in the future, especially toward society, industry, and the grassroots, the cases on these platforms will include not only mediation cases handled by courts but also mediation cases handled by various mediation organizations. Using generative artificial intelligence to achieve dynamic situational awareness of litigation source governance and to assist courts in making litigation source governance decisions will therefore also be a potential application scenario for generative artificial intelligence, including Sora.

In other words, as long as court staff engaged in litigation source governance and management can ask questions and provide prompts, the online mediation system can respond intelligently and even offer suggestions for improving the work. For example, when the president of a grassroots court logs into an online mediation platform and asks aloud about "the number, distribution, characteristics, and difficulties of property management disputes in this jurisdiction this month", a platform with embedded generative artificial intelligence will automatically provide the corresponding answer. If the platform is supported by Sora, the answer can include not only a text version but also a short video, allowing the questioner to grasp the state of litigation source governance quickly and vividly and to obtain corresponding countermeasures and suggestions.

4. The challenges and responses brought by Sora to the governance of litigation sources

In the governance of litigation sources, Sora has the potential applications described above, which leave considerable room for imagination. Whether these applications can be implemented, to what extent, and when all remain highly uncertain. Therefore, we should not overestimate the risks and challenges that Sora brings. However, with the accelerated development of artificial intelligence and the continuous alignment of Sora, a video generation model, with the preferences of the public, we also need to be mentally prepared and have contingency plans in place for the risks and challenges that Sora brings to the governance of litigation sources.

4.1 Challenges and responses to fair mediation

Sora can improve case filing efficiency, enhance the visibility of evidence (video evidence), protect the privacy of witnesses (virtual witnesses), and enhance the authenticity of virtual mediation, all of which rely on the higher dissemination effectiveness and influence of video media, as well as the ability of videos to highly reflect the real world. However, when the video world is highly integrated with the real world, and the audience cannot distinguish between the imagined world (video world) and the real world, Sora will bring a series of challenges to the fairness of mediation.

For example, highly realistic video production capabilities may be abused by parties to create misleading or false evidence, thereby disrupting the mediation process. If the threshold for using Sora becomes very low and anyone can use it (which is very likely in the future), just as everyone can now skillfully operate the "clip" app and produce TikTok short videos, then the parties can produce high-fidelity video evidence at low cost, which may affect the fairness of mediation.

Similarly, producing videos that involve specific individuals or events without authorization, especially without the consent of the persons concerned, may violate privacy rights. Video face-swapping technology, which was once hotly debated in society, produces results that are almost indistinguishable from real footage, and is a typical scenario in which video technology infringes on the privacy of the persons concerned and may even constitute a crime. Even where it does not constitute a crime or an infringement, using someone else's image in inappropriate settings and in a highly realistic way may blur the boundary between law and ethics, bringing new problems to the governance of litigation sources.

However, Sora is definitely not the first, and certainly not the last, technology to pose challenges to judicial work. The solutions to problems brought about by technological development often rely on further technological advances. For example, in 1965, when the Supreme Court of the United States ruled that live broadcasts of court proceedings interfered with fair trials, such broadcasts required large and cumbersome machines, wiring throughout the courtroom, extensive lighting, and manual operation by camera operators. Undoubtedly, so conspicuous a broadcast was a significant interference for judges and litigation participants.

Today, more than fifty years later, and especially in live broadcasts of court hearings in China, judges and parties can hardly feel the presence of the broadcast, apart from the almost invisible cameras fixed on the courtroom walls. Even the Chinese scholars who oppose live broadcasts of court hearings rarely adopt the logic of the 1965 US Supreme Court, because the times have changed tremendously. In 1965, live television was a new technology that significantly interfered with fair adjudication by American judges, and it met with resistance from the American justices. In the 21st century, the emergence of high-tech cameras, the development of live media technology, the improvement of internet speed, and the popularization of mobile phones have greatly reduced the anxiety of judges and the public about live broadcasts interfering with trials.

Similarly, Sora's ability to generate realistic videos will inevitably bring challenges and interference to trial and mediation work, and new problems to the governance of litigation sources. However, these problems will also be gradually solved as video recognition technology develops. Once the thresholds for both video production and video recognition have been lowered, and Sora has become a part of people's daily lives, the issues of false evidence, privacy rights, and ethics that we now imagine may no longer exist. The role of legal rules lies more in "identification" and "waiting": when technological development is not yet sufficient to solve the problems that technology creates, legal rules should prohibit or restrict the application of the technology; when technological development is sufficient to solve those problems, legal rules should embrace the application of the technology, or at least not hinder it, allowing the public to gradually adapt to the changes it brings until it becomes a part of ordinary people's daily lives.

4.2 Challenges and responses to the informatization of litigation source governance

Although generative artificial intelligence has a wide range of potential applications in litigation source governance, current practice in various regions shows that it is still at an exploratory stage. There are many reasons for this. Generative artificial intelligence requires large amounts of data, storage space, and computing power to make full use of its strengths. Sora, which can generate videos, works with a more diverse range of data dimensions, including sound, text, images, and time, and therefore requires even more storage space and computing power. Even if the resources of the entire court system were pooled, it might be difficult to support independent development of the relevant models and applications, or, from a commercial perspective, the investment might be too high to justify developing models and applications separately. In addition, legal professionals prefer text and close reading and have relatively little demand for sound, images, and video, which may also limit Sora's application in the judicial field.

However, to some extent, the governance of litigation sources is mass work, dealing with the everyday affairs of households and neighbors. The parties involved have a relatively higher demand for audio, video, and images, and place greater emphasis on the efficiency and speed of information dissemination. The generative AI technology represented by Sora is likely to be quickly accepted and used by the public. Just as TikTok short videos became popular around the world within a few years, the public will bring the new technology into courts and mediation, which may leave courts responding passively, investing passively, and building passively.

To address this challenge, the informatization of courts needs to further strengthen interconnectivity and reserve space for future technological development. At the same time, the court system also needs to manage outsourced technical services well, in order to prevent problems with data security and supplier security.

For example, the ongoing construction of "one network" for courts aims to form a nationwide unified trial management system and to facilitate the interconnection of trial data across the country. From the perspective of litigation source governance, "one network" should achieve interconnectivity not only among trial management systems but also with the online mediation platform system. In other words, whether for a filing judge or a trial judge, the trial management system they use needs to reserve interfaces for the relevant mediation functions. In this way, judges can use their trial management system to send cases that are not suitable for trial to the online mediation platform and match them with the platform's mediation resources. They can also retrieve cases that cannot be mediated or are not suitable for mediation, so that these cases re-enter the trial process, achieving dynamic integration of litigation and mediation.
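The two-way flow of cases between the trial management system and the online mediation platform can be sketched as a small state machine. This is only an illustration under stated assumptions: the stage names, the `ALLOWED` transition table, and the `Case` class are hypothetical and do not describe the actual interfaces of "one network" or of any mediation platform.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical stages a case can be in."""
    FILED = "filed"
    TRIAL = "trial"
    MEDIATION = "mediation"
    SETTLED = "settled"


# Assumed transition rules: a case can be sent from trial to mediation,
# and a case that cannot be mediated can be returned to trial.
ALLOWED = {
    Stage.FILED: {Stage.TRIAL, Stage.MEDIATION},
    Stage.TRIAL: {Stage.MEDIATION},
    Stage.MEDIATION: {Stage.TRIAL, Stage.SETTLED},
    Stage.SETTLED: set(),
}


class Case:
    """A case whose stage changes are checked against ALLOWED."""

    def __init__(self, case_id: str) -> None:
        self.case_id = case_id
        self.stage = Stage.FILED
        self.history = [Stage.FILED]  # record of every stage visited

    def move_to(self, target: Stage) -> None:
        """Move the case to a new stage, rejecting disallowed moves."""
        if target not in ALLOWED[self.stage]:
            raise ValueError(
                f"{self.case_id}: cannot move from "
                f"{self.stage.value} to {target.value}"
            )
        self.stage = target
        self.history.append(target)
```

A case could, for instance, go from filing to trial, be sent to mediation, and then be returned to trial when mediation fails; the recorded `history` is the kind of shared underlying data that both systems would need to see.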

At the same time, interconnection and interoperability between "one network" and online mediation platforms means not only connecting the interfaces of the two platforms and systems, as described above, but also connecting their underlying data. Only in this way can the various applications of generative artificial intelligence mentioned above, especially the intelligent management and dynamic situational awareness of litigation source governance, be truly realized.

Furthermore, given that generative artificial intelligence such as Sora depends on big data, large storage, and high computing power, the court will very likely need to rely on external third parties to develop Sora's applications in litigation source governance scenarios. In this situation, making good use of outsourcing service providers and managing them well becomes an important part of litigation source governance work. For example, the court may provide a massive amount of litigation and mediation file data to outsourcing service providers for training generative artificial intelligence models such as Sora. The court may also transmit the filing and evidence information of new trial and mediation cases to outsourcing service providers, so that the data moves from the internal network to the external network. Outsourcing service providers must therefore be able to ensure the privacy and security of data processing on the external network.

For example, many generative artificial intelligence models are open source, and open-source models usually do not raise traditional questions of model ownership. However, should the court claim ownership or similar rights over an artificial intelligence model trained on court data in cooperation with an outsourcing service provider, so that, with only limited investment of its own, it is not "choked" by the provider as the model is continuously upgraded and costs rise? To meet the challenges of cooperation between institutions and enterprises, technological upgrades, and service outsourcing in the era of large models, the court needs a sound system governing data outsourcing and service outsourcing.

4.3 Challenges and responses to the governance and management system of litigation sources

Even without the intervention of generative artificial intelligence, the internal and external relationships of litigation source governance have, from the court's perspective, undergone significant changes. Externally, under the institutional arrangement in which the Party Committee leads the governance of litigation sources, the court needs to reach out and connect with people's mediation organizations, industry mediation organizations, and social mediation organizations, thereby expanding the traditional boundaries of a trial organization. Internally, under the concepts of diversified dispute resolution, litigation-mediation coordination, and one-stop governance, the functions of the court's filing division have been greatly strengthened. The grand and modern litigation service halls of courts in various regions are a manifestation of this adjustment of relationships within the courts.

With the support of future generative artificial intelligence, especially Sora's video generation technology, the adjustment of the court's internal and external boundaries described above will become more apparent. For example, in addition to connecting the mediation systems of courts across the country, online mediation platforms have already brought in a large number of social mediation organizations, industry mediation organizations, and grassroots mediation organizations. Litigation-mediation coordination and one-stop diversified dispute resolution are no longer as simple as a few industry mediation organizations being stationed in the litigation service hall of a county or district court, or a court's chief filing judge regularly holding a litigation source governance coordination meeting with the People's Mediation Committee. As online mediation platforms expand vigorously, an intellectual property dispute in Nanshan District, Shenzhen may be recommended by the platform to a talkative "Northeast uncle" in Changchun for mediation, thereby achieving nationwide coordination, allocation, and optimization of mediation resources under the guidance of the court.

As generative artificial intelligence becomes more intelligent, Sora's video capabilities become stronger, and online mediation platforms become more convenient and intelligent, the relationship between courts and other mediation organizations will become more complex. For example, as online platforms expand, the issues of false evidence, video privacy, and ethics mentioned earlier will make it harder for a court to coordinate its relationships with other mediation organizations, mediators, and mediation cases nationwide. The Nanshan District Court in Shenzhen not only needs to coordinate online with the "Northeast uncle" mediator, but may also need to handle online many events and issues that it has never encountered before. In addition, because technical services are outsourced, how to handle the relationship between the court and outsourcing service providers, and how to handle a series of technical issues such as internal and external networks, court data and external (processed) data, and open-source models and trained models, is also a challenge for the litigation source governance management system.

Summary

Developing the productive forces requires relations of production that are compatible with them. Likewise, advances in litigation source governance technology require a corresponding transformation of management mechanisms. In this sense, the court's litigation source governance management system is facing a profound demand for change. The development of generative artificial intelligence, especially Sora's video generation technology, is becoming a catalyst for the transformation of that system. How to change, when to change, and how much to change depend, of course, on how fast generative artificial intelligence technology develops and how fast it is applied in the governance of court litigation sources. Judging from the current state of these two factors, it may not yet be urgent to establish a "Litigation Source Governance Office" within the court to coordinate the relationship between the court and external mediation organizations, the relationship between the court's filing division and its other divisions, and the relationship between the court and outsourcing service providers. However, as generative artificial intelligence, including Sora, penetrates the governance of court litigation sources, the transformation of internal management systems, the improvement of management rules, and the establishment of management institutions will inevitably be placed on the agenda.

The original text was published in China Applied Law, Issue 2, 2024. Reprinted with authorization from the WeChat official account of China Applied Law.