ZeroAI Project Architecture and Core Concepts
Project link: zeroai.haozheli.com
This project uses machine learning and natural language processing to determine whether a given text was generated by AI or written by a human. The sections below describe the overall architecture, the function of each module, and the core process.
What is ZeroAI
ZeroAI is a machine learning-based classifier that predicts whether a piece of text was generated by AI and reports the probability of AI generation. Compared with other neural network (NN)-based classifiers on the market, ZeroAI generalizes well, is less affected by language, and can judge text sentence by sentence.
Working Principle
General Idea
If you often use tools like ChatGPT, you may have developed some ability to distinguish between AI-generated and human-written text. AI-generated text usually has the following characteristics:
- Little diversity: the language style stays uniform throughout.
- A three-part structure: introduction, main points, and conclusion.
- Strong logic but little personal expression.
These characteristics are a useful reference, but they are not sufficient on their own to decide whether a text was generated by AI. We therefore use machine learning to capture subtler features in the text and make more accurate predictions.
ZeroAI uses a pre-trained GPT-2 model to calculate the perplexity of the text, a measure of how predictable the text is to the model, which we use as an indicator of the text's tone and content diversity. Each input text is reduced to a set of per-sentence perplexity values, and a machine learning model then learns from these features to predict whether the text was written by AI.
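As a rough illustration of the quantity involved: perplexity is the exponential of the average negative log-probability that a language model assigns to each token. The sketch below computes it from a list of per-token log-probabilities. In ZeroAI these would come from GPT-2; here the inputs are made-up numbers, and the function name `perplexity` is an assumption for illustration, not the project's actual code.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp(-mean(log p(token | preceding tokens))).

    `token_log_probs` holds natural-log probabilities, one per token.
    In a real pipeline a language model such as GPT-2 would supply them;
    this helper and its name are illustrative assumptions.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Made-up values: a predictable sentence (every token has probability 0.5)
# versus a surprising one (every token has probability 0.01).
predictable = [math.log(0.5)] * 4
surprising = [math.log(0.01)] * 4

print(perplexity(predictable))
print(perplexity(surprising))
```

Lower perplexity means the model found the text more predictable. In this made-up example the first list gives a perplexity of about 2 and the second about 100.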
Core Process
- Text Input: The user submits the text to be detected.
- Text Preprocessing: Clean and segment the text to ensure input validity.
- Perplexity Calculation: Use the GPT-2 model to calculate the perplexity of each sentence.
- Feature Extraction: Compute statistics such as the average perplexity, the maximum perplexity, the number of sentences, and the contiguous low-perplexity fragments.
- Machine Learning Judgment: Feed the extracted features into a random forest model, which outputs the probability that the text was generated by AI.
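The preprocessing and feature-extraction steps above can be sketched as follows. This is a minimal illustration, not ZeroAI's actual code: the naive sentence splitter, the function names, the `LOW_PPL_THRESHOLD` value, and the choice of "longest run" to summarize the low-perplexity fragments are all assumptions, and the per-sentence perplexities are passed in directly rather than computed with GPT-2.

```python
import re
from statistics import mean
from typing import Dict, List, Sequence

# Hypothetical cutoff below which a sentence counts as "low perplexity".
LOW_PPL_THRESHOLD = 30.0


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., !, or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def longest_low_run(ppls: Sequence[float], threshold: float) -> int:
    """Length of the longest run of consecutive values below threshold."""
    best = current = 0
    for p in ppls:
        current = current + 1 if p < threshold else 0
        best = max(best, current)
    return best


def extract_features(sentence_ppls: Sequence[float]) -> Dict[str, float]:
    """Summarize per-sentence perplexities into a small feature dict."""
    if not sentence_ppls:
        raise ValueError("need at least one sentence perplexity")
    return {
        "avg_ppl": mean(sentence_ppls),
        "max_ppl": max(sentence_ppls),
        "num_sentences": len(sentence_ppls),
        "longest_low_run": longest_low_run(sentence_ppls, LOW_PPL_THRESHOLD),
    }


sentences = split_sentences("First one. Second one! Is this the third?")
# Made-up perplexities, one per sentence above.
features = extract_features([12.0, 18.0, 95.0])
print(len(sentences), features)
```

A feature dict like this would then be turned into a vector and passed to the random forest model, which returns the probability that the text is AI-generated.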
Accuracy
ZeroAI reaches 97% accuracy in English testing. The detailed test results are as follows:
                  precision    recall    f1-score
    Human              0.97      0.97        0.97
    AI                 0.97      0.97        0.97
    accuracy                                 0.97
    macro avg          0.97      0.97        0.97
    weighted avg       0.97      0.97        0.97
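For readers unfamiliar with these columns: the f1-score is the harmonic mean of precision and recall. The short sketch below checks this against the rounded English figures; the helper name `f1_from_pr` is hypothetical and is not part of ZeroAI.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Using the rounded English figures from the table above.
english_f1 = f1_from_pr(0.97, 0.97)
print(round(english_f1, 2))
```

Because the published figures are already rounded to two decimals, recomputing an f1-score from them can differ from the reported value in the last digit.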
We have recently released the ZeroAI Chinese Beta model, which is fine-tuned for Chinese sentences and also reaches 97% accuracy. The test results are as follows:
                  precision    recall    f1-score
    Human              1.00      0.95        0.98
    AI                 0.94      1.00        0.97
    accuracy                                 0.97
    macro avg          0.97      0.98        0.97
    weighted avg       0.97      0.97        0.97
The Beta model performs well in Chinese testing, but its results may fluctuate, and it does not support sentence-by-sentence analysis. We will keep updating and iterating on the Chinese model.
Training and Open Source
ZeroAI's code is open source on GitHub; users are free to download it and build on it.
FAQ
Q: How do I use the ZeroAI classifier?
A: Visit zeroai.haozheli.com, paste or type your content into the input box, and click the "Analyze" button to start the analysis.
Q: How do I understand ZeroAI's analysis results?
A: ZeroAI will output the following for your reference:
- Label: The text classification result, displayed as AI-generated or human-written.
- Likelihood Score: The probability that the text was generated by AI, ranging from 0% to 100%.
If you click to view the detailed report, you can also get:
- The number of sentences determined to be AI-written.
- The average perplexity value.
- Sentence-by-sentence analysis results, highlighting potential AI-generated sentences.
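One way to picture the output fields listed above is as a small data structure. The sketch below is purely illustrative: the class names, field names, and the 0.5 decision threshold are assumptions, and the source does not describe how ZeroAI actually represents its report or where it draws the line between the two labels.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical decision threshold; the real one is not documented.
AI_THRESHOLD = 0.5


@dataclass
class SentenceResult:
    text: str
    perplexity: float
    flagged_as_ai: bool  # True if this sentence would be highlighted


@dataclass
class Report:
    ai_probability: float  # between 0.0 and 1.0
    sentences: List[SentenceResult] = field(default_factory=list)

    @property
    def label(self) -> str:
        is_ai = self.ai_probability >= AI_THRESHOLD
        return "AI-generated" if is_ai else "Human-written"

    @property
    def likelihood_score(self) -> str:
        # Shown to the user as a percentage between 0% and 100%.
        return f"{self.ai_probability * 100:.0f}%"

    @property
    def ai_sentence_count(self) -> int:
        return sum(1 for s in self.sentences if s.flagged_as_ai)

    @property
    def average_perplexity(self) -> float:
        if not self.sentences:
            return 0.0
        return sum(s.perplexity for s in self.sentences) / len(self.sentences)


# Made-up example values.
report = Report(
    ai_probability=0.75,
    sentences=[
        SentenceResult("A very predictable sentence.", 10.0, True),
        SentenceResult("An odd, rambling aside!", 50.0, False),
    ],
)
print(report.label, report.likelihood_score)
print(report.ai_sentence_count, report.average_perplexity)
```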
Q: What is ZeroAI's accuracy rate?
A: ZeroAI reached 97% accuracy in both the English and Chinese tests; the detailed figures are in the "Accuracy" section. Note, however, that this tool bases its analysis on statistical patterns, so its results should be read as probability indicators, not absolute judgments.
Q: Can ZeroAI recognize Chinese?
A: Yes. If the input text is detected as Chinese, ZeroAI switches to the Chinese Beta model for detection. Note that this model is still in Beta: its accuracy may fluctuate, and it does not support sentence-by-sentence analysis.
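The source does not say how Chinese input is detected. One simple approach, shown here only as an assumption, is to measure the share of characters that fall in the CJK Unified Ideographs block and route the text accordingly. The function names, the model names, and the 0.5 threshold are all hypothetical.

```python
def chinese_ratio(text: str) -> float:
    """Fraction of non-whitespace characters in U+4E00..U+9FFF."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)


def pick_model(text: str, threshold: float = 0.5) -> str:
    """Return a hypothetical model name based on the share of Chinese characters."""
    return "chinese-beta" if chinese_ratio(text) >= threshold else "english"


print(pick_model("This sentence is written in English."))
print(pick_model("这是一段中文文本。"))
```

A ratio-based check like this tolerates a few Latin characters or punctuation marks inside otherwise Chinese text, which a strict all-or-nothing test would not.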