TXT Chapter Splitting (Regular Expressions)

Why Split TXT

TXT is a plain text format that does not inherently include chapter information. If imported directly, some features within the application (like chapter navigation) may not work properly.

From both a design and user perspective, performing a simple chapter split before importing helps enhance the subsequent reading experience.

How to Split

Rodel Reader uses regular expressions to match characteristic chapter titles. Every time a title matching the pattern is found, a split is made, thus dividing the entire book.

Afterwards, the entire TXT file is split and combined according to predefined rules to form an EPUB file, which is then imported into the library. Therefore, what you read later is not the original TXT but this EPUB file.

If you have some basic knowledge of regular expressions, chapter splitting will be much easier for you.

If you don't know anything about regular expressions, no worries. Thanks to AI advancements, you can query any LLM to get a regular expression. For specific operations, see AI Generating Regular Expressions.

Built-in Rules

To simplify operations, Rodel Reader includes some basic chapter matching rules (which will be continuously improved):

Chinese Chapters:
Expression: ^(第)([―－\-─—壹贰叁肆伍陆柒捌玖一二两三四五六七八九十○零百千O0-9０-９]{1,12})([章节節回集卷部篇])(.*)
Description: This expression is suitable for matching common chapter-style titles, supporting the vast majority of online literature and traditional Chinese novels.
Numbered Chapters:
Expression: ^\d+\s*[\.\)\-\–\—\、\。\，\；]*[\s\u3000]*.+
Description: This expression can match formats like Number + Punctuation + Chapter Name, for example, 1. xxxxxx
English Chapters:
Expression: [Cc]hapter\s+(\d+|[IVXLCDM]+)(.*)
Description: This expression is mainly used for matching standard English chapter titles, such as Chapter 1 xxxxxx

Custom Rules

The built-in rules obviously cannot handle all situations, so you may need to write your own regular expressions if necessary.

You can view basic regular expression rules at Regular Expression - Syntax. This document focuses on the process with a simple example:

Use Notepad or another text editor to open the book you want to import (e.g., "Harry Potter")

We can see that its chapter title structure is CHAPTER xxx
Write a simple expression based on the chapter characteristics
We can replace the specific chapter number and title with placeholders .*, which is a simple matching pattern.
This gives us an expression like:
CHAPTER\s(.*)
where \s represents a space.
Input it in the APP to verify the result

From the result, our expression is correct, and it helps us properly split all the chapters.

AI Generating Regular Expressions

Now with AI, you can use regular expressions to split TXT novels like a pro even without understanding the rules of regular expressions.

Use your preferred AI service, modify the following prompt based on your actual situation, and give it to the AI.

TIP

If the regular expression result returned by AI is not as expected, try providing more chapter titles so that AI can better identify the title pattern.

markdown

Generate a regular expression that can be used in C# based on the following titles (only the expression, no code needed) to match similarly structured chapter titles, no grouping required, keep it as simple as possible:

```
{Your chapter title}
```

For example, for the above example of "Harry Potter", the modified prompt would be:

markdown

Generate a regular expression that can be used in C# based on the following titles (only the expression, no code needed) to match similarly structured chapter titles, no grouping required, keep it as simple as possible:

```
CHAPTER ONE
```

Give it to OpenAI, and its response would be:

plaintext

^CHAPTER [A-Z]+$

Plug it into the APP and try it out, no problem.

TXT Chapter Splitting (Regular Expressions) ​

Why Split TXT ​

How to Split ​

Built-in Rules ​

Custom Rules ​