Sentence splitting
Splitting a text into a sequence of sentences
Sentence splitting is the process of converting a text into a sequence of sentences.
Sentence splitting is an essential step in machine translation, where it is used to build parallel data aligned at the sentence level. Sentence-level alignment is necessary to make training efficient.
Common approaches to sentence splitting include rule-based methods, such as regular expressions, and machine learning models.
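As an illustration of the rule-based approach, the following is a minimal sketch in Python, not a production splitter. It assumes English-like text in which a sentence ends with ".", "!" or "?" followed by whitespace and an uppercase letter. The function name split_sentences and the small ABBREVIATIONS set are hypothetical and chosen only for this example.

```python
import re

# Hypothetical, deliberately small abbreviation list; real rule sets are much longer.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "prof.", "e.g.", "i.e.", "etc."}

# Candidate boundary: whitespace preceded by ., ! or ? and followed by an uppercase letter.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences using one regular expression and an abbreviation check."""
    sentences = []
    start = 0
    for match in _BOUNDARY.finditer(text):
        candidate = text[start:match.start()]
        # Skip boundaries that directly follow a known abbreviation such as "Dr.".
        last_token = candidate.split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue
        sentences.append(candidate.strip())
        start = match.end()
    # Whatever remains after the last accepted boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    for sentence in split_sentences("Dr. Smith arrived late. He apologized! Was it enough? Probably."):
        print(sentence)
```

Rules like these are easy to write and inspect, but they miss cases such as sentences that start with a lowercase letter or abbreviations that are not in the list. Cases of this kind are one reason machine learning models are also used.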
Sentence splitting is also called sentence segmentation or sentence boundary detection.