Systems, Man and Cybernetics, Part B, IEEE Transactions on 34 (2), 834-44 (2004)
The paper introduces a rough set technique for mining Pinyin-to-character (PTC) conversion rules. It first presents a text-structuring method that constructs a language information table for each pinyin from a corpus, and applies this method to a free-form textual corpus. Data generalization and rule extraction algorithms are then used to eliminate redundant information and extract consistent PTC conversion rules. The design of the model also addresses a number of important issues, such as the long-distance dependency problem, the storage requirements of the rule base, and the consistency of the extracted rules. The performance of the extracted rules and the effects of different model parameters are evaluated experimentally. The results show that, with the smoothing method, high conversion precision (0.947) and recall (0.84) can be achieved even for rules represented directly by pinyin rather than words. A comparison with the baseline tri-gram model also shows that the method and the tri-gram language model complement each other well.
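As a rough illustration of what "consistent" rules could mean here, the sketch below treats a language information table as rows of condition attributes plus a decision (the character), and keeps only the conditions that always lead to the same decision. The attribute layout (previous and next pinyin), the sample rows, and the function name `extract_consistent_rules` are assumptions made for illustration; the abstract does not specify the paper's actual table design or algorithms.

```python
from collections import defaultdict

def extract_consistent_rules(rows):
    """Keep conditions that map to exactly one decision.

    rows: iterable of (condition_tuple, decision) pairs.
    Returns a dict {condition_tuple: decision}.
    """
    decisions = defaultdict(set)
    for condition, decision in rows:
        decisions[condition].add(decision)
    # A condition is consistent when every row sharing it agrees on the decision.
    return {cond: next(iter(ds)) for cond, ds in decisions.items() if len(ds) == 1}

# Hypothetical table for the pinyin "shi":
# (previous pinyin, next pinyin) -> observed character.
table = [
    (("lao", "men"), "师"),
    (("lao", "men"), "师"),  # duplicate row, collapsed by the set
    (("kao", "le"), "试"),
    (("de", "hou"), "时"),
    (("de", "hou"), "事"),   # conflicts with the row above, so no rule is kept
]

rules = extract_consistent_rules(table)
```

Duplicate rows collapse into a single rule, which hints at why removing redundant information reduces the size of the rule base.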
Speech and Audio Processing, IEEE Transactions on 8 (1), 76-84 (2000)
Multispan language modeling refers to the integration of various constraints, both local and global, present in the language. It was recently proposed to capture global constraints through the use of latent semantic analysis, while taking local constraints into account via the usual n-gram approach. This has led to several families of data-driven, multispan language models for large vocabulary speech recognition. Because of the inherent complementarity in the two types of constraints, the multispan performance, as measured by perplexity, has been shown to compare favorably with the corresponding n-gram performance. The objective of this work is to characterize the behavior of such multispan modeling in actual recognition. Major implementation issues are addressed, including search integration and context scope selection. Experiments are conducted on a subset of the Wall Street Journal (WSJ) speaker-independent, 20000-word vocabulary, continuous speech task. Results show that, compared to standard n-gram, the multispan framework can lead to a reduction in average word error rate of over 20%. The paper concludes with a discussion of intrinsic multispan tradeoffs, such as the influence of training data selection on the resulting performance.
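To make the idea of combining a local and a global score concrete, here is a minimal sketch that mixes two per-word probabilities and computes perplexity over a word sequence. It uses plain linear interpolation as a stand-in, which is not necessarily the integration used in the paper; the weight and all probability values are invented for illustration.

```python
import math

def interpolate(p_ngram, p_global, weight=0.5):
    """Linear mix of a local (n-gram) and a global (e.g. LSA-based) probability."""
    return weight * p_ngram + (1.0 - weight) * p_global

def perplexity(probs):
    """Perplexity of a sequence given the probability assigned to each word."""
    log_sum = sum(math.log(p) for p in probs)
    return math.exp(-log_sum / len(probs))

# Hypothetical per-word probabilities from the two models.
ngram_probs = [0.10, 0.02, 0.30]
global_probs = [0.20, 0.08, 0.10]

mixed = [interpolate(a, b) for a, b in zip(ngram_probs, global_probs)]
mixed_perplexity = perplexity(mixed)
```

Lower perplexity means the model assigns higher probability to the observed words, which is the sense in which the abstract compares multispan and n-gram performance.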
Due to the emergence of SMS messages, the significance of effective text entry on limited-size keyboards has increased. In this paper, we describe and discuss a new method to enter text more efficiently using a mobile telephone keyboard. This method, which we called HMS, predicts words from a sequence of keystrokes using a dictionary and a function combining bigram frequencies and word length. We implemented the HMS text entry method on a software-simulated mobile telephone keyboard and we...
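The general idea of dictionary-based prediction from keystrokes can be sketched as follows. This is not the HMS method itself: the abstract does not give the scoring function, so the sketch ranks candidates by bigram count with the previous word only and leaves out the word-length term. The keypad layout is the standard telephone letter mapping; the word list, the counts, and the function names are hypothetical.

```python
from collections import defaultdict

# Standard telephone keypad letter groups.
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}

def to_keys(word):
    """Convert a word to the digit sequence that types it."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())

def build_index(words):
    """Map each digit sequence to the dictionary words it can produce."""
    index = defaultdict(list)
    for word in words:
        index[to_keys(word)].append(word)
    return index

def predict(keys, previous_word, index, bigram_counts):
    """Rank candidates by bigram count with the previous word (ties: alphabetical)."""
    candidates = index.get(keys, [])
    return sorted(candidates,
                  key=lambda w: (-bigram_counts.get((previous_word, w), 0), w))

# Hypothetical dictionary and bigram counts.
index = build_index(["good", "home", "gone", "hood"])
bigrams = {("go", "home"): 12, ("is", "good"): 9, ("is", "gone"): 4}

after_go = predict("4663", "go", index, bigrams)
after_is = predict("4663", "is", index, bigrams)
```

All four sample words share the key sequence 4663, so the previous word is what decides which candidate is offered first.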