    SciTechDaily

    Linguistics Research May Improve Future Internet Search Engines

    By Max Planck Institute, July 17, 2012
    Symbolic representation of how language can be depicted as a binary sequence, making it possible to draw conclusions about the content of the text. Credit: Gianluca Costantini

    By translating ten different English texts into various codes, scientists from the Max Planck Institute for the Physics of Complex Systems found long-range correlations not only between letters but also at higher linguistic levels, such as words.

    Human beings can convert complex phenomena into a one-dimensional sequence of letters and put it down in writing. In this process, keywords serve to convey the content of the text. Eduardo Altmann and his colleagues at the Max Planck Institute for the Physics of Complex Systems used statistical methods to study how letters and words correlate with the subject of a text. They discovered that what marks a word as a keyword is not that it appears very frequently in a text, but that it appears in greater numbers only at certain points in the text. They also found relationships between sections of text that are far apart from each other, in the sense that these sections preferentially use the same words and letters.

    The Dresden-based scientists mathematically studied the semantic properties of texts by translating ten different English texts into various codes. One of the chosen texts was the English edition of Leo Tolstoy’s “War and Peace.”

    In one example, the scientists translated the letters of a text into a binary sequence, replacing every vowel with 1 and every consonant with 0. Using additional mathematical functions, they examined different levels of the text, from individual vowels and letters to whole words, each translated into its own code. This made it possible to identify patterns that repeat across the text as a whole. Such correlation within a text is referred to as long-range correlation: it indicates whether two letters located at arbitrarily distant points in the text are statistically connected with each other. For example, when we find the letter “W” at a certain point, there is a measurably higher probability of finding the letter “W” again a few pages later. “Understandably enough, if a certain point in the book talks about war, there is a high probability that the word ‘war’ will also appear a few pages later. What is surprising is that we also find this higher probability at the level of individual letters,” says Altmann.
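The vowel/consonant encoding and the idea of a correlation between distant positions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `to_binary` and `correlation` are hypothetical, "y" is treated as a consonant, non-letters are dropped, and the correlation at distance d is taken to be the standard autocorrelation C(d) = ⟨x_i x_{i+d}⟩ − ⟨x⟩², which may differ from the estimator used in the paper.

```python
def to_binary(text: str) -> list[int]:
    """Hypothetical helper: map each ASCII letter to 1 (vowel) or 0 (consonant).

    Assumption: only a, e, i, o, u count as vowels ("y" is a consonant),
    and spaces, digits and punctuation are simply skipped.
    """
    vowels = set("aeiou")
    return [1 if ch in vowels else 0
            for ch in text.lower()
            if ch.isascii() and ch.isalpha()]


def correlation(x: list[int], d: int) -> float:
    """Autocorrelation C(d) = <x_i * x_{i+d}> - <x>^2 of a 0/1 sequence.

    A positive value means that two symbols d positions apart are more
    likely to match (both 1) than they would be if they were independent.
    """
    n = len(x)
    if not 0 < d < n:
        raise ValueError("distance d must satisfy 0 < d < len(x)")
    mean = sum(x) / n
    # Average product over all pairs of positions separated by d.
    pairs = sum(x[i] * x[i + d] for i in range(n - d)) / (n - d)
    return pairs - mean * mean
```

For example, `to_binary("War and Peace")` yields `[0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1]`. On a long text, a C(d) that stays positive and decays only slowly as d grows is the usual signature of long-range correlation; if the same letters are shuffled at random, C(d) fluctuates near zero.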

    Keywords are more frequent in certain passages of text

    The scientists found this long-range correlation not only between letters but also at higher linguistic levels, such as words. Within each level, the correlation persists across different texts. “What we find much more interesting is to examine how the correlation changes between the levels,” says Altmann. Long-range correlation allows the scientists to draw conclusions about the extent to which certain words are connected to a topic. “Even the connection between a word and the letters it is composed of can be analyzed in this way,” explains Altmann.

    Furthermore, the scientists studied what is known as “burstiness,” which describes whether a pattern of characters occurs with increased frequency in a particular passage of text. It shows, for instance, whether a word comes up more often in one section of a text than elsewhere. The more frequently a word is used within a particular passage, the more likely it is to be representative of a certain subject.
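One common way to put a number on burstiness is to look at the gaps between successive occurrences of a word. The sketch below uses the Goh–Barabási coefficient B = (σ − μ)/(σ + μ), where μ and σ are the mean and standard deviation of those gaps. This is a swapped-in, widely used measure chosen for illustration and is not necessarily the one used in the study; the helper names `positions` and `burstiness` are hypothetical.

```python
import statistics


def positions(tokens: list[str], word: str) -> list[int]:
    """Indices at which `word` occurs in a tokenized text."""
    return [i for i, tok in enumerate(tokens) if tok == word]


def burstiness(pos: list[int]) -> float:
    """Goh-Barabasi burstiness B = (sigma - mu) / (sigma + mu) of the gaps
    between successive occurrences (positions must be strictly increasing).

    B close to -1: evenly spaced occurrences (perfectly regular).
    B close to 0:  random, Poisson-like spacing.
    B > 0:         occurrences clustered in bursts.
    """
    if len(pos) < 3:
        raise ValueError("need at least three occurrences (two gaps)")
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    mu = statistics.mean(gaps)
    sigma = statistics.pstdev(gaps)  # population standard deviation
    return (sigma - mu) / (sigma + mu)
```

A word that appears every ten tokens, at positions `[0, 10, 20, 30]`, scores exactly −1, while one that appears in two tight clusters, at `[0, 1, 2, 100, 101, 102]`, scores above zero.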

    The scientists demonstrated that certain words recur throughout a text but do not appear in bursts in any given passage. Although these words do exhibit long-range correlation, they are not closely related to the topic at hand. “Articles are the best examples of these. They come up very frequently in every text, but they are not crucial in conveying a given topic,” says Altmann.

    Statistical text analysis works irrespective of language

    Whereas both letters and words exhibit long-range correlation, letters rarely appear in bursts at certain points in a text. “It is, in fact, very rare for a letter to be as closely connected with a topic as the word it forms a part of. In a manner of speaking, letters can be used more flexibly,” explains Altmann. An “a,” for example, can be part of a great many words that have nothing to do with any single topic.

    The scientists used statistical text analysis as an easy way to identify the defining words of a given text. “In doing so, it is completely irrelevant which language the text is written in. The only thing that matters is the story, not language-specific rules,” says Altmann. Their findings could be used in the future to improve Internet search engines, and could also help in analyzing texts and identifying plagiarism.
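To illustrate how burstiness could single out defining words without any language-specific rules, here is a minimal sketch that scores every sufficiently frequent word by the Goh–Barabási coefficient of its inter-occurrence gaps and sorts the results. It is an assumption-laden toy, not the study's method: the function name `burstiness_ranking` and the `min_count` threshold are hypothetical, and tokenizing on Unicode word characters (`\w+`) only works for scripts that separate words with spaces or punctuation.

```python
import re
import statistics
from collections import defaultdict


def burstiness_ranking(text: str, min_count: int = 5) -> list[tuple[str, float]]:
    """Rank words from most to least bursty (hypothetical keyword heuristic).

    Words occurring fewer than max(min_count, 3) times are skipped, since
    their gap statistics are too unreliable to score.
    """
    # No stop-word lists or grammar rules: only word positions are used.
    tokens = re.findall(r"\w+", text.lower())
    where: dict[str, list[int]] = defaultdict(list)
    for i, tok in enumerate(tokens):
        where[tok].append(i)

    scores = []
    for word, pos in where.items():
        if len(pos) < max(min_count, 3):
            continue
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        mu = statistics.mean(gaps)
        sigma = statistics.pstdev(gaps)
        # Goh-Barabasi coefficient: > 0 means clustered, -1 means evenly spaced.
        scores.append((word, (sigma - mu) / (sigma + mu)))

    scores.sort(key=lambda ws: ws[1], reverse=True)
    return scores
```

On a toy text in which "the" recurs every four words and "war" appears only in two tight clusters, "war" ranks first with a positive score, while "the" scores negative. This mirrors the article's point that frequent but evenly spread words, such as articles, are poor keywords.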

    Reference: “On the origin of long-range correlations in texts” by Eduardo G. Altmann, Giampaolo Cristadoro and Mirko Degli Esposti, 2 July 2012, Proceedings of the National Academy of Sciences.
    DOI: 10.1073/pnas.1117723109
