Close Menu
    Facebook X (Twitter) Instagram
    SciTechDaily
    • Biology
    • Chemistry
    • Earth
    • Health
    • Physics
    • Science
    • Space
    • Technology
    Facebook X (Twitter) Pinterest YouTube RSS
    SciTechDaily
    Home»Technology»Theoretical Breakthrough at MIT Could Boost Data Storage
    Technology

    Theoretical Breakthrough at MIT Could Boost Data Storage

    By Steve Nadis, MIT CSAILNovember 21, 20211 Comment5 Mins Read
    Facebook Twitter Pinterest Telegram LinkedIn WhatsApp Email Reddit
    Share
    Facebook Twitter LinkedIn Pinterest Telegram Email Reddit
    Computer Data Center
    New discovery related to linear-probing hash tables could improve data storage and retrieval efficiency in computers.

    New work on linear-probing hash tables from MIT CSAIL could lead to more efficient data storage and retrieval in computers.

    A trio of researchers that includes William Kuszmaul — a computer science PhD student at MIT — has made a discovery that could lead to more efficient data storage and retrieval in computers.

    The team’s findings relate to so-called “linear-probing hash tables,” which were introduced in 1954 and are among the oldest, simplest, and fastest data structures available today. Data structures provide ways of organizing and storing data in computers, with hash tables being one of the most commonly utilized approaches. In a linear-probing hash table, the positions in which information can be stored lie along a linear array.

    Suppose, for instance, that a database is designed to store the Social Security numbers of 10,000 people, Kuszmaul suggests. “We take your Social Security number, x, and we’ll then compute the hash function of x, h(x), which gives you a random number between one and 10,000.” The next step is to take that random number, h(x), go to that position in the array, and put x, the Social Security number, into that spot.

    If there’s already something occupying that spot, Kuszmaul says, “you just move forward to the next free position and put it there. This is where the term ‘linear probing’ comes from, as you keep moving forward linearly until you find an open spot.” In order to later retrieve that Social Security number, x, you just go to the designated spot, h(x), and if it’s not there, you move forward until you either find x or come to a free position and conclude that x is not in your database.

    The Tombstone Problem in Deletion

    There’s a somewhat different protocol for deleting an item, such as a Social Security number. If you just left an empty spot in the hash table after deleting the information, that could cause confusion when you later tried to find something else, as the vacant spot might erroneously suggest that the item you’re looking for is nowhere to be found in the database. To avoid that problem, Kuszmaul explains, “you can go to the spot where the element was removed and put a little marker there called a ‘tombstone,’ which indicates there used to be an element here, but it’s gone now.”

    This general procedure has been followed for more than half a century. But in all that time, almost everyone using linear-probing hash tables has assumed that if you allow them to get too full, long stretches of occupied spots would run together to form “clusters.” As a result, the time it takes to find a free spot would go up dramatically — quadratically, in fact — taking so long as to be impractical. Consequently, people have been trained to operate hash tables at low capacity — a practice that can exact an economic toll by affecting the amount of hardware a company has to purchase and maintain.

    But this time-honored principle, which has long militated against high load factors, has been totally upended by the work of Kuszmaul and his colleagues, Michael Bender of Stony Brook University and Bradley Kuszmaul of Google. They found that for applications where the number of insertions and deletions stays about the same — and the amount of data added is roughly equal to that removed — linear-probing hash tables can operate at high storage capacities without sacrificing speed.

    Graveyard Hashing

    In addition, the team has devised a new strategy, called “graveyard hashing,” which involves artificially increasing the number of tombstones placed in an array until they occupy about half the free spots. These tombstones then reserve spaces that can be used for future insertions.

    This approach, which runs contrary to what people have customarily been instructed to do, Kuszmaul says, “can lead to optimal performance in linear-probing hash tables.” Or, as he and his coauthors maintain in their paper, the “well-designed use of tombstones can completely change the … landscape of how linear probing behaves.”

    A Paradigm Shift in Data Structures

    Kuszmaul wrote up these findings with Bender and Kuszmaul in a paper posted earlier this year that will be presented in February at the Foundations of Computer Science (FOCS) Symposium in Boulder, Colorado.

    Kuszmaul’s PhD thesis advisor, MIT computer science professor Charles E. Leiserson (who did not participate in this research), agrees with that assessment. “These new and surprising results overturn one of the oldest conventional wisdoms about hash table behavior,” Leiserson says. “The lessons will reverberate for years among theoreticians and practitioners alike.”

    As for translating their results into practice, Kuszmaul notes, “There are many considerations that go into building a hash table. Although we’ve advanced the story considerably from a theoretical standpoint, we’re just starting to explore the experimental side of things.”

    Reference: “Linear Probing Revisited: Tombstones Mark the Death of Primary Clustering” by Michael A. Bender, Bradley C. Kuszmaul and William Kuszmaul, 2 July 2021, Computer Science > Data Structures and Algorithms.
    arXiv:2107.01250

    Never miss a breakthrough: Join the SciTechDaily newsletter.
    Follow us on Google and Google News.

    Algorithm Computer Science CSAIL MIT
    Share. Facebook Twitter Pinterest LinkedIn Email Reddit

    Related Articles

    CausalSim: MIT’s New Tool for Accurately Simulating Complex Systems

    MIT’s Innovative Volumetric Approach: The Future of 3D Shape Mapping

    Security Tool – Privid – Guarantees Privacy in Surveillance Footage

    Computer Science: How Quickly Do Algorithms Improve?

    New MIT Social Intelligence Algorithm Helps Build Machines That Better Understand Human Goals

    Using Mathematical Theory to Find the True Potential of Algorithms

    New MIT Model Recovers Valuable “Lost Dimensions” of Images and Video

    Algorithms Improve AUV Navigation and Detecting Capabilities

    New Algorithm Enables Wi-Fi Connected Vehicles to Share Data

    1 Comment

    1. John Smith on February 17, 2023 12:33 am

      A hash table is a data structure that is commonly used in computer programming for efficient data retrieval and storage. It is also known as a hash map or dictionary. A hash table organizes data into an array using a hashing function, which generates a unique index for each data item based on its key. This index is used to store and retrieve data in the array. Hash tables offer constant time complexity for retrieval and insertion operations on average, making them ideal for use cases where quick access to data is required, such as in databases, caches, and indexing systems.

      Reply
    Leave A Reply Cancel Reply

    • Facebook
    • Twitter
    • Pinterest
    • YouTube

    Don't Miss a Discovery

    Subscribe for the Latest in Science & Tech!

    Trending News

    New Study Reveals Why Ozempic Works Better for Some People Than Others

    Climate Change Is Altering a Key Greenhouse Gas in a Way Scientists Didn’t Expect

    New Study Suggests Gravitational Waves May Have Created Dark Matter

    Scientists Discover Why the Brain Gets Stuck in Schizophrenia

    Scientists Engineer “Tumor-Eating” Bacteria That Devour Cancer From Within

    Even “Failed” Diets May Deliver Long-Term Health Gains, Study Finds

    NIH Scientists Discover Powerful New Opioid That Relieves Pain Without Dangerous Side Effects

    Collapsing Plasma May Hold the Key to Cosmic Magnetism

    Follow SciTechDaily
    • Facebook
    • Twitter
    • YouTube
    • Pinterest
    • Newsletter
    • RSS
    SciTech News
    • Biology News
    • Chemistry News
    • Earth News
    • Health News
    • Physics News
    • Science News
    • Space News
    • Technology News
    Recent Posts
    • This 15,000-Year-Old Discovery Changes What We Know About Early Human Creativity
    • 35-Million-Year-Old Mystery: Strange Arachnid Discovered Preserved in Amber
    • Revolutionary Gas Turbine Generates Power Without Air Compression
    • Is AI Really Just a Tool? It Could Be Altering How You See Reality
    • JWST Reveals a “Forbidden” Planet With a Baffling Composition
    Copyright © 1998 - 2026 SciTechDaily. All Rights Reserved.
    • Science News
    • About
    • Contact
    • Editorial Board
    • Privacy Policy
    • Terms of Use

    Type above and press Enter to search. Press Esc to cancel.