    SciTechDaily

    “Data Science Machine” Replaces Human Intuition with Algorithms

    By Larry Hardesty, Massachusetts Institute of Technology, October 16, 2015
    Image: Automating big-data analysis. MIT engineers have created a system called the “Data Science Machine” that surpasses human intuition with algorithms. Credit: MIT

    Engineers from MIT have developed a new system that replaces human intuition with algorithms. The “Data Science Machine” outperformed 615 of 906 human teams in three recent data science competitions.

    Big-data analysis consists of searching for buried patterns that have some kind of predictive power. But choosing which “features” of the data to analyze usually requires some human intuition. In a database containing, say, the beginning and end dates of various sales promotions and weekly profits, the crucial data may not be the dates themselves but the spans between them, or not the total profits but the averages across those spans.
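
To make the promotions example concrete, here is a minimal sketch of deriving the two features the paragraph mentions: the span between a promotion's start and end dates, and the average weekly profit within that span. The record layout, the sample values, and the function name `promotion_features` are assumptions made for illustration; they are not taken from the researchers' system.

```python
from datetime import date
from statistics import mean

# Hypothetical promotion records: (start date, end date).
promotions = [
    (date(2015, 1, 5), date(2015, 1, 18)),
    (date(2015, 3, 2), date(2015, 3, 8)),
]

# Hypothetical weekly profits, keyed by the Monday that starts each week.
weekly_profit = {
    date(2015, 1, 5): 1200.0,
    date(2015, 1, 12): 1500.0,
    date(2015, 3, 2): 900.0,
}

def promotion_features(start, end, profits):
    """Derive the span length and the average weekly profit for one promotion."""
    span_days = (end - start).days + 1  # count both endpoints
    in_span = [p for week, p in profits.items() if start <= week <= end]
    avg_profit = mean(in_span) if in_span else None
    return {"span_days": span_days, "avg_weekly_profit": avg_profit}

features = [promotion_features(s, e, weekly_profit) for s, e in promotions]
print(features[0])  # {'span_days': 14, 'avg_weekly_profit': 1350.0}
```

Neither derived value appears in the raw tables; both come from combining columns, which is the step that usually relies on human intuition.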

    MIT researchers aim to take the human element out of big-data analysis, with a new system that not only searches for patterns but designs the feature set, too. To test the first prototype of their system, they enrolled it in three data science competitions, in which it competed against human teams to find predictive patterns in unfamiliar data sets. Of the 906 teams participating in the three competitions, the researchers’ “Data Science Machine” finished ahead of 615.

    In two of the three competitions, the predictions made by the Data Science Machine were 94 percent and 96 percent as accurate as the winning submissions. In the third, the figure was a more modest 87 percent. But where the teams of humans typically labored over their prediction algorithms for months, the Data Science Machine took somewhere between two and 12 hours to produce each of its entries.

    “We view the Data Science Machine as a natural complement to human intelligence,” says Max Kanter, whose MIT master’s thesis in computer science is the basis of the Data Science Machine. “There’s so much data out there to be analyzed. And right now it’s just sitting there not doing anything. So maybe we can come up with a solution that will at least get us started on it, at least get us moving.”

    Between the lines

    Kanter and his thesis advisor, Kalyan Veeramachaneni, a research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), describe the Data Science Machine in a paper that Kanter will present next week at the IEEE International Conference on Data Science and Advanced Analytics.

    Veeramachaneni co-leads the Anyscale Learning for All group at CSAIL, which applies machine-learning techniques to practical problems in big-data analysis, such as determining the power-generation capacity of wind-farm sites or predicting which students are at risk for dropping out of online courses.

    “What we observed from our experience solving a number of data science problems for industry is that one of the very critical steps is called feature engineering,” Veeramachaneni says. “The first thing you have to do is identify what variables to extract from the database or compose, and for that, you have to come up with a lot of ideas.”

    In predicting dropout, for instance, two crucial indicators proved to be how long before a deadline a student begins working on a problem set and how much time the student spends on the course website relative to his or her classmates. MIT’s online-learning platform MITx doesn’t record either of those statistics, but it does collect data from which they can be inferred.
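
As a rough sketch of how such indicators could be inferred from data that is collected, the example below assumes a hypothetical session log with one row per visit. The log schema, the names `sessions` and `infer_indicators`, and the sample values are all assumptions; the article does not describe what MITx actually records.

```python
from datetime import datetime
from statistics import mean

DEADLINE = datetime(2015, 10, 16, 23, 0)

# Hypothetical raw log: (student, session start, session end, touched the problem set?)
sessions = [
    ("ana", datetime(2015, 10, 14, 9, 0), datetime(2015, 10, 14, 11, 0), True),
    ("ana", datetime(2015, 10, 15, 9, 0), datetime(2015, 10, 15, 10, 0), False),
    ("ben", datetime(2015, 10, 16, 20, 0), datetime(2015, 10, 16, 21, 0), True),
]

def infer_indicators(sessions, deadline):
    """Infer per-student lead time before the deadline and relative time on site."""
    first_work = {}     # earliest session in which the student touched the problem set
    hours_on_site = {}  # total hours across all sessions
    for student, start, end, on_pset in sessions:
        hours = (end - start).total_seconds() / 3600
        hours_on_site[student] = hours_on_site.get(student, 0.0) + hours
        if on_pset and (student not in first_work or start < first_work[student]):
            first_work[student] = start
    class_mean = mean(hours_on_site.values())
    return {
        student: {
            "lead_hours": (
                (deadline - first_work[student]).total_seconds() / 3600
                if student in first_work else None
            ),
            "relative_site_time": hours / class_mean,
        }
        for student, hours in hours_on_site.items()
    }

indicators = infer_indicators(sessions, DEADLINE)
print(indicators["ana"])  # {'lead_hours': 62.0, 'relative_site_time': 1.5}
```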

    Featured composition

    Kanter and Veeramachaneni use a couple of tricks to manufacture candidate features for data analyses. One is to exploit structural relationships inherent in database design. Databases typically store different types of data in different tables, indicating the relationships between them using numerical identifiers. The Data Science Machine tracks these relationships, using them as a cue to feature construction.

    For instance, one table might list retail items and their costs; another might list items included in individual customers’ purchases. The Data Science Machine would begin by importing costs from the first table into the second. Then, taking its cue from the association of several different items in the second table with the same purchase number, it would execute a suite of operations to generate candidate features: total cost per order, average cost per order, minimum cost per order, and so on. As numerical identifiers proliferate across tables, the Data Science Machine layers operations on top of each other, finding minima of averages, averages of sums, and so on.
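
The retail example can be sketched in a few lines of plain Python. This is an illustration of the idea under stated assumptions, not the researchers' implementation: the table contents, the extra `customer_id` column used to show a second layer, and every name below are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical table 1: retail items and their costs.
item_cost = {"pen": 2.0, "notebook": 5.0, "lamp": 20.0}

# Hypothetical table 2: purchase rows as (order_id, customer_id, item).
purchases = [
    (1, "c1", "pen"),
    (1, "c1", "notebook"),
    (2, "c1", "lamp"),
    (3, "c2", "pen"),
]

AGGREGATES = {"sum": sum, "mean": mean, "min": min}

# Step 1: import each item's cost into the purchase rows, grouped by order.
costs_by_order = defaultdict(list)
order_customer = {}
for order_id, customer_id, item in purchases:
    costs_by_order[order_id].append(item_cost[item])
    order_customer[order_id] = customer_id

# Step 2: apply a suite of operations per order (total, average, minimum cost).
order_features = {
    order_id: {name: fn(costs) for name, fn in AGGREGATES.items()}
    for order_id, costs in costs_by_order.items()
}

# Step 3: layer the same operations on top, one level up (per customer).
per_customer = defaultdict(lambda: defaultdict(list))
for order_id, feats in order_features.items():
    for name, value in feats.items():
        per_customer[order_customer[order_id]][name].append(value)

customer_features = {
    customer: {
        f"{outer}_of_{inner}": fn(values)
        for inner, values in by_name.items()
        for outer, fn in AGGREGATES.items()
    }
    for customer, by_name in per_customer.items()
}

print(order_features[1])                        # {'sum': 7.0, 'mean': 3.5, 'min': 2.0}
print(customer_features["c1"]["min_of_mean"])   # 3.5
```

Each extra identifier that links one table to another adds another level at which the same small set of operations can be reapplied, which is how candidates such as "minimum of averages" arise.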

    It also looks for so-called categorical data, which appear to be restricted to a limited range of values, such as days of the week or brand names. It then generates further feature candidates by dividing up existing features across categories.
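
A minimal sketch of this step follows. The article does not say how the system decides that a column is categorical, so the distinct-value threshold `max_distinct` is an assumption, as are the sample rows and both function names.

```python
from collections import defaultdict

# Hypothetical order rows.
rows = [
    {"order_id": 1, "day": "Mon", "brand": "Acme", "cost": 7.0},
    {"order_id": 2, "day": "Tue", "brand": "Acme", "cost": 20.0},
    {"order_id": 3, "day": "Mon", "brand": "Zed", "cost": 2.0},
    {"order_id": 4, "day": "Mon", "brand": "Zed", "cost": 4.0},
]

def categorical_columns(rows, max_distinct=3):
    """Return columns whose values fall in a small, repeated set (assumed heuristic)."""
    columns = []
    for col in rows[0]:
        distinct = {r[col] for r in rows}
        if len(distinct) <= max_distinct and len(distinct) < len(rows):
            columns.append(col)
    return columns

def split_by_category(rows, feature, category):
    """Divide a numeric feature across the values of a categorical column."""
    totals = defaultdict(float)
    for r in rows:
        totals[f"{feature}_where_{category}={r[category]}"] += r[feature]
    return dict(totals)

categoricals = categorical_columns(rows)
cost_by_day = split_by_category(rows, "cost", "day")
print(categoricals)  # ['day', 'brand']
print(cost_by_day)   # {'cost_where_day=Mon': 13.0, 'cost_where_day=Tue': 20.0}
```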

    Once it’s produced an array of candidates, it reduces their number by identifying those whose values seem to be correlated. Then it starts testing its reduced set of features on sample data, recombining them in different ways to optimize the accuracy of the predictions they yield.
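
The article does not specify how correlated candidates are identified, so the sketch below substitutes a simple stand-in: a greedy filter that drops any candidate whose Pearson correlation with an already kept feature reaches a threshold. The threshold of 0.95, the sample values, and the function names are assumptions, and the later step of testing and recombining features is not shown.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def prune_correlated(candidates, threshold=0.95):
    """Keep a candidate only if it is not strongly correlated with any kept feature."""
    kept = {}
    for name, values in candidates.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept

# Hypothetical candidate features, one value per order.
candidates = {
    "total_cost": [7.0, 20.0, 2.0, 4.0],
    "total_cost_cents": [700.0, 2000.0, 200.0, 400.0],  # redundant with total_cost
    "item_count": [2.0, 1.0, 1.0, 3.0],
}

selected = prune_correlated(candidates)
print(list(selected))  # ['total_cost', 'item_count']
```

A greedy filter like this depends on the order in which candidates are visited; it is shown only because it is short and easy to follow.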

    “The Data Science Machine is one of those unbelievable projects where applying cutting-edge research to solve practical problems opens an entirely new way of looking at the problem,” says Margo Seltzer, a professor of computer science at Harvard University who was not involved in the work. “I think what they’ve done is going to become the standard quickly — very quickly.”

    Reference: “Deep Feature Synthesis: Towards Automating Data Science Endeavors” by James Max Kanter and Kalyan Veeramachaneni, 19 October 2015, 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA).
    DOI: 10.1109/DSAA.2015.7344858