RELATIONSHIP BETWEEN BIG DATA AND DECISION SUPPORT SYSTEMS

This article aimed to identify relationships between Big Data and Decision Support Systems. To this end, we searched the Scopus database and report the growth in the number of publications over time and the frequency of publications per journal; using the VOSviewer software, we also performed a keyword co-occurrence analysis. We identified 5 groups of keywords that suggest different areas of study (e


INTRODUCTION
Big Data can be defined as a generic term for any collection of large, complex datasets that are difficult to store, process, analyze, and understand using traditional database processing tools (Huang & Chaovalitwongse, 2015). Big Data has emerged as a paradigm shift in how organizations make decisions (Mortenson, Doherty, & Robinson, 2015). It is commonly accepted that Big Data can be adopted at strategic, tactical, and operational levels, thus improving existing decision-making practices (Wamba, Ngai, Riggins, & Akter, 2017).
The term "Big Data" has become hugely popular in the business world in recent years (Gupta, Chen, Hazen, Kaurd, & Gonzalez, 2018). It is used either to support decision making or to make automated decisions (Davenport & Dyche, 2013). Through Decision Support Systems (DSS), it is possible to process large volumes of data using models whose outputs are delivered through interfaces that increasingly permeate knowledge-intensive professions (Constantiou & Kallinikos, 2014).
Against this background, we sought to understand how Big Data relates to DSS. Our goal was to understand how the use of Big Data contributes to DSS across different areas of application, given its broad applicability. To do this, we performed a literature mapping, a variation of the systematic literature review, which we used to understand the interaction between keywords and to answer the research question that guided this study: what relationships does Big Data have with DSS?
We performed a search in the Scopus database and analyzed the data of 93 articles through the VOSviewer software. We identified 5 clusters through network analysis, recent studies with 'machine learning' and 'sustainable development' terms related to Big Data and DSS, and a strong relationship of the term 'artificial intelligence' to the search terms used in this research.
Our study is organized as follows. Section 2 presents a brief theoretical background, Section 3 describes the methodological procedure and data collection, and Section 4 presents the results. Section 5 presents the conclusion, along with suggestions for future research that could confirm our findings in greater depth and increase understanding of a topic that has proven highly relevant for society.

THEORETICAL BACKGROUND
As highlighted by Laney (2001), Big Data has three primary characteristics: volume, velocity and variety. SAS (https://www.sas.com/en_us/insights/big-data/what-is-big-data.html) later complemented these with the characteristics of variability and complexity. These dimensions required more powerful computers, a ubiquitous network, and algorithms capable of connecting datasets in order to enable analyses that had not previously been possible, and this convergence made the commercial application of data science possible (Provost & Fawcett, 2013). Consequently, this has driven a new generation of technologies to extract value from Big Data (Gantz & Reinsel, 2011).
According to Manyika et al. (2011), Big Data can play a significant economic role for the benefit not only of private commerce, but also of national economies and their citizens. The authors also highlight the value generated by the use of Big Data in five domains: health, public sector, retail, manufacturing and global personal location data. There is enormous potential for Big Data analytics in the healthcare industry, especially with regard to a better understanding of strategic implications (Wang, Kung, & Byrd, 2018). Modern systems produce collections of data sets so large and complex that it is impossible to store and process them manually (Burattin, Cimitile, Maggi, & Sperduti, 2015). Analytical techniques applied through a DSS can provide, for example, annotations, medical prescriptions, images and laboratory information (Raghupathi & Raghupathi, 2014).
The use of Big Data also supports decision making in public policy: by combining geographical information with health care data, such as consumption of tobacco and alcohol, and with economic factors, public managers can map the areas most likely to experience social problems (Fredriksson, 2018). Big Data processing has also been studied in business process monitoring and analysis systems (Vera-Baquero, Colomo-Palacios, Molloy, & Elbattah, 2015) and in logistics, service and planning (Brinch, Stentoft, Jensen, & Rajkumar, 2018). On the other hand, there is a discussion about the reduction of privacy that occurs in highly centralized control environments, where governments can interfere with citizens' freedom (Power, 2016).
To capture the value of Big Data, techniques and technologies are needed that draw on a range of disciplines, such as mathematics and statistics (Chen & Zhang, 2014), applied through computational tools (Fayyad, Piatetsky-Shapiro, & Smyth, 1996). Big Data analysis must also include the phases of data generation, acquisition, storage and analysis (Chen, Mao, & Liu, 2014).
Machine learning, artificial intelligence, and cognitive computing are dominating conversations about how advanced analytics can give companies, for example, a competitive advantage. Machine learning is a form of artificial intelligence that allows a system to learn from data rather than through explicit programming, and it enables data scientists and business analysts to make predictions based on data-based analytical models (Hurwitz & Kirsch, 2018).

DATA AND METHODOLOGY
For this study, we used the Scopus database because it is one of the main databases in the area of Social Sciences. We ran the search on May 30, 2019 with the Boolean query: "Big Data" AND ("Decision Theory" or "decision support system*" or "decision-support system*"). The '*' wildcard was used so that variant word endings, such as plurals, were also retrieved. As an initial restriction, we limited the search to terms appearing in the title, abstract, or keywords.
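The logic of the Boolean query above can be illustrated with a minimal Python sketch. It is only an approximation under stated assumptions: it does not reproduce Scopus's own matching rules, the '*' wildcard is approximated by "any word ending", and the records are hypothetical examples invented for illustration.

```python
import re

# Patterns mirroring the Boolean query; '*' is approximated as "any word ending".
BIG_DATA = re.compile(r"\bbig data\b", re.IGNORECASE)
DECISION_TERMS = re.compile(
    r"\bdecision theory\b|\bdecision[- ]support system\w*",
    re.IGNORECASE,
)


def matches_query(title: str, abstract: str, keywords: list[str]) -> bool:
    """Return True if the record satisfies both sides of the AND."""
    # The search is restricted to title, abstract and keywords.
    text = " ".join([title, abstract, *keywords])
    return bool(BIG_DATA.search(text)) and bool(DECISION_TERMS.search(text))


# Hypothetical records (title, abstract, keywords), invented for illustration.
records = [
    ("Big Data in decision support systems", "", []),
    ("A decision-support system for hospitals", "Uses big data.", []),
    ("Big Data storage benchmarks", "", ["hadoop"]),
]
selected = [r for r in records if matches_query(*r)]
```

In this sketch the first two records are selected and the third is rejected, because it mentions Big Data but none of the decision-related terms.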
The search returned 1,031 results, which were reduced to 243 after filtering by subject area (limited to Business, Management and Accounting; Decision Sciences; and Social Sciences). After a further filter restricting the document type to articles, we obtained 93 results. Each of these papers underwent an individual exploratory analysis, in which we read the title and abstract and, when necessary, the article itself, to identify its potential contribution to answering our research question.
To help us answer the research question, we used the clustering technique available in the VOSviewer software. Clustering techniques play a prominent role in bibliometric research by providing clusters of authors, publications and related keywords (Van Eck & Waltman, 2017). This analysis enabled us to understand the most relevant connections between the articles identified in the search through the relationships between their keywords. Zupic and Čater (2015) describe this method as useful in a phase prior to a literature review, for example, as it guides the researcher to the relevant fields of research while reducing subjective bias. Thus, we performed co-occurrence analyses of keywords using three VOSviewer visualizations: network, overlay and density. The methodological procedures followed in this study can be visualized in Figure 1.
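The idea behind keyword co-occurrence can be sketched in a few lines of Python. This is a simplified illustration, not VOSviewer's implementation: VOSviewer applies its own normalization, layout and clustering on top of such counts, and the keyword lists below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(keyword_lists):
    """Count, for each keyword pair, the number of papers listing both."""
    pair_counts = Counter()
    for keywords in keyword_lists:
        # De-duplicate within a paper and sort so each pair has one canonical order.
        unique = sorted({k.strip().lower() for k in keywords})
        pair_counts.update(combinations(unique, 2))
    return pair_counts


# Hypothetical author keywords for three papers.
papers = [
    ["Big Data", "Decision Support Systems", "Data Mining"],
    ["Big Data", "Data Mining"],
    ["Big Data", "Decision Support Systems"],
]
links = cooccurrence_counts(papers)
```

Each entry in `links` corresponds to an edge in a keyword network, with the count acting as the link strength between the two keywords.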

RESULTS
Initially, we analyzed the frequency of publications of the 93 selected papers, as can be seen in Figure 2. We also counted the frequency of papers per journal over the period, as presented in Table 1; the journals with the most papers were "Journal of Decision Systems" and "Decision Support Systems", with 9 and 7 papers, respectively. The 13 journals with more than one paper accounted for 48.39% of the sample; the remaining journals had only one paper each, accounted for 51.61%, and are not listed in the table. The journals could be classified into four editorial scopes: decision making; sustainability and the environment; operational research; and intelligent and information systems. This indicates that the subject is being discussed in different research areas and, consequently, that the related concepts are being applied in different ways.
Table 2 shows how often each keyword was cited in the analyzed papers. As an inclusion criterion, we retained only keywords that had been cited at least four times. As a result, 32 keywords out of a total of 931 were recorded; together they were cited 389 times out of a total of 1,400, representing 27.79% of all keyword citations.
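The keyword threshold and the reported share can be checked with a short Python sketch. The helper names and the sample keyword list are assumptions made for illustration; only the figures 389 and 1,400 come from the text.

```python
from collections import Counter


def frequent_keywords(occurrences, min_count=4):
    """Keep keywords that occur at least `min_count` times."""
    counts = Counter(occurrences)
    return {kw: n for kw, n in counts.items() if n >= min_count}


def share_percent(part, total):
    """Express `part` as a percentage of `total`, rounded to two decimals."""
    return round(100 * part / total, 2)


# Hypothetical keyword occurrences, one entry per use in a paper.
sample = ["big data"] * 5 + ["data mining"] * 4 + ["hadoop"] * 2
kept = frequent_keywords(sample)

# Figures reported in the text: 389 of 1,400 keyword occurrences.
keyword_share = share_percent(389, 1400)
```

With a threshold of four, the hypothetical keyword used only twice is dropped, and 389 out of 1,400 gives the 27.79% reported above.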
It should be noted that the keywords with IDs 2 and 5 differ only in that the first is in the plural and the second in the singular. They could be considered the same keyword, in which case it would be the most frequent one (n = 75). The same occurred with the keywords with IDs 8 and 10, but we opted to preserve the data as extracted from the software. The 32 most frequently used keywords in the papers therefore represent 27.79% of the total frequency of keywords.
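Had the singular and plural variants been merged, the counts could have been combined along the following lines. This is a sketch only: the keyword labels, counts and alias map are hypothetical and are not the values in Table 2.

```python
from collections import Counter


def merge_variants(counts, aliases):
    """Sum the counts of keyword variants under one canonical label."""
    merged = Counter()
    for keyword, n in counts.items():
        # Keywords without an alias keep their own label.
        merged[aliases.get(keyword, keyword)] += n
    return merged


# Hypothetical counts and alias map (singular form mapped to plural form).
counts = {"information systems": 10, "information system": 6, "big data": 20}
aliases = {"information system": "information systems"}
merged = merge_variants(counts, aliases)
```

Keeping the alias map explicit makes the merging decision visible and reversible, which is one way to reconcile data cleaning with the wish to preserve the extracted data.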
Some frequently recurring terms caught our attention. One is data mining, a set of techniques for extracting valuable information from data, which may involve statistical and machine learning methods (Chen & Zhang, 2014). Another is cloud computing, which has a strong relationship with Big Data because Big Data offers users the ability to use computing to process queries distributed across multiple data sets and return the resulting sets in a timely manner (Hashem et al., 2015). Following the keyword frequency analysis, we analyzed how the keywords group together using the network visualization shown in Figure 3. This visualization reveals five distinct clusters, identified by the colors red, blue, green, purple and yellow.

Figure 3. Networking of keywords
After the network visualization, we produced the keyword overlay visualization, which colors keywords according to when they occur most frequently over the time horizon: yellow words are the most recent, and darker colors indicate frequency in an older period (Figure 4).

Figure 4. Keyword overlay view
Finally, we performed the density analysis, in which warmer colors indicate a higher frequency and a stronger relation to other words (Figure 5).

Figure 5. Keyword density view.
The descriptive analysis showed that, for the Boolean query used, 5 groups were identified that characterize broad areas. The cluster identified in red (Figure 3) suggests studies directed at the areas of health, logistics and IT. It is also possible to infer that cloud computing, business intelligence technologies and analytical techniques are related to this group, in which the expression related to decision theory is also recurrent.
The cluster identified in blue (Figure 3) suggests studies aimed at decision-making systems and processes, knowledge management and data visualization techniques. In this cluster, no specific area was predominant to the point of standing out. The cluster identified in green indicates studies aimed at data mining techniques, data handling, artificial intelligence and predictive analysis. The areas that stood out were social media and online social networks.
The small cluster identified in purple indicates studies aimed at decision support systems oriented towards sustainable development (a term too long for the available space, whose label was omitted by the software). Finally, the cluster identified in yellow, also small, indicates studies aimed at strategic decision making via machine learning (a term likewise omitted by the software for lack of space).
Figure 4 also shows that recent studies are directed towards sustainable development and machine learning (terms located at the bottom of the image, whose labels were omitted by the software for lack of space). The proximity of these terms in the image indicates that studies may even be addressing both subjects simultaneously. On the right side of Figure 4, we can also see that analytical techniques may be discussed together with decision theory. Above them, though somewhat more isolated chronologically, decision-making processes also appear as a recent term in the studies.
Finally, the keyword density visualization (Figure 5) shows a strong density and centralization of the terms Big Data, artificial intelligence and decision making, indicating that these terms pervade the studies in general.

CONCLUSION
This article aimed to identify the relationships between Big Data and DSS. To achieve this objective, we searched the Scopus database and interpreted the results using the VOSviewer software. Initially, we analyzed the frequency of publications over time, as well as the journals and their respective frequencies of published papers, which enabled us to understand the scope of the main journals and relate it to the published papers. As a result, we observed exponential growth since 2014 and highlight "Journal of Decision Systems" and "Decision Support Systems" as the journals with the most published papers.
Next, we identified 5 groups of keywords that suggest different areas of study (e.g. logistics, health and social media), as well as a more recent focus on studies aimed at sustainable development, machine learning, analytical techniques and decision-making processes. An important contribution that should also be highlighted is the strong relationship between the keywords Big Data, artificial intelligence and decision making, which suggests that a large number of papers involve the three terms.
As a limitation of the study, we highlight that the Boolean query determines the frequency of the keywords extracted from the database, so a different combination of terms could result in different relationships between the keywords. We therefore cannot be certain that the results found in this research truly reflect the relationships between studies. To mitigate this uncertainty, as a suggestion for future studies, we propose extending this preliminary bibliometric analysis by means of citation and co-citation analyses of the papers.
Citations, according to Zupic and Čater (2015), are used as a measure of influence: if an article is highly cited, it is considered important, on the assumption that authors cite documents that they consider important for their work. Co-citation, defined as the frequency with which two units are cited together (Small, 1993), indicates that the more often two items are cited together, the more likely their content is related (Zupic & Čater, 2015). Thus, it would be possible to support the findings of this work, to relate relevant authors to the identified areas, and to prepare a systematic review of the literature that examines the identified factors in greater depth, as suggested by Zupic and Čater (2015).
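The two measures proposed for future work can be illustrated with a minimal Python sketch. It assumes that each paper's reference list is available as a list of identifiers; the identifiers and data below are hypothetical, and bibliometric tools would apply further normalization.

```python
from collections import Counter
from itertools import combinations


def citation_counts(reference_lists):
    """Count how many citing papers reference each document."""
    counts = Counter()
    for refs in reference_lists:
        counts.update(set(refs))  # each citing paper counts once per document
    return counts


def cocitation_counts(reference_lists):
    """Count how many citing papers reference each pair of documents together."""
    pairs = Counter()
    for refs in reference_lists:
        pairs.update(combinations(sorted(set(refs)), 2))
    return pairs


# Hypothetical reference lists of three citing papers.
reference_lists = [["A", "B", "C"], ["A", "B"], ["B", "C"]]
cited = citation_counts(reference_lists)
cocited = cocitation_counts(reference_lists)
```

In this example, document B is the most cited, while the pairs (A, B) and (B, C) are each co-cited twice, which under the assumption described above would suggest related content.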