Storing and structuring big data in histological research (vertebrates) using a relational database in SQL

  • V. Langraf Constantine the Philosopher University in Nitra
  • R. Babosová Constantine the Philosopher University in Nitra
  • K. Petrovičová University of Agriculture in Nitra
  • J. Schlarmannová Constantine the Philosopher University in Nitra
  • V. Brygadyrenko Oles Honchar Dnipro National University
Keywords: histology; big data; SSMS; structure database; data quality; data type.

Abstract

Database systems store large volumes of data (big data) in fields such as finance (banking, insurance) and are an essential part of corporate operations. In biology, however, database systems have received little attention outside genetics (RNA, DNA) and human proteins, so data storage and its subsequent use remain insufficient in this field. The use of data to assess biological relationships and trends is subject to constantly changing requirements, and the simple databases commonly used in biology cannot respond promptly to these changes. Recent technological developments in histology have increased the amount of biological information stored in databases, a growth that database technology had not addressed. We propose a new database for histology, with purpose-designed data types (data formats), built in Microsoft SQL Server Management Studio. For the information that supports the identification of biological trends and regularities to be relevant, the data must be available in real time and in the required format at the strategic, tactical and operational levels. We set the data types according to the needs of our database: numeric (smallint, numbers, float), text string (nvarchar, varchar) and date. To select, insert, modify and delete data, we used Structured Query Language (SQL), which is currently the most widely used language in relational databases. Our result is a new database for histological information, focusing on histological structures in the systems of animals. The structure and relations of the histology database will support the analysis of big data, with the objective of finding relations between histological structures in species and the diversity of habitats in which those species live.
In addition to big data, the successful estimation of biological relationships and trends also requires scientists who can derive key information from the data quickly and accurately. A properly functioning database for meta-analyses, data warehousing, and data mining involves, in addition to technological aspects, planning, design, implementation, and management.
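
The data types and the four SQL operations named in the abstract can be illustrated with a minimal sketch. It is not the authors' schema: the table names, column names, and all values below are hypothetical, and SQLite (through Python's sqlite3 module) stands in for Microsoft SQL Server so that the example runs without a server. SQLite accepts the declared type names (SMALLINT, FLOAT, NVARCHAR, VARCHAR, DATE) but treats them more loosely than SQL Server does.

```python
import sqlite3

# Hypothetical two-table layout: species with their habitat, and
# histological structures measured in each species.
SCHEMA = """
CREATE TABLE species (
    species_id  SMALLINT PRIMARY KEY,
    name        NVARCHAR(100) NOT NULL,
    habitat     VARCHAR(50)   NOT NULL
);
CREATE TABLE histological_structure (
    structure_id  INTEGER PRIMARY KEY,
    species_id    SMALLINT      NOT NULL REFERENCES species(species_id),
    organ_system  VARCHAR(50)   NOT NULL,
    structure     NVARCHAR(100) NOT NULL,
    thickness_um  FLOAT,
    measured_on   DATE
);
"""


def build_demo():
    """Create an in-memory database and run INSERT, UPDATE and DELETE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)

    # INSERT: made-up rows, for illustration only.
    conn.executemany(
        "INSERT INTO species VALUES (?, ?, ?)",
        [(1, "Species A", "wetland"), (2, "Species B", "grassland")],
    )
    conn.executemany(
        "INSERT INTO histological_structure "
        "(species_id, organ_system, structure, thickness_um, measured_on) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            (1, "integumentary", "epidermis", 40.0, "2022-01-10"),
            (1, "digestive", "intestinal mucosa", 300.0, "2022-01-10"),
            (2, "integumentary", "epidermis", 60.0, "2022-01-11"),
            (2, "integumentary", "entered by mistake", 0.0, "2022-01-11"),
        ],
    )

    # UPDATE (modify): correct one measurement.
    conn.execute(
        "UPDATE histological_structure SET thickness_um = ? "
        "WHERE species_id = ? AND structure = ?",
        (50.0, 1, "epidermis"),
    )

    # DELETE: remove the erroneous row.
    conn.execute(
        "DELETE FROM histological_structure WHERE structure = ?",
        ("entered by mistake",),
    )
    conn.commit()
    return conn


def mean_epidermis_by_habitat(conn):
    """SELECT: join structures to species and average thickness per habitat."""
    return conn.execute(
        "SELECT s.habitat, AVG(h.thickness_um) "
        "FROM histological_structure AS h "
        "JOIN species AS s ON s.species_id = h.species_id "
        "WHERE h.structure = 'epidermis' "
        "GROUP BY s.habitat ORDER BY s.habitat"
    ).fetchall()


if __name__ == "__main__":
    for habitat, mean_um in mean_epidermis_by_habitat(build_demo()):
        print(habitat, mean_um)
```

The foreign key from histological_structure to species is what makes the join possible, and a join of this kind is one plausible way to relate histological structures to the habitats in which species live.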

Published
2022-07-18
How to Cite
Langraf, V., Babosová, R., Petrovičová, K., Schlarmannová, J., & Brygadyrenko, V. (2022). Storing and structuring big data in histological research (vertebrates) using a relational database in SQL. Regulatory Mechanisms in Biosystems, 13(3), 207–212. Retrieved from https://medicine.dp.ua/index.php/med/article/view/813