Assignment Sample on Databases for Large Datasets
Introduction
The growth of web-based applications and of technologies such as distributed computing, cloud computing, and the Internet of Things (IoT) has greatly increased the volume of data collected every day, and storing that data in a database is a major challenge. Two types of database are in common use today: relational databases and NoSQL databases. Relational databases store information in a structured form, while information that is not in a structured form can be stored in a NoSQL database. Customer feedback, data collected from web pages or web applications, and other information without a definite format can be stored in a NoSQL database. Distributed storage now plays an important role and acts as an alternative to the structured relational database, offering good scalability, performance, and availability of information whenever it is required. The information held in such a distributed database is stored using a non-relational data model.
Commercial database systems
Many database systems are available, and a number of them are offered commercially in the market (Dawan, 2021), including Oracle, IBM DB2, Microsoft SQL Server, SAP Sybase ASE, Teradata, ADABAS, MySQL, FileMaker, Microsoft Access, Informix, SQLite, PostgreSQL, Amazon RDS, MongoDB, HBase, Cassandra DB, Riak, CouchDB, and Oracle NoSQL Database.
Comparison between Relational and NoSQL database
A relational database has only one form, based on the relational model, and every application that uses it must follow a fixed database schema. In the relational model, the application conforms to the needs of the database. Examples of relational databases are MySQL, Microsoft SQL Server, Postgres, IBM DB2, and Oracle. Relational databases are available as both open source and proprietary (licensed) products, whereas most NoSQL databases are open source. Licensed relational databases generally scale better than the open source MySQL database. NoSQL databases give researchers a good opportunity to investigate the features of the database, and they offer cheaper storage than proprietary database models. To make a relational database more efficient, it must be scaled up by upgrading the server, which increases the effort required from the administrator and also requires a hardware upgrade (Douglas Kunda, 2017). Scaling up brings its own challenges: the amount of RAM that a given machine can support is fixed, so even though the relational database itself is able to scale, the hardware limit caps how far it can grow. NoSQL, in contrast, scales horizontally on commodity servers. It is not bound by the limits of a single machine, because it combines many small, inexpensive servers to provide high scalability instead of relying on one large server. Virtual machines can also be added or removed easily without upgrading the database server itself. Relational databases need a substantial investment to benefit from new and advanced features, along with the additional cost of hardware upgrades, so they are more expensive. NoSQL databases are mostly open source and therefore cheaper.
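The horizontal scaling described above can be sketched in a few lines of Python. This is a minimal illustration, not the placement scheme of any particular NoSQL product: the node names, the `shard_for` and `distribute` helpers, and the choice of simple hash-modulo placement are all assumptions made for the example.

```python
import hashlib

def shard_for(key, nodes):
    """Pick a node for a key by hashing it (simple modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

def distribute(keys, nodes):
    """Return a mapping of node name -> list of keys placed on that node."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[shard_for(key, nodes)].append(key)
    return placement

# Hypothetical workload: 1000 user keys spread over commodity servers.
keys = ["user:%d" % i for i in range(1000)]
three_nodes = distribute(keys, ["node-a", "node-b", "node-c"])
four_nodes = distribute(keys, ["node-a", "node-b", "node-c", "node-d"])

# Adding a fourth server spreads the same keys over more machines.
print({node: len(ks) for node, ks in three_nodes.items()})
print({node: len(ks) for node, ks in four_nodes.items()})
```

Modulo placement is used here only because it is short. It relocates many keys whenever the number of nodes changes, which is why production systems often prefer schemes such as consistent hashing.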
NoSQL databases can also run on virtual machines, which reduces the cost of maintenance. The growing use of the web has increased the volume of data that must be handled in real time. Relational databases struggle to handle data at big-data volumes, whereas NoSQL copes well with the large volume of information collected from applications on the internet. The number of users is increasing, and most of them spend more time accessing information from websites, social media sites, e-commerce applications, and cloud storage. If this information is stored in a relational database management system on a single server, however powerful, it suffers from a single point of failure (S.Sharma, 2016). Availability of the data is therefore another limitation of scaling up. In a distributed database such as NoSQL, information remains available to users at all times, even when hardware fails, so access continues during system failures. For such workloads a relational database takes a long time to process the information, while NoSQL can complete the same work more quickly. Many NoSQL databases serve reads from volatile memory, which improves performance, whereas a relational database management system typically retrieves information from non-volatile storage. Complex data that is difficult to convert into table form is hard to store in a relational database, but NoSQL can store both semi-structured and unstructured data, and it is flexible enough to hold many varieties of data, even in a raw state, without loss of information. In relational databases the only data manipulation language is SQL, which gives them a strong common foundation that NoSQL lacks.
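To make the point about semi-structured data concrete, the following sketch stores records of different shapes side by side without converting them to a common table layout. It uses only Python's standard `json` module and a plain dictionary as a stand-in for a document store; the sample records, the `store` variable, and the `field_names` helper are hypothetical.

```python
import json

# Hypothetical raw records: each one has a different set of fields.
raw_records = [
    '{"id": 1, "customer": "Ana", "rating": 4}',
    '{"id": 2, "customer": "Ben", "comment": "Late delivery", "tags": ["shipping"]}',
    '{"id": 3, "source": "web", "payload": {"page": "/pricing", "clicks": 7}}',
]

# A dict keyed by id stands in for a document store: no schema is declared.
store = {}
for line in raw_records:
    doc = json.loads(line)
    store[doc["id"]] = doc

def field_names(docs):
    """Collect every top-level field name that appears in any document."""
    names = set()
    for doc in docs:
        names.update(doc)
    return names

# Documents are kept as they arrived, including nested values.
with_comment = [doc["id"] for doc in store.values() if "comment" in doc]
print(sorted(field_names(store.values())))
print(with_comment)
```

Fitting the same three records into one relational table would require either many nullable columns or several joined tables, which is the conversion cost the paragraph above refers to.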
Each NoSQL implementation has its own data manipulation interface, so developers need more time to learn it and to develop the model. Relational databases also offer strong consistency because of their strict schema, whereas NoSQL provides good availability (J.Kepner, 2016) but weaker consistency. Relational databases face security challenges such as SQL injection and cross-site scripting even though they have strong security mechanisms. Many NoSQL databases, by contrast, provide few built-in security mechanisms; security is often handled by middleware, which leaves them more vulnerable to attack.
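The SQL injection risk mentioned above, and the usual defence against it, can be shown with Python's built-in `sqlite3` module. This is a small sketch under stated assumptions: the `users` table, its contents, and the `find_unsafe` and `find_safe` helpers are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a1"), ("bob", "b2")],
)

# Attacker-controlled input that tries to rewrite the WHERE clause.
malicious = "nobody' OR '1'='1"

def find_unsafe(name):
    """Builds the query by string formatting: vulnerable to injection."""
    query = "SELECT name FROM users WHERE name = '%s'" % name
    return conn.execute(query).fetchall()

def find_safe(name):
    """Uses a bound parameter, so the input is treated purely as data."""
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()

print(find_unsafe(malicious))  # the injected OR clause matches every row
print(find_safe(malicious))    # no user has that literal name
```

Parameter binding is a property of how the application calls the database, so a relational system's built-in security mechanisms do not remove the need for it.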
Case study
DynamoDB
Amazon’s DynamoDB is a NoSQL database that supports both key-value and document data models. Many of the world’s fastest-growing businesses, such as Redfin, Samsung, Toyota, Capital One, and Airbnb, use DynamoDB because it supports mission-critical workloads. It is used by large-scale applications that handle large amounts of data, and it provides consistent, single-digit millisecond response times. Applications built on this NoSQL database have virtually unlimited throughput and storage, which is not possible with a relational database model. DynamoDB is serverless: it automatically scales tables up and down and adjusts capacity and performance. It also provides an on-demand capacity mode (Anon., 2021).
SQL database
SQL databases such as Oracle and SQLite are more suitable for application-oriented development in which the details of the data are known in advance: what will be collected from the customer, the minimum and maximum storage required, and the fields the data needs. The relational model is most useful in this situation. It suits applications whose data is well structured and is always collected in the required format. Information stored in an SQL database is more secure than information stored in a NoSQL database. Information is retrieved using SQL queries only.
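When the fields and size limits are known in advance, they can be declared once in the schema and enforced by the database. The sketch below uses Python's built-in `sqlite3` module; the `customers` table, its 50-character name limit, and the `try_insert` helper are assumptions chosen for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# The structure and limits of the data are fixed before any row is stored.
db.execute("""
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL CHECK (length(name) <= 50),
        email TEXT NOT NULL UNIQUE
    )
""")
db.execute(
    "INSERT INTO customers (name, email) VALUES (?, ?)",
    ("Ana", "ana@example.com"),
)

def try_insert(name, email):
    """Insert a row; return False if the schema rejects it."""
    try:
        db.execute(
            "INSERT INTO customers (name, email) VALUES (?, ?)",
            (name, email),
        )
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert("Ben", "ben@example.com"))    # fits the schema
print(try_insert("Ana2", "ana@example.com"))   # duplicate email
print(try_insert(None, "x@example.com"))       # missing required name
print(db.execute("SELECT name FROM customers ORDER BY id").fetchall())
```

Rows that do not match the declared structure never reach the table, which is the guarantee a fixed schema gives to applications built on well-structured data.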
MongoDB
The use of smartphones in business has created millions of users and a wide range of applications. Handling a large amount of information and performing a large number of simultaneous transactions is difficult with an RDBMS. MongoDB provides a cost-effective foundation for mobile applications in fields such as the financial sector and the healthcare sector. Databases created with MongoDB are flexible and offer rich query functionality. Enterprises such as Automatic Data Processing use MongoDB, and more than 41,000 clients manage their applications with it. Other adopters building applications on MongoDB include The Weather Channel and Bosch, as well as IoT applications.
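The idea of querying flexible documents with a filter can be illustrated without a database server. The sketch below is a toy in-memory matcher written for this example, not MongoDB's driver or implementation; it mimics only a tiny subset of MongoDB-style filter documents (plain equality plus the `$gt`, `$lt`, and `$in` operators). The `matches` and `find` functions and the sample patient records are hypothetical.

```python
def matches(doc, query):
    """Return True if doc satisfies every condition in the filter."""
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict):
            for op, arg in cond.items():
                if op == "$gt":
                    if value is None or not value > arg:
                        return False
                elif op == "$lt":
                    if value is None or not value < arg:
                        return False
                elif op == "$in":
                    if value not in arg:
                        return False
                else:
                    raise ValueError("unsupported operator: %s" % op)
        elif value != cond:
            return False
    return True

def find(collection, query):
    """Return all documents in the collection that match the filter."""
    return [doc for doc in collection if matches(doc, query)]

# Hypothetical healthcare records with differing fields.
patients = [
    {"name": "Ana", "age": 34, "ward": "A"},
    {"name": "Ben", "age": 61, "ward": "B", "allergies": ["penicillin"]},
    {"name": "Cy", "ward": "A"},
]

print([p["name"] for p in find(patients, {"ward": "A"})])
print([p["name"] for p in find(patients, {"age": {"$gt": 40}})])
print([p["name"] for p in find(patients, {"ward": {"$in": ["A", "B"]}, "age": {"$lt": 50}})])
```

The filter is itself a document, so queries can be built and combined as ordinary data, and records that lack a field simply fail conditions on that field rather than breaking the query.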
Conclusion
This report has briefly described the advantages and disadvantages of SQL and NoSQL databases. Many applications are now developed using NoSQL databases, because they are better suited to storing large amounts of information and make it easier to analyse the information they hold. Relational databases are used mainly for applications that need a limited amount of storage space and whose data structure is known at development time. However, most of the information now collected from web applications has no predefined structure, and analysis must be carried out on a huge volume of collected information, which is difficult with a relational database. NoSQL databases are therefore used in web-based applications where a large amount of data is collected and the data has no predefined structure. NoSQL stores raw data and can handle a large volume of information at lower cost.
Bibliography
Anon., 2021. Amazon DynamoDB. [Online]
Available at: https://aws.amazon.com/dynamodb/
[Accessed 12 April 2021].
Dawan, M., 2021. Quora. [Online]
Available at: https://www.quora.com/What-are-commercial-database-management-systems/answer/Mehak-Dhawan-40
[Accessed 12 April 2021].
Douglas Kunda, H. P., 2017. A Comparative Study of NoSQL and Relational Database. Zambia Information Communication Technology (ICT) Journal, pp. 1-4.
J.Kepner, D. H. H. T. M. S. A., 2016. Associative Array Model of SQL, NoSQL and NewSQL databases. s.l.:s.n.