
Database


What is a DBMS?

A DBMS (Database Management System) is a software application that interacts with end users, other applications, and the database itself to capture and analyze data. A DBMS is a collection of programs that enables you to store, modify, and extract information from a database. It provides an interface for users to interact with the database and perform operations such as creating, reading, updating, and deleting data.
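
As a concrete illustration, here is a minimal sketch of those four operations against a relational DBMS, using Python's built-in sqlite3 module; the users table and its data are hypothetical:

```python
import sqlite3

# Minimal CRUD against a relational DBMS, using Python's built-in
# sqlite3 module. The "users" table and its data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))          # create
row = conn.execute("SELECT id, name FROM users").fetchone()              # read
conn.execute("UPDATE users SET name = ? WHERE id = ?", ("bob", row[0]))  # update
conn.execute("DELETE FROM users WHERE id = ?", (row[0],))                # delete

conn.commit()
conn.close()
```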

The DBMS is responsible for managing the database and ensuring that data is stored and retrieved efficiently, accurately, and securely. It also provides tools for data backup and recovery, data integrity, and data security. A DBMS can include additional functionality such as data validation, data access control, and reporting.

There are many different types of DBMS, each with its own strengths and weaknesses, and each suitable for different types of applications and environments. Examples include relational DBMS, object-oriented DBMS, NoSQL DBMS, and graph DBMS.

Types of DBMS

  • Relational DBMS (RDBMS)
  • Object-oriented DBMS (OODBMS)
  • Hierarchical DBMS (HDBMS)
  • Network DBMS (NDBMS)
  • NoSQL DBMS
  • Document-oriented DBMS
  • Column-family DBMS
  • Key-value DBMS
  • Graph DBMS
  • In-memory DBMS
  • Time-series DBMS


Cloud Databases


NoSQL Database

What Problems Do NoSQL Databases Solve?

Bigness.

NoSQL is seen as a key part of a new data stack supporting: big data, big numbers of users, big numbers of computers, big supply chains, big science, and so on. When something becomes so massive that it must be massively distributed, NoSQL is there, though not all NoSQL systems target bigness. Bigness can be across many different dimensions, not just using a lot of disk space.

Massive write performance.

This is probably the canonical usage, based on Google's influence. High volume: Facebook needs to store 135 billion messages a month, and Twitter has the problem of storing 7 TB of data per day, with the prospect of this requirement doubling multiple times per year. This is the "data is too big to fit on one node" problem. At 80 MB/s it takes a day to store 7 TB, so writes need to be distributed over a cluster, which implies key-value access, MapReduce, replication, fault tolerance, consistency issues, and all the rest. For faster writes, in-memory systems can be used.
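
The arithmetic behind that claim is easy to verify. A quick back-of-the-envelope sketch (assuming decimal units, 1 TB = 10^12 bytes):

```python
# Back-of-the-envelope check: how long does 7 TB take at 80 MB/s?
# Assumes decimal units: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.
daily_volume = 7 * 10**12      # bytes written per day
node_rate = 80 * 10**6         # bytes/second one node can sustain

hours = daily_volume / node_rate / 3600
print(f"one node: {hours:.1f} hours")        # ~24.3 hours -- a full day
print(f"ten nodes: {hours / 10:.1f} hours")  # ~2.4 hours once distributed
```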

Fast key-value access.

This is probably the second most cited virtue of NoSQL in the general mindset. When latency is important, it's hard to beat hashing on a key and reading the value directly from memory, or in as little as one disk seek. Not every NoSQL product is about fast access; some are more about reliability, for example. But what people have wanted for a long time is a better memcached, and many NoSQL systems offer that.
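
A minimal sketch of that access pattern, using the redis-py client against a hypothetical local Redis instance (host, port, key name, and TTL are all assumptions; any memcached-style store looks similar):

```python
import redis  # pip install redis

# Key-value access: hash the key, read the value directly.
# Host, port, key name, and TTL are placeholder assumptions.
r = redis.Redis(host="localhost", port=6379)

r.set("session:42", "alice", ex=3600)  # write with a one-hour expiry
value = r.get("session:42")            # one round trip, no query planner
print(value)                           # b'alice'
```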

Flexible schema and flexible datatypes.

NoSQL products support a whole range of new data types, and this is a major area of innovation in NoSQL. We have column-oriented, graph, advanced data structures, document-oriented, and key-value models. Complex objects can be easily stored without a lot of mapping. Developers love avoiding complex schemas and ORM frameworks. Lack of structure allows for much more flexibility. We also have program- and programmer-friendly datatypes like JSON.
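
For example, a nested order object can be stored as a single document, with no table mapping in between. A minimal sketch using the pymongo driver; the connection string, database, collection, and fields are all illustrative assumptions:

```python
from pymongo import MongoClient  # pip install pymongo

# A nested object stored as one document, with no table mapping or ORM.
# The connection string, database, collection, and fields are illustrative.
client = MongoClient("mongodb://localhost:27017")
orders = client.shop.orders

orders.insert_one({
    "order_id": 1001,
    "customer": {"name": "Alice", "email": "alice@example.com"},
    "items": [
        {"sku": "A1", "qty": 2},
        {"sku": "B7", "qty": 1},
    ],
})

doc = orders.find_one({"order_id": 1001})
```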

Schema migration.

Schemalessness makes it easier to deal with schema migrations with less worry. Schemas are, in a sense, dynamic, because they are imposed by the application at run time, so different parts of an application can have a different view of the schema.
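
A minimal sketch of what an application-imposed, run-time schema looks like in practice; the document shapes and field names are hypothetical:

```python
# The schema lives in the application, so old and new documents coexist.
# Document shapes and field names here are hypothetical.
def display_name(user_doc):
    # v2 documents split the name into two fields; v1 documents have a
    # single "name" field. The reader handles both at run time, so no
    # stop-the-world ALTER TABLE migration is needed.
    if "first_name" in user_doc:
        return f"{user_doc['first_name']} {user_doc['last_name']}"
    return user_doc.get("name", "unknown")

print(display_name({"name": "Alice Smith"}))                      # v1
print(display_name({"first_name": "Bob", "last_name": "Jones"}))  # v2
```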

Write availability.

Do your writes need to succeed no matter what? Then we can get into partitioning, CAP, eventual consistency, and all that jazz.

Easier maintainability, administration and operations.

This is very product specific, but many NoSQL vendors are trying to gain adoption by making it easy for developers to get started. They are spending a lot of effort on ease of use, minimal administration, and automated operations. This can lead to lower operations costs, as special code doesn't have to be written to scale a system that was never intended to be used that way.

No single point of failure.

Not every product is delivering on this, but we are seeing a definite convergence on relatively easy-to-configure and easy-to-manage high availability with automatic load balancing and cluster sizing. A perfect cloud partner.

Generally available parallel computing.

We are seeing MapReduce baked into products, which makes parallel computing something that will be a normal part of development in the future.

Programmer ease of use.

Accessing your data should be easy. While the relational model is intuitive for end users, like accountants, it's not very intuitive for developers. Programmers grok keys, values, JSON, JavaScript stored procedures, HTTP, and so on. NoSQL is for programmers. This is a developer-led coup. The response to a database problem can't always be to hire a really knowledgeable DBA, get your schema right, denormalize a little, etc.; programmers would prefer a system that they can make work for themselves. It shouldn't be so hard to make a product perform. Money is part of the issue: if it costs a lot to scale a product, won't you go with the cheaper product that you control, that's easier to use, and that's easier to scale?

Use the right data model for the right problem.

Different data models are used to solve different problems. Much effort has been put into, for example, wedging graph operations into a relational model, but it doesn't work. Isn't it better to solve a graph problem in a graph database? We are now seeing a general strategy of trying to find the best fit between a problem and a solution.
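
As an illustration of that fit, a friends-of-friends traversal is a single query in a graph database, where the equivalent SQL needs recursive joins. A sketch with the neo4j Python driver; the URI, credentials, and the Person/FRIEND graph model are assumptions:

```python
from neo4j import GraphDatabase  # pip install neo4j

# Friends-of-friends, up to three hops, as a single graph query.
# URI, credentials, and the Person/FRIEND model are assumptions.
driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))
with driver.session() as session:
    result = session.run(
        "MATCH (a:Person {name: $name})-[:FRIEND*1..3]->(b:Person) "
        "RETURN DISTINCT b.name AS friend",
        name="Alice")
    for record in result:
        print(record["friend"])
driver.close()
```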

Avoid hitting the wall.

Many projects hit some type of wall: they've exhausted all options to make their system scale or perform properly and are wondering what comes next. It's comforting to select a product and an approach that can jump over the wall by scaling linearly with incrementally added resources. At one time this wasn't possible; it took custom-built everything. But that's changed. We are now seeing usable out-of-the-box products that a project can readily adopt.

Distributed systems support.

Not everyone is worried about scale or performance beyond what can be achieved by non-NoSQL systems. What they need is a distributed system that can span datacenters while handling failure scenarios without a hiccup. NoSQL systems, because they have focused on scale, tend to exploit partitions and tend not to use heavyweight strict consistency protocols, so they are well positioned to operate in distributed scenarios.

Tunable CAP tradeoffs.

NoSQL systems are generally the only products with a "slider" for choosing where they want to land on the CAP spectrum. Relational databases pick strong consistency, which means they can't tolerate a partition failure. In the end this is a business decision and should be decided on a case-by-case basis. Does your app even care about consistency? Are a few dropped writes OK? Does your app need strong or weak consistency? Is availability more important, or is consistency? Will being down be more costly than being wrong? It's nice to have products that give you a choice.
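
Cassandra is one concrete example of such a slider: each statement can pick its own consistency level. A sketch using the DataStax Python driver, where the contact point, keyspace, and table are assumptions:

```python
from cassandra import ConsistencyLevel        # pip install cassandra-driver
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# The "slider": each statement picks its own spot on the CAP spectrum.
# Contact point, keyspace, and table are placeholder assumptions.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect("demo")

# Lean toward availability: succeed if any single replica acknowledges.
fast_write = SimpleStatement(
    "INSERT INTO events (id, payload) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.ONE)
session.execute(fast_write, (1, "clicked"))

# Lean toward consistency: require a majority of replicas to answer.
safe_read = SimpleStatement(
    "SELECT payload FROM events WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM)
row = session.execute(safe_read, (1,)).one()
```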

Some Use Cases of NoSQL

  • Managing large streams of non-transactional data: Apache logs, application logs, MySQL logs, clickstreams, etc.
  • Syncing online and offline data. This is a niche CouchDB has targeted.
  • Fast response times under all loads.
  • Avoiding heavy joins, for when the query load for complex joins becomes too large for an RDBMS.
  • Soft real-time systems where low latency is critical. Games are one example.
  • Applications where a wide variety of different write, read, query, and consistency patterns need to be supported. There are systems optimized for 50% reads/50% writes, 95% writes, or 95% reads; read-only applications needing extreme speed and resiliency, simple queries, and tolerance for slightly stale data; applications requiring moderate performance, read/write access, simple queries, and completely authoritative data; and read-only applications with complex query requirements.
  • Load balancing to accommodate data and usage concentrations and to help keep microprocessors busy.
  • Real-time inserts, updates, and queries.
  • Hierarchical data like threaded discussions and parts explosion.
  • Dynamic table creation.
  • Two-tier applications where low-latency data is made available through a fast NoSQL interface, but the data itself can be calculated and updated by high-latency Hadoop apps or other low-priority apps.
  • Sequential data reading. The right underlying data storage model needs to be selected. A B-tree may not be the best model for sequential reads.
  • Slicing off part of a service that may need better performance or scalability onto its own system. For example, user logins may need to be high performance, and this feature could use a dedicated service to meet those goals.
  • Caching. A high-performance caching tier for web sites and other applications. One example is a cache for the Data Aggregation System used by the Large Hadron Collider. (A cache-aside sketch follows this list.)
  • Voting.
  • Real-time page view counters.
  • User registration, profile, and session data.
  • Document, catalog, and content management systems. These are facilitated by the ability to store complex documents as a whole rather than organized as relational tables. Similar logic applies to inventory, shopping carts, and other structured data types.
  • Archiving. Storing a large continual stream of data that is still accessible online. Document-oriented databases with a flexible schema can handle schema changes over time.
  • Analytics. Use MapReduce, Hive, or Pig to perform analytical queries and scale-out systems that support high write loads.
  • Working with heterogeneous types of data, for example, different media types at a generic level.
  • Embedded systems. They don't want the overhead of SQL and servers, so they use something simpler for storage.
  • A "market" game, where you own buildings in a town. You want the building list of someone to pop up quickly, so you partition on the owner column of the building table, so that the select is single-partitioned. But when someone buys the building of someone else you update the owner column along with price.
  • JPL is using SimpleDB to store rover plan attributes. References are kept to a full plan blob in S3.
  • Federal law enforcement agencies tracking Americans in real-time using credit cards, loyalty cards and travel reservations.
  • Fraud detection by comparing transactions to known patterns in real-time.
  • Helping diagnose the typology of tumors by integrating the history of every patient.
  • In-memory databases for high-update situations, like a web site that displays everyone's "last active" time (for chat, maybe). If users are performing some activity once every 30 seconds, then you will pretty much be at your limit with about 5,000 simultaneous users.
  • Handling lower-frequency multi-partition queries using materialized views while continuing to process high-frequency streaming data.
  • Priority queues.
  • Running calculations on cached data, using a program-friendly interface, without having to go through an ORM.
  • Deduplicating (uniquing) a large dataset using simple key-value columns.
  • To keep querying fast, values can be rolled up into different time slices.
  • Computing the intersection of two massive sets, where a join would be too slow.
  • A timeline à la Twitter.
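
The caching item above follows the classic cache-aside pattern. A minimal sketch with the redis-py client in front of a slower backing store; the key scheme, TTL, and loader function are assumptions:

```python
import json
import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379)

def fetch_product(product_id, loader, ttl=300):
    """Cache-aside: try the cache, fall back to the slow store, then
    populate the cache for the next reader. Names are illustrative."""
    key = f"product:{product_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)          # hit: one key lookup
    value = loader(product_id)             # miss: query the backing store
    r.set(key, json.dumps(value), ex=ttl)  # expire so stale data ages out
    return value

# Usage: fetch_product(7, loader=lambda pid: {"id": pid, "name": "widget"})
```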

