An Overview of NoSQL Databases

What it is, what it does, and how it came about

The term NoSQL was coined in 1998. Many people think NoSQL is a derogatory term created to poke at SQL. In reality, the term means Not Only SQL. The idea is that both technologies can coexist and each has its place. The NoSQL movement has been in the news over the past few years as many Web 2.0 leaders have adopted NoSQL technologies. Companies like Facebook, Twitter, Digg, Amazon, LinkedIn, and Google all use NoSQL in one way or another. Let’s break down NoSQL so you can explain it to your CIO or your co-workers.

NoSQL was born out of necessity.

Data storage: The world’s stored digital data is measured in exabytes. An exabyte is one billion gigabytes (GB). According to Internet.com, 161 exabytes of stored data were added in 2006. By 2010, just four years later, the amount of data stored is projected to reach nearly 1,000 exabytes, an increase of more than 500%. In other words, there is a lot of data stored in the world, and it will keep growing.

Interconnected data: Data continues to become more connected. The web was built on hyperlinks, blogs have pingbacks, and every major social networking system has tags that tie things together. Large systems are built to be interconnected.

Complex data structures: NoSQL can easily handle hierarchical, nested data structures. To do the same in SQL, you need multiple relational tables with all kinds of keys. There is also a relationship between performance and data complexity: performance can degrade in a traditional RDBMS as it stores the massive amounts of data required by social networking applications and the Semantic Web.
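
To make the contrast concrete, here is a minimal Python sketch. The blog post, its field names, and the three-table layout are all hypothetical and chosen only for illustration; the point is that one nested document has to be split into several keyed tables before a relational database can hold it.

```python
import json

# Hypothetical blog post held as a single nested document.
post = {
    "id": 1,
    "title": "Hello NoSQL",
    "author": {"name": "Jon Foobar", "weblog": "http://example.org/jon"},
    "comments": [
        {"user": "ann", "text": "Nice overview"},
        {"user": "bob", "text": "What about joins?"},
    ],
}

def flatten(doc):
    """Split the nested document into the rows a relational schema would need.

    Every row carries the post id so the tables can be joined back together.
    """
    posts = [(doc["id"], doc["title"])]
    authors = [(doc["id"], doc["author"]["name"], doc["author"]["weblog"])]
    comments = [
        (doc["id"], position, c["user"], c["text"])
        for position, c in enumerate(doc["comments"])
    ]
    return posts, authors, comments

posts, authors, comments = flatten(post)

# The document form is stored and read back as one unit.
print(json.dumps(post, indent=2))
# The relational form needs three tables tied together by keys.
print(len(posts) + len(authors) + len(comments), "rows across 3 tables")
```

Reading the post back from the relational form would mean joining the three tables on the post id, while the document form comes back in a single read.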

What is NoSQL?

One way to define NoSQL is to consider what it is not. It’s not SQL and it’s not relational. As the name suggests, it complements an RDBMS rather than replacing it. NoSQL is designed for distributed data stores with very large-scale data needs. Think of Facebook with its 500,000,000 users, or Twitter, which accumulates terabits of data every day.

A NoSQL database has no fixed schema and no joins. An RDBMS “scales up” by getting faster and faster hardware and adding more memory. NoSQL, on the other hand, can take advantage of “scaling out”. Scaling out means distributing the load across many commodity systems. This is the component of NoSQL that makes it a cost-effective solution for large data sets.
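
One simple way to picture scaling out is hash partitioning: each key is hashed, and the hash decides which machine stores it. The Python sketch below is only an illustration under assumed names (`NODES`, `node_for`, and the `user:N` keys are hypothetical), not how any particular product places data.

```python
import hashlib
from collections import Counter

# Hypothetical pool of commodity machines.
NODES = ["node-a", "node-b", "node-c"]

def node_for(key, nodes=NODES):
    """Pick a node by hashing the key; the same key always maps to the same node."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

# Spread 9,000 hypothetical user records over the pool and count the load per node.
keys = [f"user:{i}" for i in range(9000)]
load = Counter(node_for(k) for k in keys)
print(load)
```

Each node ends up with roughly a third of the keys, so capacity grows by adding machines rather than by buying a bigger one. Plain modulo placement moves most keys whenever the node list changes, which is why the Dynamo paper describes consistent hashing instead.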

NoSQL Categories

The current NoSQL world can be divided into four basic categories:

  • Key-value stores are based primarily on Amazon’s Dynamo paper, written in 2007. The main idea is a hash table with a unique key and a pointer to a particular item of data. These mappings are usually accompanied by a caching mechanism to maximize performance.
  • Column family stores were created to store and process very large amounts of data distributed across many machines. There are still keys, but they point to multiple columns. In the case of BigTable (Google’s column family NoSQL model), rows are identified by a row key, and data is sorted and stored by that key. Columns are organized into column families.
  • Document databases were inspired by Lotus Notes and are similar to key-value stores. The model is essentially versioned documents that are collections of other key-value collections. The semi-structured documents are stored in formats like JSON.
  • Graph databases are built from nodes, relationships between nodes, and properties of nodes. Instead of the rigid structure of SQL tables of rows and columns, they use a flexible graph model that can scale across many machines.
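
The four categories can be pictured as four shapes of data. The Python sketch below uses plain dictionaries and lists as stand-ins; every key, field, and relationship name in it is hypothetical, and real products add persistence, distribution, and indexing on top.

```python
# Key-value store: a unique key points to one opaque item of data.
kv_store = {"session:42": b"opaque-bytes"}

# Column family store: row key -> column family -> columns.
column_store = {
    "com.example/index": {
        "contents": {"html": "<html>...</html>"},
        "anchor": {"other.example": "Example"},
    },
    "com.example/about": {
        "contents": {"html": "<html>about</html>"},
    },
}
# Rows are kept in row-key order, so related keys sit next to each other.
sorted_rows = sorted(column_store)

# Document database: a key points to a semi-structured, JSON-like document
# that can itself contain nested key-value collections.
doc_store = {
    "post:1": {
        "title": "Hello",
        "tags": ["nosql", "intro"],
        "author": {"name": "Jon Foobar"},
    },
}

# Graph database: nodes with properties, plus named relationships between nodes.
nodes = {1: {"name": "Ann"}, 2: {"name": "Bob"}}
edges = [(1, "FOLLOWS", 2)]

def neighbors(node_id, relationship):
    """Return the names of nodes reached from node_id over the given relationship."""
    return [
        nodes[dst]["name"]
        for src, rel, dst in edges
        if src == node_id and rel == relationship
    ]

print(sorted_rows)
print(doc_store["post:1"]["author"]["name"])
print(neighbors(1, "FOLLOWS"))
```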

Major NoSQL Players

The major players in NoSQL emerged primarily because of the organizations that adopted them. Key NoSQL technologies include:

  • Dynamo: Developed by Amazon.com, Dynamo is the most prominent key-value NoSQL database. Amazon needed a highly scalable, distributed platform for its e-commerce businesses, so it developed Dynamo. Amazon S3 uses Dynamo as its storage mechanism.
  • Cassandra: Open-sourced by Facebook, Cassandra is a column-oriented NoSQL database.
  • BigTable: BigTable is Google’s proprietary column-oriented database. Google allows the use of BigTable, but only through Google App Engine.
  • SimpleDB: SimpleDB is another Amazon database. Used with Amazon EC2 and S3, it is part of Amazon Web Services and is billed by usage.
  • CouchDB: CouchDB and MongoDB are open source, document-oriented NoSQL databases.
  • Neo4j: Neo4j is an open source graph database.

Querying NoSQL

The question of how to query a NoSQL database is what interests most developers. After all, data stored in a huge database is of no use unless it can be retrieved and presented to end users or web services. NoSQL databases do not provide a high-level declarative query language like SQL. Instead, querying these databases is specific to the data model.
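
As a rough illustration of what "specific to the data model" means, the Python sketch below contrasts two hypothetical helpers: `kv_get`, which can only look a value up by its exact key, and `doc_find`, which filters documents by field values. Neither is a real database API; both are assumptions made for this example.

```python
# Hypothetical data: a key-value mapping and a small document collection.
kv = {"user:1": "ann", "user:2": "bob"}
docs = [
    {"name": "ann", "city": "Oslo", "langs": ["python", "go"]},
    {"name": "bob", "city": "Lima", "langs": ["java"]},
]

def kv_get(key):
    """Key-value model: the only query is a lookup by exact key."""
    return kv.get(key)

def doc_find(**criteria):
    """Document model: return documents whose fields equal the given values."""
    return [
        d for d in docs
        if all(d.get(field) == value for field, value in criteria.items())
    ]

print(kv_get("user:2"))        # lookup by key
print(doc_find(city="Oslo"))   # filter by field
```

The same question ("who lives in Oslo?") is easy to ask of the document collection but cannot be asked of the key-value mapping at all without knowing the keys in advance.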

Many NoSQL platforms offer RESTful interfaces to the data. Others provide query APIs. A few query tools have been developed that attempt to query multiple NoSQL databases. These tools typically work across a single NoSQL category. One example is SPARQL, a declarative query specification designed for graph databases. Here is an example of a SPARQL query that retrieves the URL of a particular blogger (courtesy of IBM):

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?url
    FROM <...>
    WHERE {
      ?contributor foaf:name "Jon Foobar" .
      ?contributor foaf:weblog ?url .
    }
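
To show roughly what the two patterns in the WHERE clause do, here is a small Python sketch that matches them against an in-memory list of triples. It is not a SPARQL engine: the triples, the `_:a` and `_:b` identifiers, the example URLs, and the `weblogs_of` helper are all made up for illustration.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical graph stored as (subject, predicate, object) triples.
triples = [
    ("_:a", FOAF + "name", "Jon Foobar"),
    ("_:a", FOAF + "weblog", "http://example.org/jon/blog"),
    ("_:b", FOAF + "name", "Ann Example"),
    ("_:b", FOAF + "weblog", "http://example.org/ann/blog"),
]

def weblogs_of(name):
    """Match both triple patterns, joined on the shared ?contributor variable."""
    # Pattern 1: ?contributor foaf:name <name>
    contributors = {s for s, p, o in triples if p == FOAF + "name" and o == name}
    # Pattern 2: ?contributor foaf:weblog ?url, limited to the subjects found above
    return [
        o for s, p, o in triples
        if p == FOAF + "weblog" and s in contributors
    ]

print(weblogs_of("Jon Foobar"))
```

The query is declarative in the same way SQL is: it states which pattern the data must match, and the engine decides how to find the matching nodes.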


The future of NoSQL

Organizations with massive data storage needs are looking seriously at NoSQL. The concept does not appear to be getting as much traction in smaller organizations. In a survey conducted by Information Week, 44% of business IT professionals had not heard of NoSQL. Further, only 1% of respondents reported that NoSQL was part of their strategic direction. NoSQL clearly has its place in our connected world, but it will need to keep evolving to gain the mass appeal that many think it could have.




