Apache Cassandra

2015年10月29日 (四) 06:48的版本

您可以在Wikipedia上了解到此条目的英文信息 Apache Cassandra Thanks, Wikipedia.

Apache Cassandra是一套开源分布式Key-Value存储系统。它最初由Facebook开发，用于储存特别大的数据。Facebook目前在使用此系统。

主要特性

分布式
基于Column的结构化
高度可伸展性

2 nodes can handle 100,000 transactions per second, 4 nodes will support 200,000 transactions/sec and 8 nodes will tackle 400,000 transactions/sec。

Cassandra的主要特点就是它不是一个数据库，而是由一堆数据库节点共同构成的一个分布式网络服务，对Cassandra 的一个写操作，会被复制到其他节点上去，对Cassandra的读操作，也会被路由到某个节点上面去读取。对于一个Cassandra群集来说，扩展性能是比较简单的事情，只管在群集里面添加节点就可以了。

Cassandra是一个混合型的非关系型数据库，类似于Google的BigTable。其主要功能比 Dynomite（分布式的Key-Value存储系统）更丰富，但支持度却不如文档存储MongoDB（介于关系数据库和非关系数据库之间的开源产品，是非关系数据库当中功能最丰富，最像关系型数据库。支持的数据结构非常松散，是类似JSON的bjson格式，因此可以存储比较复杂的数据类型）Cassandra最初由Facebook开发，后转变成了开源项目。它是一个网络社交云计算方面理想的数据库。以Amazon专有的完全分布式的Dynamo为基础，结合了Google BigTable基于列族（Column Family）的数据模型。P2P去中心化的存储。很多方面都可以称之为Dynamo 2.0。

和其他数据库比较，有几个突出特点：

模式灵活：使用Cassandra，像文档存储，你不必提前解决记录中的字段。你可以在系统运行时随意的添加或移除字段。这是一个惊人的效率提升，特别是在大型部署上。
真正的可扩展性：Cassandra是纯粹意义上的水平扩展。为给集群添加更多容量，可以指向另一台电脑。你不必重启任何进程，改变应用查询，或手动迁移任何数据。
多数据中心识别：你可以调整你的节点布局来避免某一个数据中心起火，一个备用的数据中心将至少有每条记录的完全复制。

一些使Cassandra提高竞争力的其他功能：

范围查询：如果你不喜欢全部的键值查询，则可以设置键的范围来查询。
列表数据结构：在混合模式可以将超级列添加到5维。对于每个用户的索引，这是非常方便的。
分布式写操作：有可以在任何地方任何时间集中读或写任何数据。并且不会有任何单点失败。

版本

3.x
2.x

What’s New in Cassandra 2.2: JSON Support

1.x

指南

OS X

brew install cassandra 
cassandra -f

bin/cassandra -f
bin/cqlsh
or cqlsh 1.2.3.4 9042 // ip, port
cqlsh> help

cqlsh> CREATE KEYSPACE mykeyspace
   ... WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> use mykeyspace;
cqlsh:mykeyspace> CREATE TABLE users (
              ... user_id int PRIMARY KEY,
              ... fname text,
              ... lname text
              ... );
cqlsh:mykeyspace> INSERT INTO users (user_id,  fname, lname)
              ... VALUES (1745, 'john', 'smith');
cqlsh:mykeyspace> INSERT INTO users (user_id,  fname, lname)
              ... VALUES (1744, 'john', 'doe');
cqlsh:mykeyspace> INSERT INTO users (user_id,  fname, lname)
              ... VALUES (1746, 'john', 'smith');
cqlsh:mykeyspace> select * from users;

 user_id | fname | lname
---------+-------+-------
    1745 |  john | smith
    1744 |  john |   doe
    1746 |  john | smith

(3 rows)

Tools

cassandra-stress tool

cd apache-cassandra-2.1.8/tools/bin
cassandra-stress help -schema
cassandra-stress write n=1000000 // 写100万行
cassandra-stress read n=200000 // 读20万行
cassandra-stress write n=1000000 cl=one -mode native cql3 -schema keyspace="stress" -log file=~/load_1M_rows.log // 写入100万行
cqlsh:mykeyspace> select count(*) from stress.standard1;
 count
---------
 1000000

系统信息

nodetool --host 127.0.0.1 cfstats
cqlsh> select * from system.batchlog;
cqlsh> select * from system.compaction_history;
cqlsh> select * from system.compactions_in_progress;
cqlsh> select * from system.hints;
cqlsh> select * from system.local;
cqlsh> select * from system.peers;
cqlsh> select * from system.peer_events;
cqlsh> select * from system.range_xfers;
cqlsh> select * from system.sstable_activity;
cqlsh> select * from system.schema_columnfamilies;
cqlsh> select * from system.schema_columns;
cqlsh> select * from system.schema_triggers;
cqlsh> select * from system.schema_usertypes;
cqlsh> select * from system.size_estimates;
cqlsh> select * from system.schema_keyspaces;
cqlsh> select * from mykeyspace.users;
cqlsh> select * from system_traces.sessions;
cqlsh> select * from system_traces.events;

Python

Python & Cassandra - Best Friends

sudo pip install ipython-cql
sudo pip install virtualenvwrapper
source /usr/local/bin/virtualenvwrapper.sh
git clone https://github.com/rustyrazorblade/python-presentation
cd python-presentation
mkvirtualenv tutorial
pip install -r requirements.txt
ipython notebook

MariaDB

MariaDB的Cassandra存储引擎，允许MariaDB通过标准SQL语法使用Cassandra集群。

Docker

Best Practices for Running DataStax Enterprise within Docker

Spark

Mesos

Cassandra on Mesos

控制台

cassandra-webconsole

厂商

案例

Netflix用Apache Cassandra替代甲骨文数据库

目前，Cassandra对于Netflix而言是首选数据库，因为它们几乎满足了Netflix的所有需求。Netflix已经将95%的数据存储在Cassandra上，包括客户账户信息、影片评分、影片元数据、影片书签和日志等。Netflix在750多个节点上运行着50多个Cassandra集群。高峰时，Netflix每秒要处理50,000多个读取和100,000写入操作。Netflix平均每天要处理21亿次的读取与43亿次的写入操作。

DataStax也为其他各种行业建立了不同版本的Cassandra工具。DataStax已经筹资8400万美元，目前有员工300多人，正准备IPO。埃利斯称，他们已经有500多家客户，包括“财富100强”中的25家大公司。
这款数据库曾被Facebook抛弃现正帮苹果壮大 Apple's, with over 75,000 nodes storing over 10 PB of data.
SoundCloud's Activity Feed and Real-Time Stats Powered by Apache Cassandra
Nexgate Chooses Cassandra over MongoDB for their Multi-Master NoSQL Solution
Urban Airship Utilizing Cassandra to Connect Hundreds of Millions of Devices for Breaking News Alerts
Apache Cassandra Powers Yakaz for 10 Million Unique Visitors Every Month
i2O Water Switches to Cassandra from Microsoft SQL Server, Saving Over 100 Million Liters of Water per Day Across the World
AppDynamics Utilizes Cassandra for App Monitoring and Metrics Tracking
Coursera’s Adoption of Cassandra
Spotify是怎样从Postgres切换至Cassandra的？

迁移

数据库迁移

文档

图集

Column
Row
Column family
Keyspace
Super
CFS文件系统
Cassandra+Kafka+Akka
OpsCenter架构
DevCenter
个性化推荐引擎
写路径
读路径
关系型和Cassandra
Cassandra中的表
eBay用户模型

链接

<discussion>characters_max=300</discussion>

@@ 第28行： / 第28行： @@
 ==版本==
+*3.x
+*2.x
 [http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-2-json-support What’s New in Cassandra 2.2: JSON Support]
+*1.x
 ==指南==