Monthly archive: November 2011

Things you need to work comfortably with the YAML format..

YAML parsing library : http://code.google.com/p/snakeyaml/
Editor that makes YAML easier to edit (Eclipse plugin) : http://code.google.com/p/yedit/
Sites for checking YAML : http://yaml-online-parser.appspot.com/, http://codebeautify.org/yaml-validator, http://www.yamllint.com/

Hmm.. what else might be needed??

About the YAML format Cassandra uses for its configuration..

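Cassandra began using YAML as its default configuration file format, starting (I think) with version 0.7. According to the wiki definition of YAML, it is a data serialization format that humans can read easily. Hmm, fair enough.. it does read a little more easily.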

Below, let's compare the XML format used in version 0.6 with the YAML format used in the current version (1.0.2). If asked which configuration is more readable, I would answer the XML one.

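Judged purely as a format, XML is less efficient than YAML and probably costs more to parse. In Cassandra, however, the files below exist to tune Cassandra so that it runs well on a given cluster/machine. For configuration, I think the readability of the values you set is a very important point. In that sense, configuration does not have to be XML, but I wish they had kept XML, which is easier (a subjective impression of relative ease) and more readable than YAML. For now, though, YAML is the default format. I will have to learn YAML properly and use it, but it feels like a loss.. Ah.. to think I have come to like the XML format..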

– XML format

<Storage>
  <ClusterName>Test Cluster</ClusterName>
  <AutoBootstrap>false</AutoBootstrap>
  <Keyspaces>
    <Keyspace Name="Keyspace1">
      <ColumnFamily CompareWith="BytesType" Name="Standard1" RowsCached="10%" KeysCachedFraction="0"/>
      <ColumnFamily CompareWith="UTF8Type" Name="Standard2"/>
      <ColumnFamily CompareWith="TimeUUIDType" Name="StandardByUUID1"/>
      <ColumnFamily ColumnType="Super"
                    CompareWith="UTF8Type"
                    CompareSubcolumnsWith="UTF8Type"
                    Name="Super1"
                    RowsCached="1000"
                    KeysCachedFraction="0"
                    Comment="A column family with supercolumns, whose column and subcolumn names are UTF8 strings"/>
      <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
      <ReplicationFactor>1</ReplicationFactor>
      <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
    </Keyspace>
  </Keyspaces>
  <Authenticator>org.apache.cassandra.auth.AllowAllAuthenticator</Authenticator>
  <Partitioner>org.apache.cassandra.dht.RandomPartitioner</Partitioner>
  <InitialToken></InitialToken>
  <CommitLogDirectory>/var/lib/cassandra/commitlog</CommitLogDirectory>
  <DataFileDirectories>
    <DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
  </DataFileDirectories>
  <CalloutLocation>/var/lib/cassandra/callouts</CalloutLocation>
  <StagingFileDirectory>/var/lib/cassandra/staging</StagingFileDirectory>
  <Seeds>
    <Seed>127.0.0.1</Seed>
  </Seeds>
  <RpcTimeoutInMillis>5000</RpcTimeoutInMillis>
  <CommitLogRotationThresholdInMB>128</CommitLogRotationThresholdInMB>
  <ListenAddress>localhost</ListenAddress>
  <StoragePort>7000</StoragePort>
  <ThriftAddress>localhost</ThriftAddress>
  <ThriftPort>9160</ThriftPort>
  <ThriftFramedTransport>false</ThriftFramedTransport>
  <DiskAccessMode>auto</DiskAccessMode>
  <SlicedBufferSizeInKB>64</SlicedBufferSizeInKB>
  <FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
  <FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>
  <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
  <MemtableThroughputInMB>64</MemtableThroughputInMB>
  <BinaryMemtableThroughputInMB>256</BinaryMemtableThroughputInMB>
  <MemtableOperationsInMillions>0.3</MemtableOperationsInMillions>
  <MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
  <ConcurrentReads>8</ConcurrentReads>
  <ConcurrentWrites>32</ConcurrentWrites>
  <CommitLogSync>periodic</CommitLogSync>
  <CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS>
  <GCGraceSeconds>864000</GCGraceSeconds>
</Storage>
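To give a feel for how a 0.6-style XML configuration like the one above can be read by a program, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The embedded snippet is a shortened copy of the settings above, and the helper name `load_storage_conf` is hypothetical; this is not how Cassandra itself loads the file.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the 0.6-style settings (illustration only).
STORAGE_CONF = """
<Storage>
  <ClusterName>Test Cluster</ClusterName>
  <Keyspaces>
    <Keyspace Name="Keyspace1">
      <ColumnFamily CompareWith="BytesType" Name="Standard1"/>
      <ColumnFamily CompareWith="UTF8Type" Name="Standard2"/>
      <ReplicationFactor>1</ReplicationFactor>
    </Keyspace>
  </Keyspaces>
  <Seeds>
    <Seed>127.0.0.1</Seed>
  </Seeds>
  <StoragePort>7000</StoragePort>
</Storage>
"""


def load_storage_conf(xml_text):
    """Hypothetical helper: pull a few settings out of the XML text."""
    root = ET.fromstring(xml_text.strip())
    return {
        "cluster_name": root.findtext("ClusterName"),
        "storage_port": int(root.findtext("StoragePort")),
        # Every <Seed> element under <Seeds> becomes one list entry.
        "seeds": [seed.text for seed in root.iter("Seed")],
        # Map each keyspace name to the names of its column families.
        "column_families": {
            ks.get("Name"): [cf.get("Name") for cf in ks.findall("ColumnFamily")]
            for ks in root.iter("Keyspace")
        },
    }


conf = load_storage_conf(STORAGE_CONF)
print(conf["cluster_name"], conf["storage_port"], conf["seeds"])
```

Here element names carry the settings and attributes carry per-column-family options, so a reader has to handle both.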

– YAML format

cluster_name: 'Test Cluster'
initial_token:
hinted_handoff_enabled: true
max_hint_window_in_ms: 3600000 # one hour
hinted_handoff_throttle_delay_in_ms: 50
authenticator: org.apache.cassandra.auth.AllowAllAuthenticator
authority: org.apache.cassandra.auth.AllowAllAuthority
partitioner: org.apache.cassandra.dht.RandomPartitioner
data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
saved_caches_directory: /var/lib/cassandra/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "127.0.0.1"

flush_largest_memtables_at: 0.75
reduce_cache_sizes_at: 0.85
reduce_cache_capacity_to: 0.6
concurrent_reads: 32
concurrent_writes: 32
memtable_flush_queue_size: 4
sliced_buffer_size_in_kb: 64
storage_port: 7000
ssl_storage_port: 7001
listen_address: localhost
rpc_address: localhost
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
thrift_max_message_length_in_mb: 16
incremental_backups: false
snapshot_before_compaction: false
column_index_size_in_kb: 64
in_memory_compaction_limit_in_mb: 64
multithreaded_compaction: false
compaction_throughput_mb_per_sec: 16
compaction_preheat_key_cache: true
rpc_timeout_in_ms: 10000
endpoint_snitch: org.apache.cassandra.locator.SimpleSnitch
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 600000
dynamic_snitch_badness_threshold: 0.1
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
index_interval: 128
encryption_options:
    internode_encryption: none
    keystore: conf/.keystore
    keystore_password: cassandra
    truststore: conf/.truststore
    truststore_password: cassandra
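Most of the YAML settings above are plain top-level `key: value` lines, and the sketch below reads only that subset using nothing but the Python standard library. It is not a YAML parser: nested blocks, list items, and keys with empty values are skipped, and every value stays a string. The function name `read_top_level_scalars` and the sample text are assumptions made for illustration; for real use, a proper YAML library such as SnakeYAML is the better choice.

```python
# Shortened sample in the style of the settings above (illustration only).
SAMPLE = """\
cluster_name: 'Test Cluster'
initial_token:
max_hint_window_in_ms: 3600000 # one hour
data_file_directories:
    - /var/lib/cassandra/data
storage_port: 7000
encryption_options:
    internode_encryption: none
"""


def read_top_level_scalars(text):
    """Collect top-level `key: value` scalars as strings.

    Indented lines, list items, comment lines, and keys without a value
    (empty settings or the start of a nested block) are skipped.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].rstrip()  # drop a trailing comment
        if not line or line[0] in " -#":
            continue
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip().strip("'\"")  # remove surrounding quotes
        if value:
            settings[key.strip()] = value
    return settings


settings = read_top_level_scalars(SAMPLE)
print(settings)
```

Note that indentation decides which lines are nested, which is exactly the detail that is easy to lose when a YAML file is copied or reformatted.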

Android Oftener project..

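This is a simple utility app project that lists the people I call and text most often, sorted by how often I contact them, and lets me call or text them right from that screen. Because it was simple to build, development took 3 to 4 days, and the designer John (ideamplifier_at_gmail.com) helped with the design. Personally, I built it because the bundled Favorites was a bit inconvenient.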

Below is a screenshot..

Market link : https://market.android.com/details?id=net.sjava.oftener
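The ranking idea behind the app, counting calls and texts per person and listing the most contacted first, can be sketched as follows. This is not the app's actual code: the app presumably reads the phone's call and message logs, while here the logs are plain lists of made-up names, and `rank_contacts` is a hypothetical helper.

```python
from collections import Counter


def rank_contacts(calls, messages):
    """Return (contact, count) pairs, most contacted first; ties sorted by name."""
    counts = Counter(calls)
    counts.update(messages)  # add text counts on top of call counts
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up logs: one entry per outgoing call or text.
calls = ["Kim", "Lee", "Kim", "Park"]
messages = ["Lee", "Kim", "Lee", "Lee"]

ranking = rank_contacts(calls, messages)
print(ranking)  # [('Lee', 4), ('Kim', 3), ('Park', 1)]
```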

Differences between Cassandra and Hadoop, HBase, MongoDB..

The following is from http://www.datastax.com/faq#intro-3. The explanation is short and clear, so the differences are easy to understand. ^^

How does Cassandra differ from Hadoop?

The primary difference between Cassandra and Hadoop is that Cassandra targets real-time/operational data, while Hadoop has been designed for batch-based analytic work.

There are many different technical differences between Cassandra and Hadoop, including Cassandra’s underlying data structure (based on Google’s Bigtable), its fault-tolerant, peer-to-peer architecture, multi-data center capabilities, tunable data consistency, all nodes being the same (no concept of a namenode, etc.) and much more.

How does Cassandra differ from HBase?

HBase is an open-source, column-oriented data store modeled after Google Bigtable, and is designed to offer Bigtable-like capabilities on top of data stored in Hadoop. However, while HBase shared the Bigtable design with Cassandra, its foundational architecture is much different.

A Cassandra cluster is much easier to setup and configure than a comparable HBase cluster. HBase’s reliance on the Hadoop namenode equates to there being a single point of failure in HBase, whereas with Cassandra, because all nodes are the same, there is no such issue.

In internal performance tests conducted at DataStax (using the Yahoo Cloud Serving Benchmark – YCSB), Cassandra offered literally 5X better performance in writes and 4X better performance on reads than HBase.

How does Cassandra differ from MongoDB?

MongoDB is a document-oriented database that is built upon a master-slave/sharding architecture. MongoDB is designed to store/manage collections of JSON-styled documents.

By contrast, Cassandra uses a peer-to-peer, write/read-anywhere styled architecture that is based on a combination of Google BigTable and Amazon Dynamo. This allows Cassandra to avoid the various complications and pitfalls of master/slave and sharding architectures.
Moreover, Cassandra offers linear performance increases as new nodes are added to a cluster, scales to terabyte-petabyte data volumes, and has no single point of failure.