
[Reading notes] Some lessons on scaling Internet application services

2011-05-13 13:15
Some lessons on scaling Internet application services

http://blog.rebill.info/archives/wangdi-internet-service.html

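FreeWheel: an Internet MRM video ad serving and publishing platform. B2B: content owner -> content distributors

Ad application server: matching (matches user requests against the available ads)

Log processor: map-reduce (Hadoop), ETL -> OLAP (data warehouse)

Pusher: caches the OLTP DB in memory, pulls from the DB, and handles preprocessing and data preparation. mmap structured memory dump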

Summary:

The business model and the business itself determine the design
High service reliability: 99.99% uptime
Peak ratio: 5:1 peak to mean.
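
To make the two figures above concrete, here is a small arithmetic sketch. The function names are illustrative, and the 1000 requests/second mean load is an assumed number, not one from the talk.

```python
def allowed_downtime_minutes(uptime: float, days: int = 365) -> float:
    """Minutes of downtime permitted over `days` at the given uptime ratio."""
    return (1 - uptime) * days * 24 * 60


def peak_load(mean_load: float, peak_to_mean: float = 5.0) -> float:
    """Load the system must survive at peak, given the mean load."""
    return mean_load * peak_to_mean


# 99.99% uptime leaves a bit under an hour of downtime per year.
print(round(allowed_downtime_minutes(0.9999), 2))
# An assumed mean of 1000 requests/second implies planning for 5000 at peak.
print(peak_load(1000))
```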

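1. Scaling the application service
1.1 Stateless application servers:
Encode everything that needs state into the URL, which removes the dependency on network communication between servers
Application servers can be restarted at any time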
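
A minimal sketch of carrying state in the URL so that any server can handle the next request. The HMAC signature, the `SECRET` key, and the parameter names `s` and `sig` are assumptions added for illustration; the notes only say that state is encoded into the URL.

```python
import base64
import hashlib
import hmac
import json
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical key shared by every application server


def encode_state(base_url: str, state: dict) -> str:
    """Return base_url with the state and its signature as query parameters."""
    payload = base64.urlsafe_b64encode(
        json.dumps(state, sort_keys=True).encode()
    ).decode()
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{base_url}?{urlencode({'s': payload, 'sig': sig})}"


def decode_state(url: str) -> dict:
    """Recover the state on any server; raise ValueError if it was altered."""
    query = parse_qs(urlparse(url).query)
    payload, sig = query["s"][0], query["sig"][0]
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    return json.loads(base64.urlsafe_b64decode(payload))
```

Because the request carries everything the server needs, a restarted or newly added server can pick it up without asking its peers.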


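1.2 Replication and multi-level cache.

Master->slave read/write splitting: keeps "long" write locks on the master from blocking read locks on the slaves
The cache expire time needs careful thought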
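
One way to picture read/write splitting is a small router that sends SELECTs to the slaves and everything else to the master. This is a sketch only: `ReadWriteRouter` and the server names are hypothetical, and it ignores replication lag and transactions.

```python
import itertools


class ReadWriteRouter:
    """Route reads round-robin across slaves and all writes to the master."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def route(self, sql: str) -> str:
        # Look only at the first keyword of the statement.
        verb = sql.lstrip().split(None, 1)[0].upper()
        return next(self._slaves) if verb == "SELECT" else self.master
```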

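Dot server: a server with no logic, there so that the external service does not fail even if the whole ad application server cluster goes down.
The Dot server answers each request with a cached standard no-ad output.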
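
The fallback idea can be sketched as follows. The response string, the `serve` function, and the way servers are modeled as callables are all assumptions for illustration, not FreeWheel's actual design.

```python
from typing import Callable, Iterable

# Hypothetical cached "no ads" response that the Dot server always returns.
NO_AD_RESPONSE = '<ads count="0"/>'


def dot_server(request: str) -> str:
    """No logic: ignore the request and return the cached no-ad output."""
    return NO_AD_RESPONSE


def broken_server(request: str) -> str:
    """Stand-in for an ad server that is down."""
    raise ConnectionError("ad server unavailable")


def serve(request: str, ad_servers: Iterable[Callable[[str], str]]) -> str:
    """Try each ad server in turn; fall back to the Dot server if all fail."""
    for handler in ad_servers:
        try:
            return handler(request)
        except Exception:
            continue  # this server is down, try the next one
    return dot_server(request)
```

The client always gets a well-formed answer, so an outage of the ad cluster degrades to "no ads" rather than to a broken page.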


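Log processing:
Use Google Protocol Buffers to avoid defining your own format and writing a parser; binary logs are also smaller, and fields are easy to extend
Small companies should avoid reinventing the wheel where they can.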


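2. Scaling the data warehouse:
De-normalization: allow redundancy; the SQL logic stays simple, queries perform well, and modeling with standard BI tools is easy
Pivot: merge multiple rows that share a key into a single row, improving ...
Long tail roll-up (roll the long tail up into a single item)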
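
Pivot and long tail roll-up are easy to show on toy data. The sketch below works on in-memory rows rather than warehouse tables, and the function names and the "other" bucket label are illustrative choices.

```python
from collections import defaultdict


def pivot(rows):
    """Merge (key, metric, value) rows that share a key into one row per key."""
    merged = defaultdict(dict)
    for key, metric, value in rows:
        merged[key][metric] = value
    return dict(merged)


def roll_up_long_tail(counts, top_n, other="other"):
    """Keep the top_n largest items and sum everything else into one item."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    result = dict(ranked[:top_n])
    tail = sum(value for _, value in ranked[top_n:])
    if tail:
        result[other] = tail
    return result
```

For example, three rows for two ads collapse into two rows, and a list of many tiny publishers collapses into the few large ones plus a single "other" line.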

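Benchmarking:
Take measurements from the MySQL slow query log, averaged over multiple runs; each month, pick the top slow queries and optimize them
Set the InnoDB buffer to 70% of the machine's memory
Do not optimize for the sake of optimizing. Consider these only when they are needed: table partitioning (splitting tables, vertical partitioning) and sharding (one database per customer, horizontal partitioning)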

3. Operations principles
Capacity planning: reserve 50% of capacity for peaks; an average system load above 50% is the signal to add capacity.
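
The 50% rule can be written as a one-line check. This is only a sketch: how load and capacity are measured (requests per second here) is an assumption, and `needs_more_capacity` is an illustrative name.

```python
def needs_more_capacity(load_samples, capacity, threshold=0.5):
    """Return True when the average load exceeds `threshold` of capacity."""
    average = sum(load_samples) / len(load_samples)
    return average / capacity > threshold
```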

N+1 data centers: data centers spread across different geographic locations, with backup ISPs and CDNs


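Monitoring:
1. Application liveness checks
2. Service anomaly alerts: errors, latency, and so on
3. Database master-slave replication
4. Daily slow query report
5. Daily report on the day's business operations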

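Multi-stage deployment: build a Lab that is a scaled-down copy of production with the same topology, and run integration tests there against real production data.
Deploy in stages, upgrading in batches at different times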
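
A batched upgrade can be sketched like this. The batch sizes, the `upgrade` and `healthy` callbacks, and the stop-on-first-unhealthy-batch policy are assumptions for illustration; the notes only say to upgrade in batches.

```python
def rollout_batches(hosts, batch_sizes):
    """Split hosts into successive batches; the last batch takes the rest."""
    batches, start = [], 0
    for size in batch_sizes:
        if start >= len(hosts):
            break
        batches.append(hosts[start:start + size])
        start += size
    if start < len(hosts):
        batches.append(hosts[start:])
    return batches


def deploy(hosts, batch_sizes, upgrade, healthy):
    """Upgrade batch by batch; stop after the first batch that is unhealthy."""
    done = []
    for batch in rollout_batches(hosts, batch_sizes):
        for host in batch:
            upgrade(host)
        done.extend(batch)
        if not all(healthy(host) for host in batch):
            return False, done
    return True, done
```

Starting with a batch of one host limits the damage of a bad release to a small share of traffic.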


Testing: DEV vs QA: 1:1

Built around automated regression testing.

Netlog: What we learned about scalability & high availability
http://www.slideshare.net/folke/netlog-what-we-learned-about-scalability-high-availability-430211
Apache+PHP+eAccelerator+Keepalived(for HA)
Nginx+Lighttpd+CDN: static files (css/js/image/photo/video)
Search: Sphinx; MySQL full-text search is very slow.

DB partitioning (sharding): divide data on the primary key
How: MySQL partitioning, available since 5.1

Memcached for session/query result/processed data/generated html
Cache with TTL/Cache forever with invalidate/Cache forever with update
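
The three cache policies differ only in what happens on write and on expiry. The sketch below uses a tiny in-memory `Cache` class as a stand-in for memcached, and a plain dict as the database; both are assumptions for illustration.

```python
import time


class Cache:
    """Tiny in-memory cache, a stand-in for memcached in this sketch."""

    def __init__(self, clock=time.monotonic):
        self._data, self._clock = {}, clock

    def set(self, key, value, ttl=None):
        """Store a value; ttl=None means cache forever."""
        expires = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires)

    def get(self, key):
        value, expires = self._data.get(key, (None, None))
        if expires is not None and expires <= self._clock():
            del self._data[key]  # entry has outlived its TTL
            return None
        return value

    def delete(self, key):
        self._data.pop(key, None)


def load_profile(cache, db, user, ttl=None):
    """Read through the cache; with a ttl this is the 'cache with TTL' policy."""
    key = f"profile:{user}"
    value = cache.get(key)
    if value is None:
        value = db[user]
        cache.set(key, value, ttl)
    return value


def save_profile_invalidate(cache, db, user, profile):
    """Cache forever with invalidate: drop the entry, the next read reloads it."""
    db[user] = profile
    cache.delete(f"profile:{user}")


def save_profile_update(cache, db, user, profile):
    """Cache forever with update: overwrite the entry on every write."""
    db[user] = profile
    cache.set(f"profile:{user}", profile)
```

With a TTL, readers may see stale data until the entry expires; the two "forever" policies stay fresh but require every write path to touch the cache.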


Global locking: use memcache as locking mechanism
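
A lock built on a cache relies on an atomic "store only if absent" operation, which memcached exposes as its `add` command. The sketch below uses an in-memory `FakeMemcache` class (an assumption, so the example runs without a server) that mimics that behavior; the key prefix and default TTL are illustrative.

```python
import time


class FakeMemcache:
    """In-memory stand-in for a memcached client, for illustration only."""

    def __init__(self, clock=time.monotonic):
        self._data, self._clock = {}, clock

    def add(self, key, value, ttl):
        """Store value only if key is absent or expired, like memcached add."""
        entry = self._data.get(key)
        if entry is not None and entry[1] > self._clock():
            return False
        self._data[key] = (value, self._clock() + ttl)
        return True

    def delete(self, key):
        self._data.pop(key, None)


def acquire_lock(cache, name, ttl=30):
    """Whoever's add succeeds holds the lock; the TTL frees it if the holder dies."""
    return cache.add(f"lock:{name}", 1, ttl)


def release_lock(cache, name):
    """Release the lock by deleting its key."""
    cache.delete(f"lock:{name}")
```

The TTL matters: without it, a process that crashes while holding the lock would block everyone else forever.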

Flooding detection using memcache [a very general and efficient way to detect flooding]
User can only redo action A after a timeout
a guestbook message can only be posted once every 2 minutes
User can not do action A more than X times in T minutes
only 12 failed login attempts per hour are allowed
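
Both flooding rules map onto expiring cache keys. In this sketch `FakeMemcache` is an in-memory stand-in (an assumption, so the example runs without a server) with memcached-like `add` and `incr`; the key names and the fixed-window counting scheme are illustrative choices, not necessarily what Netlog does.

```python
import time


class FakeMemcache:
    """In-memory stand-in with memcached-like add/incr and per-key TTLs."""

    def __init__(self, clock=time.monotonic):
        self._data, self._clock = {}, clock

    def _live(self, key):
        entry = self._data.get(key)
        return entry if entry and entry[1] > self._clock() else None

    def add(self, key, value, ttl):
        """Store value only if the key is absent or expired."""
        if self._live(key):
            return False
        self._data[key] = [value, self._clock() + ttl]
        return True

    def incr(self, key):
        """Increment a live counter; return None if the key is gone."""
        entry = self._live(key)
        if entry is None:
            return None
        entry[0] += 1
        return entry[0]


def cooldown_ok(cache, user, action, timeout):
    """Rule 1: allow the action only if no marker from the last run remains."""
    return cache.add(f"cool:{user}:{action}", 1, timeout)


def rate_ok(cache, user, action, limit, window):
    """Rule 2: allow at most `limit` actions per `window` seconds."""
    key = f"rate:{user}:{action}"
    cache.add(key, 0, window)  # starts a new window if none is live
    count = cache.incr(key)
    return count is not None and count <= limit
```

For the examples in the notes, the guestbook rule would be `cooldown_ok(cache, user, "guestbook", 120)` and the login rule `rate_ok(cache, user, "login_fail", 12, 3600)`.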






Scalability, Availability & Stability Patterns
http://www.slideshare.net/jboner/scalability-availability-stability-patterns

Scalable Web Architectures: Common Patterns and Approaches
http://www.slideshare.net/techdude/scalable-web-architectures-common-patterns-and-approaches
Three goals of application architecture design: Scale, HA, Performance.

What is scalability for?
1. Traffic growth
2. Dataset growth
3. Maintainability


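Two kinds of scalability:
1. Vertical (get bigger): sometimes adding hardware (memory) costs less than redesigning the software or partitioning the data
For example, when MySQL performance falls short, try adding memory first.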

2. Horizontal(get more)

Shared-nothing servers are easy to scale.

Queuing: with a queue, it is easy to parallelize work asynchronously

Database is the toughest part to scale. A dual Intel64 system with 16GB+ of RAM can get you a long way.

MySQL: Master-Master + multi-slave (as hot/hot) is good for HA.
design schema/access to avoid collision(hashing users to servers)
No auto-inc columns for hot/hot
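
A sketch of the two hot/hot rules above: hash each user to a fixed master so that the same rows are never written on both sides, and generate keys without auto-increment. The master names, the SHA-256 hash, and the use of UUIDs are assumptions for illustration.

```python
import hashlib
import uuid

MASTERS = ["db-a", "db-b"]  # hypothetical names for the two hot/hot masters


def master_for(user_id: str) -> str:
    """Always send one user's writes to the same master to avoid collisions."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return MASTERS[int.from_bytes(digest[:4], "big") % len(MASTERS)]


def new_row_id() -> str:
    """Generate a key without auto-increment so the two masters never clash."""
    return uuid.uuid4().hex
```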


Data Federation:
Simple things first: Vertical partitioning + sharding+ central lookup

Multi-site HA:

GSLB: global server load balancing; the easiest approach is DNS