Win 2 Computer Science: 04/01/2008

Friday, April 11, 2008

Safe encoding conversion in Python

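When decoding a string, you sometimes run into malformed byte sequences that make the decode fail, so I wrote a safe decoding function.
The first version uses the information carried by the exception; the second is the brute-force approach I wrote originally.
In a quick test, the first was roughly 10% faster than the second.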
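def conv(s, decoding='gbk', encoding=''):
    # Version 1: retry the decode, each time cutting out the bytes the exception points at
    while True:
        try:
            ustr = s.decode(decoding)
        except UnicodeDecodeError, e:
            # e.start and e.end delimit the undecodable bytes
            s = s[:e.start] + s[e.end:]
        else:
            if encoding:
                return ustr.encode(encoding)
            else:
                return ustr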

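def conv(s, decoding='gbk', encoding=''):
    # Version 2: walk the bytes by hand, pairing each lead byte with the byte after it
    flag = False  # True while ch holds a pending lead byte
    l = []
    i = 0
    while i < len(s):
        if flag:
            try:
                u = (ch + s[i]).decode(decoding)
            except UnicodeDecodeError:
                pass  # invalid pair: drop both bytes
            else:
                l.append(u)
            flag = False
            i += 1
        elif ord(s[i]) >= 0x80:
            # non-ASCII byte: treat it as the first byte of a two-byte character
            ch = s[i]
            flag = True
            i += 1
        else:
            l.append(s[i].decode(decoding))
            i += 1
    if not encoding:
        result = u''.join(l)
    else:
        result = u''.join(l).encode(encoding)
    return result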
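
For readers on Python 3, where decoding is a method of bytes rather than str, a similar effect can usually be had from the codec's own error handling. The following is a minimal sketch, not the code from this post: safe_conv is a hypothetical name, and errors='ignore' is not guaranteed to drop exactly the same bytes as the two functions above in every corner case.

```python
def safe_conv(data, decoding='gbk', encoding=''):
    """Decode bytes, silently skipping any bytes the codec cannot decode.

    safe_conv is a hypothetical helper written for Python 3; it relies on
    the built-in 'ignore' error handler instead of a hand-written loop.
    """
    text = data.decode(decoding, errors='ignore')
    # Re-encode only when a target encoding was requested.
    return text.encode(encoding) if encoding else text


# A GBK string with one stray byte (0xff) in the middle.
raw = '中文'.encode('gbk') + b'\xff' + b'abc'
print(safe_conv(raw))
print(safe_conv(raw, encoding='utf-8'))
```

If the bad bytes should stay visible rather than vanish, errors='replace' puts U+FFFD in their place instead.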

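PyDev comment shortcuts

Ctrl+3 comment line
Ctrl+\ uncomment line
Ctrl+Shift+3 uncomment line

Ctrl+4 add comment block
Ctrl+5 remove comment block

Ctrl+9 collapse all
Ctrl+0 expand all

Ctrl+- collapse
Ctrl+= expand

Ctrl+Shift+Up previous function
Ctrl+Shift+Down next function

Ctrl+Shift+O organize imports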

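Automatically detecting text encoding in Python

Universal Encoding Detector
http://chardet.feedparser.org/
It is derived from the auto-detection code in Mozilla.
I tried it out and it works quite well.

While I was at it I looked at http://www.feedparser.org/
and found there is also a Universal Feed Parser,
used for parsing RSS and Atom data.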

Example: Using the detect function

The detect function takes one argument, a non-Unicode string. It returns a dictionary containing the auto-detected character encoding and a confidence level from 0 to 1.

>>> import urllib
>>> rawdata = urllib.urlopen('http://yahoo.co.jp/').read()
>>> import chardet
>>> chardet.detect(rawdata)
{'encoding': 'EUC-JP', 'confidence': 0.99}

Example: Detecting encoding incrementally

import urllib
from chardet.universaldetector import UniversalDetector

usock = urllib.urlopen('http://yahoo.co.jp/')
detector = UniversalDetector()
for line in usock.readlines():
    detector.feed(line)
    if detector.done: break
detector.close()
usock.close()
print detector.result
{'encoding': 'EUC-JP', 'confidence': 0.99}

Example: Detecting encodings of multiple files

import glob
from chardet.universaldetector import UniversalDetector

detector = UniversalDetector()
for filename in glob.glob('*.xml'):
    print filename.ljust(60),
    detector.reset()
    for line in file(filename, 'rb'):
        detector.feed(line)
        if detector.done: break
    detector.close()
    print detector.result

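Using pylint in PyDev

1. Install pylint
Download and install pylint, logilab-astng and logilab-common.
2. Configure pylint
In Eclipse:
(1) Window -> Preferences -> Pydev -> Pylint
Check "Use pylint?"
Then enter the location of lint.py, e.g. "C:\Python25\Lib\site-packages\pylint\lint.py"
(2) Project -> Properties -> PyDev - PYTHONPATH
Add the project's source directories to "Project Source Folders"
(3) Check Project -> Build Automatically, so that pylint checks the project's code automatically each time you save a change; otherwise you have to build manually with Ctrl+B to trigger pylint

PS: using lint.py from the command line
lint.py --files-output=y --reports=y src/ (the directory containing lint.py must first be added to PATH)
This generates two files whose names start with pylint_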

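Arrays in Python

One-dimensional
a = [0] * 100

Two-dimensional
a = [[0]*10 for j in range(10)]
Wrong way
a = [[0]*10]*10
Reason:
a[1]...a[9] are all references to the same list as a[0], so changing one row changes every row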
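
The aliasing is easy to see by writing to a single cell in each kind of grid. This is a small illustrative sketch in Python 3 syntax; the names good and bad are made up for the demonstration.

```python
# Correct: the list comprehension builds a new inner list for every row.
good = [[0] * 10 for _ in range(10)]
good[0][0] = 1

# Wrong: the outer * 10 repeats a reference to one and the same inner list.
bad = [[0] * 10] * 10
bad[0][0] = 1

print(good[1][0])        # 0: the other rows are untouched
print(bad[1][0])         # 1: every "row" is the same list object
print(bad[0] is bad[1])  # True
```

The inner [0] * 10 is safe because integers are immutable; the problem only appears when the repeated element is itself mutable.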

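Also, a module you write will often contain intermediate names that you don't want to be visible when the module is imported.
To summarize the options (both control what "from module import *" picks up):
1. Name the intermediate values in the form _X
When the intermediate name is a module: import string as _string
2. Define __all__:
__all__ = list of the publicly visible names
__all__ = ['a', 'b']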
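
To check which names actually get exported, here is a self-contained sketch in Python 3 that builds a throwaway module in memory and runs a star-import against it. The module name demo_mod and the variables in it are hypothetical, chosen only for the demonstration.

```python
import sys
import types

# Source of a hypothetical module that uses both techniques.
source = '''
import string as _string   # leading underscore marks it as private

__all__ = ['a', 'b']        # the names "from demo_mod import *" exports

a = _string.ascii_lowercase[:3]
b = len(a)
helper = a.upper()          # looks public, but is not listed in __all__
'''

# Build the module in memory and register it so that it can be imported.
demo_mod = types.ModuleType('demo_mod')
exec(source, demo_mod.__dict__)
sys.modules['demo_mod'] = demo_mod

# Run the star-import in a fresh namespace and list what arrived.
namespace = {}
exec('from demo_mod import *', namespace)
exported = sorted(name for name in namespace if not name.startswith('__'))
print(exported)         # ['a', 'b']
print(demo_mod.helper)  # ABC
```

Neither technique makes a name unreachable: helper and _string can still be read as attributes of the module, they are just left out of the star-import.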

test code display

for i in range(3):
    print "Hello,my world"
TEST OK!
From now on I'll post on blogspot.
