[Epub] - Digital publication production - Web version - [2]
2015-12-09 18:45
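Requirement: automate the steps from the previous post for building the web page. I use Python here, with the BeautifulSoup library to process the HTML and the Pillow library to process images (scaling and so on).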
Preparing the materials
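Assume each chapter of the digital publication consists of four kinds of material: text, images, video and audio, each stored in its own folder. The code reads these materials automatically, uses BeautifulSoup to create tags that wrap the content, and inserts them into a template HTML file to produce a complete HTML page. The file structure is roughly as follows:

    .
    ├── base.txt            # plain-text form of the template file
    ├── Charpter1.html      # generated html page
    ├── Charpter2.html
    ├── main.py             # main program
    ├── pics                # image materials folder
    │   ├── cover.jpg
    │   ├── pic1.jpg
    │   ├── pic2.jpg
    │   ├── pic3.jpg
    │   ├── pic4.jpg
    │   ├── pic5.jpg
    │   ├── pic6.jpg
    │   └── small_pic6.jpg
    ├── README.md           # documentation
    ├── requirement.txt     # library requirements
    ├── sounds              # audio materials folder
    │   └── sound1.mp3
    ├── text                # text materials folder
    │   ├── Charpter1.txt
    │   └── Charpter2.txt
    └── videos              # video materials folder
        └── video1.mp4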
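Before generating any page it can help to confirm that this layout is actually in place. The following is only a minimal sketch under that assumption: `missing_entries` and `list_chapters` are hypothetical helper names that do not appear in the original script, and the list of required entries is simply read off the tree above.

```python
import os

# Entries the script relies on, taken from the file tree above.
REQUIRED = ['base.txt', 'pics', 'sounds', 'text', 'videos']


def missing_entries(root='.'):
    """Return the required files/folders that do not exist under root."""
    return [name for name in REQUIRED
            if not os.path.exists(os.path.join(root, name))]


def list_chapters(root='.'):
    """Return the chapter text files in <root>/text, sorted by name."""
    text_dir = os.path.join(root, 'text')
    return sorted(name for name in os.listdir(text_dir)
                  if name.endswith('.txt'))
```

Each name returned by `list_chapters` could then play the role of the `filename` variable used in the rest of the script.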
Reading the template and text content
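    import os
    import re

    from bs4 import BeautifulSoup
    from PIL import Image

    # `filename` is assumed to be the name of one chapter file in ./text/,
    # such as 'Charpter1.txt'

    # read the html template text file
    with open('base.txt', 'r') as f:
        text = f.read()

    # parse the html template
    temp = BeautifulSoup(text, "lxml")

    # open the chapter file and read its paragraphs, skipping near-empty lines
    with open(os.path.join('./text/', filename), 'r') as f:
        paras = [p.strip() for p in f.readlines() if len(p) > 3]

    # replace cover img
    cover = temp.find('img', {'id': 'cover'})
    cover['src'] = './pics/cover.jpg'

    # handle title: the first paragraph is the chapter title
    title = temp.find('h3')
    title.string = paras[0]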
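The template itself (`base.txt`) is not shown in the post, so the sketch below uses a made-up stand-in to illustrate the same two steps, setting the cover image and the title. `TEMPLATE` and `fill_header` are hypothetical names, and the built-in `html.parser` is swapped in for `lxml` only so the example has no extra dependency beyond BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for base.txt; the real template is not shown.
TEMPLATE = """
<html><body>
<img id="cover" src="">
<h3></h3>
<div id="text"></div>
</body></html>
"""


def fill_header(template_text, title, cover_src):
    """Parse the template and fill in the cover image and chapter title."""
    soup = BeautifulSoup(template_text, 'html.parser')
    cover = soup.find('img', {'id': 'cover'})
    title_tag = soup.find('h3')
    # find() returns None when a tag is missing, so fail with a clear message
    if cover is None or title_tag is None:
        raise ValueError('template needs <img id="cover"> and <h3>')
    cover['src'] = cover_src
    title_tag.string = title
    return soup


soup = fill_header(TEMPLATE, 'Chapter One', './pics/cover.jpg')
```

Checking for `None` up front is a design choice of this sketch: a template without the expected tags then fails with a readable error instead of a `TypeError` on item assignment.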
Inserting images
To insert an image, the script has to know which part of the chapter it belongs to, so the position must be marked in the text itself (with a keyword such as pic1).

    # handle paras
    textbox = temp.find('div', {'id': 'text'})
    count = [0, 0]  # pictures placed so far: [left, right]
    for para in paras[1:]:
        new_p = temp.new_tag('p')
        new_br = temp.new_tag('br')
        new_p.string = para
        # handle img in text
        new_img_div, count = insert_img('pic', para, temp, count)
        if new_img_div is not None:
            textbox.append(new_img_div)
        textbox.append(new_p)
        textbox.append(new_br)
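The code above first finds the div that holds the text, then goes through the paragraphs one by one. The insert_img function checks whether a paragraph contains an image keyword; if it does, it creates an image div, which is inserted before the paragraph's text. Here is insert_img: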
    def insert_img(img_keyword, para, temp, count):
        """
        :param img_keyword: word searched for in the text to mark where a picture
                            belongs, such as 'img', 'pic', '图片'
        :param para: one paragraph of a chapter
        :param temp: the html template
        :param count: [left, right] counts of pictures already placed at each side
        :return: (new_div, count); new_div is the tag to insert into the html,
                 or None if the paragraph has no picture marker
        """
        # search pic id in current para, like 'pic1', 'img1'
        match = re.search(re.escape(img_keyword) + r'(\d+)', para)
        if match is None:
            return None, count
        pic_id = match.group()
        print('==========insert img ' + pic_id + '==========')
        # get file name of the pic, like 'pic1.jpg'; the trailing '.' keeps
        # 'pic1' from matching 'pic10.jpg'
        pic_url = [url for url in os.listdir('./pics')
                   if url.startswith(pic_id + '.')][0]
        # use pillow lib to open the pic
        im = Image.open(os.path.join('./pics', pic_url))
        width, height = im.size
        # decide where to locate the pic
        # rules: 1. if the picture's width > 1/3 of the browser width
        #           and width > height: locate it at center
        #        2. if the picture's width > 1/3 of the browser width
        #           and width < height: zoom the pic and locate it at side
        #        3. if the picture's width < 1/3 of the browser width:
        #           locate it at side
        #        4. when locating pictures at side, put it at left first,
        #           then right
        # (the code uses 400 px as the width threshold)
        if width > 400 and width > height:
            # create a div, put the img tag in it, and add the class to the div
            new_div = temp.new_tag('div')
            new_pic = temp.new_tag('img', src='./pics/' + pic_url)
            new_div['class'] = 'pic_in_text_center'
            new_div.append(new_pic)
            return new_div, count
        if width > 400 and width < height:
            # change_img_size is a helper that is not shown in this post
            new_pic_url = 'small_' + pic_url
            im = change_img_size(im, new_pic_url)
            im.save(os.path.join('./pics', new_pic_url))
            pic_url = new_pic_url
        # side placement: alternate between left and right
        new_div = temp.new_tag('img', src='./pics/' + pic_url)
        if count[0] > count[1]:
            new_div['class'] = 'pic_in_text_right'
            count[1] += 1
        else:
            new_div['class'] = 'pic_in_text_left'
            count[0] += 1
        return new_div, count
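Here, count is the parameter used to decide whether a picture goes on the left or the right: it holds the number of pictures already placed on each side.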
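The placement rules and the role of count can be looked at in isolation, without BeautifulSoup or Pillow. The sketch below is an illustration only: `choose_placement` and `scaled_size` are hypothetical names, the 400 px threshold is the one used in insert_img, and the scaling rule (shrink to a maximum width while keeping the aspect ratio) is an assumption, since change_img_size is not shown in the post.

```python
def choose_placement(width, height, count, threshold=400):
    """Return (css_class, needs_resize) and update count in place.

    count is a two-item list: [pictures on the left, pictures on the right].
    """
    # wide landscape picture: centered, count is left untouched
    if width > threshold and width > height:
        return 'pic_in_text_center', False
    # wide portrait picture: goes to the side, but should be shrunk first
    needs_resize = width > threshold and width < height
    # alternate sides: left first, then right whenever left is ahead
    if count[0] > count[1]:
        count[1] += 1
        return 'pic_in_text_right', needs_resize
    count[0] += 1
    return 'pic_in_text_left', needs_resize


def scaled_size(width, height, max_width=400):
    """Assumed scaling rule: fit max_width, keep the aspect ratio."""
    if width <= max_width:
        return width, height
    ratio = max_width / float(width)
    return max_width, max(1, int(round(height * ratio)))


count = [0, 0]
print(choose_placement(800, 600, count))   # centered
print(choose_placement(600, 900, count))   # first side picture
print(choose_placement(300, 200, count))   # second side picture
print(scaled_size(800, 1200))
```

The size returned by `scaled_size` is the kind of tuple that Pillow's `Image.resize` accepts, so a helper like change_img_size could plausibly be built on top of it.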
Inserting video and audio
This works much like inserting images:

    def insert_sound(sound_keyword, para, temp):
        """
        :param sound_keyword: word searched for in the text to mark where a sound
                              file belongs, such as 'sound', 'music', '音乐'
        :param para: one paragraph of a chapter
        :param temp: the html template
        :return: the audio tag to insert into the html, or None if the
                 paragraph has no sound marker
        """
        # search sound id in current para, like 'sound1', 'music1'
        match = re.search(re.escape(sound_keyword) + r'(\d+)', para)
        if match is None:
            return None
        sound_id = match.group()
        print('==========insert sound ' + sound_id + '==========')
        # get file name of the sound, like 'sound1.mp3'
        sound_url = [url for url in os.listdir('./sounds')
                     if url.startswith(sound_id + '.')][0]
        new_div = temp.new_tag('audio', src='./sounds/' + sound_url,
                               controls="controls")
        new_div['class'] = 'sound_in_text'
        return new_div


    def insert_video(video_keyword, para, temp):
        """
        :param video_keyword: word searched for in the text to mark where a video
                              file belongs, such as 'video', '视频'
        :param para: one paragraph of a chapter
        :param temp: the html template
        :return: the video tag to insert into the html, or None if the
                 paragraph has no video marker
        """
        # search video id in current para, like 'video1'
        match = re.search(re.escape(video_keyword) + r'(\d+)', para)
        if match is None:
            return None
        video_id = match.group()
        print('==========insert video ' + video_id + '==========')
        # get file name of the video, like 'video1.mp4'
        video_url = [url for url in os.listdir('./videos')
                     if url.startswith(video_id + '.')][0]
        new_div = temp.new_tag('video', src='./videos/' + video_url,
                               controls="controls", width="600", height="450")
        new_div['class'] = 'video_in_text'
        return new_div
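All three insert functions share the same first two steps: find a marker such as pic1 or sound1 in the paragraph, then look up the matching file in a folder. As a rough sketch of how that shared part might be factored out, the helpers below use hypothetical names (`find_marker`, `find_media_file`) that are not part of the original script. They compare the file name without its extension, so that pic1 is not confused with pic10.jpg.

```python
import os
import re


def find_marker(keyword, para):
    """Return the marker (keyword followed by digits) found in para, or None."""
    match = re.search(re.escape(keyword) + r'\d+', para)
    return match.group() if match else None


def find_media_file(keyword, para, folder):
    """Return the file name in folder that matches the marker, or None."""
    marker = find_marker(keyword, para)
    if marker is None:
        return None
    for name in sorted(os.listdir(folder)):
        # compare without the extension: 'pic1' must not match 'pic10.jpg'
        if os.path.splitext(name)[0] == marker:
            return name
    return None
```

Returning `None` when no file matches, rather than indexing into an empty list, lets the caller skip a missing asset instead of stopping with an `IndexError`.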
When processing the text, all three functions are called for each paragraph:
    for para in paras[1:]:
        new_p = temp.new_tag('p')
        new_br = temp.new_tag('br')
        new_p.string = para
        # handle img in text
        new_img_div, count = insert_img('pic', para, temp, count)
        if new_img_div is not None:
            textbox.append(new_img_div)
        # handle sound in text
        new_sound_div = insert_sound('sound', para, temp)
        if new_sound_div is not None:
            textbox.append(new_sound_div)
        # handle video in text
        new_video_div = insert_video('video', para, temp)
        if new_video_div is not None:
            textbox.append(new_video_div)
        textbox.append(new_p)
        textbox.append(new_br)
Once that is done, write all of the markup to the html file:
    # prettify("utf-8") returns bytes, so open the file in binary mode
    with open(filename[:-4] + '.html', 'wb') as f:
        f.write(temp.prettify("utf-8"))
    print('==========finish ' + filename + '==========')
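The output name above is built by slicing off the last four characters, which assumes a `.txt` extension. A slightly more defensive variant might look like the sketch below; `write_chapter` is a hypothetical helper, and it expects the bytes that `prettify("utf-8")` produces.

```python
import os


def write_chapter(html_bytes, filename, out_dir='.'):
    """Write html_bytes to <out_dir>/<stem>.html and return the path."""
    # splitext handles any extension length, unlike filename[:-4]
    stem, _ = os.path.splitext(os.path.basename(filename))
    out_path = os.path.join(out_dir, stem + '.html')
    # the content is already encoded, so write it in binary mode
    with open(out_path, 'wb') as f:
        f.write(html_bytes)
    return out_path
```

In the script this might be called as `write_chapter(temp.prettify("utf-8"), filename)`.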