
[Epub] - Producing Digital Publications - Web Version - [2]

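2015-12-09 18:45
Goal: automate the steps from the previous post for building the web page. I use Python here, with the BeautifulSoup library to process the HTML and the Pillow library to process images (scaling and so on).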

Preparing the materials

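Assume each chapter of the digital publication is made up of four kinds of material: text, images, video, and audio, each stored in its own folder. The code reads these materials automatically, uses BeautifulSoup to create tags that wrap the content, and inserts them into the template HTML to produce a complete HTML page. The file structure is roughly as follows: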

.
├── base.txt #the template file, as plain text
├── Charpter1.html #generated HTML pages
├── Charpter2.html
├── main.py #main program
├── pics #image materials folder
│   ├── cover.jpg
│   ├── pic1.jpg
│   ├── pic2.jpg
│   ├── pic3.jpg
│   ├── pic4.jpg
│   ├── pic5.jpg
│   ├── pic6.jpg
│   └── small_pic6.jpg
├── README.md #documentation
├── requirement.txt #library requirements
├── sounds #audio materials folder
│   └── sound1.mp3
├── text #text materials folder
│   ├── Charpter1.txt
│   └── Charpter2.txt
├── videos #video materials folder
│   └── video1.mp4
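With this layout, each text file in the text folder maps to one generated page of the same name. The sketch below shows one possible way to derive the output names. The helper names `chapter_files` and `output_name` and the extra file `notes.md` are made up for illustration and are not part of the original program.

```python
import os


def chapter_files(filenames):
    """Keep only the .txt chapter files, in sorted order."""
    return sorted(name for name in filenames if name.lower().endswith('.txt'))


def output_name(text_filename):
    """Map a chapter text file name to the HTML file name generated for it."""
    stem, _ext = os.path.splitext(text_filename)
    return stem + '.html'


# A hard-coded listing keeps the example self-contained; in a real script
# the names would probably come from os.listdir('./text').
chapters = chapter_files(['Charpter2.txt', 'Charpter1.txt', 'notes.md'])
pages = [output_name(name) for name in chapters]
```

Using `os.path.splitext` rather than slicing off the last four characters keeps the mapping correct if an extension ever has a different length.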


Reading the template and the text content

import os
import re

from bs4 import BeautifulSoup
from PIL import Image

# read the text of the HTML template file
with open('base.txt', 'r') as f:
    text = f.read()
# parse the HTML template
temp = BeautifulSoup(text, "lxml")

# open the chapter file and read its paragraphs
# (filename is the name of a chapter file in ./text, e.g. 'Charpter1.txt')
with open(os.path.join('./text/', filename), 'r') as f:
    paras = [p.strip() for p in f.readlines() if len(p) > 3]

# replace the cover image
cover = temp.find('img', {'id': 'cover'})
cover['src'] = './pics/cover.jpg'

# the first paragraph is the chapter title
title = temp.find('h3')
title.string = paras[0]
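The paragraph filter above tests the length of each raw line, newline and surrounding whitespace included, before stripping it. As a minimal sketch of a slightly stricter variant, the hypothetical helper `split_paragraphs` below strips first and then filters, so lines that contain only spaces are dropped as well. The sample lines are invented for illustration.

```python
def split_paragraphs(lines, min_length=3):
    """Strip every line, then keep those longer than min_length characters."""
    stripped = (line.strip() for line in lines)
    return [p for p in stripped if len(p) > min_length]


lines = ['Chapter One\n', '\n', '      \n', 'It was a quiet morning.\n']
paras = split_paragraphs(lines)
# The first paragraph is treated as the title, the rest as body text.
title, body = paras[0], paras[1:]
```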


Inserting images

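To insert an image, the program has to know which part of the text it belongs to, so the position must be marked in the text itself.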
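One way to read such a marker out of a paragraph is a regular expression that looks for the keyword followed by digits. The following is only a sketch: the helper name `find_media_id` is hypothetical, and the marker format (keyword plus number, such as pic3) is assumed from the file names in the pics folder.

```python
import re


def find_media_id(keyword, para):
    """Return the first marker such as 'pic1' found in para, or None."""
    match = re.search(re.escape(keyword) + r'\d+', para)
    return match.group() if match else None
```

Because the function returns None when the keyword appears without a number (as in the word "picture"), the caller can test the result instead of catching an exception.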

# handle the paragraphs
textbox = temp.find('div', {'id': 'text'})
count = [0, 0]
for i in range(1, len(paras)):
    new_p = temp.new_tag('p')
    new_br = temp.new_tag('br')
    new_p.string = paras[i]
    # handle an image in the text
    img_result = insert_img('pic', paras[i], temp, count)
    new_img_div, count = img_result[0], img_result[1]
    if new_img_div:
        textbox.append(new_img_div)
    textbox.append(new_p)
    textbox.append(new_br)


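The code above first finds the div that holds the text and then goes through the paragraphs one by one. The insert_img function checks whether a paragraph contains an image keyword. If its return value is not None, an image tag was generated, and it is inserted before the paragraph's text. Here is the insert_img function: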

def insert_img(img_keyword, para, temp, count):
    """
    :param img_keyword: marker to search for in the text where a picture belongs, such as 'img', 'pic', '图片'
    :param para: one paragraph of a chapter
    :param temp: the HTML template
    :param count: counts of the pictures already placed at the left and right side
    :return: (new_div, count); new_div is the tag to insert into the HTML, or None
    """
    # search for the picture id in the current paragraph, like 'pic1', 'img1'
    match = re.search(img_keyword + r'(\d+)', para)
    if match:
        pic_id = match.group()
        print('==========insert img ' + pic_id + '==========')
        # get the file name of the picture, like 'pic1.jpg'
        pic_url = [
            url for url in os.listdir('./pics') if url.startswith(pic_id)][0]
        # use the Pillow library to open the picture
        im = Image.open(os.path.join('./pics', pic_url))
        # decide where to place the picture
        # rules: 1. if the picture is wider than 1/3 of the browser width
        #           and wider than it is tall: center it
        #        2. if the picture is wider than 1/3 of the browser width
        #           and taller than it is wide: shrink it and place it at the side
        #        3. if the picture is narrower than 1/3 of the browser width:
        #           place it at the side
        #        4. pictures at the side go left first, then right
        if im.size[0] > 400 and im.size[0] > im.size[1]:
            # create a div to hold the img
            new_div = temp.new_tag('div')
            # create the img tag
            new_pic = temp.new_tag('img', src='./pics/' + pic_url)
            # add the class to the div
            new_div['class'] = 'pic_in_text_center'
            # add the img to the div
            new_div.append(new_pic)
        elif im.size[0] > 400 and im.size[0] < im.size[1]:
            new_pic_url = 'small_' + pic_url
            # change_img_size is a helper that is not shown in this post
            im = change_img_size(im, new_pic_url)
            im.save(os.path.join('./pics', new_pic_url))
            if count[0] > count[1]:
                new_div = temp.new_tag('img', src='./pics/' + new_pic_url)
                new_div['class'] = 'pic_in_text_right'
                count[1] += 1
            else:
                new_div = temp.new_tag('img', src='./pics/' + new_pic_url)
                new_div['class'] = 'pic_in_text_left'
                count[0] += 1
        else:
            if count[0] > count[1]:
                new_div = temp.new_tag('img', src='./pics/' + pic_url)
                new_div['class'] = 'pic_in_text_right'
                count[1] += 1
            else:
                new_div = temp.new_tag('img', src='./pics/' + pic_url)
                new_div['class'] = 'pic_in_text_left'
                count[0] += 1
        return new_div, count
    else:
        return None, count
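The helper change_img_size is called above, but its body does not appear in this post. As a guess at the arithmetic it might perform, the sketch below computes a target size that fits a maximum width while keeping the aspect ratio. The function name `scaled_size` and the default of 400 pixels (taken from the threshold used in insert_img) are assumptions, not the author's code.

```python
def scaled_size(width, height, max_width=400):
    """Return (width, height) shrunk to fit max_width, keeping the aspect ratio."""
    if width <= max_width:
        # Already narrow enough, leave the size unchanged.
        return width, height
    ratio = float(max_width) / width
    # Round the new height and never let it collapse to zero.
    return max_width, max(1, int(round(height * ratio)))
```

With Pillow, the resulting tuple could be passed to `Image.resize` to produce the smaller picture.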


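The count argument here is a two-item list, the number of pictures already placed on the left and on the right, and it is used to decide whether the next picture should go on the left or the right.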
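To make the rules easier to follow, here is a sketch that isolates the placement decision as a pure function of the picture size and the counter, with no file or HTML handling. The name `choose_placement` and its return shape are hypothetical. The class names and the 400 pixel threshold are the ones used in insert_img.

```python
def choose_placement(width, height, count, max_width=400):
    """Return (css_class, needs_shrinking) and update count in place.

    count is a two-item list: [pictures placed left, pictures placed right].
    """
    if width > max_width and width > height:
        # Wide landscape picture: centered, counter untouched.
        return 'pic_in_text_center', False
    # Tall pictures wider than the threshold are shrunk before placement.
    needs_shrinking = width > max_width and width < height
    if count[0] > count[1]:
        count[1] += 1
        return 'pic_in_text_right', needs_shrinking
    count[0] += 1
    return 'pic_in_text_left', needs_shrinking


count = [0, 0]
first = choose_placement(300, 200, count)   # small picture
second = choose_placement(500, 800, count)  # tall picture, wider than 400
third = choose_placement(900, 600, count)   # wide landscape picture
```

Side pictures alternate because the left counter is incremented first and the right side is chosen only while the left count is ahead.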

Inserting video and audio

This works much like inserting images:

def insert_sound(sound_keyword, para, temp):
    """
    :param sound_keyword: marker to search for in the text where a sound file belongs, such as 'sound', 'music', '音乐'
    :param para: one paragraph of a chapter
    :param temp: the HTML template
    :return new_div: the audio tag to insert into the HTML, or None
    """
    # search for the sound id in the current paragraph, like 'sound1'
    match = re.search(sound_keyword + r'(\d+)', para)
    if match:
        sound_id = match.group()
        print('==========insert sound ' + sound_id + '==========')
        # get the file name of the sound, like 'sound1.mp3'
        sound_url = [
            url for url in os.listdir('./sounds') if url.startswith(sound_id)][0]
        new_div = temp.new_tag('audio', src='./sounds/' + sound_url, controls="controls")
        new_div['class'] = 'sound_in_text'
        return new_div
    else:
        return None

def insert_video(video_keyword, para, temp):
    """
    :param video_keyword: marker to search for in the text where a video file belongs, such as 'video'
    :param para: one paragraph of a chapter
    :param temp: the HTML template
    :return new_div: the video tag to insert into the HTML, or None
    """
    # search for the video id in the current paragraph, like 'video1'
    match = re.search(video_keyword + r'(\d+)', para)
    if match:
        video_id = match.group()
        print('==========insert video ' + video_id + '==========')
        # get the file name of the video, like 'video1.mp4'
        video_url = [
            url for url in os.listdir('./videos') if url.startswith(video_id)][0]
        new_div = temp.new_tag(
            'video',
            src='./videos/' + video_url,
            controls="controls",
            width="600",
            height="450"
        )
        new_div['class'] = 'video_in_text'
        return new_div
    else:
        return None
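The image, sound, and video functions all look up a material file by its id. A prefix test such as startswith would treat pic10.jpg as a match for pic1, and indexing an empty list with [0] raises IndexError when no file matches. As a sketch of one possible shared lookup, the hypothetical helper `find_media_file` below compares the name without its extension and returns None when nothing is found. It takes a list of names so that it can be tried without touching the disk.

```python
import os


def find_media_file(filenames, media_id):
    """Return the first file whose name without extension equals media_id."""
    for name in sorted(filenames):
        if os.path.splitext(name)[0] == media_id:
            return name
    return None
```

In the functions above, the list of names would come from a call such as `os.listdir('./pics')`.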


The loop that processes the text then becomes:

for i in range(1, len(paras)):
    new_p = temp.new_tag('p')
    new_br = temp.new_tag('br')
    new_p.string = paras[i]
    # handle an image in the text
    img_result = insert_img('pic', paras[i], temp, count)
    new_img_div, count = img_result[0], img_result[1]
    if new_img_div:
        textbox.append(new_img_div)
    # handle a sound in the text
    new_sound_div = insert_sound('sound', paras[i], temp)
    if new_sound_div:
        textbox.append(new_sound_div)
    # handle a video in the text
    new_video_div = insert_video('video', paras[i], temp)
    if new_video_div:
        textbox.append(new_video_div)
    textbox.append(new_p)
    textbox.append(new_br)


When that is done, write the whole document to an HTML file:

# prettify("utf-8") returns an encoded byte string, so open the file in binary mode
with open(filename[:-4] + '.html', 'wb') as f:
    f.write(temp.prettify("utf-8"))
print('==========finish ' + filename + '==========')
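An alternative to writing encoded bytes is to keep the page as text and let the file object do the encoding. The sketch below assumes the HTML is already a text string (in BeautifulSoup, calling `prettify()` without an encoding argument returns text). The helper name `write_page` and the sample content are made up for illustration.

```python
import io
import os
import tempfile


def write_page(path, html_text):
    """Write HTML text to path, encoded as UTF-8."""
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(html_text)


# Write into a temporary folder so the example leaves the project untouched.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, 'Charpter1.html')
write_page(out_path, u'<h3>第一章</h3>')

# Read the file back to check that the non-ASCII title survived.
with io.open(out_path, 'r', encoding='utf-8') as f:
    round_trip = f.read()
```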