买股票或者买基金,我们经常听到说要低买高卖,但是什么是高,什么是底,究竟用什么衡量比较好?答案是PE,PE代表了估值水平的高低,观看一个股票或者基金的PE变化,就可以得出当前的估值水平是高还是低,决定是否进行交易。
数据获取¶
从东财choice软件手动导出数据到excel,并且保存到data目录
In [1]:
!ls data/
数据提取¶
先从文件中读取所有excel,并将里面各个sheet单独保存成csv文件供后面分析
In [2]:
import pandas as pd
from streamz import Stream
In [3]:
def parse_file(filename):
return filename.split('/')[-1].split('.')[0],pd.ExcelFile(filename)
def parse_sheet(tuple_excel_file):
filename,excel_file=tuple_excel_file
return [(sheetname,filename,excel_file.parse(sheetname)) for sheetname in excel_file.sheet_names]
def get_sheets(excel_file):
return excel_file.sheet_names
def to_csv(tuple_sheet):
sheetname,filename,df = tuple_sheet
df.to_csv('sheets/'+filename+'_'+sheetname+'.csv')
def notNaNdf(tuple_sheet):
sheetname,filename,df = tuple_sheet
return not df.empty
In [4]:
source = Stream.filenames('data/')\
.map(parse_file)\
.map(parse_sheet).flatten()\
.filter(notNaNdf)\
.sink(to_csv)
source.start()
In [5]:
!ls sheets/
初步分析¶
选取非行业和产业的PE数据,绘图查看PE(估值高低)的走势
In [6]:
l = []
source = Stream.filenames('sheets/')\
.filter(lambda x:'PE' in x and '行业' not in x and '产业' not in x and '历史' not in x)\
.rate_limit(1).map(pd.read_csv)
l = source.sink_to_list()
source.start()
In [7]:
def df_process(df):
df.fillna(0,inplace=True)#填充空值为0
# df.set_axis(df.iloc[0]],'columns',inplace=True)#设置第一行为列名
df.set_axis(df.iloc[1],'columns',inplace=True)#设置第二行为列名
df.drop(df.index[[0,1,2]],inplace=True)#删除前三行无用数据
#行名变更
cl =list(df.columns)
cl[0] = 'first'
cl[1] = 'date'
df.columns = cl
df.drop(df.index[-1],inplace=True)#删除行尾的版权申明
df = df.set_index(pd.to_datetime(df.date))#转换时间列,并将其设置为索引
df.drop(['first','date'],axis=1,inplace=True)#删除无用列
return df
In [8]:
ll = [df_process(i) for i in l]
In [9]:
#合并数据
from functools import reduce
df = reduce(lambda x, y: pd.merge(x, y, on = 'date',how='outer'), ll)
In [10]:
df.columns
Out[10]:
In [11]:
#df.drop('399967.SZ',inplace=True,axis=1)#去掉某个不想要的指数
df.drop('中证军工',inplace=True,axis=1)
In [12]:
df = df.sort_index()
df = df.iloc[1:,:]#去掉首行1970年数据
df = df.applymap(float)
df.fillna(0,inplace=True)#空值替换为0
df.describe().T
Out[12]:
In [13]:
df = df.loc['2005-01-01':]
df = df.resample('3M').mean()
In [14]:
from hvplot import pandas
df.hvplot()