在经典的动量选股策略中,有一个比较经典的特征,就是机构持股比例,一般来说机构持股比例高的股票跟基本面因子的相关度更高
在聚宽和优矿的数据中都没有机构持股比例这个数据,但是在东方财富网站上有这个数据,例如合兴包装的机构持股统计在这个网页可以看到
http://data.eastmoney.com/zlsj/detail/2018-09-30-0-002228.html

为了能够在量化策略中使用这个因子,决定手动写代码从网站抓取这个数据,
python抓这个数据使用requests_html的HTMLSession或者用urllib+ bs4都可以实现
使用urllib+ bs4的代码如下
[cc lang="python"]
import urllib
import bs4
#下载页面
def get_html(url):
html = urllib.request.urlopen(url).read()
html = html.decode('gbk')
return html
#解析页面抓取需要的数据
def get_page(url):
html = get_html(url)
soup = bs4.BeautifulSoup(html, "html.parser")
table = soup.find('table', class_='tab1')
holdElement = table.select('td')
if (holdElement == None):
print("hold empty");
#基金持股家数
try:
fund_num = int(holdElement[2].text)
except:
fund_num = 0
#基金持股股数
try:
fund_hold_vol = float(holdElement[3].text)
except:
fund_hold_vol = 0.0
#基金持股市值
try:
fund_hold_value = float(holdElement[4].text)
except:
fund_hold_value = 0.0
#基金持股占总市值比例
try:
fund_ratio = float(holdElement[5].text)
except:
fund_ratio = 0.0
#基金持股占流通市值比例
try:
fund_ratio_in_circu = float(holdElement[6].text)
except:
fund_ratio_in_circu = 0.0
#机构汇总总家数
try:
total_num = int(holdElement[38].text)
except:
total_num = 0
#机构汇总持股股数
try:
total_hold_vol = float(holdElement[39].text)
except:
total_hold_vol = 0.0
#机构汇总持股市值
try:
total_hold_value = float(holdElement[40].text)
except:
total_hold_value = 0.0
#机构汇总占总股本比例
try:
total_ratio = float(holdElement[41].text)
except:
total_ratio = 0.0
#机构汇总占流通股本比例
try:
total_ratio_in_circu = float(holdElement[42].text)
except:
total_ratio_in_circu = 0.0
return (code, end_date, fund_num, fund_hold_vol, fund_hold_value, fund_ratio, fund_ratio_in_circu,
total_num, total_hold_vol,total_hold_value, total_ratio, total_ratio_in_circu)
[/cc]
拿到这些数据后,再写入mysql或者csv,就可以在策略中使用了
[cc lang="python"]
end_dates=('2017-03-31','2017-06-30','2017-09-30','2017-12-31','2018-03-31','2018-06-30')
#end_date='2018-09-30'
for end_date in end_dates:
for code in df_stocks['code']:
url = "http://data.eastmoney.com/zlsj/detail/%s-0-%s.html"
args = get_page(url % (end_date,code[:-5]))
#查询数据库判断是否存在记录
sql_select = "SELECT * FROM institutional_hold WHERE code = '%s' AND end_date='%s' "
cur = connect.cursor()
record_count = cur.execute(sql_select % (code, end_date))
if record_count > 0:
continue
column_str = """code, end_date, fund_num, fund_hold_vol, fund_hold_value, fund_ratio,
fund_ratio_in_circu,total_num, total_hold_vol,total_hold_value, total_ratio,
total_ratio_in_circu"""
insert_str = "'%s','%s',%d,%.2f,%.2f,%.2f,%.2f,%d,%.2f,%.2f,%.2f,%.2f"
sql_insert = "INSERT INTO institutional_hold (%s) VALUES (%s)" % (column_str, insert_str)
try:
cur.execute(sql_insert % args)
connect.commit()
except:
connect.rollback()
print(str(code) + ' ' + str(end_date), end= '\r')
[/cc]
三把刷子
暂无评论