—Python—
Python 的各种类库为快速处理数据提供了诸多便利, 其中包括以下几个:
SciPy 下包含的Matplotlib 用于绘制2D图, NumPy 用于科学计算, 以及pandas用于数据分析。
例子就是用其中的 pandas 和 matplotlib 来抓取Cisco在过去一年内的股票情况
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
from matplotlib.finance import quotes_historical_yahoo_ohlc
from datetime import date
import pandas as pd
today= date.today()
start=(today.year-1, today.month, today.day)
quotes = quotes_historical_yahoo_ohlc('CSCO', start, today)
fields = ['date','open','high','low','close','volume']
list1 = []
for i in range(0, len(quotes)):
x = date.fromordinal(int(quotes[i][0]))
y = x.strftime('%Y-%m-%d')
list1.append(y)
quotesdf = pd.DataFrame(quotes, columns = fields)
quotesdf['trade date'] = pd.Series(list1)
quotesdf = quotesdf.drop(['date'], axis=1)
print quotesdf
结果如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
open high low close volume trade date
0 25.899843 26.354058 25.783874 25.909507 30490200 2015-11-17
1 26.093126 26.247752 25.832194 26.209096 27015700 2015-11-18
2 26.189767 26.721293 26.141445 26.450699 27417400 2015-11-19
3 26.663308 26.846927 26.537674 26.643980 26502800 2015-11-20
4 26.721294 26.904912 26.421706 26.508683 24684600 2015-11-23
5 26.334730 26.518349 26.093127 26.354058 32859200 2015-11-24
6 26.402377 26.470025 26.093125 26.325064 22472500 2015-11-25
7 26.334728 26.557003 26.325064 26.402377 9532300 2015-11-27
8 26.421706 26.557004 26.286409 26.334729 30736700 2015-11-30
9 26.286409 26.721293 26.286409 26.643980 31406300 2015-12-01
10 26.566668 26.962897 26.450699 26.518348 29178200 2015-12-02
11 26.701964 26.759951 25.919171 26.044806 25783800 2015-12-03
12 26.044806 26.624651 26.044806 26.557003 28143700 2015-12-04
13 26.634316 26.634316 26.344393 26.566668 15350200 2015-12-07
14 26.257415 26.373386 26.112453 26.238087 18635700 2015-12-08
15 26.054468 26.431369 25.764545 25.832194 24127400 2015-12-09
16 25.841859 26.131783 25.764546 25.870851 23305200 2015-12-10
17 25.600256 25.600256 25.252347 25.281340 34295700 2015-12-11
18 25.416636 25.629247 25.088056 25.600255 32642100 2015-12-14
19 25.822529 26.199432 25.716226 25.948164 30393700 2015-12-15
20 26.093125 26.373386 25.783873 26.325064 22731500 2015-12-30
.. ... ... ... ... ... ...
222 31.410000 31.680000 31.410000 31.590000 11808600 2016-10-05
223 31.570000 31.629999 31.209999 31.480000 14077100 2016-11-08
247 31.040001 31.490000 30.700001 31.360001 38428900 2016-11-09
248 31.410000 31.760000 30.809999 31.000000 38345000 2016-11-10
249 30.930000 31.469999 30.920000 31.360001 23109200 2016-11-11
250 31.430000 31.670000 31.350000 31.370001 22912700 2016-11-14
251 31.270000 31.850000 31.270000 31.700001 24118800 2016-11-15
[252 rows x 6 columns]
到这里基本上已经拿到了过去一年思科的全部数据, 但是数字看了太烦了, 要好看还是得用图表, 于是再花了点时间,把图给整出来了。 代码如下:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
from matplotlib.finance import quotes_historical_yahoo_ohlc
import matplotlib.pyplot as plt
from datetime import date
import pandas as pd
today= date.today()
start=(today.year-1, today.month, today.day)
quotes = quotes_historical_yahoo_ohlc('CSCO', start, today)
fields = ['date','open','high','low','close','volume']
list1 = []
for i in range(0, len(quotes)):
x = date.fromordinal(int(quotes[i][0]))
y = x.strftime('%Y-%m-%d')
list1.append(y)
quotesdf = pd.DataFrame(quotes, columns = fields)
quotesdf['trade date'] = pd.Series(list1)
quotesdf = quotesdf.drop(['date'], axis=1)
fig = plt.figure(1) # 创建图表1
ax = fig.add_subplot(111, frameon=False)
#处理x轴的显示与间隔的关系,把刻度转化成时间
alist = range(0, len(quotesdf['close']), 50)
tlist = []
for a in alist:
print a
t = quotesdf['trade date'][a]
tlist.append(t)
ax.set_xticks(range(0, len(quotesdf['close']), 50))
ax.set_xticklabels(tlist)
plt.plot(range(0, len(quotesdf['close'])), quotesdf['close'])
plt.xdata = (quotesdf['trade date'])
plt.show()
于是就看到了漂亮的折线图: