Commit cf01420

committed
add initial posts for python weekly #258
1 parent 40f8410 commit cf01420

5 files changed

Lines changed: 1240 additions & 0 deletions

Lines changed: 132 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,132 @@
1+
Original: [Python Weekly Issue 258](http://us2.campaign-archive1.com/?u=e2e180baf855ac797ef407fc7&id=dadedf0a62&e=148158c7b4)
2+
3+
---
4+
5+
Welcome to issue 258 of Python Weekly. Let's get straight to the point.
6+
7+
# From Our Sponsor
8+
9+
[![](https://gallery.mailchimp.com/e2e180baf855ac797ef407fc7/images/7394541b-6b55-4fde-8756-6b7547029f1b.png)](https://hired.com/?utm_source=newsletters&utm_medium=pythonweekly&utm_campaign=q3-16)
10+
11+
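You're stylish, savvy, and efficient. So why are you still job hunting the old way? [Try Hired](https://hired.com/?utm_source=newsletters&utm_medium=pythonweekly&utm_campaign=q3-16) to get in front of 4,000+ tech companies and receive personalized support to help you find your ideal job.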
12+
13+
14+
# Articles, Tutorials and Talks
15+
16+
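[Exploring Git with Python](https://www.youtube.com/watch?v=CB9p8n3gugM)
17+
18+
This talk begins with a brief explanation of Git's on-disk data structures. It then moves to live coding: reading those data structures and rebuilding the `git log` command for an arbitrary Git repository without invoking the `git` command. By the end, we should have a working command of our own that matches `git log` for any repository and any branch. We simply start from `HEAD` and follow the data structures all the way down.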
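
As a rough illustration of the on-disk structures the talk describes, the sketch below parses a loose commit object: zlib-compressed bytes holding a `commit <size>\0` header, followed by header lines, a blank line, and the message. The function name `parse_commit` and the sample object are hypothetical and are not taken from the talk. The sketch also ignores packfiles and multi-line headers, which a real `git log` replacement would need to handle.

```python
import zlib


def parse_commit(raw: bytes) -> dict:
    """Parse the zlib-compressed bytes of a loose Git commit object."""
    data = zlib.decompress(raw)
    header, _, body = data.partition(b"\x00")
    kind, _, size = header.partition(b" ")
    if kind != b"commit" or int(size) != len(body):
        raise ValueError("not a well-formed commit object")
    # Header lines and the commit message are separated by a blank line.
    head, _, message = body.decode("utf-8").partition("\n\n")
    fields = {"parents": [], "message": message.strip()}
    for line in head.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            fields["parents"].append(value)
        else:
            fields[key] = value
    return fields


# Hypothetical sample object, built in memory rather than read from .git/objects.
_body = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"parent 40f8410000000000000000000000000000000000\n"
    b"author A U Thor <author@example.com> 1472000000 +0000\n"
    b"committer A U Thor <author@example.com> 1472000000 +0000\n"
    b"\n"
    b"add initial posts\n"
)
sample = zlib.compress(b"commit %d\x00" % len(_body) + _body)
commit = parse_commit(sample)
```

Walking history would repeat this for each hash in `commit["parents"]`, reading the loose object stored at `.git/objects/<first two hex characters>/<remaining 38>`.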
19+
20+
[Using the IBM Watson APIs with Django](https://www.epilis.gr/en/blog/2016/08/18/ibm-watson-apis-django/)
21+
22+
In a few minutes, build a Django application that uses the IBM Watson APIs to analyze comments.
23+
24+
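[RNNs in TensorFlow: A Practical Guide and Undocumented Features](http://www.wildml.com/2016/08/rnns-in-tensorflow-a-practical-guide-and-undocumented-features/)
25+
26+
This post reviews best practices for using RNNs in TensorFlow, especially the features that are not well documented on the official site.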
27+
28+
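[Lists vs. Tuples](http://nedbatchelder.com/blog/201608/lists_vs_tuples.html)
29+
30+
A common Python beginner question: what is the difference between a list and a tuple? The answer is that there are two different differences, with a complex interplay between them: the technical difference and the cultural difference.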
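
A small sketch of both differences, using made-up values. The technical difference is mutability. The cultural difference is a convention rather than a rule: lists usually hold many items of one kind, while tuples often act as fixed-shape records.

```python
# Technical difference: lists are mutable, tuples are not.
scores = [70, 85]
scores.append(90)          # fine: the list grows in place

point = (3, 4)
try:
    point[0] = 10          # tuples reject item assignment
    mutated = True
except TypeError:
    mutated = False

# Because tuples are immutable (and hashable when their items are),
# they can serve as dictionary keys; lists cannot.
distances = {point: 5.0}

# Cultural difference: a list usually holds many items of one kind,
# while a tuple is often a fixed-shape record of different kinds.
names = ["Ada", "Grace", "Guido"]
person = ("Ada", 1815, "mathematician")
```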
31+
32+
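[Podcast.__init__ Episode 71 - Gensim with Radim Řehůřek](https://podcastinit.com/radim-rehurek-gensim.html)
33+
34+
Understanding the context of a piece of text is usually considered the domain of artificial intelligence. However, topic modeling and semantic analysis can let a computer determine whether different messages and articles are about the same thing. This week we talk with Radim Řehůřek about his work on Gensim, a Python library for unsupervised analysis of unstructured text and for applying machine learning models to natural language understanding problems.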
35+
36+
[Page dewarping](https://mzucker.github.io/2016/08/15/page-dewarping.html)
37+
38+
An article showing how to flatten images of curved pages.
39+
40+
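[An Intro to Linear Classification with Python](http://www.pyimagesearch.com/2016/08/22/an-intro-to-linear-classification-with-python/)
41+
42+
This post covers the basics of parameterized learning and linear classification. Simple as it is, linear classification can be seen as a basic building block of more advanced machine learning algorithms, extending naturally to neural networks and convolutional neural networks.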
43+
44+
[JIT native code generation for TensorFlow computation graphs using Python and LLVM](http://blog.christianperone.com/2016/08/jit-native-code-generation-for-tensorflow-computation-graphs-using-python-and-llvm/)
45+
46+
[A Python JIT is coming](https://lwn.net/Articles/691070/)
47+
48+
[Handy Python Libraries for Formatting and Cleaning Data](https://blog.modeanalytics.com/python-data-cleaning-libraries/)
49+
50+
[Generating fantasy maps](http://mewo2.com/notes/terrain/)
51+
52+
[How to build and deploy a Facebook Messenger bot with Python and Flask, a tutorial](http://tsaprailis.com/2016/06/02/How-to-build-and-deploy-a-Facebook-Messenger-bot-with-Python-and-Flask-a-tutorial/)
53+
54+
55+
# Interesting Projects, Tools and Libraries
56+
57+
[Kyoukai](https://github.com/SunDwarf/Kyoukai)
58+
59+
Kyōkai is a fast asynchronous Python server-side web framework. It is built upon asyncio and the Asphalt framework for an extremely fast web server.
60+
61+
[undebt](https://github.com/Yelp/undebt)
62+
63+
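Undebt is a fast, straightforward, reliable tool for performing massive, automated code refactoring, used at Yelp. It lets you define complex find-and-replace rules in standard, straightforward Python and apply them quickly to an entire codebase with a single command.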
64+
65+
[PyGradle](https://github.com/linkedin/pygradle) 
66+
67+
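The PyGradle build system is a set of Gradle plugins for building Python artifacts. Artifacts produced by PyGradle are forward and backward compatible with artifacts produced by Python's setuptools library.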
68+
69+
[PyCNN](https://github.com/ankitaggarwal011/PyCNN)
70+
71+
Image processing with Cellular Neural Networks in Python.
72+
73+
[Enforce](https://github.com/RussBaz/enforce)
74+
75+
Enforce is a simple Python 3.5 (or later) application that enforces runtime type checking based on type hints (PEP 484).
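
To give a feel for what runtime checking of type hints involves, here is a minimal sketch built only on the standard library. It is not Enforce's API: the decorator name `check_types` is hypothetical, and it checks only plain classes such as `str` and `int`, skipping generic hints.

```python
import functools
import inspect
import typing


def check_types(func):
    """Hypothetical decorator: verify arguments against simple type hints."""
    hints = typing.get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes are checked; other hints are skipped.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{name} must be {expected.__name__}, got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@check_types
def repeat(text: str, times: int) -> str:
    return text * times
```

Calling `repeat("ab", 2)` passes the check, while `repeat("ab", "2")` raises `TypeError` before the function body runs.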
76+
77+
[pybble](https://github.com/hiway/pybble)
78+
79+
(Almost) Python support for Pebble.
80+
81+
[ray](https://github.com/felipevolpone/ray)
82+
83+
A framework that helps you deliver well-designed Python APIs.
84+
85+
[picotui](https://github.com/pfalcon/picotui)
86+
87+
A lightweight, pure-Python toolkit of text user interface (TUI) widgets with minimal dependencies.
88+
89+
[chartpy](https://github.com/cuemacro/chartpy)
90+
91+
An easy-to-use Python API wrapper for plotting charts with matplotlib, plotly, bokeh, and more.
92+
93+
[ulmo](https://github.com/ulmo-dev/ulmo)
94+
95+
Clean, simple, and fast access to public hydrology and climatology data.
96+
97+
[rex](https://github.com/shellphish/rex)
98+
99+
Shellphish's automated exploitation engine, originally created for the Cyber Grand Challenge.
100+
101+
[posio](https://github.com/abrenaut/posio)
102+
103+
A multiplayer geography game using WebSockets.
104+
105+
[ansible-django-stack](https://github.com/jcalazan/ansible-django-stack)
106+
107+
An Ansible playbook for setting up a Django app with Nginx, Gunicorn, PostgreSQL, Celery, RabbitMQ, Supervisor, Virtualenv, and Memcached. A Vagrantfile for provisioning a VirtualBox virtual machine is included as well.
108+
109+
[3D-R2N2](https://github.com/chrischoy/3D-R2N2)
110+
111+
Single- and multi-view image to voxel reconstruction using a recurrent neural network.
112+
113+
114+
# New Releases
115+
116+
[py.test 3.0](http://docs.pytest.org/en/latest/changelog.html) 
117+
118+
[PyDev 5.2.0](http://pydev.blogspot.com.br/2016/08/pydev-520-released-static-type.html)
119+
120+
121+
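# Upcoming Events and Webinars
122+
123+
[Online event: Python generators: turning your loops inside out](https://www.crowdcast.io/e/generators/register)
124+
125+
We will discuss what generators are in Python and how to use them in place of lists. We will also look at using generators to simplify looping logic and at turning while loops into for loops with a generator.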
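
A minimal sketch of the two ideas mentioned above, with a hypothetical helper named `read_chunks`: the generator hides the `while` loop's bookkeeping so callers can use a plain `for` loop, and a generator expression produces values lazily instead of building a full list.

```python
def read_chunks(data, size):
    """Yield successive slices of `data`, hiding the index bookkeeping."""
    start = 0
    while start < len(data):
        yield data[start:start + size]
        start += size


# The caller writes a plain for loop instead of managing a while loop.
chunks = []
for chunk in read_chunks("abcdefgh", 3):
    chunks.append(chunk)

# Generators are lazy: a value is computed only when it is requested,
# so the million squares below are never all held in memory.
squares = (n * n for n in range(1_000_000))
first_three = [next(squares) for _ in range(3)]
```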
126+
127+
[SoCal Python August 2016 Meetup - Los Angeles, CA](https://www.meetup.com/socalpython/events/233187442/)
128+
129+
There will be the following talks:
130+
131+
* Harnessing the Power of Python Magic Methods and Lazy Objects
132+
* The Two Trains and Other Refactoring Analogies

Python Weekly/README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,3 +17,4 @@
1717
- [Issue 255](./Python Weekly Issue 255.md)
1818
- [Issue 256](./Python Weekly Issue 256.md)
1919
- [Issue 257](./Python Weekly Issue 257.md)
20+
- [Issue 258](./Python Weekly Issue 258.md)
Lines changed: 251 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,251 @@
37+
# [Handy Python Libraries for Formatting and Cleaning Data](https://blog.modeanalytics.com/python-data-cleaning-libraries/)
38+
39+
40+
August 23, 2016 | [Melissa Bierly](http://www.twitter.com/melissa_bierly) --
41+
Content Marketing at Mode
42+
43+
The real world is messy, and so too is its data. So messy that a [recent
44+
survey](http://visit.crowdflower.com/data-science-report.html) reported data
45+
scientists spend 60% of their time cleaning data. Unfortunately, 57% of them
46+
also find it to be the least enjoyable aspect of their job.
47+
48+
Cleaning data may be time-consuming, but lots of tools have cropped up to make
49+
this crucial duty a little more bearable. The Python community offers a host
50+
of libraries for making data orderly and legible—from styling DataFrames to
51+
anonymizing datasets.
52+
53+
Let us know which libraries you find useful—we're always looking to prioritize
54+
which libraries to add to [Mode Python
55+
Notebooks](https://about.modeanalytics.com/python/).
56+
57+
![Scrub that Data](https://blog.modeanalytics.com/images/post-images/python-data-cleaning-libraries.png)
58+
_Too bad cleaning isn't as fun for data
59+
scientists as it is for this little guy._
60+
61+
## Dora
62+
63+
Dora is designed for exploratory analysis; specifically, automating the most
64+
painful parts of it, like feature selection and extraction, visualization,
65+
and—you guessed it—data cleaning. Cleansing functions include:
66+
67+
* Reading data with missing and poorly scaled values
68+
* Imputing missing values
69+
* Scaling values of input variables
70+
71+
**Created by:** [Nathan Epstein](https://twitter.com/epstein_n)
72+
**Where to learn more:** <https://github.com/NathanEpstein/Dora>
73+
74+
## datacleaner
75+
76+
Surprise, surprise, datacleaner cleans your data—but only once it's in a
77+
[pandas DataFrame](https://community.modeanalytics.com/python/tutorial/pandas-dataframe/).
78+
From creator Randy Olson: “datacleaner is not magic, and it won't
79+
take an unorganized blob of text and automagically parse it out for you.”
80+
81+
It will, however, drop rows with missing values, replace missing values with
82+
the mode or median on a column-by-column basis, and encode non-numeric
83+
variables with numerical equivalents. This library is fairly new, but since
84+
DataFrames are fundamental to analysis in Python, it's worth checking out.
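
To make those cleaning steps concrete, here is a standard-library sketch of the same ideas on a plain dict of columns. It does not use datacleaner or pandas. The function `clean_columns` and the sample table are hypothetical, and missing values are assumed to be represented as `None`.

```python
from statistics import median, mode


def clean_columns(table):
    """Impute missing values per column, then encode text columns as integers."""
    cleaned = {}
    for name, values in table.items():
        present = [v for v in values if v is not None]
        numeric = all(isinstance(v, (int, float)) for v in present)
        # Median for numeric columns, most common value (mode) otherwise.
        fill = median(present) if numeric else mode(present)
        filled = [fill if v is None else v for v in values]
        if not numeric:
            # Map each distinct label to an integer code, in sorted order.
            codes = {label: i for i, label in enumerate(sorted(set(filled)))}
            filled = [codes[v] for v in filled]
        cleaned[name] = filled
    return cleaned


table = {
    "age": [31, None, 45, 28],
    "city": ["Oslo", "Lima", None, "Lima"],
}
result = clean_columns(table)
```

Here the missing age is filled with the column median and the missing city with the most common city, after which the city labels become integer codes.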
85+
86+
**Created by:** [Randy Olson](https://twitter.com/randal_olson)
87+
**Where to learn more:** <https://github.com/rhiever/datacleaner>
88+
89+
## PrettyPandas
90+
91+
DataFrames are powerful, but they don't produce the kind of tables you'd want
92+
to show your boss. PrettyPandas makes use of the [pandas Style
93+
API](http://pandas.pydata.org/pandas-docs/stable/style.html) to transform
94+
DataFrames into presentation-worthy tables. Create summaries, add styling, and
95+
format numbers, columns, and rows. Added bonus: robust, easy-to-read
96+
[documentation](http://prettypandas.readthedocs.io/en/latest/).
97+
98+
**Created by:** [Henry Hammond](https://twitter.com/henryhammond92)
99+
**Where to learn more:** <https://github.com/HHammond/PrettyPandas>
100+
101+
## tabulate
102+
103+
tabulate lets you print small, nice-looking tables with just one function
104+
call. It's handy for making tables more readable with column alignment by
105+
decimal, number formatting, headers, and more.
106+
107+
One of the coolest features is the ability to output data in a variety of
108+
formats like HTML, LaTeX, or PHP Markdown Extra, so you can continue working with
109+
your tabular data in another tool or language.
110+
111+
**Created by:** Sergey Astanin
112+
**Where to learn more:** <https://pypi.python.org/pypi/tabulate>
113+
114+
## scrubadub
115+
116+
Data scientists in fields like healthcare and finance regularly have to
117+
anonymize datasets. scrubadub removes [personally identifiable information
118+
(PII)](https://en.wikipedia.org/wiki/Personally_identifiable_information) from
119+
free text, such as:
120+
121+
* Names (proper nouns)
122+
* Email addresses
123+
* URLs
124+
* Phone numbers
125+
* username/password combinations
126+
* Skype usernames
127+
* Social security numbers
128+
129+
The documentation does a good job of showing ways in which you might want to
130+
customize scrubadub's behavior, like defining new PII types or excluding
131+
certain kinds of PII from being scrubbed.
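
The general idea can be sketched with regular expressions from the standard library. This is not scrubadub's API: the `scrub` function, the two deliberately simple patterns, and the `{{KIND}}` placeholder style are assumptions for illustration, and real PII detection needs far more care.

```python
import re

# Deliberately simple patterns; real-world PII detection needs far more care.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL = re.compile(r"https?://\S+")


def scrub(text, skip=()):
    """Replace matched PII with a placeholder, optionally skipping some kinds."""
    patterns = {"EMAIL": EMAIL, "URL": URL}
    for kind, pattern in patterns.items():
        if kind not in skip:
            text = pattern.sub("{{" + kind + "}}", text)
    return text


message = "Write to jane.doe@example.com or see https://example.com/profile"
scrubbed = scrub(message)
kept_urls = scrub(message, skip=("URL",))
```

The `skip` argument mirrors the kind of customization described above, where certain kinds of PII are excluded from scrubbing.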
132+
133+
**Created by:** [Datascope Analytics](http://datascopeanalytics.com/)
134+
**Where to learn more:** <http://scrubadub.readthedocs.io/en/stable/index.html>
135+
136+
## Arrow
137+
138+
Let's be honest: working with dates and times in Python is a pain. Local
139+
timezones aren't automatically recognized. It takes several lines of
140+
unpleasant code to convert timezones and timestamps.
141+
142+
Arrow aims to fix these problems and plug functionality gaps to help you
143+
handle dates and times with less code and fewer imports. Unlike Python's
144+
standard library, Arrow is time-zone aware and UTC by default. You can convert
145+
timezones or parse strings using one line of code.
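
For comparison, the standard-library steps look roughly like the sketch below. The timestamp is made up, and a fixed UTC-7 offset stands in for a named zone, so daylight saving time is not handled. Arrow wraps steps like these behind a smaller API; see its documentation for the exact calls.

```python
from datetime import datetime, timedelta, timezone

# A fixed UTC-7 offset stands in for a named zone such as US/Pacific;
# unlike a real zone, it knows nothing about daylight saving time.
pacific = timezone(timedelta(hours=-7), "UTC-7")

# Parse an ISO 8601 string, then attach UTC explicitly: a naive datetime
# carries no timezone unless you add one.
naive = datetime.strptime("2016-08-23T17:30:00", "%Y-%m-%dT%H:%M:%S")
aware_utc = naive.replace(tzinfo=timezone.utc)

# Convert to the other offset and format the result.
local = aware_utc.astimezone(pacific)
stamp = local.isoformat()
```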
146+
147+
**Created by:** [Chris Smith](https://twitter.com/crsmithdev)
148+
**Where to learn more:** <http://arrow.readthedocs.io/en/latest/>
149+
150+
## Beautifier
151+
152+
Beautifier's mission is simple: clean and prettify URLs and email addresses.
153+
You can parse emails by domain and username; URLs by domain and parameters
154+
(e.g. UTMs or tokens).
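
The same kind of parsing can be sketched with the standard library alone. This is not Beautifier's API: the helpers `split_email` and `split_url` and the sample inputs are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def split_email(address):
    """Return (username, domain) for a simple address."""
    username, _, domain = address.rpartition("@")
    return username, domain


def split_url(url):
    """Return the domain and a dict of query parameters."""
    parts = urlsplit(url)
    # parse_qs returns a list per key; keep the first value of each.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return parts.netloc, params


user, mail_domain = split_email("melissa@example.com")
site, params = split_url(
    "https://example.com/post?utm_source=blog&utm_medium=recommended"
)
```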
155+
156+
**Created by:** [Sachin Philip Mathew](https://twitter.com/sachin_philip)
157+
**Where to learn more:** <https://github.com/sachinvettithanam/beautifier>
158+
159+
## ftfy
160+
161+
ftfy (fixes text for you) takes in bad Unicode and outputs good Unicode.
162+
Basically, it fixes all the junk characters. `“quotesâ€\x9d` becomes
163+
`"quotes"`; `Ã¼` becomes `ü`; `&lt;3` becomes `<3`. If you work with text on
164+
a daily basis, this library is, as one user says, “a handy piece of magic.”
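
The junk characters usually come from text being decoded with the wrong encoding. The standard-library sketch below reproduces one such mix-up and reverses it by hand. It illustrates the mechanism only and is not how ftfy is implemented; ftfy's value lies in detecting which repair applies.

```python
import html

# UTF-8 bytes misread as Latin-1 produce the familiar junk characters.
garbled = "ü".encode("utf-8").decode("latin-1")

# Reversing the wrong decoding recovers the original text.
repaired = garbled.encode("latin-1").decode("utf-8")

# HTML entities left behind in plain text can be unescaped as well.
heart = html.unescape("&lt;3")
```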
165+
166+
**Created by:** [Luminoso](http://www.luminoso.com/)
167+
**Where to learn more:** <https://github.com/LuminosoInsight/python-ftfy>
168+
169+
## Further resources for wrangling data
170+
171+
Here are a couple of our favorite reads on munging/wrangling/cleansing data.
172+
173+
* [What every data scientist should know about data anonymization](https://github.com/krasch/presentations/blob/master/pydata_Berlin_2016.pdf) (Katharina Rasch)
174+
* [Cleaning data in Python](https://data.library.utoronto.ca/cleaning-data-python) (University of Toronto Map & Data Library)
175+
* [Data Cleaning with Python - MoMA's Artwork Collection](https://www.dataquest.io/blog/data-cleaning-with-python/) (Dataquest)
176+
