Saturday, September 13, 2014

Sensorial Android app

 
Sensorial on Google Play

Wow, it's been like forever since I wrote my last post (too bad for me, right?).
Even though it has been so long, don't think I was doing nothing (though I did spend some of that time playing PS4... :P).

These last months were a blast! I learned Java/Android programming and thought: "Why not release an app on the Play Store?". So that's what I did! This is the first blog post explaining some of the core pieces I use in my app, so it can help others.

So, let's start with an idea. You must have an idea to build an app, even if that idea has already been implemented by someone else, right? The idea behind Sensorial is very simple: give you data from the sensors we can find in our daily life.


You must have thought "IoT"[1], right? That's one path... why not. But since we are in the IT space, I have learned that baby steps are a great thing, so I'm starting small. For now the app gives you some sensor information from the hardware available on your device, plus some values collected over the internet, using your geolocation to get the best value for that point in time and space.

That's what is implemented right now, but the original idea was way simpler: my father just wanted to replace his old app that gave him speed, distance and altitude values. Very simple, right? So I implemented some other features that, in the end, he found very useful.

But enough history, let's talk "techie".

In this first post I'm just listing the libraries used to build Sensorial.
The first one was ButterKnife[2], and what a great lib, huh? It takes away all the trouble of writing the famous Android findViewById[3] call of the Activity over and over.
ButterKnife was my choice because it looked very simple to use (although in the end the Eclipse bundle imposed some difficulties) and has good documentation. If you don't know the library, or know it but never used it, give it a chance, it's good.
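
To give a quick idea of why it saves so much typing, here is a minimal sketch of how ButterKnife wires views for you. The activity, layout and view ids are hypothetical, and the annotation names changed across ButterKnife releases (versions of that era used @InjectView/ButterKnife.inject, newer ones use @BindView/ButterKnife.bind):

<code>
import android.app.Activity;
import android.os.Bundle;
import android.widget.TextView;

import butterknife.ButterKnife;
import butterknife.InjectView;

public class SensorListActivity extends Activity {

    // Hypothetical view ids and layout, just for illustration
    @InjectView(R.id.sensor_title) TextView sensorTitle;
    @InjectView(R.id.sensor_value) TextView sensorValue;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_sensor_list);
        // One call replaces all the findViewById + cast boilerplate
        ButterKnife.inject(this);
        sensorTitle.setText("Altitude");
    }
}
</code>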

OK, that was one lib; just to be clear, I'm using 7 libraries! You're thinking, "That's too much! It must be a performance killer!". Well... it's not that bad actually: the performance of the app is good, resource usage is not bad, and some of those libraries perform better than the stock Android code! You can try it for yourself if you want to. Anyway, all those libraries add a higher-level interface to the programming, making some tasks easier to do.
The other libraries I'm using are the following:

  • GSON[4]: great lib to work with the JSON format. Since some of the internet services I'm using to get sensor info return JSON, it was a must-have. If you look around the internet, its performance is much better than the default Android parser. You could also check Jackson, which claims to have better performance than GSON! (The sketch after this list combines GSON with Http-Request and Joda Time.)
  • Http-Request[5]: the stock Android code to make an HTTP request is disgusting; this lib is great, and it reminds me of Python's requests (which I find awesome!).
  • Joda Time[6]: if you work with dates in Java you suffered your whole life until Joda was created... :D It is excellent when it comes to parsing and transforming strings into date objects and vice versa.
  • SugarORM[7]: awesome. I was looking for an ORM on Android and found this precious lib... what makes it special? The simplicity... if you are looking for "simple is better than complex", use this and be happy. A plus is that the lib is active on GitHub!
  • ListViewAnimations[8]: this was a life saver for me... Android has its ListView component, and I'm using it to build the list of sensors in the main activity. But how can one implement special list operations like drag and drop, swipe to dismiss and things like that? Well, enter ListViewAnimations... this lib has it all! And more! If you are using ListView, check it out and be a little happier when coding for your users, improving their experience.
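
To make it a bit more concrete, here is a small sketch that puts three of these libs together: Http-Request downloads a JSON payload, GSON maps it onto a plain Java class, and Joda Time parses the timestamp. The URL and the JSON field names are made up for illustration; only the library calls themselves are real:

<code>
import org.joda.time.DateTime;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

import com.github.kevinsawicki.http.HttpRequest;
import com.google.gson.Gson;

public class WeatherExample {

    // Plain class that mirrors a hypothetical JSON payload:
    // {"temperature": 21.5, "timestamp": "2014-09-13 10:30"}
    static class Reading {
        double temperature;
        String timestamp;
    }

    public static void main(String[] args) {
        // Http-Request: one fluent call instead of HttpURLConnection boilerplate
        String json = HttpRequest.get("http://example.com/api/reading").body();

        // GSON: deserialize straight into the Reading class
        Reading reading = new Gson().fromJson(json, Reading.class);

        // Joda Time: parse the string timestamp into a DateTime object
        DateTimeFormatter fmt = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm");
        DateTime when = fmt.parseDateTime(reading.timestamp);

        System.out.println(reading.temperature + " at " + when);
    }
}
</code>

(On Android you would of course run the HTTP call off the main thread, but the idea is the same.)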
All those libraries will help you build a great and easy-to-use app; you should check Sensorial[9] to get an idea of what you could build.
Sensorial is a paid app, but I'm thinking about releasing a free version too. The problem with releasing a free version is that one of the services I'm using is a paid one, and if the app makes too many requests to that service, it could be down for a few hours...

Anyway, in the next posts I will explain some problems and solutions found while building Sensorial, and perhaps my next Android app.
Sensorial is far from finishing its development life cycle; I have many more ideas to implement before leaving it on the Play Store without updates!


Cheers!!!

[1]: Internet of Things (IoT)
[2]: ButterKnife
[3]: findViewById
[4]: GSON
[5]: Http-Request
[6]: Joda Time
[7]: SugarORM
[8]: ListViewAnimations
[9]: Sensorial on Google Play

Monday, December 16, 2013

Video Review: "The Death of the Universe - Renée Hlozek"


Interesting animated video about the possible ways our universe could end... nice to watch and a good way to learn a little more.

I like the theory about dark matter very much, but something tells me (call it a gut feeling) that in the end the universe, full of dark matter, will collapse on itself, giving origin to a new big bang...


Other videos worth watching are those from Michio Kaku[1] and from the series "How the Universe Works"[2], documentaries from the Discovery Channel.
There are a lot of other videos about it, and you can find even more scientific papers...

Explore!!

Monday, November 11, 2013

Web Scraping with Scrapy

How to extract data from websites


Months without posting anything, and then... BAM!!! Posting a lot of stuff. This is for you guys to see that I'm still doing stuff, but the time to put all of that into a blog post is limited, so let me take the opportunity given at the present time and write...
These past months I've been using the Scrapy framework a lot (those who follow me on Twitter should have noticed that), and this article is about exactly that... using the Scrapy framework to extract relevant data.
Extracting data means that we want to take unstructured info from one (or more than one) website, parse it and use it as we wish.
Wikipedia[1] has a good explanation of what I just said above:


“Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites. Usually, such software programs simulate human exploration of the World Wide Web by either implementing low-level Hypertext Transfer Protocol (HTTP), or embedding a fully-fledged web browser, such as Internet Explorer or Mozilla Firefox.
Web scraping is closely related to web indexing, which indexes information on the web using a bot or web crawler and is a universal technique adopted by most search engines. In contrast, web scraping focuses more on the transformation of unstructured data on the web, typically in HTML format, into structured data that can be stored and analyzed in a central local database or spreadsheet. Web scraping is also related to web automation, which simulates human browsing using computer software. Uses of web scraping include online price comparison, contact scraping, weather data monitoring, website change detection, research, web mashup and web data integration.” - from Wikipedia

Well, there are a lot of tools we can use to extract data from websites, but I find Scrapy a very good and easy one.

What's Scrapy[2]?
It's a framework. A framework that will help you extract data, and more. It's written in Python[3], which for me already means it's great!! hahahahaha :D
Scrapy has a lot of companies[4] already using it, which makes the framework even more battle-tested; check the list on their website. One of them uses Scrapy for data mining[5]: that's the kind of stuff you could do, or perhaps just scrape the photos from a site you like!

How to use...
To use Scrapy there is some basic stuff to do. For the simplicity of this article, I'm using GNU/Linux, in particular Linux Mint 15[6].
Since Linux is great, Python comes pre-installed on the Mint distro, so there is no need to run any kind of installation procedure for it.
Scrapy is a third-party framework, so we need to install it; I recommend using pip[7] to install Python packages. If you don't know pip, take a few minutes to understand how it works and the wonderful things it can do for you.
To install pip on the system (if it isn't installed already), use Synaptic to search for its package. After the installation, type the following in the shell to install Scrapy:

~ $ pip install scrapy

This way, the framework will be installed and ready to use. BeautifulSoup4 is another great tool to handle HTML: it can parse documents and access elements in an easy way, a good tool for doing some post-processing on the items that Scrapy collects and records in a database.
To install it, type:

~ $ pip install beautifulsoup4

As a quick example, I'll scrape the news from the website of PMC[9] (Prefeitura Municipal de Campinas).
A default Scrapy project will be created to do the job; you can find info about this in the Scrapy documentation[10].
To start a new Scrapy project, do:

~$ scrapy startproject noticias_pmc

A structure will be created in the folder where you executed the command (in this case the folder is created in the home directory of the logged-in user).

  • noticias_pmc/
    • scrapy.cfg
    • noticias_pmc/
      • __init__.py
      • items.py
      • pipelines.py
      • settings.py
      • spiders/
        • __init__.py

scrapy.cfg: the project configuration file
noticias_pmc/: the project’s python module, you’ll later import your code from here.
noticias_pmc/items.py: the project’s items file.
noticias_pmc/pipelines.py: the project’s pipelines file.
noticias_pmc/settings.py: the project’s settings file.
noticias_pmc/spiders/: a directory where you’ll later put your spiders.

The first step is to define the item structure (the information we want to extract and put to use). Open the file noticias_pmc/items.py:

<code>
from scrapy.item import Item, Field

class NewsPmcItem(Item):
    # Fields we want to extract for each news article
    title = Field()
    data = Field()
    text = Field()
    # image_urls / images are the fields used by the images pipeline
    image_urls = Field()
    images = Field()
</code>

Done! Now let's build our spider! To do that, inside the folder noticias_pmc/spiders/ create a file named NewsPMCSpider.py. Note that Scrapy uses the XPath[11] syntax to locate elements inside the parsed HTML; other libraries, such as the BeautifulSoup4 we installed, can use other means to access those elements.

<code>
# -*- coding: utf-8 -*-

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from noticias_pmc.items import NewsPmcItem
import urlparse


# Our class; CrawlSpider is the superclass (in Python we do it this way)
class NewsPMCSpider(CrawlSpider):
    # Name of our spider
    name = 'noticias_pmc'
    # Allowed domains, we don't want the spider to read the entire web, do we??
    allowed_domains = ['campinas.sp.gov.br']
    # Which url we should start reading from
    start_urls = ['http://campinas.sp.gov.br/noticias.php']
    # Rules: which url formats we should read, the callback that will parse the
    # response, and follow=True to tell our spider to keep going to other urls!
    rules = (
        # Extract links and parse them with the spider's method parse_item
        Rule(SgmlLinkExtractor(allow=['http://campinas.sp.gov.br/noticias.php',
                                      'http://campinas.sp.gov.br/noticias-integra.php']),
             callback='parse_item', follow=True),
    )

    # This does all the work
    def parse_item(self, response):
        # Create a news item!
        item = NewsPmcItem()
        # Parse the response of the server, so we can access the elements
        hxs = HtmlXPathSelector(response)
        # XPath to find and get the elements; ah! we want only the string of the text here (no html tags!)
        titulo = hxs.select('//div[@class="itens"]/h3').select('string()').extract()
        # If there's a title, it may be a valid news article!
        if titulo:
            # Get the news date
            data = hxs.select('//div[@class="itens"]/p[@class="data"]').select('string()').extract()
            # The body text
            texto = hxs.select('//div[@class="itens"]/p[@align="justify"]').select('string()').extract()
            # Clean up and fill the fields defined in items.py
            item['title'] = titulo[0].strip()
            item['data'] = data[0].strip()
            item['text'] = "".join(texto).strip()
            # Collect the image urls that Scrapy will download automatically into the folder defined in settings.py
            item['image_urls'] = self.parse_imagens(response.url, hxs.select('//div[@id="sideRight"]/p/a/img'))
            return item

    def parse_imagens(self, url, imagens):
        image_urls = []
        for imagem in imagens:
            try:
                # Image path
                src = imagem.select('@src').extract()[0]
                # If it is a relative path we must prefix it with http://www.campinas.sp.gov.br
                if 'http' not in src:
                    src = urlparse.urljoin(url, src.strip())
                image_urls.append(src)
            except:
                pass
        return image_urls
</code>

Before running our spider, we must change two other files: settings.py and pipelines.py.
Add the following lines to settings.py (anywhere in the file):

<code>
# Name of the class in the pipelines file that will handle the images
ITEM_PIPELINES = ['noticias_pmc.pipelines.MyImagesPipeline', ]
# The directory where the images will be stored
IMAGES_STORE = '<local path>/noticias_pmc/images'
</code>

And in pipelines.py, paste the MyImagesPipeline class:

<code>
from scrapy.contrib.pipeline.images import ImagesPipeline
from scrapy.http import Request


class MyImagesPipeline(ImagesPipeline):

    def get_media_requests(self, item, info):
        try:
            # Queue one download request per image url collected by the spider
            if item['image_urls']:
                for image_url in item['image_urls']:
                    yield Request(image_url)
        except:
            # The field may not be set for pages without images
            pass

    def item_completed(self, results, item, info):
        # Keep only the downloads that succeeded, with their url and local path
        item['image_urls'] = [{'url': x['url'], 'path': x['path']} for ok, x in results if ok]
        return item
</code>

Done again! Let's now run our spider and see the results. When you run it, Scrapy will show you the urls it reads and the values it catches and stores in the item class for the pages it finds!
Inside the Scrapy project folder, type:

~$ scrapy crawl noticias_pmc
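
If you also want to keep the scraped items in a file instead of just reading them in the log, Scrapy's feed exports can write them out for you (the flags below are from the Scrapy versions of that time; check the docs for your version):

~$ scrapy crawl noticias_pmc -o noticias.json -t json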

Look!!! a spider on the web.... hahahahahahahhahahaha

If you have any problems using Scrapy, leave a message; if I can help, I will!!! There's much more in the Scrapy documentation, so take a minute (or more than one) and read it!!!


[2] Scrapy: http://scrapy.org/
[4] Companies using Scrapy: http://scrapy.org/companies/
[6] Linux Mint: http://www.linuxmint.com/