Using Deep Learning to heal people suffering from cancer

DL is cool

Sometimes we happily use Deep Learning for futile things like generating faces or turning horses into zebras. But most of the time, it’s a powerful tool that can help save lives.

At INSA Rouen, I worked in a team of students implementing a solution based on an article published by researchers, some of whom were my teachers. The article is called IODA: An input/output deep architecture for image labeling and was written by Julien Lerouge, Romain Herault, Clément Chatelain, Fabrice Jardin and Romain Modzelewski. Image labeling is the act of identifying zones in an image and saying: ‘this zone corresponds to the sky’ or ‘this zone corresponds to a pedestrian’. But what’s fantastic about their work is that it also performs image segmentation: it detects where the boundaries between zones lie.
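To make the two ideas concrete, here is a minimal sketch in plain Python. It is not the IODA architecture: it only assumes that labeling has already produced a "label map" (one class ID per pixel) and shows how zone boundaries can be read off it. The class names and the tiny 4x4 image are made up for illustration.

```python
# Hypothetical class IDs, for illustration only.
SKY, ROAD, PEDESTRIAN = 0, 1, 2

# A label map assigns one class ID to every pixel of the image.
label_map = [
    [SKY, SKY, SKY, SKY],
    [SKY, SKY, PEDESTRIAN, SKY],
    [ROAD, ROAD, PEDESTRIAN, ROAD],
    [ROAD, ROAD, ROAD, ROAD],
]


def boundary_pixels(labels):
    """Return the (row, col) pixels that touch a pixel of another class."""
    height, width = len(labels), len(labels[0])
    boundary = set()
    for r in range(height):
        for c in range(width):
            # Look at the four direct neighbours (up, down, left, right).
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < height and 0 <= nc < width
                if inside and labels[nr][nc] != labels[r][c]:
                    boundary.add((r, c))
                    break
    return boundary


edges = boundary_pixels(label_map)
```

Pixels deep inside a zone (like the top-left sky corner) are not in `edges`, while pixels next to a different class (like the pedestrian) are.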

Example of image segmentation

Read more

Parallelizing queries in PostgreSQL with Python

At Geoblink we run more than 20,000 queries to generate just one of our several ~100 GB PostgreSQL databases from scratch from our raw data files. If we ran them sequentially, generating the database would take too long, so we run several queries in parallel to reduce the generation time. Doing that by hand would be impossible, so we use a Python script to generate and run the queries.
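As a rough idea of what running queries in parallel can look like, here is a minimal sketch using a thread pool from the standard library. It is not the Geoblink script: `run_query` is a hypothetical callable that would execute one SQL string (for instance by opening its own PostgreSQL connection), and `fake_run_query` is a stand-in so the sketch runs without a database. It also assumes the queries are independent of each other.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_queries_in_parallel(queries, run_query, max_workers=4):
    """Run independent queries concurrently and return {query: result}.

    `run_query` is a hypothetical callable that executes one SQL string.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the query it is running.
        futures = {pool.submit(run_query, query): query for query in queries}
        for future in as_completed(futures):
            # .result() re-raises any exception raised inside run_query.
            results[futures[future]] = future.result()
    return results


def fake_run_query(sql):
    """Stand-in for a real database call, so the sketch runs anywhere."""
    return f"done: {sql}"


if __name__ == "__main__":
    queries = [f"CREATE TABLE t{i} AS SELECT {i}" for i in range(8)]
    print(run_queries_in_parallel(queries, fake_run_query))
```

Threads are a reasonable fit here because the Python side mostly waits on the database. Queries that depend on each other would need to be grouped into ordered batches first.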

In this post I will show an example of how to do it in Python.

Read more

PostgreSQL: Foreign keys with condition ON UPDATE CASCADE

Foreign keys are a key feature of relational databases, ensuring the integrity and coherence of data. They allow actions to be applied ON CASCADE, which means that changes to the primary key or unique constraint they reference are also applied to the rows that reference it. This has many advantages as the complexity of the database grows.

However, there are cases where using ON CASCADE is risky because you can lose track of what’s actually being changed (especially when deleting). So in general it is good practice for updates, but one must be careful with deletes. For example, we may have created a demo account and not want non-expert users to be able to delete the user account, because all the related data would be lost.
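Here is a small runnable sketch of that trade-off. To keep it self-contained it uses Python's built-in SQLite module rather than PostgreSQL, which is an assumption of this example; the `users` and `orders` tables are hypothetical. Updates to the referenced key cascade, while deletes are refused as long as related rows exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when asked to (PostgreSQL always does).
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL
            REFERENCES users (id)
            ON UPDATE CASCADE
            ON DELETE RESTRICT
    )
    """
)
conn.execute("INSERT INTO users (id, name) VALUES (1, 'demo')")
conn.execute("INSERT INTO orders (id, user_id) VALUES (10, 1)")

# Changing the referenced key propagates to the referencing row.
conn.execute("UPDATE users SET id = 2 WHERE id = 1")
order_user_id = conn.execute(
    "SELECT user_id FROM orders WHERE id = 10"
).fetchone()[0]

# Deleting the referenced row is refused while orders still point at it.
try:
    conn.execute("DELETE FROM users WHERE id = 2")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True
```

After running it, `order_user_id` follows the new key and `delete_blocked` is true, so the demo account's data cannot disappear by accident.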

In this post we are going to compare different alternatives to the ON CASCADE constraint and their performance.

Read more

Working with Graph Databases at Geoblink

Nowadays a lot of companies choose graph databases to store large amounts of information, but what kind of information?

Graph databases are great for storing relationships, and they are very fast at calculating how the different elements inside are related. A very good example is a social network or a family structure. In these cases we have people as “nodes”, and the ways people are related as “relationships”. So storing this in a graph database is easy, right?

When we work with tons of information, the first step is to decide which graph database to use. There are a lot of them, but one of the most popular is Neo4j. With Neo4j we are able to build a big data system because we can build clusters holding all our information and the structure of its relationships.

The skeleton of a graph database is made of nodes and relationships, so the most important thing is to be very clear about how the information has to be stored. We can store many types of nodes, and likewise many types of relationships; these types of nodes and relationships are identified by “labels”.
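To picture nodes, relationships and labels, here is a toy in-memory model in Python. It is not the Neo4j API, just a sketch of the data structure; the names, the `Person` label and the `PARENT_OF` and `SIBLING_OF` relationship types are invented for the family example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str  # type of node, e.g. "Person"
    name: str


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)          # name -> Node
    relationships: list = field(default_factory=list)  # (source, type, target)

    def add_node(self, label, name):
        self.nodes[name] = Node(label, name)

    def relate(self, source, rel_type, target):
        self.relationships.append((source, rel_type, target))

    def related(self, source, rel_type):
        """Names reachable from `source` through one `rel_type` relationship."""
        return [
            target
            for origin, kind, target in self.relationships
            if origin == source and kind == rel_type
        ]


family = Graph()
for person in ("Ana", "Luis", "Marta"):
    family.add_node("Person", person)
family.relate("Ana", "PARENT_OF", "Luis")
family.relate("Ana", "PARENT_OF", "Marta")
family.relate("Luis", "SIBLING_OF", "Marta")

children_of_ana = family.related("Ana", "PARENT_OF")
```

A real graph database indexes these relationships so that traversals stay fast on far larger data than a Python list can handle.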

Read more

NoSQL Tech Talk: distributed databases for a distributed world

We recently gave a Tech Talk at Google Campus Madrid about NoSQL databases, covering several aspects in a general way: their history, the problems they solve and the categories they can be grouped into.

More specifically, we talked about:

  • The very origin of modern databases, the IMS system by IBM.
  • The first relational databases and how they shaped the Web 1.0.
  • The growing need for distributed databases that were fast and consistent in the Web 2.0.
  • The CAP theorem.
  • Categories of NoSQL databases:
    • Key-value stores (Redis, Riak)
    • Column stores (HBase, Cassandra)
    • Document stores (MongoDB, CouchDB)
    • Graph databases (Neo4j, FlockDB)
    • Search engines (not strictly NoSQL but related, like Solr and Elasticsearch)

You can find the video of the talk (in Spanish) here and the slides (in English) here.

Using Machine Learning at Geoblink to improve the quality of our database

One of the major problems we deal with at Geoblink is spotting duplicated information (points of interest or any other locations) on our map. It turns out we do have something in common with Google Maps…

Just like Google Maps, we want to keep our map up-to-date with the latest information. However, due to the variety of data sources we use, we sometimes find that the same physical point is represented on our map by two or more markers. One of the reasons is how creative people get when formatting an address: addresses coming from different data sources might be formatted very differently, so sometimes it becomes extremely difficult to tell whether two different addresses represent the same physical point or not.
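To illustrate why formatting matters, here is a deliberately simple baseline in Python. It is not the Machine Learning approach this post is about: it only normalizes two address strings and compares them with the standard library's `difflib`. The abbreviation table, the sample addresses and the 0.85 threshold are all assumptions made for this sketch.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real one would be much larger.
ABBREVIATIONS = {"c": "calle", "av": "avenida", "avda": "avenida"}


def normalize(address):
    """Lowercase, strip accents and punctuation, expand a few abbreviations."""
    text = unicodedata.normalize("NFKD", address)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens)


def similarity(first, second):
    """Similarity in [0, 1] between two normalized addresses."""
    return SequenceMatcher(None, normalize(first), normalize(second)).ratio()


def is_probable_duplicate(first, second, threshold=0.85):
    # The threshold is an arbitrary choice for this sketch.
    return similarity(first, second) >= threshold


same_place = is_probable_duplicate("C/ Alcalá, 45", "Calle de Alcala 45")
other_place = is_probable_duplicate("C/ Alcalá, 45", "Calle Mayor 3")
```

A fixed rule like this breaks down quickly on real data (missing numbers, swapped word order, different languages), which is where a learned model becomes attractive.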

Read more

Geoblink named one of the top 50 Startups worldwide 2017

We were recently included in a list put together by Bloomberg of the 50 most promising little-known startups for this year, under the category Artificial Intelligence/Geospatial business intelligence.

https://www.bloomberg.com/graphics/2017-fifty-best-startups/

The list also features startups from other sectors that are getting a lot of attention these days, such as Augmented Reality, Drones or Autonomous Driving.

We are very proud to have been included in this list by such a prestigious publication! It strengthens our resolve to create one of the best Geospatial products in the industry.

 


Working in a SaaS product department

Speed is the key to success in a startup like Geoblink: we have to attract new clients and keep current clients happy while standing out among emerging competitors. So we need a fast process for developing new features.

Agile methodology

We rely on Agile methodology, an alternative to traditional project management, to ensure that every task is developed quickly. We use Scrum as our way of applying it. Every company does this differently; at Geoblink we use this framework:

  • Sprint: our unit of time is called a sprint. A sprint lasts two weeks, and during every sprint we hold some of the meetings commonly seen in Agile methodology.
  • Daily stand up: the Core and Data teams meet for 15 minutes every morning so developers can quickly go over what they did yesterday and what they are doing today. Dependencies are also shared to avoid blocking anyone on the team.
  • Sprint status: in the middle of the sprint we meet to agree on which tasks will and will not be completed before the end of the sprint.
  • Sprint preparation: before the end of the sprint, the leads of each team (Core, Data and Product) meet up in order to define which tasks should be done during the next sprint.
  • Company Demo: because the product changes at such a high rate, it is very important to communicate those changes to the rest of the team.
  • Retrospective: at the end of every sprint the Tech team talks about what went well, what did not and what should be improved. This is very useful to avoid repeating failures or inefficiencies.

Read more

Proactive monitoring with Monit + SendGrid + Slack

Monitoring resources is essential for visibility into the health of your system. There are many software solutions out there and one of my favourites is Nagios, but it requires an investment of time and knowledge.
When it comes to finding an agile and flexible solution, I would choose Monit. Its configuration syntax is easy and it does not require a complex setup to get running.

What is Monit?

Monit is a small Open Source utility for managing and monitoring Unix systems.

What can Monit do?

When it detects a problem it can send you alerts (as most solutions do). But this is not the most important thing: Monit can act when an error occurs, restarting services, executing custom scripts and so on. This makes Monit a proactive monitoring tool, which is why it has always been among my favourite tools.

With Monit you get out of the box:

  • Automatic email alerts at event triggers
  • Automatic process maintenance
  • Capability to act on out-of-bounds values for CPU, RAM, storage and more
  • Monitoring of running services, and the ability to start, kill or restart them
  • Web and CLI interfaces for status monitoring
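The "detect, alert and act" idea behind these features can be sketched in a few lines of Python. This is neither Monit's configuration syntax nor its API, just an illustration of proactive monitoring; the `Check` class, the fake `service` dictionary and the alert list are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    name: str
    is_healthy: Callable[[], bool]  # probe: returns True when all is well
    recover: Callable[[], None]     # action to take when the probe fails


def run_checks(checks, alert):
    """Probe every check once; on failure, alert and attempt recovery."""
    failed = []
    for check in checks:
        if not check.is_healthy():
            alert(f"{check.name} is unhealthy, attempting recovery")
            check.recover()
            failed.append(check.name)
    return failed


# Hypothetical service state, standing in for a real process.
service = {"running": False}
alerts = []

checks = [
    Check(
        name="web",
        is_healthy=lambda: service["running"],
        recover=lambda: service.update(running=True),
    )
]

failed = run_checks(checks, alerts.append)
```

The first pass finds the service down, records an alert and "restarts" it; a second pass would find nothing to fix. A tool like Monit runs this kind of cycle for you on a schedule.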

This post does not aim to cover everything that can be done with Monit. The official documentation is thorough and there are numerous sites that expand on it, so I will focus on how we use it here at Geoblink.

Read more

React Native or going mobile without knowing Swift

It’s no secret that we love Hackathons here at Geoblink. During the 9 months that I’ve been part of the team, I’ve had the chance to participate in two of them. For me, the main advantage of having this kind of event in-house is that I get to try new technologies, frameworks or tools that I’ve been wanting to try for a while. That was the case in the last one we did, when a coworker and I had the chance to test React Native, Facebook’s library based on React that wants to “bring modern web techniques to mobile”.

This isn’t supposed to be an in-depth tutorial on how to build an app using React Native, but a brief explanation of how this library could help us (as well as you and your team) build robust mobile applications when your stack is purely based on web technologies, JavaScript in particular.

Read more