tall eye

the higher the point of view, the more you can see

The 4 basic flows of HTTP Caching

Every web developer has probably used a cache at some point in their web apps or APIs to avoid redundant data traffic, network bottlenecks, load spikes on the server, or simply long network latencies. The concept of caching is usually well understood and easy to apply in practice thanks to open source tools. However, building a good cache strategy, i.e. a strategy that defines what can be cached, how long something can be cached, and what policy to follow once a resource is stale, is a hard and incremental process that needs:

  1. knowledge of how your resources are consumed by users;
  2. understanding of how HTTP caching protocol works (HTTP headers everywhere);
  3. patience to solve problems when tools don’t honor the protocol (believe me, this is very common)

The first one is up to you, obviously. The third one is an inherent issue of every computer system, and you should be used to that too. Time and experience will help you deal with this burden, but there are always mailing lists and web searches to save the day.

This post is an illustrative contribution to item 2 and presents the four basic flows of HTTP caching that can be implicitly extracted from the RFC. These flows show how clients, servers and caching software behave depending on the state of the requested resource and its location in the topology. And by “basic” I mean that they can vary depending on your cache strategy, so use them as a starting point. Let’s go!

Cache Miss

Cache Hit

Interesting fact: this animation’s length is half of the “Cache Miss” one. :-)

Cache Revalidation (Condition False)

Cache Revalidation (Condition True)


If you are waiting for a detailed explanation of each flow, I’m sorry, I won’t do that. It will be much more interesting to figure them out by yourself. The only thing I will mention is that HTTP headers are the elements that give you this control over the resources.
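To make that concrete, here is a rough sketch of the headers at play, using curl against a hypothetical URL; the header values are illustrative and not taken from any real server.

    # Cache miss: the first request reaches the origin, which answers with the body
    # plus the headers a cache needs in order to store and later revalidate the response.
    curl -sI http://example.org/articles/42
    #   HTTP/1.1 200 OK
    #   Cache-Control: public, max-age=300
    #   ETag: "abc123"
    #   Last-Modified: Tue, 27 Aug 2013 10:00:00 GMT

    # Cache hit: while the response is fresh (max-age not expired), a conforming cache
    # answers by itself and the request never reaches the origin.

    # Cache revalidation: once the stored response is stale, the cache sends a
    # conditional request carrying the validators it saved.
    curl -sI http://example.org/articles/42 \
      -H 'If-None-Match: "abc123"' \
      -H 'If-Modified-Since: Tue, 27 Aug 2013 10:00:00 GMT'
    # Condition false (the stored copy still matches): the origin answers 304 Not Modified
    # with no body. Condition true (the resource changed): a full 200 OK comes back with
    # the new representation and new validators.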

This content was included in a talk I presented at Rubyconf Brazil 2013 about HTTP Caching (intro and good practices). The slides are available on Slideshare. You can also access the four basic flows as a tweet list.

Filed under http cache caching rubyconfbr flow vine

An easy way to build an autocomplete search for API endpoint docs

There are some APIs that we use often, and having easy and fast access to them can save us from repetitive tasks. You know, as software engineers, we love to automate micro tasks to save those precious seconds of our lives :-).

The solution described in this post is simple, but uses a lot of tools and services. If you have patience, the final result is very useful. Here’s the list of things we use:

  • APIfy to scrape the docs page (Scrapify is used by APIfy)
  • XCPath Bookmarklet so we don’t have to learn XPath again
  • curl to retrieve the json version of the API docs
  • jq to scrape this json and generate XML \o/
  • grep to filter based on our search query
  • Alfred workflows to implement autocomplete (Powerpack needed)

There are some well-known tools in this list, and maybe some new to you. My recommendation is to pay attention to the real value of APIfy and jq. These are really useful tools that can be applied to a lot of other tasks. Of course, the other ones are interesting as well.

My wish was to be able to trigger a search in Alfred that shows me a list of Twitter REST API endpoints and, when I hit enter, opens the endpoint documentation in the browser.

The result was:

#productivity #win

Getting the data we need

APIfy is a really great tool to easily scrape an HTML page and expose the relevant content as a json representation. The website has very descriptive tutorials that show how to create an API, and I recommend them. But the process it uses is very simple:

For each field your API returns:

  • Use the XCPath Bookmarklet to get the XPath or CSS selector of the element value you want;

[screenshot: the XCPath Bookmarklet highlighting the selected element]

  • Do a mapping between the field name and that value in the APIfy editor;

[screenshot: mapping fields in the APIfy editor]

You can check the brand new API created to expose all Twitter API endpoints as json at the TwitterRESTAPIv1_1 APIfy page. And of course, check the json representation returned by the API.

It’s very very useful. Make sure to check other APIs and how they are done, or try the Scrapify gem used to build APIfy.

Converting and filtering the data

Now that we have a json representation with each API endpoint, its description and a link to the docs, we can manipulate this json into the format needed for the next step in our flow.

jq, as its website says, is a lightweight and flexible command-line JSON processor. It’s a mandatory tool if you deal with a lot of json APIs, since it can transform raw API responses into beautified, colorful representations and also lets you filter, process and transform the data the way you want. There’s a very good tutorial on the website.
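Just to give a feel for it before we get to our command, here is a tiny hedged example; the URL and field name are made up for illustration.

    # Pretty-print a json response, then extract one field from every element of an array.
    curl -s 'https://api.example.org/items' | jq '.'
    curl -s 'https://api.example.org/items' | jq -r '.[] | .name'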

In our case, we need to convert the json to XML… XML??? Why?!?

Well, this is the format chosen by Alfred to show autocomplete results. Basically, anytime a user triggers the workflow, a script is executed with the query, and the response should be an XML document containing the items that match the query plus other metadata, such as which URL should be opened if the user chooses an item.

Let’s show the jq command we use:
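Here is a sketch of what such a command can look like; the APIfy URL and the field names (resource, description and link) are placeholders rather than the exact original values:

    # Sketch: wrap the items in the root element Alfred expects, build one <item> per
    # endpoint with jq string interpolation, and filter the resulting lines with grep.
    echo '<items>'
    curl -s 'https://apify.example.org/twitterrestapiv1_1.json' \
      | jq -r '.[] | "<item uid=\"\(.resource)\" arg=\"\(.link)\" valid=\"yes\"><title>\(.resource)</title><subtitle>\(.description)</subtitle></item>"' \
      | grep -i 'search'   # inside the workflow, the literal query comes from what the user typed
    echo '</items>'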

And now let’s break it into pieces to understand what it does.

Surrounding the command are two echo commands that add the root XML element Alfred needs as input. The second command is the request to the API we created, which is piped to jq.

The jq filter iterates over all entries in the response and does string interpolation, placing the endpoint resource and the link to it into XML element values and attributes. It’s important to note that jq isn’t just for converting json to XML; it has a very powerful query language that enables you to do practically anything you want with your json. Check their tutorial.

The XML generated by jq is piped to a grep command that filters for “search” using a regular expression. The final result is the following XML (already filtered down to the endpoints that match the “search” keyword):
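Something along these lines (the items and URLs are illustrative; the element and attribute names follow Alfred’s Script Filter format):

    <items>
      <item uid="GET search/tweets" arg="https://dev.twitter.com/docs/api/1.1/get/search/tweets" valid="yes">
        <title>GET search/tweets</title>
        <subtitle>Returns a collection of relevant Tweets matching a specified query</subtitle>
      </item>
      <item uid="GET saved_searches/list" arg="https://dev.twitter.com/docs/api/1.1/get/saved_searches/list" valid="yes">
        <title>GET saved_searches/list</title>
        <subtitle>Returns the authenticated user's saved search queries</subtitle>
      </item>
    </items>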

This is the input Alfred needs to show the items in the autocomplete, as we saw at the beginning of this post. After testing the command in the terminal, we need to use a Script Filter module in Alfred Workflows to run that code when the keyword is triggered:

Note that instead of making the request every time, I just saved the json on my computer, since I’m not expecting the docs to change that frequently, and also because I don’t want to abuse the APIfy service.
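In practice that just means fetching the json once and pointing the pipeline at the local file; the URL and path below are placeholders.

    # Download the docs json once (placeholder URL and path)...
    curl -s 'https://apify.example.org/twitterrestapiv1_1.json' > ~/twitter-api-docs.json
    # ...and in the command above, replace the curl call with:
    #   cat ~/twitter-api-docs.json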

The final version of the workflow will look like this:

That’s it! Now we have everything working:

[screenshot: the Alfred workflow in action]

This combination of tools can help you build a similar search for your most used APIs. And of course there are some minor improvements that could be made to the command shown above, but the beauty of this is what we could accomplish in a short period of time thanks to these awesome tools.

Filed under alfred jq xpath json grep curl twitter api autocomplete search productivity

Open Friday Hacks of March, 2013

Post written by me at Engineering blog.

abril-engineering:

by Luis Cipriani

The first Friday of each month at Abril Mídia Digital is Open Friday, a day the teams have to turn ideas into real projects, study and practice new technologies, hack internal products and integrate with other teams. It’s a day without restrictions of scope: anyone can do anything, but all knowledge must be shared with co-workers or people outside the company.

Open Friday has been happening since 2010, and several projects from it are now used in production, were open sourced, or were presented at conferences. So far, we are having a very good experience with the results from these days.

From now on we hope to share the results from it each month. On the Monday morning right after Open Friday, the teams gather in a room to share the initiatives they created. So here are some of the 21 projects done on February 1st that were presented today to all teams:

  • will.js: a framework to create web components and asset on-demand loading;
  • ralio: a command line interface for Rally management software;
  • Generic API: a generic resource domain to temporarily fulfill some very specific Platform requirements;
  • A recommendation system for IBA;
  • Fosformol: A Scrum Retrospective organizer;
  • Several performance tuning efforts and benchmarks (Lua + nginx, Hot Engine);
  • A POC of a responsive design interface for vejasp.abril.com.br;
  • A parser for Semantic Search;
  • A mobile app to ease project management with Rally;
  • A POC for image search with Lire;
  • Improvements in Ruby course slides.

[photo: the lightning talks]

Above is a picture of today’s lightning talks.

You can see that some projects are already available as open source, so feel free to contribute. If you are interested in any other project whose code wasn’t shared publicly, please leave a comment and we can start a conversation with the project owner.

Stay tuned to this blog to see next month’s projects.

Filed under openfriday hackday hacks abril tech

Spreading the knowledge with our experiences with REST

Post written by me at Abril engineering blog

abril-engineering:

by Luis Cipriani

For three years we have been developing what we believe is the Content Management System that can fit the enormous list of requirements Abril Mídia needs to fulfill to achieve goals such as creating value from the content produced by the publishing houses, speeding up the process of online publishing, and enabling fast adoption of user interaction solutions in our Internet products.

This high diversity of requirements (coming from more than 60 publishing houses) has a relevant impact on the complexity of the system architecture. Previous experiences showed us that monolithic or short-term solutions were not a good idea; the only plausible path was to change our mindset toward a long-term solution and dive into a process of finding a more adequate architecture.

We came up with a solution that relies heavily on REST principles and constraints, started the implementation, and continually reviewed the strategy.

In August 2012 we gave a talk at the QConSP conference, in Sao Paulo, to share the implemented solution and the lessons learned. The talk also presents a complete review of the REST constraints and how we applied them; I believe case study presentations are the best way to learn complex or commonly misunderstood concepts at conferences.

The talk got very good feedback from the conference attendees; the room became too small for the crowd attending the session, and since a lot of people weren’t able to see the talk, the conference organizers asked us to do another session later the same day. That session was filmed by the folks at InfoQ Brazil (sorry, English readers/listeners, but this one is in Portuguese).

Shame on me for taking so long to write a post about it here, but the good news is that Luiz Rocha and I sent a chapter proposal (now in English, and with more details about the solution) to the next REST book (REST: Advanced Research Topics and Practical Applications), published by Springer. At the moment I am writing this post, the chapter is conditionally accepted (we just need to fix some simple review points). \o/ The estimated publication date is Summer 2013.

Follow us for updates and posts showing more and more how we do our stuff at Abril and please, share your comments.

Explaining Semantic Web

Post written by me at Abril Engineering blog.

abril-engineering:

A few weeks ago I gave an introduction to the Semantic Web at the Sao Paulo Ruby User Group (GURU, in Portuguese). Since Semantic Web is a very broad term that entails tons of technologies and tools, I decided to focus on explaining the single main reason that makes companies apply it in their projects:

Data Integration

We, as software engineers and product managers, face a lot of problems, especially in big companies, when we need to integrate data coming from several sources. Anytime I needed to do this, I caught myself grumbling: “Why don’t we have just one universal metadata model for our data? How did we let this happen?”

The big triumph of the Semantic Web set of technologies, specifications and tools is to enable you, as the owner of highly variable data and metadata, to organize it in a way that lets you find any information and derive knowledge as easily as with a SQL-like query. This is possible because we start representing the data in the simplest way possible: a triple (subject -> predicate -> object).

For example:

  • Abril_Engineering_blog  >  is_owned_by  >  Abril
  • Abril_Engineering_blog  >  is_hosted_by  >  Tumblr
  • Luis_Cipriani  >  work_at  >  Abril
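To make the “SQL-like query” comparison concrete, here is a hedged sketch of a SPARQL query over triples like the ones above, sent with curl; the endpoint URL, prefix and property names are made up for illustration.

    # Hypothetical endpoint and vocabulary: "which blogs owned by Abril are hosted on Tumblr?"
    curl -s -G 'http://sparql.example.org/query' \
      -H 'Accept: application/sparql-results+json' \
      --data-urlencode 'query=
        PREFIX ex: <http://example.org/ontology#>
        SELECT ?blog WHERE {
          ?blog ex:is_owned_by  ex:Abril .
          ?blog ex:is_hosted_by ex:Tumblr .
        }'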

By establishing relationships across all your data based on controlled and known metadata (better known as an ontology), you create the most flexible format for representing data, and from that point you can derive anything the quality of your data allows.

At Abril we have a perfect environment to use this strategy for data integration, since we produce a lot of content from which we are able to extract structure, such as people, events, venues, articles, user comments, etc. Some projects are under way and others are about to start. We hope to be talking about them soon.

Meanwhile, check the presentation slides below; they have more detail about the technologies and cases:

Filed under semantic web data integration abril bigdata

Learning with Tech Cine

Post written by me at Abril Engineering blog:

abril-engineering:

by Luis Cipriani

As software developers, we are always looking for new technologies and teaching ourselves the latest techniques and ways to improve the quality and productivity of our work. Thanks to the Internet and the contributions of millions, it’s very easy to find good content freely available, or even greater content at affordable prices.

But we often do this on our own, without the time or the people to discuss what we learned and to evaluate how we can apply the technique in a practical way.

Given this, we decided to spread the knowledge in our office, and the result is Tech Cine!

Every Thursday morning, we join all the developers together in the middle of the office to watch a tech video (no more than 50 minutes) and then we reserve 15 to 30 minutes after the video to discuss:

  • the main/relevant points of interest
  • whether the technique is worth the investment
  • whether we have similar cases inside the company
  • how we could apply that technology in our projects

The first two Tech Cine discussions were very productive, with good participation from the developers. We started with episode 1 of the Clean Code video series, and the second was a video from the 2012 Strata Conference in Santa Clara.

Another thing we did that proved to be interesting: since we can’t send every developer to international conferences, we bought the whole video collection from the conference website, and now everybody has access to the conference’s rich content! \o/

Filed under office learning knowledge