We Are the Best in Data Services

We bring the right people together to challenge established thinking and drive transformation.

Telecommunications
Network Performance, IPTV, LCD
Manufacturing
Electronics, Ship Building
Transportation
Scheduling, Performance

SERVICES WE PROVIDE

We are Data Service Experts

From data integration to data science, we offer every data-related service you may need.

Data and Application Integration

From data hubs and ESBs to application orchestration, we bring seamless harmony to enterprises.

Data Warehousing and BI

Data Warehousing, Data Lake, Data Analytics and Business Intelligence

Data Science and AI

Data Science, Machine Learning and Artificial Intelligence

Data Platforms

Our Products

We provide mature, proven data service platforms to suit your latest and most critical business needs. These products include Open Intelligent Data Platform, Customer 360 Analytics, Recommender System, and Enterprise Data Hub.

TESTIMONIALS

What Our Clients Say

Experts in the data service domain, with the right solutions at the right cost!

Williams Moore
Director, BI and Analytics

Brings enterprise data from different silos together and seamlessly links it into a 360-degree view of our whole business, something whose absence had been hindering us for a long time!

Our Blog

Latest News

Tech insights from the industry insiders

Data Lake, Data River and Data Droplets

You have probably already heard about data lakes. They used to be called data directories. As you would expect, data rivers end their “streams” in the lake. Here we go with data ponds, as described in the Hortonworks article “Connected Data Ponds: The Evolution of Data Lakes” (hortonworks.com).

Data ponds are subsets of a data lake that are kept separate for reasons of privacy (e.g., PII), governance, technology, or cost.

Data droplets are the basic elements. They describe information about the subject and its dimensions. Here you can read more about these ontologies.

Then there is the data swamp, a problem that tends to be more severe in larger organizations. The image below explains the differences:

[Image: the differences between a data lake and a data swamp. Source: DatAvail.com]

There are many reasons behind a data swamp. Here are a few:

  • No policy for metadata, definitions, or processes
  • No defined life cycle for the data in the lake
  • No stakeholder in the organization who owns the data
  • Missing documentation on how the data is prepared and used
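To make these causes concrete, here is a minimal sketch of a catalog entry that records each missing piece and flags the swamp risks that apply. The `DatasetEntry` fields and the `swamp_risks` helper are hypothetical illustrations, not the schema or API of any real catalog tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetEntry:
    """Hypothetical catalog entry for one dataset in the lake."""
    name: str
    description: str = ""                 # metadata / definition of the data
    retention_days: Optional[int] = None  # life-cycle policy
    owner: Optional[str] = None           # accountable stakeholder
    docs_url: Optional[str] = None        # how the data is prepared and used


def swamp_risks(entry: DatasetEntry) -> List[str]:
    """Return the swamp risk factors that apply to `entry`."""
    risks = []
    if not entry.description:
        risks.append("no metadata definition")
    if entry.retention_days is None:
        risks.append("no life-cycle policy")
    if not entry.owner:
        risks.append("no stakeholder")
    if not entry.docs_url:
        risks.append("no preparation/usage documentation")
    return risks


print(swamp_risks(DatasetEntry("clickstream_raw")))  # all four risks apply
```

Running a check like this over every dataset gives a rough list of where the lake is drifting toward a swamp.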

Bigger companies have started to look for solutions to this issue. Netflix's Metacat helps make sense of metadata across different data services. If you would rather keep it simple with a user interface, the CKAN data portal can help you manage and govern your data.

User-based Collaborative Filtering

User-Based Collaborative Filtering (UB-CF)

Imagine that we want to recommend a movie to our friend Stanley. We could assume that similar people have similar tastes. Suppose that Stanley and I have seen the same movies and rated them almost identically, but Stanley hasn’t seen ‘The Godfather: Part II’ and I have. If I loved that movie, it seems logical to think that he will too. With that, we have created an artificial rating based on our similarity.

UB-CF uses that logic: it recommends items by finding users who are similar to the active user (the one we are trying to recommend a movie to). A specific application of this is the user-based nearest-neighbor algorithm, which involves two tasks:

1. Find the k nearest neighbors (KNN) of user a, using a similarity function w to measure how similar each pair of users is.

2. Predict the rating that user a would give to every item the k neighbors have consumed but a has not, then look for the item j with the best predicted rating.
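The two steps do not fix a particular w or prediction rule. One common choice, assumed here and not the only option, is a mean-centered (Pearson-style) correlation over the items both users rated, combined with a mean-centered weighted average for step 2. Here I_u is the set of items user u has rated, \bar{r}_u is u's average rating over all of their ratings (some variants average only over co-rated items), and N_k(a) is the set of k nearest neighbors of a:

```latex
w(a,u) = \frac{\sum_{i \in I_a \cap I_u} (r_{a,i} - \bar{r}_a)\,(r_{u,i} - \bar{r}_u)}
              {\sqrt{\sum_{i \in I_a \cap I_u} (r_{a,i} - \bar{r}_a)^2}\;
               \sqrt{\sum_{i \in I_a \cap I_u} (r_{u,i} - \bar{r}_u)^2}}

p_{a,j} = \bar{r}_a + \frac{\sum_{u \in N_k(a),\, j \in I_u} w(a,u)\,(r_{u,j} - \bar{r}_u)}
                           {\sum_{u \in N_k(a),\, j \in I_u} \lvert w(a,u) \rvert}

j^{*} = \arg\max_{j \notin I_a} \; p_{a,j}
```

Subtracting each user's mean corrects for people who rate everything high or everything low. Dividing by the sum of |w| keeps the prediction on the rating scale.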

In other words, we build a User-Item Matrix and predict ratings for the items the active user has not seen, based on similar users. This technique is memory-based: it works directly on the stored ratings instead of training a model first.
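The two steps can be sketched on a tiny User-Item Matrix stored as nested dicts. This is a minimal illustration under stated assumptions: the ratings, the user "alice", and the function names are hypothetical, and the similarity is a mean-centered (Pearson-style) correlation, one common choice among several (cosine similarity is another).

```python
from math import sqrt

# Hypothetical toy User-Item Matrix: user -> {movie: rating on a 1-5 scale}.
# Missing keys are the blanks we want to fill in.
ratings = {
    "me": {"The Godfather": 5, "Casablanca": 4, "Alien": 2,
           "The Godfather: Part II": 5},
    "stanley": {"The Godfather": 5, "Casablanca": 4, "Alien": 2},
    "alice": {"The Godfather": 1, "Casablanca": 2, "Alien": 5,
              "The Godfather: Part II": 2},
}


def mean(user):
    """Average of all ratings given by `user`."""
    r = ratings[user]
    return sum(r.values()) / len(r)


def similarity(a, b):
    """Mean-centered correlation over co-rated items; 0.0 if undefined."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    ma, mb = mean(a), mean(b)
    num = sum((ratings[a][i] - ma) * (ratings[b][i] - mb) for i in common)
    da = sqrt(sum((ratings[a][i] - ma) ** 2 for i in common))
    db = sqrt(sum((ratings[b][i] - mb) ** 2 for i in common))
    return num / (da * db) if da and db else 0.0


def k_nearest(target, k):
    """Step 1: the k users most similar to `target`, as (user, weight) pairs."""
    scored = sorted(((similarity(target, u), u) for u in ratings if u != target),
                    reverse=True)
    return [(u, w) for w, u in scored[:k]]


def predict(target, item, neighbors):
    """Mean-centered weighted average of the neighbors' ratings for `item`."""
    num = den = 0.0
    for u, w in neighbors:
        if item in ratings[u]:
            num += w * (ratings[u][item] - mean(u))
            den += abs(w)
    return mean(target) + num / den if den else None


def recommend(target, k=2):
    """Step 2: predict unseen items rated by the neighbors; return the best one."""
    neighbors = k_nearest(target, k)
    unseen = {i for u, _ in neighbors for i in ratings[u]} - set(ratings[target])
    predictions = {}
    for item in unseen:
        p = predict(target, item, neighbors)
        if p is not None:
            predictions[item] = p
    if not predictions:
        return None
    return max(predictions.items(), key=lambda kv: kv[1])


print(recommend("stanley"))  # best unseen item and its predicted rating
```

For Stanley, this recommends ‘The Godfather: Part II’ with a predicted rating of about 4.4, matching the intuition from the Stanley example. The strongly dissimilar user "alice" disliked the movie, and her negative weight pushes the prediction up instead of down.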

[Image: Filling the blanks in the User-Item Matrix]
Pros:
  • Easy to implement.
  • Context independent: it relies only on ratings, not on knowledge of the items' content.
  • Often more accurate than other techniques, such as content-based filtering.
Cons:
  • Sparsity: only a small percentage of users rate items, so most of the User-Item Matrix is empty.
  • Scalability: considering more neighbors k (up to a certain threshold) should improve the predictions. However, the more users there are in the system, the more expensive it becomes to find the k nearest neighbors.
  • Cold start: new users have little or no information with which to compare them to other users.
  • New items: like new users, new items lack the ratings needed to rank them reliably.
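To put a number on the sparsity problem, here is a minimal sketch that measures the fraction of empty cells in a User-Item Matrix stored as nested dicts. The toy data and the `sparsity` function are hypothetical. The "newcomer" user, who has no ratings yet, is also the cold-start case.

```python
# Hypothetical toy User-Item Matrix: user -> {item: rating}.
ratings = {
    "me": {"The Godfather": 5, "Casablanca": 4, "Alien": 2,
           "The Godfather: Part II": 5},
    "stanley": {"The Godfather": 5, "Casablanca": 4, "Alien": 2},
    "alice": {"The Godfather": 1, "Casablanca": 2, "Alien": 5,
              "The Godfather: Part II": 2},
    "newcomer": {},  # cold start: nothing to compare against yet
}


def sparsity(matrix):
    """Fraction of (user, item) cells that have no rating."""
    items = set().union(*(r.keys() for r in matrix.values()))
    cells = len(matrix) * len(items)
    rated = sum(len(r) for r in matrix.values())
    return 1 - rated / cells if cells else 0.0


print(sparsity(ratings))  # 5 of 16 cells are empty -> 0.3125
```

In a real catalog, most users rate only a tiny share of the items, so this fraction is typically much closer to 1. That is why neighbors often share very few co-rated items.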

Trusted by Businesses Worldwide. Try Us Today!

Looking for a Professional Approach & Quality Services?