## Sunday, November 22, 2015

### Elsevier: Machine Learning Content Discoverability

## Tuesday, November 17, 2015

### Something cool made with our API

Very cool

blog.sciencedirect.com/posts/reach-for-the-stars-how-one-developer-uses-sciencedirect-apis-to-achieve-more-for-nasa

For 20 years, the Smithsonian/NASA Astrophysics Data System (ADS) has kept professional astronomers worldwide up to date via its digital library of 12 million records, which provides links to ScienceDirect and other platforms for full-text retrieval. The ADS maintains relationships with all major publishers and offers users access to four million full-text article links, some of which originate in 40 full-text Elsevier journals on ScienceDirect.

To increase the visibility of their subscribed full text, and to encourage linking to it (especially articles written by NASA researchers), NASA had the idea of adding thumbnails of the graphics appearing within an article to the abstract view of a publication. To do this, they turned to the ScienceDirect Object Retrieval and Object Search APIs to mine the images, and then linked them to the corresponding articles on ScienceDirect. So far, the ADS has been able to implement this feature for 32,000 publications.

#### A view of the ADS abstract page

#### A view of the ADS graphics page with thumbnails linking to the full text of the article

The redesigned ADS remains in beta release and can be easily accessed, while more information about the ADS in general is also available.

#### Example of ScienceDirect article page with images

ScienceDirect APIs are designed to help developers retrieve and integrate full-text content from publications on ScienceDirect into their websites or applications. Visit the ScienceDirect API page to learn more, watch videos and get started.

“My experience with the ScienceDirect API was exemplary. A well-designed API with a very efficient and friendly support team to back it up!”

- Edwin Henneken, IT Specialist for the Smithsonian/NASA Astrophysics Data System, employed at the Smithsonian Astrophysical Observatory in Cambridge, Massachusetts.


## Friday, November 13, 2015

### Boson Higgs

According to Wikipedia: "**On 4 July 2012,** the discovery of a new particle with a mass between 125 and 127 GeV was announced; physicists suspected that it was the Higgs boson."

However, Scholar returns results **from 1960 and 1990, which is 22 years** before the scientific discovery. One result is from Elsevier.

ScienceDirect returns fresher and more relevant results.

## Thursday, November 12, 2015

### Instantaneous Recommendation: real time suggestions for your Academic Library

One of my favorite features shipped during the last round is a form of instantaneous recommendation. This feature suggests relevant new papers in real time, as soon as my library is updated.

So suppose that I add a few papers about deep learning to my library and that this is the first time I have papers about this research topic in my library.

The suggestions are immediately updated, and I see papers about Deep Neural Networks for speech recognition, Convolutional Networks, and LVCSR (large-vocabulary continuous speech recognition),

as well as relevant papers published by Yann LeCun.

I believe that this feature is useful for exploring a subject you are not familiar with, and for making sure that your next paper has a solid "Related Work" section where the most important papers for your research activity are mentioned.


## Wednesday, November 11, 2015

### Stats is bigdata

**Feature: Stats**

If you are a published author, Mendeley’s “Stats” feature provides you with a unique, aggregated view of how your published articles are performing in terms of citations, Mendeley sharing, and (depending on who your article was published with) downloads/views. You can also drill down into each of your published articles to see the statistics on each item you have published. This powerful tool allows you to see how your work is being used by the scientific community, using data from a number of sources including Mendeley, Scopus, NewsFlo, and ScienceDirect.

Stats gives you an aggregated view of the performance of your publications, including metrics such as citations, Mendeley readership and group activity, the academic discipline and status of your readers, as well as any mentions in the news media – helping you to understand and evaluate the impact of your published work. With our ScienceDirect integration, you can find information on views (PDF and HTML downloads), the search terms used to reach your article, the geographic distribution of your readership, and links to various source data providers.

Please keep in mind that Stats are only available for published authors whose works are listed in the Scopus citation database. To find out if your articles are included, just visit www.mendeley.com/stats and begin the process of claiming your Scopus author profile. If they are not included yet, please be patient as we keep working on this feature.

## Tuesday, November 10, 2015

### Satisfying the exploratory search needs: poster query {dyscalculia}

**{dyscalculia}** is a severe difficulty in making arithmetical calculations, resulting from a brain disorder. This is the scientific term for a cognitive problem associated with 3%-6% of the world population. Therefore, many people are interested in better understanding the topic.

Google Scholar returns Elsevier content from __1992 and 1985__, and Wiley content from __1996__.

ScienceDirect finds fresh Elsevier content for **{dyscalculia}**, including books and articles. All the results are from 2015 and 2016 (pre-print).

## Monday, November 9, 2015

### New research features on Mendeley.com - Recommends

(posted on http://blog.mendeley.com/academic-features/new-research-features-on-mendeley-com/)

Mendeley’s Data Science team have been working to crack one of the hardest “big data” problems of all: How to recommend interesting articles that users might want to read? For the past six months they have been working to integrate 6 large data sets from 3 different platforms to create the basis for a recommender system. These data sets often contain tens of millions of records each, and represent different dimensions which can all be applied to the problem of understanding what a user is looking for, and providing them with a high-quality set of recommendations.

With the (quite literally) massive base data set in place, the team then tested over 50 different recommender algorithms against a "gold standard" (which was itself revised five times for the best possible accuracy). Over 500 experiments have been run to tweak our algorithms so they can deliver the best possible recommendations. The basic principle is to combine our vast knowledge of what users store in their Mendeley libraries with the richness of the citation graph (courtesy of Scopus) and a predictive model that can be validated against what users actually did. The end result is a tailored set of recommendations for each user who has a minimum threshold of documents in their library.

We are happy to report that two successive rounds of qualitative user testing have indicated that 80% of our test users rated the quality of their tailored recommendations as “Very good” (43%) or “Good” (37%), which gives us confidence that the vast majority of Mendeley reference management users will receive high-quality recommendations that will save them time in discovering important papers they should be reading.

For those who are new to Mendeley, we have made it easy for you to get started and import your documents – simply drag-and-drop your papers, and get high-quality recommendations.

On our new “Suggest” page you’ll be getting improved article suggestions, driven by four different recommendation algorithms to support different scientific needs:

- *Popular in your discipline* – Shows you the seminal works, for all time, in your field
- *Trending in your discipline* – Shows you what articles are popular right now in your discipline
- *Based on the last document in your library* – Gives you articles similar to the one you just added
- *Based on all the documents in your library* – Provides the most tailored set of recommended articles by comparing the contents of your library with the contents of all other users on Mendeley.

Suggestions you receive will be frequently recalculated and tailored to you based on the contents of your library, making sure that there is always something new for you to discover. This is no insignificant task, as we are calculating over 25 million new recommendations with each iteration. This means that even if you don't add new documents to your library, you will still get new recommendations based on the activity of other Mendeley users with libraries similar to yours.

To find your recommended articles, check out www.mendeley.com/suggest and begin discovering new papers in your field!

## Sunday, November 8, 2015

### Academic Search and Relevance: basic normalization for matching

One more post about Academic Search and Relevance. This time around it is back to basics: there is little you can do for relevance if you do not match the article first. To do so, you need to assume that users will make mistakes while they type, so you need to be proactive and correct those mistakes on their behalf. Let's see a few examples.


Here the mistake is made on purpose to simulate a user with a different keyboard. A search engine should automatically support this normalization: Google does not, while ScienceDirect does.

Here the idea is to search for a specific item related to prostate cancer, named {ARN-509}. By mistake, it is typed as {ARN -509}, with a stray space, and no match is given.

ScienceDirect simply matches it regardless of the mistake,

while Google matches only the exact term. In this case, it is not able to correct the user's mistake automatically.
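
As a sketch of the kind of proactive query normalization described above (an illustrative example only, not ScienceDirect's actual pipeline), a search front-end might canonicalize queries before matching:

```python
import re
import unicodedata

def normalize_query(q: str) -> str:
    """Canonicalize a raw user query before matching.

    - NFKC Unicode normalization folds keyboard variants
      (full-width characters, ligatures) into canonical forms.
    - Runs of whitespace collapse into a single space.
    - Stray spaces around hyphens are removed, so the mistyped
      {ARN -509} matches the intended {ARN-509}.
    """
    q = unicodedata.normalize("NFKC", q)
    q = re.sub(r"\s+", " ", q).strip()
    q = re.sub(r"\s*-\s*", "-", q)
    return q.lower()

print(normalize_query("ARN -509"))    # arn-509
print(normalize_query("ＡＲＮ-509"))   # arn-509 (full-width keyboard input)
```

Real engines combine many more normalizations (diacritic folding, spell correction, language-specific tokenization), but the principle is the same: fix the user's mistake before matching, rather than requiring the exact term.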

## Saturday, November 7, 2015

### TOC for my new book: A collection of Advanced Data Science and Machine Learning Interview Questions Solved in Python and Spark

Table of Contents

1. Why is Cross Validation important? 11

Solution 11

Code 11

2. Why is Grid Search important? 12

Solution 12

Code 12

3. What are the new Spark DataFrame and the Spark Pipeline? And how we can use the new ML library for Grid Search 13

Solution 13

Code 14

4. How to deal with categorical features? And what is one-hot-encoding? 16

Solution 16

Code 17

5. What are generalized linear models and what is an R Formula? 18

Solution 18

Code 18

6. What are the Decision Trees? 19

Solution 19

Code 21

7. What are the Ensembles? 22

Solution 22

8. What is a Gradient Boosted Tree? 22

Solution 22

9. What is a Gradient Boosted Trees Regressor? 23

Solution 23

Code 23

10. Gradient Boosted Trees Classification 24

Solution 24

Code 25

11. What is a Random Forest? 26

Solution 26

Code 26

12. What is an AdaBoost classification algorithm? 27

Solution 27

13. What is a recommender system? 28

Solution 28

14. What is a collaborative filtering ALS algorithm? 29

Solution 29

Code 30

15. What is the DBSCAN clustering algorithm? 32

Solution 32

Code 32

16. What is a Streaming K-Means? 33

Solution 33

Code 34

17. What is Canopy Clustering? 34

Solution 34

18. What is Bisecting K-Means? 35

Solution 35

19. What is the PCA Dimensional reduction technique? 36

Solution 36

Code 37

20. What is the SVD Dimensional reduction technique? 38

Solution 38

Code 38

21. What is Latent Semantic Analysis (LSA)? 39

Solution 39

22. What is Parquet? 39

Solution 39

Code 39

23. What is the Isotonic Regression? 40

Solution 40

Code 40

24. What is LARS? 41

Solution 41

25. What is GLMNET? 42

Solution 42

26. What is SVM with soft margins? 43

Solution 43

27. What is the Expectation Maximization Clustering algorithm? 44

Solution 44

28. What is a Gaussian Mixture? 45

Solution 45

Code 45

29. What is the Latent Dirichlet Allocation topic model? 46

Solution 46

Code 47

30. What is the Associative Rule Learning? 48

Solution 48

31. What is FP-growth? 50

Solution 50

Code 50

32. How to use the GraphX Library? 50

Solution 50

33. What is PageRank? And how to compute it with GraphX 51

Solution 51

Code 52

Code 52

34. What is Power Iteration Clustering? 54

Solution 54

Code 54

35. What is a Perceptron? 55

Solution 55

36. What is an ANN (Artificial Neural Network)? 56

Solution 56

37. What are the activation functions? 57

Solution 57

38. How many types of Neural Networks are known? 58

39. How can you train a Neural Network? 59

Solution 59

40. What application have the ANNs? 59

Solution 59

41. Can you code a simple ANNs in python? 60

Solution 60

Code 60

42. What support has Spark for Neural Networks? 61

Solution 61

Code 62

43. What is Deep Learning? 63

Solution 63

44. What are autoencoders and stacked autoencoders? 68

Solution 68

45. What are convolutional neural networks? 69

Solution 69

46. What are Restricted Boltzmann Machines, Deep Belief Networks and Recurrent networks? 70

Solution 70

47. What is pre-training? 71

Solution 71

48. An example of Deep Learning with nolearn and Lasagne package 72

Solution 72

Code 73

Outcome 73

Code 74

49. Can you compute an embedding with Word2Vec? 75

Solution 75

Code 76

Code 77

50. What are Radial Basis Networks? 77

Solution 77

Code 78

51. What are Splines? 78

Solution 78

Code 78

52. What are Self-Organized-Maps (SOMs)? 78

Solution 78

Code 79

53. What is Conjugate Gradient? 79

Solution 79

54. What is exploitation-exploration? And what is the armed bandit method? 80

Solution 80

55. What is Simulated Annealing? 81

Solution 81

Code 81

56. What is a Monte Carlo experiment? 81

Solution 81

Code 82

57. What is a Markov Chain? 83

Solution 83

58. What is Gibbs sampling? 83

Solution 83

Code 84

59. What is Locality Sensitive Hashing (LSH)? 84

Solution 84

Code 85

60. What is minHash? 85

Solution 85

Code 86

61. What are Bloom Filters? 86

Solution 86

Code 87

62. What is Count Min Sketches? 87

Solution 87

Code 87

63. How to build a news clustering system 88

Solution 88

64. What is A/B testing? 89

Solution 89

65. What is Natural Language Processing? 90

Solution 90

Code 90

Outcome 92

66. Where to go from here 92

Appendix A 95

67. Ultra-Quick introduction to Python 95

68. Ultra-Quick introduction to Probabilities 96

69. Ultra-Quick introduction to Matrices and Vectors 97

70. Ultra-Quick summary of metrics 98

Classification Metrics 98

Clustering Metrics 99

Scoring Metrics 99

Rank Correlation Metrics 99

Probability Metrics 100

Ranking Models 100

71. Comparison of different machine learning techniques 101

Linear regression 101

Logistic regression 101

Support Vector Machines 101

Clustering 102

Decision Trees, Random Forests, and GBTs 102

Associative Rules 102

Neural Networks and Deep Learning 103


### The art of news clustering: modern metrics for the Researchers

The team shipped another cool feature. Nowadays, the modern researcher is not limited to academic papers and the lab: break-through research is mentioned by news sources, and articles published in generalist magazines and newspapers talk about the progress made by science in all disciplines.

One key aspect is to have fast algorithms, based on machine learning and data analysis, for grouping related articles as soon as they are published. In this way, data science can help to infer the importance of each piece of information.

My group recently acquired Newsflo, an innovative company in London, and together we shipped an engine for clustering news articles that mention academic papers and research. This engine is progressively being shipped in all Elsevier's products. Here is the integration with myresearchdashboard.com


## Friday, November 6, 2015

### Search terms as an automatic way to annotate scientific articles

Another feature has been shipped by the team.

Search terms are an automatic way to annotate scientific articles. Here we show the aggregated (and anonymized) queries which were submitted by users to retrieve my article.

## Thursday, November 5, 2015

### How to build a news clustering system

(excerpt from my new book, question #65)

News clustering is a hard problem to solve. News articles typically arrive at our clustering engine in a continuous streaming fashion, so a plain vanilla batch approach is not feasible. For instance, the simple idea of using k-means cannot work, for two reasons. First, it is not possible to know the number of clusters a priori, because the topics are dynamically evolving. Second, the articles themselves are not available a priori. Therefore, more sophisticated strategies are required.

One initial idea is to split the data into mini-batches (perhaps processed with Spark Streaming) and to cluster the content of each mini-batch independently. Then, clusters from different epochs (i.e. mini-batches) can be chained together.

An additional intuition is to start with k seeds and then extend those initial k clusters whenever a new article arrives that is not similar enough to the existing groups. In this way, new clusters are dynamically created when needed. In one additional variant, we could re-cluster all the articles after a number of epochs, under the assumption that this will improve our target metric.
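
This "seed and extend" idea can be sketched in a few lines of Python. Bag-of-words vectors, cosine similarity, and the 0.3 threshold are all illustrative choices here, not the actual engine:

```python
from collections import Counter
import math

def tf_vector(text):
    """Bag-of-words term-frequency vector for one article."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class StreamingClusterer:
    """Incremental clustering: assign each arriving article to the
    closest existing cluster centroid, or open a new cluster when
    no centroid is similar enough (the 'seed and extend' idea)."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold
        self.centroids = []   # one term-frequency Counter per cluster
        self.members = []     # article ids per cluster

    def add(self, article_id, text):
        vec = tf_vector(text)
        best, best_sim = None, 0.0
        for i, c in enumerate(self.centroids):
            sim = cosine(vec, c)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is None or best_sim < self.threshold:
            self.centroids.append(vec)        # new dynamic cluster
            self.members.append([article_id])
            return len(self.centroids) - 1
        self.centroids[best].update(vec)      # fold article into centroid
        self.members[best].append(article_id)
        return best
```

Chaining clusters across mini-batches then reduces to matching the centroids of epoch t against the centroids of epoch t+1 with the same similarity function.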

In addition, we can have a look at the data and perhaps notice that many articles are near-duplicates. Hence, we could aim at reducing the computational complexity by applying pseudo-linear techniques such as minHash shingling.
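
The near-duplicate idea can be sketched as follows: represent each article by its word shingles, compress the shingle set into a MinHash signature, and estimate Jaccard similarity by comparing signatures. The hash count and shingle size below are arbitrary illustrative choices:

```python
import hashlib

def shingles(text, k=3):
    """Set of k-word shingles of an article."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def minhash_signature(sh, num_hashes=64):
    """MinHash signature: for each of num_hashes salted hash
    functions, keep the minimum hash value over all shingles."""
    return [
        min(int(hashlib.md5(f"{seed}:{s}".encode()).hexdigest(), 16)
            for s in sh)
        for seed in range(num_hashes)
    ]

def estimated_jaccard(sig_a, sig_b):
    """The fraction of agreeing signature positions estimates the
    Jaccard similarity of the two underlying shingle sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Because signatures are short and compared position by position, near-duplicate detection becomes pseudo-linear instead of quadratic in the number of articles.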

More sophisticated methods aim at ranking the articles by importance. This is an even harder problem, again because of the dynamic nature of the content and the absence of links, which would otherwise allow PageRank-type computations. If that is not possible, then a two-layer model can be considered, where the importance of a news article depends on the importance of the originating news source, which in turn depends on the importance of the emitted articles. It can be proven that this recurrent definition has a fixed-point solution [1].
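
The two-layer fixed point can be illustrated with a tiny power-iteration sketch. This is a simplified model, not the exact formulation of [1]: here a story is important if it is covered by important sources, and a source is important if it covers important stories.

```python
def rank_sources_and_stories(coverage, iterations=50):
    """Mutually reinforcing two-layer ranking (simplified sketch).

    coverage: dict mapping each source to the set of story ids
    it covers. Repeated alternation between the two layers, with
    normalization, converges to a fixed point.
    """
    sources = list(coverage)
    stories = sorted({st for sts in coverage.values() for st in sts})
    src = {s: 1.0 / len(sources) for s in sources}
    for _ in range(iterations):
        # story importance: total importance of the sources covering it
        story = {st: sum(src[s] for s in sources if st in coverage[s])
                 for st in stories}
        # source importance: mean importance of the stories it covers
        src = {s: sum(story[st] for st in coverage[s]) / len(coverage[s])
               for s in sources}
        norm = sum(src.values())
        src = {s: v / norm for s, v in src.items()}
    return src, story
```

On a toy input where three sources all cover story "A" but only one covers "B" and "C", the iteration ranks "A" as the most important story, as expected.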

Even more sophisticated methods can aim at extracting entities from the clusters; this is typically achieved by running topic model detection on the evolving mini-batches.

[1] G. M. Del Corso, A. Gulli, F. Romani. Ranking a Stream of News. In WWW '05: Proceedings of the 14th International Conference on World Wide Web.

## Wednesday, November 4, 2015

### Disrupting the Academic Research Arena with Recommenders

We live in a world where the **quantity of available information is hugely massive**. How many movies, songs, news articles, and apps are out there, and what is the best way to find the content relevant for every single user? **Search** is not *the only* solution. Search assumes that you are already aware of what you are looking for. Perhaps you *already* heard the latest song by Nicky Jam and you search for a few words, or you want to see the latest movie by Paolo Sorrentino and so you search for the title. However, the problem is that you need to know in advance what you are looking for and then explicitly submit a query to *pull* (retrieve) the content. What if there is some piece of information which is very relevant, but you are not aware of it? Search will not necessarily help.

To overcome this limitation, Netflix, Spotify, Google Play, Apple Genius, and Amazon use recommender-based technologies to suggest fresh and relevant information to users with no need to explicitly submit queries. You can watch your favourite movie, listen to your song, read your news articles, and discover new items to buy even if you are not aware in advance of what is relevant for you.

Surprisingly enough, recommenders are still not largely adopted by the research communities. How many new and fresh papers are relevant for your research discipline, and how long does it take to discover them? Traditionally, discovery is based on word-of-mouth communication, where someone in your community suggests what paper to read and what the new research trends are. But this requires time, and time is fundamental in research. That's why we worked hard with our team in London to create a break-through technology. We needed to solve this problem and help the communities.

**So, the team shipped an Academic Recommendation engine** which adopts sophisticated machine learning algorithms to learn how to discover scientific articles that are relevant for you. Moreover, **recommendation is personalized** and based on your own scientific interests. **What is cool is that the algorithm makes recommendations tailored to you, the Researcher.**

**Let's see how this works. Browse mendeley.com/suggest/**

First, recommendations are based on what I have read previously and stored in my library. It's clear that I have an interest in data mining and usage statistics. Plus, there is a surprising article related to some new research topics that I was considering recently. That's the serendipity effect. Then, there are also recommendations based on my research discipline (Computer Science).

More important, experimental data showed that freshness is very important for research. So, we developed a special set of recommenders focused on my own very recent research activity. In my case, this is related to different methodologies for sampling the Web size, and search - of course. Then, we also show what is trending in my discipline right now.

Obviously, we encourage users to interact with our system and fine-tune the suggestions so that the quality of the personalized recommendations can improve over time. The more you interact, the better the suggestions will be.

So, try this cool technology, which I believe will disrupt the way research is done and will help researchers save time.

Antonio

## Tuesday, November 3, 2015

### What is a recommender system?

(excerpt from my new book)

**Collaborative filtering** approaches learn a model from a user's past behaviour (items previously purchased or clicked, and/or numerical ratings attributed to those items) as well as from similar choices made by other users. The learned model is then used to predict items (or ratings for items) that the user may have an interest in. Note that in some situations ratings and choices are explicitly made, while in other situations they are implicitly inferred from users' actions. Collaborative filtering has two variants:

- **User-based collaborative filtering**: the user's interest is taken into account by looking for users who are somehow similar to her. Each user is represented by a profile, and different kinds of similarity metrics can be defined. For instance, a user can be represented by a vector, and the similarity could be the cosine similarity.
- **Item-based collaborative filtering**: the user's interest is directly taken into account by aggregating similar classes of interest.
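The user-based variant can be sketched in a few lines. Here users are rating vectors over items, and we pick the neighbour with the highest cosine similarity; the profiles and names are invented for illustration:

```python
import math

def cosine(u, v):
    """Cosine similarity between two rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical user profiles: each vector holds ratings (0-5) for four items.
profiles = {
    "alice": [5, 4, 0, 0],
    "bob":   [4, 5, 0, 1],
    "carol": [0, 0, 5, 4],
}

def most_similar(user):
    """Return the other user whose rating vector is closest to `user`'s."""
    return max((u for u in profiles if u != user),
               key=lambda u: cosine(profiles[user], profiles[u]))

print(most_similar("alice"))  # bob
```

Once the nearest neighbours are found, the items they rated highly (and the target user has not seen) become the recommendations.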

**Content-based filtering** approaches learn a model based on a series of features of an item in order to recommend additional items with similar properties. For instance, a content-based filtering system can recommend an article similar to other articles seen in the past, or it can recommend a song with a sound similar to ones implicitly liked in the past.
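A toy content-based recommender can represent each item by a set of feature tags and rank unseen items by overlap with what the user has already seen. The items and tags below are made up, and Jaccard similarity stands in for whatever feature metric a real system would use:

```python
# Hypothetical items described by feature tags (e.g. article keywords).
items = {
    "paper_a": {"data mining", "search", "ranking"},
    "paper_b": {"data mining", "usage statistics"},
    "paper_c": {"genomics", "sequencing"},
}

def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(seen, k=1):
    """Rank unseen items by feature overlap with everything already seen."""
    liked = set().union(*(items[i] for i in seen))
    candidates = [i for i in items if i not in seen]
    return sorted(candidates, key=lambda i: jaccard(items[i], liked), reverse=True)[:k]

print(recommend(["paper_a"]))  # ['paper_b']
```

After seeing a data-mining paper, the other data-mining paper outranks the genomics one, which is the essence of the approach.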

More sophisticated recommenders can also leverage additional structural information. For instance, an item can be referenced by other items, and those references can contribute to enriching the set of features. As an example, think about a scientific publication which is cited by other scientific publications. In this case, the citation graph is a very useful source of information for recommendations.
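One simple way to exploit that structure is to union a paper's own features with the features of the papers it cites, so citation links propagate signal. The tiny graph below is invented for illustration:

```python
# Hypothetical citation graph: paper -> list of papers it cites.
citations = {
    "p1": ["p2", "p3"],
    "p2": [],
    "p3": ["p2"],
}
# Each paper's own feature tags.
features = {
    "p1": {"autoencoders"},
    "p2": {"neural networks"},
    "p3": {"deep learning"},
}

def enriched_features(paper):
    """Union a paper's own features with those of the papers it cites."""
    out = set(features[paper])
    for cited in citations.get(paper, []):
        out |= features[cited]
    return out

print(sorted(enriched_features("p1")))
# ['autoencoders', 'deep learning', 'neural networks']
```

A real system would typically weight cited features less than a paper's own, or walk the graph more than one hop, but the enrichment idea is the same.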

## Monday, November 2, 2015

### Benchmarking your Academic Profile is BigData Computation

Exciting day today.

**Let's ship it.** The team worked on a **modern BigData pipeline** built on Apache Spark for **helping Researchers benchmark their Academic Profiles**.

Here I am checking how I am doing, and it's clear that I've moved to industry, since I have no recent publications and few recent citations.

First, an overall summary of Antonio Gulli's views & citations over time

Then an in-depth view of one selected article with its citations
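The real pipeline runs on Apache Spark, but the core computation behind such a summary is a map-then-reduce-by-key count of events per article and year. A plain-Python sketch of that aggregation, over an invented event log:

```python
from collections import defaultdict

# Hypothetical event log: (article_id, year, event_type) tuples,
# standing in for the records a Spark job would read at scale.
events = [
    ("a1", 2013, "view"), ("a1", 2013, "citation"),
    ("a1", 2014, "view"), ("a2", 2014, "view"),
    ("a1", 2014, "view"), ("a2", 2014, "citation"),
]

# The equivalent of Spark's map + reduceByKey:
# count events per (article, year, type) key.
counts = defaultdict(int)
for article, year, kind in events:
    counts[(article, year, kind)] += 1

print(counts[("a1", 2014, "view")])  # 2
```

In Spark this loop becomes `rdd.map(lambda e: (e, 1)).reduceByKey(add)`, which is what lets the same logic scale to millions of records.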

Check this out on http://mendeley.com/stats/

## Sunday, November 1, 2015

### Academic Search and Relevance: deep searching what you are looking for

Let's see some more examples of Academic Search and Relevance, this time from my domain of expertise, which is Machine Learning. Again, a side-by-side comparison, and we will show why directly matching the user's needs is important.


**{deep learning autoencoders}**

Here I am interested in finding a specific innovation in deep learning. As discussed in a previous post, autoencoders are deep learning machines that can automatically learn which features in a dataset are important, with no human intervention. The machine will pick the right features on your behalf, with no handcrafted work.
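To make that concrete, the core autoencoder idea is to compress data through a narrow code and train the network to reconstruct its input. A minimal linear sketch with NumPy (real autoencoders are deep and nonlinear; the data, sizes, and learning rate here are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 4-dimensional points that actually live on a 2-D subspace.
Z = rng.normal(size=(200, 2))
X = Z @ rng.normal(size=(2, 4))

# Linear autoencoder: encode 4 dims down to 2, then decode back to 4.
d, h = 4, 2
W_enc = rng.normal(scale=0.1, size=(d, h))
W_dec = rng.normal(scale=0.1, size=(h, d))

def loss(X):
    """Mean squared reconstruction error."""
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))

initial = loss(X)
lr = 0.01
for _ in range(500):
    H = X @ W_enc                 # encode
    R = H @ W_dec                 # decode (reconstruction)
    G = 2 * (R - X) / len(X)      # gradient of squared error w.r.t. R
    g_dec = H.T @ G               # backprop into the decoder weights
    g_enc = X.T @ (G @ W_dec.T)   # backprop into the encoder weights
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

print(round(initial, 3), round(loss(X), 3))
```

Because no one labels the 2-D structure, the network discovers the important features of the data on its own, which is exactly the property the post refers to.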

Google returns the seminal paper from 2006, which is considered the starting point for the renaissance of Neural Networks and their evolution into modern Deep Learning systems.

However, this paper

**DOES NOT** talk about autoencoders. Instead, it talks about deep belief nets, a slightly related topic. At the time of that paper, autoencoders were NOT YET popular for Deep Learning (and even "Deep Learning" had not yet been coined as a term). Therefore, I'd consider this a DSAT because it does not immediately satisfy my very specific search need.

So Google Scholar is not returning a very relevant result.

ScienceDirect, instead, returns a very relevant and recent result discussing Deep Learning and Autoencoders.
