TheBigDB of facts

TheBigDB is a very loosely structured database of facts, free and open to everybody.


Through a very simple API you can browse the database and access facts such as:

  • { nodes: ["Gold", "atomic radius", "144 pm"] }
  • { nodes: ["Bill Clinton", "job", "President of the United States"], period: { from: "1993-01-20 12:00:00", to: "2001-01-20 11:59:59" } }
  • { nodes: ["Apple", "average weight", "150g"] }

That’s it. Really.

Anyone can create, upvote or downvote a statement.

There are no datatypes, namespaces, lists or domains. Just nodes, one after the other, with a simple and easy to use API to search through them.
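The "just nodes" model can be illustrated with a small in-memory sketch. This is not a client for TheBigDB's API, whose endpoints are not described here; the names `Statement`, `search`, and `FACTS` are hypothetical and exist only to show how untyped node lists with an optional period can be stored and matched.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Statement:
    """A fact: an ordered list of plain-string nodes plus an optional period."""
    nodes: Tuple[str, ...]
    period_from: Optional[str] = None  # kept as a string, since there are no datatypes
    period_to: Optional[str] = None


def search(statements, *pattern):
    """Return statements whose leading nodes match `pattern`.

    A pattern entry of None acts as a wildcard for that position.
    """
    results = []
    for statement in statements:
        if len(statement.nodes) < len(pattern):
            continue
        if all(p is None or p == n for p, n in zip(pattern, statement.nodes)):
            results.append(statement)
    return results


# The three example facts quoted above.
FACTS = [
    Statement(("Gold", "atomic radius", "144 pm")),
    Statement(
        ("Bill Clinton", "job", "President of the United States"),
        period_from="1993-01-20 12:00:00",
        period_to="2001-01-20 11:59:59",
    ),
    Statement(("Apple", "average weight", "150g")),
]
```

For example, `search(FACTS, "Gold", "atomic radius")` returns the single gold statement, and `search(FACTS, None, "job")` finds every statement whose second node is "job".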

Probability, The Analysis of Data

Probability, The Analysis of Data – Volume 1 is a free book available online that provides educational material in the area of data analysis.

  • The project features comprehensive coverage of all relevant disciplines including probability, statistics, computing, and machine learning.
  • The content is almost self-contained and includes mathematical prerequisites and basic computing concepts.
  • The R programming language is used to demonstrate the contents. Full code is available, facilitating reproducibility of experiments and letting readers experiment with variations of the code.
  • The presentation is mathematically rigorous, and includes derivations and proofs in most cases.
  • HTML versions are freely available on the website. Hardcopies are available at affordable prices.

Majestic Million, which helped build LinkDAQ, is now publishing free data under a Creative Commons license.

The Majestic Million database is a list of the top 1 million websites in the world, ordered by the number of referring subnets. A subnet is a bit complex, but to a layman it is basically anything within an IP range, ignoring the last three digits of the IP number.
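The layman's definition of a subnet can be sketched in a few lines. The sketch assumes that "ignoring the last three digits" means dropping the last octet of an IPv4 address (a /24 network); Majestic's exact definition is not given here, and the function names are hypothetical.

```python
import ipaddress


def subnet_of(ip):
    """Map an IPv4 address to its /24 network, i.e. ignore the last octet.

    Assumption: a "subnet" in the layman's sense is a /24 range.
    """
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return str(network)


def count_referring_subnets(referring_ips):
    """Count distinct /24 subnets among the IPs that link to a site."""
    return len({subnet_of(ip) for ip in referring_ips})
```

Under this assumption, two links from `192.0.2.17` and `192.0.2.200` count as one referring subnet, while a third link from `198.51.100.5` adds a second one.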


Regular Updates

We have set up a cron job that recompiles the data every day, based on our Fresh Index. It is possible that the data is the same on two consecutive days if the Fresh Index did not update for some reason, but usually the data will change daily. Please do not try to download the data more than once every 24 hours; otherwise you will simply end up getting banned, or we will have to reconsider giving the data away, or put it behind a walled garden.
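One way to respect the 24-hour limit is to check the age of the local copy before fetching again. The following is a minimal sketch, assuming the CSV is saved to a local file; `should_download` and `DAY_SECONDS` are hypothetical names, and the download step itself is left out.

```python
import os
import time

DAY_SECONDS = 24 * 60 * 60


def should_download(local_path, now=None):
    """Return True if there is no local copy or it is at least 24 hours old."""
    if not os.path.exists(local_path):
        return True
    now = time.time() if now is None else now
    # The file's modification time stands in for the time of the last download.
    return now - os.path.getmtime(local_path) >= DAY_SECONDS
```

A scheduled script would call `should_download` first and skip the request entirely when it returns False, so that an accidental re-run does not hit the server twice in one day.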

Download Location

Be wary scrapers… this will do your head in if you weren’t expecting it… The Majestic Million CSV can regularly be downloaded here:


LinkDAQ is a fun (hopefully) trading game based on the top 50,000 websites in the world.

Play on


DataWiki on Google Labs

DataWiki is a wiki for structured data.

With DataWiki it should be easy to:

  • create and edit structured data
  • create simple mashup applications in a few minutes
  • define formats in terms of others, e.g. Missing Person reports = vCard (who) + GeoRSS (last seen) + string (current status note)
  • share information with other systems via built-in federation
  • enable easy input/output from a variety of endpoints, e.g. via Twitter, ODK or SMS from a remote location
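The idea of defining one format in terms of others can be sketched with plain Python dataclasses. This does not use DataWiki's own format definition mechanism, which is not described here; `VCard` and `GeoPoint` are hypothetical, heavily simplified stand-ins for vCard and GeoRSS.

```python
from dataclasses import dataclass


@dataclass
class VCard:
    """Hypothetical minimal stand-in for a vCard record (who)."""
    full_name: str


@dataclass
class GeoPoint:
    """Hypothetical minimal stand-in for a GeoRSS point (last seen)."""
    lat: float
    lon: float


@dataclass
class MissingPersonReport:
    """Composite format: vCard (who) + GeoRSS (last seen) + string (status note)."""
    who: VCard
    last_seen: GeoPoint
    status_note: str
```

Because the report is built from existing formats, any tool that already understands the `who` or `last_seen` part can read that part without knowing about the report as a whole.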

The first version is available within Google Labs: