Category Archives: Scientific Method

How we edit science part 5: so what is science?

Tim Dean, The Conversation

We take science seriously at The Conversation and we work hard at reporting it accurately. This series of five posts is adapted from an internal presentation on how to understand and edit science by Australian Science & Technology Editor, Tim Dean. We thought you would also find it useful. The Conversation


The first four posts in this series covered the scientific method and practical tips on how to report it effectively. This post is more of a reflection on science and its origins. It’s not essential reading, but could be useful for those who want to situate their science reading or writing within a broader historical and conceptual context.

Fair warning: it’s going to get philosophical. That means you might find it frustratingly vague or complicated. If you find yourself getting infuriated at the inability to settle on clear definitions or provide clear answers to important questions, that’s a perfectly natural (and probably quite healthy) response.

These issues have been intensively debated for hundreds, if not thousands, of years, without resolution. We’d likely have given up on them by now, except that these concepts have an unfortunate tendency to influence the way we actually do things, and thus retain some importance.

The foundations of science

Explaining what science is, and entertaining all the debates about how it does or should work, would take up an entire book (such as this one, which I highly recommend). Rather than tackling such issues head-on, this section will give a broad overview of what science is.

While it doesn’t get mentioned often outside of scientific circles, the fact is there is no one simple definition of science, and no single definitive method for conducting it.

However, virtually all conceptions of science lean on a couple of underlying philosophical ideas.

Francis Bacon (not the artist) was one of the leading voices to reform ‘natural philosophy’ into an observation-led endeavour, which ultimately evolved into science.
richardhe51067/Flickr, CC BY

The first is a commitment to learning about the world through observation, or empiricism. This is in contrast to alternative approaches to knowledge, such as rationalism – the notion that we can derive knowledge about the world just by thinking about it hard enough – or revelation – that we can learn from intuition, insight, drug-induced hallucinations, or religious inspiration.

Another philosophical basis of science is a commitment to methodological naturalism, which is simply the idea that the best way to understand the natural world is to appeal to natural mechanisms, laws, causes or systems, rather than to supernatural forces, spirits, immaterial substances, invisible unicorns or other deities.

This is why scientists reject the claim that ideas like creationism or intelligent design fall within the purview of science. Because these ideas posit or imply supernatural forces, no matter how scientific they try to sound, they break methodological naturalism, so they aren’t science.

(As a side point, science doesn’t assume or imply the stronger claim of philosophical or ontological naturalism. This is the idea that only natural things exist – which usually means things that exist in spacetime – and that there are no supernatural entities at all.

This is a strictly philosophical rather than scientific claim, and one that is generally agreed to be beyond the ken of science to prove one way or the other. So, if cornered, most scientists would agree it’s possible that intangible unicorns might exist, but if they don’t exist in spacetime or causally interact with things that do, then they’re irrelevant to the practice of science and can be safely ignored. See Pierre Laplace’s apocryphal – but no less cheeky – response to Napoleon, who remarked that Laplace had produced a “huge book on the system of the world without once mentioning the author of the universe”, to which Laplace reputedly replied: “Sire, I had no need of that hypothesis.”)

This is where we come to the role of truth in science: there isn’t any. At least in the absolute sense.

Instead, science produces facts about the world that are only held to be true with a certainty proportional to the amount of evidence in support of them. And that evidence can never give 100% certainty.

There are logical reasons for this to be the case, namely that empiricism is necessarily based on inductive rather than deductive logic.

Another way to put it is that no matter how certain we are of a particular theory, and no matter how much evidence we’ve accrued to support it, we must leave open the possibility that tomorrow we will make an observation that contradicts it. And if the observation proves to be reliable (a high bar, perhaps, but never infinitely high), then it trumps the theory, no matter how dearly it’s held.

The Scottish philosopher David Hume couched the sceptical chink in empiricism’s armour of certainty like this: all we know about the world comes from observation, and all observation is of things that have happened in the past. But no observation of things in the past can guarantee that things in the future will operate in the same way.

This is the “problem of induction”, and to this day there is no decisive counter to its scepticism. It doesn’t entirely undermine science, but it does give us reason to stop short of saying we know things about the world with absolute certainty.

Scientific progress

The steady accumulation of evidence is one reason why many people believe that science is constantly and steadily progressing. However, in messy reality, science rarely progresses smoothly or steadily.

Rather, it often moves in fits and starts. Sometimes a new discovery will not only change our best theories, it will change the way we ask questions about the world and formulate hypotheses to explain them.

Sometimes it means we can’t even integrate the old theories into the new ones. That’s what is often called a “paradigm shift” (another term to avoid when reporting science).

For instance, sometimes a new observation will come along that will cause us to throw out a lot of what we once thought we knew, like when the synthesis of urea, of all things, forced a rewrite of the contemporary understanding of what it means to be a living thing.

That’s progress of a sort, but it often involves throwing out a lot of old accepted facts, so it can also look regressive. In reality, it’s doing both. That’s just how science works.

Science also has its limits. For one, it can’t say much about inherently unobservable things, like some of the inner workings of our minds or invisible unicorns.

That doesn’t mean it can only talk about things we can directly observe at the macroscopic scale. Science can talk with authority about the microscopic, like the Higgs boson, and the distant, like the collision of two black holes, because it can scaffold those observations on other observations at our scale.

But science also has limits when it comes to discussing other kinds of things for which there is no fact of the matter, such as questions of subjective preference. It’s not a scientific fact that Led Zeppelin is the greatest band ever, although I still think it’s a fact.

There are similar limits when it comes to moral values. Science can describe the world in detail, but it cannot by itself determine what is good or bad (someone please tell Sam Harris – oh, they have). To do that, it needs an injection of values, and they come from elsewhere. Some say they come from us, or from something we worship (which many people would argue means they still come from us) or from some other mysterious non-natural source. Arguments over which source is the right one are philosophical, not scientific (although they can be informed by science).

Science is also arguably not our only tool for producing knowledge. There are other approaches, as exemplified by the various non-scientific academic disciplines, like history, sociology and economics (the “dismal science”), as well as other domains like art, literature and religion.

That said, to the extent that anyone makes an empirical claim – whether that be about the movement of heavenly bodies, the age of Earth, or how species change over time – science has proven to be our best tool to scrutinise that claim.

Tim Dean, Editor, The Conversation

This article was originally published on The Conversation. Read the original article.

How we edit science part 4: how to talk about risk, and words and images not to use

Tim Dean, The Conversation

We take science seriously at The Conversation and we work hard at reporting it accurately. This series of five posts is adapted from an internal presentation on how to understand and edit science by Australian Science & Technology Editor, Tim Dean. We thought you would also find it useful.


You may have heard the advice for pregnant women to avoid eating soft cheeses. This is because soft cheeses can sometimes carry the Listeria monocytogenes bacteria, which can cause a mild infection. In some cases, the infection can be serious, even fatal, for the unborn child.

However, the infection is very rare, affecting only around 65 people out of 23.5 million in Australia in 2014. That’s 0.0003% of the population. Of these, only around 10% are pregnant women. Of these, only 20% of infections prove fatal to the foetus.

We’re getting down to some very small numbers here.

Even among the “high risk” foods, like soft unpasteurised cheese or preserved meats, Listeria occurs in them less than 2% of the time. And even then the levels are usually too low to cause an infection.

So why the advice to avoid soft cheeses? Because the worst case scenario of a listeriosis infection is catastrophic. And pregnant women are 13 times more likely to contract listeriosis than an otherwise healthy adult. Thirteen times!
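To see how small the absolute numbers are, here’s a rough back-of-the-envelope sketch in Python using the figures quoted above (the variable names and rounding are mine, and this is arithmetic, not an epidemiological model):

```python
# A rough back-of-the-envelope calculation using the figures quoted above
# (Australia, 2014). Variable names and rounding are illustrative only.
cases_per_year = 65            # listeriosis cases nationally
population = 23.5e6
share_pregnant = 0.10          # roughly 10% of cases are in pregnant women
share_fatal_to_foetus = 0.20   # roughly 20% of those prove fatal to the foetus

infection_rate = cases_per_year / population
pregnant_cases = cases_per_year * share_pregnant
fatal_outcomes = pregnant_cases * share_fatal_to_foetus

print(infection_rate)    # ~0.0000028, i.e. about 0.0003% of the population
print(pregnant_cases)    # ~6.5 cases a year among pregnant women
print(fatal_outcomes)    # ~1.3 fatal outcomes a year, nationally
```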

Now, it’s entirely reasonable for a pregnant woman to choose to avoid soft cheeses if she wants to lower her risk as much as possible.

But when it comes to talking about or reporting risk, there’s clearly a vast conceptual – or perceptual – disconnect between the impact of “13 times more likely” and the absolute probability of contracting listeriosis and having complications during a pregnancy.

If we talked about every risk factor in our lives the way health authorities talk about soft cheeses, we’d likely don a helmet and kneepads every morning after we get out of bed. And we’d certainly never drive a car.

The upshot of this example is to emphasise that our intuitions about risk are often out of step with the actualities. So journalists need to take great care when reporting risk so as not to exacerbate our intuitive deficits as a species.

Relatively risky

For one, use absolute rather than relative risk wherever possible. If you say eating bacon increases your chances of developing colorectal cancer by 18%, it’s hard to know what that means without knowing the baseline chance of developing colorectal cancer in the first place.

Many readers who skim the headline will believe that eating bacon gives you an 18% chance of developing cancer. That would be alarming.

But if you couch it in absolute terms, things become clearer. For example, once you hear that the probability of developing colorectal cancer some time before the age of 85 is 8.3%, it’s far more salient to tell people that eating bacon nudges that lifetime probability up to 9.8%. It’s also worth mentioning that the majority of people who develop colorectal cancer live on for five or more years.
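For those who like to check the arithmetic, converting a relative risk increase into absolute terms is a one-liner. This is just a sketch using the figures quoted above:

```python
# Converting a relative risk increase into absolute terms, using the figures above.
baseline_lifetime_risk = 0.083   # chance of colorectal cancer before age 85
relative_increase = 0.18         # the reported "18% increased risk" from eating bacon

absolute_risk = baseline_lifetime_risk * (1 + relative_increase)
print(f"{baseline_lifetime_risk:.1%} -> {absolute_risk:.1%}")   # 8.3% -> 9.8%
```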

Bacon is suddenly a lot less of a death sentence, and is largely returned to its prior status as the king of foods.

Examples and analogies also help, but avoid referencing extreme events, like lightning strikes or shark attacks, as they only muddy (or bloody) the water.

And take great caution around the precautionary principle. This principle, which amplifies both our intuitive inability to grok risk and our current hyper risk-averse culture, states that in the absence of evidence that something is safe, we should treat it as dangerous.

It effectively places the onus of proof of safety on the proponents of something new rather than requiring the recipients to prove it’s harmful first.

There are circumstances when the precautionary principle is warranted, such as when a possible outcome is so catastrophic that we ought to play it safe: assessing a new nuclear reactor design, say, or a new and untested technology where there’s some plausible theoretical risk.

But it’s often misused – at least in a rational sense – to put the kibosh on new technologies that are perceived to be unpalatable by some. Popular targets are genetically modified food, nanotechnology and mobile phone radiation.

Strictly speaking, the precautionary principle requires some plausible suspicion of risk. So, as a rule of thumb, if a technology has been in use for some time, and there is no reliable evidence of harm, then the onus increasingly falls on those who believe it’s unsafe to provide evidence to that effect.

That doesn’t mean these things are guaranteed safe. In fact, nothing can be guaranteed safe. Even the IARC list of carcinogens has only one substance in the lowest risk category of “probably not carcinogenic to humans”. That’s a chemical called caprolactam. And even there we’re out of luck; it’s mildly toxic.

But many of us drink alcohol, eat pickles and walk around in the sun – all of which are known to be carcinogenic (although pickles are only group 2B, “possibly carcinogenic”, thankfully), and most of us won’t die as a result.

Risk needs to be put in context. And we should never cultivate the expectation in our audience that things can be made absolutely safe.

That said, we should also keep a critical eye on the safety of new technologies and report on them when the evidence suggests we should be concerned. But risk always needs to be reported responsibly in order to not cause undue alarm.

Balance and debate

Balance is oft touted as a guiding principle of journalistic practice. However, it’s often misapplied in the domain of science.

Balance works best when there are issues at stake involving values, interpretation, trade-offs or conflicts of interest. In these cases there is either no fact of the matter that can arbitrate between the various views or there is insufficient information to make a ruling one way or another.

In these cases, a reporter does their job by allowing the various invested parties to voice their views and, given appropriate care and background information, the reader can decide for themselves whether any have merit.

This might be appropriate in a scientific context if there is a debate about the interpretation of evidence, or which hypothesis is the best explanation for it in the absence of conclusive evidence.

But it’s not appropriate when directly addressing some empirical question, such as which is the tallest mountain in the world. In that case, you don’t call for a debate or take a straw poll, you go out and measure it.

It’s also not appropriate when comparing the views of a scientist and a non-scientist on some scientific issue. An immunologist ought not be balanced with a parent when it comes specifically to discussing the safety or efficacy of vaccines.

Many reporters like to paint a vignette using an individual example. This can be a useful tool to imbue the story with emotional salience. But it also risks the introduction of emotive anecdote into a subject that ought to be considered on the weight of the evidence.

However, balance is called for when going beyond the scientific evidence and speaking to its societal or policy implications. Science can (and should) inform policy debates, to the extent they rely on empirical facts, but policy is also informed by values and involves costs and trade-offs.

Scientists can certainly weigh in on such issues – they are citizens too. But one thing to be wary of is scientists who step outside of their areas of expertise to advocate for some personally held belief. They are entitled to do so. But they are no longer an expert in that context.

Words of caution

There are many words that have a technical sense in a scientific context, or which are used by scientists to mean something different from the vernacular sense. Take care when you use these words, and be sure not to conflate the technical and vernacular uses.

Proof

We often see headlines saying “science has proven X”, but you actually rarely hear scientists use this term in this context. This is because “proof” has a specific technical meaning, particularly in mathematics.

Proof means an argument that has established the truth of some statement or proposition. Proofs are popular in maths. But in science, certainty is always just out of reach.

Instead of “proof”, use “evidence”, and instead of “proven” use “have demonstrated” or “have shown”.

Valid

Like “proof”, “valid” has a technical meaning in logic. It relates to the structure of an argument. In logic, a valid argument is structured such that the premises imply the conclusion.

However, it’s entirely possible to have a valid argument that reaches a false conclusion, particularly if one of the premises is false.

If I say that all dogs have four legs, and Lucky is a dog, therefore Lucky has four legs, that’s a valid argument. But Lucky might have been involved in a tragic incident with a moving car, and only have three legs. Lucky is still a dog, and would resent the implication she’s not, and would like to have words with you about the premise “all dogs have four legs”.

Avoid using “valid” to mean “true”. If you do use it, keep it in reference to the structural features of a statement rather than its truth. So a “valid position” is one that is at least rational and/or internally consistent, even if other “valid positions” might hold it to be false.

Cure

A cure is pretty final. And hearing about one raises hopes, especially for conditions that don’t yet have one.

Most research we cover won’t actually “cure” anything. Most of the time it will “treat” it instead.

Revolution/breakthrough

Besides the usual warning against hyperbole, these terms ought to be used very carefully and sparingly.

There are sometimes breakthroughs, particularly if there has been a longstanding problem that has just been solved. The discovery of gravitational waves was a breakthrough, as it provided solid evidence for a theory that had been bumping around for a century.

Revolutions, on the other hand, are a much bigger deal. Generally speaking, a revolution is a discovery that doesn’t just update our theory about how something works, but changes the very way we look at the world.

The favourite example is the shift from classical (Newtonian) physics to (Einsteinian) relativity. Relativity didn’t just change the way we calculate how planets move, it changed the way we think about space and time themselves, altering the very assumptions we hold when we conduct experiments.

Another example might be a large domain of research that has profoundly changed the way a particular discipline works. You could say that evolution has revolutionised biology, or that DNA sequencing has caused a revolution in genetics.

Few discoveries are revolutionary. The less frequently you use the term, the more impact it’ll have when you do.

Debate

As mentioned above, debates are for values, not empirical fact. If there’s a debate, be sure to frame it appropriately and not to imply that there can be a genuine debate between fact and opinion.

Images

Illustrating some science stories is easy. It’s a story about a frog; use a picture of a frog. But other stories are harder, and there will be temptation to cut corners and/or rely on well trodden clichés. Here are some to avoid.

Almost no scientists look like this.
Shutterstock

Only use images of Einstein when talking about Albert Einstein. Few scientists these days look like Einstein, and his face should not be the stereotype for genius. Definitely never use clipart illustrations of Einstein-like characters to represent “scientist”.

I’m ashamed to say I actually used this image once.
Shutterstock

Some researchers are old white men. But certainly not all. If you’re using a generic scientist, don’t lean on the old white guy in a lab coat; be creative.

Some researchers wear lab coats. Not all are white. Use photos of people in lab coats if they’re actually of the researchers in the story. Don’t just go to a stock image library and pick a random model wearing a white lab coat and pat yourself on the back for a job well done.

That’s no solution, that’s cordial.
Shutterstock

Avoid stock shots of people pouring colourful liquid into flasks. If it looks like cordial, then it probably is.

This one has it all. Einstein hair, mad scientist, white lab coat, beakers of thoroughly unscientific fluid. Never use it, unless it’s to mock these clichés.
Shutterstock

And, above all, avoid the “mad scientist”. It’s one of the most corrosive, and pervasive, tropes about science, and we would all benefit from eroding its influence.

Fancy DNA image is wrong. Scientists will notice.
Shutterstock

Also, be mindful that some images you might consider to be generic actually pack in a lot of technical detail. For example, DNA is a right-handed helix: it has a clockwise twist (imagine turning a screwdriver clockwise; the motion your hand makes as it moves forward traces the same handedness as DNA). The image above is wrong.

So don’t just chuck any old stock image of DNA into an image. Far too many – even from reputable sources – are wrong. And definitely don’t take DNA and have the designer mirror it because it looks better that way. And above all, don’t then stick that image on the cover of a magazine targeted at geneticists. You’ll get letters. Trust me.

Tim Dean, Editor, The Conversation

This article was originally published on The Conversation. Read the original article.

How we edit science part 3: impact, curiosity and red flags

Tim Dean, The Conversation

We take science seriously at The Conversation and we work hard at reporting it accurately. This series of five posts is adapted from an internal presentation on how to understand and edit science by Australian Science & Technology Editor, Tim Dean. We thought you would also find it useful.


The first two parts of this guide were a brief (no, seriously) introduction to what science is, how it works and some of the errors that can seep into the scientific process. This section will speak more explicitly about how to report, edit (and read) science, and some of the pitfalls in science journalism.

It’s primarily intended as a guide for journalists, but it should also be useful to those who consume science articles so you can better assess their quality.

What’s news?

The first question to ask when considering reporting on some scientific discovery is whether it’s worth reporting on at all.

If you randomly pick a scientific paper, the answer will probably be “no”. It doesn’t mean the study isn’t interesting or important to someone, but most science is in the form of incremental discoveries that are only relevant to researchers in that field.

When judging the broader public importance of a story, don’t only rely on university press releases.

While they can be a useful source of information once you decide to run a story, they do have a vested interest in promoting the work of the scientists at their institution. So they may be inclined to oversell their research, or simplify it to make it more likely to be picked up by a journalist.

In fact, there’s evidence that a substantial proportion of poorly reported science can be traced back to poorly constructed press releases. Many releases are accurate and well researched, but as with any press release, it’s worth double-checking their claims.

University communications teams also don’t necessarily do exhaustive homework on each study they write about, and can sometimes make inaccurate claims, particularly in terms of how new or unique the research is.

I once fielded a snarky phone call from a geneticist who objected to a story I wrote on the first-ever frog genome. Turns out it wasn’t the first ever. The geneticist had sequenced a frog genome a year prior to this paper. But “first ever” was in the university press release, and I neglected to factcheck that claim. My bad; lesson learned. Check your facts.

Impact and curiosity are not space probes

Broadly speaking, there are two kinds of science story: impact and curiosity.

Impact stories have some real-world effect that the reader cares about, such as a new drug treatment or a new way to reduce greenhouse gas emissions.

A curiosity story, on the other hand, has little or no immediate or direct real-world impact. These include just about every astronomy story, and things like palaeontology and stories about strange creatures at the bottom of the sea. Of course, such research can produce real-world benefits, but if the story is about those benefits, then it becomes an impact story.

The main difference between the two is in the angle you take on reporting the story. And that, in turn, influences whether the story is worth taking on. If there’s no obvious impact, and the curiosity factor is low, then it’ll be a hard sell. That doesn’t mean it’s not possible to turn it into a great yarn, but it just takes more imagination and energy – and we all know they’re often in short supply, especially when deadlines loom.

NASA and ESA are great sources of illustrative imagery, especially for astronomy stories. Most are public domain or Creative Commons, so free to use with appropriate attribution.
NASA/ESA

If the study looks like it has potential to become a good story, then the next thing to check is whether it’s of high quality.

The first thing to look for is where it was published. Tools like Scimago, which ranks journals, can be a helpful start.

If it’s published in a major journal, or a highly specialised one from a major publisher, at least you know it’s cleared a high peer-review bar. If you’ve never heard of the journal, then check Retraction Watch and Beall’s list for signs of dodginess.

If it’s a meta-review – a study that compiles the results of multiple other studies – such as those by Cochrane, then that makes it more reliable than if it’s a single standalone study.

As mentioned above, be wary of pre-press servers, as they haven’t yet been peer-reviewed. Be particularly wary of big claims made in pre-press papers. You might get a scoop, but you might also be publicising the latest zero-point-energy perpetual-motion ESP hat.

It’s not hard to Google the lead authors (usually those listed first and last in the paper’s author list). Check their profile page on their institution’s website. Check whether the institution is reputable. ANU, Oxford and MIT are. Upstate Hoopla Apologist College is probably not.

Check their academic title, whether they work in a team, and where they sit in that team. Adjunct, honorary and emeritus usually means they’re not actively involved in research, but doesn’t necessarily mean they aren’t still experts. You can also punch their name into Google Scholar to see how many citations they have.

You should also read the abstract and, if possible, the introduction and discussion sections of the paper. This will give you an idea of the approach taken by the authors.

Red flags

While it’s unlikely that you’ll be qualified to judge the scientific merits of the study in detail, you can look for red flags. One is the language used in the study.

Most scientists have any vestige of personality hammered out of their writing by a merciless academic pretension that a dry passive voice is somehow more authoritative than writing like a normal human being. It’s not, but nevertheless if the paper’s tone is uncomfortably lively, vague or verging on the polemical, then treat it with suspicion.

You can also look for a few key elements of the study to assess its quality. One is the character of the cohort. If it’s a study conducted on US college students (who are known to be “WEIRD”), don’t assume the results will generalise to the broader population, especially outside of the United States.

Another is sample size. If the study is testing a drug, or describing some psychological quirk, and the sample size is under 50, the findings will not be very strong. That’s just a function of statistics.

If you flip a coin only 10 times and it comes up heads 7 times (the p-value, or chance of it coming up 7, 8, 9 or 10, is 0.17, so not quite “significant”), it’s a lot harder to be confident that the coin is biased compared to flipping it 100 times and it coming up heads 70 times (p-value 0.000039, or very very “significant”).
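If you want to verify those numbers, here’s a minimal sketch using SciPy’s binomial distribution (my choice of tool; the original figures may have been computed differently):

```python
from scipy.stats import binom

# binom.sf(k, n, p) gives P(X > k), so pass k - 1 to get "k or more heads".
print(binom.sf(6, 10, 0.5))     # P(7+ heads in 10 flips)   ~ 0.17      -- not quite "significant"
print(binom.sf(69, 100, 0.5))   # P(70+ heads in 100 flips) ~ 0.000039  -- very "significant"
```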

Also check what the study says about causation. Many studies report associations or correlations, such as that college students who drink and smoke marijuana tend to have lower grades than their peers. But correlation doesn’t imply causation.

It might be that there is a common cause for both phenomena. Perhaps those students who are more likely to choose to drink and smoke are predisposed towards distraction, and it’s the distraction that causes the lower grades rather than the content of the distraction per se.

So never imply causation when a study only reports correlation. You can speculate as to causation – many studies do – but do so in context and with appropriate quotes from experts.

Many studies are also conducted on animals, especially medical studies. While it’s tempting to extrapolate these results to humans, don’t.

It’s not the case that we’ve cured cancer because a drug made it disappear in a mouse. It’s not even the case that we’ve cured cancer in mice (which would still be big news in some circles).

What we’ve found is that application of some drug corresponded with a shrinkage of tumours in mice, and that’s suggestive of an interesting interaction or mechanism that might tell us something about how the drug or cancers work, and that might one day inform some new treatment for cancers in people. Try fitting that into a pithy headline. If you can’t, then don’t overhype the story.

Many impact stories also have a long wait until the impact actually arrives. Be wary of elevating expectations by implying the discovery might start treating people right away. Optimistically, most health studies are at least ten years away from practical application, often more.

Generic telescope image set on picturesque Australian outback background. It’s actually CSIRO’s ASKAP antennas at the Murchison Radio-astronomy Observatory in Western Australia. CSIRO has a great image library.
Neal Pritchard

Sources

It’s good practice to link to sources whenever you make an empirical claim. But don’t just Google the claim and link to the first paper or news story you find. Make sure the source backs the claim, and not just part of it. So don’t link to something saying people generally overestimate risk and then link to one paper that shows people overestimate risk in one small domain.

When linking to a source, preferably use the DOI (Digital Object Identifier). It’s like a URL for academic papers, and when you link to it, it will automatically shunt the reader through to the paper on the journal site.

DOIs usually come in the form of a bunch of numbers, like “10.1000/xyz123”. To turn that into a full DOI link, put “https://doi.org/” at the beginning. So “10.1109/5.771073” becomes https://doi.org/10.1109/5.771073. Go on, click on that link.

As a rule, try to link directly to the journal article rather than a summary, abstract, PubMed listing, blog post or review of the paper elsewhere. Don’t link to a PDF of the paper on the author’s personal website unless the author or the journal has given you explicit permission, as you may be breaching copyright.

And definitely avoid linking to the university press release for the paper, even if that release has the paper linked at the bottom. Just link the paper instead.

This article was updated with corrected p-values on the coin flip example. Thanks to Stephen S Holden for pointing out the error, and for highlighting both the difficulty of statistics and the importance of double checking your numbers.

Tim Dean, Editor, The Conversation

This article was originally published on The Conversation. Read the original article.

How we edit science part 2: significance testing, p-hacking and peer review

Tim Dean, The Conversation

We take science seriously at The Conversation and we work hard at reporting it accurately. This series of five posts is adapted from an internal presentation on how to understand and edit science by Australian Science & Technology Editor, Tim Dean. We thought you would also find it useful.


One of the most common approaches to conducting science is called “significance testing” (sometimes called “hypothesis testing”, but that can lead to confusion for convoluted historical reasons). It’s not used in all the sciences, but is particularly common in fields like biology, medicine, psychology and the physical sciences.

It’s popular, but it’s not without its flaws, such as allowing careless or dishonest researchers to abuse it to yield dubious yet compelling results.

It can also be rather confusing, not least because of the role played by the dreaded null-hypothesis. It’s a bugbear of many a science undergraduate, and possibly one of the most misunderstood concepts in scientific methodology.

The null-hypothesis is just a baseline hypothesis that typically says there’s nothing interesting going on, and the causal relationship underpinning the scientist’s hypothesis doesn’t hold.

It’s like a default position of scepticism about the scientist’s hypothesis. Or like assuming a defendant is innocent until proven guilty.

Now, as the scientist performs their experiment, they compare their results with what they’d expect to see if the null-hypothesis were true. What they’re looking for, though, is evidence that the null-hypothesis is actually false.

An example might help.

Let’s say you want to test whether a coin is biased towards heads. Your hypothesis, referred to as the alternate hypothesis (or H₁), is that the coin is biased. The null-hypothesis (H₀) is that it’s unbiased.

We already know from repeated tests that if you flip a fair coin 100 times, you’d expect it to come up heads around 50 times (but it won’t always come up heads precisely 50 times). So if the scientist flips the coin 100 times and it comes up heads 55 times, it’s pretty likely to be a fair coin. But if it comes up heads 70 times, it starts to look fishy.

But how can they tell 70 heads is not just the result of chance? It’s certainly possible for a fair coin to come up heads 70 times. It’s just very unlikely. And the scientist can use statistics to determine how unlikely it is.

If they flip a fair coin 100 times, there’s an 18.4% chance that it’ll come up heads 55 or more times. That’s somewhat unusual, but not nearly enough to be confident the coin is biased.

But there’s only about a 0.004% chance that it’ll come up heads 70 or more times. Now the coin is looking decidedly dodgy.

The probability of seeing a result at least this extreme is referred to as the “p-value”, expressed in decimal rather than percentage terms, so 18.4% is 0.184 and 0.004% is 0.00004.
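As a sketch of how those tail probabilities can be computed (using SciPy’s binomial distribution; the tool and variable names are mine):

```python
from scipy.stats import binom

n, p_fair = 100, 0.5   # 100 flips, assuming the null-hypothesis (a fair coin) is true

# binom.sf(k, n, p) is P(X > k), so pass k - 1 to get "k or more heads".
print(binom.sf(54, n, p_fair))   # P(55 or more heads) ~ 0.184
print(binom.sf(69, n, p_fair))   # P(70 or more heads) ~ 0.000039
```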

This is a ‘normal distribution’, showing the probability of getting a particular result. The further out you go on the ‘tails’, the less likely the result.
Wikimedia

Typically, scientists consider a p-value below 0.05 to be a good indication that you can reject the null-hypothesis (in this case, that the coin is unbiased) and be more confident that your alternative hypothesis (that the coin is biased) is true.

This value of 0.05 is called the “significance level”. So if a result has a p-value at or below the significance level, then the result is considered “significant”.

It’s important to note that this refers to the technical sense of “statistical significance” rather than the more qualitative vernacular sense of “significant”, as in my “significant other” (although statisticians’ partners may differ in this interpretation).

This approach to science is also not without fault.

For one, if you set your significance level at 0.05 and run the same experiment 20 times on an effect that isn’t real, you’d expect about one of those experiments to clear the significance bar by chance alone. So in a journal with 20 papers, you can expect roughly one to be reporting a false positive.

This is one of the factors contributing to the so-called “replication crisis” in science, particularly in medicine and psychology.

p-hacking

One prime suspect in the replication crisis is the problem of “p-hacking”.

A good experiment will clearly define the null and the alternate hypothesis before handing out the drugs and placebos. But many experiments collect more than just one dimension of data. A trial for a headache drug might also keep an eye on side-effects, weight gain, mood, or any other variable the scientists can observe and measure.

And if one of these secondary factors shows a “significant” effect – like the group who took the headache drug also lost a lot of weight – it might be tempting to shift focus onto that effect. After all, you never know when you’ll come across the next Viagra.

However, if you simply track 20 variables in a study, you’d expect about one of them to pass the significance threshold by chance alone, as the simulation sketched below illustrates. Simply picking that variable and writing up the study as if it was the focus all along is dodgy science.
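Here’s a quick simulation to make the point. The set-up, group sizes and variable names are my own invention: a “trial” where the treatment genuinely does nothing, 20 unrelated outcomes measured, and a count of how many come out “significant” anyway.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
n_per_group, n_outcomes, alpha = 50, 20, 0.05

significant = 0
for _ in range(n_outcomes):
    # Both groups are drawn from the same distribution: the "drug" does nothing.
    treatment = rng.normal(size=n_per_group)
    placebo = rng.normal(size=n_per_group)
    _, p_value = ttest_ind(treatment, placebo)
    if p_value < alpha:
        significant += 1

print(significant)   # typically about 1 of the 20 outcomes comes out "significant" anyway
```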

It’s why we sometimes hear stuff that’s too good to be true, like that chocolate can help you lose weight (although that study turned out to be a cheeky attempt to show how easy it is for a scientist to get away with blatant p-hacking).

Some of the threats to reproducible science, including ‘hypothesising after the results are known’ (HARKing) and p-hacking.
Munafo et al, 2017

Publishing

Once scientists have conducted their experiment and found some interesting results, they move on to publishing them.

Science is unusual in that the norm is full transparency: scientists effectively give away their discoveries to the rest of the scientific community and society at large.

This is not only out of a magnanimous spirit, but because it also turns out to be a highly effective way of scrutinising scientific discoveries, and helping others to build upon them.

The way this works is typically by publishing in a peer-review journal.

It starts with the scientist preparing their findings according to the accepted conventions, such as providing an abstract, which is an overview of their discovery, and outlining the method they used in detail, describing their raw results and only then providing their interpretation of those results. They also cite other relevant research – a precursor to hyperlinks.

They then send this “paper” to a scientific journal. Some journals are more desirable than others, i.e. they have a “high impact”. The top tier, such as Nature, Science, The Lancet and PNAS, are popular, so they receive many high quality papers and accept only the best (or, if you’re a bit cynical, the most flashy). Other journals are highly specialist, and may be desirable because they’re held in high esteem by a very specific audience.

If the journal rejects the paper, the scientists move on to the next most desirable journal, and keep at it until it’s accepted or remains unpublished.

These journals employ a peer review process, where the paper is typically anonymised and sent out to a number of experts in the field. These experts then review the paper, looking for potential problems with the methods, inconsistencies in reporting or interpretation, and whether they’ve explained things clearly enough such that another lab could reproduce the results if they wanted to.

The paper might bounce back and forth between the peer reviewers and authors until it’s at a point where it’s ready to publish. This process can take as little as a few weeks, but in some cases it can take months or even years.

Journals don’t always get things right, though. Sometimes a paper will slip through with shoddy method or even downright fraud. A useful site for keeping tabs on dodgy journals and researchers is Retraction Watch.

Open Access

A new trend in scientific publishing is Open Access. While traditional journals don’t charge to accept papers, or pay scientists if they do publish their paper, they do charge fees (often exorbitant ones) to university libraries to subscribe to the journal.

What this means is a huge percentage of scientific research – often funded by taxpayers – is walled off so non-academics can’t access it.

The Open Access movement takes a different approach. Open Access journals release all their published research free of charge to readers, but they often recoup their costs by charging scientists to publish their work.

Many Open Access journals are well respected, and are gaining in prestige in academia, but the business model also creates a moral hazard, and incentives for journals to publish any old claptrap in order to make a buck. This has led to an entire industry of predatory journals.

Librarian Jeffrey Beall used to maintain a list of “potential, possible, or probable” predatory publishers, which was the go-to for checking if a journal is legit. However, in early 2017 Beall took the list offline for reasons yet to be made clear. The list is mirrored, but every month that goes by makes it less and less reliable.

Many scientists also publish their research on pre-press servers, the most popular being arXiv (pronounced “archive”). These are clearing houses for papers that haven’t yet been peer-reviewed or accepted by a journal. But they do offer scientists a chance to share their early results and get feedback and criticism before they finalise their paper to submit to a journal.

It’s often tempting to get the jump on the rest of the media by reporting on a paper published on a pre-press site, especially if it has an exciting finding. However, journalists should exercise caution, as these papers haven’t been through the peer-review process, so it’s harder to judge their quality. Some wild and hyperbolic claims also make it to pre-press outlets. So if a journalist is tempted by one, they should run it past a trusted expert first.

The next post in this series will deal with more practical considerations about how to pick a good science story, and how to cite sources properly.

Tim Dean, Editor, The Conversation

This article was originally published on The Conversation. Read the original article.

How we edit science part 1: the scientific method

Tim Dean, The Conversation

We take science seriously at The Conversation and we work hard to report it accurately. This series of five posts is adapted from an internal presentation on how to understand and edit science by our Australian Science & Technology Editor, Tim Dean. We thought you might also find it useful. The Conversation


Introduction

If I told you that science was a truth-seeking endeavour that uses a single robust method to prove scientific facts about the world, steadily and inexorably driving towards objective truth, would you believe me?

Many would. But you shouldn’t.

The public perception of science is often at odds with how science actually works. Science is often seen to be a separate domain of knowledge, framed to be superior to other forms of knowledge by virtue of its objectivity, which is sometimes referred to as it having a “view from nowhere”.

But science is actually far messier than this – and far more interesting. It is not without its limitations and flaws, but it’s still the most effective tool we have to understand the workings of the natural world around us.

In order to report or edit science effectively – or to consume it as a reader – it’s important to understand what science is, how the scientific method (or methods) work, and also some of the common pitfalls in practising science and interpreting its results.

This guide will give a short overview of what science is and how it works, with a more detailed treatment of both these topics in the final post in the series.

What is science?

Science is special, not because it claims to provide us with access to the truth, but because it admits it can’t provide truth.

Other means of producing knowledge, such as pure reason, intuition or revelation, might be appealing because they give the impression of certainty, but when this knowledge is applied to make predictions about the world around us, reality often finds them wanting.

Rather, science consists of a bunch of methods that enable us to accumulate evidence to test our ideas about how the world is, and why it works the way it does. Science works precisely because it enables us to make predictions that are borne out by experience.

Science is not a body of knowledge. Facts are facts, it’s just that some are known with a higher degree of certainty than others. What we often call “scientific facts” are just facts that are backed by the rigours of the scientific method, but they are not intrinsically different from other facts about the world.

What makes science so powerful is that it’s intensely self-critical. In order for a hypothesis to pass muster and enter a textbook, it must survive a battery of tests designed specifically to show that it could be wrong. If it passes, it has cleared a high bar.

The scientific method(s)

Despite what some philosophers have stated, there is a method for conducting science. In fact, there are many. And not all revolve around performing experiments.

One method involves simple observation, description and classification, such as in taxonomy. (Some physicists look down on this – and every other – kind of science, but they’re only greasing a slippery slope.)

Philosopher is out of frame on far right.
xkcd

However, when most of us think of The Scientific Method, we’re thinking of a particular kind of experimental method for testing hypotheses.

This begins with observing phenomena in the world around us, and then moves on to positing hypotheses for why those phenomena happen the way they do. A hypothesis is just an explanation, usually in the form of a causal mechanism: X causes Y. An example would be: gravitation causes the ball to fall back to the ground.

A scientific theory is just a collection of well-tested hypotheses that hang together to explain a great deal of stuff.

Crucially, a scientific hypothesis needs to be testable and falsifiable.

An untestable hypothesis would be something like “the ball falls to the ground because mischievous invisible unicorns want it to”. If these unicorns are not detectable by any scientific instrument, then the hypothesis that they’re responsible for gravity is not scientific.

An unfalsifiable hypothesis is one where no amount of testing can prove it wrong. An example might be the psychic who claims the experiment to test their powers of ESP failed because the scientific instruments were interfering with their abilities.

(Caveat: there are some hypotheses that are untestable because we choose not to test them. That doesn’t make them unscientific in principle, it’s just that they’ve been denied by an ethics committee or other regulation.)

Experimentation

There are often many hypotheses that could explain any particular phenomenon. Does the rock fall to the ground because an invisible force pulls on the rock? Or is it because the mass of the Earth warps spacetime, and the rock follows the lowest-energy path, thus colliding with the ground? Or is it that all substances have a natural tendency to fall towards the centre of the Universe, which happens to be at the centre of the Earth?

The trick is figuring out which hypothesis is the right one. That’s where experimentation comes in.

A scientist will take their hypothesis and use that to make a prediction, and they will construct an experiment to see if that prediction holds. But any observation that confirms one hypothesis will likely confirm several others as well. If I lift and drop a rock, it supports all three of the hypotheses on gravity above.

Furthermore, you can keep accumulating evidence to confirm a hypothesis, and it will never prove it to be absolutely true. This is because you can’t rule out the possibility of another similar hypothesis being correct, or of making some new observation that shows your hypothesis to be false. But if one day you drop a rock and it shoots off into space, that ought to cast doubt on all of the above hypotheses.

So while you can never prove a hypothesis true simply by making more confirmatory observations, you only need one solid contrary observation to prove a hypothesis false. This notion is at the core of the hypothetico-deductive model of science.

This is why a great deal of science is focused on testing hypotheses, pushing them to their limits and attempting to break them through experimentation. If the hypothesis survives repeated testing, our confidence in it grows.

So even crazy-sounding theories like general relativity and quantum mechanics can become well accepted, because both enable very precise predictions, and these have been exhaustively tested and come through unscathed.

The next post will cover hypothesis testing in greater detail.

Tim Dean, Editor, The Conversation

This article was originally published on The Conversation. Read the original article.