Independence of events in real-life data
Most statistical methods (if not all) rely on independence of events. How do we know that this assumption is valid in real-life problems like clinical trials or web crawling? What might be the consequences of statistically modelling data that violate the independence assumption when we are not aware of the violation?
machine-learning estimation inference independence bias
edited 3 hours ago
kjetil b halvorsen
28.5k980208
asked 4 hours ago
WoofDoggy
1213
2 Answers
Often the question "are the events independent?" is the wrong question. The observations we want to analyze are represented in some model as random variables, and whether we should model them as independent is a modeling decision.
A better question to ask is often: are the events exchangeable? This means that the random variables play a symmetric role: there is no a priori reason (given our state of knowledge) to believe that, say, $X_1$ should probably be larger than $X_2$, or the opposite. This is typically the case in experiments where the variables represent observations on randomly drawn people that we do not know much about (certainly not enough to distinguish between them). Simple random sampling without replacement is a typical example which leads to exchangeability (but not independence).
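The sampling-without-replacement remark can be checked exactly on a toy case. The following is a minimal sketch, assuming a hypothetical urn with two red balls (coded 1) and three blue balls (coded 0); the urn and the function name `joint_distribution` are illustrative only. It enumerates every ordered pair of draws and compares the joint distribution with the product of the marginals.

```python
from fractions import Fraction
from itertools import permutations


def joint_distribution(urn):
    """Exact joint distribution of the first two draws without replacement."""
    # Every ordered pair of distinct ball positions is equally likely.
    orders = list(permutations(range(len(urn)), 2))
    weight = Fraction(1, len(orders))
    dist = {}
    for i, j in orders:
        key = (urn[i], urn[j])
        dist[key] = dist.get(key, Fraction(0)) + weight
    return dist


urn = [1, 1, 0, 0, 0]  # hypothetical urn: 1 = red, 0 = blue
dist = joint_distribution(urn)

# Exchangeable: swapping the roles of the two draws leaves the distribution unchanged.
exchangeable = all(
    dist.get((a, b), 0) == dist.get((b, a), 0) for a in (0, 1) for b in (0, 1)
)

# Marginal probabilities of red on the first and on the second draw.
p_x1 = sum(p for (a, _), p in dist.items() if a == 1)
p_x2 = sum(p for (_, b), p in dist.items() if b == 1)
p_both = dist.get((1, 1), Fraction(0))

# Independent only if the joint probability factorizes.
independent = p_both == p_x1 * p_x2

print(exchangeable, p_x1, p_x2, p_both, independent)
```

Both draws have the same marginal probability of red, and the joint distribution is symmetric, yet the probability of two reds is smaller than the product of the marginals: the draws are exchangeable but negatively dependent.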
The key point is that there is a theorem, de Finetti's representation theorem, which says that exchangeable random variables can be represented as independent random variables conditional on a latent variable. You can take that latent variable as a parameter in some parametric model, which is then a typical IID model.$^\dagger$
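For concreteness, the simplest case of the theorem can be written out. This is a sketch for binary (0/1) variables that form part of an infinite exchangeable sequence; the symbols $\theta$, $F$ and $s$ are notation introduced here for illustration:

$$P(X_1 = x_1, \dots, X_n = x_n) = \int_0^1 \theta^{s} (1-\theta)^{n-s} \, dF(\theta), \qquad s = \sum_{i=1}^{n} x_i .$$

Conditional on $\theta$, the $X_i$ are IID Bernoulli($\theta$), and the mixing distribution $F$ plays the role of a prior on that parameter. Note that the representation in this form needs the sequence to be infinitely extendable: a finite exchangeable sample, such as draws without replacement from a small urn, can be negatively correlated, which no mixture of IID variables can reproduce.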
But suppose you enlarge the experiment: instead of doing it only with students from your class, you also do it with students from another class at another university. Now the complete sample is no longer exchangeable, because you might know that there are some demographic differences between the student bodies. But the two subsamples are still separately exchangeable. So if you construct a model containing an indicator variable coding for university, the arguments above again lead to an IID model.
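A minimal sketch of the indicator-variable idea, using made-up scores for two hypothetical universities (coded 0 and 1). With a single binary indicator and an intercept, the least-squares fit reduces to the two group means, so no regression library is needed here; the data and the name `fit_indicator_model` are assumptions for illustration.

```python
from statistics import mean

# Hypothetical data: (university indicator, score).
scores = [(0, 61.0), (0, 64.0), (0, 58.0), (1, 72.0), (1, 75.0), (1, 69.0)]


def fit_indicator_model(data):
    """Least-squares fit of score = intercept + effect * indicator."""
    group0 = [y for g, y in data if g == 0]
    group1 = [y for g, y in data if g == 1]
    intercept = mean(group0)          # mean of the baseline university
    effect = mean(group1) - intercept  # shift for the other university
    return intercept, effect


intercept, effect = fit_indicator_model(scores)

# What is left after removing the university effect is what the model
# treats as IID noise within each subsample.
residuals = [y - (intercept + effect * g) for g, y in scores]

print(intercept, effect, residuals)
```

The pooled scores are visibly not exchangeable (one university scores higher), while the residuals no longer carry the university label's information, which is the sense in which the model is IID given the indicator.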
Conclusion: it is better to ask oneself "are my random variables exchangeable?" than to ask about independence directly. A book taking this route to the construction of statistical models (within the Bayesian paradigm) is Bernardo & Smith.
$^\dagger$ There are some technical points we left out.
First, not all methods rely on independence: paired t-tests, repeated-measures ANOVA, multilevel models, generalized estimating equations and a whole array of time series methods do not. In fact, they rely on the data not being independent.
Second, we don't usually know events are independent, but it often makes a lot of sense to assume they are, because there is no plausible source of dependence. Suppose, for example, I am studying the relationship between political preference and various demographics. If I survey a bunch of people and the people are at least roughly randomly selected from some population, it doesn't seem that there is any way there could be dependence: My political preferences (and their relation to my demographics) are not related to some other random person's.
On the other hand, if we were interested in the role of being a husband or being a wife, we might study married couples. Then the data would certainly be dependent and we would need to use methods that account for this.
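To illustrate what can go wrong when such dependence is ignored, here is a small sketch with made-up measurements on six hypothetical couples. It compares the standard error of the mean difference computed as if the two groups were independent with the one that uses the pairing; the data and function names are assumptions for illustration.

```python
from math import sqrt
from statistics import stdev, variance

# Hypothetical measurements on six married couples (same position = same couple).
husbands = [52.0, 61.0, 47.0, 70.0, 58.0, 66.0]
wives = [50.0, 60.0, 44.0, 69.0, 55.0, 65.0]


def unpaired_se(x, y):
    """Standard error of mean(x) - mean(y), wrongly treating x and y as independent."""
    return sqrt(variance(x) / len(x) + variance(y) / len(y))


def paired_se(x, y):
    """Standard error of the mean within-couple difference."""
    diffs = [a - b for a, b in zip(x, y)]
    return stdev(diffs) / sqrt(len(diffs))


se_naive = unpaired_se(husbands, wives)
se_paired = paired_se(husbands, wives)

print(round(se_naive, 3), round(se_paired, 3))
```

Because the two members of a couple are strongly positively correlated in this toy data, the independence-based standard error is far larger than the paired one, so a real difference could be missed. With other kinds of dependence (for example, clustered observations contributing to a single mean) the error tends to go the other way and the naive standard error is too small.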
answered 3 hours ago
kjetil b halvorsen
28.5k980208
answered 4 hours ago
Peter Flom♦
74.2k11105202